AMD Threadripper and (1-4) NVIDIA 2080Ti and 2070 for NAMD Molecular Dynamics

In my recent experience with the AMD Threadripper 2990WX I was impressed by its CPU-only performance running NAMD. Of course, adding an NVIDIA GPU to the system gives a dramatic improvement, since NAMD has good GPU acceleration. I love NAMD for many reasons, and one of them is that it makes a good benchmark for looking at CPU/GPU performance. NAMD requires a balance between CPU and GPU to achieve the best results. It is also not very sensitive to AVX vector unit acceleration. NAMD scales well with lots of cores (or lots of cluster nodes). After some discussion, I decided it would be good to look at multi-GPU performance with NAMD on the Threadripper platform. The assumption was that there would be enough cores to keep up with the powerful new NVIDIA GPUs.

My last post, AMD Threadripper 2990WX 32-core vs Intel Xeon-W 2175 14-core - Linpack NAMD and Kernel Build Time, is good background for this post and has an interesting comparison with the Intel Xeon-W 14-core system.

I spent a long afternoon on the same test platform I used for the last post. I was able to get in a few tests using the 24-core Threadripper 2970WX, but most of the results use the 32-core 2990WX processor.

I had 2 side-fan cooled NVIDIA RTX 2070 GPUs. It is impractical to use more than two of these types of cards in a system because of thermal throttling (very bad); see NVIDIA Dual-Fan GeForce RTX Coolers Ruining Multi-GPU Performance. Two days after the testing, we got our first batch of RTX 2070s with blower fans! You should be able to configure systems with those cards now.

We have blower-fan versions of the RTX 2080Ti, so I was able to test 1 to 4 of those great cards.

Test systems: AMD 2990WX and Intel Xeon-W 2175
The AMD Threadripper system I used was a test-bed build with the following main components:

AMD hardware
AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (4.2GHz Turbo)
AMD Ryzen Threadripper 2970WX 24-Core @ 3.00GHz (4.0GHz Turbo)
Gigabyte X399 AORUS XTREME-CF motherboard
128GB DDR4 2666 MHz memory
Samsung 970 PRO 512GB M.2 SSD
Ubuntu 18.04 (see The Best Way To Install Ubuntu 18.04 with NVIDIA Drivers and any Desktop Flavor)

NAMD: NAMD_2.13_Linux-x86_64-multicore and NAMD_2.13_Linux-x86_64-multicore-CUDA

Test results
When I sat down at the system it had the 24-core TR 2970WX processor in it, so I ran a few jobs with that before I swapped in the 2990WX. The first jobs I ran were CPU only. The results were very satisfying in that the scaling with an increasing number of threads was very uniform. Interestingly, NAMD performance improved uniformly with SMT "Hyperthreads". This is not always the case; often you see that only "real" cores improve performance.

CPU results

The graph shows how effective SMT threads are with NAMD. Note that lower is better! What is reported is the default NAMD performance output in days/ns, i.e., the number of days required to compute 1 nanosecond of simulation. Yes, this is a very compute-intensive task. Large jobs can run for weeks or months. My jobs were run for 500 steps of simulation.
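To make the days/ns metric concrete, here is a minimal Python sketch that turns a days/ns figure into an estimated wall-clock time for a run of a given length. The function name and the example numbers (0.5 days/ns, a 100 ns target) are hypothetical, chosen only for illustration; they are not results from these tests.

```python
def estimated_days(days_per_ns: float, simulated_ns: float) -> float:
    """Wall-clock days needed to simulate `simulated_ns` nanoseconds
    at a NAMD-reported rate of `days_per_ns` (lower is better)."""
    if days_per_ns <= 0:
        raise ValueError("days_per_ns must be positive")
    if simulated_ns < 0:
        raise ValueError("simulated_ns must not be negative")
    return days_per_ns * simulated_ns


# Hypothetical example: 0.5 days/ns and a 100 ns simulation target.
print(f"{estimated_days(0.5, 100.0):.1f} days")
```

This is why a small difference in days/ns matters: it is multiplied by the full length of the simulation.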


That is very good CPU performance for this job run! In an earlier post, NAMD Performance on Xeon-Scalable 8180 and 8 GTX 1080Ti GPUs, using a dual Xeon 8180 system with a total of 56 cores, I got 2.93 days/ns with 32 cores. Those processors cost more than $10,000 each, so the 32-core Threadripper is a bargain in comparison. [Using all 56 cores on that Intel system I got 1.68 days/ns.] Note: If you look at that older post, you will see that I took the inverse of the natural NAMD output and reported ns/day. Keep this in mind if you make a comparison. (Sorry about that.)
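Since the older post reports the inverse metric, a small conversion helps when comparing numbers across the two posts. The sketch below, with helper names of my own choosing, converts days/ns to ns/day and works out the speedup between the two dual Xeon 8180 figures quoted above (2.93 days/ns at 32 cores, 1.68 days/ns at 56 cores).

```python
def ns_per_day(days_per_ns: float) -> float:
    """Invert NAMD's days/ns output into ns/day (higher is better)."""
    if days_per_ns <= 0:
        raise ValueError("days_per_ns must be positive")
    return 1.0 / days_per_ns


def speedup(baseline_days_per_ns: float, improved_days_per_ns: float) -> float:
    """Speedup factor when going from the baseline to the improved run."""
    return baseline_days_per_ns / improved_days_per_ns


# Dual Xeon 8180 figures quoted in the text (days/ns, lower is better).
xeon_32_cores = 2.93
xeon_56_cores = 1.68

print(f"32 cores: {ns_per_day(xeon_32_cores):.2f} ns/day")
print(f"56 cores: {ns_per_day(xeon_56_cores):.2f} ns/day")

# Compare the measured speedup with the increase in core count.
measured = speedup(xeon_32_cores, xeon_56_cores)
ideal = 56 / 32
print(f"speedup {measured:.2f}x for {ideal:.2f}x the cores")
```

On these two data points the speedup tracks the core count closely, which fits the observation that NAMD scales well with lots of cores.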

GPU acceleration results

The first thing I have to say about the GPU results is that, even with the good performance of the 32 cores of the 2990WX, it is not enough to keep up with more than 1 or 2 of NVIDIA's new RTX GPUs. The range from the worst result with 1 2070 to the best result with 4 2080Ti is only a 1.6x speedup.

I'm not saying that these results are bad! They are actually very good, and they clearly show how much performance is gained by adding even a "modest" GPU such as the RTX 2070, which gives a speedup of about 5x over the CPU-only result. However, by the time you add 2 RTX 2070s or 2080Tis you are limited by the CPU.

In the earlier post I mentioned above, the dual Xeon 8180 system provided enough CPU capability for 0.438 days/ns with 1 GTX 1080Ti, and using 2 1080Ti gave 0.248 days/ns. Additional GPUs gave only smaller performance improvements, again limited by the CPU. (I

