Just a few years ago, no CPU had 64 cores. Having 6 cores and 12 threads was amazing. There was the Xeon Phi though. Way back in 2006 Intel started its Larrabee project, eventually releasing the Intel Many Integrated Core prototype in 2010. According to Wikipedia, this processor had 32 cores and supported 4 threads per core, but it ran at only 1.2GHz. Fast forward to Q4 2017 and the last Xeon Phis were released, the fastest of which was the 7295 with 72 cores running at 1.5GHz and a 320W TDP. One thing that the Phis have is AVX-512, i.e. 512-bit vectorised floating point operations. AMD’s EPYC CPUs only have AVX2 support, which uses 256-bit vectorised operations. I do find it ironic that AMD is now winning the core-count race, even though Intel took the first stab at it. If Intel had focussed instead on a general purpose CPU with 32 cores back in 2010, even if it ran at slower clock speeds than other CPUs of the time, they could have been miles ahead today.
So, how fast, theoretically, is the top Rome CPU, the 7H12, compared to a Xeon Phi? For the Rome CPU, several sites state that the architecture is capable of 16 floating point operations per clock cycle per core (see here and here). Therefore, 64 cores times 16 floating point operations per cycle times a base clock speed of 2.6GHz gives a theoretical peak of 2662.4 GFLOPS. While I would have liked to compare against the Xeon Phi 7295, there isn’t a lot of information about its floating point speed. The last Xeon Phi for which such info is readily available is the 7290, which achieves 3456 GFLOPS, and it seems like the 7295 is supposed to be similar, but with additional instructions geared towards machine learning.
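For anyone who wants to redo that arithmetic for another chip, here is a minimal Python sketch of the same back-of-the-envelope calculation. The function name `peak_gflops` and its parameter names are just made up for illustration, and the result is a theoretical peak at base clock, not something a benchmark would actually report.

```python
def peak_gflops(cores: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak: cores x FLOPs per cycle per core x clock speed in GHz."""
    return cores * flops_per_cycle * clock_ghz


# EPYC 7H12 figures as quoted above: 64 cores, 16 FLOPs per cycle, 2.6GHz base clock.
epyc_7h12 = peak_gflops(cores=64, flops_per_cycle=16, clock_ghz=2.6)
print(f"EPYC 7H12 theoretical peak: {epyc_7h12:.1f} GFLOPS")
```

Because the clock is in GHz, the result comes out directly in GFLOPS without any unit conversion.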
So, AMD has now advanced to the point of having a CPU that reaches 77% of the theoretical speed of the fastest Xeon Phis. While I was expecting this to happen eventually, I wasn’t expecting it to be AMD, and AMD got there a lot sooner than I expected. Of course, one could now buy a dual processor AMD system that beats the Xeon Phi. Since EPYC is a normal CPU, it’s able to accelerate any multithreaded or parallel processing software, rather than requiring recompiling for the Xeon Phi (which I don’t think would support Python). Well done AMD.
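The percentages are easy to check with a few lines of Python. This is only a rough sketch using the two peak figures quoted in this post; the constant names are made up for illustration, and the dual processor number assumes two sockets scale perfectly, which real workloads generally won't.

```python
# Theoretical peak figures quoted in this post, in GFLOPS.
EPYC_7H12_GFLOPS = 2662.4
XEON_PHI_7290_GFLOPS = 3456.0

# One EPYC 7H12 relative to one Xeon Phi 7290.
single_ratio = EPYC_7H12_GFLOPS / XEON_PHI_7290_GFLOPS

# Two EPYC 7H12s, assuming (optimistically) perfect scaling across sockets.
dual_ratio = 2 * EPYC_7H12_GFLOPS / XEON_PHI_7290_GFLOPS

print(f"Single socket: {single_ratio:.0%} of the Xeon Phi 7290")
print(f"Dual socket:   {dual_ratio:.0%} of the Xeon Phi 7290")
```

On paper, a single socket lands at about 77% and a dual socket system at roughly one and a half times the Xeon Phi.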