Nvidia (NASDAQ:NVDA) recently announced Ampere after three successful years with Volta in the data center. Its newest GPU architecture will be featured first in the A100 data center GP-GPU. Nvidia has used pretty much all the silicon real estate available for the 826mm², 54-billion-transistor chip, which is close to the so-called reticle size limit. This likely makes it a bit larger than the Xilinx (XLNX) Versal Premium, which has about 50 billion transistors.

Furthermore, while Intel (INTC) has been criticized in recent years for the power draw of its 14nm chips, the A100 does not fare much better with its 400W TDP, despite the 7nm process. That said, performance per watt is likely substantially higher than Volta's, because performance has increased even more than power.

While Intel's work on the Xe HP Arctic Sound GPU and the Xe HPC Ponte Vecchio is still underway and final specifications are unknown, some preliminary comparisons can already be made. Here's how Intel's data center GPU chips might stack up against Ampere.

Architecture: numeric format support

Ampere's third-gen Tensor Cores support everything from INT1 all the way up to FP64, a much wider range of formats than Volta. Ampere also adds Google's (GOOG) (GOOGL) BF16, which Intel is also backing, as well as Nvidia's new TF32 format. Ponte Vecchio supports INT8 up to FP64, as well as BF16. So, while it lacks ultra-low-precision support, this should not be an issue for large-scale commercial deployments, as INT8 will likely remain the standard for inference.

One novelty in Ponte Vecchio is that it also contains SIMD units, the kind of vector hardware found in CPUs. Intel's reasoning is that, by including SIMD units alongside its GPU-style execution units, it can cover a wide range of vector widths. Intel claims that combining both types of units can improve performance by over 2x in certain workloads.

Architecture: FLOPS

Arctic Sound is leaked to consist of four chiplets, each with 512 EUs. At a relatively conservative 1.3GHz, this would yield 10.4 TFLOPS (FP32 precision) per chiplet, or about 40 TFLOPS of 'classical' performance across all four chiplets. That is roughly double the A100's 19.5 TFLOPS FP32, arriving within a year or so of the A100. However, this comparison does not include accelerator hardware such as Tensor Cores. Without such additional hardware, Arctic Sound tops out at 80 TFLOPS FP16, well below even the V100 using Tensor Cores.
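The Arctic Sound FLOPS estimate can be reproduced with a short back-of-the-envelope calculation. The following Python sketch is not Intel's method. It rests on stated assumptions: each EU has 8 FP32 lanes, a fused multiply-add counts as two FLOPs, and FP16 runs at twice the FP32 rate. The comparison points are Nvidia's published peak figures: 19.5 TFLOPS FP32 for the A100 and 125 TFLOPS FP16 Tensor Core peak for the SXM2 V100. Under these assumptions the result is slightly above the rounded figures in the text, at about 10.6 TFLOPS per chiplet and roughly 43 TFLOPS in total. The conclusion does not change: FP32 throughput is about 2x the A100's, and FP16 throughput still falls short of the V100's Tensor Cores.

```python
# Back-of-the-envelope peak-throughput estimate for the leaked Arctic Sound
# configuration. ASSUMPTIONS (not part of the leak): 8 FP32 lanes per EU,
# one FMA counts as 2 FLOPs, and FP16 runs at 2x the FP32 rate.

CHIPLETS = 4
EUS_PER_CHIPLET = 512
CLOCK_GHZ = 1.3
FP32_LANES_PER_EU = 8      # assumption
FLOPS_PER_FMA = 2          # multiply + add
FP16_RATE_MULTIPLIER = 2   # assumption: packed FP16 at twice the FP32 rate

# Nvidia's published peak figures, used only for comparison.
A100_FP32_TFLOPS = 19.5
V100_TENSOR_FP16_TFLOPS = 125.0  # V100 SXM2


def peak_tflops(eus: int, clock_ghz: float, lanes: int = FP32_LANES_PER_EU) -> float:
    """Peak FP32 TFLOPS = EUs x lanes x FLOPs per FMA x clock (GHz) / 1000."""
    return eus * lanes * FLOPS_PER_FMA * clock_ghz / 1000.0


per_chiplet = peak_tflops(EUS_PER_CHIPLET, CLOCK_GHZ)
total_fp32 = CHIPLETS * per_chiplet
total_fp16 = FP16_RATE_MULTIPLIER * total_fp32

print(f"FP32 per chiplet: {per_chiplet:.1f} TFLOPS")
print(f"FP32 total:       {total_fp32:.1f} TFLOPS "
      f"({total_fp32 / A100_FP32_TFLOPS:.1f}x A100)")
print(f"FP16 total:       {total_fp16:.1f} TFLOPS "
      f"(V100 Tensor Cores: {V100_TENSOR_FP16_TFLOPS:.0f} TFLOPS)")
```

Swapping in a different clock speed or lane count in `peak_tflops` shows how sensitive the estimate is to those assumptions.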