Many people are wondering what Pascal’s GPU architecture is and whether it will have a significant impact on the future of computer graphics. Pascal is an NVIDIA architecture, introduced in 2016, that was designed to significantly increase performance for deep learning, high-precision computations, artificial intelligence, and virtual reality applications. Pascal uses 16nm FinFET process technology on its main chips to deliver up to 2x the performance per watt compared with previous-generation GPUs.
Introduction to Pascal’s GPU Architecture
Pascal GPUs consist of multiple Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and memory controllers. Each GPC includes a dedicated raster engine and several TPCs, and each TPC contains one SM (on GP104) or two SMs (on GP100). The SM is the basic scheduling unit on Pascal GPUs, and its FP32 CUDA cores support floating-point and integer operations.
Each Pascal SM has 64 FP32 CUDA cores on GP100 and 128 on GP102 and GP104, along with its own register file, shared memory, and texture units. Pascal also features an improved PolyMorph Engine, version 4.0, which handles geometry work such as tessellation and adds a Simultaneous Multi-Projection unit for rendering one scene to several viewports, as VR headsets require.
What are the Different Pascal Chips?
Pascal is a single architecture that shipped as several chips. The main ones covered here are GP100, GP102, and GP104.
GP100 is the full-fledged version of Pascal and was announced in April 2016 as part of the Tesla P100. The full chip has 60 SMs with 64 FP32 CUDA cores each, for a total of 3,840 CUDA cores; the Tesla P100 enables 56 of those SMs, for 3,584 cores. It delivers considerably higher FP32 throughput, and far higher FP64 throughput, than its predecessor, the Maxwell GM200 chip found on NVIDIA’s GeForce GTX Titan X graphics card. GP102, on the other hand, is used on the GeForce GTX 1080 Ti, where 28 of its 30 SMs are enabled. With 128 FP32 CUDA cores per SM, that also comes to 3,584 CUDA cores, the same count as the Tesla P100 from half as many SMs.
The next Pascal chip variant is GP104, used by the GeForce GTX 1080 and 1070 cards. It has 20 SMs, compared with 30 on GP102 (found on the Quadro P6000 and Tesla P40) and 60 on GP100. The GeForce GTX 1080 uses the full GP104 with 2,560 CUDA cores, while the GeForce GTX 1070 contains a cut-down version with 15 SMs and 1,920 CUDA cores.
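The CUDA core totals for these cards follow directly from the number of enabled SMs multiplied by the FP32 cores per SM. The short sketch below works through that arithmetic. The SM counts and per-SM core counts are taken from NVIDIA's published specifications rather than from this article, and the `PascalConfig` class and `core_counts` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PascalConfig:
    """One product configuration of a Pascal chip (illustrative helper)."""
    product: str
    chip: str
    enabled_sms: int   # SMs enabled on this product, not on the full die
    cores_per_sm: int  # FP32 CUDA cores per SM: 64 on GP100, 128 on GP102/GP104

    @property
    def cuda_cores(self) -> int:
        # Total FP32 CUDA cores = enabled SMs x cores per SM
        return self.enabled_sms * self.cores_per_sm


# Figures assumed from NVIDIA's published product specifications.
CONFIGS = [
    PascalConfig("Tesla P100", "GP100", 56, 64),
    PascalConfig("GeForce GTX 1080 Ti", "GP102", 28, 128),
    PascalConfig("GeForce GTX 1080", "GP104", 20, 128),
    PascalConfig("GeForce GTX 1070", "GP104", 15, 128),
]


def core_counts(configs=CONFIGS):
    """Map each product name to its total CUDA core count."""
    return {c.product: c.cuda_cores for c in configs}


if __name__ == "__main__":
    for product, cores in core_counts().items():
        print(f"{product}: {cores} CUDA cores")
```

Note that the Tesla P100 and the GeForce GTX 1080 Ti land on the same total even though their SMs are organized very differently.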
What Features Changed on Pascal GPUs?
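In Pascal, NVIDIA did not remove the FP64 hardware found in Fermi, Kepler, and Maxwell architecture GPUs. GP100 runs FP64 at half its FP32 rate and adds fast half-precision floating-point operations (FP16) at twice the FP32 rate, while the consumer chips (GP102 and GP104) keep only a small number of FP64 units. These operations run on the CUDA cores in each SM, not on the PolyMorph Engine, which handles geometry. NVIDIA also added two important features with Pascal: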
- Packed reduced-precision arithmetic. GP100 can hold two FP16 values in a single 32-bit register and operate on both with one instruction, and GP102 and GP104 add 8-bit integer (INT8) dot-product instructions that accumulate into a 32-bit integer (INT32). Both are particularly important for machine learning and deep learning applications.
- Pascal also has a new Unified Memory architecture with 49-bit virtual addressing and hardware page faulting. The GPU can access CPU memory over PCIe or NVIDIA's high-speed NVLink interconnect, and pages migrate on demand to the processor that touches them, so programs no longer need to copy whole buffers back and forth up front. In addition, Unified Memory allocations on Pascal can exceed the GPU's physical memory, which is another feature that benefits deep learning and AI applications with large data sets.
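One concrete form of mixed-precision integer arithmetic on Pascal is the 8-bit dot product with 32-bit accumulation, which CUDA exposes as the `__dp4a` intrinsic on compute capability 6.1 GPUs. The following is a pure-Python emulation of that operation, offered as a rough sketch rather than a description of the hardware. The helper names `pack_int8x4`, `unpack_int8x4`, and `dp4a` are hypothetical, and the wrap-around behavior on overflow is an assumption.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def unpack_int8x4(word):
    """Split a 32-bit word back into four signed 8-bit integers."""
    return list(struct.unpack("<4b", struct.pack("<I", word)))


def dp4a(a_word, b_word, acc):
    """Emulate a 4-way INT8 dot product accumulated into a signed INT32."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    total = acc + sum(x * y for x, y in zip(a, b))
    # Assumption: wrap to a signed 32-bit value, as a 32-bit register would.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


if __name__ == "__main__":
    a = pack_int8x4([1, -2, 3, -4])
    b = pack_int8x4([5, 6, -7, 8])
    print(dp4a(a, b, 10))
```

Four multiply-adds happen per 32-bit operand pair, which is where the inference throughput gain over plain FP32 arithmetic comes from.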
How Does Pascal’s GPU Architecture Improve Deep Learning & Numeric Computing?
Pascal improves FP32 and FP64 throughput over previous-generation GPUs; the Tesla P100, for example, peaks at about 10.6 TFLOPS in FP32 and 5.3 TFLOPS in FP64. For deep learning, GP100 executes FP16 operations at twice its FP32 rate, and GP102 and GP104 add INT8 dot-product instructions that accumulate into INT32 for inference, reportedly delivering up to four times the performance per watt of Maxwell GPUs on that workload. In both cases the reduced-precision paths sit alongside full-rate FP32, so FP32 accuracy remains available wherever a model needs it.
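Peak FLOPS figures for a GPU can be reproduced from its core count and clock speed, since each CUDA core can retire one fused multiply-add (two floating-point operations) per clock. The sketch below applies that formula to the Tesla P100. The core count (3,584) and boost clock (1,480 MHz) are assumed from NVIDIA's published specifications, and `peak_tflops` is a hypothetical helper name.

```python
def peak_tflops(cuda_cores, boost_clock_mhz, rate=1.0):
    """Theoretical peak in TFLOPS: cores x 2 FLOPs per FMA x clock x precision rate."""
    flops = cuda_cores * 2 * boost_clock_mhz * 1e6 * rate
    return flops / 1e12


# Assumed Tesla P100 figures from NVIDIA's published specifications.
P100_CORES = 3584
P100_BOOST_MHZ = 1480

fp32 = peak_tflops(P100_CORES, P100_BOOST_MHZ)            # FP32 at full rate
fp64 = peak_tflops(P100_CORES, P100_BOOST_MHZ, rate=0.5)  # FP64 at half the FP32 rate
fp16 = peak_tflops(P100_CORES, P100_BOOST_MHZ, rate=2.0)  # FP16 at twice the FP32 rate

if __name__ == "__main__":
    print(f"FP32: {fp32:.2f} TFLOPS")
    print(f"FP64: {fp64:.2f} TFLOPS")
    print(f"FP16: {fp16:.2f} TFLOPS")
```

These are theoretical ceilings; real workloads land below them depending on memory bandwidth and occupancy.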
What Are Pascal’s Benefits Over Maxwell?
One significant benefit of Pascal GPUs over previous-generation products is enhanced memory compression, which gives Pascal GPUs around 20% higher effective bandwidth than Maxwell. This comes from Pascal's fourth-generation delta color compression technology, which stores the differences between neighboring pixel values instead of the full values wherever possible. That improves performance by reducing bandwidth and power consumption for frame buffer accesses, as well as improving cache utilization.
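The idea behind delta compression can be illustrated with a toy example: neighboring pixels usually have similar values, so storing one anchor value plus small differences takes fewer bits than storing every value in full. The sketch below is only a simplified illustration of that principle and is not NVIDIA's hardware algorithm, whose details are not covered here. The function names and the sample pixel row are made up for the example.

```python
def delta_encode(pixels):
    """Keep the first value, then store each value as a difference from the previous one."""
    if not pixels:
        return []
    deltas = [pixels[0]]
    for prev, cur in zip(pixels, pixels[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    pixels = []
    running = 0
    for d in deltas:
        running += d
        pixels.append(running)
    return pixels


def bits_needed(deltas):
    """Bits for one 8-bit anchor plus fixed-width signed deltas."""
    if len(deltas) <= 1:
        return 8 * len(deltas)
    widest = max(abs(d) for d in deltas[1:])
    width = widest.bit_length() + 1  # one extra bit for the sign
    return 8 + width * (len(deltas) - 1)


if __name__ == "__main__":
    row = [200, 201, 203, 202, 202, 204, 205, 205]  # made-up 8-bit channel values
    encoded = delta_encode(row)
    print(encoded, bits_needed(encoded), "bits vs", 8 * len(row), "bits raw")
```

The scheme is lossless, and it pays off only when neighboring values are close; a noisy row would need deltas as wide as the raw values.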
How Does Pascal Compare With Volta?
Volta is the successor to Pascal. It introduces a new type of core called the Tensor Core, which performs mixed-precision matrix math to accelerate deep learning; NVIDIA quotes up to 12x the peak training throughput and up to 6x the peak inference throughput of Pascal GPUs. Additionally, Volta's second-generation NVLink interconnect offers six links instead of Pascal's four, each running at 25 GB/sec per direction (50 GB/sec bidirectional) instead of 20 GB/sec per direction (40 GB/sec bidirectional), which enables much faster communication between GPUs and between the GPU and NVLink-capable CPUs.
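The NVLink comparison is easiest to follow as aggregate bandwidth: the number of links multiplied by the per-link rate in both directions. The sketch below does that arithmetic. The link counts and per-direction rates (four links at 20 GB/sec on the Pascal-based Tesla P100, six links at 25 GB/sec on the Volta-based Tesla V100) are assumed from NVIDIA's published specifications, and `nvlink_total_gb_s` is a hypothetical helper name.

```python
def nvlink_total_gb_s(links, per_direction_gb_s):
    """Aggregate bidirectional bandwidth across all NVLink links, in GB/sec."""
    return links * per_direction_gb_s * 2


# Assumed figures from NVIDIA's published specifications.
pascal_total = nvlink_total_gb_s(links=4, per_direction_gb_s=20)
volta_total = nvlink_total_gb_s(links=6, per_direction_gb_s=25)

if __name__ == "__main__":
    print(f"Pascal (Tesla P100): {pascal_total} GB/sec")
    print(f"Volta (Tesla V100): {volta_total} GB/sec")
    print(f"Ratio: {volta_total / pascal_total:.3f}x")
```

On these figures Volta offers a little under twice Pascal's aggregate NVLink bandwidth.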
What is Pascal’s Computer Architecture?
The NVIDIA Pascal Computer Architecture consists of the following main components:
- The Pascal Streaming Multiprocessor (SM), which delivers faster and more efficient processing, paired with PolyMorph Engine 4.0 for geometry work.
- Updated memory controllers, which increase overall memory bandwidth by supporting faster memory: high-speed GDDR5X on the GeForce GTX 1080 and HBM2 on GP100. GP100 also uses the high-speed NVLink interconnect to connect GPUs to each other.
- CUDA compute capability 6.x (6.0 on GP100, 6.1 on GP102 and GP104). On GP100, double-precision operations run at half the FP32 rate, a large increase over Maxwell GPUs.
- Unified Memory, which gives the CPU and GPU a single virtual address space covering both CPU memory (DDR) and GPU memory (GDDR5, GDDR5X, or HBM2). Pascal GPUs manage a larger virtual address space than Maxwell, with 49-bit addressing, and use a page-table scheme with hardware page faulting so that pages migrate on demand between CPU and GPU memory. As a result, Pascal GPUs can reach data wherever it lives without explicit copies, which simplifies programming and can cut unnecessary transfers between Pascal GPUs and CPUs.
- Further improvements in power efficiency for high-performance graphics, with enhanced clock gating technology that turns the clocks of functional units on and off on a core-by-core basis. Pascal GPUs are also built on a 16-nanometer FinFET process that packs far more transistors into the die: GP100 has 15.3 billion transistors, nearly twice the 8 billion of Maxwell's GM200. FinFET transistors also leak less power than the planar transistors of the older 28-nanometer process.
So there you go, that’s just about everything we know related to Pascal’s GPU architecture. For deep learning and AI applications, the highlights are fast FP16 on GP100, INT8 dot-product instructions on GP102 and GP104, and Unified Memory with hardware page faulting.
For graphics, Pascal architecture GPUs also have fourth-generation delta color compression technology, which improves performance by reducing bandwidth and power consumption for frame buffer accesses as well as improving cache utilization.