{"id":1067,"date":"2021-10-31T09:34:30","date_gmt":"2021-10-31T09:34:30","guid":{"rendered":"https:\/\/pcgearhead.com\/?p=1067"},"modified":"2022-05-24T14:39:04","modified_gmt":"2022-05-24T14:39:04","slug":"understanding-the-pascals-gpu-architecture","status":"publish","type":"post","link":"https:\/\/pcgearhead.com\/understanding-the-pascals-gpu-architecture\/","title":{"rendered":"Understanding the Pascal’s GPU Architecture"},"content":{"rendered":"

Many people are wondering what Pascal’s GPU Architecture<\/strong> is and if it will have a significant impact on the future of computer graphics. Pascal is a new architecture by NVIDIA <\/strong>that was designed to significantly increase performance for deep learning, high-precision computations, artificial intelligence, and virtual reality applications. Pascal uses 16nm<\/strong> FinFET process technology to deliver up to 2x<\/strong> the performance per watt compared with previous-generation GPUs.<\/span><\/p>\n\n

Introduction to Pascal GPU’s Architecture<\/h2>\n

Pascal GPUs consist of multiple Graphics Processing Clusters (GPCs), Streaming Multiprocessors Pascal (SMM), and memory controllers. Each GPC includes a dedicated raster engine and six TPCs<\/strong>, which are the basic scheduling units on Pascal GPUs with 48 FP32 CUDA<\/strong> cores per TPC that support floating-point and integer operations.<\/span><\/p>\n

\"pascal<\/p>\n

Pascal GPUs utilize a single SMM<\/strong> per GPC<\/strong>, each with 64 FP32<\/strong> CUDA<\/strong> Cores that share access to the L0 cache. Pascal also features an improved Polymorph Engine Four independent tessellation units on Pascal support drawing of new triangles in one clock cycle instead of two cycles found on Maxwell-based GPUs.<\/span><\/p>\n

What are the Different Pascal Architectures?<\/h2>\n

There are two<\/strong> Pascal architectures<\/strong>: (GP100, GP102<\/strong>) and GP104<\/strong>.<\/span><\/p>\n

The first is the full-fledged version of Pascal that was released in April 2016<\/strong>. This GPU has 15<\/strong> SMs for a total of 28,672 CUDA<\/strong> cores, which delivers around twice as many floating-point operations per second compared to its predecessor, Maxwell GM200<\/strong> architecture found on NVIDIA’s<\/strong> Titan X graphics card. The GP102 on the other hand is used on the GeForce GTX<\/strong> 1080 Ti and has 11 SMs for a total of fewer than half as many CUDA cores as GP100<\/strong> with 35840 cores.<\/strong><\/span><\/p>\n

The next Pascal chip variant is GP104<\/strong> used by GeForce GTX 1080<\/strong> and 1070<\/strong> cards with only eight streaming multiprocessors instead of fifteen SMMs<\/strong> like the bigger Pascal chips such as GP100<\/strong> (found on Quadro P6000 & Tesla P40<\/strong>). While GeForce GTX 1080<\/strong> uses the GP104, the RTX 2070<\/strong> series contains a slightly cut-down version with 56 SMs and 1920 CUDA cores for around 16% higher<\/strong> performance compared to GeForce GTX 1070<\/strong>.<\/span><\/p>\n

What Pascal Features are Replaced on Pascal GPUs?<\/h2>\n

\"<\/p>\n

In Pascal, NVIDIA replaced FP64<\/strong> hardware found in Fermi, Kepler, and Maxwell architecture GPUs with fast half-precision floating-point operations (FP16<\/strong>) that are performed by Pascal’s<\/strong> new PolyMorph Engine Four. NVIDIA also added two important features with Pascal:<\/span><\/p>\n