{"id":1067,"date":"2021-10-31T09:34:30","date_gmt":"2021-10-31T09:34:30","guid":{"rendered":"https:\/\/pcgearhead.com\/?p=1067"},"modified":"2022-05-24T14:39:04","modified_gmt":"2022-05-24T14:39:04","slug":"understanding-the-pascals-gpu-architecture","status":"publish","type":"post","link":"https:\/\/pcgearhead.com\/understanding-the-pascals-gpu-architecture\/","title":{"rendered":"Understanding the Pascal&#8217;s GPU Architecture"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Many people are wondering what <strong>Pascal&#8217;s GPU Architecture<\/strong> is and if it will have a significant impact on the future of computer graphics. Pascal is a GPU architecture by <strong>NVIDIA<\/strong>, introduced in 2016, that was designed to significantly increase performance for deep learning, high-precision computations, artificial intelligence, and virtual reality applications. Pascal uses <strong>16nm<\/strong> FinFET process technology to deliver up to <strong>2x<\/strong> the performance per watt compared with previous-generation GPUs.<\/span><\/p>\n\n<h2>Introduction to the Pascal GPU Architecture<\/h2>\n<p><span style=\"font-weight: 400;\">Pascal GPUs consist of multiple Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. 
Each GPC includes a dedicated raster engine and several Texture Processing Clusters (TPCs). On <strong>GP104<\/strong>, for example, each GPC holds <strong>five TPCs<\/strong>, and each TPC pairs one SM of <strong>128 FP32 CUDA<\/strong> cores with a PolyMorph Engine; the CUDA cores support floating-point and integer operations.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1070\" src=\"https:\/\/pcgearhead.com\/wp-content\/uploads\/2021\/08\/Pascal-GPUs-Architecture.jpg\" alt=\"pascal gpu's architecture\" width=\"740\" height=\"416\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The SM differs between chips: a <strong>GP100<\/strong> SM has <strong>64 FP32<\/strong> <strong>CUDA<\/strong> cores plus 32 FP64 units, while <strong>GP102<\/strong> and <strong>GP104<\/strong> SMs have 128 FP32 CUDA cores each. Pascal also features an updated PolyMorph Engine (version 4.0), the fixed-function geometry unit that handles vertex fetch, tessellation, and viewport transformation, and that adds Simultaneous Multi-Projection on top of the Maxwell design.<\/span><\/p>\n<h2>What are the Different Pascal Chips?<\/h2>\n<p><span style=\"font-weight: 400;\">The three largest <strong>Pascal chips<\/strong> are <strong>GP100<\/strong>, <strong>GP102<\/strong>, and <strong>GP104<\/strong>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first, GP100, is the compute-focused flagship that NVIDIA announced in <strong>April 2016<\/strong>. The full chip has <strong>60<\/strong> SMs for a total of <strong>3,840 FP32 CUDA<\/strong> cores (the Tesla P100 ships with 56 SMs and 3,584 cores enabled), and it delivers substantially more floating-point operations per second than its predecessor, the Maxwell <strong>GM200<\/strong> chip found on <strong>NVIDIA&#8217;s<\/strong> GeForce GTX Titan X graphics card. 
GP102, on the other hand, is used on the <strong>GeForce GTX<\/strong> 1080 Ti, where 28 of its 30 SMs are enabled for a total of <strong>3,584 CUDA cores<\/strong>. The full chip, with 3,840 cores, appears on the <strong>Quadro P6000 &amp; Tesla P40.<\/strong><\/span><\/p>\n<p><span style=\"font-weight: 400;\">The next Pascal chip variant is <strong>GP104<\/strong>, used by the GeForce GTX <strong>1080<\/strong> and <strong>1070<\/strong> cards. It has 20 streaming multiprocessors, compared with 30 on <strong>GP102<\/strong> and 60 on <strong>GP100<\/strong>. The GeForce GTX <strong>1080<\/strong> uses the full GP104 with 2,560 CUDA cores, while the GeForce <strong>GTX 1070<\/strong> uses a cut-down version with 15 SMs and 1,920 CUDA cores.<\/span><\/p>\n<h2>What Features Did Pascal Change or Add?<\/h2>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1071\" src=\"https:\/\/pcgearhead.com\/wp-content\/uploads\/2021\/08\/OldRoadmap.jpg\" alt=\" Fermi Kepler and Maxwell architecture\" width=\"740\" height=\"555\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Pascal did not remove the <strong>FP64<\/strong> hardware found in Fermi, Kepler, and Maxwell architecture GPUs: GP100 runs double precision at half its FP32 rate, while the consumer chips (GP102 and GP104) keep only a small number of FP64 units, as Maxwell did. What GP100 adds is fast half-precision floating-point arithmetic (<strong>FP16<\/strong>), executed by the CUDA cores in the SM at up to twice the FP32 rate. NVIDIA also added two important features with Pascal:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Packed low-precision arithmetic: GP100 can process two <strong>FP16<\/strong> values held in a single <strong>32-bit <\/strong>register with one instruction, and GP102 and GP104 add 8-bit integer dot-product instructions (<strong>DP4A<\/strong>) that accumulate into a 32-bit integer. 
This lets a single Pascal CUDA core process more low-precision values per instruction, which is particularly important for machine learning and deep learning applications.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><strong>Pascal<\/strong> also extends <strong>Unified Memory<\/strong> with hardware page faulting and 49-bit virtual addressing, so the <strong>CPU<\/strong> and GPU share a single address space and pages migrate on demand between CPU memory and <strong>GPU memory<\/strong>. On GP100, NVLink provides a faster path than PCIe for this traffic on systems that support it. In addition, several GeForce <strong>Pascal GPUs<\/strong> double the memory of their Maxwell predecessors (8 GB on the GTX 1080 versus 4 GB on the GTX 980), which is another feature that benefits deep learning and AI applications.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Want to learn about GPUs? Here are some articles you may want to explore.<\/span><\/p>\n<p><a href=\"https:\/\/pcgearhead.com\/how-to-reduce-gpu-temperature\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">How to Reduce Your GPU Temperatures?<\/span><\/a><\/p>\n<p><a href=\"https:\/\/pcgearhead.com\/how-to-tell-if-your-gpu-is-functioning-properly\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">How to Tell If Your GPU is Working Properly?&#xA0;<\/span><\/a><\/p>\n<p><a href=\"https:\/\/pcgearhead.com\/what-is-gpu-memory-clock-how-does-it-affect-gaming-performance\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">What is GPU Memory Clock &amp; How Does it Affect Gaming?&#xA0;<\/span><\/a><\/p>\n<h2>How the Pascal Architecture Improves Deep Learning &amp; Numeric Computing<\/h2>\n<p><span style=\"font-weight: 400;\"><strong>Pascal improves FP32\/FP64<\/strong> performance with up to twice as many floating-point operations per second <strong>(FLOPS)<\/strong> compared to 
previous-generation <strong>GeForce GPUs<\/strong>. For deep learning, GP100 adds double-rate <strong>FP16<\/strong> arithmetic, and GP102 and GP104 add 8-bit integer dot-product instructions with 32-bit accumulation, which raise inference throughput without changing the FP32 path.<\/span><\/p>\n<h2>Pascal Architecture Benefits Over Maxwell<\/h2>\n<p><span style=\"font-weight: 400;\">One significant benefit of <strong>Pascal GPUs<\/strong> over previous-generation products is their enhanced memory compression, which NVIDIA says gives <strong>Pascal GPUs<\/strong> about <strong>20%<\/strong> <strong>higher effective<\/strong> bandwidth compared with Maxwell. Pascal supports fourth-generation delta color compression technology, which <strong>improves<\/strong> performance by reducing the bandwidth and power consumed by frame buffer accesses as well as improving cache utilization.<\/span><\/p>\n<h2>How Volta Improves on Pascal<\/h2>\n<p><span style=\"font-weight: 400;\"><strong>Volta<\/strong> is NVIDIA&#8217;s compute-focused successor to Pascal. Volta has a <strong>new type of core<\/strong> called the <strong>Tensor Core<\/strong>, which adds mixed-precision matrix math for accelerating deep learning workloads; NVIDIA quotes up to <strong>12x<\/strong> the peak throughput for deep learning training compared with Pascal GPUs. 
Additionally, Volta&#8217;s second-generation <strong>NVLink<\/strong> interconnect offers six links at 25 GB\/sec per direction each <strong>(300 GB\/sec total bidirectional)<\/strong>, compared with four links at 20 GB\/sec per direction (160 GB\/sec total) on Pascal&#8217;s GP100, which enables much faster communication between GPUs and, on supported platforms, between the CPU and GPU.<\/span><\/p>\n<h2>What is Pascal&#8217;s Compute Architecture?<\/h2>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1072\" src=\"https:\/\/pcgearhead.com\/wp-content\/uploads\/2021\/08\/gp100_SM_diagram.png\" alt=\"Pascal's Compute Architecture\" width=\"740\" height=\"537\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The <strong>NVIDIA<\/strong> Pascal compute architecture consists of the following main components:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><strong>Streaming Multiprocessor (SM)<\/strong>, Pascal&#8217;s redesigned compute block. Each TPC pairs its SMs with the PolyMorph Engine 4.0 geometry unit, and together they deliver faster and more efficient processing.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><strong>Memory subsystem<\/strong>, which increases overall memory bandwidth through faster memory: GDDR5X on the GeForce GTX 1080 and HBM2 on GP100.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Pascal CUDA Compute Capability 6.x (6.0 on GP100, 6.1 on GP102 and GP104). GP100 runs double precision at half its FP32 rate, far above the 1\/32 rate of Maxwell GPUs.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><strong>Unified<\/strong> Memory, which <strong>gives<\/strong> the CPU and the Pascal GPU a single virtual address space covering both CPU system memory and GPU memory (GDDR5, GDDR5X, or <strong>HBM2<\/strong>). Pascal supports 49-bit virtual addresses and hardware page faulting, so it can manage a larger virtual address space than Maxwell. 
Pascal GPUs can access <strong>CPU<\/strong> memory directly through Unified Memory, with pages migrating on demand to whichever processor touches them, which simplifies working with data sets larger than GPU memory.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Further improvements in power efficiency for high-performance graphics, helped by clock gating that switches off idle functional units and by the move from Maxwell&#8217;s 28-nanometer planar process to <strong>16-nanometer<\/strong> FinFET. The denser process lets GP104 pack 7.2 billion transistors, and GP100 pack 15.3 billion transistors, onto a single die, and FinFET transistors <strong>reduce leakage power.<\/strong><\/span><\/li>\n<\/ul>\n<h2>Conclusion<\/h2>\n<p><span style=\"font-weight: 400;\">So there you go, that&#8217;s an overview of the Pascal GPU architecture. Pascal pairs larger and faster memory (GDDR5X and HBM2) with a 16nm FinFET process, which benefits deep learning and AI applications.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As far as the architecture is concerned, Pascal GPUs also add low-precision arithmetic for deep learning inference: double-rate <strong>FP16<\/strong> on GP100 and 8-bit integer dot products on GP102 and GP104. 
Integer results are accumulated at full <strong>32-bit<\/strong> precision, without any impact on floating-point accuracy.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pascal architecture GPUs also have fourth-generation delta color compression technology, which improves performance by reducing the bandwidth and power consumed by frame buffer accesses as well as improving cache utilization in the <strong>Pascal GPU architecture.<\/strong><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many people are wondering what Pascal&#8217;s GPU Architecture is and if it will have a significant impact on the future of computer graphics. Pascal is &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"ast-button\" href=\"https:\/\/pcgearhead.com\/understanding-the-pascals-gpu-architecture\/\"> <span class=\"screen-reader-text\">Understanding the Pascal&#8217;s GPU Architecture<\/span> Read More \u00bb<\/a><\/p>\n","protected":false},"author":1,"featured_media":1069,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"none","site-sidebar-layout":"default","site-content-layout":"default","ast-global-header-display":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"default","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":""},"categories":[46],"tags":[],"acf":[],"_links":{"self":[{"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/posts\/1067"}],"collection":[{"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pcgearhead.com\/wp-json\/wp
\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/comments?post=1067"}],"version-history":[{"count":1,"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/posts\/1067\/revisions"}],"predecessor-version":[{"id":6223,"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/posts\/1067\/revisions\/6223"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/media\/1069"}],"wp:attachment":[{"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/media?parent=1067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/categories?post=1067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pcgearhead.com\/wp-json\/wp\/v2\/tags?post=1067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}