The sense of novelty after a long lull, a highly polished presentation, new technologies, and the pricing all did their job. Opinions, reviews, and “expert takes” have been pouring in for a month and a half, and only now has the fire under the pot died down and the boil subsided. Time to sum up everything seen, heard, and read, and form my own opinion of the product.
About light, reflections and anti-aliasing

It is fair to call the RTX 2080 Ti a truly long-awaited video card. After all, for almost two years NVIDIA had not released a single truly strong product in the desktop segment. The only thing worth noting was the GTX 1070 Ti, released to compete comfortably with the previous year's Vega 56. And then, in August, new video cards based on the Turing architecture were announced. They say the first batch never reached store shelves - the entire run sold out on pre-orders.
There are currently three video cards: the RTX 2080 Ti, RTX 2080, and RTX 2070. The balance of power is unchanged - the most productive 2080 Ti, based on the top TU102, is followed by the 2080 on the TU104, and the 2070 on the TU106 rounds out the trio. Where the letters TU in the GPU code names come from is obvious (short for Turing). The change from GTX to RTX is harder to explain; even the long presentation by Jensen Huang, president of NVIDIA, doesn't clarify all the details. A sea of beautiful pictures, a delighted Mr. Huang, and a jubilant audience. The hall got especially emotional when the price was announced: 499 dollars for the 2070 (around 40,000 rubles here with all taxes). What is that money for?
Mostly for hardware-accelerated real-time ray tracing - in fact, the R in RTX comes from it. Rays are responsible for lighting (its realism) and reflections in the 3D gaming world. This is not new; ray tracing was “invented” back in the last century, but its use has been limited by the complexity of implementation: computing the rays demands serious processing power. Modern games use rasterization to create realistic lighting, and reflections are “drawn” using environment maps. The picture is convincing, but switching to ray calculation would make lighting and reflections look just like real life. Jensen's presentation abounded with comparisons, only instead of the words “before - after” they were labeled “RTX OFF - RTX ON”.
The difference is huge. In existing games that, according to their developers, can already compute lighting the “new” way (they can't, really), you have to hunt for the difference.
If you don't focus on the lighting, you won't notice anything. Yes, I played Shadow of the Tomb Raider on the RTX 2080 Ti and felt no wow effect. The lighting calculation is more realistic, but nothing more. A failure? No. The video cards have only just come out, and game developers need time to work the new capabilities into their projects. NVIDIA, of course, understands this and promises ray tracing support in Assetto Corsa Competizione, Atomic Heart, Battlefield V, Control, Enlisted, Justice, JX3, Mechwarrior V: Mercenaries, Metro Exodus, and Project DH.
Metro looks promising, and I, as a fan of the series, liked the implementation of ray tracing in Mechwarrior.
To fuel your imagination, you can watch the Reflections real-time tech demo on Unreal Engine.
All in all, it is too early to judge whether investing in the new video cards right now for the sake of rays is rational, but near-photorealistic games have been promised. If I had to illustrate my personal impressions of the presentation with one picture, I would not hesitate to choose this:
Along with the rays came a new full-screen anti-aliasing algorithm, DLSS. The abbreviation stands for Deep Learning Super Sampling, though this thing has almost nothing to do with real supersampling. The gist: the new GPUs are well suited to working with artificial intelligence, which, after prior training, can enhance the image in games. Training comes down to feeding in a multitude (thousands) of frames rendered with honest supersampling, after which the AI smooths jagged “staircase” edges and makes fine image details crisper.
There is even a special DLSS 2x mode, where the whole frame is rendered at a lower resolution and the AI then brings it up to the “original”. A kind of supersampling in reverse, supposedly indistinguishable from the real thing. The plus is obvious: less work is spent rendering each frame, so the frame rate rises without loss of image quality. For now the algorithm has one minus: it requires support from the game's developer. Smart anti-aliasing is promised in 25 games, including Atomic Heart, Darksiders III, Final Fantasy XV, Hitman 2, Mechwarrior V: Mercenaries, Serious Sam 4: Planet Badass, Shadow of the Tomb Raider, and PUBG.
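As a rough illustration of where the saving in the mode described above comes from: shading cost scales roughly with the number of pixels actually rendered. The internal resolution below is a hypothetical example - NVIDIA does not publish the exact figure.

```python
# Back-of-envelope cost saving of rendering at a lower internal resolution
# and upscaling to the target. Resolutions are illustrative presets.
def pixel_count(width, height):
    return width * height

native_4k = pixel_count(3840, 2160)   # target output resolution
internal  = pixel_count(2560, 1440)   # hypothetical internal render

# Shading work scales roughly with pixels rendered, so:
saving = 1 - internal / native_4k
print(f"pixels shaded: {internal / native_4k:.0%} of native")  # 44%
print(f"rough shading saving: {saving:.0%}")                   # 56%
```

The upscaling network has its own cost on the tensor cores, so the real net gain is smaller than this naive estimate.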
Turing in detail

Now a little about the GPU itself, which powers all these bells and whistles - rays, AI, and pseudo-supersampling.
By transistor count the TU102 differs substantially from the GP102 (1080 Ti): 18.6 billion against 12 billion, which translated into a roughly 60% larger die.
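A quick check of those figures (the die areas, 471 mm² for the GP102 and 754 mm² for the TU102, are the published sizes, not taken from the text above):

```python
# Rough comparison of GP102 vs TU102 using the transistor counts above
# plus published die sizes.
gp102 = {"transistors_bn": 12.0, "area_mm2": 471}
tu102 = {"transistors_bn": 18.6, "area_mm2": 754}

tr_growth   = tu102["transistors_bn"] / gp102["transistors_bn"] - 1
area_growth = tu102["area_mm2"] / gp102["area_mm2"] - 1
print(f"transistors: +{tr_growth:.0%}")   # +55%
print(f"die area:    +{area_growth:.0%}") # +60%
```

Area grew slightly faster than transistor count, consistent with both chips being made on similar-generation process nodes.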
At first glance the overall GPU architecture has not changed much: the same CUDA cores are grouped into streaming multiprocessors (SM). The TU102 contains six GPCs (Graphics Processing Clusters), each of which includes six TPCs (Texture Processing Clusters).
The main “brick” of this nesting doll is the multiprocessor (SM). Each one holds 64 CUDA cores (also called stream processors or unified processors), 8 tensor cores, a ray tracing unit (RT Core), 4 texture units, 96 KB of cache, and a 256 KB register file.
For comparison, the same unit in the GP102: 128 CUDA cores, 4 texture units, 48 KB of cache, and a 256 KB register file.
A GP102 SM has twice as many CUDA cores, but a GPC in the GP102 holds considerably fewer SMs, so with the same number of GPCs the TU102 ends up with more CUDA cores in total. Here is the GP102 block diagram for clarity.
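The arithmetic behind that, sketched with the published full-die configurations (note the shipping 2080 Ti uses a slightly cut-down TU102 with 68 active SMs, i.e. 4,352 cores):

```python
# Total CUDA cores implied by the block layout of each full chip.
def total_cuda(gpcs, tpcs_per_gpc, sms_per_tpc, cores_per_sm):
    return gpcs * tpcs_per_gpc * sms_per_tpc * cores_per_sm

gp102_cores = total_cuda(6, 5, 1, 128)  # Pascal: one SM per TPC, 128 cores each
tu102_cores = total_cuda(6, 6, 2, 64)   # Turing: two SMs per TPC, 64 cores each
print(gp102_cores, tu102_cores)  # 3840 4608
```

Same six GPCs, yet the Turing layout of smaller, more numerous SMs yields a higher total.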
In general, the SMs of the GP102 and TU102 differ considerably. And if you put the SMs of the TU102 and GV100 (Titan V) side by side, the adaptation of a professional solution for the “end user” becomes easier to see.
Most of the double-precision (FP64) units were removed, and an RT Core was added. Is that all? Yes and no. Going from the GP102 to the TU102 (as earlier in the GV100), the CUDA cores were split into separate FP32 and INT32 units, so twice as many of the corresponding instructions and operations execute per clock. The GP102 could not execute integer (INT) and floating-point (FP) operations simultaneously. This should affect gaming performance in the most direct way, since, according to NVIDIA, games average 36 integer operations per 100 floating-point ones.
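A toy issue model gives the upper bound of that change for the 36-INT-per-100-FP workload NVIDIA cites. It assumes one operation per cycle per unit - purely illustrative, not how a real warp scheduler works:

```python
# Toy model: serial issue (Pascal-style shared pipeline) vs concurrent
# issue (Turing-style separate FP32 and INT32 datapaths).
fp_ops, int_ops = 100, 36

serial_cycles     = fp_ops + int_ops      # INT ops block the FP pipeline
concurrent_cycles = max(fp_ops, int_ops)  # INT ops hide behind FP ops

speedup = serial_cycles / concurrent_cycles
print(f"upper-bound speedup: {speedup:.2f}x")  # 1.36x
```

Real gains are smaller, since memory stalls and other bottlenecks also limit throughput.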
The caching scheme was also reworked: the L1 cache and shared memory were combined, which reduced latency and increased the cache's bandwidth and capacity.
An L0 instruction cache appeared, and per TPC (Texture Processing Cluster, which combines two SMs) the L2 cache volume was doubled. All these improvements, according to NVIDIA, raise gaming performance by 50 percent or more.
The RT Core units are designed specifically to accelerate ray tracing. Below is a schematic comparison of lighting computed by rasterization versus ray tracing.
The RT Core finds ray-polygon intersections using a BVH (Bounding Volume Hierarchy). The essence: the scene is partitioned into bounding volumes, which are “processed” in turn, quickly narrowing the search.
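For a feel of what the hardware accelerates, here is a minimal sketch of the classic “slab” ray/box test used when traversing a BVH node's axis-aligned bounding volume - an illustration of the general algorithm, not NVIDIA's implementation:

```python
# Slab test: intersect the ray with the three pairs of axis-aligned planes
# bounding the box; the ray hits the box iff the intervals overlap.
def ray_hits_aabb(origin, direction, box_min, box_max):
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        if direction[axis] == 0.0:
            # Ray parallel to this slab: must already lie between the planes.
            if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                return False
            continue
        inv = 1.0 / direction[axis]
        t1 = (box_min[axis] - origin[axis]) * inv
        t2 = (box_max[axis] - origin[axis]) * inv
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far

# A ray shot down the z-axis hits the unit box; a parallel one offset
# sideways misses it.
print(ray_hits_aabb((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1)))  # True
print(ray_hits_aabb((3, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1)))  # False
```

Only rays that pass this cheap test descend to a node's children or, at the leaves, to exact triangle tests - which is why accelerating it in hardware pays off.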
The GP102 can “process” rays the same way, but much more slowly, burning GPU time for lack of a dedicated unit.
The performance of the 2080 Ti is rated at 10 gigarays per second, but so far there is nothing to compare it with, and the figure looks as theoretical as IOPS in benchmarks: it jumps severalfold from one processor architecture to the next, yet in practice you need narrowly specialized tasks to reproduce such “success”. Still, showing off such numbers is flattering for the manufacturer, so NVIDIA could not resist.
The effect of switching to tracing is most pronounced in soft shadows, global illumination, and reflections - especially reflections of objects that are “off-stage” and invisible to the player, which cannot be computed at all without tracing. The opportunities for developers are huge; “correct” work with light alone can change the picture in games beyond recognition. But those games, as already noted, still have to be made, and for now the RT unit has nowhere to prove itself. Demos don't count - there's no co-op to run around in, no single-player campaign to finish.
A significant part of each SM is occupied by tensor cores (Tensor Core), designed to accelerate neural networks or, put simply, artificial intelligence (AI). Tensor cores specialize in multiplying data matrices with floating-point math. If the operations are carried out on integers, core throughput rises by a factor of 2-4.
Importantly, the remaining units are barely involved during this, while the tensor cores themselves occupy a relatively small area of the die. The latter matters because of their specialization: they perform everything other than matrix work slowly. At first glance it would seem the tensor cores will sit idle in games while the rest of the chip works. But no. If the DLSS story (which runs precisely on the tensor cores) doesn't take off, they can be repurposed for computing game AI, and the NPCs (a.k.a. bots) may finally become smarter than a battering ram. The cores can also help “polish” the picture after ray calculation; the principle is the same as in DLSS - trace a few rays, run the result through a filter, and it comes out pretty.
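In miniature, the tensor core's basic operation is a fused multiply-accumulate over small matrix tiles, D = A×B + C. A plain-Python stand-in, illustrative only - the real hardware works on 4×4 tiles of half-precision values in a single operation:

```python
# D = A @ B + C over 4x4 tiles: the matrix fused multiply-accumulate
# that a tensor core performs in hardware, spelled out in Python.
def mma_4x4(A, B, C):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) + C[i][j]
             for j in range(4)]
            for i in range(4)]

I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]  # identity
Z = [[0] * 4 for _ in range(4)]                                 # zero accum
A = [[i * 4 + j for j in range(4)] for i in range(4)]

# Multiplying by the identity with a zero accumulator returns A unchanged.
print(mma_4x4(A, I, Z) == A)  # True
```

Neural-network inference, DLSS included, is dominated by exactly this kind of tiled matrix arithmetic, which is why the specialized units pay for their die area.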
All Turing GPUs support GDDR6 video memory. It runs at 14 Gbps per pin and is 20% more energy efficient than GDDR5X. The memory bus on the TU102 is 384 bits wide, though on the 2080 Ti it is cut to 352 bits by one disabled controller. To spare you the arithmetic, the theoretical memory bandwidth for the 1080 Ti and 2080 Ti is 484 and 616 GB/s, respectively - a 27% difference that should benefit performance. NVIDIA cards of some past generations lost to their AMD/ATi classmates at high resolutions partly because of narrow memory buses. I hope that is in the past, and that the wide bus, two independent 16-bit channels, and stable operation at high frequencies will not leave the GPU starved for data.
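The 484 and 616 GB/s figures fall straight out of bus width times per-pin data rate:

```python
# Theoretical memory bandwidth: bus width (bits) x per-pin rate (Gbps),
# divided by 8 bits per byte.
def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

gtx_1080_ti = bandwidth_gbs(352, 11)  # GDDR5X at 11 Gbps
rtx_2080_ti = bandwidth_gbs(352, 14)  # GDDR6 at 14 Gbps
print(gtx_1080_ti, rtx_2080_ti)                 # 484.0 616.0
print(f"+{rtx_2080_ti / gtx_1080_ti - 1:.0%}")  # +27%
```

Both cards use the same 352-bit bus, so the entire gain here comes from GDDR6's higher per-pin rate.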
The updated NVENC video encoder supports H.265 compression and can process an 8K video stream at 30 frames per second. Other improvements save up to 25% of bitrate with H.265 and 15% with H.264 at the same quality.
The card can drive two 8K displays at 60 Hz each, and the same 8K is available from the VirtualLink connector. Both standard and high dynamic range (HDR) output are supported.
This much theory is probably enough for a general understanding of the situation. I have covered only the main points briefly, adding my own comments. The DirectX 12 work, the mesh shader model, Variable Rate Shading (VRS), and NVLink were left uncovered - each innovation could fill a separate article with practical examples, there is plenty to look at. Next, on to the hardware: the GeForce RTX 2080 Ti itself.
Video card RTX 2080 Ti

NVIDIA decided to radically change the design of its Founders Edition cards and moved from the traditional “long heatsink with a blower” cooler to the quieter “slow-spinning fans on a heatsink”.
The color scheme is the same - a combination of black and silver; the materials are metal and plastic.
On a black background, the name of the model gleams.
The inscription “GeForce RTX” glows bright green when the system is turned on.
In the hand the video card feels like a monolith - an indirect sign of a good fit between all its parts. The reverse side is protected by a metal plate with raised edges.
It almost completely hides the printed circuit board, protecting it from dirt and mechanical damage.
Additional power comes through two eight-pin connectors - the appetite of the 2080 Ti is considerable, with a manufacturer-claimed TDP of 280 watts.
The mounting plate looks unusual due to its matt black finish.
For video outputs there are three DisplayPort (1.4a) and one HDMI (2.0b), while in place of DVI a VirtualLink connector now flaunts itself. It is much smaller and looks like USB 3.1 Type-C; it is meant for connecting a virtual reality headset, and for that purpose, besides USB, four DisplayPort lanes (High Bit Rate 3) are routed to the connector. I did not dismantle the cooling system for lack of time. There is nothing fundamentally new in it anyway: heat from the GPU is transferred to a vapor chamber, which spreads it over several dozen fins. The fins run across the board, and the airflow driven by the pair of fans exits above and below the card.
For some reason the designers gave up exhausting heated air out of the case through the grille in the mounting bracket, though they did not forget to stamp out the holes between the connectors.
The fins are blown through by two 85 mm fans.
The memory chips and power-circuit components are cooled through the cooler's aluminum base plate, which covers the entire printed circuit board and relieves the PCB of most mechanical stress. The casing on the back of the PCB also contacts the hot sections of the board through thermal pads, “helping” the front cooler keep the electronics cool.
Test results

For comparison with the RTX 2080 Ti, I took the most powerful video card of the ten-series - the GTX 1080 Ti. The sample is from ASUS, a version with a hybrid cooling system; a separate article has already been published on it, so I will not dwell on its characteristics.
The newcomer's performance was evaluated in eight of the most popular games.
The first to come to hand was the medieval mugging simulator For Honor.
Even the 1080 Ti shows good results in 4K here, and the 2080 Ti adds another 25%. The result is over a hundred frames per second at medium quality settings, and 90 avg FPS on “ultra”. Moving to QHD, the gap narrows to 21%, and in FullHD it falls to 17% at maximum quality. The processor is clearly not the issue - an i7-8700K running at 5 GHz keeps both cards loaded, as the balance of power shows: lowering the quality settings widens the 2080 Ti's lead over the 1080 Ti by another 3-5%.
Next up is one of the least optimized and most demanding games, Tom Clancy's Ghost Recon Wildlands.
This is a case where 50 average fps in 4K on “ultra” is a success for the most powerful single-GPU gaming card. The difference between the past and current flagships is 14-20% in 4K, 6-13% in QHD, and 10-13% in FullHD. The lower the resolution, the greater the fps gain from each notch of the quality slider. Subjectively there is no difference between the two Ti cards: both are equally good (or, in 4K, equally not quite good enough).
The results are similar in Far Cry 5, only the fps gain from changing picture quality is less noticeable.
The RTX leads the GTX by 25% in 4K, 13-18% in QHD, and only 3-4% in FullHD. It seems we have run into the processor again here, and only 4K proved genuinely hard for the video cards regardless of picture quality.
PLAYERUNKNOWN'S BATTLEGROUNDS produced the most erratic results.
The difference in 4K ranges from 31 to 81%! This is probably a consequence of the gameplay itself: the image stutters for a half-second or so and the refresh drops to a fixed 30 fps. The test run was chosen to minimize the chance of such hiccups, but in a mode this heavy for the video card they cannot be avoided entirely. Not all of them are visible to the eye; they are best spotted on the chart in the Fraps bench viewer.
In QHD and FullHD the results are more stable, with the difference between the cards ranging from 15 to 20% and from 5 to 18%, respectively. The CPU bottleneck shows up at medium settings in FullHD, where the ceiling of 218-220 frames is clearly visible.
The next game, Assassin's Creed Origins, is known for its ability to make use of modern multi-core processors.
Here “make use of” means a clear response not only to clock frequency but also to disabling cores. Still, no explicit ceiling like PUBG's is visible, while the architectural advantages of Turing show up well. In 4K the 2080 Ti is 18-25% more productive than its predecessor, with the largest gap at medium image quality. In QHD the difference is smaller, 12-14%, and in FullHD the situation repeats: the 1080 Ti trails by 9% at medium quality and by 6% at high. The reason apparently lies in the “factory” bundles of settings behind each quality preset.
The "processor ceiling" seems to be noticeable in Battlefield 1, it actually shows a limit of 200 fps.
The approach to the limiter is visible in QHD too: a 4% gap at medium settings and a full 24% at high. The “true” balance of power emerges in 4K, where neither the processor nor other limiters interfere: 24-25% between the flagships of two generations, regardless of quality settings.
In Tom Clancy's The Division the frame rate changes dramatically with resolution, while changing the quality in 4K - the heaviest mode for the cards - has noticeably less effect.
The 2080 Ti is 19% faster in 4K, 17-21% faster in QHD, and 15-16% faster in FullHD. The numbers are stable, without a single “suspicious” run during testing. Nothing to add.
The third Tom Clancy game on the list, Rainbow Six Siege, shows unexpectedly high performance at all resolutions.
After the heavy Division and Wildlands such a picture looks odd. The cards' performance also differs significantly: 34-40% in 4K, 30-40% in QHD, 12-26% in FullHD. Apparently the 382 frames and the 12% are the processor's fault; were it quicker, four hundred avg fps would fall in one go. Run-to-run repeatability is high, with a spread of under 3%, so the figures can be trusted.
The bottom line: the RTX 2080 Ti is 10-25% faster than the GTX 1080 Ti, depending on the game. A good generational increase - Intel processors can only dream of such a leap.
Frequencies under load were as follows: 1,725 MHz on the GPU and 1,750 (14,000 effective) MHz on the memory. Auto-boost under load works well, holding 90 MHz above the official maximum.
The GPU temperature rose to 83 degrees; the heatsink and backplate warmed to 66 degrees.
The heating is smooth, without sharp jumps, and the GPU clock visibly decreases as the temperature rises. The conclusion is simple: if you want maximum performance, use effective cooling. You can also take advantage of GPU Boost 4.0, which lets you search for the GPU's stable-frequency limit either manually or automatically.
The new cooling system is, as expected, quiet. Behind a case wall of 0.85 mm steel the video card is barely audible under load. Even on an open bench, from three meters away the rustle of the blades merges with the overall background noise of the CPU cooler and other fans. Judging by the specifications, the fans cannot spin beyond 2,450 rpm, while effectively blowing through such a heatsink would call for a mean twenty-to-thirty-watt blower capable of 5-6 thousand rpm. But then it would not be so quiet, and for the sake of silence users are ready for a lot - even dropping the clocks.
The RTX 2080 Ti is definitely a success and can honorably hold the flagship title for at least a year. The useful innovations in Turing give developers new opportunities that will take them more than a year to master, bringing the quality of 3D graphics in games closer to photorealism. At the moment, though, many of these features (ray tracing above all) are useless, since they are not used anywhere. The new Tomb Raider raises more questions than answers, and I will not presume to judge the implementation of ray tracing in it.
For now we have a 15-25% increase in performance at a price difference of more than 65% (51,000 rubles for the 1080 Ti versus 85,000 rubles for the 2080 Ti at the time of writing). Buying the flagship of the twenty-series is unprofitable... for games.
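The price-to-performance arithmetic, using the article's ruble prices and the midpoint of the quoted performance gain:

```python
# Does the upgrade pay for itself in fps per ruble? (Spoiler: no.)
price_1080ti, price_2080ti = 51_000, 85_000
perf_gain = 0.20  # midpoint of the quoted 15-25%

price_gain = price_2080ti / price_1080ti - 1
print(f"price:       +{price_gain:.0%}")  # +67%
print(f"performance: +{perf_gain:.0%}")   # +20%
# The cost of each frame rises rather than falls:
print(f"cost per unit of performance: "
      f"+{(1 + price_gain) / (1 + perf_gain) - 1:.0%}")  # +39%
```

Every frame of the 2080 Ti costs roughly 40% more than the same frame from a 1080 Ti, which is the whole argument of this paragraph in one number.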
If we set fun aside and consider other uses for the newcomer's computing power, the tensor cores may come in handy in a “working” environment, and there the 2080 Ti has only one competitor - the Titan V, which costs several times more. The financial calculus there is different, and the tens of thousands of rubles invested in the newcomer can quickly pay for themselves when put to effective use. And that is a good thing, because the dead end is already keenly felt: year follows year, the numbers next to the Windows logo keep changing, yet truly breakthrough technologies are nowhere in sight. Just as we sat in front of a flat screen pounding the keyboard 15 years ago, so we sit today. Perhaps such “iron” capabilities will give everyday progress a push?..