The Notebook Review forums were hosted by TechTarget, who shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information posted on the forums is preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.
Problems? See this thread at archive.org.

    nVidia admits Maxwell can't handle async compute well in developer do's and don'ts

    Discussion in 'Gaming (Software and Graphics Cards)' started by Ethrem, Oct 13, 2015.

  1. Ethrem

    Ethrem Notebook Prophet

    Reputations:
    1,404
    Messages:
    6,706
    Likes Received:
    4,735
    Trophy Points:
    431
    jaybee83 and i_pk_pjers_i like this.
  2. i_pk_pjers_i

    i_pk_pjers_i Even the ppl who never frown eventually break down

    Reputations:
    205
    Messages:
    1,033
    Likes Received:
    598
    Trophy Points:
    131
    I know I shouldn't be happy about this because I have so many NVIDIA GPUs but honestly this does make me happy. I want NVIDIA to crash and burn a little bit so AMD can stay in business and then we won't have to suffer with a monopoly.
     
    TomJGX likes this.
  3. HTWingNut

    HTWingNut Potato

    Reputations:
    21,580
    Messages:
    35,370
    Likes Received:
    9,877
    Trophy Points:
    931
    Well, I'd rather not go backwards in performance, but forwards. If Nvidia were to "crash and burn" we'd be stuck with lesser performance just so AMD could catch up. What we really need is another company to buy the graphics unit of AMD/Radeon, one whose single focus is the GPU market and which lives or dies by its success, instead of being a subset of a larger corporation where any failures can be offset by other divisions' profits. AMD doesn't care much about gaming graphics any more, except for consoles. And they like the server market too. Otherwise desktop and mobile GPUs are an afterthought.
     
  4. octiceps

    octiceps Nimrod

    Reputations:
    3,147
    Messages:
    9,944
    Likes Received:
    4,194
    Trophy Points:
    431
    If anything, AMD's graphics division has been one of its few profitable areas in recent years, and it's AMD's failures on the CPU side that have affected their investment in graphics and hamstrung progress, particularly mobile graphics where historically they've always had much lower market share anyway.
     
    Kent T likes this.
  5. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    ^this. The GPU division pretty much single-handedly kept the company afloat after the Faildozer disaster.

    Also nothing to be happy about. If there's a lack of competition it just means things will stagnate. If you ever need a reminder why a monopoly is bad, just look at what happened with CPUs after Sandy Bridge. Although it won't be 5%-annual-improvement bad: one, because Jen-Hsun's ego won't allow it, and two, because people would just stop buying GPUs if nVidia pulled something like that. Regardless, if AMD folded, nVidia would definitely slow down and charge more for less. Remember how every single SKU in their lineup pretty much doubled in price overnight with the release of the 680? I have to applaud nVidia though, charging big-die prices for a medium die and still having people gobble them up like crazy. The trick? Just call it x80 instead of x60 Ti, most suckers (er, buyers) won't know the difference anyway.
     
  6. i_pk_pjers_i

    i_pk_pjers_i Even the ppl who never frown eventually break down

    Reputations:
    205
    Messages:
    1,033
    Likes Received:
    598
    Trophy Points:
    131
    Well, I obviously don't want performance to go backwards, but I also don't want it to increase to the point where AMD simply cannot keep up. I would rather performance just kind of taper off so AMD can catch up. Right now, AMD is just being killed, and I REALLY don't want them to go under. I would absolutely love for there to be a company that buys AMD, but I just don't see that happening.
     
  7. PrimeTimeAction

    PrimeTimeAction Notebook Evangelist

    Reputations:
    250
    Messages:
    542
    Likes Received:
    1,138
    Trophy Points:
    156
    I would not count AMD out of the GPU market yet. I have seen a lot of "budget gaming desktop" builds recommending AMD GPUs due to their performance-to-price ratio. And going by their track record, they usually don't have any idea what their hardware can or cannot do until it is released. It is quite possible that they are working on something fantastic right now and don't have a clue about it. But unfortunately it's equally possible that the next big thing from them is a complete flop in the real world. And yes, Pascal will be a huge challenge for them.
     
    Last edited: Oct 14, 2015
    i_pk_pjers_i likes this.
  8. thegreatsquare

    thegreatsquare Notebook Deity

    Reputations:
    135
    Messages:
    1,068
    Likes Received:
    425
    Trophy Points:
    101
    That's a scary thought. It's like waiting around for monkeys to produce Shakespeare.
     
    TomJGX and TBoneSan like this.
  9. octiceps

    octiceps Nimrod

    Reputations:
    3,147
    Messages:
    9,944
    Likes Received:
    4,194
    Trophy Points:
    431
    A monkey could write more intelligible English than Shakespeare
     
    Player2 likes this.
  10. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    There are limits to how much someone can defend or praise a company, and AMD is failing because they aren't making anything new. Let's see what AMD has created in the GPU market since the original 7000 series launch in late 2011 to early 2012:
    - Hawaii (R9 290, 290x, 390, 390x, 295x2)
    - Tonga (R9 285, 380, soon-to-launch 380x; is a flat downgrade in power, vRAM and memory bandwidth from Tahiti)
    - Fiji (R9 Fury, Fury X)
    - Various APUs still based on IPC of Piledriver chips

    Repurposing stuff and dropping prices only goes so far.

    nVidrosoft's line is actually in far worse shape than AMD's current line, all things considered, but at least their technologies are genuinely improving, and heat has gone down as OCability has risen. Even though power consumption has (without voltage adjustment tricks) gone UP, we're still not at the level of AMD's "midrange" 250W GPU that can barely hit 1100MHz over its 1000MHz base.

    I mean, I recommend their GPUs most of the time now because most of their lineup is generally a better card for the average user, but back in the middle of 2014 I was still telling people on forums that nVidia should actually be considered instead of the constant "AMD AMD AMD" all the time. nVidrosoft however has decided to:
    - make unstable cards for their current line
    - charge $1000 for a plain gaming GPU because people don't know better
    - stagger release their cards, again (unlike apparently with the Titan card where they needed to power-revise it like what they did going from 400 series to 500 series.. can't guarantee the truth there either)
    - remove features from SLI and make it less stable
    - allow their drivers to go to absolute crap and start pushing out a bunch of WHQL drivers that crash, interfere with systems and cause all sorts of other problems, with no actual WHQL license protection going around
    - sweep as many problems under the table as possible

    As I said in another post: AMD needs to get their act together, BAD, and fast. AMD is SLOWLY climbing a hard mountain, but they're only being considered because nVidrosoft has jumped off the top and waved at them as they headed for rock bottom. And instead of healthy competition, we're left with a choice between:
    - A company with broken, terrible drivers, awful anti-consumer business practices, unstable, broken, badly designed specifications, overpriced cards, where just about two cards in the entire lineup are worth the $$ (980Ti and 750Ti), that seems to be going backwards with their multi-GPU features and support.
    - A company with also broken, slowly-updated, DX11 CPU-heavy drivers (that happen to be more stable), hot, power hungry, tessellation-crippled cards that can barely overclock as they're designed near their limits already, with multi-GPU configurations that can't work in any title that's not fullscreen after over 10 years.

    This is a terrible time to be a consumer.
     
    triturbo, TomJGX and i_pk_pjers_i like this.
  11. Talon

    Talon Notebook Virtuoso

    Reputations:
    1,482
    Messages:
    3,519
    Likes Received:
    4,695
    Trophy Points:
    331
    SLI problems aside, my GTX 970 was a great card. I don't remember ever having driver issues. That card boosted and overclocked great at very low temps. When I decided to SLI it though, it seemed like it was rarely supported or had terrible utilization. BF4 was the exception, not the rule.

    My 980 Ti is an absolute champ. It's rock stable, boosts a crazy amount at stock (1404MHz out of the box) and hasn't crashed or had any driver issues.

    I think the driver issues you're referring to are more related to laptops and older Nvidia GPUs. That is some shady practice on Nvidia's part if they are purposely reducing performance of older cards to sell more current gens. For that reason I would love to see AMD make a huge comeback. I think Nvidia makes some great GPUs, but they need to be kept in check.
     
  12. ryzeki

    ryzeki Super Moderator Super Moderator

    Reputations:
    6,547
    Messages:
    6,410
    Likes Received:
    4,085
    Trophy Points:
    431
    I'm kinda sad about the SLI situation to be honest. I used to be single GPU precisely to avoid issues, but I was tempted to try high end SLI, and it seems like it was not the best way to go hahaha :p
     
  13. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    FTFY

    Like I said, it's almost as if the industry wants to push us towards consoles.
     
    TBoneSan, TomJGX and D2 Ultima like this.
  14. Raidriar

    Raidriar ლ(ಠ益ಠლ)

    Reputations:
    1,708
    Messages:
    5,820
    Likes Received:
    4,311
    Trophy Points:
    431
    I really do think (and hope) that AMD is waiting for the 14nm node transition to roll out any major revamping of their architecture. They dragged things along with TeraScale, they are dragging things out now with GCN. Maybe Intel should purchase AMD's graphics division and make it something great.
     
  15. TomJGX

    TomJGX I HATE BGA!

    Reputations:
    1,456
    Messages:
    8,707
    Likes Received:
    3,315
    Trophy Points:
    431
    Lol that would be the end of AMD... For an AMD fan, that's a pretty dumb comment...
     
  16. J.Dre

    J.Dre Notebook Nobel Laureate

    Reputations:
    3,700
    Messages:
    8,323
    Likes Received:
    3,820
    Trophy Points:
    431
    AMD exists because of Intel. They wouldn't be legally allowed to purchase them. If they merged, Intel would control more than 67% (the legal maximum) of the processor market, making it a monopoly, even though it pretty much already is. That's what we business folk think of AMD. ;) It's a bit different on the other side of the coin.
     
    TomJGX likes this.
  17. Raidriar

    Raidriar ლ(ಠ益ಠლ)

    Reputations:
    1,708
    Messages:
    5,820
    Likes Received:
    4,311
    Trophy Points:
    431
    I'm not an AMD fan lol. I'm just a neutral observer. nVidia has its pros and cons, as does AMD. I just think in the shape AMD is in right now, they can't afford to hire the right engineers to get themselves back on their feet. Intel could remedy that in a heartbeat and further both dedicated and integrated graphics departments. I do see AMD ending up on the chopping block with different companies snatching up different portions.
     
  18. TBoneSan

    TBoneSan Laptop Fiend

    Reputations:
    4,460
    Messages:
    5,558
    Likes Received:
    5,798
    Trophy Points:
    681
    I've been reading around that AMD could still sell off their CPU division as they see fit, and the Intel x86 agreement wouldn't mean squat, since enforcing it would still leave Intel with a monopoly. Thus Intel can't do much about it. Wendel on Tek Syndicate goes into detail about it.
     
  19. Zymphad

    Zymphad Zymphad

    Reputations:
    2,321
    Messages:
    4,165
    Likes Received:
    355
    Trophy Points:
    151
    It's not a huge issue since hardcore gamers who care about DirectX 12 will upgrade their GPU once games that actually use DX12 are released. If Pascal also proves to not be optimized to fully take advantage of DX12.1 in all its glory, then that will be devastating.

    But we also have to see what the performance difference will be. I will be curious, for example, how well say Star Citizen or Deus Ex perform in DX12 vs DX11.

    But it is disappointing to read NVidia didn't do their homework with Maxwell. I'm assuming they assumed that DX12 games wouldn't be ready during Maxwell's lifetime and hoped folks wouldn't notice as they upgraded to Pascal.
     
    hmscott likes this.
  20. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    That's pretty much it. nVidrosoft makes cards to suit the times. If you look at the functionality of GPUs, Fermi is the best. Kepler removed double precision from all but two cards. Maxwell doesn't have it AT ALL. CUDA performance has gone down since Fermi, except when using double precision on Titans. Using double precision on Titans reduces gaming performance. The cards became all-in for current-gen gaming at the time of their release, and disregarded anything else. AMD kept everything in, that's all.
     
    TomJGX likes this.
  21. TBoneSan

    TBoneSan Laptop Fiend

    Reputations:
    4,460
    Messages:
    5,558
    Likes Received:
    5,798
    Trophy Points:
    681
    I was counting on Star Citizen to be leading the pack with DX12 but have since been gutted by their lack of enthusiasm. It seems like DX12 isn't exactly on the cards.
     
  22. sniffin

    sniffin Notebook Evangelist

    Reputations:
    68
    Messages:
    429
    Likes Received:
    256
    Trophy Points:
    76
    Well they didn't gut DP, it's there, but there are fewer FP64-capable units than there were on Fermi. Making GPUs to suit the times has worked pretty well for Nvidia, so you can hardly fault them for doing it. Nobody who buys Radeon/GeForce cares about DP, so all it ends up doing for AMD is wasting die space. Honestly everybody will buy Pascal anyway, so why should Nvidia care about Maxwell's DX12 capabilities? People whinge and moan but bend over anyway.

    And Fermi was an abomination in all honesty. It was one of those moments where AMD actually had somebody by the balls, shame it didn't last.
     
  23. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    Bitcoin miners
     
    TomJGX likes this.
  24. sniffin

    sniffin Notebook Evangelist

    Reputations:
    68
    Messages:
    429
    Likes Received:
    256
    Trophy Points:
    76
    Sorry I should have specified that people don't care :p
     
  25. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    "Most gamers" is the term you're looking for =D
     
  26. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    It's not so much that AMD kept everything in as that they designed an architecture that would allow them to expand into the professional/HPC segment, where the margins are highest. VLIW was very good at graphics but didn't handle compute too well. So AMD's philosophy with GCN was to make a "flexible architecture" that would be good at both graphics and compute. Basically you could say AMD tried to make Fermi 2.0, and on a raw computation power/TFLOPS level the Fury X does absolutely demolish the Titan X. AnandTech has a great writeup on GCN, and there's also a tl;dr version as well.
     
    TomJGX and D2 Ultima like this.
  27. Zymphad

    Zymphad Zymphad

    Reputations:
    2,321
    Messages:
    4,165
    Likes Received:
    355
    Trophy Points:
    151
    But does it crush Quadro? I haven't been reading about AMD making big gains and sales in the professional market with Radeon GPUs.
     
  28. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    Quadro and GeForce have been mostly the same cards since Kepler.

    AMD cards technically destroy Quadro with OpenCL etc.
     
  29. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    FirePros are to Quadro as Radeons are to GeForce. So no Radeon would not be crushing Quadro in the professional segment.
     
  30. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    Yeah, mainly because Quadro/FirePro is ALLLLLLLLLLLL about drivers. Since the cards are almost exactly the same, you're literally paying 4 figures+ for drivers and nothing else.
    And let's be honest. nVidrosoft's drivers are so far beyond AMD's it's a joke. For the Quadros. They gave up having better drivers in the GeForce cards for no reason.

    I find it disgusting that you're paying so many thousands extra for drivers when your card can barely do anything but FP32 compute. But there's really no choice for professionals at this point. You grab nVidrosoft's crappy cards or you deal with artifacting and various crashing issues, not to mention heat in the last few years (and this is coming from people I know who work in places that run what are basically render farms, and who dumped AMD because it was too unstable).
     
    Apollo13 and TomJGX like this.
  31. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    ..pretty sure those general guidelines hold true for all cards where you don't have unlimited amounts of separate compute cores :) And being conscious of whether a separate "compute" command will stall "graphics" or shaders that aren't completed is another extremely general, very obvious and good piece of advice for any architecture.

    But since Maxwell collapses the number of "smx" units into fewer "devices", so to speak, it's obvious that you have fewer options to randomly add compute routines without having to rely on some internal scheduling, which then results in context shifts. So running compute by pre-emption (which in theory is really a way to reduce the number of context shifts) might well have adverse results, simply because it will cause the internal scheduler to create a context shift when it needs to reassign, for example, an "smx" that is already in use.

    And.. anyone who programs compute and feels this is a tremendous surprise probably isn't really paying attention. And note that you get similar problems if you pre-empt with any shader code on any number of cards and need to rely on the internal scheduler. I mean, it's very basic stuff that holds true for any compute core (that you might create thread starvation and multiply context shifts for a very long time by splitting tasks into many concurrent tasks), even if it's common to teach people to disregard overhead since single operations are "so quick on modern architectures we can generally ignore overhead", etc.

    But sure - context switching on nvidia cards in general is slow. And that's not really what they're optimized or made for either. Even Quadro cards aren't, even if they do have more discrete cores that agree better with compute tasks.
     
  32. Ethrem

    Ethrem Notebook Prophet

    Reputations:
    1,404
    Messages:
    6,706
    Likes Received:
    4,735
    Trophy Points:
    431
    The problem is that up until Maxwell 2 it was like a one lane highway... Maxwell 2 has a whopping 2 lanes while AMD has what, 16 in GCN? It quickly becomes clear why nVidia takes such a performance hit. It's like trying to shove all the traffic on the interstate into two lanes... I live on the I-25 corridor and I can tell you what rush hour is like when you're going up north of Denver.
     
    Apollo13, TomJGX, n=1 and 1 other person like this.
  33. sniffin

    sniffin Notebook Evangelist

    Reputations:
    68
    Messages:
    429
    Likes Received:
    256
    Trophy Points:
    76
    Maxwell 2 only has a single queue called a Work Distributor and all commands are stuffed into this single queue. The problem is that queuing more than 31 compute commands can actually completely block graphics commands. It's fine up to a point, then it falls over. GCN does not have this problem. Nvidia will probably manage this by encouraging developers to minimize the amount of compute commands issued. They'll say something like use it sparingly.
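
    To make the above concrete, here is a minimal sketch of what a DX12 title does when it wants async compute (an illustration, not nVidia's guide; it assumes a valid ID3D12Device* named device and the Windows 10 SDK headers): the game creates a COMPUTE command queue alongside its DIRECT (graphics) queue and submits command lists to both. Whether the compute work actually overlaps with graphics, or ends up stalling it as described above, is entirely down to the hardware/driver scheduler.

        #include <d3d12.h>
        #include <wrl/client.h>
        using Microsoft::WRL::ComPtr;

        // Helper: create a command queue of the given type.
        // DIRECT = graphics, COMPUTE = async compute. Error handling omitted.
        ComPtr<ID3D12CommandQueue> MakeQueue(ID3D12Device* device,
                                             D3D12_COMMAND_LIST_TYPE type)
        {
            D3D12_COMMAND_QUEUE_DESC desc = {};
            desc.Type     = type;
            desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
            ComPtr<ID3D12CommandQueue> queue;
            device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
            return queue;
        }

        // Usage sketch: one graphics queue plus one compute queue, each fed
        // its own recorded command lists (gfxLists/computeLists are
        // hypothetical ID3D12CommandList* arrays owned by the application).
        //
        //   auto gfxQueue     = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
        //   auto computeQueue = MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);
        //   gfxQueue->ExecuteCommandLists(1, gfxLists);
        //   computeQueue->ExecuteCommandLists(1, computeLists);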
     
  34. Ethrem

    Ethrem Notebook Prophet

    Reputations:
    1,404
    Messages:
    6,706
    Likes Received:
    4,735
    Trophy Points:
    431
    That's what they did lol. What annoys me is that nVidia knew this, which is why GM2* has two lanes instead of one, but they didn't bother to actually address it and won't until Pascal. Planned obsolescence...
     
    TomJGX, TBoneSan and D2 Ultima like this.
  35. Ramzay

    Ramzay Notebook Connoisseur

    Reputations:
    476
    Messages:
    3,185
    Likes Received:
    1,065
    Trophy Points:
    231
    cough...Micro$oft...cough
     
  36. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    :D Right, but still - it's not a huge surprise that nvidia wouldn't care about creating an internal scheduling system for a very high amount of concurrent tasks. As in, allowing a single thread to feed the graphics card a million tasks at once - you're supposed to schedule the runs externally and take advantage of limited simd capability for the tasks on the graphics card that can be automatically allocated in the immediate context... that's how a peripheral card on an external bus works..

    I mean, it's the exact same thing on AMD. The only difference is that the proximity to the cpu cores means that a context switch and a revert is much, much faster.

    Just saying that even if Nvidia wrote in fifty "lanes" for access towards the work queue - there's really no bus-architecture or transport back and forth off the card that would take advantage of it in any way. Imagine having to wait for IO on the bus, just in case it's going to be possible to assign a different set of concurrent tasks later, that would allow better smx utilisation - potentially, in 100ms, etc. Not going to be much point.

    It's the same proposition as adding cpu-calculations into a graphics context manipulation each frame, for example. It's not going to happen because of the external bus IO waits. So you might want to get around that by using compute - but, no surprise, compute on non-programmable simd is really, really slow and resource hungry.

    Brilliant that people seem interested in compute all of a sudden. :) But this isn't exactly news, is it? That Nvidia cards are not optimized for deep concurrent parallelism of infinite amounts of complex tasks? Over being super-fast for limited simd execution for immediate graphics context tasks/simple pixel operations, etc., that always exist, and where slow and cheap ram is fast enough to still be useful. That's... practically what the business was made from.

    Hell, I've been told for a decade that no one cares about asynchronous parallelism, and that it's all utterly idiotic and a waste of time. ;)
     
    jaybee83 likes this.
  37. Zymphad

    Zymphad Zymphad

    Reputations:
    2,321
    Messages:
    4,165
    Likes Received:
    355
    Trophy Points:
    151
    NVidia's Quadro is still the most powerful compute GPU. AMD's top-tier FirePro is nowhere close to being as fast. AMD can spout all the BS they want about how powerful their GPUs are in synthetic benchmarks and double precision BS. When it counts, when professionals use REAL tools to do their work, Quadro crushes AMD.

    Don't even bother comparing AMD FirePro to NVidia's Tesla. AMD doesn't have an answer at all for Tesla.

    AMD had to create GCN to compete with Quadro and they still haven't succeeded. FirePro still consistently has more issues. OpenCL still hasn't become an industry standard and it's still not as well developed as CUDA.

    To say NVidia is not known for compute is nonsense. NVidia set the standard with CUDA and Fermi well before AMD began spouting their BS about teraflops of power in synthetic benchmarks.

    Also, the reason AMD is financially in the dumps despite healthy consumer sales is that those sales frankly are not profitable. NVidia nearly has a monopoly in the professional market, and that's where the money is. NVidia does have the monopoly with US Defense research and supercomputers.

    The Top 100 supercomputers are dominated by Intel E5, and 15 of them use NVidia Tesla. The top three most powerful supercomputers in development in the US will all use NVidia Teslas. NVidia not known for compute parallelism? AMD wishes they had Tesla.
     
    Last edited: Oct 26, 2015
  38. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    Tesla is the dedicated compute GPU: it doesn't even have any display outputs and is completely passively cooled, so it needs to be mounted in a rack with tremendous amounts of airflow in order to not burn up.

    Quadro for the most part is just GeForce with ECC memory and certified drivers. Except for a few top-of-the-line Quadro parts, all of them are FP64-gimped just like the non-Titan GeForce parts (excluding the entire Maxwell lineup obviously, since it can't do FP64).
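
    As a point of reference, the FP32:FP64 gap described above can be read straight off a card. A minimal sketch (plain host-side C++ linked against the CUDA runtime; it assumes device 0 is the NVIDIA card in question, and the attribute is only exposed by newer CUDA toolkits):

        #include <cuda_runtime.h>
        #include <cstdio>

        int main()
        {
            // FP32:FP64 throughput ratio of device 0 -- large on FP64-gimped
            // gaming parts, small on the big compute chips.
            int ratio = 0;
            cudaDeviceGetAttribute(&ratio,
                                   cudaDevAttrSingleToDoublePrecisionPerfRatio, 0);
            std::printf("FP32 : FP64 throughput ratio = %d : 1\n", ratio);
            return 0;
        }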
     
    Last edited: Oct 26, 2015
    D2 Ultima likes this.
  39. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    It really isn't.

    It really is. When people were mining bitcoin like crazy using OpenCL on AMD, there were dudes who wrote a CUDA miner app for nVidia cards. All the best Kepler cards were still multiple times slower than AMD, while using their beloved CUDA.

    This is because of drivers. AMD cannot driver. With Quadro/FirePro class cards, you are LITERALLY paying for the drivers. Call companies like Adobe for support and they will hang up on you without a Quadro or FirePro card in your rig, because consumer drivers are not guaranteed. Grab a GTX 680, pop off a resistor to have the card show up as a Quadro K5000 (which allows the Quadro drivers to install) and BAM! Instant help from those people, for a small percentage of the cost. It's all drivers, excepting maybe one or two cards that are slightly different in, say, amount of memory, etc. (but the architecture is the same).

    Tesla is an entirely different ballgame to Quadro/Firepro.

    They still have issues because of drivers and programs. It doesn't matter whether OpenCL is used or not; the fact is that OpenCL on AMD cards is faster than any current or last generation CUDA-capable GeForce or Quadro card could ever HOPE to compute using either CUDA OR OpenCL.

    nVidia has long been known for CUDA. But as everyone has noticed, CUDA performance has gotten worse since Kepler, and to an extent they've even removed capabilities for it from the official drivers, which means modified drivers are required to get it working. If Maxwell could compute as well as Fermi could, the raw benefit of Maxwell's increased core counts, clockspeeds and architecture should theoretically make compute much faster than it currently is.
     
  40. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    Yeah GeForce/Quadro/Tesla all run on the same silicon, the differences are mainly driver-side, and of course GeForce also has some extra hardware to ensure it stays a GeForce.

    That said, I swear I remember reading a publication from HP that talks about the differences between Tesla and GeForce cards, and one thing they noted was the Tesla cards were unbelievably oversoldered at every joint in order to cope with the stress of running full tilt 24/7. This is why cards used for mining often die prematurely, not because the GPU starts to degrade (as long as it's kept cool), but because the components on the PCB very quickly wear out since gaming cards were never designed with 24/7 100% load operation in mind.
     
  41. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    Still.. typically the advantage quadro and tesla cards have is an increased number of smx'es.. Streaming Mollusc X-treme processor..? something like that? Along with a grid management unit for mapping an increased incoming number of concurrent jobs, to execute the code as efficiently as possible.

    And my point was just that it doesn't make sense to have multiple pipelines to the graphics card (hardware differences), or a semi-intelligent way to schedule incoming jobs (in firmware/software on the chip), when you don't have that increased number of cores. Because the ones you have can and.. should.. really only be programmed with a few tasks at the same time.

    In the same way, it makes very much sense to focus on fast and limited simd on gtx type cards, to perform typical and very simple shader and pixel-operations on higher clocks, at burst speed - rather than cram these cards full of smxes that often won't be utilized. That's just cost-efficient when thinking about the tasks they're supposed to perform.

    Meanwhile, some of us would dearly like to see more OpenCL programming in graphics. But on current bus-technology, the response after each operation is simply too slow. And the area of use for an increased number of SMs, stream-processors on peripheral cards becomes high-latency jobs such as.. Folding@home, bitmining, etc. That agree with a distributed model in the first place.

    I mean, we're talking about just a series of extremely simple processors with limited capability put in an array here. And for the tasks you're typically going to have in a game, and so on, you're going to favor fewer but faster cores. For economic reasons, and for practical reasons as well (a huge array of processors draws a lot of power for one). Meanwhile, you actually do have examples where compute code does execute faster on fewer but faster cores, in the way that for example on the laptop market, you have fairly decent performance in practice on a gtx card compared to a quadro card in a very large amount of typical usage-examples.. that's.. another thing.

    Of course, ideally we would have a million cores that could be put in a graphics card and, for example, clocked individually, or disabled when they're not in use, that sort of thing. So we could have massive "compute" performance on demand, with some sort of fairly cheap power-budget whenever it's going to execute. And I'm guessing that better and more compact chip manufacture and ever cheaper hardware, along with better scheduling and control over active cores (like turned up with Kepler and especially Maxwell), is an approach towards that. To get more compute performance in a small watt-budget. Since apparently compute is all the rage now, and I didn't even notice, but never mind.

    Meanwhile, again, the difference with AMD's approach with the APUs is the bus proximity. How they take care of the scheduling for that, or make it possible to use some convenient api for allocating compute tasks, is really a secondary concern over the actual hardware capability. There, even an automatic or completely plain round-robin allocator that just instantly performs, as has been shown already, will have comical compute performance compared to your average peripheral card, from AMD or Nvidia. Even if there were huge pitfalls and massive amounts of locked threads and wasted time in the process-diagram.

    Just pointing that out - for code specifically written for a limited number of compute-capable cores, with no demand for instant response or completion, where most of the task can be run asynchronously and is easily parallelizable, measuring performance differences between AMD and Nvidia cards can be interesting in some ways. Discussions about how wise it is to use more general-purpose cores over having specific shader units and specific pixel-operation units are also perhaps interesting, from a cost-efficiency standpoint and a practical standpoint, when talking about performance for specialized tasks.

    And it sure could be pointed out that keeping "gpu-cores" and "cpu-cores" as two separate devices when hardware is as cheap as it is nowadays is a stupendously idiotic thing to do, and a complete anachronism only kept in place by industry conventions, purely for marketing reasons. In the same way that having that design limits the potential of bus-proximity, for all the performance increase it will give compute performance tasks that typically are written now.

    But regardless of that - you wouldn't benefit from a solid scheduler, multiple pipes, etc., if you didn't have a very large number of cores. And you only need it then, and benefit from increased numbers of cores, if you have relatively high latency tasks to complete towards system ram. Or if you wanted to perform somewhat simple math on graphics card memory, without having to return IO first (and this is where the entire compute pre-emption and VR dimension comes in - limited "compute" for dealing with occlusion detection and deformation is more common though. And very likely the biggest use of the bus-proximity on an apu will be to simply perform standard shader-operations faster, or at a similar speed as before. Rather than anything interesting).

    So saying that the only difference going on is drivers, and which overall api you use, and how efficient dx12 or whatever is going to be, and things like that. It's not untrue as such. After all, like people point out, specialized drivers or very specifically programmed tasks can make certain hardware very fast for those specialized tasks.

    But it skirts the actually interesting part, about the bus-proximity of compute-capable cores and cpu-cores, towards system ram. This is important.
     
  42. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    Because gaming doesn't use near 100% of a GPU, as most of us don't know.

    Well, for Quadros that's not true. They are, quite literally, almost exactly the GeForce cards. It's why, as I said before, popping off a resistor in a certain location on some GeForce cards changes them into the Quadro cards and they perform exactly the same. Teslas... I don't claim to know a whole lot about, really. My understanding is that they're in another class entirely, even though the same basic architecture is present (Maxwell is still Maxwell, of course).
     
  43. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    Well, then there goes that. It really is just driver differences? Certain functions just are slower.. or implemented without the grid management unit activated, something like that, so the run time for certain functions is basically multiplied by the queue depth in the worst case..? I thought they had the same chip, but at least had different config options on the internal bus and the ram, and so on.

    So.. any qualified guesses on whether or not compute pre-emption being so expensive on Maxwell is also because of driver capability...? That compact allocation of smx devices could be a problem, and so on. But that the substantial performance hit really comes when the internal scheduler croaks? And that you would have a similar problem on quadro and tesla cards if their internal scheduler enhancement wasn't there?
     
  44. D2 Ultima

    D2 Ultima Livestreaming Master

    Reputations:
    4,335
    Messages:
    11,803
    Likes Received:
    9,751
    Trophy Points:
    931
    For about 95% of Quadros and GeForce cards, it's JUST driver differences. Teslas I am certain are a different beast, but I don't know enough about them. Prema once told me that SOME of the quadros weren't the same for Maxwell, but not all are different, and any sort of hardware limitation Maxwell has is present in the Quadros. If the Quadro can do something the GeForce cannot (due to as you said, grid management unit activated or something) then it's artificially turned off via drivers in the GeForce card and turned on via drivers in the Quadro card.

    I have no guesses as to whether DX12's async compute fails are driver-resultant or not. I can say however that if CUDA simply never uses the parallel processing methods that supersede the card's hardware designs (as CUDA is handled by the driver after all; so it's no surprise that it would calculate in a card-friendly manner) it would never show up in compute-related functions. Again I don't know about Tesla cards.

    But honestly, if the quadro cards could do things the GeForce couldn't, then all we'd need is a maxwell quadro user to run the Ashes of the Singularity benchmark and compare to the GeForce equivalent of the card. If the quadro does better on a quadro driver, then that means the cards are physically capable of better and drivers are the limiter. If it does not do better, then it means the cards are flat out incapable, and CUDA simply works around the cards' downsides.
     
    i_pk_pjers_i likes this.
  45. sniffin

    sniffin Notebook Evangelist

    Reputations:
    68
    Messages:
    429
    Likes Received:
    256
    Trophy Points:
    76
    Honestly why would you come into a technical discussion waving your arms around and rambling about Nvidia's professional market success? This discussion is based on facts and reality, not Jen-Hsun's dreams.

    This is true. The whole point of Nvidia's professional market strategy is that you are already making GPUs in volume to sell to consumers. You take some of these GPUs, put them through additional validation, rebadge them, sell them for 10 times as much, and lock features to drivers specific to them.

    If Tesla and Quadro were based on different GPUs it would defeat the purpose. GK210 was the first case of professional cards using a different GPU. Whether this is a blip or the start of a trend I guess we'll find out.
     
    Last edited: Oct 27, 2015
  46. n=1

    n=1 YEAH SCIENCE!

    Reputations:
    2,544
    Messages:
    4,346
    Likes Received:
    2,600
    Trophy Points:
    231
    lol I was simply pointing out it's probably the solder joints that are the first to fail instead of the GPU itself crapping out.

    As far as Quadro/Tesla go, this is my understanding:

    Quadro is basically just a GeForce with ECC memory. They still have display outputs, and are actively cooled by a fan, and in a pinch can still be used for gaming.
    Tesla is a dedicated compute GPU. It comes with ECC memory of course, but has NO display outputs, so you can't even hook them up to a display. Some Tesla cards come in both active and passive cooling variants, but some (like the GK210-based K80) only come in passive form, meaning they're most certainly intended to be mounted in racks with lots of airflow, and not to be used as standalone cards.
     
  47. octiceps

    octiceps Nimrod

    Reputations:
    3,147
    Messages:
    9,944
    Likes Received:
    4,194
    Trophy Points:
    431
    Because gaming workloads have traditionally tilted more toward graphics than compute. Up until 9 years ago we didn't even have the hardware to do GPU compute, and we didn't have widespread API support until 6 years ago. Although I'm still not sure why compute has suddenly become such a hot and divisive topic recently, considering it's already been used in popular games for a number of years now (since the inception of DX11) for everything from deferred lighting to AO to DoF to realistic hair/fur/particle physics.
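
    For a rough illustration of what that DX11-era compute looks like from the API side, here's a minimal sketch of a post-processing pass such as AO or DoF: bind a compute shader, bind an output UAV, and dispatch one thread group per screen tile. The names csAO and aoTargetUAV and the 8x8 tile size are assumptions for the example, not taken from any particular game.

        #include <d3d11.h>

        // Run one compute pass (e.g. ambient occlusion) over a width x height target.
        void RunComputePass(ID3D11DeviceContext* ctx,
                            ID3D11ComputeShader* csAO,
                            ID3D11UnorderedAccessView* aoTargetUAV,
                            UINT width, UINT height)
        {
            ctx->CSSetShader(csAO, nullptr, 0);                          // bind the compute shader
            ctx->CSSetUnorderedAccessViews(0, 1, &aoTargetUAV, nullptr); // bind the output UAV
            ctx->Dispatch((width + 7) / 8, (height + 7) / 8, 1);         // one group per 8x8 pixel tile
            ID3D11UnorderedAccessView* nullUAV = nullptr;                // unbind so the graphics
            ctx->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);     // pipeline can read the result
        }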
     
  48. nipsen

    nipsen Notebook Ditty

    Reputations:
    694
    Messages:
    1,686
    Likes Received:
    131
    Trophy Points:
    81
    Mm. I'm guessing that when you run into situations where one piece of hairy code in Directcompute runs perfectly fine on one platform, and somehow has immense penalties on another, you get these "this platform is ****" articles.

    By the way, 3dfx had Glide :) Collections of routines with semi-complex math that completed fairly fast on immediate memory. Few titles ever used it in a way that put lighting effects on moving objects controlled by core engine logic.. can't really think of anything except Lander by Psygnosis.. (because it was pretty complicated to do it - I've experimented a tiny bit on a hobby-basis, but I can see why people would give up. It's.. still simpler than trying to gracefully incorporate graphics code with access to system ram on mobile phones, though..), but that sort of compute has been turning up once in a while. Memory hacks, more or less.

    What's turning up (again now?) is that demand for general compute performance for unoptimized high-level code. Maybe more people tire of having to be restricted to one specific proprietary "tech" to create effects. That you get developers of games wanting to keep their code between projects, that they want their artists to have some predictability about what they can create, and people who write UI backends and so on want to have an easier standard way of dealing with graphics code. That even the super-gurus who would sit on compact shader-code that did extremely specific effects are getting old and tired, that sort of thing.

    But I mean, the "asynch compute performance crash" thing is from the Oculus Rift guys, no? I'm (randomly) guessing they want to put in per-frame correction of some sort that has to complete before rendering. And that pre-emption compute seemed like a good way to do it (in spite of the context shifts that will happen on all platforms. That then unfortunately are crashing on Nvidia gtx cards.. because of the way the internal scheduler works..?).
     
  49. Zymphad

    Zymphad Zymphad

    Reputations:
    2,321
    Messages:
    4,165
    Likes Received:
    355
    Trophy Points:
    151
    As indicated by the Oxide statement, which alluded that NVidia does support it at the hardware level, but it's not fully implemented in their drivers.
    - The queue process is done in software, unlike AMD, which has a hardware compute queue engine. The results will be interesting when NVidia actually implements this for DX12.

    What is ludicrous is that NVidia published papers discussing async/parallelism years ago, when Fermi was being developed. Also, NVidia hardware has been used in every MS DX12 presentation. Why NVidia hasn't fully developed this feature in their drivers or emphasized it is curious.

    My guess why AMD emphasized it is not because of DX12 or PC gaming, but for consoles, where async seems to shine.

    BTW AMD doesn't support the Conservative Rasterization and Rasterizer Ordered Views DX12 features, only NVidia does (support for both can be queried per card; see the sketch after this post). It could be that NVidia actually has a more complete DX12 feature set than AMD, and it is just coincidence that AMD has a hardware compute queue engine because of consoles, not because of DX12.

    Just rambling thoughts.
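
    Those optional features don't have to be guessed at: DX12 reports them per device. A minimal sketch of the query (assumes a valid ID3D12Device* named device and the Windows 10 SDK):

        #include <windows.h>
        #include <d3d12.h>
        #include <cstdio>

        // Print which optional DX12 rasterizer features the device/driver exposes.
        void PrintOptionalFeatures(ID3D12Device* device)
        {
            D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
            if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                                      &options, sizeof(options))))
            {
                std::printf("Conservative rasterization tier: %d\n",
                            (int)options.ConservativeRasterizationTier);
                std::printf("Rasterizer ordered views (ROVs): %s\n",
                            options.ROVsSupported ? "yes" : "no");
                std::printf("Resource binding tier:           %d\n",
                            (int)options.ResourceBindingTier);
            }
        }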
     
    Last edited: Oct 29, 2015
    jaybee83 likes this.
  50. Apollo13

    Apollo13 100% 16:10 Screens

    Reputations:
    1,432
    Messages:
    2,578
    Likes Received:
    210
    Trophy Points:
    81
    I thought you were referring to AMD at the start of that, until the anti-consumer business practices. Are nVIDIA's drivers really that bad these days? I've been using ATI graphics for years now (most recent nVIDIA is from 2007), and back when I last researched driver comparisons, nVIDIA's were generally recommended. I know they did have the overclocking fiasco early this year, but the quality is down, too?

    I also remember when you could install Quadro drivers on GeForce cards... did that on my 8600M GT a couple times to play around and see the differences. Though it's been long enough that I'm not entirely sure those weren't modded drivers.

    At any rate, Ethrem made the key point of the thread in post #32. Pretty much since the first in-depth analysis of why Maxwell was doing so poorly compared to Radeons on the Ashes of the Singularity benchmark, it's been known that the limitation is in the hardware. Probably before that for those who were better-versed in the technology. I'm sure nVIDIA will be addressing that with next year's cards, but for the sake of competition I'm glad AMD does have a slight head start here.
     
    TomJGX likes this.