The Notebook Review forums were hosted by TechTarget, who shut them down on January 31, 2022. This static read-only archive was pulled by NBR forum users between January 20 and January 31, 2022, in an effort to make sure that the valuable technical information that had been posted on the forums was preserved. For current discussions, many NBR forum users moved over to NotebookTalk.net after the shutdown.
Problems? See this thread at archive.org.

    Dual Core GPU?

    Discussion in 'Gaming (Software and Graphics Cards)' started by stevenator128, Oct 10, 2006.

  1. stevenator128

    stevenator128 Notebook Evangelist

    Reputations:
    8
    Messages:
    384
    Likes Received:
    0
    Trophy Points:
    30
    Do these exist?

    It would seem smart. Intel makes dual core chips rather than putting two entire processors in one computer, so why does nVidia stick two entire graphics cards in and just bridge them together? Why not make a dual core GPU? Will they ever? It just seems logical.
     
  2. TedJ

    TedJ Asus fan in a can!

    Reputations:
    407
    Messages:
    1,078
    Likes Received:
    0
    Trophy Points:
    55
    The biggest hurdle that comes to mind is the astounding transistor counts you find on modern GPUs... they put mainstream CPUs to shame.

    Closest thing I've seen to what you're thinking of is the nVidia 7950 GX2; while not dual core, it's dual GPU on a single card.
     
  3. sheff159

    sheff159 Notebook Deity

    Reputations:
    77
    Messages:
    880
    Likes Received:
    0
    Trophy Points:
    30
    There is a special 6800 card that has 2 GPUs on one card that I've seen before. It was a while ago, but I don't know if it still exists.
     
  4. sionyboy

    sionyboy Notebook Evangelist

    Reputations:
    100
    Messages:
    535
    Likes Received:
    0
    Trophy Points:
    30
    Modern graphics cards are all technically multicore: a vertex processing 'core' feeds into a rasterising 'core', which feeds into a pixel processing 'core', and so on. The most common means of improving a GPU's performance is to increase the number of cores for each section. GPUs are much bigger and much more complex than CPUs, and putting two full cores into one chip would be very expensive to do. As such it's just easier to have two cores on separate cards.
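
    To picture that stage split, here's a loose C++ sketch (every type and function name here is invented for illustration; real hardware obviously isn't written this way):

        // Toy model of the vertex -> rasterise -> pixel flow described above.
        // A real GPU has many copies of each stage working side by side,
        // which is why adding more units per stage is the usual upgrade path.
        #include <vector>

        struct Vertex   { float x, y, z; };
        struct Fragment { int px, py; };                    // candidate pixel from rasterisation
        struct Pixel    { int px, py; float r, g, b; };

        Vertex vertexStage(Vertex v) { v.x += 1.0f; return v; }   // stand-in for transform/lighting
        std::vector<Fragment> rasterStage(Vertex a, Vertex b, Vertex c) {
            // Just emit the three corners as "fragments"; real rasterisation fills the triangle.
            return { {int(a.x), int(a.y)}, {int(b.x), int(b.y)}, {int(c.x), int(c.y)} };
        }
        Pixel pixelStage(Fragment f) { return {f.px, f.py, 1.f, 1.f, 1.f}; }  // stand-in for shading

        std::vector<Pixel> drawTriangle(Vertex a, Vertex b, Vertex c) {
            a = vertexStage(a); b = vertexStage(b); c = vertexStage(c);   // vertex 'core'
            std::vector<Pixel> out;
            for (Fragment f : rasterStage(a, b, c))                       // rasterising 'core'
                out.push_back(pixelStage(f));                             // pixel 'core'
            return out;
        }

        int main() { return drawTriangle({0,0,0}, {4,0,0}, {0,4,0}).size() == 3 ? 0 : 1; }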
     
  5. Dustin Sklavos

    Dustin Sklavos Notebook Deity NBR Reviewer

    Reputations:
    1,892
    Messages:
    1,595
    Likes Received:
    3
    Trophy Points:
    56
    It's my understanding that GPUs don't even have cores in the conventional sense, and their design is largely modular. For example, a desktop 7600GT is exactly half a desktop 7900GTX. Also note that the desktop 78xx and 79xx series have their vertex shaders clocked at a different speed than the rest of the "core."

    There's no point in trying to make a GPU "dual core" when the nature of the chip design itself is as extendable and expandable as it is.
     
  6. sionyboy

    sionyboy Notebook Evangelist

    Reputations:
    100
    Messages:
    535
    Likes Received:
    0
    Trophy Points:
    30
    GPUs can be referred to as multicore, as the pixel units, vertex units and now geometry units are all separate parts that have their own instruction set and the ability to operate independently. As you say, a 7600GT is 'half' a 7900; the 7900 is nothing more than a few more pixel/vertex 'cores' tacked on, so you could argue that the 7900 is a dual core 7600. Of course that's not strictly the case. It's just geeky semantics really when it comes to multicore and GPUs.
     
  7. ccbr01

    ccbr01 Matlab powerhouse! NBR Reviewer

    Reputations:
    448
    Messages:
    1,700
    Likes Received:
    0
    Trophy Points:
    55
  8. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    As said above, there's really no point.
    GPUs are inherently parallel.

    Dualcore CPUs are not efficient. They're not a good thing. They're a hack: because we can't currently make a single CPU run any faster, we glue two of them together. And then it's up to the programmer to exploit this.

    Terribly inconvenient, primitive and all around awkward and inefficient. It depends on the programmer putting in extra effort, which is almost always the wrong assumption to make when designing hardware.

    GPUs don't have that problem. The data they work on can be trivially parallelized, so they don't need the artificial "core" distinction.

    They have one chip, with *lots* of processing units on it, all working in parallel. And unlike dualcore chips, they can exploit this without forcing the programmer to tear out his hair trying to wrestle the code into a shape it doesn't really fit into.

    My GF6800 has 16 pixel pipelines, roughly corresponding to 16 cores.
    The X1900 has 48.
    The upcoming GF8800 has 128.

    Sure, you could then glue another core in next to it, and get two cores, but it wouldn't be as efficient as if you just expanded the existing core, because suddenly it would require two separate data streams, one for each core to work on. That would lower system RAM throughput, and since the two cores would be unable to distribute the load as efficiently as a single "core" can, one or both would end up idling part of the time.

    In an ideal world, you wouldn't ask why GPUs aren't dualcore. You'd ask why CPUs are dualcore, when they could do like GPUs and have over a hundred processing units on a single "core", all working together dynamically and sharing the load efficiently, without having to bother the programmer at all.
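
    To make that contrast concrete, here's a rough C++ sketch (modern C++ threads, nothing to do with how drivers or games are actually written; every name is made up):

        // Sketch of the two kinds of parallelism discussed above.
        #include <cstdint>
        #include <thread>
        #include <vector>

        // GPU-style work: every pixel is independent, so the hardware can spread
        // this loop across 16, 48 or 128 units without the programmer caring.
        void shadePixels(std::vector<uint32_t>& pixels) {
            for (size_t i = 0; i < pixels.size(); ++i)
                pixels[i] = 0xFF00FF00u;          // each iteration touches only its own pixel
        }

        // CPU-style "dualcore" work: the programmer has to slice the data and
        // manage the threads by hand to get anything out of the second core.
        void shadePixelsDualcore(std::vector<uint32_t>& pixels) {
            const size_t half = pixels.size() / 2;
            std::thread second([&] {              // second core takes the first half
                for (size_t i = 0; i < half; ++i) pixels[i] = 0xFF00FF00u;
            });
            for (size_t i = half; i < pixels.size(); ++i)
                pixels[i] = 0xFF00FF00u;          // first core takes the second half
            second.join();
        }

        int main() {
            std::vector<uint32_t> framebuffer(1024 * 768);
            shadePixels(framebuffer);
            shadePixelsDualcore(framebuffer);
            return 0;
        }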
     
  9. stevenator128

    stevenator128 Notebook Evangelist

    Reputations:
    8
    Messages:
    384
    Likes Received:
    0
    Trophy Points:
    30
    That was beautiful Jalf. *Tear*
     
  10. Dustin Sklavos

    Dustin Sklavos Notebook Deity NBR Reviewer

    Reputations:
    1,892
    Messages:
    1,595
    Likes Received:
    3
    Trophy Points:
    56
    I'm really not sure that I agree, and that's largely because there's a major distinction to be made between CPUs and GPUs, and it's right there in the names: Central Processing Unit and Graphics Processing Unit.

    GPUs are specialized, and they and their APIs have evolved along that specialized path. CPUs, on the other hand, are designed to be general-purpose monsters.

    While I don't necessarily disagree that the move to dual core processors (and soon quad core ad nauseam) was precipitated largely by hitting a clock speed wall, and that it was initially basically a kludge, there are two things to keep in mind:

    1. Parallelism is NOTHING new and has been around for some time, albeit more on the enterprise end. (And in Apple products.)

    2. Parallelism is a major boon for people like me. Multimedia geeks have been using multithreaded programs for years; an affordable dual core processor is a major convenience for us. Now I don't have to quit Premiere Pro when I open After Effects, and with a healthy amount of RAM I can run them both with ease. This is HUGE.

    And as for your concern about programmers, do I need to call the WAAAH-mbulance? Technology is changing constantly. They will ALWAYS have to learn how to do new things with their code, it's not always going to be frigging QBASIC 1.1. They have to adapt to DirectX 10's unified shaders and all the new features coming with that marchitecture. Being a programmer and expecting coding to stay static is just plain dumb.

    GF6800 has 16 full pixel pipelines.
    X1900 has 16 full pixel pipelines, not 48. It has 48 pixel shader units, but a pipeline consists of a pixel shader and a raster operator. This is the same kind of misnomer as saying the X1600 is a 12 pipeline part. It's not, it really only has four.
    Where did you get your numbers for the GF8800? When you switch to unified shaders (which admittedly the 8800 sort of doesn't have) you throw the whole "pipeline" idea out the window.

    I think if there were an efficient way to do this without going multicore someone would've done it by now.

    We're spectators. These people are engineers. I seriously doubt the geniuses who put together the Core 2 Duo architecture would've missed something like that.
     
  11. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    It's nothing new on PCs either. Everything since the Pentium Pro has done it. If you relax your definitions a bit, the 486 did it too.
    We can easily agree that parallelism is **** important, and it's key to unlocking huge performance potential.
    It is just thread-level parallelism that's a very awkward solution.
    The Pentium Pro was a superscalar architecture. It was able to run more than one instruction in parallel. That's why it kicked so much ass compared to the 486 or non-Pro Pentium.

    The 486 was the first PC chip to use a pipelined architecture. Again, this is a kind of pseudo-parallelism (it lets you start executing instructions before the previous ones have finished), and again, it unlocked huge performance potential.

    Parallelism is essential. Without it, we'd be stuck with 386s.
    And of course, GPUs are masters of parallelism.
    But it works on all sorts of levels, is my point. Parallelism is nothing new, and it doesn't require multiple cores at all.
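
    As a contrived C++ illustration of parallelism inside a single core (the exact gain depends entirely on the chip; this just shows the shape of it):

        // Both functions do the same additions. In sumChained every addition
        // depends on the previous one; sumSplit keeps four independent chains,
        // giving a superscalar, pipelined core work it can overlap.
        #include <cstdio>

        double sumChained(const double* a, int n) {
            double s = 0.0;
            for (int i = 0; i < n; ++i) s += a[i];              // serial dependency chain
            return s;
        }

        double sumSplit(const double* a, int n) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;      // four independent chains
            int i = 0;
            for (; i + 3 < n; i += 4) {
                s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
            }
            for (; i < n; ++i) s0 += a[i];                      // leftovers
            return (s0 + s1) + (s2 + s3);
        }

        int main() {
            double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
            std::printf("%f %f\n", sumChained(a, 8), sumSplit(a, 8));   // both print 36
            return 0;
        }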

    But thread-level parallelism is a patchwork solution, it's the second-best solution, and it's only used after the other solutions have run out of steam.
    You can only pipeline so much before you start losing performance instead of gaining it. You can only process so many instructions in parallel before the die size and complexity blow up without offering more than a few percent extra performance.

    So once all the elegant ways of doing parallelism have failed, it's time to pull out the ugly duckling, thread-level parallelism. It's nowhere near as efficient, and it requires the software developer to 1) pay a lot more attention to certain bugs that didn't even exist before, and 2) pay a lot more attention to exploiting the potential performance.

    That's why it's not the holy grail. It's just about the worst possible kind of parallelism. It's just all that's left by now.

    The PS3's Cell tries to redefine the way we write code, to make it implicitly parallel. Google use a home-made programming language that again allows code to be trivially run in a parallel manner.

    But for the rest of us, making use of multi-core processors is just a PITA.
    True, it's about the only potential improvement that's left, but it's still not an elegant or efficient one.

    Yes, but any kind of parallelism could in theory support this. My point is simply that thread-level parallelism is best avoided until the other, better alternatives have been used up.

    I'm a programmer, I know technology is changing constantly. I also know that some things change for the better, some for the worse, and some just change to something that's just not viable in the long run. And as it is now, thread-level parallelism will only scale so much. Some apps might be able to make use of, say, 8 cores, but most will even out after 2 or, at best, 4. With current programming languages and existing tools, there's just no way to efficiently distribute code on more than a couple of cores. Furthermore, with current tools and programming languages, there's just no way to avoid getting bogged down in heisenbugs and nondeterministic behavior once you reach a certain level of complexity in your software, and a certain number of cores.
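
    For anyone wondering what those heisenbugs look like, a minimal, deliberately broken C++ example:

        // Two threads bump the same counter with no synchronisation.
        // The result changes from run to run, and the bug can vanish the
        // moment you add logging to hunt for it - a classic heisenbug.
        #include <iostream>
        #include <thread>

        int counter = 0;                           // shared and unprotected on purpose

        void bump() {
            for (int i = 0; i < 1000000; ++i)
                ++counter;                         // read-modify-write race
        }

        int main() {
            std::thread a(bump), b(bump);
            a.join();
            b.join();
            std::cout << counter << "\n";          // rarely prints 2000000
            return 0;
        }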

    Yes, parallelism is here to stay. And yes, even thread-level parallelism isn't going away again. But it's something that has limited potential until our development tools improve drastically.

    Yep, but I wasn't talking about pipelines in particular, rather about "processing units", which you can equate to anything you like.
    The 6800 only has 8 vertex shaders, not 16, so it doesn't have 16 "full" pipelines.
    I loosely defined the number of processing units as "the highest number of units that are able to work in parallel". The 6800 has 16 pixel shader units and 8 vertex shaders. The 8 vertex shaders can run in parallel, and the 16 pixel ones run in parallel (Give or take a few technical limitations). But all 24 can't work in parallel on the same data, so I define it as having "only" 16 units.
    The X1900 is basically the same. 48 pixel shaders, 16 vertex ones.
    The 8800 has 128 unified shaders (at least according to the information released on Dailytech a few days ago). So it has 128 units that can, in principle, all work in parallel on the same data. Of course, there might be architectural limitations, and only some of them are truly unified or whatever, but it doesn't matter. My point was simply that GPUs are far beyond the "dual/quad-core" levels that CPUs currently have.
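
    That counting rule, written out as a throwaway C++ snippet (the unit counts are just the ones quoted in this thread, so treat them with the same caution):

        // "Number of units" = the largest group that can work on the same data at once.
        #include <algorithm>
        #include <cstdio>

        struct Gpu { const char* name; int pixelUnits; int vertexUnits; bool unified; };

        int parallelUnits(const Gpu& g) {
            // Unified shaders can all share one workload; otherwise only the
            // bigger of the two pools works on the same data at the same time.
            return g.unified ? g.pixelUnits + g.vertexUnits
                             : std::max(g.pixelUnits, g.vertexUnits);
        }

        int main() {
            const Gpu cards[] = { {"GF6800",  16,  8, false},   // counted as 16
                                  {"X1900",   48, 16, false},   // counted as 48
                                  {"GF8800", 128,  0, true} };  // 128 unified shaders
            for (const Gpu& c : cards)
                std::printf("%s: %d units\n", c.name, parallelUnits(c));
            return 0;
        }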

    Oh yes, "all that can be invented, has been invented". Someone said that a hundred years ago.
    And you're right, it has been done. As I said, a CPU core uses lots of parallelism internally. And there are programming languages that hide the thread-level parallelism by making it implicit in your code, so the compiler can split it up into "threads" as it pleases, without bothering the programmer. So yes, some progress has been made on this. But there's no reason to believe it can't improve further.

    Missed? Who said they missed anything?
    I just said that thread-level parallelism is not the first, second or third choice. It's there on the list, and once you've squeezed all the performance you can get from the first, second and third choices, you might just have to dig into this if you still want to improve performance.
    I'm not saying they shouldn't go multi-core on CPU's. The parallelism model used by GPU's simply does not apply to CPU's, so they have to make use of more limited and less scalable techniques.
    And while thread-level parallelism is a very awkward solution, it's still necessary to raise performance at the moment. There's a difference between saying it's an awkward and inefficient solution, and saying it shouldn't be done at all.
     
  12. sionyboy

    sionyboy Notebook Evangelist

    Reputations:
    100
    Messages:
    535
    Likes Received:
    0
    Trophy Points:
    30
    I think he might have been referring to ROPs as pixel output when talking about the X1900. It does have 48 pipelines, but it can only output 16 pixels per clock cycle through its ROPs; same with the X1600, which has 12 pixel pipelines but only 4 ROPs for output. Of course that doesn't stop the card from processing 12 pixels at a time, and ROPs do not necessarily hinder a graphics card's performance.

    Also, to discuss G80 a bit: if the specs are to be believed and there are 48 PS / 16 VS / 16 GS, then the G80 could be a 144 ALU part. With the rumoured power requirements, I wouldn't be surprised!!
     
  13. Jalf

    Jalf Comrade Santa

    Reputations:
    2,883
    Messages:
    3,468
    Likes Received:
    0
    Trophy Points:
    105
    If which specs are to be believed? The ones publicized by Dailytech mentioned 128 unified shaders, which means it wouldn't make sense to talk about PS/VS/GS distinctions.

    In any case, like I said above, the exact numbers don't really matter. My point was simply that GPUs already have lots of "cores" in a single core.