Since some people like to argue about which features and visual effects affect performance, whether feature X is possible on card Y, or what DirectX version Z can do, without necessarily knowing much about the actual rendering, and (many?) others seem interested in just a general understanding of what determines overall performance, I thought I'd try to make sure these discussions are at least based in reality. If people know some of the facts involved, they might not resort to namecalling and flamewars so quickly. (A naive hope, but hey, worth a try.)
Would be fun if we could one day have these discussions without having the thread locked.
I'll try to describe what goes on as the game renders what you see, and also (to serve the latter group) point out how it affects performance, and which parts of your computer become the bottleneck at each step, to give an idea of which parts you should upgrade if you run into performance problems in a game.
WARNING: This is going to be a long read. Don't read it if you don't have a lot of time to kill...
Now, so you know how seriously (not) to take me, a few words on my qualifications.
I am not a professional game developer.
I have not shipped any games, commercial or otherwise. So I can't tell you everything about how "real" games work inside.
I am a Computer Science student. Got my bachelor's degree a year or so ago, aiming for a master's at the moment.
I have implemented many graphical effects in an engine, including fragment (pixel) shaders for most common effects as well as a few more advanced ones (like bloom lighting), and vertex shaders (for animating clouds or grass waving in the wind, as well as skeleton-based animation with blending between multiple animations, and scheduling of transitions between animations).
I have written a software renderer (which means I had to code everything myself, instead of just relying on the GPU to do all the hard work)
So I'm not a graphics guru, but I know much of what goes on inside a game engine. With that out of the way, you should know not to take my word as gospel, but also that I have *some* experience, and am not just making this up. So correct me if you think you know better than me.
I know I'm not the only programmer here, and I'm sure some people will be able to add, clarify or correct some of the following, and I hope they do. At worst, it means more correct information for you, and at best, it means I'll learn something too.
And of course, ask questions if there's anything you're curious about, or that I didn't explain properly.
With that out of the way, let's get started:
Preparing for rendering
Resource loading:
When you load a level in a game, a lot of resources have to be read from the harddrive and stored in RAM. This includes geometry (meshes), textures, shaders, sounds.
Depending on the game, only part of the level might be loaded when you start, and other parts loaded on the fly while the game is running. (Look at Oblivion. The installation is what, 3-4GB? The game is one huge level. If they had to load everything at once, it'd take tens of minutes and use a huge amount of RAM.)
Still, the data that is going to be needed has to be loaded sooner or later.
This is responsible for the often long loading times, and also for the stuttering that may occur when you move between different areas in the game. (Oblivion divides the world into rectangular cells, and it's pretty easy to see when you leave one cell. Suddenly everything in front of you becomes more detailed, and the game might stutter for a moment. That is one, rather simple, way to do loading on the fly. You can even run along these cell boundaries (which are just straight lines), and watch the effect as you occasionally cross over them.)
So, this only accounts for the occasional *loading* time, which is usually easy to recognize. Performance implications: The harddrive is what matters here. A faster harddrive will do a lot to decrease loading time. What may be more surprising is that often, a RAID-0 setup will *not* make a significant difference here. Games often load resources that are not located near each other, which means the harddrive's seek time becomes important, and the actual transfer speed less so. Also, sufficient RAM is vital. If you don't have enough RAM, some of the loaded data ends up in the pagefile, which results in more harddrive thrashing and lousy performance.
Initializing data in memory
A second part of this is to actually structure the data. Sure, we might have loaded the "table" mesh, and the "evil terrist with turban" textures, but we also need to decide how many of these should be rendered, and where and when.
This information is often kept in some kind of scene graph. That is, we build a big tree structure, where the root is the scene itself, and then add child nodes for each object in the world, keeping track of object data like the position, as well as which texture and shader(s) should be used. Some objects might "belong" to other objects (such as the ak-47 the terrist is carrying; we want to indicate that this object should follow the guy when he moves), so we make that a child of the relevant object. This way, we can fit the entire world into one big tree, where each branch indicates "objects that belong to the one we branched off from". It also helps us keep track of how many instances of each model must be rendered. (There might be dozens of cars on the level, so there are dozens of nodes in this tree, each storing different world coordinates, but pointing to the same mesh.)
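To make that concrete, here's a minimal sketch of what one node in such a scene graph might look like (hypothetical names, and a small fraction of what a real engine would store):

```cpp
#include <memory>
#include <vector>

struct Mesh;    // geometry, loaded once and shared between nodes
struct Texture; // image data, shared
struct Shader;  // GPU program, shared

// One node in the scene graph. Dozens of nodes can point at the SAME
// mesh (dozens of identical cars), each with its own world position.
struct SceneNode {
    float position[3];           // where in the world this object sits
    float rotation[4];           // orientation, stored as a quaternion
    Mesh*    mesh    = nullptr;
    Texture* texture = nullptr;
    Shader*  shader  = nullptr;
    // Child objects follow this one when it moves
    // (e.g. the gun a character is carrying).
    std::vector<std::unique_ptr<SceneNode>> children;
};
```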
So now we know how to render (we have the necessary meshes, shaders, textures and everything else we need), and we know where to render as well (our scene graph contains information about which objects we have on the level, and where they are).
Now, our imaginary game has got off the ground, it's finished loading, and the first frame has to be rendered.
Rendering a frame
If our level is really small and simple (like most tech demos, which contain only a few dozen objects), we can just run through the entire scene graph, and ask the GPU to render each object. The GPU will happily render each object, pixel by pixel, and then when it finds out something wasn't visible after all, it just draws over it when it comes to the object that was in front of it. So this works fine, and gives us the expected image, even though we're actually asking the GPU to render a number of objects that aren't visible (for example, the ones that are behind the camera, or ones that are hidden behind other objects). It's safe to do so, the GPU will make sure they're removed properly, but it still renders them, which takes time.
Space partitioning, visibility graphs
On larger scenes, this doesn't really work. Think of Oblivion: how many tens of thousands of objects are there in the game world? I don't know, but more than a few. Or even something as primitive as a level in Doom (1 or 2). Just try to count the number of barrels + monsters + doors + weapons + ammo + health kits. Now compare that number of objects to how many are actually visible on the screen (let's say 4 monsters, if it's a crowded area, and maybe a handful of barrels). And this is for a game that had to run on 15 year old computers! Obviously, today's games have vastly bigger levels with far more objects in them. In other words, rendering everything in the scene graph is horribly inefficient for anything bigger than a tech demo.
Doom's big innovation (the one single feature that made it possible to run the game at all back then) was to partition the level into smaller chunks, based on what can be seen at each point in the level. The actual way this is done can get a bit hairy, but the idea is that for any position on the map, you can easily compute which partitions are guaranteed *not* visible. Think of a big wall intersecting the level, with only a small doorway in it: it's fairly obvious that unless you're looking at the door, you can never see anything on the other side of the wall. So let's save that information. Then every frame we just check "is the player looking at the door? If yes, render everything. If no, render only the half of the level that we're in". (And of course, this can be done much more fine-grained. This example is just to show the general idea.)
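As a toy illustration of that idea (this is not how Doom's BSP trees actually work, just the principle of looking up precomputed visibility every frame; all the names are made up):

```cpp
#include <vector>

void renderRegion(int region) { /* draw every object in this chunk of the level */ }

// For each region of the map, we computed IN ADVANCE which other regions
// could possibly be seen from it, and stored the result with the level.
struct Region {
    std::vector<int> potentiallyVisible; // regions that MIGHT be seen from here
};

// Each frame: instead of rendering the whole level, render only the
// regions the precomputed data says could be visible from where we stand.
void renderVisible(const std::vector<Region>& regions, int playerRegion) {
    renderRegion(playerRegion);
    for (int r : regions[playerRegion].potentiallyVisible)
        renderRegion(r);
}
```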
And as the above example also shows, this method is a lot easier if we have lots of walls obscuring the view. Indoors areas are wonderful to render because you can never see much more than a single room, and since walls don't often move around, we can figure out in advance which parts of the map are visible from where. And most importantly, we can do these calculations *in advance*, when the map is generated, since the terrain and walls are static and never move around. So while rendering, we don't have to generate all this data, we just have to read it from the pre-generated data structure.
Outdoors areas are a lot tougher, and can't be partitioned as easily. There are still methods for it, and they work ok, but they're not as good as the ones for indoor areas. (Which is one reason why games that take place indoors tend to have more detailed scenes. They can be made much more efficiently, so a higher detail level is possible) Same goes for destructible terrain. That screws with our space partitioning, making more things visible than we'd like, so that's a pain to do efficiently as well, and hardly any games have even attempted it.
Using space partitioning, we can discard most of the scene *before* involving the GPU. We will still end up rendering some hidden objects, but at least we've gotten rid of a lot of them. This is done on the CPU, which isn't particularly fast (compared to the GPU), so we only do this if we're sure it's worth it. It almost always is, though, so it's almost always done.
CPU culling
Another, less universal, trick is to perform another pass with the CPU, looking at each individual object and testing whether *any* part of it intersects the viewing volume (the part of the world that you can see). This is done very roughly (because it has to be fast, to avoid hogging expensive CPU time), using some kind of bounding box or sphere to represent the model. Imagine a huge box around an enemy character. Assuming the character keeps all limbs inside the box, we can simply check whether the box comes near the viewing volume. If it does, we *might* be able to see part of the character, so we keep the entire character for rendering. If no part of the box is anywhere near the viewing volume, we can safely throw away the entire model. This further eliminates objects that aren't visible, but at the cost of more CPU time. That's why this isn't used as often; in some games, performing this extra pass would only lower performance. Performance implications: All this is done on the CPU. We're doing a lot of work on the CPU to avoid having to do even more work on the GPU (every object that is sent to the GPU has to be rendered, whether or not it's actually visible). So a slow CPU will give us problems here. On the other hand, the GPU doesn't even enter into it. This part would be unaffected if you were running a Voodoo 2 card, because the GPU isn't used.
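For the curious, the bounding-sphere test might look roughly like this (a sketch: the viewing volume is described as six planes with normals pointing inward, which is the usual representation):

```cpp
// Plane equation: nx*x + ny*y + nz*z + d = 0, normal pointing into the volume.
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float x, y, z, radius; }; // rough stand-in for the whole model

// If the sphere lies entirely behind any one of the six planes, no part
// of the model can possibly be visible, so we can skip it entirely.
bool maybeVisible(const Sphere& s, const Plane frustum[6]) {
    for (int i = 0; i < 6; ++i) {
        float dist = frustum[i].nx * s.x + frustum[i].ny * s.y
                   + frustum[i].nz * s.z + frustum[i].d;
        if (dist < -s.radius)
            return false; // completely outside: throw the model away
    }
    return true; // near or inside the volume: keep it, just in case
}
```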
Now we've gotten rid of a lot (but not all) of the objects that fall outside the currently viewed area of the scene, but the CPU isn't done yet.
Preparing data for the GPU
We still need to perform a lot of other processing to prepare everything for the GPU.
We need to move and rotate every object to their current positions. (Or more specifically, we have to compute how much the object should be moved and rotated. Actually doing this to every single vertex in the mesh is done later, by the GPU)
And we need to do at least part of the animation. We might have a character model, which has a skeleton of, say, 100 bones (this is used to simplify the animation. Instead of the animator having to move every single vertex that makes up the arm, he can just move the arm bone, and the actual mesh will follow). But this means that for each bone in each character, we have to compute how the attached vertices should be moved.
This in itself might be a major operation (think Total War, where you might have 3000 soldiers on screen, and each of them has maybe 15 bones; they have to use such simple skeletons to get any kind of decent performance). That means close to 50,000 bone transformations that have to be computed every frame. And some of these bones depend on the transformation of others (your hand should follow if the arm moves, *in addition* to the hand's own animation).
And of course, we also want the animations to be smooth, which means we might have to blend multiple animations (the character is swinging a sword while running, so we have to mix the run and attack animations), which means that every bone has *two* positions it wants to go to. That is already twice as much work, *plus* the extra work of actually blending them (we may not want just an average: if I've only just started running, we want the run animation to have a little influence, while we mostly use the 'stand still' animation).
And because it looks like crap if we instantly switch animations, we want to schedule it too. (Don't switch from run to walk while my feet are in the air. Wait for them to hit the ground before gradually fading into the walk animation).
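Here's a minimal sketch of just the blending part, reducing a bone's pose to a single rotation (real engines blend positions too, and run the scheduling described above on top of this):

```cpp
#include <cmath>

struct Quat { float x, y, z, w; }; // a bone's rotation as a quaternion

// Blend one bone's pose between two animations. weight=0 gives pose a
// (say, "stand still"), weight=1 gives pose b (say, "run"); ramping the
// weight over time fades one animation into the other.
Quat blendPose(Quat a, Quat b, float weight) {
    // Flip one pose if needed so we interpolate the short way around.
    if (a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w < 0.0f) {
        b.x = -b.x; b.y = -b.y; b.z = -b.z; b.w = -b.w;
    }
    // Normalized lerp: a common, cheap approximation of the "true" blend.
    Quat r { a.x + (b.x - a.x) * weight, a.y + (b.y - a.y) * weight,
             a.z + (b.z - a.z) * weight, a.w + (b.w - a.w) * weight };
    float len = std::sqrt(r.x*r.x + r.y*r.y + r.z*r.z + r.w*r.w);
    r.x /= len; r.y /= len; r.z /= len; r.w /= len;
    return r;
}
```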
All this has to be done by the CPU, which *then* passes everything to the GPU's vertex shader, which, for every single vertex, in every single (animated) model, has to apply the computed transformations.
All in all, animations are a ton of work, for both the CPU and GPU.
Performance implications: Mainly the CPU suffers here. The GPU also has to do a ton of work in the vertex shaders, but 1) these are pretty fast to begin with, and 2) the GPU is usually held back by the fragment (pixel) shaders, which means the vertex shaders may be able to afford this extra work (if they're just waiting for the fragment shaders to catch up anyway, they might as well do something useful in the meantime)
Finally, we're moving to the GPU side.
Vertex Shader
The vertex shader moves all the vertices into position, according to the transformation data computed by the CPU above.
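Conceptually, it boils down to the standard matrix chain below (shown as plain C++ rather than actual shader code, since the math is the same):

```cpp
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };

Vec4 mul(const Mat4& m, const Vec4& p) { // 4x4 matrix times vector
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r.v[row] += m.m[row][col] * p.v[col];
    return r;
}

// model: computed per object by the CPU (position, rotation, animation).
// view and projection: computed once per frame from the camera.
// The GPU runs this for every vertex of every visible model, every frame.
Vec4 transformVertex(const Mat4& model, const Mat4& view,
                     const Mat4& proj, const Vec4& vertex) {
    return mul(proj, mul(view, mul(model, vertex)));
}
```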
GPU culling
Afterwards, the GPU can perform yet another pass of culling. We now know where each polygon has ended up, which means we can discard individual polygons if they're outside the viewing area.
Fragment/pixel Shader
Finally, the fragment (pixel) shader kicks into gear. This has to figure out the color of every pixel on the screen. For each polygon, it runs through every pixel (or fragment, technically speaking), and computes a color based on whatever information we're interested in (for example, the distance and angle to a light source, plus the color of the texture at this location, and any "default" color of the polygon).
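As an illustration, here's the classic diffuse lighting computation of the kind a fragment shader might run for every single pixel, again as plain C++ for readability (it assumes the normal and light direction are already normalized):

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Runs once per fragment: scale the texture color by how directly the
// surface faces the light (both direction vectors assumed unit length).
Vec3 shadeFragment(Vec3 texColor, Vec3 normal, Vec3 dirToLight, Vec3 lightColor) {
    float diffuse = std::max(0.0f, dot(normal, dirToLight)); // facing away => 0
    return { texColor.x * lightColor.x * diffuse,
             texColor.y * lightColor.y * diffuse,
             texColor.z * lightColor.z * diffuse };
}
```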
When we do this, we might decide to output the result directly to the screen, or we might want to save it to a texture. If we do the latter, we can then run the process again (maybe from a different viewing angle and/or using a different shader), and use the previously generated texture. This trick may be used for all of the following:
- Generating shadows (render the scene from the point of view of the light source, to figure out what the light can "see", and then in the following pass, use this info to generate shadows)
- Post-processing (bloom lighting, motion blur and such. Render the scene to a texture, then render that texture using another fragment shader to generate the bloomy highlights, or blend it with the textures generated from the last 3 frames to make motion blur)
- In-game cameras (HL2 has cameras placed around the world, which render the scene from their point of view, and then put the result onto the ingame monitors)
Performance implications: Pain... This can easily bring any GPU to its knees. At 1600x1200, we have millions of pixels on screen. Each pixel might be rendered a dozen times (because some polygons overlap) and some may not even lie inside the visible area at all but still have to be rendered, and *then* we might decide to start all over on a second pass to generate shadows or motion blur! And we want to do this 60 times per second!
Resolution plays a big role here: lower resolution means each polygon consists of fewer pixels, so we don't have to run the shader as many times. (This is also a good way to test whether your performance is bottlenecked by the fragment shaders. Lower the resolution and see if it makes a difference. If it does, fragment shaders are the problem. If it makes no difference, you're held back by the CPU and/or vertex shaders in the previous steps instead.)
So now we've rendered *one* frame. And we can start all over again.
In between frames
Before we render the next frame, the character has probably moved, which means different objects might be visible, which means we have to run through our space partitioning tree to find out which partitions are visible, then do all the culling on CPU and GPU, and generate all the movement/rotation info again, and the animations and everything.
Usually, most of these are smallish changes (we can still see *mostly* the same objects as before, so we don't have to rebuild our render graph; we can just take out a few objects, add a few new ones, and reuse the rest).
But if something significant changes (you're teleported to the other end of the level, or even something apparently small, like turning around to face a doorway, which means entire new areas might become *potentially* visible), we have to make a *lot* of changes and process a ton of new objects. So initially, we have to spend time making large changes to the render graph, and afterwards, we might have a much bigger graph to consider (because more objects are now potentially visible, and have to be at least considered during culling).
So moving around may have a large impact on performance in a game. Not so much because of the movement itself, but because it might change which objects have to be considered during rendering, and which can be ignored completely. This is easily noticeable if you walk out of a building. In most games, the framerate will dip noticeably because suddenly, we can no longer just ignore huge parts of the scene. You might see a short freeze (as the game rebuilds render graph and shuffles around data in general to get everything organized, and maybe it even has to load a few things off the harddrive), and after that, framerates will be lower than before because we now have to process more objects every frame.
So there you have it. A vastly simplified game engine (or at least, the rendering part of it. I haven't even touched on sound, AI, the actual gameplay code and so on)
The point of this was primarily to show that moving around on large scenes *may* have a big impact on performance, and that small scenes are never representative of performance (because as you might have noticed if you read the above, most of the work is centered around trying to throw objects away, which doesn't really matter (and is easier to do) in a small scene)
We also don't have the actual game code, which requires most of the CPU time, which may cause bottlenecks that didn't occur in our test scene with 4 cars and a dozen soldiers on a 5x5m area.
No matter how many amazing graphical effects you cram into this scene (HDR, physics-based animation, soft shadows, radiosity lighting and so on), it is still in no way comparable to a real game. It might run at 500 frames per second, and *still* be too slow for use in actual games. Or it might run at 80 frames per second, and actually be acceptable for a game (because it stresses other hardware parts than the rest of the game does).
It's just not a valid benchmark, and can't be used to show that "this effect is possible in games on this card using this version of DirectX/OpenGL".
3DMark suffers from the same thing to a certain extent. They use bigger, more detailed scenes to try to simulate games more accurately, but they're still missing a fairly essential part... the game. When they have no game running in the background, their results become inherently skewed. A lower 3DMark score might translate into better game performance, and vice versa.
But at least they make an effort to simulate game performance. Tech demos like the ones Nvidia, ATI and the DirectX team make don't even try to do that. They're all about showing off eye candy, and *not* about comparing performance. (Nor do they claim to be valid performance indicators)
Hope someone will find this useful and/or interesting...
Cheers,
Updates: (As people comment, point out weak explanations or ask questions, I'll add stuff here, to make it easier to find)
- .pak files: A lot of games use a few huuuuge files to store all their data (usually with the extension .pak or .wad or .gcf (isn't that what Steam uses?), and some even just use regular .zip files).
The purpose of this is mainly to speed up loading. Instead of having to pick through 300 small files each containing a single sound effect, mesh or texture, it's all bundled into one huge archive which serves as a virtual file system. That is, inside the file, they store a bunch of files, complete with folder paths and everything. This is mainly to avoid fragmentation. If the game has one big 3GB file, any defragger will try to place it contiguously, so that anything we read from the file comes from the same region on the harddrive. If we had 4000 individual files of size 4K to 20MB, they'd easily get scattered all over the disk, and a defragger would only make sure that each individual file is not fragmented, but wouldn't care much about whether all 4000 files are located near each other. (mujtaba, is that what you meant? Otherwise just gimme a yell with a correction)
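The layout is usually little more than the packed file data plus a table of contents. A made-up sketch of the idea (no real pak format looks exactly like this):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Invented layout, just to show the "virtual file system" idea: one big
// archive on disk, plus a table saying where each packed file lives in it.
struct PakEntry {
    std::string path;   // e.g. "sounds/explosion1.wav"
    uint64_t    offset; // where this file's bytes start inside the archive
    uint64_t    size;   // how many bytes it occupies
};

struct PakArchive {
    std::vector<PakEntry> directory; // read once when the game starts
    // To "open" a packed file: look up its entry, seek to entry.offset in
    // the big archive file, and read entry.size bytes. One seek and one
    // linear read, instead of hunting down a file scattered who-knows-where.
};
```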
- BSP trees are generated off-line, before the game is started. That's the clever thing Doom did (Wolfenstein didn't, as far as I know, so it had to settle for far more limited graphics). When you generate the level in a level editor, one of the things it produces is a BSP tree containing all this visibility information, telling the game which areas are visible depending on where the player's camera is.
Games generally still do this. If you look at the output from Source's Hammer level editor, you'll see that one of the things it does when compiling the level is... building BSP trees. And that means games are still stuck with the limitation that the actual map geometry has to be static. You can't destroy walls or make craters in the ground. It's the most efficient way there is to render a big 3d map, but you have to live with your geometry being static. If you want everything to be destructible, this optimization becomes useless, and you have to deal with lower performance. But BSP trees are the only reason Doom was possible so long ago. They're also the reason why FPS games tend to have better graphics than most other genres. They tend to take place indoors, which means we can partition the scene very efficiently and only have to render a tiny area around the player.
-
Very good Jalf
You did extremely well. I think you should just add a small bit about the huge game packs [the ones used to store the mesh/texture data]. -
Well I for one find it interesting. I always wondered how stuff was rendered. Thanks Jalf.
By the way, why did Pitabred edit your post? -
I think it was to clean up typos and whatnot
A long read... Good post though
-
And thanks, all of you, glad you like it. -
I think he means packs as in the .PAK or .WAD files from Doom/Quake, large compressed files.
The only thing you might note is about BSP trees, in which Doom and such pre-calculated the visible objects, rather than current methods of on-the-fly calculation like you noted. That's the only reason that Doom/Wolfenstein3D, etc. could work as fast as they did with the level sizes they did on the hardware of the time. -
Charles P. Jefferies Lead Moderator Super Moderator
I'd love to sticky this wonderful info for you Jalf, but we have too many stickies as it is, and they are all ones I don't want to take down. Do you mind if I link your article from the GPU guide and give you credit for the link?
-
Works for me (and I agree, I hate when there are more than 2 or 3 stickies).
-
Nice post, thanks!
-
usapatriot Notebook Nobel Laureate
Great informational guide!
-
Great info, so does that mean for games like Crysis BSP is not used and performance would be really poor? Or did they use other methods?
-
They surely are using the BSP (because it's a major design issue) but the graphics are so detailed and the effects are so realistic that the GPU gets strangled.
-
Look at the size of a full map, and then compare it to the size of, say, a typical room or corridor in Half-Life 2 for example. That's how much more work there'd be if they didn't use any kind of space partitioning. It'd easily be a 100x slowdown.
But BSP only really works when you have lots of walls everywhere, automatically chopping the scene up into smaller parts. Without them, you have to use other, more complicated tricks, for example chopping the world up into smaller quadrants, and then looking at the position of the player, and the direction he's facing, to determine which quadrants lie partially or completely inside the character's field of view. Of course, you typically still have to render quite a lot because there's nothing blocking your view, but at least it gets rid of all the stuff behind you or to the sides. -
Very nice guide, though, correct me if I'm wrong, haven't model transformations been run on the GPU for years? It's just I remember there being a big fuss about hardware transform and lighting (T&L) being a big advantage of the (now venerable) GeForce 256.
-
Not all of them are run on the GPU though, because the scene node transformation data is needed for the other parts [like physics, AI, etc]
-
Yes, everything from GF256 and up has been able to do this. However, one key difference is that on old cards, that was part of the fixed-function pipeline, which is essentially the opposite of today's shaders.
Imagine an automated production line. You get a polygon (or a vertex making up a third of a polygon) in at one end, and out at the other comes a bunch of pixels.
The problem back then was just that, it was an automated production line. You could pull a few levers and supply it with different data to work with, but you couldn't tell it to do anything *new*.
Sure, you could tell it which matrix to use to transform the polygon into world space, and then which matrix should be used to transform from there into the screen's 2d space, and you could make a few settings like, should fog type A, B or C be used, and what lighting color should be used?
With today's shader-based cards, that production line is scrapped. Instead, you upload a small program that does, well, whatever it likes with the vertex or pixel. Sure, it might opt to just do like in the old days and apply the two matrices to figure out where the polygon is in relation to the screen. But it might also decide to move it a bit further to simulate grass waving in the wind. Or it might choose that the pixel color should also depend on the angle to these three light sources, and that we should then read from the other end of the texture, just because, and maybe finally fade the color a bit. The key is, it's a program written by the graphics programmer, and it can do anything that programmer can think of.
Ok, another answer is that yes, the GPU can certainly apply the transformations, but it can't compute the data used to transform. It can't tell how much the model should be rotated, scaled and moved, but if you tell it those things, it can perform the rotation, scaling and movement. So now the obvious question might be "Ok, so what have we actually saved here, if the CPU still has to compute all those things?"
The nice thing is that these transformations can be reused for an entire model.
If you take a non-animated object like a fighter plane, the entire plane uses the same transformation. The CPU only has to compute *one* rotation and scaling and so on, and then the GPU can perform it for every vertex in every polygon making up the plane.
In an animated character, you can still do the same with each limb, but of course, different limbs might have to be moved and rotated differently to make up the correct animation (the left leg should be moved forward, and the lower leg has to be rotated relative to the knee joint, and so on). What's worse is that these might be dependent on each other. (We can't compute the lower leg's transformation until we know where the upper leg is. And we can't compute that until we know where the hip is, which depends on the torso and so on.) Vertex shaders help a bit there (because we can just upload a bunch of transformations like the rotation for the lower leg *relative to the upper leg*, and then a separate transformation to position the upper leg correctly relative to the hip and so on, and then just ask the GPU to combine them), but there are still an awful lot of different transformations that have to be computed by the CPU.
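A tiny sketch of that dependency chain (names invented; the point is just that a bone's final transform is its parent's transform combined with its own local one, so parents must be computed first):

```cpp
#include <vector>

struct Mat4 { float m[4][4]; };

Mat4 multiply(const Mat4& a, const Mat4& b) { // standard 4x4 matrix product
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

struct Bone {
    int  parent; // index of the parent bone, -1 for the root (e.g. the hip)
    Mat4 local;  // this bone's animated transform RELATIVE to its parent
};

// Assumes bones are ordered so parents come before their children.
// world[i] ends up as the bone's final transform, ready to hand to the GPU.
void computeWorldTransforms(const std::vector<Bone>& bones, std::vector<Mat4>& world) {
    for (int i = 0; i < (int)bones.size(); ++i)
        world[i] = (bones[i].parent < 0)
                 ? bones[i].local
                 : multiply(world[bones[i].parent], bones[i].local);
}
```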
So yes, the GPU can apply the actual transformations to the models (and have been able to do so for ages), but the CPU still has to compute *how* each model (or model part) should be transformed. -
This is great Jalf! Thanks for sharing.
Some thoughts on paks - it is better to have all small files in a big archive (like FEAR does), but then it can be troublesome to unpack, and even worse, it is very prone to damage (many FEAR players suffer and need to reinstall the game). FarCry is an example of a 15,000-file game whose files load individually when they are needed. You can change whatever you need in that game by simply changing the original file. Just mentioning.
And since you are so into this stuff - do you have any thoughts on procedural textures and models, as used in those little 64k and smaller demos? How do you see the future of procedural graphics? Why isn't it mainstream already? What is so hard about it? (I have a few ideas though.)
Have you ever done anything with it?
Cheers,
Ivan -
Ok, before we get started, lemme just say what I understand by procedural content, just to make sure everyone's on the same page.
Procedurally generated content is content (3d models, textures, terrain, characters, or even AI behavior or quests) that is not pre-made by an artist, but is constructed by the program.
It's been used a lot for terrain (because people suck at making random-looking terrain. Humans always want to make patterns, and terrain just doesn't look like that. Also, hand-drawing 5x5 miles of terrain is rather boring) in many games, and it's so common for clouds and other "fluffy" eyecandy effects that we don't even think about it. So it's not new.
There are two main approaches though. The content can be generated procedurally by the game, while it's running, or it can be generated for the devs during development.
The former is what random level generators do, and it's how clouds are often rendered. The downside to this is obviously that the artist has very little control over the result. He has no way to check whether the clouds look ok, because they're going to look different every time you start the game. On the other hand, it's a very powerful technique for some things. As long as you can describe the process of generating the content in question, it can be done procedurally.
Take clouds, for example. Start by placing x cloud textures at random positions in the sky. Then for each millisecond, move them y units in the wind's direction (with some random variation, possibly), and when a cloud disappears off the sky, spawn a new one at the other edge. And now we have clouds that are always going to be different to look at, which is nice. Or in the case of level generators (think Worms, or Transport Tycoon or countless other games), you get a new map every time you play.
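Written out as code, that recipe is about this simple (all the numbers and names here are invented):

```cpp
#include <cstdlib>
#include <vector>

struct Cloud { float x, y; };

// Start: scatter `count` cloud sprites at random positions in the sky.
std::vector<Cloud> spawnClouds(int count, float skyWidth, float skyHeight) {
    std::vector<Cloud> clouds;
    for (int i = 0; i < count; ++i)
        clouds.push_back({ skyWidth  * (std::rand() / (float)RAND_MAX),
                           skyHeight * (std::rand() / (float)RAND_MAX) });
    return clouds;
}

// Every frame: drift each cloud with the wind; when one leaves the sky,
// wrap it around so a "new" cloud appears at the opposite edge.
void updateClouds(std::vector<Cloud>& clouds, float windX, float dt, float skyWidth) {
    for (Cloud& c : clouds) {
        c.x += windX * dt;
        if (c.x > skyWidth) c.x -= skyWidth;
    }
}
```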
Map generation can be a bit tricky though. As long as you can come up with a recipe for how the map should be generated, it can be done on the fly. But you usually want to make sure that the map actually works before you ship the game (the NPCs can find their way around, you don't get stuck or fall into deep pits, and so on), so you might random-generate the map during development, and then just save that particular map (or just the data needed to regenerate that particular map). This way, it's still randomly generated, but it's done during development, so the dev team can look it over and make sure it's good enough.
But there are also things where all this falls short.
Try to describe how an armor texture should be generated. Or the logo for the evil megacorp. Or the cover of a book. All these things can easily be made by an artist in Photoshop, but describing the exact process ends up being so difficult that you might as well just include the damn texture. "So, it needs to have a 4x4 pixel green area near the top right corner, with this exact green shade, and a line going from here to there, gradually fading from this color into that color, and then it needs a picture of an elf swinging a sword and...."
To generate a texture actually showing a detail (as opposed to just random whitish clouds or random greenish grass on the terrain, or random pinkish skin on the NPC, even), you need a fiendishly detailed recipe. So detailed that it'd probably take up as much space as the texture itself. (and would take far more work to create).
That's also why Spore is going for the organic look. Everything looks, well, random-generated or organic, with no actual detail. There are no emblems or logos or pictures. Your creature is basically one color, and you can give it stripes of these other colors, and specify a few rules about how far apart they should be, and then the system can easily generate some plausible-looking stripes. But it wouldn't be able to generate the picture of a racing car, for example, because that requires us to exactly specify each individual pixel. With spots or stripes, we're happy to say "put some yellow stripes on it, starting here, and down to there... No, make them a bit thinner... thinner still, yeah, that's it. And maybe make them a bit lighter down near the tail", and that's all the detail we require.
But if we want something precise, it still has to be generated by hand by a human artist.
The above has been a bit focused on textures, but the same applies for the 3d models themselves.
As Spore shows, it's possible to make sort of organic-looking creatures. But making specifically a cat or a human is virtually impossible. There are so many details we need to get *exactly* right, that in the end, it's easier to just get an artist to make the model and stick it in the game.
And if you look closely at those 64k demos, it quickly becomes obvious how few objects and textures they actually have. In the hugely popular demo released a few weeks ago (what was it, 120kb or something?), there are places where you can see the same texture tiled 5 or 6 times just in the area visible on the screen. The problem is, they can't generate new building textures because there are too many details in those. Instead, they have to recycle the same texture over and over. It still looks cool, but their shortcomings are also plainly visible.
And another everyday example: Shaders. Shaders are used in pretty much every game today, and they're really nothing more than procedural graphics. They allow the game to take a texture or a 3d model made by an artist before the game shipped, and then modify it procedurally... We're in the future? Fade the colors on the photo and give it a yellow tint to make it look like old parchment.
Or an even more trivial example: The sun went down. We'll just render without this light source, making everything look darker. Yes, this sounds simple, but it's really procedural graphics at its finest. The devs never had to make a "daylight texture" and a "nighttime texture" for the character. Or for that matter, a "light is coming from the left" or "light is coming from behind" texture.
They just have to make one base texture, and then define the appropriate shader, and this gets done dynamically by the GPU. The developers save a lot of time, and we save disk space for storing an awful lot of textures. And of course, we get nice smooth transitions between darkness and light.
In short, procedural content allows you to mix and blend existing content, but it doesn't really let you do anything *new* (Diablo is a good example here. You looted endless combinations of the same items, some rare, some unique, some ordinary and useless, with different modifiers and so on, but in the end, no matter how long you play, you'd never find anything other than combinations of the parts put in by the developers. Yes, that might sound obvious, but the same applies to procedural content in general. Take Spore again, you can modify existing body shapes, you can pick between x different types of feet, and modify each of them, but you're not going to see a wheeled creature, unless the developers put it in)
And for the same reason, you might see a photo gradually fading with age, but you're never going to see new people appear on the photo unless the devs added the necessary images.
So to summarize, I guess you could say that it's an obvious tool for simulations (Spore, various Tycoon games, Elite, old as well as new games have used procedural content for large parts of the game), but less so for story- or detail-heavy games.
The more control the developers need (over visuals, or the world or the story or anything else), the more they have to produce all the content themselves, manually.
Procedural gameplay has been with us literally since the first computer games. (random-generated mazes, or Elite's procedural galaxy), and within the last 5 years or so, procedural graphics has really caught on as well, in the form of shaders.
And as GPUs get more and more powerful, and get more and more features, developers can do even more clever tricks there. But it still can't solve every problem. We'll always need artists to do all the precision work, the bits that need to look exactly how the developer says, by hand.
If I have to say one thing about the future impact of procedural content, it'll have to be about the size of games.
Quite simply, I don't think games are going to get much bigger. Why would they need to? In the past, we've had to make endlessly duplicated data. In the 2d days, you needed individual sprites for each character for each frame in each animation, seen from each possible angle.
With 3d, you make one model, and it can be rotated freely. For animation, you define the movement on one skeleton, which can be used by any number of models, so you only need to store *one* walk animation, even though you have dozens of different characters using it.
In the early 3d days, you had to duplicate textures and models too. To get decent graphics, you had to use a lot of textures blended together. This might more or less be used to simulate lighting, for example.
Today, you can write a shader that makes much better lighting, without having to store all those extra textures. Same goes for shadows. We no longer need to store them as textures, because we can compute them on the fly (ok, so that's not entirely true. Shadows are still expensive to compute, so we often settle for the old solution using textures and pregenerated data which takes up more space).
Or that nifty transformation when the happy lush green kingdom gets burned to the ground? A lot of that can be achieved just by swapping out a few shaders, where in the old days, you'd have had to basically make an entirely new scene.
The point is, we're approaching a point where the only stuff we have to store on the game DVD is the *unique* content. All the specific details that can be used to construct everything else.
And when we get there, why would we need more space for our games? That would only be necessary if we started using more unique content, but that'd translate into bigger, longer games, and that would be more expensive to make. Think back on the games of the last 8 years or so. Are games today noticeably *bigger* in terms of the amount of unique content? (So Oblivion's endlessly cloned NPCs don't count here.)
Do you see more *unique* buildings, are there more special NPCs who look different from all others, have unique dialogue and so on?
I don't think so. It seems to me that this has been pretty much constant for years now. There are far more combinations of these unique building blocks today, yes (Oblivion is a great example), but they're all derived from a basically constant amount of content.
And within a couple of years, only these basic building blocks will need to be stored on the game DVD, because everything else can be made much better on the fly (think realtime lighting or shadows)
Sure, these unique "building blocks" could then get more detailed, but really, would there be a point?
Do we need a higher polycount? We can blend and adjust and smooth things out procedurally, so even a 2000 poly model today actually looks pretty smooth and highly detailed. Do we need higher resolution on our textures? Well, do they look blocky to you? They're already very detailed, and again, we can procedurally blend them with other textures or apply various shader effects to smooth things out or highlight things (like bloom lighting does)
-
Hey Jalf - that is a long answer - thanks.
I understand procedural stuff has been with us for a long time. Yes, Elite, and Frontier actually, came on a single floppy with a huge universe waiting to be explored. And thanks for clarifying the shader thing, since I believe shaders actually can make handmade textures age or change over time while you play the game. Worlds can change depending on your actions (for example, the kitchen tiles get dirtier every time you walk on them).
And you are right about the demos - although they look amazing at first, they seem to be repetitive both in models and textures (well, they have packed a lot into those 64 kb anyway).
And the size of the games - I think that 90% of the size today is actual textures and models. I saw some company that has developed a software tool that is able to change actual bitmap textures into procedural ones with a high level of detail and quality. The result is almost the same level of artistic detail, but at only a fraction of the previous size. I can't find that page on Google now, but I remember clips of a wooden fence and a cardboard box that is aging, and I think an image of a bathroom completely done with this tool. Well, it looked pretty real to me.
And I guess games are recycling the old stuff - looking better, but not advancing too much. I was hoping to see physics processors in computers - but I guess we will have to wait a bit more. Do you remember the game where you have to build a bridge made of different types of material? Pontiflex or something - that was a great little game based on simple physical rules.
Anyway - thanks again for the great thread,
Ivan -
AlexOnFyre Needs to get back to work NBR Reviewer
Nice guide, I like the premise, but:
In terms of performance comparison, I would love to see a chart [or list] with specs and how they affect rendering (I know most of them, but you probably know more):
fillrate (MT/s)
Bus Width/memory interface (in bits)
Dedicated VRAM (in MB)
Core Clock (in MHz)
You covered RAM and the CPU, but not the relationship of the FSB, memory timings, and clock rate.
Also the scalability of these units (i.e. a 7200 RPM HDD almost always correlates to an almost 30 percent increase in performance over 5400 [when it is the bottleneck], whereas 512 VRAM usually only correlates to an increase of 20-30 percent over 256, though this does vary quite a lot due to architecture.)
Perhaps we need more of a computer engineer than a computer scientist for that though? I dunno, but good job so far! -
I think I can help Jalf here
Fill rate: The rate at which the GPU sends pixels to video memory. (Crucial for high resolutions, and I think also some framebuffer effects like AA.)
Bus width/memory interface (in bits): Helps mostly with textures.
Core clock (in MHz): Affects the overall capabilities, including fill rate and the speed at which it performs different operations.
Dedicated VRAM (in MB): Determines the size and detail of the textures, and also the resolution.
The FSB is the gate that connects the CPU to the outside world. Basically (for Intel CPUs, which don't have an integrated memory controller), the bandwidth of the FSB should cover the PCI-Express lanes, the memory bandwidth and PCI, basically all the stuff.
Memory timing is the time it takes to find a particular location in memory. The clock rate indicates the transfer speed (after finding that memory location). -
If there's enough interest, I'd be happy to write such an article as soon as I have a bit of time though.
We might think of shaders as "just the new graphics buzzword", but they really *are* a great example of procedural graphics, and they can certainly do the things you mention. That's why they're more than just "the next buzzword". -
Hey, a lot of excellent, technical information on the graphics pipeline here. Bump!
-
Thanks Jalf!
What happens when, say, your computer is rendering at 42.3 FPS and your monitor refresh rate is 100Hz? (Maybe a desktop question)
Does the monitor refresh using the last completed frame? And does that cause any complications?
And if your computer can go 130 FPS, a few frames are wasted?
That depends on whether you have vsync enabled. If you do, the game will only render at the refresh rate divided by a whole number (100, 50, 33.3... on a 100Hz monitor), because it waits for the monitor's vertical sync (that is, on a CRT screen, when the electron gun has reached the bottom of the screen, and begins moving up to the top again), so it will only render when the monitor is actually ready to start on a new frame.
With it disabled, you might end up rendering at 42.3 FPS or some other weird number.
Basically, what the monitor does, is read one pixel at a time from the video ram, and put that on the screen. It starts from the top left corner, and works its way through the entire set of pixel values, pixel by pixel, row by row. Without vsync, this table it's reading from might suddenly be switched while the monitor is, say, halfway through it.
In that case, it does the straightforward thing, which is to continue reading.
So in that case, the upper part of the screen will be rendered from one frame, and the lower part will be rendered using the following one.
That's why you may get tearing with vsync disabled. The frame contents might literally change when only half of it has been put on the screen.
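In loop form, the difference is roughly this (a sketch, not any particular API; the helper functions are stand-ins for whatever the real graphics driver provides):

```cpp
void renderFrame()          { /* draw the scene into the back buffer */ }
void waitForVerticalBlank() { /* block until the monitor finishes its refresh */ }
void swapBuffers()          { /* make the newly drawn frame the visible one */ }

void loopWithVsync() {
    for (;;) {
        renderFrame();
        waitForVerticalBlank(); // swap only between refreshes: no tearing, but
        swapBuffers();          // the framerate snaps to refresh/1, /2, /3...
    }
}

void loopWithoutVsync() {
    for (;;) {
        renderFrame();
        swapBuffers(); // may swap while the monitor is mid-screen: possible
                       // tearing, but any framerate is possible, like 42.3 FPS
    }
}
```
-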
Very fascinating thread. I was just recently trying to figure out what information the graphics card actually received from the CPU. Let me confirm I understood this correctly, please correct me if I am wrong. Basically the CPU decides what the camera is viewing and not viewing in terms of basic skeletal structure and orientation. This information is then sent to the GPU, which adds all the vertices attached to the skeleton, stretches a texture over it, makes changes to the texture relative to shading and lighting, then converts it to a signal the monitor can read. Is this basically correct?
Is this also the reason collision boxes are relatively larger and more primitive than the actual model, as the CPU is chiefly concerned with rough positions rather than the finer model details which the GPU handles?
Thanks again for the thread, this is so much more interesting than the English paper I should currently be working on instead. -
Yep, that sounds about right.
True about collision boxes too. Collisions have to be computed on the CPU, and the CPU isn't fast enough to deal with fully detailed models, so it usually uses some kind of simpler version of the models (sometimes simple collision boxes or spheres, sometimes just a low-poly version of the model) -
Great read! Thanks for all the info
-
stevenxowens792 Notebook Virtuoso
Jalf, awesome thread. Thanks so much for the information... Now, I am not a programmer, but I have worked in IT for over 10 years... Here is my question. Considering Vista, or maybe even XP, do the games themselves load up within the operating system swap or cache file, or do they normally load into memory outside of the operating system? My thoughts are to continue to identify ways to streamline Vista for playing games. So if it is beneficial to limit memory utilization so that games can utilize it, then fine. If it is better to maximize memory if the games run through the Vista page file, that is fine too. I understand that each game will be different. Some may utilize each or both, but I am trying to get a better understanding of what allows for better FPS once a game is running. I am not interested in load times; once a map is loaded, it is loaded.
Again thanks so much for your post... Very informative.
Best Wishes,
SXO792 -
Hope it answers your question.
If it isn't clear (I wrote it over 5 hours, with a concert and a few beers in between), just ask for clarification. -
Great thread Jalf.. Thanks for all the info & consistent replies
PS. Thanks Chaz for linking me here -
Glad you liked it.