Okay, I suppose the previous post could be interpreted to mean "Don't worry about allocation." Not what I meant, of course. Why am I always posting and then backpedaling? It's agile blogging, I guess - get something onscreen, and then iterate to increase value.
Okay, for fuck's sake, yes, worry. (Hmm...when my daughter's old enough to read am I going to have to go through all my blog entries and take the swear words out?) Have a plan. But that plan should involve dynamic allocation on preferably just one heap - and (did I not mention this before? Sorry.) a way to track those allocations. Which you don't need to implement right off the bat--that's what I meant by being agile in the previous post--but you will need to implement tracking before you ship.
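To give a rough sense of what I mean by tracking - this is just a bare-bones sketch with made-up names (TrackedAlloc and friends aren't from any shipped codebase): every allocation gets a tag and a size stamped in front of it, so at any point you can dump how much memory each system is eating.

```cpp
// Minimal allocation-tracking sketch (hypothetical names, not production code).
// Each allocation records a tag and size so you can dump a per-tag report.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>

struct AllocHeader {
    size_t size;
    char   tag[16];
};

static std::map<std::string, size_t> g_bytesByTag;  // running totals per tag

void* TrackedAlloc(size_t size, const char* tag) {
    AllocHeader* header = (AllocHeader*)malloc(sizeof(AllocHeader) + size);
    if (!header) return 0;
    header->size = size;
    strncpy(header->tag, tag, sizeof(header->tag) - 1);
    header->tag[sizeof(header->tag) - 1] = '\0';
    g_bytesByTag[header->tag] += size;
    return header + 1;                               // hand back memory just past the header
}

void TrackedFree(void* p) {
    if (!p) return;
    AllocHeader* header = (AllocHeader*)p - 1;
    g_bytesByTag[header->tag] -= header->size;
    free(header);
}

void DumpAllocations() {
    for (std::map<std::string, size_t>::iterator it = g_bytesByTag.begin();
         it != g_bytesByTag.end(); ++it) {
        printf("%-16s %8u bytes\n", it->first.c_str(), (unsigned)it->second);
    }
}
```

A real version worries about alignment, thread safety, and hooking operator new, but the per-tag report is the part you'll actually miss if you never build it.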
And your plan will also probably have some kind of a budget for how much memory you're going to spend where. For example, you might do some kind of streaming city game, where each block that gets loaded from disk can take this much K and gets placed in its own fixed area of memory, and all the remaining memory is dynamically allocated on a single heap - but you're predicting around this many K for textures and that many K for whatever.
Those individual blocks, by the way, aren't fixed pools of textures or meshes - within each block you give the artist the flexibility to decide whether they want to use mostly textures, mostly meshes, or what.
Although fixed pools are nice for cache coherency, they suck for the exact thing the human race invented heaps for - being flexible about how we use our memory. Just as an example, you may have one level or mission that uses a whole lot of one kind of asset, and another level or mission that uses a whole lot of another. Maybe you have flying missions and interior missions, for example. With fixed pools, you're screwed - your levels/missions will have a fixed sameness in their asset allocation. You'll only be able to have n clouds for your sky mission and m walls for your interior mission - even though you have no walls in your sky mission and no clouds in your interior mission.
We had a system like that on one game I worked on, and if we blew out one of the budgets for one of the many fixed pools, the smart thing to do was to double the size of the pool and get some breathing room. Which meant, on that particular project, on average 25% of our allocated memory was going unused. You can then do a production step where you run the game, find the high-water marks for all your assets, and tune the fixed-pool allocations to allocate just enough. Which isn't a fun thing to do every build.
So...what about cache coherency? It is all about cache coherency these days, isn't it? Or was that last year? Still, I'd say that hamstringing yourself with fixed pools is a premature speed optimization - if you discover you're thrashing the cache in some tight loop or other, you can make yourself a flexible--not fixed--pool with vector<> (or realloc() if you're anti-STL). (But then vector<> has the same 25% wasted-memory problem unless you do a similar high-water-mark production step. So only use them for the bottleneck stuff!)
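For the curious, here's roughly what I mean by a flexible pool - a sketch only, with invented names, handing out indices instead of pointers so the vector can grow without invalidating anything:

```cpp
// Sketch of a growable ("flexible, not fixed") pool built on std::vector.
// Objects are referred to by index rather than pointer, so the pool can
// grow without invalidating anything, and it records a high-water mark
// you can feed back into Reserve() later. Invented names, illustration only.
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T>
class FlexiblePool {
public:
    FlexiblePool() : m_highWater(0) {}

    size_t Acquire() {
        if (m_free.empty()) {
            m_items.push_back(T());                           // grow on demand
            m_highWater = std::max(m_highWater, m_items.size());
            return m_items.size() - 1;
        }
        size_t index = m_free.back();                         // reuse a freed slot
        m_free.pop_back();
        return index;
    }
    void   Release(size_t index)   { m_free.push_back(index); }
    T&     Get(size_t index)       { return m_items[index]; }
    size_t HighWaterMark() const   { return m_highWater; }
    void   Reserve(size_t count)   { m_items.reserve(count); }  // from a tuned budget

private:
    std::vector<T>      m_items;   // the objects themselves
    std::vector<size_t> m_free;    // indices of released slots
    size_t              m_highWater;
};
```

The high-water mark is what you'd feed back into Reserve() during that end-of-project tuning pass, instead of hand-tuning every build.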
So - in short - dynamic allocation good, for flexibility and maximum use of available resources.
Also - I'm not saying leave your allocation woes to the end of the project. Tackle them as soon as they become a problem, which will probably be as soon as you have a playable build on the console. If you're doing some dynamic allocation using the built-in Xbox allocator, you'll probably find the game can only soak for n minutes before you get a fragmentation crash. And you'll probably also find it doesn't run at 60 yet. Now's the time to solve that - optimization is no longer premature.
So I'm really talking about what order you do things in. The guys on the panel at the conference-formerly-known-as-Xfest were saying build your allocation system first to be cache friendly and to avoid fragmentation, and then get your game to a playable state. I'm saying get your game to a playable state, then worry about whether it fits on the console, fragments the heap, and runs at 60. I say that because memory fragmentation and cache coherency are fairly well-understood problems, but "the game isn't fun" isn't a well-understood problem at all.
Oh, and I should have known Andrei would post a comment to the last thing, since he was the main guy taking the team's haphazardly allocated & STL'd code and making it run fast. But he only had to do that with the stuff that turned out to be a bottleneck - if we had rules in place that said 'never use new' and 'never use STL', I believe the whole team's productivity would have suffered - it would have been a net loss - not to mention we might not have gotten our proof-of-concepts done on time, and who knows what cool features might have been cancelled? Back to levels? No swinging system? The Spider-Man 2 game sold somewhat better than one would expect based on movie ticket sales - I can't prove it, but I believe that's because it was a lot more fun than Spider-Man 1 - so, by being agile and focusing on gameplay/productivity over efficient code, I believe we created business value. And, hey, we did get it to run at 30 most of the time. (Thanks to Andrei's - and others' - hard work.)
"Hmm...when my daughter's old enough to read am I going to have to go through all my blog entries and take the swear words out?"
I'm sure refactoring is a part of agile blogging ;-)
Posted by: anoneemouse | August 23, 2006 at 06:00 AM
If your memory access patterns are predictable, you can still get some good cache performance by judicious use of prefetching even with dynamic allocation.
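(Roughly what that looks like in practice - a sketch assuming GCC/Clang's __builtin_prefetch; the consoles of the era had their own prefetch intrinsics, and Entity/UpdateAll are invented names:)

```cpp
// Rough illustration: when you know the traversal order ahead of time, you
// can prefetch heap objects a few iterations early, even if they were
// dynamically allocated all over the place.
struct Entity {
    float x, y, z;
    void Update() { x += 1.0f; }   // stand-in for real per-frame work
};

void UpdateAll(Entity** entities, int count) {
    const int kLookahead = 4;      // tune against your cache-miss latency
    for (int i = 0; i < count; ++i) {
        if (i + kLookahead < count)
            __builtin_prefetch(entities[i + kLookahead]);  // warm the line early
        entities[i]->Update();
    }
}
```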
Posted by: Hector Yee | August 23, 2006 at 11:33 AM
Pretty much how I feel about things too, Jamie. Start off with a solid general purpose allocator with some kind of binning/block system to reduce fragmentation for small objects. Make sure you build in some decent tracking and debugging facilities too.
Special-purpose allocators are wasteful in my opinion. We saved a lot of memory on USM by consolidating most of the special-purpose pools into one single small memory allocator.
If the maximum working set and the object size are small enough, then you might get some cache performance improvements by using a smaller heap that maintains locality.
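(A toy sketch of the binning idea described above - invented names, and a real small-block allocator also handles alignment, recycling whole chunks, and thread safety: small requests round up to a size class and come off a per-bin free list carved out of bigger chunks, so they never fragment the main heap.)

```cpp
// Toy sketch of size-class binning: small requests round up to a bin size
// and are served from a per-bin free list, refilled by carving larger
// chunks off the main heap.
#include <cstddef>
#include <cstdlib>

static const size_t kBinSizes[] = { 16, 32, 64, 128, 256 };
static const int    kNumBins    = sizeof(kBinSizes) / sizeof(kBinSizes[0]);
static const size_t kChunkSize  = 16 * 1024;        // carve bins out of 16K chunks

struct FreeNode { FreeNode* next; };
static FreeNode* g_bins[kNumBins];                   // one free list per size class

static int BinFor(size_t size) {
    for (int i = 0; i < kNumBins; ++i)
        if (size <= kBinSizes[i]) return i;
    return -1;                                       // too big: fall through to malloc
}

void* SmallAlloc(size_t size) {
    int bin = BinFor(size);
    if (bin < 0) return malloc(size);
    if (!g_bins[bin]) {                              // bin empty: refill from a chunk
        char* chunk = (char*)malloc(kChunkSize);
        if (!chunk) return 0;
        for (size_t off = 0; off + kBinSizes[bin] <= kChunkSize; off += kBinSizes[bin]) {
            FreeNode* node = (FreeNode*)(chunk + off);
            node->next = g_bins[bin];
            g_bins[bin] = node;
        }
    }
    FreeNode* node = g_bins[bin];                    // pop
    g_bins[bin] = node->next;
    return node;
}

void SmallFree(void* p, size_t size) {               // caller passes the size back
    int bin = BinFor(size);
    if (bin < 0) { free(p); return; }
    FreeNode* node = (FreeNode*)p;                   // push
    node->next = g_bins[bin];
    g_bins[bin] = node;
}
```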
Posted by: Martin Donlon | August 24, 2006 at 06:43 PM
Don't forget that malloc/free/new/delete is not the only allocation model that exists. The Jak and Daxter games used a very different memory model for game objects - the gist was that a) the children of one class (Process) could make a big contiguous land-grab when created, b) heap allocations were verboten everywhere else, and c) relocations were permitted. The system would memcpy a Process somewhere new, then call a virtual function to have it fix up its pointers.
Obviously, dealing with those relocations is a project-long headache, but in exchange you get perfect defragmentation (compaction, really) that's trivial to implement.
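(A bare-bones sketch of that scheme - hypothetical names, nothing from the actual Jak codebase: the manager memcpy's an object to its new address, then calls a virtual hook so the object patches its own internal pointers by the move delta.)

```cpp
// Bare-bones sketch of relocatable game objects. Hypothetical names; real
// compaction also has to patch external references INTO the object, and
// memcpy'ing a polymorphic object leans on the usual game-engine assumption
// that the vtable pointer travels with the bits.
#include <cstddef>
#include <cstring>

class Process {
public:
    Process(size_t size) : m_size(size) {}
    virtual ~Process() {}
    virtual void Relocate(ptrdiff_t delta) = 0;  // called after being moved by delta bytes
    size_t m_size;                               // how many bytes to memcpy
};

class Particle : public Process {
public:
    Particle() : Process(sizeof(Particle)) { m_position = m_data; }

    virtual void Relocate(ptrdiff_t delta) {
        // Any pointer into our own block shifts by the same delta.
        m_position = (float*)((char*)m_position + delta);
    }

    float* m_position;   // points at our own trailing data
    float  m_data[3];
};

// Compaction step: slide a live object down to 'dest', then let it patch itself.
Process* MoveProcess(Process* p, void* dest) {
    ptrdiff_t delta = (char*)dest - (char*)p;
    memcpy(dest, p, p->m_size);
    Process* moved = (Process*)dest;
    moved->Relocate(delta);
    return moved;
}
```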
Of course, you can get the dynamic allocations in your code way down anyway with a little skill at using the stack (which can, in turn, be acquired with some practice in a good functional language like OCaml).
Posted by: Charlie | August 24, 2006 at 11:56 PM
Obviously it's a speed/memory trade-off. Fixed pool allocation and release are both two instructions. That's why they're frequently used in operating system kernels (in conjunction with new-block allocation on page fault, which gets rid of the if statement for multiblock pools). Also, FYI, fixed pools aren't guaranteed to buy you any cache performance in the long run.

Anyhow, even Martin's lightweight slab allocator was taking up 2% of the frame on PS2 on USM at some point. You just can't do that in any performance-critical code. Any kind of general purpose allocator will be at least 10 times as slow as a fixed pool, a stack, or no allocation at all. If basic performance intuition tells you that the code is likely to become a bottleneck, then chances are you'll have to go back and look for ways to either get rid of the allocation entirely or make it faster. With per-allocation access locking in multithreaded environments it's going to become even more of a problem.

Re: tuning the pool sizes - fixed pools are multiblock allocators, so you don't have to force an upper cap on those. Whenever I explicitly disabled multiblock pool expansion on Spider-Man projects in the past, it was to surface an indication of a problem (object count abnormally high, etc.). There's a reason all these different allocation models exist: the heap model doesn't work for everyone in all cases. Educating yourself about different allocation models just gives you more options to choose from in any given situation. But of course, whoever ends up doing things is the person who makes the final decisions, with whatever preferences/knowledge they have about the problem domain. Peace.
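(For reference, the two-instruction claim looks like this in a toy fixed pool - invented names, no multiblock expansion, empty-pool check, or locking: Alloc is a free-list pop and Free is a push.)

```cpp
// Toy fixed pool showing the "two instructions" point. Illustration only.
#include <cstddef>

struct PoolNode { PoolNode* next; };

struct FixedPool {
    PoolNode* head;

    // blockSize must be at least sizeof(PoolNode).
    void Init(void* buffer, size_t blockSize, size_t blockCount) {
        head = 0;
        char* p = (char*)buffer;
        for (size_t i = 0; i < blockCount; ++i)
            Free(p + i * blockSize);        // thread every block onto the free list
    }
    void* Alloc() {                         // pop - the hot path
        PoolNode* node = head;
        head = node->next;
        return node;
    }
    void Free(void* p) {                    // push
        PoolNode* node = (PoolNode*)p;
        node->next = head;
        head = node;
    }
};
```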
Posted by: alienspy77 | August 25, 2006 at 03:20 PM
The best dynamic memory allocation is none at all. To avoid allocation, aim to model systems that have no persistent state. If there is no state to save, there is no reason to allocate memory.
Stateless systems require considerably more thought, but can pay off hugely in terms of lower overall complexity and raw performance.
Posted by: Bryan McNett | August 27, 2006 at 10:58 PM
I am not familiar with the stateless system model, but from its description it's obvious that it's not possible for any non-trivial application. There is always state and state sharing. The best one can hope for is to minimize it. A function with only local variables is stateless. Using such functions you can write a math library, but even a physics library will have problems, because the computations are very non-trivial and require caching inputs for good performance.
Without state you could never write an efficient occlusion culling algorithm, because the whole point of it is to precompute something that does not change and use it later.
I may have misunderstood Bryan McNett's comment "To avoid allocation, aim to model systems that have no persistent state", but to me it means: "Get out of game programming and start writing simple math libraries".
Posted by: Jason Smith | September 15, 2006 at 08:40 AM