Friday, March 26, 2010

Game Logic IX : Generic Optimisations

While thinking about how to do general optimisations on this data-oriented stuff, like good sorted views on tables and fast set-existence checks, it struck me that I've been using a non-standard container for a very long time that helps with both of these operations.

Judy arrays have been around a long time, and I remember that when I first found them I thought they were magical and cool. Now they're looking even more useful, as they offer a very fast sorted insert operation. They can also be optimised to return early from existence checks. I'm not sure I'd like to try to make a completely branchless version, but there are many places where the version of Judy arrays that I implemented could be made branch-free without any degradation of performance on out-of-order processors. Judy arrays were designed to mostly allocate 16-byte lumps, or regular multiples of them. That on its own is probably worth caring about for the sake of the cache-line size.
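For the curious, here's roughly what using one looks like through the stock libJudy macro interface (Judy1 is the bitset flavour; this assumes the standard Judy.h, not my own implementation):

    #include <Judy.h>

    void judyExample(void)
    {
        Pvoid_t set = (Pvoid_t)NULL;   // an empty Judy1 array
        int     rc;

        J1S(rc, set, 1234);            // sorted insert: rc is 1 if newly set
        J1T(rc, set, 1234);            // existence check: rc is 1 if present
        J1U(rc, set, 1234);            // delete: rc is 1 if it was there

        Word_t bytesFreed;
        J1FA(bytesFreed, set);         // free the whole array
    }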

Judy arrays are really worth reading up on. So go do it.

The other thing I thought of when talking about existence-based behaviour was that sometimes you want to be able to check whether an implicit entity exists through its components. This is quite easy on a sorted container, but what about an unsorted one? Do you have to run through all the elements to find out whether a certain one exists? Well, no, not all the time. A certain Mr Bloom invented a very clever use of a bit field and a hash function. The Bloom filter lets you check whether an entry exists. It sometimes returns false positives, but the operation is so quick and uses so little memory that it's a really good trade-off to run it first, then verify afterwards.
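Here's a minimal sketch of the idea, with two made-up integer mixers standing in for the k hash functions; a real use would tune the bit count and the number of hashes to the expected entry count:

    #include <cstdint>
    #include <cstring>

    // A minimal Bloom filter: k hash functions set/test k bits in a fixed
    // bit field. False positives are possible; false negatives are not.
    struct BloomFilter
    {
        static const size_t kBits = 1 << 16;          // 64K bits = 8KB
        uint32_t bits[kBits / 32];

        BloomFilter() { memset(bits, 0, sizeof(bits)); }

        // Two cheap integer mixers; any decent hashes will do.
        static uint32_t hashA(uint32_t x) { x ^= x >> 16; x *= 0x85ebca6bu; x ^= x >> 13; return x; }
        static uint32_t hashB(uint32_t x) { x *= 0xc2b2ae35u; x ^= x >> 15; return x; }

        void insert(uint32_t key)
        {
            uint32_t a = hashA(key) % kBits, b = hashB(key) % kBits;
            bits[a / 32] |= 1u << (a % 32);
            bits[b / 32] |= 1u << (b % 32);
        }

        // true means "maybe present" (go verify against the real table);
        // false means "definitely absent", so you can return early.
        bool mightContain(uint32_t key) const
        {
            uint32_t a = hashA(key) % kBits, b = hashB(key) % kBits;
            return ((bits[a / 32] >> (a % 32)) & 1) && ((bits[b / 32] >> (b % 32)) & 1);
        }
    };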

Judy Arrays
Bloom Filters

And so you now have some generic optimisations for your row-oriented development.

Monday, March 22, 2010

Branchless Development

So, onto how we get around the problem of dynamic stuff in row-oriented development. One of the things that worried me for a while was the possibility of row-oriented being a memory thrasher, but then, as homogeneous data is notoriously easy to pool, all I needed for a slick, quick row allocator was a RISC-processor-friendly, branch-free pool container.

Branchless Containers

Along the way I came up with some other stuff too: a stack, a queue, a linked list and a couple of helper functions. Mail me if you want my source in its single-header-file incarnation.
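As a taster, here's a minimal sketch of the branch-free pool idea (illustrative only, not the single-header source mentioned above): homogeneous rows let you thread a free list through the unused slots, so alloc and free are a pointer swap each, with no branches, assuming the caller guarantees the pool never runs dry and that Row is a plain old data type:

    #include <cstddef>

    // A fixed-capacity pool of homogeneous rows with branch-free alloc/free.
    // Assumes Row is POD and the caller never allocates past capacity,
    // so no "is full" test is needed.
    template <typename Row, size_t Capacity>
    struct RowPool
    {
        union Slot { Row row; Slot* next; };
        Slot  slots[Capacity];
        Slot* head;

        RowPool()
        {
            // Thread every slot onto the free list.
            for (size_t i = 0; i < Capacity - 1; ++i)
                slots[i].next = &slots[i + 1];
            slots[Capacity - 1].next = 0;
            head = &slots[0];
        }

        Row* alloc()            // pop: no branches
        {
            Slot* s = head;
            head = s->next;
            return &s->row;
        }

        void free(Row* r)       // push: no branches
        {
            Slot* s = reinterpret_cast<Slot*>(r);
            s->next = head;
            head = s;
        }
    };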

Sunday, March 21, 2010

How real people do it.

GDC presentation by the makers of PixelJunk:Shooter

MakingTheShooter

Ties in quite nicely with the other article.

Friday, March 19, 2010

Game Logic VIII : optimising

When optimising code from an OO base, one of the problems I face more often than not is the tendency for the code to do just-in-time value calculation. This seemed really cool when I started out in coding, and even a year ago I would have advocated lazy evaluation whenever possible, but now I'm not so sure. The issue with lazy evaluation is the inherent additional state data about the value, and the I-cache miss for when the value does need evaluating.

Pushing your data around in rows and transforming it from one state to another, you'll never hit the dreaded I-cache fail that lazy evaluation necessitates, but you can still have lazy evaluation. Huh?

Simple: it's existence-based processing again. One of the input tables you need for a process can be a dependency table, and this dependency table is your lazy-evaluator friend. All it needs to do is check whether a value already exists, and if it doesn't, add it to the table of values to calculate. Process that table, and voila, you have your lazy values without having to actually break away from your main thread, because your main thread doesn't do too much before it demands the values.
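A sketch of the shape of that, with all the names made up for illustration:

    #include <vector>
    #include <cstdint>
    #include <cmath>

    // Instead of evaluating a missing value at the point of use (a branch
    // plus a likely I-cache miss into cold evaluation code), gather the
    // missing rows into a dependency table, evaluate that table in one
    // tight batch, then run the main transform knowing every value exists.
    struct ValueRow { uint32_t id; float value; bool valid; };

    float expensiveEval(uint32_t id)    // stand-in for the real calculation
    {
        return std::sqrt(static_cast<float>(id) * 3.14159f);
    }

    void batchLazyEval(std::vector<ValueRow>& table)
    {
        std::vector<uint32_t> toCalculate;   // the dependency table

        // Pass 1: collect rows whose value doesn't exist yet.
        for (size_t i = 0; i < table.size(); ++i)
            if (!table[i].valid)
                toCalculate.push_back(static_cast<uint32_t>(i));

        // Pass 2: evaluate only those rows; the eval code stays hot in I-cache.
        for (size_t i = 0; i < toCalculate.size(); ++i)
        {
            ValueRow& row = table[toCalculate[i]];
            row.value = expensiveEval(row.id);
            row.valid = true;
        }
        // Pass 3 (elsewhere): the main transform reads table[].value unconditionally.
    }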

Also, optimising is quite a bit easier because you know what to optimise. How often have you looked at some code and had an "aha" moment, because you've noticed that the real bottleneck was something daft like "if( gDoDebugRenderingOfThisOneThing )", which is not a const bool, so it doesn't compile out of your release build? For me, often enough to say that debug code and runtime code (processing of the game state vs processing of views on the data) should be kept separate.

Optimising processes or transforms is also a lot easier than optimising great spiderweb like connected runtime dynamic function messes that I've found in scene-graphs, game-logic and AI code paths. Humans can reason about single transforms a lot easier than they can reason about complex nested dependencies.

Finally, ifs are really bad for your code, and processing data through existence lists gives you a lot of the power of ifs without the branch-misprediction overhead. The principle behind my process for removing ifs is to provide an if-less list structure that uses branchless value-update techniques to write to link->next pointers. The only ifs you have to use are the "is the list terminated yet" type.
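To illustrate the kind of branchless value update I mean, here's a sketch using the mask-and-blend select trick: the filter loop always writes a next pointer but only advances its tail cursor when the row passes the test, and the only if left is the loop termination.

    #include <cstdint>

    struct Node { Node* next; uint32_t value; };

    // Turn a condition into an all-ones/all-zeros mask, then blend two
    // pointers through it: no branch, no misprediction.
    inline Node* select(bool keep, Node* a, Node* b)
    {
        uintptr_t mask = static_cast<uintptr_t>(0) - static_cast<uintptr_t>(keep);
        return reinterpret_cast<Node*>(
            (reinterpret_cast<uintptr_t>(a) & mask) |
            (reinterpret_cast<uintptr_t>(b) & ~mask));
    }

    // Keep only nodes passing the predicate. Every iteration writes
    // tail->next unconditionally; rejected nodes just get overwritten
    // by the next write because tail doesn't advance past them.
    Node* filterInPlace(Node* head, uint32_t threshold)
    {
        Node  sentinel = { 0, 0 };
        Node* tail = &sentinel;
        for (Node* n = head; n != 0; n = n->next)   // the only if: list terminated yet?
        {
            tail->next = n;
            tail = select(n->value >= threshold, n, tail);
        }
        tail->next = 0;
        return sentinel.next;
    }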

By the way, thanks for reading all these posts. I hope it helps. It's helped me understand my own idea better to write it out. I think it's time for me to write up a complete document now.

Wednesday, March 17, 2010

Game Logic VII : Beautiful Homogeneity

Going back a bit now: from the start there has been other work done on stream processing and the structure-of-arrays approach to developing objects that can be manipulated at high velocity, but something that doesn't crop up much in those discussions is the idea of processing from a source into a destination of a different size. Most of my realisations have been of that kind, where a row is processed into more rows (LOD up and increase the AI count), or fewer rows (process items, find them no longer necessary, so revert to defaults and drop the row).

This leads to a slightly different problem from the ones faced by the pure stream-processing crowd. If my stuff relies on tables that can grow and shrink, surely it's going to be either just as bad at random access (if I use a linked list for my rows), or horrible for inserts and deletes (if I use dynamically sized arrays for my structs of arrays). Well, as I've learned from Mike Acton, you don't have to do stuff the way everyone expects.

Most of the different uses of rows fall into one of a few categories, and solving the performance of each of them can be thought of as a separate task. As far as I can tell, there are only a few types of table. I'm going to try to explain what I might do to stop things from getting bad.

There is the very simple type of table that just needs deletions and insertions, ignoring order, but maintaining its current-content state during a pass of the processing. By that I mean you can guarantee that an iterator gets to see all the rows that existed at the beginning of the iteration run; deletions and insertions are guaranteed not to affect the iterator. I think this is just a case of following Mike's lead. I'd use a similar system, but when the deletion scan passes through, rather than collapse the table back, try to put a new insertion in the gap first. That should save a few memory ops. Extending the table when it runs out of capacity would probably be simplest to implement as an extension mechanism (akin to that of many good dynamic pool allocators); the iterators will need to bear this in mind, but it may be quite a small price to pay for a no-brainer access model. Always remember, we're after maximal throughput of data, not avoiding the occasional spike.
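A sketch of what that delete/insert pass might look like, assuming deletions and insertions are queued during iteration and applied at a sync point (the names are made up, and duplicate delete indices are assumed not to happen):

    #include <vector>
    #include <algorithm>

    template <typename Row>
    struct UnorderedTable
    {
        std::vector<Row>    rows;
        std::vector<size_t> pendingDeletes;   // indices marked during the pass
        std::vector<Row>    pendingInserts;

        void applyChanges()
        {
            // Process the highest index first so later gaps stay valid
            // as the table shrinks.
            std::sort(pendingDeletes.rbegin(), pendingDeletes.rend());

            for (size_t i = 0; i < pendingDeletes.size(); ++i)
            {
                size_t gap = pendingDeletes[i];
                if (!pendingInserts.empty())      // put a new insertion in the gap first
                {
                    rows[gap] = pendingInserts.back();
                    pendingInserts.pop_back();
                }
                else                              // otherwise collapse from the end
                {
                    rows[gap] = rows.back();
                    rows.pop_back();
                }
            }
            // Any inserts left over go on the end.
            for (size_t i = 0; i < pendingInserts.size(); ++i)
                rows.push_back(pendingInserts[i]);
            pendingDeletes.clear();
            pendingInserts.clear();
        }
    };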

Implicit entity tables, in some circumstances, might need to be sorted. One example would be if they need to be merged/joined to connect sensory input into the decision tables that change which state table their FSM exists in. A sorted table would be better than an unsorted one; however, most of the time this could be solved by having a dedicated sorted view into the general container for the tables. If the table really does need to be sorted, then initially I'd suggest a batch-based representation. Put batches of N in some self-balancing tree; whenever you need to delete an item, just mark it as deleted so skipping is simple, and when you need to insert mid-sequence, split the batch into two batches and insert into the now obvious space in the middle. If you have the safety net of an insertion and deletion sync point, you can optimise this really well. By the way, I would choose a tree representation so as to maintain the power of the data set being sorted. The size of N is important, but I don't think it will be easy to guess at; you would need to tune it per use, as access patterns and the size of the data will change the optimal value for N.
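A sketch of one batch under those rules; the self-balancing tree that holds the batches is omitted, and N is the tuning constant just discussed:

    #include <cstddef>
    #include <cstring>

    const size_t N = 64;   // tune per use: access patterns and row size change the best N

    // One batch of a sorted table: keys kept in order, deletion only marks
    // (so iteration just skips), and a full batch splits in two to open up
    // an obvious space for a mid-sequence insert.
    struct Batch
    {
        unsigned key[N];
        bool     dead[N];
        size_t   count;

        // Move the top half of this batch into 'upper', leaving room to
        // insert into either half.
        void split(Batch& upper)
        {
            size_t half = count / 2;
            upper.count = count - half;
            memcpy(upper.key,  key  + half, upper.count * sizeof(unsigned));
            memcpy(upper.dead, dead + half, upper.count * sizeof(bool));
            count = half;
        }
    };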

Obviously, if you're doing this, you're not going to be thrashing memory, as it's all simply poolable. This is cool as one of the problems with highly active databases in the past was disk thrashing for high throughput systems.

Tuesday, March 16, 2010

Game Logic VI : Finite State machines and Decision Tables

Finite state machines can easily be defined by existence in a table, this much should be apparent from the previous articles, but the alphabet (triggers that cause state change) can also be represented by row information. Decision tables are quite common in database tech, and they provide an easy to read, simple to reason about, really easy to modify setup for making decisions about multiple outcomes given multiple incoming data.

I'd recommend reading some stuff.

With a decision table per state, finding the next state is quite a simple task, and one that can be optimised algorithmically (using standard techniques for query optimisation, or specific ones for optimising join order on decision tables).
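To make that concrete, here's a hypothetical decision table for one state; rows are read top to bottom with the first match winning, and the real thing would be data rather than code, but the shape is the same:

    #include <cstdint>
    #include <cstddef>

    enum StateTable { kIdle, kAlert, kFlee, kAttack };

    // One row: bits that must be set, bits that must be clear, and the
    // state table the entity moves to if the row matches.
    struct DecisionRow
    {
        uint32_t   requiredMask;
        uint32_t   forbiddenMask;
        StateTable nextState;
    };

    static const DecisionRow kAlertDecisions[] =
    {
        { 1 << 0 /*sees enemy*/, 1 << 1 /*low health*/, kAttack },  // enemy seen, healthy: attack
        { 1 << 0,                0,                     kFlee   },  // enemy seen, low health fell through: flee
        { 0,                     1 << 0,                kIdle   },  // no enemy in sight: calm down
    };

    StateTable decide(uint32_t senses, const DecisionRow* rows, size_t rowCount,
                      StateTable current)
    {
        for (size_t i = 0; i < rowCount; ++i)
            if ((senses & rows[i].requiredMask) == rows[i].requiredMask &&
                (senses & rows[i].forbiddenMask) == 0)
                return rows[i].nextState;
        return current;   // no row matched: stay put
    }

    // usage: decide(senses, kAlertDecisions,
    //               sizeof(kAlertDecisions) / sizeof(kAlertDecisions[0]), kAlert);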

Also coming to mind as I type: things that are frame-coherent, like renderables with a state of "potentially visible", can use existence-based processing and be ticked only when necessary. Using count-lodding to drop in sub-objects from larger cullable groups when they become visible also allows for the traditional hierarchical approach to culling, while maintaining the benefits of FSM simplicity.

Oh, and another thing on debugging: I saw this on a data-oriented development post.

"When debugging why a specific mesh is being culled, though, it really helps to see all the data about that entity in one place. Otherwise, you spend a lot of time traversing data structures in the watch window, just to find out what state an object got in that caused it to flip its visible bit to false. So the view of the data that humans need is another important piece of the puzzle"

No, think bigger dude!

As you can see, the preference for all the data in one place comes about because we're thinking about things as objects that have a state. Rethink those words with decision tables and we end up with a very good possibility of keeping tabs on what decisions were made. We can keep the resultant decision row of the culled items, and even the calculated values that led to that row choice. This actually lets us look back in time, rather than having to reproduce the same scenario multiple times. You could say we're naturally stuck thinking that the only way to debug something is by reasoning about the system, rather than just asking it questions directly. Microsoft have just released some cool stuff in Visual Studio 2010 that lets you look at the history of a value, and there was a talk about a JVM that lets you roll back the whole state machine; now we can do the same without having to use managed code.

This actually makes debugging easier, not harder.

Think how often you've wondered "why" something happened. When you homogenise the techniques you use to make decisions like this, debugging becomes easier, as logging decisions can be written once and reused often. Ever heard of a reusable debugging framework before? You have now!

Monday, March 15, 2010

Game Logic V : Count Lodding

This whole thing about entities being implicit on their attributes does allow something else quite cool to emerge from data-oriented development. I'm calling it count-lodding until someone tells me what we really call it.

Take your natural squadron of space ships, visible on radar from the ground. The player has a radar and is picking up the squadron, which in game space is actually represented as an entity of type squadron, with attributes number_of_ships=4.
As the squadron comes into visible range, the distance LOD requests that a squadron renderer be instantiated; suddenly there is a new entity in the render lists, a mesh render of four simple ships.
As the squadron gets within good visible range, the squadron LODs out into a single ship (that is, it inserts a row into the squadron-leader-ship table and deletes its row from the low-LOD squadron table), and spawns three more entities that are squadron-ships, all formation-flying behind their leader.
The number of entities has gone up from 1, to 4.
Now, anti-aircraft weaponry has come on-line and blown up two of the space ships, causing the pilot of one to eject, and the wings of the other to fall off.
1: Add row to "disabled aircraft", delete row from "squadron-ships", add row to "ejected pilots"
2: Add row to "disabled aircraft", add two rows to "broken wings"
The number of entities has gone up from 4, to 7.

As the wings and planes come crashing to the ground, their physics stabilises and they turn into mementos and simple render calls. Entities go back down to 3, unless the ejected pilot's chute doesn't open, in which case we can drop to 2, though a splatter render row might need to be added.

Count lodding, therefore, is about increasing the count based on lod, or decreasing it.
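A sketch of the LOD-up transition above in table terms, with all table and field names invented for illustration; one squadron row is deleted and one ship row per craft is inserted, so the entity count goes from 1 to 4:

    #include <vector>
    #include <cstdint>
    #include <cstddef>

    struct SquadronRow { uint32_t id; float x, y, z; uint32_t shipCount; };
    struct ShipRow     { uint32_t squadronId; float x, y, z; };

    void lodUpSquadron(std::vector<SquadronRow>& squadrons, size_t index,
                       std::vector<ShipRow>& ships)
    {
        SquadronRow s = squadrons[index];

        // Delete: order doesn't matter in this table, so swap with the back and pop.
        squadrons[index] = squadrons.back();
        squadrons.pop_back();

        // Insert: leader first, then followers at simple formation offsets.
        for (uint32_t i = 0; i < s.shipCount; ++i)
        {
            ShipRow ship = { s.id, s.x + i * 10.0f, s.y, s.z };
            ships.push_back(ship);
        }
    }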

Consider AIs getting into a car: as they get in, their "car passenger" behaviour can be a dead behaviour, no table, and they are held only as rendering objects. The car turns from "passive parked" (a non-processing table) to "driving car". At a certain distance, the people in the car can have their rendering rows removed too, meaning we've gone from 4 active entities and 1 passive entity to 1 active entity with 4 mementos for rebuilding their render or other attributes in case they go high-LOD, or get out of the car once they've reached their destination.

This saves us time, and memory. Both these things sound good to me.

Game Logic IV : Existence based processing

Existence-based processing is something that has been missing from many data-oriented development articles. This isn't just a side note; this is a really important thing. If you have lists of stuff that needs to get done, then you get dynamic runtime polymorphism. Let me explain.

Runtime polymorphism normally means runtime binding of code using virtual tables and the like, which costs you a double dereference at best. We are aware of the extra space taken up by the vtable pointers at the beginning of all our polymorphic structures, and the extra time involved in double-dereferencing the function pointer on virtual calls. We also think it's impossible for a class to exhibit runtime binding without these costs. In games we normally use virtuals to let us specialise a class into a particular role, such as a person being a player character, a normal badguy, or maybe just an innocent bystander. How they get around, their mode of transport, even how they path-find: it's different for each one. The player just uses controller input most of the time, sometimes being controlled during cut-scenes; for a badguy, it's his template or setup from the level; for an innocent bystander it might be a canned animation for the most part, or some flocking algorithm. The important thing to remember is that whenever anything does a movement tick, it normally goes through a process of hunting down the virtual table by accessing the vtable pointer at the beginning of the child class of "person", then offsetting into the table to find the function pointer for "TickMovement()".

"Normally", I say, because in my experience people do it this way.

Existence based processing handles it differently. For virtuals that define a behaviour like in the case above, the table your person belongs to affects how they move. Literally, the table the person is in defines what processing will be done on them to get them to move. If the person has their motion data in the pedestrian table, they get handled like a pedestrian, looking for a random path somewhere. If they're in the badguy table, they use their start points and their patrol routes to choose their motion. If they're in the player table (which might not just be one entity, think about network games), then they'll have some input to process, then get on with moving too. But in each of these cases, there is no need for a double dereferencing vtable pointer.
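A sketch of that, with hypothetical field names: TickMovement becomes three plain loops, one per table, with no vtable pointer in any row, and changing how someone moves is just moving their row between tables.

    #include <vector>
    #include <cstddef>
    #include <cmath>
    #include <cstdlib>

    struct MotionRow { float x, y, heading, speed; };

    std::vector<MotionRow> pedestrians;   // wander at random
    std::vector<MotionRow> badguys;       // follow patrol routes
    std::vector<MotionRow> players;       // driven by controller input

    float readPlayerTurnInput() { return 0.0f; }   // stand-in for the input system

    void tickMovement(float dt)
    {
        for (size_t i = 0; i < pedestrians.size(); ++i)
        {
            // wander: jitter the heading, then integrate
            pedestrians[i].heading += (rand() / (float)RAND_MAX - 0.5f) * dt;
            pedestrians[i].x += cosf(pedestrians[i].heading) * pedestrians[i].speed * dt;
            pedestrians[i].y += sinf(pedestrians[i].heading) * pedestrians[i].speed * dt;
        }
        for (size_t i = 0; i < badguys.size(); ++i)
        {
            // steer toward the next patrol point (omitted), then integrate
            badguys[i].x += cosf(badguys[i].heading) * badguys[i].speed * dt;
            badguys[i].y += sinf(badguys[i].heading) * badguys[i].speed * dt;
        }
        for (size_t i = 0; i < players.size(); ++i)
        {
            players[i].heading += readPlayerTurnInput() * dt;
            players[i].x += cosf(players[i].heading) * players[i].speed * dt;
            players[i].y += sinf(players[i].heading) * players[i].speed * dt;
        }
        // Changing a person's movement type is just moving their row to another table.
    }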

That's one benefit. The other benefit is using existence to provide an alternative to bools.

Summarising compression technology: encoding probability. All known compression technologies work on the simple premise that if something is likely, you use fewer bits to describe that it happened. Existence-based processing offers us a little of that, but for our CPU time. If most of the people in the world are at full health, only process those who have non-full health. This is simple: whenever someone gets hurt, insert a row into the "am hurt" table, and process that table every time you would have processed health. If that person dies, you can convert them into a deadperson (who also doesn't need an "am hurt" row, as all dead people have zero health, presumably). So, for things that are sometimes not default, you can reduce the amount of time spent on their processing and reduce their memory usage. Some examples (there's a sketch of the "am hurt" table after this list):
  • anim: in a modal animation for some task, like throwing a grenade, or changing weapon
  • anyone: hurt, but regenerating health
  • player: under water, losing oxygen or just surfaced and replenishing
  • people: heard some gunfire, so actively looking out for suspicious activity
  • shops: player interacted with me so keep track of what happened so I don't repeat myself or suddenly forget he sold me a +5 longsword in the ten seconds he's been away.
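Here's the promised sketch of the "am hurt" table, with the names invented: only wounded people cost any processing, and a fully healed row simply ceases to exist.

    #include <vector>
    #include <cstdint>
    #include <cstddef>

    struct HurtRow { uint32_t personId; float health; };

    std::vector<HurtRow> hurtTable;

    void tickHealthRegen(float dt)
    {
        // Compact in place: copy each row down, and only advance the write
        // cursor while the person is still hurt, so healed rows vanish.
        size_t out = 0;
        for (size_t i = 0; i < hurtTable.size(); ++i)
        {
            HurtRow row = hurtTable[i];
            row.health += 5.0f * dt;                  // regenerate
            hurtTable[out] = row;
            out += (row.health < 100.0f) ? 1 : 0;     // back to full: row ceases to exist
        }
        hurtTable.resize(out);
    }

    // Getting hurt is just an insert:
    //   HurtRow row = { id, newHealth };
    //   hurtTable.push_back(row);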
The idea of patching default data with modifications has been around a long time, notably in landscape systems that start from a fractal, let artists affect it, then save only the modifications. But this is much more of a runtime thing.

Now, let me reiterate: this gives you runtime-changeable polymorphism. It probably costs you a pointer to point back to the parent class, which is equivalent to the vtable-pointer cost, but it saves you the double dereference for accessing the polymorphic qualities. Also, you get to change your "type" at runtime quite easily, because your type is defined by your existence in a set / table.

Game Logic III

A while back I posted about a new way of thinking about games logic and game entities. Now, I've been writing up a doc on how to make your game data-oriented using ideas from database engineering.

I feel like I've come full circle, right back to university, where I first learned about databases, about data-flow and about entity relationship diagrams.

So, onto the important part. Data-oriented development: taking the old OO and throwing out the random access patterns, bringing in the structs of arrays. Graphics cards are very good stream processors, as will be all the big CPUs in our near future. Data-oriented development will let us stream through this era and give us a high possibility of utilising our machines to their fullest.

This article explains the basics better than I can, but it uses a lot of words...

Extending that, think about bools being replaced by existence in tables: using table rows for attributes, with their existence indicating that the attribute is somehow not default. We do it in many markup languages. Also think about entities as implicit through their behaviours: you can switch type as quickly as you can delete a row from one table and insert it into another. This gives you runtime dynamic polymorphism without even the overhead of a normal virtual call.

If entities are implicit, then you also get what I call count-lodding on the cheap. Count-lodding is where the LOD determines how many entities there are, such as a single "squadron" entity turning into individual planes once they're close enough; or passengers in a car, driver too, turning in their IDs to the car itself, which goes from being a car-passive to a car-driven, taking some attributes from the pedestrian that got into the driving seat, the passengers now merely rendering tasks and mementos of their pedestrian selves.

This methodology leads us to a minimal amount of memory used for all tasks, keeps our allocations in regular-sized pieces (because rows are regular), gives us direction for optimisation, and provides a simple-to-parallelise, in fact probably generically threadable, code base. Serialising becomes a doddle, and networking should be quite a lot simpler too. Also, using this data-oriented tabled approach gives us access to years of development experience from database engineers. It finally gives us some really useful science to utilise in our day-to-day job.

Many times I've wanted there to be games design patterns, but I've never come across much more than the GameSessionLevel pattern or the ObjectModelController pattern.