Going back a bit now: from the start there has been other work on stream processing and the structure-of-arrays approach to building objects that can be manipulated at high velocity, but something that doesn't crop up much in those talks is the idea of processing from a source into a destination of a different size. Most of my realisations have been of that kind, where a row is processed into more rows (LOD up and increase the AI count) or fewer rows (process items, find them no longer necessary, so revert to defaults and drop the row).
This leads to a slightly different problem from the ones faced by the pure stream-processing crowd. If my approach relies on tables that can grow and shrink, surely it's going to be either just as bad at random access (if I use a linked list for my rows) or horrible for inserts and deletes (if I use dynamically sized arrays for my structs of arrays). Well, as I've learned from Mike Acton, you don't have to do things the way everyone expects.
Most of the different uses of rows fall into one of a few categories, and the performance of each can be solved as a separate task. As far as I can tell, there are only a few types of table. I'm going to try to explain what I might do to keep performance from degrading.
There is the very simple type of table that just needs deletions and insertions, ignoring order but maintaining its current-content state during a processing pass. By that I mean you can guarantee that an iterator gets to see all the rows that existed at the beginning of the iteration run; deletions and insertions are guaranteed not to affect the iterator. I think this is just a case of following Mike's lead. I'd use a similar system, but when the deletion scan passes through, rather than collapse the table back, try to put a new insertion in the gap first; that should save a few memory ops. Extending the table on running out of capacity would probably be simplest to implement as an extension mechanism, akin to that of many good dynamic pool allocators. The actual iterators will need to bear this in mind, but it may be quite a small price to pay for a no-brainer access model. Always remember: we're after maximal throughput of data, not avoiding the occasional spike.
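As a minimal sketch of that idea: deletions and insertions get queued during the pass, and at the sync point each deleted slot is refilled from the pending-insert queue before we fall back to collapsing or appending. The names and the `std::vector` backing store are my own choices for illustration; a real version would sit on a pooled, chunk-extended store rather than a reallocating vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

struct Row { int value; };

struct Table {
    std::vector<Row>    rows;       // live rows, iterated densely during the pass
    std::vector<size_t> pendingDel; // indices queued for deletion
    std::vector<Row>    pendingIns; // rows queued for insertion

    void queueDelete(size_t i) { pendingDel.push_back(i); }
    void queueInsert(Row r)    { pendingIns.push_back(r); }

    // Called at the sync point, once no iterators are live.
    void commit() {
        // Drop pending inserts straight into deleted slots first,
        // saving the copies a collapse would otherwise cost.
        while (!pendingDel.empty() && !pendingIns.empty()) {
            rows[pendingDel.back()] = pendingIns.back();
            pendingDel.pop_back();
            pendingIns.pop_back();
        }
        // Remaining deletions: swap-with-last and shrink (order is ignored).
        // Sort descending so earlier swaps don't invalidate later indices.
        std::sort(pendingDel.begin(), pendingDel.end(), std::greater<size_t>());
        for (size_t i : pendingDel) {
            rows[i] = rows.back();
            rows.pop_back();
        }
        pendingDel.clear();
        // Remaining insertions append; a pooled backing store would
        // extend with a fresh chunk here instead of reallocating.
        for (const Row& r : pendingIns) rows.push_back(r);
        pendingIns.clear();
    }
};
```

Because nothing moves until `commit()`, an iterator running over `rows` during the pass sees exactly the rows that existed when it started.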
Implicit entity tables, in some circumstances, might need to be sorted. One example would be if they need to be merged/joined to connect sensory input into the decision tables that change which state table their FSM exists in. A sorted table would be better than an unsorted one here; however, most of the time this could be solved by having a dedicated sorted view into the general container for the tables. If the table really does need to be sorted, then initially I'd suggest a batch-based representation. Put batches of N rows in some self-balancing tree; whenever you need to delete an item, just mark it as deleted, so skipping it is simple; and when you need to insert mid-sequence, split the batch into two batches and insert into the now obvious space in the middle. If you have the safety net of an insertion-and-deletion sync point, you can optimise this really well. By the way, I would choose a tree representation so as to maintain the power of the data set being sorted. The size of N is important, but I don't think it will be easy to guess at; you would need to tune it per use, as access patterns and the size of the data will change the optimal value for N.
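The batched idea can be sketched like this: fixed-capacity batches keyed by their lowest value in a self-balancing tree (`std::map` is typically a red-black tree), tombstoned deletes, and a batch split when an insert lands in a full batch. N, the types, and the names are all my own assumptions for the sketch, not a tuned implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

constexpr size_t N = 8; // batch capacity; tune per use

struct Batch {
    std::vector<int>  values; // sorted within the batch
    std::vector<bool> dead;   // tombstones: skip on iteration
};

struct SortedTable {
    std::map<int, Batch> batches; // key = smallest value in the batch

    void insert(int v) {
        if (batches.empty()) {
            batches[v].values.push_back(v);
            batches[v].dead.push_back(false);
            return;
        }
        // Find the batch whose range should hold v.
        auto it = batches.upper_bound(v);
        if (it != batches.begin()) --it;
        Batch& b = it->second;
        if (b.values.size() >= N) {
            // Full: split into two half-batches, then retry the insert.
            Batch hi;
            size_t mid = b.values.size() / 2;
            hi.values.assign(b.values.begin() + mid, b.values.end());
            hi.dead.assign(b.dead.begin() + mid, b.dead.end());
            b.values.resize(mid);
            b.dead.resize(mid);
            batches[hi.values.front()] = std::move(hi);
            insert(v);
            return;
        }
        auto pos = std::lower_bound(b.values.begin(), b.values.end(), v);
        b.dead.insert(b.dead.begin() + (pos - b.values.begin()), false);
        b.values.insert(pos, v);
        if (v < it->first) { // v is the new smallest: re-key the tree node
            Batch moved = std::move(b);
            batches.erase(it);
            batches[moved.values.front()] = std::move(moved);
        }
    }

    void erase(int v) { // tombstone only: no shuffling until a later sweep
        auto it = batches.upper_bound(v);
        if (it == batches.begin()) return;
        --it;
        Batch& b = it->second;
        auto pos = std::lower_bound(b.values.begin(), b.values.end(), v);
        if (pos != b.values.end() && *pos == v)
            b.dead[pos - b.values.begin()] = true;
    }

    // Visit live values in sorted order.
    template <class F> void forEach(F f) const {
        for (const auto& [key, b] : batches)
            for (size_t i = 0; i < b.values.size(); ++i)
                if (!b.dead[i]) f(b.values[i]);
    }
};
```

With a sync point, the tombstone sweep and any half-empty-batch merging can be batched up and done once per frame rather than per operation.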
Obviously, if you're doing this, you're not going to be thrashing memory, as everything is simply poolable. This is cool, as one of the problems with highly active databases in the past was disk thrashing in high-throughput systems.