Saturday, December 04, 2010

Data is more important than code.

In short, Dino Dini's post on data-oriented development has me worried that even old-school developers aren't aware of the problems of putting code ahead of data. On the one hand he's quite fluent in hardware appreciation, and on the other he thinks it's a good idea to cache the length of a vector. I've not yet seen a performance-oriented reason to do that on modern machines, though I have done it recently on a very small processor. That might be why he's leaning against the idea of data-oriented development. I don't know, but I can only assume he hasn't had to do much highly cache-sensitive work, or much work on in-order processors.
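A minimal sketch of the cached-length trade-off, using hypothetical `Vec3Cached` and `Vec3` types rather than anything from Dino Dini's post: caching the length makes every vector bigger and every write more expensive, in exchange for skipping a square root on reads. On a cache-bound workload the extra bytes moved can easily cost more than the `sqrt` saved, but that is something to measure per platform, not assume.

```cpp
#include <cmath>

// Hypothetical type: stores its length next to its components.
// Every write must go through set() or the cached value goes stale.
struct Vec3Cached {
    float x = 0.0f, y = 0.0f, z = 0.0f;
    float cached_length = 0.0f; // one extra float carried by every vector

    void set(float nx, float ny, float nz) {
        x = nx; y = ny; z = nz;
        cached_length = std::sqrt(x * x + y * y + z * z);
    }
};

// Hypothetical type: only the data; the length is derived when needed.
struct Vec3 {
    float x = 0.0f, y = 0.0f, z = 0.0f;
};

inline float length_of(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}
```

With 4-byte floats and no padding (typical, though not guaranteed), `Vec3Cached` is 16 bytes against 12 for `Vec3`, so a 64-byte cache line holds four cached vectors but five whole plain ones. Any loop that only reads positions pays for the cached length anyway.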

He mentions that both John Carmack and Mike Acton are trying to promulgate the destruction of abstraction through Data Oriented Design, and whether or not he is right about those particular cases, the fact remains that abstraction is not the enemy of data-oriented development. The real enemy of data-oriented development is data-driven-control-flow development, also known as object-oriented development.
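The difference between data-driven control flow and data-oriented layout can be sketched in C++. The types below (`Entity`, `Falling`, `Spinning` and their data-oriented counterparts) are hypothetical illustrations, not code from any of the people mentioned. Note that the data-oriented half still has abstraction, in the form of one named function per system; what it drops is the per-object decision about which code to run.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Object-oriented style: which code runs is chosen per element by the
// data itself (the vtable pointer), so the loop hops between functions.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

struct Falling : Entity {
    float y = 0.0f, vy = 0.0f;
    void update(float dt) override { vy -= 9.8f * dt; y += vy * dt; }
};

struct Spinning : Entity {
    float angle = 0.0f, rate = 1.0f;
    void update(float dt) override { angle += rate * dt; }
};

void update_all(std::vector<std::unique_ptr<Entity>>& entities, float dt) {
    for (auto& e : entities) e->update(dt); // one indirect call per element
}

// Data-oriented style: data is grouped by what will be done to it, so
// each loop runs one piece of code over contiguous arrays.
struct FallingData  { std::vector<float> y, vy; };
struct SpinningData { std::vector<float> angle, rate; };

void update_falling(FallingData& f, float dt) {
    for (std::size_t i = 0; i < f.y.size(); ++i) {
        f.vy[i] -= 9.8f * dt;
        f.y[i]  += f.vy[i] * dt;
    }
}

void update_spinning(SpinningData& s, float dt) {
    for (std::size_t i = 0; i < s.angle.size(); ++i)
        s.angle[i] += s.rate[i] * dt;
}
```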

Dino Dini is right that the DOD approach is as old as the hills, but it isn't the "old as the hills" he mentions being used to. It isn't the days of the Spectrum, Amiga, ST or Megadrive, when the CPU was strong but not far outclassing the memory it worked with, and saving instructions would save you cycles pretty much no matter how you saved them. No, it's a lot older than that. Data-oriented development has its strongest roots in a time when the gap between memory bandwidth and CPU power was as wide as it is now, namely when the memory in question was tape storage in giant cabinets and the CPUs had local memory (similar in scope to the cache of modern architectures). We had a blip of OO-friendly time in the 90s, when memory was getting big enough to hold all our working data AND our CPUs could get hold of it fast enough to work on it. But either side of that blip, we've been either trying to read stuff out of slow RAM into fast cache to use on our super CPUs, or loading off slow disk or tape into our tiny RAMs to use on our similarly speedy CPUs.

So what did we do back then, when data lived on petrifyingly slow media such as magnetic tape or, worse, punch cards? We processed things as streams. Which, at the heart of it, is what DOD is all about. No more jumping to random code because of what some piece of data says. No more requesting random data, or dereferencing multiple times just to get to the one thing you want to work on.
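A minimal sketch of the difference between pointer chasing and stream processing, using a hypothetical `Node` type. Both functions compute the same sum; the difference is that the list walk cannot know the next address until the previous load completes, while the array pass touches memory in order, which is the access pattern hardware prefetchers (and tape drives) handle best.

```cpp
#include <vector>

// Pointer-chasing layout: each step depends on the previous load, and
// the nodes may be scattered anywhere in memory.
struct Node {
    float value;
    Node* next;
};

float sum_list(const Node* head) {
    float total = 0.0f;
    for (const Node* n = head; n != nullptr; n = n->next) total += n->value;
    return total;
}

// Stream layout: one sequential pass over contiguous memory, much as a
// tape-era job read one reel from start to finish.
float sum_stream(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values) total += v;
    return total;
}
```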

I'm having to cut this post short, but I hope that's at least some insight into why many advocates of DOD might sound mad, but actually, they're just trying to make us think.

5 comments:

Vince said...

DoD and OOP are not trying to solve the same problem. OOP is about software complexity. DoD is about software performance.

This is where the analogy to the days of tape drives and drum memory fails - programs back then were tiny and simple compared to those of today. DoD will help you design stream based *systems* within a large engine. It does not offer much about designing an entire engine/game architecture and how all these systems should fit together.

That's why I think most of this OOP vs DoD discussion is a false choice. The DoD critiques of OOP focus on the micro level (virtual functions in inner loops/over abstraction within systems) and ignore the fact that we have very large code bases to manage, and need some way, whether structured or OOP, to organize them at the macro level.

As an example - D3D provides a very useful OOP abstraction on top of a GPU that is completely stream-oriented. If you prefer structured abstractions, you can go with OpenGL. Even the consoles offer a very thin layer similar to D3D/OpenGL (very, very thin in the case of the PS3).

OOP and DoD are not mutually exclusive at that layer in the system - OOP/structured does what it does best (abstract away complicated HW details), and DoD does what it does best (process large amounts of data fast).
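That layering can be sketched as an interface whose methods take whole batches: the virtual call happens once per batch at the system boundary, and the loop behind it is plain stream processing. `BatchFilter`, `Gain` and `run` are hypothetical names for illustration; this is not how D3D or OpenGL are actually structured.

```cpp
#include <cstddef>
#include <vector>

// OOP at the boundary: callers see only this interface.
class BatchFilter {
public:
    virtual ~BatchFilter() = default;
    // One virtual call per batch, not per element.
    virtual void process(const float* in, float* out, std::size_t count) const = 0;
};

// DoD behind the interface: a tight loop over contiguous input and output.
class Gain final : public BatchFilter {
public:
    explicit Gain(float g) : gain_(g) {}
    void process(const float* in, float* out, std::size_t count) const override {
        for (std::size_t i = 0; i < count; ++i) out[i] = in[i] * gain_;
    }
private:
    float gain_;
};

// Caller code depends only on the abstraction.
std::vector<float> run(const BatchFilter& f, const std::vector<float>& in) {
    std::vector<float> out(in.size());
    f.process(in.data(), out.data(), in.size());
    return out;
}
```

The cost of the abstraction is amortised over the whole batch, which is the point both commenters seem to agree on: abstraction is cheap when it sits at the right granularity.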

Richard Fabian said...

OOP is not abstraction. OOP does abstract away complicated hardware details, but it also creates the need to abstract (via consistency or interfacing) in many cases where it is not necessary.
But again, my point is not that you can't use abstraction with DOD; you can. It's that DOD is fighting OOP, and OOP is dangerous simply because it makes abstractions cost more than necessary.

Vince said...

OOP is a methodology for abstraction. So is structured programming. You can find good or bad examples of both. In particular, finding the right level of abstraction in C++ can be very difficult. It can also be very difficult in C.

You are focusing on OOP at the micro level and not how software is organized at the macro level.

DoD is not an organizing principle for software as a whole, and doesn't say anything about managing software complexity, which is still a very real problem.

It is not an alternative to OOP or structured programming, it is a critique of how many current programs are written at the systems level and a recipe for writing systems which map to the hardware well. It doesn't say anything about how those systems should fit together, how a design should be layered, or how an architecture should be organized.

They are apples and oranges.

Richard Fabian said...

"OOP is a methodology for abstraction. So is structured programming"

OOP is a subset of structured programming, but I think I know what you mean. More importantly, OOP is a method by which you allow the problem domain to be represented more clearly in code. The difference between OOP and DOP mostly appears to be that the problem is defined in the code in DOP, not the domain.

"You are focusing on OOP at the micro level and not how software is organized at the macro level."

It would be hard for me to focus on OOP at the macro level, as it doesn't exist at the macro level in most games. Every time I've seen an attempt at OO on the macro scale, it's been encumbered by a multitude of hacks to let objects communicate, because real-world boundaries do not make good dividing lines for code. I consider OOP to be harmful in systems that have side effects.

"DoD is not an organizing principle for software as a whole, and doesn't say anything about managing software complexity, which is still a very real problem."

I don't know why this is apparent to you, as putting data at the centre of attention seems to automatically reduce practical complexity in DOD projects. I guess your mileage varied.

sting said...

Vince, DoD does not prevent abstraction, just like with OOP you don't need to systematically apply abstraction at the granular object/method level.

What you really need to abstract is function (as in functionality), not object operations or object communication. I don't need to know how a subsystem skins an array of vertices that I pass to it; the details are internal to it. Yet this doesn't prevent me from knowing which subsystems that vertex array goes through, and structuring it accordingly so that data access is ideal.
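A minimal sketch of that split, under simplifying assumptions: the hypothetical `skin_positions` exposes only "skin these positions with these bones", while the caller owns a structure-of-arrays `VertexStream` laid out for the subsystems it knows the data will pass through. One bone per vertex and a row-major 3x4 matrix are assumptions chosen for brevity, not how any particular engine skins.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex stream in structure-of-arrays form, laid out by
// the caller for the subsystems it knows will touch it.
struct VertexStream {
    std::vector<float> x, y, z;
    std::vector<std::uint16_t> bone; // one bone per vertex, for simplicity
};

// Row-major 3x4 affine transform: 3x3 linear part plus translation.
using Bone = std::array<float, 12>;

// The public contract is just "skin these positions with these bones";
// how it is done stays internal to the subsystem.
void skin_positions(const VertexStream& in, const std::vector<Bone>& bones,
                    VertexStream& out) {
    const std::size_t n = in.x.size();
    assert(in.y.size() == n && in.z.size() == n && in.bone.size() == n);
    out.x.resize(n); out.y.resize(n); out.z.resize(n);
    out.bone = in.bone;
    for (std::size_t i = 0; i < n; ++i) {
        const Bone& m = bones[in.bone[i]];
        const float px = in.x[i], py = in.y[i], pz = in.z[i];
        out.x[i] = m[0] * px + m[1] * py + m[2]  * pz + m[3];
        out.y[i] = m[4] * px + m[5] * py + m[6]  * pz + m[7];
        out.z[i] = m[8] * px + m[9] * py + m[10] * pz + m[11];
    }
}
```

The subsystem could later switch to SIMD or multi-bone weights without the caller changing anything, which is the kind of abstraction of function, rather than of object operations, described above.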