Tuesday, June 18, 2013

The problem with nouns.


When you name things, you nail them down. You attach meaning and you introduce some rules about what they are and how they are related to the other things around them. When you give them a precise name, you exclude them from discussions where they might have been heard. When you give them a vague name, they lose their power: they become part of the mob, only one part of the larger picture, unable to speak up for their individual essence.

When you name things, you give them a context. A context gives a thing meaning. A dog has more context than an animal. A Doberman has even more context, and when you think of a Doberman you have some expected attributes. But all three descriptions are valid. What is interesting is how many attributes are lost when specialising. How likely is "edible" to be considered an attribute of a Doberman, compared with an animal? This expectation colours our thinking and starts to hide things from us when we try to reason about how things interact. These specialised contexts are the trick behind many lateral thinking puzzles: the answer relies on attributes that the things mentioned really do have, but that we exclude because, if we did not inhibit them, we would have an overabundance of information.

Meaning and context also link things together. A dog has a hair colour, which seems a good attribute to add when you're designing a form to hold information about some specific dogs. You don't find leaf shape or engine size on a dog information form. As your information structure gets larger, you might add information that is known or requested less and less frequently, such as time since the last vet visit, or preferred food. As you add these, the signal to noise ratio drops.

The signal to noise ratio directly affects how effectively you are using your memory. In the simplest case, you are wasting memory for all the members of a structure that point to null or contain a default value. In the more complex case, you are ensuring that the members are likely to be organised by some design meaning rather than how they are used by the methods that read them. This latter problem is the one that causes cache thrashing and coupling in the code between the functions and the containers of the data.
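To put a number on the simplest case, here is a minimal sketch in Python. The `DogRecord` type, its fields, and the `signal_ratio` helper are all made up for illustration, and Python objects do not lay out memory the way a C struct does, so treat this as a way of counting empty slots rather than measuring bytes.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DogRecord:
    # Hypothetical "dog information form": two fields everyone fills in,
    # two that are only occasionally known.
    name: str
    hair_colour: str
    days_since_vet: Optional[int] = None
    preferred_food: Optional[str] = None


def signal_ratio(records):
    """Fraction of field slots holding a real value rather than None."""
    total = 0
    used = 0
    for record in records:
        for field in fields(record):
            total += 1
            if getattr(record, field.name) is not None:
                used += 1
    return used / total if total else 0.0


dogs = [
    DogRecord("Rex", "black"),
    DogRecord("Bella", "brown", days_since_vet=30),
    DogRecord("Max", "white"),
    DogRecord("Luna", "black", preferred_food="chicken"),
]

# 10 of the 16 slots carry information; the rest are defaults.
ratio = signal_ratio(dogs)
print(ratio)
```

Every rarely filled field you add to the form pushes that ratio down for every record, whether or not the record uses it.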

The pattern of access of the data directly affects how efficiently you are able to process the data. If there is only one process that will ever be applied to the data, then the data should be laid out to suit that operation. If the data is used by many functions, then the data needs a best-fit schema for all the operations, with a bias towards the most frequent.
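One way to shape data for the operation that reads it is to keep each attribute in its own contiguous column, so that a function touches only the values it needs. The sketch below assumes some made-up dog data and two hypothetical operations; it uses Python's standard `array` module only to show the shape of the layout, not to demonstrate real cache behaviour.

```python
from array import array

# One column per attribute. Index i refers to the same dog in every column.
names = ["Rex", "Bella", "Max", "Luna"]
weights_kg = array("d", [32.0, 18.5, 27.0, 21.5])
days_since_vet = array("i", [400, 30, 12, 200])


def total_weight(weights):
    """Reads only the weight column; names and vet data are never touched."""
    return sum(weights)


def overdue_indices(days, limit):
    """Reads only the vet column and returns the indices over the limit."""
    return [i for i, d in enumerate(days) if d > limit]


print(total_weight(weights_kg))              # 99.0
print(overdue_indices(days_since_vet, 180))  # [0, 3]
```

If `total_weight` were the only thing you ever did, the other columns would not need to exist alongside it at all. With several operations, you would group the columns that the most frequent ones read together.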

Naming data into collections will often stand in the way of this, as named data is expected to belong in a structure that resembles the thing it is named after. This context link means nothing to the computer unless the context is part of how the data is processed.

Unname your data. Decontextify your schemas into transform oriented collections.
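As a rough illustration of what that might look like, the sketch below drops the idea of a "dog" record entirely and keeps one small table per transform, holding only the rows that transform will read. The table names, the `dog_id` key, and the `due_for_reminder` function are all assumptions made up for this example.

```python
# Each collection exists for one transform and holds only the rows it needs.
# A dog with no known vet date simply has no row here: no None, no default.
vet_reminder_rows = [(0, 400), (3, 200), (1, 30)]  # (dog_id, days_since_vet)
feeding_rows = [(3, "chicken")]                     # (dog_id, preferred_food)


def due_for_reminder(rows, limit):
    """Transform: vet rows in, ids of dogs overdue for a visit out."""
    return [dog_id for dog_id, days in rows if days > limit]


print(due_for_reminder(vet_reminder_rows, 180))  # [0, 3]
```

Nothing in `vet_reminder_rows` knows that it is about dogs. The only context left is the one the transform actually uses.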
