Brainstorm 7: How long is now?

I worry too much. I live too far into the future; always so acutely aware of the potential distant knock-on effects of my actions that I’m sometimes quite paralyzed. On the downside this can be a real handicap, but on the upside it means I’m intelligent, because seeing into the future is what intelligence is for. But how? And how do we differentiate between past, present and future? What do we really mean by “now”?

My main thesis for this project is that the brain is a prediction machine. In other words I think it takes so long for nerve signals to reach the brain and be analyzed by it (you may be surprised to know it takes about a tenth of a second merely for signals to reach the primary visual cortex from the retina, never mind be turned into understanding), that we’d be dead by now if it weren’t for our ability to create a simulation of the world and run it ahead of time, so that we are ready for what’s about to happen instead of always reacting to what has already happened. I’m suggesting that this simulation ability derives, at least in part, from a capacity to make small predictions based on experience, at ALL levels of the nervous system. These little fragments of “if this happens then I suspect that will happen next” are there to counter processing delays and reaction times, and give us the ability to anticipate. But they also (I suggest) provide the building blocks for other, more interesting things: a) our ability to create a contextual understanding of the world – a stable sense of what is happening; b) our ability to form plans, by assembling sequences of little predictions that will get us from a starting state to a goal state; and c) our capacity for imagination, by allowing us to link up sequences of cause and effect in an open-ended way. The capacity for imagination, in turn, is what allows us to be creative and provides the virtual world in which consciousness arises and free thought (unconstrained by external events) can occur.

I rather think some clever tricks are involved, most especially the ability to form analog models of reality, as opposed to simple chains of IF/THEN statements, and the ability to generalize from one set of experiences to similar ones that have never been experienced (even to the extent that we can use analogies and metaphors to help us reason about things we don’t directly understand). But I’d say that the root of the mechanism lies in simple statistical associations between what is happening now and what usually happens next.

So let’s look at a wiring diagram for a very simple predictive machine.

This is the simple touch-sensitive creature I talked about in Brainstorm 6. The blue neurons receive inputs, from touch-sensitive nerve endings, which occurred some milliseconds ago on its skin. The red neuron shows two touch inputs being compared (in this case the cell has become tuned to fire if the right input is present just before the left input). I think we can call the red neuron an abstraction: it takes two concrete “I am being touched” inputs and creates an abstract fact – “I am being stroked leftwards here”. This abstraction then becomes an input for higher-level abstractions and so on.

The green neuron is then a prediction cell. It is saying, “if I’m being stroked leftwards at this point, then I expect to be touched here next.” Other predictions may be more conditional, requiring two or more abstractions, but in this case one abstraction is enough. The strength of the cell’s response is a measure of how likely it is that this will happen. The more often the prediction cell is firing at the moment the leftmost touch sensor is triggered, the stronger the connection will become, and the more often this fails to happen, the weaker it will become. Neurologically, I’d hypothesize that this occurs through long-term potentiation (LTP) and long-term depression (LTD) involving glutamate receptors, giving it an interesting nonlinear relationship to time.
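As a rough sketch of that strengthening and weakening, here is a plain running-average update. It is an illustrative stand-in, not a model of real LTP/LTD, and it ignores the nonlinear relationship to time; the function name `update_strength` and the `rate` parameter are assumptions made up for this example.

```python
def update_strength(strength, predicted, touched, rate=0.1):
    """Nudge a prediction cell's connection strength (hypothetical rule).

    The strength moves toward 1 when the cell fired and the touch arrived,
    and toward 0 when the cell fired and nothing happened.
    """
    if not predicted:
        return strength  # the cell was silent, so there is nothing to learn
    target = 1.0 if touched else 0.0
    return strength + rate * (target - strength)


strength = 0.5
for touched in [True, True, True, False, True]:
    strength = update_strength(strength, predicted=True, touched=touched)
print(round(strength, 3))
```

With this rule the strength settles near the fraction of predictions that came true, which is one simple reading of "a measure of how likely it is that this will happen".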

So what do we DO with this prediction? I’m guessing that one consequence is surprise. If the touch sensor fires when the prediction wasn’t present, or the prediction occurs and nothing touches that sensor, then the creature needs a little jolt of surprise (purple neuron). Surprise should draw the creature’s attention to that spot, and alert it that something unexpected is happening. It may not be terribly surprising that a particular touch sensor fails to fire, but the cumulative effect of many unfulfilled predictions tells the creature that something needs to be worried about, at some level. On the other hand, if everything’s going according to expectations then no action need be taken and the creature can even remain oblivious.
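One hedged way to picture the purple neuron: surprise is simply a mismatch between prediction and sensation, and a leaky running total stands in for the "cumulative effect" of many unfulfilled predictions. The names `surprise` and `alarm_level`, and the `decay` value, are assumptions for illustration only.

```python
def surprise(predicted, touched):
    """True when prediction and sensation disagree, either way round."""
    return predicted != touched


def alarm_level(events, decay=0.8):
    """Leaky running total of surprises.

    events is a list of (predicted, touched) pairs, oldest first.
    Old surprises fade by `decay` at each step (an assumed value).
    """
    level = 0.0
    for predicted, touched in events:
        level = decay * level + (1.0 if surprise(predicted, touched) else 0.0)
    return level


print(alarm_level([(True, True)] * 5))   # everything went as expected
print(alarm_level([(True, False)] * 3))  # three missed predictions in a row
```

A single miss barely registers, but a run of them pushes the level up, which is when the creature ought to start paying attention.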

But for the rest of my hypothesis to make sense, the prediction also needs to chain with other predictions. We need this to be possible so that top-down influences (not shown on the diagram) can assemble plans and daydreams, and see far into the future. But I believe there has to be an evolutionary imperative that predates this advanced capacity, and I’d guess that this is the need to see if a trend leads ultimately to pain or pleasure (or other changes in drives). Are we being stroked in such a way that it’s going to hurt when the stimulus reaches a tender spot? Or is the moving stimulus a hint that some food is on its way towards our mouth, which we need to start opening?

Now here comes my problem (or so I thought): In the diagram I’m assuming that the prediction gets mixed with the sensory signal (the green axon leading into the blue cell) so that predictions act like sensations. This way, the organism will react as if the prediction came true, leading to another prediction, and another. Eventually one of these predictions will predict pleasure or pain.
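A minimal sketch of that chaining, assuming a one-dimensional skin where each sensor carries a made-up pain or pleasure value. Each prediction is fed back in as if it were the next sensation, and confidence shrinks with every step; `chain_predictions`, `confidence` and `floor` are hypothetical names and values, not anything taken from the diagram.

```python
def chain_predictions(position, direction, values, confidence=0.9, floor=0.2):
    """Treat each prediction as a sensation and keep predicting.

    position:  index of the sensor touched just now
    direction: -1 for a leftward stroke, +1 for a rightward one
    values:    assumed pain (<0) or pleasure (>0) per sensor, 0 if neutral
    Returns (sensor index, value, certainty) for the first non-neutral
    sensor the chain reaches, or None if the chain fades out first.
    """
    certainty = 1.0
    while True:
        position += direction
        certainty *= confidence
        if not 0 <= position < len(values) or certainty < floor:
            return None
        if values[position] != 0:
            return position, values[position], certainty


skin = [-1, 0, 0, 0, 0, 1]  # tender spot at the left end, mouth at the right
print(chain_predictions(3, -1, skin))  # stroke heading for the tender spot
print(chain_predictions(3, +1, skin))  # stroke heading for the mouth
```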

[Technical note: Connectionists wouldn’t think this way. They’d assume that pleasure/pain are back-propagated during learning, such that this first prediction neuron already “knows” how much pleasure or pain is likely to result further down the line, since this fact is stored in its synaptic weight(s). I’m not happy with this. For one thing, thinking is never going to arise in such a system, because it’s entirely reactive. Secondly (and this is perhaps why brains DO think), the reward value for this prediction is likely to be highly conditional upon other active predictions. This isn’t obvious in such a simple model, but in a complete organism the amount of pleasure/pain that ultimately results may depend very heavily on what else is going on. It may depend on the nature of the touch, or have its meaning changed radically by the context the creature is in (is it being threatened or is something having sex with it?). It’s therefore not possible to apportion a fixed estimate of reward by back-propagating it through the network. That sort of thing works up to a point in an abstract pattern-recognition network like a three-layer perceptron, but not in a real creature. In my humble opinion, anyway!]

Oh yes, my problem: So, if a prediction acts as if it were a sensation (and this is the only way it can make use of the subsequent (red) abstraction cells in order to make further predictions) then how does the organism know the difference between what is happening and what it merely suspects will happen??? If all these predictions are chained together, the creature will feel as if everything that might happen next already is happening.

This has bugged me for the past few days. But this morning I came to a somewhat counter-intuitive conclusion, which is that it really doesn’t matter.

What does “now” actually mean? We think of it as the infinitesimal boundary between past and future; between things that are as yet unknown and our memories. But now is not infinitesimal. I realized this in the shower. I was looking at the droplets of water spraying from the shower-head and realized that I can see them. This perhaps won’t surprise you, but it did me, because I’ve become so conditioned now to the view that the world I’m aware of is actually a predictive simulation of reality, not reality itself. This HAS to be true (although now is not the time to discuss it). And yet here I was, looking at actual reality. I wasn’t inventing these water droplets and I couldn’t predict their individual occurrence. Nor was the information merely being used to synchronize my model and keep my predictions in line with how things have actually turned out – I was consciously aware of each individual water droplet.

But I was looking at water that actually came out of my shower-head over a tenth of a second ago; maybe far longer. By the time the signals had caused retinal ganglion cells to fire, zoomed down my optic nerve, chuntered through my optic chiasm and lateral geniculate nucleus, and made their tortuous and mysterious way through my cortex, right up to the level of conscious awareness, those droplets were long gone. So I was aware of the past and only believed I was aware of the present. (In fact, just to make it more complex, I think I was aware of several pasts – the moment at which I “saw” the droplets was different from the moment that I knew that I’d seen the droplets.)

Yet at the same time, I was demonstrably aware of an anticipated present, based upon equally delayed but easier-to-extrapolate facts. I wasn’t simply responding to things that happened a large fraction of a second ago. If a fish had jumped out of the shower-head I’d certainly have been surprised and it would have taken me a while to get to grips with events, but for the most part I was “on top of the situation” and able to react to things as they were actually happening, even though I wouldn’t find out about them until a moment later. I was even starting bodily actions in anticipation of future events. If the soap had started to slip I’d have begun moving so that I could catch it where it was about to be, not where it was when I saw it fall. But for the most part my anticipations exactly canceled out my processing delays, so that, as far as I knew, I was living in the moment.

So I was simultaneously aware of events that happened a fraction of a second ago, as if they were happening now; events that I believed were happening now, even though I wouldn’t get confirmation of them for another fraction of a second; and events that hadn’t even happened yet (positioning my hands to catch the soap in a place it hadn’t even reached). ALL of these were happening at once, according to my brain; they all seemed like “now”.

Perhaps, therefore, these little predictive circuits really do act as if they are sensations. Perhaps the initial sensation is weak, and the predictions (if they are confident) build up to create a wave of activity whose peak is over a touch neuron that won’t actually get touched until some time in the future. Beyond a certain distance, the innate uncertainty or conditionality of each prediction would prevent the wave from extending indefinitely. Perhaps this blurred “sensation” is what we’re actually aware of. Perhaps for touch there’s an optimum distance and spread. In general, the peak of the wave should lie over the piece of skin that will probably get touched X milliseconds into the future, where X is the time it takes for an actual sensation to reach awareness or trigger an appropriate response. But it means the creature’s sense of “now” is smeared. Some information exists before the event; some reaches awareness at the very moment it is (probably) actually happening; the news that it DID actually happen arrives some time later. All of this is “now.”
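To make the smeared "now" a little more concrete, here is a toy activity profile: a bump whose peak sits over the sensor that will probably be touched X milliseconds from now. The Gaussian shape, the function name `smeared_now` and the `spread` parameter are all assumptions chosen for illustration; nothing above says the wave has this particular form.

```python
import math


def smeared_now(position, speed, latency_ms, n_sensors, spread=1.5):
    """Activity over the skin, peaking where the stimulus is about to be.

    speed is in sensors per millisecond (negative means leftwards);
    latency_ms plays the role of X, the time a real sensation takes to
    reach awareness; spread (in sensors) stands in for the accumulated
    uncertainty of the chained predictions.
    """
    peak = position + speed * latency_ms
    return [
        math.exp(-((i - peak) ** 2) / (2 * spread ** 2))
        for i in range(n_sensors)
    ]


profile = smeared_now(position=10, speed=-0.05, latency_ms=100, n_sensors=20)
print(profile.index(max(profile)))  # sensor under the peak of the wave
```

The sensor actually being touched (index 10 here) still shows a trace of activity, but the bulk of it lies ahead of the stimulus.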

Or perhaps not. After all, if I imagine something happening in my mind, it happens more or less in real time, as a narrative. I don’t see the ghosts of past, present and future superimposed. This, though, may be due to the high-level selection process that is piecing together the narrative. Perhaps the building blocks can only see a certain distance into the future. Primitive building blocks, like primary sensations, only predict a few milliseconds. Highly abstract building blocks, like “we’re in a bar; someone is offering me a drink”, predict much further into the future, but only in a vague way. To “act out” what actually happens, these abstractions need to assemble chains of more primitive predictions to fill in the details, and so the brain always has to wait and see what happens in its own story, before initiating the next step. I’m not at all sure about this, but I can’t see any other way to assemble a complex, arbitrarily detailed, visual and auditory narrative inside one’s head without utilizing memories of how one thing leads to another at a wide range of abstractions. These memories have to have uses beyond conscious, deliberate thought, and so must be wired into the very process of perception. And in order for them to be chained together, the predicted outcomes need to behave as if they were stimuli.

I’m going to muse on this some more yet. For instance I have a hunch that attention plays a part in how far a chain of predictions can proceed (while prediction in turn drives attention), and I haven’t even begun to think about precisely how these simulations can be taken offline for use as plans or speculations, or precisely how this set-up maps onto motor actions (in which I believe intentions are seen as a kind of prediction). But this general architecture of abstractions and predictions is beginning to look like it might form the basis for my artificial brain. Of course there’s an awful lot of twiddly bits to add, but this seems like it might be a rough starting point from which to start painting in some details, and I have to start somewhere. Preferably soon.

Brainstorm 6: All change

In my last Brainstorming session I was musing on associations and asked myself what is being associated with what, that enables a brain to make a prediction (and hence perform simulations). A present state is clearly being associated with the state that tends to follow it, but what does that mean? It’s obvious for some forms of information but a lot less obvious for others and for the general case. Learning that one ten-million-dimension vector tends to follow another is neither practical nor intelligent – it doesn’t permit generalization, which is essential. Something more compact and meaningful is happening.

If the brain is to be able to imagine things, there must be a comprehensive simulation mechanism, capable of predicting the future state in any arbitrary scenario (as long as it’s sufficiently familiar). If I imagine a coffee cup in my hand and then tilt my imaginary hand, the cup falls. I can even get a fair simulation of how it will break when it hits the floor. If I imagine myself talking to someone, we can have a complete conversation that matches the kinds of thing this person might say in reality – I have a comprehensive simulation of their own mind inside mine. It’s comparatively easy to see how a brain might predict the future position of a moving stimulus on the retina, but a lot less obvious how this more general kind of simulation works. Coffee cups don’t have information about how they fall built into their properties, nor do they fall on a whim. Somehow it’s the entirety of the situation that matters – the interaction of cup and hand – and knowledge of falling objects in general (as well as the physical properties of pottery) somehow gets transferred automatically into the simulation as needed.

Pierre-Simon Laplace once said: “An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed … the future just like the past would be present before its eyes.” In other words, if you know the current state of the universe precisely then you can work out its state at any time in the future. He wasn’t entirely right, as it happens – if Laplace were himself that intellect, then he would also be part of the universe, and so the act of gathering the data would change some of the data he needed to gather. He could never have perfect knowledge. And we know now that the most infinitesimal inaccuracy will magnify very rapidly until the prediction is out of whack with reality. But even so, in practical terms determinism works. If our artificial brain knew everything it was capable of knowing about the state of its region of the universe (in other words, the value of a ten-million-dimensional vector) then it would have enough knowledge to make a fair stab at the value of this vector a short while later. If that weren’t true, intelligence wouldn’t be possible.

But Laplace had a very good point when he mentioned “all forces that set nature in motion.” It’s not just the state of the world that matters, but the rate and direction of change. It’s an interesting philosophical question, how an object can embody a rate of change at an instant in time (discuss!). It has a momentum, but that’s dodging the issue. Nevertheless, change is all-important, and real brains are far more interested in change than they are in static states. In fact they’re more-or-less blind to things that don’t change – quite literally. If you can hold your eyes perfectly still when focusing on a fixed point, you’ll go temporarily blind in a matter of seconds! Try it – it’s not easy but it can be done with practice and it’s quite startling.

Getting preoccupied with recognizing objects, etc. fails to help me with this question of prediction, and vision is misleading because it’s essentially a movement-detection system that has been heavily modified by evolution to make it possible to establish facts about things that aren’t moving. The static world is essentially transformed into a moving one (e.g. through microsaccades) before being analyzed in ways we don’t understand and may never be able to, unless we understand how change and prediction are handled more generally. So how about our tactile sense? Maybe that’s a good model to think about for a while?

Ok, I’ll start with a very simple creature – a straight line, with touch sensors along its surface. If I touch this creature with my finger one of the sensors will be triggered (because its input has changed), but will soon become silent again as the nerve ending habituates. At this point the creature can make a prediction, but not a very useful one: my finger might move left or it might move right. It can’t tell which at first, but if my finger starts to move left, it can immediately predict where it’s going to go next. It’s easy to imagine a neuron connected to a pair of adjacent sensors, which will fire when one sensor is triggered before the other.

Eureka! We have a prediction neuron – it knows that the third sensor in the line is likely to be triggered shortly. In fact we can imagine a whole host of these neurons, tuned to different delays and hence sensitive to speed. Each one can make a prediction about which other sensors are likely to be touched within a given period. We can imagine each neuron feeding some kind of information back down to the sensor that it is predicting will be touched. The neurons have a memory of the past, which they can compare to the present in order to establish future trends. The more abstract this memory, the more we can describe it as forming our present context. Context is all-important. If you’ve ever woken from a general anesthetic, you’ll know that it takes a while to re-establish a context – who you are, where you are, how you got there – and until you have this you can’t figure out what’s likely to happen next.
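Here is a small sketch of one such delay-tuned prediction neuron, assuming touches arrive as (time, sensor) pairs. The function name `predict_next`, the event format and the `tolerance` parameter are inventions for this example rather than part of the idea itself.

```python
def predict_next(events, tuned_delay, tolerance=1):
    """One detector tuned to a particular delay between adjacent sensors.

    events: list of (time_ms, sensor_index) touches, oldest first.
    The detector fires if the last two touches hit neighboring sensors
    roughly tuned_delay ms apart, and returns a prediction as
    (sensor_index, time_ms); otherwise it returns None.
    """
    if len(events) < 2:
        return None  # a single touch gives no direction yet
    (t0, s0), (t1, s1) = events[-2], events[-1]
    step = s1 - s0
    if abs(step) != 1 or abs((t1 - t0) - tuned_delay) > tolerance:
        return None
    return s1 + step, t1 + (t1 - t0)


touches = [(0, 5), (20, 4)]  # finger moving left, one sensor per 20 ms
for delay in (10, 20, 50):   # a small bank of differently tuned detectors
    print(delay, predict_next(touches, delay))
```

Only the detector tuned to the finger's actual speed responds, so the identity of the firing neuron carries both direction and speed.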

So far, so good. We have a reciprocal connection of the kind that seems to be universal in the brain. We can imagine a further layer of neurons that listen to these simpler neurons and develop a more general sense of the direction and speed of movement, which is less dependent on the actual location of the stimulus. By the time we get a few layers deep, we have cells that can tell us if the stroking of my finger is deviating from a straight line (well, we could if my simplified creature weren’t one-dimensional!).

But what’s the point of feeding back this information to the sensory neurons themselves? The first layer of cells is telling specific sensory neurons to expect to be touched in a few milliseconds. Big deal – they’ll soon find out anyway. Nevertheless, two valuable pieces of information come out of this prediction:

Firstly, if a sensory neuron is told to expect a touch and it doesn’t arrive, we want our creature to be surprised. Things that just behave according to expectations can usually be safely ignored, and we only want to be alerted to things that don’t do what we were expecting. Surprise gives us a little shock – it causes a bunch of physiological responses. We may get a little burst of adrenaline, to prepare us in case we need to act, and our other sensory systems get alerted to pay more attention to the source of the unexpected change (this is called an “orienting response”). Neurons higher up in the system are thus primed and able to make decisions about what, if anything, to do about this unexpected turn of events. The shock will ripple up the system until something finally knows what to do about that sort of thing. Most of the time this will be an unconscious response (like when we flick an insect off our arm) but sometimes nothing will know how to deal with this, and consciousness needs to get in on the act.

Secondly, once we have a hunch about where the stimulus is going to show up next, we can start to look further ahead to where it is likely to be heading. The more often our low-level predictions are confirmed, the more confident we can be, and the more time we’ve had in which to make this ripple of predictive activity travel ahead of the stimulus, to figure out what might happen in a few moments’ time. Perhaps my finger is stroking along the creature towards a tender spot that will hurt it; perhaps it’s moving in the other direction, towards the creature’s mouth, where it has a hope of eating my finger. Pain or pleasure get predicted, and behavior results whenever one or the other seems likely.

We have to presume that all of this stuff wires itself up through experience – by association. The first layer of sensory neurons learns when the sensor it is associated with is about to be touched, by understanding statistical relationships between the states of neighboring sensors. These first-level neurons presumably cooperate and compete with each other to ensure that each one develops a unique tuning and all possible circumstances get represented (this is exactly homologous, IMHO, to what happens in primary visual cortex, with edge-orientation/motion-sensitive neurons). The higher layers, which make longer-term predictions, learn to associate certain patterns of movement with pain or pleasure. The most abstract layers are presumably capable of learning that certain responses maximize pleasure or minimize pain.

Leaving aside the question of how these responses get coordinated, we now have a complete behavioral mechanism. And it’s NOT a stimulus-response system. The behavior is being triggered by predictions of what is about to happen, not what has just happened (this is a moot point and you may object that the system is still responding to the past stimuli, but I think an essential threshold has been crossed here and it’s fair to call this an anticipatory mechanism).

It’s clear that somehow the prediction needs to be compared to reality, and surprise should be generated if they don’t match, and it’s clear that predictions need to be able to associate themselves with reward. Somehow predictions also need to take part in servo action – actions are goal-directed, and hence are themselves predictions of a future state. Comparing what your sensors predict is going to happen, to what you intend to happen, is what allows you to make anticipatory changes and bring reality into line with your intentions. I need to think about that a bit, though.

But what about the ability to use this predictive mechanism to imagine possible futures? We presumably now have the facility to imagine a high-level construct, such as “let’s suppose I’m feeling someone stroke my skin” and actually feel the stroke occurring, as these higher-level neurons pass down their predictions to lower levels at which individual touch sensors are told to expect/pretend they’ve been stimulated. Although obviously this time we shouldn’t be surprised when nothing happens! The surprise response needs to be suppressed, and somehow the predictions ought to stand in for the sensations. That has implications for the wiring and all sorts of questions remain unresolved here.

It’s much harder, though, to see how we can assemble an entire context in our heads – the hand and the coffee cup, say. Coffee cups only fall when hands drop them. Dropping something only occurs when a hand is placed at a certain set of angles. A motor action is associated with a visual change, but only in a particular class of contexts, and the actual visual change is also highly context-dependent: If a cup was in your hand, that’s what you’ll see fall. Remarkably, if you imagine holding a little gnome in your hand instead, what you’ll see is a falling gnome, not a falling cup, even if you’ve never actually dropped a minuscule fantasy creature before in your life! In fact your imaginary gnome may even surprise you by leaping to safety! Somehow the properties of objects are able to interact in a highly generalizable way, and these interactions can trigger mental imagery, which eventually trickles down to the actual sensory system as if they’d really occurred (there are several lines of evidence to suggest that when we imagine something we “see” it using the same parts of our visual system that would be active if we’d really seen it).

Somehow the brain encodes cause and effect, at many levels, in a generalizable way. Complex chains of inference occur when we mentally decide to rotate our hand and see what happens to the thing it was holding, and the ability to make these inferences must arise from statistical learning that is designed to predict future states from past ones.

And somehow I have to come up with just such a general scheme, but at a level of abstraction suitable for a game. My creatures are not going to be covered in touch sensors or see the world in terms of moving colored pixels. It’s a shame really, because I understand these things at the low level – it’s the high level that still eludes me…

P.S. This post got auto-linked to a post on the question of why we can’t tickle ourselves (I’m assuming you’re not schizophrenic here, or you won’t know what I’m talking about, because you can!). We can’t tickle ourselves because our brain knows the difference between things we do and things that get done to us (self/non-self determination). If we try to tickle ourselves, we predict there will be a certain sensation and this prediction is used to cancel out the actual sensation. It’s pretty important for an organism to differentiate between things it does to the world and things the world does to it (bumping into something feels the same as being bumped into, but the appropriate responses are different). So here’s another pathway that requires anticipation, and another example of the brain as a simulation engine.
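The cancellation idea fits in a couple of lines. This is only a toy subtraction, with a made-up function name and arbitrary intensity units, not a claim about how the brain actually implements it.

```python
def felt_intensity(actual, predicted_from_own_action=0.0):
    """Sensation left over once the self-generated prediction is subtracted."""
    return max(0.0, actual - predicted_from_own_action)


print(felt_intensity(1.0))        # someone else tickles: no prediction to cancel
print(felt_intensity(1.0, 1.0))   # we tickle ourselves: fully predicted
print(felt_intensity(1.0, 0.75))  # an imperfect prediction leaves a residue
```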

Brainstorm 5: joining up the dots

I promised myself I’d blog about my thoughts, even if I don’t really have any and keep going round in circles. Partly I just want to document the creative process honestly – so this includes the inevitable days when things aren’t coming together – and partly it helps me if I try to explain things to people. So permit me to ramble incoherently for a while.

I’m trying to think about associations. In one sense the stuff I’ve already talked about is associative: a line segment is an association between a certain set of pixels. A cortical map that recognizes faces probably does so by associating facial features and their relative positions. I’m assuming that each of these things is then denoted by a specific point in space on the real estate of the brain – oriented lines in V1 and faces in the FFA. In both these cases there are several features at one level, which are associated and brought together at a higher level. A bunch of dots maketh one line. Two dark blobs and a line in the right arrangement maketh a face. A common assumption (which may not be true) is that neurons do this explicitly: the dendritic field of a visual neuron might synapse onto a particular pattern of LGN fibres carrying retinal pixel data. When this pattern of pixels becomes active, the neuron fires. That specific neuron – that point on the self-organizing map – therefore means “I can see a line at 45 degrees in this part of the visual field.”

But the brain also supports many other kinds of associative link. Seeing a fir tree makes me think of Christmas, for instance. So does smelling cooked turkey. Is there a neuron that represents Christmas, which synapses onto neurons representing fir trees and turkeys? Perhaps, perhaps not. There isn’t an obvious shift in levels of representation here.

Not only do turkeys make me think of Christmas, but Christmas makes me think of turkeys. That implies a bidirectional link. Such a thing may actually be a general feature, despite the unidirectional implication of the “line-detector neuron” hypothesis. If I imagine a line at 45 degrees, this isn’t just an abstract concept or symbol in my mind. I can actually see the line. I can trace it with my finger. If I imagine a fir tree I can see that too. So in all likelihood, the entire abstraction process is bidirectional and thus features can be reconstructed top-down, as well as percepts being constructed/recognized bottom-up.

But even so, loose associations like “red reminds me of danger” don’t sound like the same sort of association as “these dots form a line”. A line has a name – it’s a 45-degree line at position x,y – but what would you call the concept that red reminds me of danger? It’s just an association, not a thing. There’s no higher-level concept for which “red” and “danger” are its characteristic features. It’s just a nameless fact.

How about a melody? I know hundreds of tunes, and the interesting thing is, they’re all made from the same set of notes. The features aren’t what define a melody, it’s the temporal sequence of those features; how they’re associated through time. Certainly we can’t imagine there being a neuron that represents “Auld Lang Syne”, whose dendrites synapse onto our auditory cortex’s representations of the different pitches that are contained in the tune. The melody is a set of associations with a distinct sequence and a set of time intervals. If someone starts playing the tune and then stops in the middle I’ll be troubled, because I’m anticipating the next note and it fails to arrive. Come to that, there’s a piano piece by Rick Wakeman that ends in a glissando, and Wakeman doesn’t quite hit the last note. It drives me nuts, and yet how do I even know there should be another note? I’m inferring it from the structure. Interestingly, someone could play a phrase from the middle of “Auld Lang Syne” and I’d still be able to recognize it. Perhaps the tune is represented by many overlapping short pitch sequences? But if so, then this cluster of representations is collectively associated with its title and acts as a unified whole.
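The "many overlapping short pitch sequences" guess can be sketched as a lookup from short runs of notes to the tunes that contain them. The tunes below are arbitrary lists of MIDI-style numbers invented for the example (they are not real transcriptions), and `index_tunes` and `recognize` are hypothetical names.

```python
from collections import defaultdict


def index_tunes(tunes, n=3):
    """Map every run of n consecutive pitches to the tunes containing it."""
    index = defaultdict(set)
    for title, pitches in tunes.items():
        for i in range(len(pitches) - n + 1):
            index[tuple(pitches[i:i + n])].add(title)
    return index


def recognize(fragment, index, n=3):
    """Vote for tunes whose stored runs overlap the fragment just heard."""
    votes = defaultdict(int)
    for i in range(len(fragment) - n + 1):
        for title in index.get(tuple(fragment[i:i + n]), ()):
            votes[title] += 1
    return max(votes, key=votes.get) if votes else None


tunes = {
    "tune_a": [60, 62, 64, 65, 67, 65, 64],
    "tune_b": [60, 64, 67, 72, 67, 64, 60],
}
index = index_tunes(tunes)
print(recognize([64, 65, 67, 65], index))  # a phrase from the middle of tune_a
```

Because every short run points back to its title, a phrase from the middle is enough to bring the whole tune to mind, and the same notes can belong to many tunes without confusion.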

Thinking about anticipating the next note in a tune reminds me of my primary goal: a representation that’s capable of simulating the world by assembling predictions. State A usually leads to state B, so if I imagine state A, state B will come to mind next and I’ll have a sense of personal narrative. I’ll be able to plan, speculate, tell myself stories, relive a past event, relive it as if I’d said something wittier at the time, etc. Predictions are a kind of association too, but between what? A moving 45-degree line at one spot on the retina tends to lead to the sensation of a 45-degree line at another spot, shortly afterwards. That’s a predictive association and it’s easy to imagine how such a thing can become encoded in the brain. But turkeys don’t lead to Christmas. More general predictions arise out of situations, not objects. If you see a turkey and a butcher, and catch a glint in the butcher’s eye, then you can probably make a prediction, but what are the rules that are encoded here? What kind of representation are we dealing with?

“Going to the dentist hurts” is another kind of association. “I love that woman” is of a similar kind. These are affective associations and all the evidence shows that they’re very important, not only for the formation of memories (which form more quickly and thoroughly when there’s some emotional content), but also for the creation of goal-directed behavior. We tend to seek pleasure and avoid pain (and by the time we’re grown up, most of us can even withstand a little pain in the expectation of a future reward).

A plan is the predictive association of events and situations, leading from a known starting point to a desired goal, taking into account the reward and punishment (as defined by affective associations) along the route. So now we have two kinds of association that interact!
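One way to picture the two kinds of association interacting is a cheapest-path search. This is only a sketch under my own assumptions: predictive links are a dictionary from a state to the states it tends to lead to, affective links are a dictionary of "pain" scores, and each step costs 1 plus the pain of the state it leads to. Rewards are left out because the standard Dijkstra search used here needs non-negative costs. The world is hypothetical.

```python
import heapq

def plan(predictions, pain, start, goal):
    """Cheapest chain of predicted transitions from start to goal.

    Returns (total cost, list of states) or None if the goal is unreachable.
    """
    frontier = [(0, start, [start])]
    best = {start: 0}
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return cost, path
        for nxt in predictions.get(state, ()):
            new_cost = cost + 1 + pain.get(nxt, 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical predictive associations: "this tends to lead to that".
predictions = {
    "hungry": ["raid fridge", "walk to shop"],
    "raid fridge": ["fed"],
    "walk to shop": ["buy food"],
    "buy food": ["fed"],
}
# Hypothetical affective association: the fridge is an unpleasant place.
pain = {"raid fridge": 5}

print(plan(predictions, pain, "hungry", "fed"))
```

With the pain score in place the longer route via the shop wins; remove it and the planner goes straight for the fridge.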

To some extent I can see that the meaning of an associative link is determined by what kind of thing it is linking. The links themselves may not be qualitatively different – it’s just the context. Affective associations link memories (often episodic ones) with the emotional centers of the brain (e.g. the amygdala). Objects can be linked to actions (a hammer is associated with a particular arm movement). Situations predict consequences. Cognitive maps link objects with their locations. Linguistic areas link objects, actions and emotions with nouns, verbs and adjectives/adverbs. But there do seem to be some questions about the nature of these links and to what extent they differ in terms of circuitry.

Then there’s the question of temporary associations. And deliberate associations. Remembering where I left my car keys is not the same as recording the fact that divorce is unpleasant. The latter is a semantic memory and the former is episodic, or at least declarative. Tomorrow I’ll put my car keys down somewhere else, and that will form a new association. The old one may still be there, in some vague sense, and I may one day develop a sense of where I usually leave my keys, but in general these associations are transient (and all too easily forgotten).

Binding is a form of temporary association. That ball is green; there’s a person to my right; the cup is on the table.

And attention is closely connected with the formation or heightening of associations. For instance, in Creatures I had a concept called “IT”. “IT” was the object currently being attended to, so if a norn shifted its attention, “IT” would change, and if the norn decided to “pick IT up”, the verb knew which noun to apply to. In a more sophisticated artificial brain, this idea has to be more comprehensive. We may need two or more ITs, to form the subject and object of an action. We need to remember where IT is, in various coordinate frames, so that we can reach out and grab IT or look towards IT or run away from IT. We need to know how big IT is, what color IT is, who IT belongs to, etc. These are all associations.

Perhaps there are large-scale functional associations, too. In other words, data from one space can be associated with another space temporarily to perform some function. What came to mind that made me think of this is the possibility that we have specialized cortical machinery for rotating images, perhaps developed for a specific purpose, and yet I can choose, any time I like, to rotate an image of a car, or a cat, or my apartment. If I imagine my apartment from above, I’m using some kind of machinery to manipulate a particular set of data points (after all, I’ve never seen my apartment from above, so this isn’t memory). Now I’m imagining my own body from above – I surely can’t have another machine for rotating bodies, so somehow I’m routing information about the layout of my apartment or the shape of my body through to a piece of machinery (which, incidentally, is likely to be cortical and hence will have self-organized using the same rules that created the representation of my apartment and the ability to type these words). Routing signals from one place to another is another kind of association.

Language is interesting (I realize that’s a bit of an understatement!). I don’t believe the Chomskyan idea that grammar is hard-wired into the brain. I think that’s missing the point. I prefer the perspective that the brain is wired to think, and grammar is a reflection of how the brain thinks. [noun] [verb] [noun] seems to be a fundamental component of thought. “Janet likes John.” “John is a boy.” “John pokes Janet with a stick.” Objects are associated with each other via actions, and both the objects and actions can be modulated (linguistically, adverbs modulate actions; adjectives modify or specify objects). At some level all thought has this structure, and language just reflects that (and allows us to transfer thoughts from one brain to another). But the level at which this happens can be very far removed from that of discrete symbols and simple associations. Many predictions can be couched in linguistic terms: IF [he] [is threatening] [me] AND [I] [run away from] [him] THEN [I] [will be] [safe]. IF [I] [am approaching] [an obstacle] AND NOT ([I] [turn]) THEN [I] [hurt]. But other predictions are much more fluid and continuous: In my head I’m imagining water flowing over a waterfall, turning a waterwheel, which turns a shaft, which grinds flour between two millstones. I can see this happening – it’s not just a symbolic statement. I can feel the forces; I can hear the sound; I can imagine what will happen if the water flow gets too strong and the shaft snaps. Symbolic representations and simple linear associations won’t cut it to encode such predictive power. I have a real model of the laws of physics in my head, and can apply it to objects I’ve never even seen before, then imagine consequences that are accurate, visual and dynamic. So at one level, grammar is a good model for many kinds of association, including predictive associations, but at another it’s not.

Are these the same processes – the same basic mechanism – just operating at different levels of abstraction, or are they different mechanisms?
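For the symbolic end of that spectrum, the two IF/THEN examples can be written down almost literally. This is a sketch under my own assumptions: a thought is a (noun, verb, noun) tuple, a missing object is an empty string, and `Rule` and `predict` are names invented for the illustration. It says nothing about the fluid, waterfall kind of prediction.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    if_all: frozenset    # triples that must currently hold
    if_none: frozenset   # triples that must NOT hold (the NOT clause)
    then: tuple          # the predicted triple

def predict(facts, rules):
    """Return the consequence of every rule whose conditions are met."""
    return [r.then for r in rules
            if r.if_all <= facts and not (r.if_none & facts)]

RULES = [
    Rule(frozenset({("he", "is threatening", "me"),
                    ("I", "run away from", "him")}),
         frozenset(),
         ("I", "will be", "safe")),
    Rule(frozenset({("I", "am approaching", "an obstacle")}),
         frozenset({("I", "turn", "")}),
         ("I", "hurt", "")),
]

facts = {("I", "am approaching", "an obstacle")}
print(predict(facts, RULES))                       # the collision is predicted
print(predict(facts | {("I", "turn", "")}, RULES)) # turning cancels it
```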

These predictions are conditional. In the linguistic examples above, there’s always an IF and a set of conditionals. In the more fluid example of the imaginary waterfall, there are mathematical functions being expressed, and since a function has dependent variables, this is a conditional concept too. High-level motor actions are also conditional: walking consists of a sequence of associations between primitive actions, modulated by feedback and linked by conditional constructs such as “do until” or “do while”.

So, associations can be formed and broken, switched on and off, made dependent on other associations, apply specifically or broadly, embody sequence and timing and probability, form categories and hierarchies or link things without implying a unifying concept. They can implement rules and laws as well as facts. They may or may not be commutative. They can be manipulated top-down or formed bottom-up… SOMEHOW all this needs to be incorporated into a coherent scheme. I don’t need to understand how the entire human brain works – I’m just trying to create a highly simplified animal-like brain for a computer game. But brains do some impressive things (nine-tenths of which most AI researchers and philosophers forget about when they’re coming up with new theories). I need to find a representation and a set of mechanisms for defining associations that have many of these properties, so that my creatures can imagine possible futures, plan their day, get from A to B and generalize from past experiences. So far I don’t have any great ideas for a coherent and elegant scheme, but at least I have a list of requirements, now.

I think the next thing to do is think more about the kinds of representation I need – how best to represent and compute things like where the creature is in space, what kind of situation it is in, what the properties of objects are, how actions are performed. Even though I’d like most of this to emerge spontaneously, I should at least second-guess it to see what we might be dealing with. If I lay out a map of the perceptual and motor world, maybe the links between points on this map (representing the various kinds of associations) will start to make sense.

Or I could go for a run. Yes, I like that thought better.

Brainstorm #4 – squishing hyperspace

Ok, back to work. I wanted to expand on what I was saying about the cortex as a map of the state of the world, before I get onto the topic of associations.

Imagine the brain as a ten-million-dimensional hypercube. Got that?

Hmm, maybe I should backtrack a bit. Let’s suppose that the brain has a total of ten million sensory inputs and motor outputs (each one being a nerve fiber coming in from the skin, the retina, the ear, etc., or going out to a muscle or gland). For the sake of argument (and I appreciate the dangers in this over-simplification), imagine that each nerve signal can have one of 16 amplitudes. Every single possible experience that a human being is capable of having is therefore representable as a point in a ten-million-dimensional graph, and since we have only 16 points per axis we need only 16 raised to the power of ten million points to represent everything that can happen to us (including all the things we could possibly do to the world, although we probably need to factor in another few quadrillion points to account for our internal thoughts and feelings).

(If you’re not used to this concept of phase space, imagine that the brain has only two inputs and one output. A three-dimensional graph would therefore be enough to represent every possible combination of those values: the value of input 1 is a distance along the X-axis, input 2 is along the Y-axis and the output value is along the Z-axis. Where these three lines meet is the point that represents this unique state. A change of state is represented by an arrow connecting two points. Everything that can happen to that simplified brain – every experience and thought and reaction it is capable of – can be described by points, lines and surfaces within that space. It’s a powerful way to think about many kinds of system, not just brains. OK, so now just expand that model and imagine it in 10,000,000-dimensional space and you’re in business!)

Er, so that’s quite a big number. If each point were represented by an atom, the entire universe would get completely lost in some small dark corner of this space and never be seen again. Luckily for us, no single human being ever actually experiences more than an infinitesimal fraction of it. When did you last stand on one foot, scratching your left ear, looking at a big red stripe surrounded by green sparkles, whistling the first bar of the Hallelujah Chorus? Not lately, I’m guessing. So we only need to represent those states we actually experience, and then only if they turn out to be useful in some way. Of course we don’t immediately know whether they’re going to turn out useful, so we need a way to represent them as soon as we experience them and then forget them again if they turn out to be irrelevant.
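To put a size on "quite a big number", here is a small back-of-envelope calculation. It counts decimal digits rather than building the number itself. The figure of roughly 10^80 atoms in the observable universe is a commonly quoted estimate that I am assuming here, not something derived in this post.

```python
import math

LEVELS = 16            # amplitudes per nerve fiber (from the text)
FIBERS = 10_000_000    # inputs plus outputs (from the text)

# The toy brain with two inputs and one output: 16^3 distinct states.
toy_states = LEVELS ** 3

# Number of decimal digits in 16^10,000,000, via logarithms.
digits = math.floor(FIBERS * math.log10(LEVELS)) + 1

# Assumed rough estimate: about 10^80 atoms, i.e. an 81-digit number.
ATOM_DIGITS = 81

print(toy_states)
print(digits)
print(digits - ATOM_DIGITS)  # how many more digits the state count has
```

The state count runs to about twelve million digits, against 81 digits for the atoms, which is why only the states actually visited can ever be represented.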

Thus far, this is the line of thinking that I used when I designed the Creatures brains. Inside norns, neurons wire themselves up to represent short permutations of input patterns as they’re experienced, and then connect to other neurons representing possible output patterns. Pairs of neurons equate to points in the n-dimensional space of a norn’s brain, but only a small fraction of that possible space needs to be represented in one creature’s lifetime. These representations fade out unless they get reinforced by punishment or reward chemicals, and the neural network learns to associate certain input patterns with the most appropriate output signal. All these experiences compete with each other for the right to be represented, such that only the most relevant remain and old memories are wiped out if more space is needed. There’s also an implicit hierarchy in the representations (due to the existence of simpler permutations) that allows the norns to generalize – they have a hunch about how to react to new situations, based on previous similar ones.
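The fade-unless-reinforced and compete-for-space ideas can be sketched very simply. This is emphatically not the Creatures code: it is a toy under my own assumptions, with invented names and made-up constants, in which each stored (input pattern, action) pair has a strength that decays every tick, is boosted by a reinforcement signal, and is evicted if it is the weakest when room is needed.

```python
class PatternMemory:
    """Toy store of (input pattern, action) pairs that fade unless reinforced."""

    def __init__(self, capacity=4, decay=0.9, floor=0.05):
        self.capacity = capacity   # maximum number of stored pairs
        self.decay = decay         # fraction of strength kept per tick
        self.floor = floor         # pairs weaker than this are forgotten
        self.strength = {}         # (pattern, action) -> strength

    def experience(self, pattern, action, initial=0.5):
        """Record a pair, evicting the weakest one if space has run out."""
        key = (frozenset(pattern), action)
        if key not in self.strength:
            if len(self.strength) >= self.capacity:
                weakest = min(self.strength, key=self.strength.get)
                del self.strength[weakest]
            self.strength[key] = initial
        return key

    def reinforce(self, key, amount):
        """A reward or punishment signal makes the memory more durable."""
        if key in self.strength:
            self.strength[key] += amount

    def tick(self):
        """Let everything fade a little and drop what has faded away."""
        faded = {k: s * self.decay for k, s in self.strength.items()}
        self.strength = {k: s for k, s in faded.items() if s >= self.floor}

memory = PatternMemory(capacity=2)
ball = memory.experience({"ball"}, "push")
carrot = memory.experience({"carrot"}, "eat")
memory.reinforce(carrot, 1.0)      # eating the carrot was rewarded
for _ in range(25):
    memory.tick()
print(list(memory.strength))       # only the reinforced pair is left
```

The sketch only covers which memories persist; choosing the best action for a pattern would need the sign of the reinforcement as well.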

There’s a great deal more complexity to the Norns’ brains than this and I managed to solve some quite interesting problems. I’m not sure that anyone else has designed such a comprehensive artificial brain and actually made it work, either before or in the 18 years since. But nevertheless, basically this design was a pile of crap. For one thing, there was no order to this space. Point 1,2,3 wasn’t close to point 1,2,4 in the phase space – the points were just in a list, essentially, and there was no geometry to the space. The creatures’ brains were capable of limited generalization because of the hierarchy (too long a story for now) but I really wanted generalization to fall out of the spatial relationships: If you don’t know what to do in response to situation x,y,z, try stimulating the neighboring points, because they represent qualitatively similar situations and you may already have learned how best to react to them. The sum of these “recommendations” is a good bet for how to react to this novel situation. Sometimes this won’t be true, in fact, and that requires the brain to draw boundaries between things that are similar and yet require different responses (a toy alligator is very similar to a real one, and yet…). This is called categorization (and comes in two flavors: perceptual and functional – my son Chris did his PhD on functional categorization). Anyway, basically, we need the n-dimensional phase space to be collapsed down (or projected) into two dimensions (assuming the neural network is a flat sheet), such that representations of similar situations end up lying near to each other.
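Here is a minimal sketch of generalization falling out of spatial relationships, assuming the map is a 2D grid and each learned point holds a small table of action preferences. The points, actions and weights are hypothetical, and `recommend` is a name made up for the example.

```python
from collections import Counter

def recommend(learned, point, radius=1):
    """Sum the action preferences of learned points near `point` on a 2D map."""
    x, y = point
    totals = Counter()
    for (px, py), prefs in learned.items():
        if abs(px - x) <= radius and abs(py - y) <= radius:
            totals.update(prefs)   # adds each neighbor's weights to the totals
    return totals.most_common(1)[0][0] if totals else None

# Hypothetical learned responses at three points on the map.
learned = {
    (4, 4): {"approach": 0.8},
    (5, 5): {"approach": 0.6, "flee": 0.1},
    (9, 9): {"flee": 1.0},
}

print(recommend(learned, (5, 4)))    # a novel situation near two known ones
print(recommend(learned, (20, 20)))  # nothing nearby, so no hunch at all
```

Drawing a category boundary, as with the toy alligator, would mean letting some neighbors stop contributing even though they are close, which this sketch does not attempt.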

(At this point, some of you may be astute enough to ask: why collapse n dimensions down to two at all? The human cortex is a flat sheet, so biology has little choice, but we can represent any number of dimensions in a computer with as much ease as two. This is true, but only in principle. In practice, computers are nowhere near big enough to hold a massively multi-dimensional array of 16 elements per dimension (say we only need a mere one hundred dimensions – at one byte per element that’s already more than 2×10^111 gigabytes!), so we have to find some scheme for collapsing the space while retaining some useful spatial relationships. It could be a list, but why not a 2D surface, since that’s roughly what the brain uses and hence we can look for hints from biology?)
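The storage figure is easy to check. The sketch assumes one byte per array element and a gigabyte of 10^9 bytes; both are my assumptions.

```python
DIMENSIONS = 100
POINTS_PER_AXIS = 16

elements = POINTS_PER_AXIS ** DIMENSIONS   # exact integer, 16^100 = 2^400
gigabytes = elements / 10**9               # assuming one byte per element

print(f"{gigabytes:.1e} GB")
```

Halving the element size to four bits (enough for 16 levels) only halves the total, so the conclusion doesn't change.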

There is no way to do this by simple math alone, because to represent even three dimensions on a two-dimensional surface, the third dimension needs to be broken up into patches and some contiguity will be lost. For instance, imagine a square made from 16×16 smaller squares, each of which is made from 16 stripes. This flattens a 16×16×16 cube into two dimensions. But although point 1,1,2 is close to point 1,1,3 (they’re on neighboring stripes), it’s not close to point 1,2,2, because other stripes get in the way. You can bring these closer together by dividing the space up in a different way, but that just pushes other close neighbors apart instead. Which is the best arrangement as far as categorization and generalization are concerned? One arrangement might work best in some circumstances but not others. When you try to project a 16×16×16×16×16×16×16-point hypercube into two dimensions this becomes a nightmare.
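The stripes example can be checked with a few lines of code. I am assuming one particular layout: the first coordinate picks the row of small squares, the second picks the column, and the third picks a vertical stripe within the square, so the flat sheet is a grid of 16 rows by 256 stripe-columns. Distances are counted in grid steps. Other layouts are possible and would move the damage elsewhere.

```python
def flatten(a, b, c, n=16):
    """Map a point of the n*n*n cube to (row, column) on the flat sheet."""
    return a, b * n + c

def sheet_distance(p, q, n=16):
    """Grid steps between two cube points once flattened."""
    (r1, c1), (r2, c2) = flatten(*p, n=n), flatten(*q, n=n)
    return abs(r1 - r2) + abs(c1 - c2)

origin = (1, 1, 2)
# All three of these are immediate neighbors of `origin` in the cube.
print(sheet_distance(origin, (1, 1, 3)))  # neighboring stripe
print(sheet_distance(origin, (1, 2, 2)))  # next square along: stripes in the way
print(sheet_distance(origin, (2, 1, 2)))  # square below: still adjacent
```

Two of the three cube neighbors stay adjacent on the sheet and the third ends up sixteen steps away; swapping which coordinate gets the stripes just changes which neighbor is sacrificed.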

The real brain clearly tries its best to deal with this problem by self-organizing how it squishes 10,000,000 dimensions into two. You can see this in primary visual cortex, where the 2D cortical map is roughly divided up retinotopically (i.e. matching the two-dimensional structure of the retina, and hence the visual scene). But within this representation there are whorls (not stripes, although stripes are found elsewhere) in which a third and fourth dimension (edge-orientation and direction of motion) are represented. Orientation is itself a collapsing down of two spatial dimensions – simply recording the angle of a line instead of the set of points that make it up (that’s partly what a neuron does – it describes a spatial pattern of inputs by a single point). Here we see one of the many clever tricks that the brain uses: The visual world (at least as far as the change-sensitive nature of neurons is concerned) is made up of line segments. Statistically, these are more common than other arbitrary patterns of dots. So visual cortex becomes tuned to recognize only these patterns and ignore all the others (at least in this region – it probably represents textures, etc. elsewhere). The brain is thus trying its best, not only to learn the statistical properties and salience of those relatively few points its owner actually visits in the ten-million-dimensional world of experience, but also to represent them in a spatial arrangement that best categorizes and associates them. It does this largely so that we don’t have to learn something all over again, just because the situation is slightly different from last time.

So, finding the best mechanism for projecting n-dimensional space into two or three dimensions, based on the statistics and salience of stimuli, is part of the challenge of designing an artificial brain. That much I think I can do, up to a point, although I won’t trouble you with how, right now.

I will just mention in passing that there’s a dangerous assumption that we should be aware of. The state space of the brain is discrete, because information arrives and leaves via a discrete number of nerve fibers. The medium for representing this state space is also discrete – a hundred billion neurons. HOWEVER, this doesn’t mean the representation itself is discrete. I suspect the real brain is so densely wired that it approximates a continuous medium, and this is important for a whole host of things. It’s probably very wrong to implicitly equate one neuron with one point in the space or one input pattern. Probably the information in the brain is stored holistically, and each neuron makes a contribution to multiple representations, while each representation is smeared across many (maybe very many) neurons. How much I need to, or can afford to, take account of this for such a pragmatic design remains to be seen. It may be an interesting distraction or it may be critical.

Anyway, besides this business of how best to represent the state space of experience, there are other major requirements I need to think about. In Creatures, the norns were reactive – they learned how best to respond to a variety of situations, and when those situations arose in future, this alone would trigger a response. They were thus stimulus-response systems. Yeuch! Nasssty, nassty behaviourist claptrap! Insects might (and only might) work like that, but humans certainly don’t (except in the more ancient parts of our brains). Probably no mammals do, nor birds. We THINK. We have internal states that change over time, even in the absence of external changes. Our thoughts are capable of linking things up in real-time, to create routes and plans and other goal-directed processes. Our “reactions” are really pre-actions – we don’t respond to what’s just happened but to what we believe is about to happen. We can disengage from the world and speculate, hope, fear, create, invent. How the hell do we do this?

Well, the next step up, after self-organizing our representations, is to form associations between them. After that comes dynamics – using these associations to build plans and speculations and to simulate the world around us inside the virtual world of our minds. This post has merely been a prelude to thinking about how we might form associations, how these relate to the underlying representations, what these associations need to be used for, and how we might get some kind of dynamical system out of this, instead of just a reactive one. I just wanted to introduce the notion of state space for those who aren’t used to it, and talk a little about collapsing n-dimensional space into fewer dimensions whilst maximizing utility. Up until now I’ve just been bringing you up to speed. From my next post onward I’ll be feeling my own way forward. Or maybe just clutching at straws…

Brainstorm #3 – cheating doesn’t pay (sometimes)

I was going to write about self-organizing maps and associative links next but I need to make a little detour.

One of the quandaries when making a virtual artificial creature, as opposed to a robot, is how much to cheat. In software, cheating is easy, while simulating physical reality is hard. And I’m writing a video game here – I have severe computational constraints, user expectations (like the need to simulate many creatures simultaneously), and a very limited development schedule. Hmm… So let’s cheat like mad!

Oh, but… For one thing, cheating is cheating. I think Creatures was successful in large part because I was honest. I did my genuine best (given the constraints of technology and knowledge) to make something that was really alive, and this time I plan to be even more tough on myself and do my best to create something that really thinks and might even be conscious.

There’s also an intellectual reason not to cheat more than I can help, though. Cheating doesn’t pay. I’ll walk you through it.

Take vision, for instance. How am I going to handle the creatures’ visual systems? The honest, not-cheating-at-all way would be to attach a virtual camera (or two) to each creature’s head and use the 3D engine to render the scene from the creature’s perspective onto a bitmap. This would act as the creature’s retina, and the neural network would then have to identify objects, etc. from the features in the scene. Well that’s not going to happen. For one thing I can’t afford that much computer power inside a game. For another, it would involve me solving ALL the major challenges of visual science, and even at my most ambitious I can see that’s not going to be feasible between now and next summer.

At the other extreme, in Creatures I simply told the norns the category of the object they were currently looking towards. If they looked towards a ball, the “I can see a toy” neuron was stimulated. If it was a carrot, the “I can see food” neuron lit up. It was the best I could do twenty years ago but it won’t cut it now. So I need something in-between.

But it’s harder than it at first appears. We don’t just use vision for recognizing objects; we use it for locating things in space and for navigating through space. Everyday objects can be treated as simple points, with a given direction and depth from the creature. But a close or large object extends over a wide angle of view. A wall may occupy half the visual field and the objective may be to walk around it. You can’t treat it as a point.

How should my creatures navigate anyway? The obvious way to handle navigation is to use a path-planning algorithm to find a route from here to there, avoiding obstacles. All the information is there in the virtual world for this to happen. Trying to do it from the creature’s own limited sensory information and memory seems like a ridiculous amount of effort that nobody will ever recognize or appreciate.

But here’s the thing:

Relating objects in space involves associations. Forming a mental map of your world is an object lesson in associative memory. Navigating to a target location is remarkably similar to planning, which in turn is remarkably similar to simulating the future, which is the core of conscious experience and the very thing I want to understand and implement. Come to that, navigating is very akin to servoing – reducing the distance between where I am and where I want to be. And for humans at least, this is a nested servo process: To go from the backyard to the shed to get a tool, I need first to go into my kitchen closet and get the key. To get into my kitchen I need to go through the back door, which is in the opposite direction to the shed. Then I have to go to the closet, reach for the key and then backtrack towards the shed. It’s a chain of servo actions and it’s nonlinear (the ultimate goal is reached by first moving away from it). These are precisely the things that I set out in Brainstorm #1 as the features I’m looking for. If I cheated, I might not even have seen the connection between visually-guided navigation and thinking.
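The shed example can be written as a chain of nested subgoals. This is a bare symbolic sketch under my own assumptions: each goal lists the conditions that must be achieved first, the dependencies contain no loops, and the goal names are my paraphrase of the story. It captures the ordering and the "move away first" nonlinearity, not the analog servoing itself.

```python
def plan_steps(goal, needs):
    """Order goals so every prerequisite is achieved before what depends on it."""
    steps = []

    def visit(g):
        if g in steps:
            return
        for sub in needs.get(g, ()):
            visit(sub)          # servo towards the prerequisites first
        steps.append(g)

    visit(goal)
    return steps

# Paraphrase of the backyard story: what each goal needs beforehand.
needs = {
    "fetch tool from shed": ["hold key", "stand at shed"],
    "hold key": ["stand at closet"],
    "stand at closet": ["stand in kitchen"],
    "stand in kitchen": ["go through back door"],
}

for step in plan_steps("fetch tool from shed", needs):
    print(step)
```

The first step printed is the back door, which is the opposite direction from the shed, even though the shed is the only goal that was asked for.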

In the brain, we know that there are “place fields” in the hippocampus (an older, simpler, curly fringe of the cortex). As far as I know, there’s no evidence (and it doesn’t seem awfully likely) that these “points of best representation” (see Brainstorm #2) are arranged geographically. I’ll have to catch up on the latest information, but it seems like these memories of place form a different kind of relationship and I can’t assume the brain navigates using a map in the conventional, geometrical sense. But somehow geographical relationships are encoded in the brain such that it’s possible for us to figure out (far better than any current computer game) how to get from A to B. We’re capable of doing this with both certain knowledge and uncertainty – navigating around moving obstacles, say, or traveling through unfamiliar territory. This is SO similar to goal-directed planning in general. It’s SO similar to predicting possible futures. All that differs is the kind of associative link (“is to the north-east of” instead of “tends to be followed by” or “is like”). There has to be a coherence to all this.

For a brief moment then I imagined a declarative statement written in PROLOG! God no! Please don’t make the brain a forward-chaining planner or an expert system! It’s interesting that the early days of AI were pretty close to the mark in some ways. Thinking IS a bit like deducing that “Mark is Sheila’s husband” from a set of predicates like “Mark is a man”; “Mark has a son called Peter”, “people who share a child are married”, etc. It IS a bit like a probabilistic tree planning algorithm. But these are too abstract, too symbolic, too digital. Navigating through space is an analog process. Reaching out to grab an object is not a discrete, symbolic operation. Being fed a sequence of carefully chosen facts and rules is not the same as learning by experience. And yet…
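For anyone who hasn't met this style of reasoning, here is roughly what the Mark-and-Sheila deduction looks like as a tiny forward-chaining loop. It is a sketch of the old symbolic approach, not a proposal for the brain. The fact that Sheila is also Peter's parent is my addition (the example implies it), and the rule that a married man is a husband is likewise assumed.

```python
def forward_chain(facts, rules):
    """Keep applying rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new in rule(facts):
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def share_child(facts):
    """'People who share a child are married.'"""
    parents = [(s, o) for (r, s, o) in facts if r == "parent_of"]
    return {("married_to", p, q)
            for (p, c) in parents for (q, d) in parents
            if c == d and p != q}

def married_man(facts):
    """Assumed rule: a man who is married to someone is their husband."""
    return {("husband_of", s, o) for (r, s, o) in facts
            if r == "married_to" and ("is_a", s, "man") in facts}

facts = {
    ("is_a", "Mark", "man"),
    ("parent_of", "Mark", "Peter"),
    ("parent_of", "Sheila", "Peter"),   # assumed; implied by the example
}
derived = forward_chain(facts, [share_child, married_man])
print(("husband_of", "Mark", "Sheila") in derived)
```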

The reasons why symbolic AI has failed are many and varied, and I don’t have the space or energy to go into them here. But you can see that the early pioneers were heading in a good direction, thwarted only by some fundamental mistakes and false assumptions about symbol processing and abstraction. It was a fault of the paradigm and tools of both science and mathematics, not intent.

But my point here is that my creatures need to see in a very much more sophisticated way than norns did, and yet a more abstract way than true vision. And I need to find both an external (sensory) and internal (memory) representation that is authentic enough to make visually guided behavior a NATURAL PART OF thinking. The two are so close in concept that they must surely share a mechanism, or at least a set of computational principles. On the one hand this adds new problems – I have to think about navigation, obstacle avoidance, visual binding, retinotopic-to-egocentric conversion, egocentric-to-geographical conversion and a bunch of other things on top of all my other problems. On the other hand, by not cheating (too much, whatever that means) I’m now blessed with a whole new set of symptoms and requirements that give me a better grasp of what must be going on in the brain. It will help me see the central design problem more clearly. This, incidentally, is the reason why we should all be trying to create complete organisms, instead of fooling ourselves that the problem can be divided up and solved piecemeal.

I don’t know the answers to any part of this yet and there will be many future posts on visual representation, etc. But I’m glad I thought about this before starting to think more closely about associative links.

Brainstorm #2

Ye Gods! I’d better get in quickly with a second installment – I’ve already written more words in replies to comments than there were in my first post. Thanks so much to all of you who have contributed comments already – I only posted it yesterday! I really appreciate it and I hope you’ll continue to add thoughts and observations.

Opening up my thought processes like this is a risky and sometimes painful thing to do, and I know from past experience that certain things tend to happen, so I’d like to make a few general observations to forestall any misunderstandings.

Firstly, I know a lot of you have your own ambitions, theories and hopes in this area, and I’ll do what I can to accommodate them or read your papers or whatever. But bear in mind that I can’t please everybody – I have to follow my own path. So if I don’t go in a direction you’d like me to go, I apologize. I’ll try to explain my reasoning but inevitably I’m going to have to make my own choices.

Secondly, I do this kind of work because I believe I have some worthwhile insights already. I’m not desperately looking for ideas or existing theories – the people who invented these ideas are perfectly welcome to write their own games. This is a tricky area, because I like it when someone says “have you thought of doing XXX?” but I’m not so interested in “have you seen YYY theory or ZZZ’s work?” I just don’t work that way – I prefer to think things through from first principles – and I’m writing this game largely to develop my own ideas, rather than with the pragmatic aim of writing a commercial application by bolting together other people’s.

Lastly, I invariably develop software alone. Nobody has offered to help or asked for this to be open source yet, but I know it’s coming. I don’t do collaborations. Collaborations have driven me crazy (and almost bankrupt) in the past. I know there are loads of people who would love to be part of a project like this, but all I can suggest is that you go off together and write one, because it’s not for me. I’m opening it up because I know people find it interesting and I wanted to share the design process, but I’m not interested in working on the actual code with others. It’s just not my thing.

Oh, and I do realize this is ambitious. I know it may not work. But I’m not as naive as I look, either. I’ve written four commercial games and at least a dozen commercial titles in other fields, so I’m pretty competent in terms of software development and product design. And I’ve been working in AI since the late 1970s. Although it’s only my hobby, strictly speaking, I’m pretty well connected with the academic community and conversant with the state of the art. And I have an existence proof in Creatures, as long as you make allowances for the fact that I started writing it almost two decades ago. So don’t worry that I’m unwittingly being foolish and naive – I already know exactly how foolish I am!

Forgive me for saying these things up front – I really welcome and appreciate everybody’s support, thoughts, criticisms and general conversation. I just wanted to state a few ground rules, because it’s quite emotionally taxing to open up your innermost thought processes for inspection, and the provisional nature of everything can sometimes make it look like I’m floundering when really I’m just trucking along steadily.

Ok, so where to next? The features I mentioned yesterday were all aspects I’d like to see emerging from a common architecture. Jason admonished me to make sure I design a hierarchical brain, in which lower levels (equivalent to the thalamus and the brainstem) are fully functioning systems in their own right, and could be the complete brains of simpler animals as well as the evolutionary foundation for higher brain functions. I think this is important and a good point. The reptilian thalamus/limbic system probably works by manipulating more primitive reflexes in the brainstem. The cortex then unquestionably supervenes over the thalamus (for instance if we deliberately wish to look in a particular direction we quite probably do this by sending signals from the cortex (the frontal eye fields) to the superior colliculi of the thalamus, AS IF they were visual stimuli, thus causing the SC to carry out its normal unconscious duty of orienting the eyes towards a sudden movement). And finally, the prefrontal lobes of the cortex seem to supervene over an already functional set of subconscious impulses, motor and perceptual circuits in the rest of cortex, adding planning, the ability to defer reward, empathy and possibly subjective consciousness to the repertoire. So there are good reasons to follow this scheme myself.

But for now I’d like to think mostly about the cortical layer of the system. This is (perhaps) where memory plays the greatest role; where classification, categorization and generalization occur; and where prediction and the ability to generate simulations arise. I can assume that beneath this there are a bunch of reflexes and servoing subsystems that provide the outputs – I’ll worry about how to implement these later. But somehow I need to develop a coherent scheme for recognizing and classifying inputs and associating these with each other, both freely (as in “X reminds me of Y”) and causally (as in “if this is the trajectory that events have been taking, this is what I think will happen next”). Somehow these predictions need to iterate over time, so that the system can see into the future and ask “what if?” questions.

Let’s think about classification first. The ability to classify the world is crucial. It’s insufficient for intelligence, despite the huge number of neural nets, etc. that are nothing but classifier systems, but it’s necessary.

Here’s an assertion: let’s assume that the cortical surface is a map, such that, for any given permutation of sensory inputs, there will be a set of points on the surface that come to best represent that permutation.

It’s a set of points – a pattern – because I’m assuming this is a hierarchical system. If you hear a particular voice, a set of points of activity will light up in primary auditory cortex and elsewhere, representing the frequency spectrum of the voice, the time signature, the location, etc. Some other parts of auditory cortex will contain the best point to represent whose voice it is, based on those earlier points, or which word they just said. Other association areas deeper in the system will contain the points that best represent the combination of that person’s voice with their face, etc. Perhaps way off in the front there will be a point that best represents the entire current context – what’s going on. Other points in motor cortex represent things you might do about it, and they in turn will activate points lower down representing the muscle dispositions needed to carry out this action. So the brain will have a complex pattern of activation, but it’s reasonable to assert (I think) that EACH POINT ON THE CORTICAL SURFACE MAY BEST REPRESENT SOME GIVEN PERMUTATION OF INPUTS (INCLUDING CORTICAL ACTIVITY ELSEWHERE).

The cortex would therefore be a map of the state of the world. This is a neat assumption to work with, because it has several corollaries. For one thing, if the present state of the world is mapped out as such a pattern, then the future state, or the totally imagined state, or the intended state of the world can simultaneously be mapped out on the same real estate (perhaps using different cells in the same cortical columns). Having such a map allows the brain to specify world state in a variety of ways for a variety of reasons: sensation, perception, anticipation, intention, imagination and attention. Each is a kind of layer on the map, and they can be presumed to interact. So, for instance, the present state and recent past states give rise to the anticipated future state, via memories of probability derived from experience. Or attention can be guided by the sensory map and used to filter the perceptual or motor maps.

A second corollary might be that SIMILAR PERMUTATIONS TEND TO BE BEST REPRESENTED BY CLOSE NEIGHBORS. If this is true, then the system can generalize, simply by having some fuzziness in the neural activity pattern. If we experience a novel situation, it will give rise to activity centered over a unique point, but this point is close to other points representing similar, perhaps previously experienced situations. If we know how to react to them, we can guess that this is the best response to the novel situation too, and we can make use of this knowledge simply by stimulating all the points around the novel one.
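
As a rough illustration of this kind of generalization, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the map is a one-dimensional strip of points, the "fuzziness" is a Gaussian bump, and the names bump, generalize and known are hypothetical. It only shows how a novel point could borrow a response from neighbors that already have one.

```python
import math

def bump(center, size, width=1.5):
    """Gaussian dome of activity over a 1-D strip of map points (assumed shape)."""
    return [math.exp(-((i - center) ** 2) / (2 * width ** 2)) for i in range(size)]

def generalize(center, known_responses, size, width=1.5):
    """Guess a response for a novel point as the activity-weighted average
    of the responses already stored at other points on the map."""
    activity = bump(center, size, width)
    total = sum(activity[i] for i in known_responses)
    if total == 0:
        return None  # nothing nearby has ever been learned
    return sum(activity[i] * r for i, r in known_responses.items()) / total

# Hypothetical experience: map point -> learned response
# (+1 means approach, -1 means avoid).
known = {3: 1.0, 4: 0.8, 12: -1.0}

guess_near_approach = generalize(5, known, size=16)   # novel point next to 3 and 4
guess_near_avoid = generalize(11, known, size=16)     # novel point next to 12
```

The novel point at 5 has no stored response of its own, but it inherits something close to the "approach" response of its neighbors at 3 and 4, while the point at 11 inherits "avoid" from 12. Widening the bump makes the guess bolder but less discriminating.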

When I say these are points on the cortical surface, I mean there will be an optimum point for each permutation, but the actual activity will be much more broad. I have a strong feeling that the brain works in a very convolved way – any given input pattern will activate huge swathes of neurons, but some more than others, such that the “center of gravity” of the activity is over the appropriate optimum point. I showed with Lucy that such large domes of activity can be used for both servoing and coordinate transforms (e.g. to orient the eyes and head towards a stimulus depending on where it is in the retinal field – a transform from retinal to head-centered coordinates). Smearing out the activity in this way also permits generalization, as above. But it’s a bummer to think about, because everything’s blurry and holographic!
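
To make the "center of gravity" idea concrete, here is a small sketch, again under assumptions of my own choosing: a one-dimensional strip of cells, a Gaussian dome, and hypothetical function names. It is not the Lucy mechanism, just a demonstration that a broad, blurry dome can still pin down its optimum point more finely than the spacing of the cells.

```python
import math

def dome(optimum, size, width):
    """Broad dome of activity: every cell responds, the nearest ones most."""
    return [math.exp(-((i - optimum) ** 2) / (2 * width ** 2)) for i in range(size)]

def center_of_gravity(activity):
    """Activity-weighted mean position of the whole pattern."""
    total = sum(activity)
    return sum(i * a for i, a in enumerate(activity)) / total

# The optimum lies between cell 20 and cell 21, yet the dome spans most of the strip.
activity = dome(optimum=20.3, size=41, width=6.0)
decoded = center_of_gravity(activity)
active_cells = sum(1 for a in activity if a > 0.1)
```

Here well over half of the 41 cells are noticeably active, and decoded still comes out within a few hundredths of 20.3. The price is that a dome close to the edge of the strip gets clipped, which drags its center of gravity inwards.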

I have some nagging issues about all this but for now I’ll run with it. It’s a neat mechanism, and if biology doesn’t work this way then it damn well ought! It’s a good starting point, anyway. Lots of things fall out of it.

And I already have a mechanism that works for the self-organization of primary visual cortex and may be more generally applicable to this “classification by mapping” scheme. But that, and some questions and observations about categories and the collapse of phase space, can wait for next time!

EDIT: Just a little footnote on veracity: I like to be inspired by biology but this doesn’t mean I follow it slavishly. So if I assert that perhaps the cortex acts like a series of overlaid maps, I’ll have done so because it’s plausible and there’s some supportive evidence. But please remember that this is an engineering project – I’m not saying the cortex DOES work like this; only that it’s reasonably consistent with the facts and provides a useful hunch for designing an artificial brain. It’s a way of inventing, not discovering. So sometimes I say cortex and mean the real thing, and sometimes I’m talking about my hypothetical engineered one. I ought to use inverted commas really, but I hope you’ll infer the distinction.

Brainstorm #1

Ok, here goes…

Life has been rather complicated and exhausting lately. Not all of it bad by any means; some of it really good, but still rather all-consuming. Nevertheless, it really is time that I devoted some effort to my work again. So I’ve started work on a new game (hooray! I hear you say ;-)). I have no idea what the game will consist of yet – just as with Creatures I’m going to create life and then let the life-forms tell me what their story is.

I wasted a lot of time writing Sim-biosis and then abandoning it, but I did learn a lot about 3D in the process. This time I’ve decided to swallow my pride and use a commercial 3D engine – Unity. (By the way, I’m writing for desktop environments – I need too much computer power for iPhone, etc.) Unity is the first 3D engine I’ve come across that supports C#.NET (well, Mono) scripting AND is actually finished and working, not to mention has documentation that gives developers some actual clue about the contents of the API. I have to jury-rig it a bit because most games have only trivial scripts and I need to write very complex neural networks and biochemistries, for which a simple script editor is a bit limiting, but the next version has debug support and hopefully will integrate even better with Visual Studio, allowing me to develop complex algorithms without regressing to the technology of the late 1970’s in order to debug them. So far I’m very impressed with Unity and it seems to be capable of at least most of the weird things that a complex Alife sim needs, as compared to running around shooting things, which is what game engines are designed for.

So, I need a new brain. Not me, you understand – I’ll have to muddle along with the one I was born with. I mean I need to invent a new artificial brain architecture (and eventually a biochemistry and genetics). Nothing else out there even begins to do what I want, and anyway, what’s the point of me going to all this effort if I don’t get to invent new things and do some science? It’s bad enough that I’m leaving the 3D front end to someone else.

I’ve decided to stick my neck out and blog about the process of inventing this new architecture. I’ve barely even thought about it yet – I have many useful observations and hypotheses from my work on the Lucy robots but nothing concrete that would guide me to a complete, practical, intelligent brain for a virtual creature. Mostly I just have a lot more understanding of what not to do, and what is wrong with AI in general. So I’m going to start my thoughts almost from scratch and I’m going to do it in public so that you can all laugh at my silly errors, lack of knowledge and embarrassing back-tracking. On the other hand, maybe you’ll enjoy coming along for the ride and I’m sure many of you will have thoughts, observations and arguments to contribute. I’ll try to blog every few days. None of it will be beautifully thought through and edited – I’m going to try to record my stream of consciousness, although obviously I’m talking to you, not to myself, so it will come out a bit more didactic than it is in my head.

So, where do I start? Maybe a good starting point is to ask what a brain is FOR and what it DOES. Surprisingly few researchers ever bother with those questions and it’s a real handicap, even though skipping them is often a convenient way to avoid staring at a blank sheet of paper in rapidly spiraling anguish.

The first thing to say, perhaps, is that brains are for flexing muscles. They also exude chemicals but predominantly they cause muscles to contract. It may seem silly to mention this but it’s surprisingly easy to forget. Muscles are analog, dynamical devices whose properties depend on the physics of the body. In a simulation, practicality overrules authenticity, so if I want my creatures to speak, for example, they’ll have to do so by sending ASCII strings to a speech synthesizer, not by flexing their vocal cords, adjusting their tongue and compressing their lungs. But it’s still important to keep in mind that the currency of brains, as far as their output is concerned, is muscle contraction. It’s the language that brains speak. Any hints I can derive from nature need to be seen in this light.

One consequence of this is that most “decisions” a creature makes are analog; questions of how much to do something, rather than what to do. Even high-level decisions of the kind, “today I will conscientiously avoid doing my laundry”, are more fuzzy and fluid than, say, the literature on action selection networks would have us believe. Where the brain does select actions it seems to do so according to mutual exclusion: I can rub my stomach and pat my head at the same time but I can’t walk in two different directions at once. This doesn’t mean that the rest of my brain is of one mind about things, just that my basal ganglia know not to permit all permutations of desire. An artificial lifeform will have to support multiple goals, simultaneous actions and contingent changes of mind, and my model needs to allow for that. Winner-takes-all networks won’t really cut it.
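
One way to picture selection by mutual exclusion, as opposed to winner-takes-all, is sketched below. This is purely illustrative: the greedy arbitration rule, the idea of tagging each desire with the effectors it needs, and all the names are assumptions for the example, not a claim about how basal ganglia do it.

```python
def select_actions(desires):
    """desires: list of (name, drive, effectors_needed).
    Grants every desire whose effectors are still free, strongest drive first,
    so several actions can run at once as long as they do not collide."""
    busy = set()
    chosen = []
    for name, drive, effectors in sorted(desires, key=lambda d: d[1], reverse=True):
        if drive > 0 and not (effectors & busy):
            chosen.append((name, drive))  # the drive stays analog: "how much", not just "what"
            busy |= effectors
    return chosen

# Hypothetical competing desires.
desires = [
    ("walk_north", 0.7, {"legs"}),
    ("walk_east", 0.4, {"legs"}),
    ("rub_stomach", 0.5, {"left_arm"}),
    ("pat_head", 0.3, {"right_arm"}),
]
chosen = select_actions(desires)
chosen_names = [name for name, _ in chosen]
```

With these numbers the creature walks north while rubbing its stomach and patting its head; only walk_east is suppressed, because it wants the legs that walk_north already holds. A winner-takes-all network would have thrown away everything except walk_north.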

Muscles tend to be servo-driven. That is, something inputs a desired state of tension or length and then a small reflex arc or more complex circuit tries to minimize the difference between the muscle’s current state and this desired state. This is a two-way process – if the desire changes, the system will adapt to bring the muscle into line; if the world changes (e.g. the cat jumps out of your hands unexpectedly) then the system will still respond to bring things back into line with the unchanged goal. Many of our muscles control posture, and movement is caused by making adjustments to these already dynamic, homeostatic, feedback loops. Since I want my creatures to look and behave realistically, I think I should try to incorporate this dynamism into their own musculature, where possible, as opposed to simply moving joints to a given angle.
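
The two-way nature of a servo can be shown with a very small sketch. It assumes the simplest possible controller (a proportional one with a hypothetical gain of 0.3) and a muscle described by a single number, which is far cruder than real musculature, but it shows the same loop coping with a change of desire and a change in the world.

```python
def servo_step(current, desired, gain=0.3):
    """Move a fraction of the way towards the desired state (proportional control)."""
    return current + gain * (desired - current)

def run(current, desired_at, disturbance_at, steps):
    """desired_at: function of time giving the goal.
    disturbance_at: dict of time -> sudden push from the outside world."""
    history = []
    for t in range(steps):
        current += disturbance_at.get(t, 0.0)   # the world changes under us
        current = servo_step(current, desired_at(t))
        history.append(current)
    return history

# The goal changes at t=20 (the desire changed);
# the muscle is knocked off target at t=40 (the cat jumped).
history = run(
    current=0.0,
    desired_at=lambda t: 1.0 if t < 20 else 0.5,
    disturbance_at={40: 0.4},
    steps=60,
)
```

The history settles at 1.0, follows the goal down to 0.5, is thrown off at t=40 and settles back to 0.5 again, without the loop ever being told which kind of change happened. Movement here is just what you see while the difference is being closed.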

But this notion of servoing extends further into the brain, as I tried to explain in my Lucy book. Just about ALL behavior can be thought of as servo action – trying to minimize the differential between a desired state and a present state. “I’m hungry, therefore I’ll phone out for pizza, which will bring my hunger back down to its desired state of zero” is just the topmost level in a consequent flurry of feedback, as phoning out for pizza itself demands controlled arm movements to bring the phone to a desired position, or lift one’s body off the couch, or move a tip towards the delivery man. It’s not only motor actions that can be viewed in this light, either. Where the motor system tries to minimize the difference between an intended state and the present state by causing actions in the world, the sensory system tries to minimize the difference between the present state and the anticipated state, by causing actions in the brain. The brain seems to run a simulation of reality that enables it to predict future states (in a fuzzy and fluid way), and this simulation needs to be kept in train with reality at several contextual levels. It, too, is reminiscent of a battery of linked servomotors, and there’s that bidirectionality again. With my Lucy project I kept seeing parallels here, and I’d like to incorporate some of these ideas into my new creatures.

This brings up the subject of thinking. When I created my Norns I used a stimulus-response approach: they sensed a change in their environment and reacted to it. The vast bulk of connectionist AI takes this approach, but it’s not really very satisfying as a description of animal behavior beyond the sea-slug level. Brains are there to PREDICT THE FUTURE. It takes too long for a heavy animal with long nerve pathways to respond to what’s just happened (“Ooh, maybe I shouldn’t have walked off this cliff”), so we seem to run a simulation of what’s likely to happen next (where “next” implies several timescales at different levels of abstraction). At primitive levels this seems pretty hard-wired and inflexible, but at more abstract levels we seem to predict further into the future when we have the luxury, and make earlier but riskier decisions when time is of the essence, so that means the system is capable of iterating. This is interesting and challenging.

Thinking often (if not always) implies running a simulation of the world forwards in time to see what will happen if… When we make plans we’re extrapolating from some known future towards a more distant and uncertain one in pursuit of a goal. When we’re being inventive we’re simulating potential futures, sometimes involving analogies rather than literal facts, to see what will happen. When we reflect on our past, we run a simulation of what happened, and how it might have been different if we’d made other choices. We have an internal narrative that tracks our present context and tries to stay a little ahead of the game. In the absence of demands, this narrative can flow unhindered and we daydream or become creative. As far as I can see, this ability to construct a narrative and to let it freewheel in the absence of sensory input is a crucial element of consciousness. Without the ability to think, we are not conscious. Whether this ability is enough to constitute conscious awareness all by itself is a sticky problem that I may come back to, but I’d like my new creatures actively to think, not just react.

And talking about analogies brings up categorization and generalization. We classify our world, and we do it in quite sophisticated ways. As a baby we start out with very few categories – perhaps things to cry about and things to grab/suck. And then we learn to divide this space up into finer and finer, more and more conditional categories, each of which provokes finer and finer responses. That metaphor of “dividing up” may be very apposite, because spatial maps of categories would be one way to permit generalization. If we cluster our neural representation of patterns, such that similar patterns lie close to each other, then once we know how to react to (or what to make of) one of those patterns, we can make a statistically reasonable hunch about how to react to a novel but similar pattern, simply by stimulating its neighbors. There are hints that such a process occurs in the brain at several levels, and generalization and the ability to predict future consequences are both hallmarks of intelligence.

So there we go. It’s a start. I want to build a creature that can think, by forming a simulation of the world in its head, which it can iterate as far as the current situation permits, and disengage from reality when nothing urgent is going on. I’d like this predictive power to emerge from shorter chains of association, which themselves are mapped upon self-organized categories. I’d like this system to be fuzzy, so that it can generalize from similar experiences and perhaps even form analogies and metaphors that allow it to be inventive, and so that it can see into the future in a statistical way – the most likely future state being the most active, but less likely scenarios being represented too, so that contingencies can be catered for and the Frame Problem goes away (see my discussion of this in the comments section of an article by Peter Hankins). And I’d like to incorporate the notion of multi-level servomechanisms into this, such that the ultimate goals of the creature are fixed (zero hunger, zero fear, perfect temperature, etc.) and the brain is constantly responding homeostatically (and yet predictively and ballistically) in order to reduce the difference between the present state and this desired state (through sequences of actions and other adjustments that are themselves servoing).
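
A toy version of "seeing into the future in a statistical way" might look like the sketch below. It swaps in a plain first-order transition table counted from experience, which is an assumption for the sake of the example and much simpler than the architecture being groped towards here; the state names and function names are hypothetical too. The point is only that short "this tends to follow that" associations can be iterated, and that less likely futures stay represented alongside the most likely one.

```python
from collections import Counter, defaultdict

def learn_transitions(sequence):
    """Count how often each state was followed by each other state."""
    counts = defaultdict(Counter)
    for here, there in zip(sequence, sequence[1:]):
        counts[here][there] += 1
    return counts

def step(belief, counts):
    """Push a graded belief over states one step into the future."""
    nxt = defaultdict(float)
    for state, p in belief.items():
        total = sum(counts[state].values())
        if total == 0:
            nxt[state] += p  # no experience of what follows: assume nothing changes
            continue
        for follower, n in counts[state].items():
            nxt[follower] += p * n / total
    return dict(nxt)

def look_ahead(state, counts, horizon):
    """Iterate the prediction as far as the situation permits."""
    belief = {state: 1.0}
    for _ in range(horizon):
        belief = step(belief, counts)
    return belief

# Hypothetical stream of experience.
experience = ["clouds", "rain", "puddles", "clouds", "rain", "puddles",
              "clouds", "sun", "clouds", "rain", "puddles"]
counts = learn_transitions(experience)
one_ahead = look_ahead("clouds", counts, 1)
two_ahead = look_ahead("clouds", counts, 2)
```

Starting from clouds, one step ahead gives rain the most activity (0.75) while sun stays faintly active (0.25); two steps ahead gives puddles 0.75 and clouds 0.25. A larger horizon looks further at the cost of more work, which is the trade between deliberating and deciding early.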

Oh, and then there’s a bunch of questions about perception. In my Lucy project I was very interested in, but failed miserably to conquer, the question of sensory invariance (e.g. the ability to recognize a banana from any angle, distance and position, or at least a wide variety of them). Invariance may be bound up with categorization. This is a big but important challenge. However, I may not have to worry about it, because I doubt my creatures are going to see or feel or hear in the natural sense. The available computer power will almost certainly preclude this and I’ll have to cheat with perception, just to make it feasible at all. That’s an issue for another day – how to make virtual sensory information work in a way that is computationally feasible but doesn’t severely limit or artificially aid the creatures.

Oh yes, and it’s got to learn. All this structure has to self-organize in response to experience. The learning must be unsupervised (nothing can tell it what the “right answer” was, for it to compare its progress) and realtime (no separate training sessions, just non-stop experience of and interaction with the world).
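
For a feel of what unsupervised, realtime learning can mean in practice, here is a minimal sketch borrowed from Kohonen-style self-organizing maps. To be clear, this is a swapped-in textbook technique, not the mechanism this project will end up using; the class name TinyMap, the one-dimensional map, the single-number inputs and the rate and width values are all assumptions. Each experience nudges the best-matching point and its neighbors, with no teacher and no separate training phase.

```python
import math

class TinyMap:
    """A strip of map points, each holding the input value it best represents."""

    def __init__(self, weights):
        self.weights = list(weights)

    def best_unit(self, x):
        """Index of the point whose stored value is closest to the input."""
        return min(range(len(self.weights)), key=lambda i: abs(self.weights[i] - x))

    def experience(self, x, rate=0.1, width=1.0):
        """Learn from one input as it arrives: pull the winner towards it,
        and pull the winner's neighbors along by a smaller amount."""
        winner = self.best_unit(x)
        for i, w in enumerate(self.weights):
            influence = math.exp(-((i - winner) ** 2) / (2 * width ** 2))
            self.weights[i] = w + rate * influence * (x - w)
        return winner

# One experience on a hypothetical three-point map.
tiny = TinyMap([0.0, 0.5, 1.0])
winner = tiny.experience(0.4, rate=0.5, width=1.0)

# Non-stop experience: inputs arrive one at a time and are never stored.
stream_map = TinyMap([0.5, 0.2, 0.9, 0.4])
for x in [0.1, 0.8, 0.15, 0.75, 0.3, 0.9, 0.05, 0.6]:
    stream_map.experience(x)
```

Because neighbors are dragged along with the winner, nearby points tend to end up representing similar inputs, which is the property the generalization trick relies on. Whether something this simple scales to a whole brain is exactly the open question.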

Oh man, and I’d like for there to be the ability for simple culture and cooperation to emerge, which implies language and thus the transfer of thoughts, experience and intentions from one creature to another. And what about learning by example? Empathy and theory of mind? The ability to manipulate the environment by building things? OK, STOP! That’s enough to be going on with!

A shopping list is easy. Figuring out how to actually do it is going to be a little trickier. Figuring out how to do it in realtime, when the virtual world contains dozens of creatures and the graphics engine is taking up most of the CPU cycles, is not all that much of a picnic either. But heck, computers are a thousand times faster than they were when I invented the Norns. There’s hope!