Brainstorm 5: joining up the dots

I promised myself I’d blog about my thoughts, even if I don’t really have any and keep going round in circles. Partly I just want to document the creative process honestly – so this includes the inevitable days when things aren’t coming together – and partly it helps me if I try to explain things to people. So permit me to ramble incoherently for a while.

I’m trying to think about associations. In one sense the stuff I’ve already talked about is associative: a line segment is an association between a certain set of pixels. A cortical map that recognizes faces probably does so by associating facial features and their relative positions. I’m assuming that each of these things is then denoted by a specific point in space on the real estate of the brain – oriented lines in V1 and faces in the FFA. In both these cases there are several features at one level, which are associated and brought together at a higher level. A bunch of dots maketh one line. Two dark blobs and a line in the right arrangement maketh a face. A common assumption (which may not be true) is that neurons do this explicitly: the dendritic field of a visual neuron might synapse onto a particular pattern of LGN fibres carrying retinal pixel data. When this pattern of pixels becomes active, the neuron fires. That specific neuron – that point on the self-organizing map – therefore means “I can see a line at 45 degrees in this part of the visual field.”
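
The "line-detector neuron" hypothesis can be caricatured in a few lines of code: a weighted sum over a preferred pixel pattern, with a firing threshold. This is purely a toy illustration — the 3×3 patch, the weights and the threshold are all invented, and real V1 circuitry is of course far messier:

```python
# Toy "line-detector neuron": fires when a specific pixel pattern is active.
# The 3x3 patch, weights and threshold are illustrative choices only.

def make_detector(pattern, threshold=0.9):
    """Return a detector that fires if the input overlaps the pattern enough."""
    total = sum(pattern)
    def detect(pixels):
        # Normalized dot product between input pixels and the preferred pattern.
        overlap = sum(p * w for p, w in zip(pixels, pattern)) / total
        return overlap >= threshold
    return detect

# A 45-degree line on a 3x3 patch (anti-diagonal), flattened row by row.
diagonal = [0, 0, 1,
            0, 1, 0,
            1, 0, 0]
detect_45 = make_detector(diagonal)

print(detect_45(diagonal))                     # the preferred stimulus fires it
print(detect_45([1, 1, 1, 0, 0, 0, 0, 0, 0]))  # a horizontal line does not
```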

But the brain also supports many other kinds of associative link. Seeing a fir tree makes me think of Christmas, for instance. So does smelling cooked turkey. Is there a neuron that represents Christmas, which synapses onto neurons representing fir trees and turkeys? Perhaps, perhaps not. There isn’t an obvious shift in levels of representation here.

Not only do turkeys make me think of Christmas, but Christmas makes me think of turkeys. That implies a bidirectional link. Such a thing may actually be a general feature, despite the unidirectional implication of the “line-detector neuron” hypothesis. If I imagine a line at 45 degrees, this isn’t just an abstract concept or symbol in my mind. I can actually see the line. I can trace it with my finger. If I imagine a fir tree I can see that too. So in all likelihood, the entire abstraction process is bidirectional and thus features can be reconstructed top-down, as well as percepts being constructed/recognized bottom-up.
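
At the symbolic level, at least, that bidirectionality is trivial to mimic: a store in which linking A to B automatically links B to A. A minimal sketch — the items are just labels, and this says nothing about how neurons would do it:

```python
from collections import defaultdict

# Minimal bidirectional association store: linking A to B also links B to A.
# Purely illustrative of the symmetry, not a neural model.

class Associations:
    def __init__(self):
        self.links = defaultdict(set)

    def associate(self, a, b):
        self.links[a].add(b)
        self.links[b].add(a)   # the link works in both directions

    def recall(self, cue):
        return self.links[cue]

mind = Associations()
mind.associate("fir tree", "Christmas")
mind.associate("turkey", "Christmas")

print(mind.recall("Christmas"))   # {'fir tree', 'turkey'}
print(mind.recall("turkey"))      # {'Christmas'}
```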

But even so, loose associations like “red reminds me of danger” don’t sound like the same sort of association as “these dots form a line”. A line has a name – it’s a 45-degree line at position x,y – but what would you call the concept that red reminds me of danger? It’s just an association, not a thing. There’s no higher-level concept for which “red” and “danger” are its characteristic features. It’s just a nameless fact.

How about a melody? I know hundreds of tunes, and the interesting thing is, they’re all made from the same set of notes. The features aren’t what define a melody, it’s the temporal sequence of those features; how they’re associated through time. Certainly we can’t imagine there being a neuron that represents “Auld Lang Syne”, whose dendrites synapse onto our auditory cortex’s representations of the different pitches that are contained in the tune. The melody is a set of associations with a distinct sequence and a set of time intervals. If someone starts playing the tune and then stops in the middle I’ll be troubled, because I’m anticipating the next note and it fails to arrive. Come to that, there’s a piano piece by Rick Wakeman that ends in a glissando, and Wakeman doesn’t quite hit the last note. It drives me nuts, and yet how do I even know there should be another note? I’m inferring it from the structure. Interestingly, someone could play a phrase from the middle of “Auld Lang Syne” and I’d still be able to recognize it. Perhaps the tune is represented by many overlapping short pitch sequences? But if so, then this cluster of representations is collectively associated with its title and acts as a unified whole.
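
The "many overlapping short pitch sequences" idea can be sketched with n-grams: store a tune as its overlapping three-note fragments, and a phrase from the middle still matches. The scale degrees below are only a rough stand-in for the actual tune:

```python
# Sketch: represent a tune as overlapping short pitch sequences (n-grams),
# so a phrase from the middle can still be recognized. The numbers are
# approximate scale degrees, purely for illustration.

def ngrams(seq, n=3):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

auld_lang_syne = [1, 4, 3, 4, 6, 5, 4, 5, 6, 5, 4, 4, 6, 8, 9]

tunes = {"Auld Lang Syne": set(ngrams(auld_lang_syne))}

def recognize(phrase):
    """Name the tune sharing the most overlapping fragments with the phrase."""
    heard = set(ngrams(phrase))
    scores = {name: len(heard & frags) for name, frags in tunes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# A phrase taken from the middle of the tune is still recognized.
print(recognize([6, 5, 4, 5, 6, 5]))   # 'Auld Lang Syne'
```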

Thinking about anticipating the next note in a tune reminds me of my primary goal: a representation that’s capable of simulating the world by assembling predictions. State A usually leads to state B, so if I imagine state A, state B will come to mind next and I’ll have a sense of personal narrative. I’ll be able to plan, speculate, tell myself stories, relive a past event, relive it as if I’d said something wittier at the time, etc. Predictions are a kind of association too, but between what? A moving 45-degree line at one spot on the retina tends to lead to the sensation of a 45-degree line at another spot, shortly afterwards. That’s a predictive association and it’s easy to imagine how such a thing can become encoded in the brain. But turkeys don’t lead to Christmas. More general predictions arise out of situations, not objects. If you see a turkey and a butcher, and catch a glint in the butcher’s eye, then you can probably make a prediction, but what are the rules that are encoded here? What kind of representation are we dealing with?
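
The bare bones of "state A usually leads to state B" can be sketched as learned transition counts: observe which state follows which, then predict the most frequent successor. The states here are arbitrary labels I've invented, not a claim about how situations are actually encoded:

```python
from collections import defaultdict, Counter

# Sketch of predictive association as learned transitions: "state A usually
# leads to state B". The state labels are invented for illustration.

class Predictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def observe(self, state, next_state):
        self.transitions[state][next_state] += 1

    def predict(self, state):
        """Most frequently observed successor of this state."""
        following = self.transitions[state]
        return following.most_common(1)[0][0] if following else None

p = Predictor()
episodes = [["see butcher", "see glint", "turkey in trouble"],
            ["see butcher", "see glint", "turkey in trouble"],
            ["see butcher", "no glint", "turkey safe"]]
for episode in episodes:
    for a, b in zip(episode, episode[1:]):
        p.observe(a, b)

print(p.predict("see glint"))   # 'turkey in trouble'
```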

“Going to the dentist hurts” is another kind of association. “I love that woman” is of a similar kind. These are affective associations and all the evidence shows that they’re very important, not only for the formation of memories (which form more quickly and thoroughly when there’s some emotional content), but also for the creation of goal-directed behavior. We tend to seek pleasure and avoid pain (and by the time we’re grown up, most of us can even withstand a little pain in the expectation of a future reward).

A plan is the predictive association of events and situations, leading from a known starting point to a desired goal, taking into account the reward and punishment (as defined by affective associations) along the route. So now we have two kinds of association that interact!
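
That interaction can be sketched as a search through predictive links, where affective associations supply the costs along each route. Everything below — the states, the links, the pain values — is invented for illustration; it's just a least-pain path search, not a model of real planning:

```python
import heapq

# Sketch: a plan as a chain of predictive links (state -> state), choosing
# the route that minimizes pain along the way. States and costs are invented.

links = {
    "hungry": [("hunt wild turkey", 8), ("phone for pizza", 1)],
    "hunt wild turkey": [("fed", 0)],
    "phone for pizza": [("fed", 0)],
}

def plan(start, goal):
    """Least-pain route from start to goal through the predictive links."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        pain, state, route = heapq.heappop(frontier)
        if state == goal:
            return route, pain
        if state in seen:
            continue
        seen.add(state)
        for nxt, cost in links.get(state, []):
            heapq.heappush(frontier, (pain + cost, nxt, route + [nxt]))
    return None, None

route, pain = plan("hungry", "fed")
print(route)   # ['hungry', 'phone for pizza', 'fed']
```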

To some extent I can see that the meaning of an associative link is determined by what kind of thing it is linking. The links themselves may not be qualitatively different – it’s just the context. Affective associations link memories (often episodic ones) with the emotional centers of the brain (e.g. the amygdala). Objects can be linked to actions (a hammer is associated with a particular arm movement). Situations predict consequences. Cognitive maps link objects with their locations. Linguistic areas link objects, actions and emotions with nouns, verbs and adjectives/adverbs. But there do seem to be some questions about the nature of these links and to what extent they differ in terms of circuitry.

Then there’s the question of temporary associations. And deliberate associations. Remembering where I left my car keys is not the same as recording the fact that divorce is unpleasant. The latter is a semantic memory and the former is episodic, or at least declarative. Tomorrow I’ll put my car keys down somewhere else, and that will form a new association. The old one may still be there, in some vague sense, and I may one day develop a sense of where I usually leave my keys, but in general these associations are transient (and all too easily forgotten).

Binding is a form of temporary association. That ball is green; there’s a person to my right; the cup is on the table.

And attention is closely connected with the formation or heightening of associations. For instance, in Creatures I had a concept called “IT”. “IT” was the object currently being attended to, so if a norn shifted its attention, “IT” would change, and if the norn decided to “pick IT up”, the verb knew which noun to apply to. In a more sophisticated artificial brain, this idea has to be more comprehensive. We may need two or more ITs, to form the subject and object of an action. We need to remember where IT is, in various coordinate frames, so that we can reach out and grab IT or look towards IT or run away from IT. We need to know how big IT is, what color IT is, who IT belongs to, etc. These are all associations.
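
The "IT" mechanism, generalized to multiple slots, amounts to a set of binding registers that verbs consult for their arguments. A toy sketch (slot names and objects are invented, and this is nothing like the Creatures implementation, just the shape of the idea):

```python
# Sketch of generalized "IT" binding: attention slots that actions consult
# for their arguments. Slot names and objects are invented for illustration.

attention = {"subject": None, "object": None}

def attend(slot, thing):
    attention[slot] = thing

def pick_up():
    """The verb doesn't name its noun; it reads whatever is bound to IT."""
    return f"pick up the {attention['object']}"

attend("object", "ball")
print(pick_up())             # 'pick up the ball'

attend("object", "carrot")   # attention shifts, so "IT" changes
print(pick_up())             # 'pick up the carrot'
```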

Perhaps there are large-scale functional associations, too. In other words, data from one space can be associated with another space temporarily to perform some function. What came to mind that made me think of this is the possibility that we have specialized cortical machinery for rotating images, perhaps developed for a specific purpose, and yet I can choose, any time I like, to rotate an image of a car, or a cat, or my apartment. If I imagine my apartment from above, I’m using some kind of machinery to manipulate a particular set of data points (after all, I’ve never seen my apartment from above, so this isn’t memory). Now I’m imagining my own body from above – I surely can’t have another machine for rotating bodies, so somehow I’m routing information about the layout of my apartment or the shape of my body through to a piece of machinery (which, incidentally, is likely to be cortical and hence will have self-organized using the same rules that created the representation of my apartment and the ability to type these words). Routing signals from one place to another is another kind of association.

Language is interesting (I realize that’s a bit of an understatement!). I don’t believe the Chomskyan idea that grammar is hard-wired into the brain. I think that’s missing the point. I prefer the perspective that the brain is wired to think, and grammar is a reflection of how the brain thinks. [noun][verb][noun] seems to be a fundamental component of thought. “Janet likes John.” “John is a boy.” “John pokes Janet with a stick.” Objects are associated with each other via actions, and both the objects and actions can be modulated (linguistically, adverbs modulate actions; adjectives modify or specify objects). At some level all thought has this structure, and language just reflects that (and allows us to transfer thoughts from one brain to another). But the level at which this happens can be very far removed from that of discrete symbols and simple associations. Many predictions can be couched in linguistic terms: IF [he] [is threatening] [me] AND [I][run away from][him] THEN [I][will be][safe]. IF [I][am approaching][an obstacle]AND NOT ([I][turn]) THEN [I][hurt]. But other predictions are much more fluid and continuous: In my head I’m imagining water flowing over a waterfall, turning a waterwheel, which turns a shaft, which grinds flour between two millstones. I can see this happening – it’s not just a symbolic statement. I can feel the forces; I can hear the sound; I can imagine what will happen if the water flow gets too strong and the shaft snaps. Symbolic representations and simple linear associations won’t cut it to encode such predictive power. I have a real model of the laws of physics in my head, and can apply it to objects I’ve never even seen before, then imagine consequences that are accurate, visual and dynamic. So at one level, grammar is a good model for many kinds of association, including predictive associations, but at another it’s not. 
Are these the same processes – the same basic mechanism – just operating at different levels of abstraction, or are they different mechanisms?
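
At the discrete-symbol end of that spectrum, the [noun][verb][noun] predictions are easy to sketch as condition–consequence rules over triples. This toy version keeps only positive conditions (dropping the NOT from the obstacle example for brevity), and the rule format is my own invention:

```python
# Sketch: predictions couched as [noun][verb][noun] triples with conditions.
# The rule format is invented; NOT-conditions are omitted for brevity.

rules = [
    # (conditions that must all hold, predicted consequence)
    ([("he", "is threatening", "me"), ("I", "run away from", "him")],
     ("I", "will be", "safe")),
    ([("I", "am approaching", "an obstacle")],
     ("I", "hurt", "")),
]

def predict(situation):
    """Fire every rule whose conditions are all present in the situation."""
    return [then for conds, then in rules if all(c in situation for c in conds)]

now = {("he", "is threatening", "me"), ("I", "run away from", "him")}
print(predict(now))   # [('I', 'will be', 'safe')]
```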

These predictions are conditional. In the linguistic examples above, there’s always an IF and a set of conditionals. In the more fluid example of the imaginary waterfall, there are mathematical functions being expressed, and since a function has dependent variables, this is a conditional concept too. High-level motor actions are also conditional: walking consists of a sequence of associations between primitive actions, modulated by feedback and linked by conditional constructs such as “do until” or “do while”.

So, associations can be formed and broken, switched on and off, made dependent on other associations, apply specifically or broadly, embody sequence and timing and probability, form categories and hierarchies or link things without implying a unifying concept. They can implement rules and laws as well as facts. They may or may not be commutative. They can be manipulated top-down or formed bottom-up… SOMEHOW all this needs to be incorporated into a coherent scheme. I don’t need to understand how the entire human brain works – I’m just trying to create a highly simplified animal-like brain for a computer game. But brains do some impressive things (nine-tenths of which most AI researchers and philosophers forget about when they’re coming up with new theories). I need to find a representation and a set of mechanisms for defining associations that have many of these properties, so that my creatures can imagine possible futures, plan their day, get from A to B and generalize from past experiences. So far I don’t have any great ideas for a coherent and elegant scheme, but at least I have a list of requirements, now.

I think the next thing to do is think more about the kinds of representation I need – how best to represent and compute things like where the creature is in space, what kind of situation it is in, what the properties of objects are, how actions are performed. Even though I’d like most of this to emerge spontaneously, I should at least second-guess it to see what we might be dealing with. If I lay out a map of the perceptual and motor world, maybe the links between points on this map (representing the various kinds of associations) will start to make sense.

Or I could go for a run. Yes, I like that thought better.

Brainstorm #1

Ok, here goes…

Life has been rather complicated and exhausting lately. Not all of it bad by any means; some of it really good, but still rather all-consuming. Nevertheless, it really is time that I devoted some effort to my work again. So I’ve started work on a new game (hooray! I hear you say ;-)). I have no idea what the game will consist of yet – just as with Creatures I’m going to create life and then let the life-forms tell me what their story is.

I wasted a lot of time writing Sim-biosis and then abandoning it, but I did learn a lot about 3D in the process. This time I’ve decided to swallow my pride and use a commercial 3D engine – Unity. (By the way, I’m writing for desktop environments – I need too much computer power for iPhone, etc.) Unity is the first 3D engine I’ve come across that supports C#.NET (well, Mono) scripting AND is actually finished and working, not to mention has documentation that gives developers some actual clue about the contents of the API. I have to jury-rig it a bit because most games have only trivial scripts and I need to write very complex neural networks and biochemistries, for which a simple script editor is a bit limiting, but the next version has debug support and hopefully will integrate even better with Visual Studio, allowing me to develop complex algorithms without regressing to the technology of the late 1970s in order to debug them. So far I’m very impressed with Unity and it seems to be capable of at least most of the weird things that a complex Alife sim needs, as compared to running around shooting things, which is what game engines are designed for.

So, I need a new brain. Not me, you understand – I’ll have to muddle along with the one I was born with. I mean I need to invent a new artificial brain architecture (and eventually a biochemistry and genetics). Nothing else out there even begins to do what I want, and anyway, what’s the point of me going to all this effort if I don’t get to invent new things and do some science? It’s bad enough that I’m leaving the 3D front end to someone else.

I’ve decided to stick my neck out and blog about the process of inventing this new architecture. I’ve barely even thought about it yet – I have many useful observations and hypotheses from my work on the Lucy robots but nothing concrete that would guide me to a complete, practical, intelligent brain for a virtual creature. Mostly I just have a lot more understanding of what not to do, and what is wrong with AI in general. So I’m going to start my thoughts almost from scratch and I’m going to do it in public so that you can all laugh at my silly errors, lack of knowledge and embarrassing back-tracking. On the other hand, maybe you’ll enjoy coming along for the ride and I’m sure many of you will have thoughts, observations and arguments to contribute. I’ll try to blog every few days. None of it will be beautifully thought through and edited – I’m going to try to record my stream of consciousness, although obviously I’m talking to you, not to myself, so it will come out a bit more didactic than it is in my head.

So, where do I start? Maybe a good starting point is to ask what a brain is FOR and what it DOES. Surprisingly few researchers ever bother with those questions and it’s a real handicap, even though skipping it is often a convenient way to avoid staring at a blank sheet of paper in rapidly spiraling anguish.

The first thing to say, perhaps, is that brains are for flexing muscles. They also exude chemicals but predominantly they cause muscles to contract. It may seem silly to mention this but it’s surprisingly easy to forget. Muscles are analog, dynamical devices whose properties depend on the physics of the body. In a simulation, practicality overrules authenticity, so if I want my creatures to speak, for example, they’ll have to do so by sending ASCII strings to a speech synthesizer, not by flexing their vocal cords, adjusting their tongue and compressing their lungs. But it’s still important to keep in mind that the currency of brains, as far as their output is concerned, is muscle contraction. It’s the language that brains speak. Any hints I can derive from nature need to be seen in this light.

One consequence of this is that most “decisions” a creature makes are analog; questions of how much to do something, rather than what to do. Even high-level decisions of the kind, “today I will conscientiously avoid doing my laundry”, are more fuzzy and fluid than, say, the literature on action selection networks would have us believe. Where the brain does select actions it seems to do so according to mutual exclusion: I can rub my stomach and pat my head at the same time but I can’t walk in two different directions at once. This doesn’t mean that the rest of my brain is of one mind about things, just that my basal ganglia know not to permit all permutations of desire. An artificial lifeform will have to support multiple goals, simultaneous actions and contingent changes of mind, and my model needs to allow for that. Winner-takes-all networks won’t really cut it.
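
The "mutual exclusion rather than winner-takes-all" idea can be sketched directly: desires compete only within their exclusion group (locomotion, say), so compatible actions run simultaneously. Groups and desire strengths here are invented for illustration:

```python
# Sketch of action selection by mutual exclusion rather than winner-takes-all:
# desires compete only within their exclusion group, so compatible actions
# can run at the same time. Groups and strengths are invented.

def select_actions(desires, groups):
    """Pick the strongest desire in each exclusion group; run them all at once."""
    chosen = []
    for group in groups:
        candidates = {a: s for a, s in desires.items() if a in group}
        if candidates:
            chosen.append(max(candidates, key=candidates.get))
    return chosen

groups = [
    {"walk north", "walk south"},   # can't walk in two directions at once
    {"rub stomach"},
    {"pat head"},
]
desires = {"walk north": 0.7, "walk south": 0.4,
           "rub stomach": 0.6, "pat head": 0.5}

print(sorted(select_actions(desires, groups)))
# ['pat head', 'rub stomach', 'walk north']
```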

Muscles tend to be servo-driven. That is, something inputs a desired state of tension or length and then a small reflex arc or more complex circuit tries to minimize the difference between the muscle’s current state and this desired state. This is a two-way process – if the desire changes, the system will adapt to bring the muscle into line; if the world changes (e.g. the cat jumps out of your hands unexpectedly) then the system will still respond to bring things back into line with the unchanged goal. Many of our muscles control posture, and movement is caused by making adjustments to these already dynamic, homeostatic, feedback loops. Since I want my creatures to look and behave realistically, I think I should try to incorporate this dynamism into their own musculature, where possible, as opposed to simply moving joints to a given angle.
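
A servo of this kind is just a feedback loop nudging the current state toward the desired one. A toy proportional controller shows the two-way property: the same loop that reaches the goal also recovers when the world disturbs it. Gain and lengths are arbitrary numbers, and real muscle control is far richer than this:

```python
# Toy servo loop: each tick, nudge the muscle toward the desired length in
# proportion to the error. Gain and lengths are arbitrary illustrative numbers.

def servo_step(current, desired, gain=0.5):
    return current + gain * (desired - current)

length, desired = 10.0, 4.0
for _ in range(20):
    length = servo_step(length, desired)
print(abs(length - desired) < 0.01)   # converged on the goal

# If the world changes (the cat jumps out of your hands), the same unchanged
# loop pulls the muscle back into line with the unchanged goal.
length += 5.0                         # unexpected disturbance
for _ in range(20):
    length = servo_step(length, desired)
print(abs(length - desired) < 0.01)
```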

But this notion of servoing extends further into the brain, as I tried to explain in my Lucy book. Just about ALL behavior can be thought of as servo action – trying to minimize the differential between a desired state and a present state. “I’m hungry, therefore I’ll phone out for pizza, which will bring my hunger back down to its desired state of zero” is just the topmost level in a consequent flurry of feedback, as phoning out for pizza itself demands controlled arm movements to bring the phone to a desired position, or lift one’s body off the couch, or move a tip towards the delivery man. It’s not only motor actions that can be viewed in this light, either. Where the motor system tries to minimize the difference between an intended state and the present state by causing actions in the world, the sensory system tries to minimize the difference between the present state and the anticipated state, by causing actions in the brain. The brain seems to run a simulation of reality that enables it to predict future states (in a fuzzy and fluid way), and this simulation needs to be kept in step with reality at several contextual levels. It, too, is reminiscent of a battery of linked servomotors, and there’s that bidirectionality again. With my Lucy project I kept seeing parallels here, and I’d like to incorporate some of these ideas into my new creatures.

This brings up the subject of thinking. When I created my Norns I used a stimulus-response approach: they sensed a change in their environment and reacted to it. The vast bulk of connectionist AI takes this approach, but it’s not really very satisfying as a description of animal behavior beyond the sea-slug level. Brains are there to PREDICT THE FUTURE. It takes too long for a heavy animal with long nerve pathways to respond to what’s just happened (“Ooh, maybe I shouldn’t have walked off this cliff”), so we seem to run a simulation of what’s likely to happen next (where “next” implies several timescales at different levels of abstraction). At primitive levels this seems pretty hard-wired and inflexible, but at more abstract levels we seem to predict further into the future when we have the luxury, and make earlier but riskier decisions when time is of the essence, so that means the system is capable of iterating. This is interesting and challenging.

Thinking often (if not always) implies running a simulation of the world forwards in time to see what will happen if… When we make plans we’re extrapolating from some known future towards a more distant and uncertain one in pursuit of a goal. When we’re being inventive we’re simulating potential futures, sometimes involving analogies rather than literal facts, to see what will happen. When we reflect on our past, we run a simulation of what happened, and how it might have been different if we’d made other choices. We have an internal narrative that tracks our present context and tries to stay a little ahead of the game. In the absence of demands, this narrative can flow unhindered and we daydream or become creative. As far as I can see, this ability to construct a narrative and to let it freewheel in the absence of sensory input is a crucial element of consciousness. Without the ability to think, we are not conscious. Whether this ability is enough to constitute conscious awareness all by itself is a sticky problem that I may come back to, but I’d like my new creatures actively to think, not just react.

And talking about analogies brings up categorization and generalization. We classify our world, and we do it in quite sophisticated ways. As a baby we start out with very few categories – perhaps things to cry about and things to grab/suck. And then we learn to divide this space up into finer and finer, more and more conditional categories, each of which provokes finer and finer responses. That metaphor of “dividing up” may be very apposite, because spatial maps of categories would be one way to permit generalization. If we cluster our neural representation of patterns, such that similar patterns lie close to each other, then once we know how to react to (or what to make of) one of those patterns, we can make a statistically reasonable hunch about how to react to a novel but similar pattern, simply by stimulating its neighbors. There are hints that such a process occurs in the brain at several levels, and generalization, along with the ability to predict future consequences, are hallmarks of intelligence.
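
The "stimulating its neighbors" idea reduces, in its crudest form, to nearest-neighbor generalization: a novel pattern borrows the response of the closest familiar one. The patterns and responses below are invented, and plain Euclidean distance stands in for whatever similarity metric a real map would embody:

```python
# Sketch: if similar patterns lie close together on a map, a novel pattern
# can borrow the response of its nearest known neighbor. Patterns, responses
# and the Euclidean metric are all illustrative choices.

def nearest_response(pattern, known):
    """Respond to a novel pattern like its closest familiar one."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    closest = min(known, key=lambda p: dist(pattern, p))
    return known[closest]

known = {
    (1.0, 0.0, 0.9): "grab and suck",   # small, graspable things
    (0.0, 1.0, 0.1): "cry about it",    # large, looming things
}

# A novel but similar pattern inherits the response of its neighborhood.
print(nearest_response((0.9, 0.1, 0.8), known))   # 'grab and suck'
```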

So there we go. It’s a start. I want to build a creature that can think, by forming a simulation of the world in its head, which it can iterate as far as the current situation permits, and disengage from reality when nothing urgent is going on. I’d like this predictive power to emerge from shorter chains of association, which themselves are mapped upon self-organized categories. I’d like this system to be fuzzy, so that it can generalize from similar experiences and perhaps even form analogies and metaphors that allow it to be inventive, and so that it can see into the future in a statistical way – the most likely future state being the most active, but less likely scenarios being represented too, so that contingencies can be catered for and the Frame Problem goes away (see my discussion of this in the comments section of an article by Peter Hankins). And I’d like to incorporate the notion of multi-level servomechanisms into this, such that the ultimate goals of the creature are fixed (zero hunger, zero fear, perfect temperature, etc.) and the brain is constantly responding homeostatically (and yet predictively and ballistically) in order to reduce the difference between the present state and this desired state (through sequences of actions and other adjustments that are themselves servoing).

Oh, and then there’s a bunch of questions about perception. In my Lucy project I was very interested in, but failed miserably to conquer, the question of sensory invariance (e.g. the ability to recognize a banana from any angle, distance and position, or at least a wide variety of them). Invariance may be bound up with categorization. This is a big but important challenge. However, I may not have to worry about it, because I doubt my creatures are going to see or feel or hear in the natural sense. The available computer power will almost certainly preclude this and I’ll have to cheat with perception, just to make it feasible at all. That’s an issue for another day – how to make virtual sensory information work in a way that is computationally feasible but doesn’t severely limit or artificially aid the creatures.

Oh yes, and it’s got to learn. All this structure has to self-organize in response to experience. The learning must be unsupervised (nothing can tell it what the “right answer” was, for it to compare its progress) and realtime (no separate training sessions, just non-stop experience of and interaction with the world).

Oh man, and I’d like for there to be the ability for simple culture and cooperation to emerge, which implies language and thus the transfer of thoughts, experience and intentions from one creature to another. And what about learning by example? Empathy and theory of mind? The ability to manipulate the environment by building things? OK, STOP! That’s enough to be going on with!

A shopping list is easy. Figuring out how to actually do it is going to be a little trickier. Figuring out how to do it in realtime, when the virtual world contains dozens of creatures and the graphics engine is taking up most of the CPU cycles is not all that much of a picnic either. But heck, computers are a thousand times faster than they were when I invented the Norns. There’s hope!
