Brainstorm #3 – cheating doesn’t pay (sometimes)

I was going to write about self-organizing maps and associative links next, but I need to make a little detour.

One of the quandaries when making a virtual artificial creature, as opposed to a robot, is how much to cheat. In software, cheating is easy, while simulating physical reality is hard. And I’m writing a video game here – I have severe computational constraints, user expectations (like the need to simulate many creatures simultaneously), and a very limited development schedule. Hmm… So let’s cheat like mad!

Oh, but… For one thing, cheating is cheating. I think Creatures was successful in large part because I was honest. I did my genuine best (given the constraints of technology and knowledge) to make something that was really alive, and this time I plan to be even tougher on myself and do my best to create something that really thinks and might even be conscious.

There’s also an intellectual reason not to cheat more than I can help, though. Cheating doesn’t pay. I’ll walk you through it.

Take vision, for instance. How am I going to handle the creatures’ visual systems? The honest, not-cheating-at-all way would be to attach a virtual camera (or two) to each creature’s head and use the 3D engine to render the scene from the creature’s perspective onto a bitmap. This would act as the creature’s retina, and the neural network would then have to identify objects, etc., from the features in the scene. Well, that’s not going to happen. For one thing, I can’t afford that much computer power inside a game. For another, it would involve me solving ALL the major challenges of visual science, and even at my most ambitious I can see that’s not going to be feasible between now and next summer.

At the other extreme, in Creatures I simply told the norns the category of the object they were currently looking towards. If they looked towards a ball, the “I can see a toy” neuron was stimulated. If it was a carrot, the “I can see food” neuron lit up. It was the best I could do twenty years ago but it won’t cut it now. So I need something in-between.

But it’s harder than it at first appears. We don’t just use vision for recognizing objects; we use it for locating things in space and for navigating through space. Everyday objects can be treated as simple points, with a given direction and depth from the creature. But a close or large object extends over a wide angle of view. A wall may occupy half the visual field and the objective may be to walk around it. You can’t treat it as a point.
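To make the point-versus-extent distinction concrete, here is a minimal sketch, assuming a flat top-down world in which every object is reduced to a line segment (a small object is just a very short one). The function names and the 5-degree threshold are hypothetical illustrations, not anything from Creatures. It reports the direction of each end of the segment relative to the creature's heading and the visual angle the whole thing subtends, then treats narrow objects as points and wide ones as extended obstacles.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def bearing_deg(cx, cy, heading_rad, px, py):
    """Direction of point (px, py) relative to the creature's heading.

    Positive bearings are to the creature's left (counter-clockwise).
    """
    return wrap_deg(math.degrees(math.atan2(py - cy, px - cx) - heading_rad))

def angular_extent(cx, cy, heading_rad, segment):
    """Bearings of both ends of a segment and the visual angle it subtends."""
    (x1, y1), (x2, y2) = segment
    b1 = bearing_deg(cx, cy, heading_rad, x1, y1)
    b2 = bearing_deg(cx, cy, heading_rad, x2, y2)
    return b1, b2, abs(wrap_deg(b2 - b1))

def describe(cx, cy, heading_rad, segment, point_threshold_deg=5.0):
    """Narrow objects become points (one bearing); wide ones keep both edges."""
    b1, b2, width = angular_extent(cx, cy, heading_rad, segment)
    if width < point_threshold_deg:
        # Midpoint taken along the short arc, so objects directly behind work too.
        return "point", wrap_deg(b1 + wrap_deg(b2 - b1) / 2.0)
    return "extended", (b1, b2)

# A creature at the origin, facing along +x.
apple = ((10.0, 0.0), (10.0, 0.1))   # small and far away
wall = ((2.0, -2.0), (2.0, 2.0))     # close and wide
print(describe(0.0, 0.0, 0.0, apple))
print(describe(0.0, 0.0, 0.0, wall))
```

The apple collapses to a single direction, while the wall comes back as a pair of edges about 90 degrees apart, which is the information a creature would need to plan a way around it.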

How should my creatures navigate anyway? The obvious way to handle navigation is to use a path-planning algorithm to find a route from here to there, avoiding obstacles. All the information is there in the virtual world for this to happen. Trying to do it from the creature’s own limited sensory information and memory seems like a ridiculous amount of effort that nobody will ever recognize or appreciate.
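For contrast, this is roughly what the "obvious" cheat looks like: a planner that reads the whole map straight out of the game world and returns a route. The grid encoding ('#' for an obstacle) and the choice of breadth-first search are assumptions for illustration; a real game would more likely run A* over a navigation mesh. Nothing here involves the creature's own senses or memory, which is exactly the problem.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search over a 4-connected occupancy grid.

    grid is a list of equal-length strings in which '#' marks an obstacle.
    Returns the list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk the breadcrumbs back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and nxt not in came_from):
                came_from[nxt] = cell
                frontier.append(nxt)
    return None  # the goal cannot be reached

world = [
    "..#.",
    "..#.",
    "....",
]
route = plan_path(world, (0, 0), (0, 3))
print(route)
```

Everything the planner knows comes from `world`, i.e. from the simulation itself rather than from anything the creature has seen or remembered.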

But here’s the thing:

Relating objects in space involves associations. Forming a mental map of your world is an object lesson in associative memory. Navigating to a target location is remarkably similar to planning, which in turn is remarkably similar to simulating the future, which is the core of conscious experience and the very thing I want to understand and implement. Come to that, navigating is very akin to servoing – reducing the distance between where I am and where I want to be. And for humans at least, this is a nested servo process: To go from the backyard to the shed to get a tool, I need first to go into my kitchen closet and get the key. To get into my kitchen I need to go through the back door, which is in the opposite direction to the shed. Then I have to go to the closet, reach for the key and then backtrack towards the shed. It’s a chain of servo actions and it’s nonlinear (the ultimate goal is reached by first moving away from it). These are precisely the things that I set out in Brainstorm #1 as the features I’m looking for. If I cheated, I might not even have seen the connection between visually-guided navigation and thinking.
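The nested servo idea can be caricatured in a few lines. This sketch assumes a one-dimensional world with made-up positions and a hand-written table of which goals depend on which (`PREREQS`); both are hypothetical stand-ins for things a creature would have to learn. Each `servo` call simply reduces the error between where it is and where it wants to be, and `achieve` chains those calls, so the route to the shed starts by heading the other way.

```python
# Hypothetical 1-D world: positions in metres along a single path.
# The back door lies in the opposite direction from the shed, as in the text.
PLACES = {"backyard": 0, "back_door": -5, "closet": -8, "shed": 10}

# Hypothetical dependencies: the shed needs the key from the closet,
# and the closet is only reachable through the back door.
PREREQS = {"shed": ["closet"], "closet": ["back_door"], "back_door": []}

def servo(position, target, step=1):
    """Close the gap between where I am and where I want to be, one step at a time."""
    trace = []
    while position != target:
        error = target - position
        position += max(-step, min(step, error))  # move at most one step towards the target
        trace.append(position)
    return position, trace

def achieve(goal, position, log):
    """Nested servoing: satisfy each prerequisite first, then servo to the goal itself."""
    for sub_goal in PREREQS.get(goal, []):
        position = achieve(sub_goal, position, log)
    position, trace = servo(position, PLACES[goal])
    log.append((goal, trace))
    return position

log = []
final = achieve("shed", PLACES["backyard"], log)
print([goal for goal, _ in log])  # ['back_door', 'closet', 'shed']
```

On the way, the creature's position drops to -8, further from the shed than where it started, which is the nonlinearity described above.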

In the brain, we know that there are “place fields” in the hippocampus (an older, simpler, curly fringe of the cortex). As far as I know, there’s no evidence (and it doesn’t seem awfully likely) that these “points of best representation” (see Brainstorm #2) are arranged geographically. I’ll have to catch up on the latest information, but it seems like these memories of place form a different kind of relationship and I can’t assume the brain navigates using a map in the conventional, geometrical sense. But somehow geographical relationships are encoded in the brain such that it’s possible for us to figure out (far better than any current computer game) how to get from A to B. We’re capable of doing this with both certain knowledge and uncertainty – navigating around moving obstacles, say, or traveling through unfamiliar territory. This is SO similar to goal-directed planning in general. It’s SO similar to predicting possible futures. All that differs is the kind of associative link (“is to the north-east of” instead of “tends to be followed by” or “is like”). There has to be a coherence to all this.
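One way to see the "same machinery, different kind of link" point is a toy associative store in which every link carries a relation label and a single routine serves all of them. This is a sketch under loud assumptions: the relation names are invented, links are all-or-nothing rather than graded, and the mechanism (activation leaking backwards from the goal, then climbing the gradient from the start) is a standard spreading-activation trick swapped in for illustration, not a claim about how the hippocampus does it.

```python
from collections import defaultdict

class AssociativeMemory:
    """Toy store of typed links: relation -> set of (source, target) pairs."""

    def __init__(self):
        self.links = defaultdict(set)

    def associate(self, source, relation, target):
        self.links[relation].add((source, target))

    def spread(self, goal, relation, decay=0.5, sweeps=20):
        """Let activation leak backwards from the goal along one kind of link.

        After enough sweeps each node holds decay ** (links needed to reach the goal),
        so sweeps must be at least the length of the longest useful chain.
        """
        act = defaultdict(float)
        act[goal] = 1.0
        for _ in range(sweeps):
            for source, target in self.links[relation]:
                act[source] = max(act[source], decay * act[target])
        return act

    def follow(self, start, goal, relation, max_steps=50):
        """Climb the activation gradient, always stepping to the most active successor."""
        act = self.spread(goal, relation)
        path, node = [start], start
        while node != goal and len(path) <= max_steps:
            successors = [t for s, t in self.links[relation] if s == node]
            if not successors:
                return None
            node = max(successors, key=lambda t: act[t])
            if act[node] == 0.0:
                return None  # no chain of this kind leads to the goal
            path.append(node)
        return path if node == goal else None

mem = AssociativeMemory()
for a, b in [("backyard", "back_door"), ("back_door", "kitchen"),
             ("kitchen", "closet"), ("backyard", "shed")]:
    mem.associate(a, "is_next_to", b)  # spatial links work both ways
    mem.associate(b, "is_next_to", a)
mem.associate("dark_clouds", "tends_to_be_followed_by", "rain")
mem.associate("rain", "tends_to_be_followed_by", "wet_ground")

print(mem.follow("backyard", "closet", "is_next_to"))
# ['backyard', 'back_door', 'kitchen', 'closet']
print(mem.follow("dark_clouds", "wet_ground", "tends_to_be_followed_by"))
# ['dark_clouds', 'rain', 'wet_ground']
```

Finding a route and predicting a sequence of events use identical code here; only the label on the links differs.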

For a brief moment there, I imagined a declarative statement written in PROLOG! God no! Please don’t make the brain a forward-chaining planner or an expert system! It’s interesting that the early days of AI were pretty close to the mark in some ways. Thinking IS a bit like deducing that “Mark is Sheila’s husband” from a set of predicates like “Mark is a man”, “Mark has a son called Peter”, “people who share a child are married”, etc. It IS a bit like a probabilistic tree planning algorithm. But these are too abstract, too symbolic, too digital. Navigating through space is an analog process. Reaching out to grab an object is not a discrete, symbolic operation. Being fed a sequence of carefully chosen facts and rules is not the same as learning by experience. And yet…
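For comparison, the "Mark is Sheila's husband" deduction really does fall out of a few lines of forward chaining, which is part of what made symbolic AI so seductive. In this sketch the facts are tuples and each rule is a plain Python function. The fact that Sheila is also Peter's parent, and the rule turning "married" plus "man" into "husband", are assumptions added to complete the example.

```python
facts = {
    ("man", "Mark"),
    ("parent", "Mark", "Peter"),
    ("parent", "Sheila", "Peter"),  # assumed: needed for the deduction to go through
}

def rule_shared_child(facts):
    """People who share a child are married (the text's own, rather naive, rule)."""
    parents = [f for f in facts if f[0] == "parent"]
    return {("married", a[1], b[1])
            for a in parents for b in parents
            if a[2] == b[2] and a[1] != b[1]}

def rule_husband(facts):
    """A married man is his spouse's husband (assumed rule)."""
    men = {f[1] for f in facts if f[0] == "man"}
    return {("husband", f[1], f[2])
            for f in facts if f[0] == "married" and f[1] in men}

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new

result = forward_chain(facts, [rule_shared_child, rule_husband])
print(("husband", "Mark", "Sheila") in result)  # True
```

Every fact and rule here is discrete and hand-fed; nothing was learned from experience.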

The reasons why symbolic AI has failed are many and varied, and I don’t have the space or energy to go into them here. But you can see that the early pioneers were heading in a good direction, thwarted only by some fundamental mistakes and false assumptions about symbol processing and abstraction. It was a fault of the paradigm and tools of both science and mathematics, not intent.

But my point here is that my creatures need to see in a very much more sophisticated way than norns did, and yet a more abstract way than true vision. And I need to find both an external (sensory) and internal (memory) representation that is authentic enough to make visually guided behavior a NATURAL PART OF thinking. The two are so close in concept that they must surely share a mechanism, or at least a set of computational principles. On the one hand this adds new problems – I have to think about navigation, obstacle avoidance, visual binding, retinotopic-to-egocentric conversion, egocentric-to-geographical conversion and a bunch of other things on top of all my other problems. On the other hand, by not cheating (too much, whatever that means) I’m now blessed with a whole new set of symptoms and requirements that give me a better grasp of what must be going on in the brain. It will help me see the central design problem more clearly. This, incidentally, is the reason why we should all be trying to create complete organisms, instead of fooling ourselves that the problem can be divided up and solved piecemeal.

I don’t know the answers to any part of this yet and there will be many future posts on visual representation, etc. But I’m glad I thought about this before starting to think more closely about associative links.

About stevegrand
I'm an independent AI and artificial life researcher, interested in oodles and oodles of things but especially the brain. And chocolate. I like chocolate too.

46 Responses to Brainstorm #3 – cheating doesn’t pay (sometimes)

  1. Jason Holm says:

    I think rendering the scene from the creature’s POV is good, but I don’t think you need to analyze a bitmap; that’s the beauty of 3D objects. You should just be able to grab data from the scene — “There are 200 objects in my view. Based on the quality of my retina (dpi), 100 of them are less than three square pixels — so they’re either too far away or too small for me to care. 50 of them are 100% behind other objects, so I obviously can’t see them. 50 of them are large enough for me to care about and only partially obstructed. Can I see color or just black and white? Chances are, 45 of the objects all share the same average texture color, meaning they’re too common to grab my attention unless I’m specifically looking for them. There are five objects that interest me. One is 80% obstructed, so I need to change my view if I want to be able to analyze it. The others are large enough, unobstructed enough, and high enough in contrast for me to send off to Stage 2 visual processing.”

    And being 3D, you can cheat and check to see if the creature has already seen one of the objects before, automatically rotate it to the same angle and distance, and render it. Now you have two images – one from memory, and one from the eyes. Compare the two with some sort of image algorithm to see if the creature would recognize it. If it does, you can stop processing it — if it’s still in view and unobstructed enough, you can just assume the creature still sees it and still remembers what it is. If the creature turns its head and turns back, it may or may not be able to find it again easily, depending on how its short term memory works.

    Of course, this leaves off the whole processing of “I’ve never seen this before, but it looks a lot like something ELSE I have seen…”
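Jason's proposed attention filter can be sketched as follows, purely to make it concrete. The field names are hypothetical; the three-square-pixel cut-off comes from the comment, while the 50% occlusion cut-off and the "too common" share are invented thresholds. Steve's reply explains why this kind of hand-coded filtering is what he wants to avoid.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    pixel_area: float         # projected area on the creature's hypothetical retina
    occluded_fraction: float  # 0.0 = fully visible, 1.0 = completely hidden
    mean_colour: str          # stand-in for the average texture colour

def triage(objects, min_area=3.0, max_occlusion=0.5, common_share=0.5):
    """Split a rendered scene into 'analyse now' and 'move to see better' lists."""
    visible = [o for o in objects
               if o.pixel_area >= min_area and o.occluded_fraction < 1.0]
    if not visible:
        return [], []
    counts = Counter(o.mean_colour for o in visible)
    analyse, look_again = [], []
    for o in visible:
        if counts[o.mean_colour] / len(visible) > common_share:
            continue  # too common to grab attention
        if o.occluded_fraction <= max_occlusion:
            analyse.append(o)
        else:
            look_again.append(o)
    return analyse, look_again

scene = [SceneObject(f"grass{i}", 200.0, 0.0, "green") for i in range(3)] + [
    SceneObject("pebble", 1.0, 0.0, "grey"),   # too small to care about
    SceneObject("rock", 80.0, 1.0, "grey"),    # completely hidden
    SceneObject("ball", 50.0, 0.1, "red"),
    SceneObject("box", 40.0, 0.8, "blue"),     # mostly obstructed
]
analyse, look_again = triage(scene)
print([o.name for o in analyse], [o.name for o in look_again])
```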

    • stevegrand says:

      Yes, but you’re missing my point. There are endless ways to cheat, but unless you’re really careful they destroy the very thing you’re trying to create. What you’re suggesting is just hard-coding, not intelligence. And it’s not in a language that brains understand. It doesn’t work. It hasn’t worked for 50 years. Give it up. If you continue that line of thinking you’ll tell me what the creature is supposed to do in response to these pieces of data in the second stage of visual processing, how it is meant to react, how it should carry out those actions, etc. and what you’ll end up with is a simulation of something vaguely like a living thing that behaves almost as if it is intelligent! Get the data format wrong and you can’t rely on there being any kind of machine capable of processing it intelligently. That’s why I do biologically inspired AI. It’s only by looking at the problem from a physiological perspective that we can see what it is that brains are actually doing. I have to cheat, but not in the way you describe – that’s too much and you’re being the intelligent object, not the creature.

  2. Terren says:

    Good stuff. One little tidbit I’ve read is that the same route traversed forwards vs. backwards (to the grocery store from your house, vs. back home) activates different sets of place neurons. I.e. there isn’t a one-to-one mapping between places and neurons. Not sure I could produce a reference though.

    • stevegrand says:

      Interesting. I have a book on the hippocampus but it’s so boring I’ve not stayed awake long enough to learn anything. I’ll have to force myself.

  3. Jason Holm says:

    Too bad we can’t just hack the brains of 100 different animals and take the “cheatiest” parts of each of them, then put them back together into one big brain…

    Now I’m going to be reading about vision all night, breaking it apart into components… beats playing The Sims, I suppose!

    • stevegrand says:

      > Too bad we can’t just hack the brains of 100 different animals and take the “cheatiest” parts of each of them, then put them back together into one big brain…

      Heh! If we did then either it wouldn’t work or it would be an ant!

      Given what you said about bodies, maybe you’re trying to sell me on AI Nouveau? I know the protagonists of New AI well, but only half agree with the doctrine/dogma. There is something fundamentally different between a mammal’s brain and that of an invertebrate (cephalopods perhaps being an honorable exception). Qualitatively different, not just one of degree. That’s the focus of my interest – the general-purposeness of mammalian intelligence, but I do take your points about the interaction between brain and body, etc.

  4. James says:

    Great stuff! It’s exciting to hear about it, and I look forward to seeing more of the details down the line.

  5. Jason Holm says:

    “a ridiculous amount of effort that nobody will ever recognize or appreciate”

    Sadly, that’s kind of the point. People only recognize AI in games when it either DOESN’T work, or does something NOVEL, but that novelty can’t be continuous or it loses its effect.

    In films now, it’s become commonplace to create entire scenes in 3D. A designer knows they’ve done a good job when nobody ever mentions the look of the scene’s effects – if they did their job well, the brain just accepts it. If it has a hint of artificiality, people notice and talk about it, even if it was good. If your creatures navigated using an internal model and nobody ever noticed, I wouldn’t consider that a waste of time — I’d consider that the success.

    But it is a game, so there are times where you want to grab their attention by doing something unexpected. And sometimes it’s the simplest things. I remember two instances from the game “Thief” that blew me away:

    The first level of the game had a heavily guarded front door, implying you should really sneak through the back exit. However, the guards had bad, predictable AI, so you could shoot one and run off into a sewer hole. The guard would come to the top of the hole looking for you, but never come down. So I shot him again from below. He knew I was there, so he stayed there, letting me finally kill him. I did that with all the outer guards. I then piled their bodies in front of the gate. Due to a clipping error, I managed to open the front gate by “corner peeking” through the gate to where I could reach the opening mechanism. I went in and closed the gate behind me.

    So there I was in the main hall snooping around, and I either alerted a guard by being too loud or by a wounding arrow shot. Either way, he did his “assault! alert! help!” routine, and ran to the front gate yelling “guards! backup!” or whatever. He opened the gate, ran out, found a pile of his buddies, and I distinctly remember the voice recording going “Oh, sh*BLEEP*”.

    Secondly, I was snooping around upstairs. I opened a door and was rummaging around a room. Suddenly, from out in the hallway, I hear “the door is open… someone must have been here…” and I of course freak out.

    These two examples were obviously hard-coded, but my point is that people will ignore good AI, will definitely pay attention to bad AI, and will get excited about novel AI, so even if nobody ever recognizes or appreciates all the work on mental model navigation or whatever, it is still worth it. This is about simulating a mammalian brain, right? If all you want is a popular game, the brains just need to blow up after a headshot.

    I guess the question is, are you trying to make a game or a simulation, and if you want a combination of the two, where on the spectrum will you be happiest with the result.

    (on a side note, I’ve actually never heard of AI Nouveau before — I just understand more about biology than computer science so that’s why I come at it from the angle I do. If I remember correctly, that’s how Thomas Ray came up with Tierra — he didn’t know what sort of things a computer was or wasn’t capable of, so he just wrote it in a way he knew life was capable of. I keep getting lost on Paragraph One of any “Artificial Neural Networks for Dummies” I read, so it’s obvious I need to brush up on some algebra and read a little calculus before I really have any clue how you’re going to make this thing work!)

    • stevegrand says:

      Good points.

      Hah! Forget calculus and algebra – they’re the reason neural networks never got anywhere! Stick to the neurobiology – I do. Yes, Tom did create Tierra from a biological perspective (although with a distinctly computational flavor, ironically). My simulations are more structure-oriented than process-oriented, but that’s a long story.

      AI Nouveau is a big topic too, but basically the dismal failure of top-down, symbolic, logic-oriented AI led to a bit of a revolution, in which people started to recognize several things: Firstly, that playing chess is easy, while turning on a faucet is incredibly difficult, so we should focus on faucets. Secondly, a lot of things that seem like intelligence in nature are actually consequences of their physical structure or a result of remarkably simple control systems. And thirdly, that invertebrate nervous systems are generally constructed from collections of such specialized, clever-tricky subsystems, many of which play off one another to make more complex behaviors possible.

      All of that is true, but sadly it has become a dogma of its own in recent years, and now there’s a tendency to think of a human being as an overgrown insect. This is simply not true. Insects “chose” a neural system that is very easy for evolution to modify in clever ways. Hence insects have filled every niche and have a massive variety. Mammals, otoh, plumped for a nervous system that is easy to modify during an organism’s own lifetime (e.g. the ability to learn by kind, not just by degree) and took a more general-purpose route. Ants are not overgrown sea slugs and humans are not overgrown ants – there are fundamental computational differences in their nervous architecture – but to listen to many current AI/Alife/animat researchers you’d think we were just a collection of specialized modules. There are good neurobiological and mathematical reasons why this isn’t so.

      I’m trying to make a simulation, which people can interact with for their entertainment. It’s not a game in the conventional sense – more like Sim City than Duke Nukem. No plot. No “yerhafters” (as in “first yerhafter kill the troll, then yerhafter eat the mushroom…”). Just creatures that intrigue people and make them want to explore and investigate them. Creatures people can legitimately believe in and relate to.

  6. Vegard says:

    This discussion reminds me of the chapter of “Creation: Life” where you had the impossibly fat atom simulator machine (and the copier gun) and the discussion of which level to simulate things on (simulating atoms is probably more realistic, but is technically impossible, while having direct lines into the brain telling the creature whether something is edible is less realistic, but easier to implement).

    What is the middle ground?

    How about only telling the creature about things that are directly observable, for example colours, shapes, and patterns? This gets you past the problems of recognising images (you don’t put a picture of the banana on the creature’s retina, you signal “yellow”, “curved”, “small”, etc.) but also lets the creature learn and decide for itself what it is seeing (it doesn’t know it’s edible before it’s tried to munch on it).
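Vegard's middle ground might look something like this: the sensory layer only reports observable features, and whether something is food has to be learned by trying it. The feature list, the dictionary format and the learning rule (a simple delta rule, chosen here for brevity rather than taken from Creatures) are all assumptions.

```python
FEATURES = ("yellow", "red", "curved", "round", "small", "large")

def sense(obj):
    """Turn an object's observable properties into feature activations (1.0 or 0.0)."""
    return [1.0 if f in obj["looks"] else 0.0 for f in FEATURES]

class EdibilityLearner:
    """Starts with no idea what is edible and learns only from tasting things."""

    def __init__(self, rate=0.2):
        self.weights = [0.0] * len(FEATURES)
        self.rate = rate

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

    def taste(self, features, was_food):
        # Delta rule: nudge each active feature's weight to shrink the prediction error.
        error = (1.0 if was_food else 0.0) - self.predict(features)
        self.weights = [w + self.rate * error * x
                        for w, x in zip(self.weights, features)]

banana = {"looks": {"yellow", "curved", "small"}, "edible": True}
ball = {"looks": {"red", "round", "large"}, "edible": False}

learner = EdibilityLearner()
before = learner.predict(sense(banana))  # no innate knowledge yet
for _ in range(10):
    learner.taste(sense(banana), banana["edible"])
after = learner.predict(sense(banana))
print(before, round(after, 3), learner.predict(sense(ball)))
```

The ball shares no features with the banana, so its prediction stays at zero; the learner can only generalize through shared features, which is where the binding and segmentation problems raised in the replies come in.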

    And — thanks for the new blog posts, I’ve really missed reading them! (I wanted to put a comment on your “is there anybody still here?” post, but it wasn’t open for comments.)

    • Jason Holm says:

      The thing I picked up about vision last night was that, for all the visual tricks eyes and brains use to understand the world (motion parallax, perspective, relative size, occlusion, depth from motion), it all starts with being able to separate in our heads a gravel road as one object, and a single rock on that road as an individual object. From a visual aspect, where does an apple stop being an apple and start being a stem, or a branch, or a tree, or a forest, or a continent, or a planet? We can’t see with our eyes any division from one to the next unless they are physically separated, and even then, it’s all still just colored lights on our retina. Something in our brains breaks them into distinct objects, and each species does it differently based on how the “parts” are important to it.

      The “cheat” would be that a 3D world is BUILT with parts. Steve isn’t going to be building his world with atoms, nor will everything be one giant mesh. Trees and landscape and removable apples will all be individual objects. The work has already been done for the brain; the question is how realistic it is to give the brain this info ahead of time vs. analyzing a flat bitmap.

      This all reminds me of Katamari Damacy. When you’re small, things like houses are obstacles. Get big enough and they become collectible objects.

      To a deer, a tree is an obstacle to be avoided. To an elephant, it’s an obstacle to be removed. To a beaver, it’s raw building material that needs to be shaped and moved, or eaten to feed the gut bacteria it relies on. To a squirrel, it’s a vertical landscape with internal rooms. To a monkey, it’s a series of paths and handholds. All of these ways of seeing a tree as an individual object are vast and varied, yet they’re all performed by mammalian brains.

      Explains why Spore creatures can’t climb trees, dig burrows, land on tree tops after flying, knock down trees, hide inside trees, or do ANYTHING but pick fruit off of them. It also explains why Steve is a far braver man than I to find a realistic yet feasible solution…

    • stevegrand says:

      > How about only telling the creature about things that are directly observable, for example colours, shapes, and patterns?

      Yes, I think you’re right that some kind of pre-extraction of features like this is the best tack to take. Although like Jason says, there are issues with the spatial arrangement of these features, their segmentation into objects and the binding of features together. A car is smooth, hard and shiny, but so is a cup. Sometimes it’s the arrangement of features, more than the features themselves, that defines an object.

      I have a few ideas about this and there are some interesting ways to cheat, based on affordances. I thought about this a long time ago for an ecosystem simulator collaboration that (like most collaborations) went nowhere – I’ll have to look out my notes. But the spatial issues trouble me yet.

      There are also some important questions about coordinate transforms – how best to “paint” the ever-changing view from the eye onto an egocentric frame, so that the visual world becomes stable and objects develop spatial relationships. We can’t stop ourselves from feeling like we’re “looking around the world”, and yet the world is actually swimming past our eyes. Somehow we combine what we see with knowledge of where our eyes are pointing, to produce a stable perception of a static world. Some clever things have to happen behind the scenes here, but there are several ways to approach it and I need to think it through.
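The simplest version of that transform, for azimuth only, is just addition of angles: where something falls on the retina, plus where the eye points in the head, plus where the head points on the body. The sketch below assumes a flat 2D world, ignores elevation, depth estimation and lens geometry, and uses hypothetical function names. The example shows the key property: the retinal angle changes when the eye moves, but the body-relative direction does not.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def retina_to_body(retinal_deg, eye_in_head_deg, head_on_body_deg):
    """Retinotopic -> egocentric: add up the chain of joint angles."""
    return wrap_deg(retinal_deg + eye_in_head_deg + head_on_body_deg)

def body_to_world(body_deg, distance, body_heading_deg, body_x, body_y):
    """Egocentric (direction, distance) -> geographical (x, y) position."""
    theta = math.radians(body_heading_deg + body_deg)
    return body_x + distance * math.cos(theta), body_y + distance * math.sin(theta)

# The same object, seen before and after a 20-degree eye movement:
# it slides across the retina, but its egocentric direction stays put.
before = retina_to_body(30.0, 0.0, 0.0)
after = retina_to_body(10.0, 20.0, 0.0)
print(before, after)  # 30.0 30.0
```

The second function is the egocentric-to-geographical step; it needs the creature's own heading and position, which is exactly the knowledge a creature would have to maintain internally rather than read from the game world.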

      This is a topic for a post in a few days, I think!

      • Jason Holm says:

        Is it easier for a creature to develop proprioception than to navigate around a world space? Does one build on the other? I’m just curious whether Lucy, who was basically bolted to a table, had some way to read back her servo positions if you disconnected her eye(s), and whether nature worked by developing that ability first and then adding vision and locomotion on top of it…

        Did primitive critters develop “touch sensors” or “proprioception sensors” first?

        I would think that first being able to understand spatial boundaries and one’s position within them would give the vision systems something to build on rather than starting from scratch? Since Helen Keller managed to feed herself and get around, sight and sound seem like secondary sensors for getting around a world and interacting with things in it…

        Or am I getting too non-mammalian again?

      • stevegrand says:

        > Did primitive critters develop “touch sensors” or “proprioception sensors” first?

        Touch. Well, actually smell/taste was the first ever sense, and is still to this day handled differently from the other senses in the mammalian brain. Even bacteria have chemosensing. But touch came a close second in multicellular creatures.

        I take your point. Yes, Lucy did have proprioception. Her muscles were servomotors, but I connected them to her limbs by springs and had an extra potentiometer on the limb joint. With a bit of fancy software I could simulate pairs of antagonistic muscles, including the ability to let her limbs go slack (e.g. so that I could guide her, like teaching someone a golf swing) and taut (resistive to external forces). She could also adjust her muscles to adapt to a changing weight or force.
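The slack/taut behaviour can be illustrated with a common textbook simplification in which each muscle of an antagonistic pair acts like a spring whose stiffness is set by its activation. This is an assumed model for illustration, not the software Lucy actually ran: the balance between the two activations sets the resting angle, their sum sets how hard the joint resists being pushed, and zero activation leaves the limb slack.

```python
def joint_equilibrium(flexor, extensor, external_torque=0.0,
                      theta_min=-90.0, theta_max=90.0):
    """Resting angle (degrees) and stiffness of a joint driven by two opposing muscles.

    Each muscle is modelled as a spring with stiffness equal to its activation
    (arbitrary units): the flexor pulls towards theta_max, the extensor towards
    theta_min. Setting the net torque to zero,
        flexor*(theta_max - a) + extensor*(theta_min - a) + external_torque = 0,
    gives the resting angle a computed below.
    """
    stiffness = flexor + extensor
    if stiffness == 0:
        return None, 0.0  # fully slack: the limb goes wherever it is pushed
    angle = (flexor * theta_max + extensor * theta_min + external_torque) / stiffness
    return max(theta_min, min(theta_max, angle)), stiffness

relaxed = joint_equilibrium(1.0, 1.0, external_torque=10.0)  # pushed 5 degrees off centre
braced = joint_equilibrium(5.0, 5.0, external_torque=10.0)   # same push, only 1 degree
slack = joint_equilibrium(0.0, 0.0)
print(relaxed, braced, slack)
```

Co-contracting both muscles leaves the resting angle unchanged but makes the same external push deflect the limb far less, which is one way to get "taut" without a separate mechanism.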

        My logic was pretty much as you suggest: I figured that we learn to see depth because we have arms that can reach out and touch things, and we learn to reach out and touch things because we can see. Each calibrates the other. I think the interaction between senses is really important (Helen Keller notwithstanding!).

        Making sense of the 3D world MUST surely depend on knowing that you’ve bumped into things; otherwise it’s just a lot of pixels with some hard-to-grasp statistical properties. Babies are primarily tactile and you can see them trying to come to terms with what they see, based on what they feel (which seems to be their more innate sense). But then we grow up and about half our brain gets devoted to vision, so we end up highly visual. A lot of that real estate is devoted to relating vision to proprioception and touch, though, and orienting oneself in space.

        For sure my creatures will need both touch and proprioception. Up to the limits of the physics engine, anyway!

  7. Jason Holm says:

    “Insects ‘chose’ a neural system that is very easy for evolution to modify in clever ways. Hence insects have filled every niche and have a massive variety.”

    “Mammals, otoh, plumped for a nervous system that is easy to modify during an organism’s own lifetime (e.g. the ability to learn by kind, not just by degree) and took a more general-purpose route.”

    *mindblown* — Please, tell me someone has taken these two ideas and filled an entire book about them, or that you will be doing so after making this game.

    • stevegrand says:

      Heh! I’ve always thought it was self-evident. Maybe I’m wrong, but that’s how it seems to me. Arthropods in general discovered the secret of evolutionary flexibility by becoming sequences of modifiable segments, each with a pair of legs, gills and ganglia. Insects took that and ran with it. An insect’s head is made from three modified segments (e.g. legs that became mouth parts) and hence three paired ganglia (one handles olfaction, one vision and one integrates the others, if I remember right).

      There’s something very clever about these ganglia: they’re not just ad hoc networks of neurons but have some massive parallelism. It was easy for evolution to repurpose the modules, and their parallelism made cooperative sensing (e.g. vision via a compound eye) possible; something denied to simpler invertebrates. I tend to liken this modularity to FPGAs and other configurable electronic devices made from large arrays of very simple programmable elements.

      These highly evolved cousins of ours have occupied a vast array of niches and developed many neurally-based solutions, to navigation, predation, etc. Open up a new niche and an insect will quickly evolve to fill it. They can even learn by degree – a desert ant can learn unique landmarks to use for navigating to its burrow – but they can’t learn by kind (develop new skills or strategies). The desert ant can ONLY learn landmarks and use them to navigate by – it can’t do anything else with this information.

      Meanwhile, along our own evolutionary lineage, amphibians developed a highly modular architecture of their own, in the form of the thalamus, but it, too, is pretty ontogenetically inflexible. Then came the cortex, and real learning and intelligence became possible. It’s still debated how much of cortical architecture is genetic and how much is developmental, but I’m firmly in the developmental camp and the evidence is tending that way. It’s a generalized structure that’s capable of wiring itself up through experience to perform a huge variety of computations. At some level these must all be the SAME computation, since cortex is remarkably homogeneous and most architectural variations can be accounted for through plasticity. But the variety of things cortex can do – and LEARN to do – is amazing. We don’t innately have piano-playing circuits, nor reading circuits. And yet the recognition of letter forms has a very distinct location in the brain. It’s a self-organizing machine and very capable of learning new tricks, even in comparatively primitive mammals (as long as their ecological niche makes the right demands on them). Cows learn about being milked, rats learn to press pedals and run mazes, horses learn commands, dogs learn to please humans, etc. Evolution helps with this (some breeds of dog are better than others) but most of it occurs through experience.

  8. bill topp says:

    hold on thar pilgrim. you’re discussing writing this game for “the computer”. there’s no such thing. with the PC architecture every box is at least in some little way unique. you might reply that you’re writing it for the lowest common denominator computer or the consensus computer. there’s a whole world of wonder (and hurt) out there in PC land. is it possible that you should first ask yourself for WHICH computer you’re programming and then make sure you’ve got that computer? if you can see that you can code your concepts but only for a box with two graphics cards and a physics card perhaps the decision to do that would advance the process. it is known that people who want a game are willing to buy hardware.

    • stevegrand says:

      Yes I know – I spent ten years in the games industry and we think about things like that! I always code on a machine that’s hot enough to handle the extra load of debug code and tools, but a little bit behind what the keenest gamers have now. By the time a product is released, this is pretty tame hardware. The general trick is to switch features on or off according to the power of the end user’s PC.

      But in your email you raised an important point worth talking about here (and maybe in a post). I’m NOT trying to simulate the human brain – I’m taking INSPIRATION from it. They’re very different things. While I think all this through I’m trying to stick to a mental model involving biological neurons with zillions of synapses, etc. That’s totally impractical, but I don’t care at this stage. The important thing is that thinking like a computational neuroscientist ensures that I have a coherent idea and haven’t fallen foul of the common tendency to abstract each element (deal with planning separately from navigation, say) and then find they simply don’t work together. Once I have a workable concept then I’ll look for a more abstracted (but still genetically definable) mechanism that’s computationally tractable. I keep all this in mind as I go, but it may not look like it in my blog posts. There are stages to this process. Maybe I really should blog about how I go about these things – it might interest some people. I’m a big believer in lateral thinking.

      My dad taught me about the value of believing in skyhooks while inventing things. Later on, you can deal with the fact that no such things exist, but if you start out too obsessed with why something’s not possible, it never will be.

  9. John Harmon says:

    Thanks for this blog — one of the most substantive and interesting I’ve run across on large scale brain function… Good luck with your project!

    I wanted to offer my opinion regarding a question that you recently posed: how does the brain create a stable perception of the world despite head and/or body movement through that world? My opinion is that the mechanism of priming (or “prediction”) is largely responsible for creating the ongoing stability of visual objects, their spatial relationships to one another, and the stability of the visual field as a whole.

    For example, if you were to walk down a corridor, the walls and nearby objects would constantly be changing their spatial position relative to your head and the rest of your body. As you move forward, the visual system is making automatic predictions. Here, the awareness of what the visual field looks like, plus where I intend to move next, would generate a visual prediction, paraphrased as “if I were to move forward and slightly to the left at a certain speed, then the visual field will probably look like X.” The visual system is using immediate past and current sensory input, and knowledge of movement intention, to predict how the visual field will be perceived (where the colors and shapes will be) if that anticipated movement were to occur.

    In other words, the act of locomotion triggers a collection of spatial/visual memories that are most strongly associated with present perceptual experience and future action. This set of memories becomes activated just ahead of the movement. These memories are in the form of a set of possible visual representations, activated with varying degrees of strength. The memories that most closely match (1) present perception and (2) future predictions are the memories that will be most strongly energized. The most strongly energized memories are the memories that create a person’s visual field experience.

    A representation of the visual field is created/iterated in this manner every tenth of a second, resulting in the perception of a stable world where everything “hangs together.”

    This process of how the visual field is created through time relates to your concept of a servo process, whereby future goals are created and then the agent works towards minimizing the difference between goals and experience. However in this case, the future predictions/goals are created “bottom-up,” by perceptual experience memories themselves (not by a top-down goal). The difference between prediction and experience in this case is minimized every 1/10 of a second (both in the present and into the future).
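A minimal sketch of the scheme John describes, under assumptions that are not in his comment: the visual field is a 1D list of numbers, each stored memory is a pair (view seen then, view that followed when moving), the movement-based prediction is simply the current view shifted by the intended movement, and the helper names (`shift`, `similarity`, `select_memory`) are hypothetical. The memory whose pair best matches both present perception and the prediction is the one that wins.

```python
from typing import List, Sequence, Tuple

# A memory pairs a view seen earlier with the view that followed it
# when the creature moved (an assumption made for this sketch).
Memory = Tuple[Sequence[float], Sequence[float]]

def shift(field: Sequence[float], steps: int) -> List[float]:
    """Crude movement-based prediction for a 1D visual field.

    Assumes moving forward slides every feature `steps` cells towards
    index 0; cells scrolling in from the far edge are unknown (0.0)."""
    n = len(field)
    return [field[i + steps] if 0 <= i + steps < n else 0.0 for i in range(n)]

def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Negative squared distance: larger (closer to 0) means a better match."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def select_memory(memories: List[Memory], current: Sequence[float],
                  intended_steps: int) -> int:
    """Index of the memory most strongly 'energized' by (1) present
    perception and (2) the prediction for the intended movement."""
    prediction = shift(current, intended_steps)

    def energy(mem: Memory) -> float:
        seen_then, seen_next = mem
        return similarity(seen_then, current) + similarity(seen_next, prediction)

    return max(range(len(memories)), key=lambda i: energy(memories[i]))
```

With `current = [0, 1, 0, 0, 1, 0]` and an intended movement of one cell, a memory recorded while moving the same way through the same corridor outscores one recorded while moving the other way. Summing the two similarities is just one simple way of combining the two criteria; weighting them differently would be an equally plausible choice.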

    • stevegrand says:

      Thanks, John!

      Yes, what you say about visual prediction makes sense. It can’t be the whole story – somehow we have to convert the image that’s streaming past our eyes into egocentric and geographical coordinates, e.g. so that we can remember the locations of objects that have gone out of sight and relate them to objects we can see. But in the shorter time frame I think you must be right. Hubel and Wiesel’s V1 “edge detectors” are usually tuned to motion and have wide receptive fields. It seems likely that a cell tuned to the movement of a 45-degree line heading in a particular direction would be laterally connected to other cells with the same preferences in the spot where such a line is likely to appear shortly. If the line does appear there, then the prediction worked; if it doesn’t, then some kind of surprise has occurred and attention needs to be refocused, the context needs to be updated, etc.
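One way to picture the lateral wiring Steve guesses at, assuming a 1D row of cells that are all tuned to the same direction and speed of motion (`velocity` cells per tick). The names are hypothetical, and nothing here is taken from the physiology beyond that idea.

```python
from typing import Set, Tuple

def primed_cells(active: Set[int], velocity: int, n_cells: int) -> Set[int]:
    """Each active cell laterally primes the cell `velocity` positions ahead,
    i.e. where the moving edge should show up on the next tick."""
    return {i + velocity for i in active if 0 <= i + velocity < n_cells}

def check_prediction(active: Set[int], observed: Set[int], velocity: int,
                     n_cells: int) -> Tuple[Set[int], Set[int], Set[int]]:
    """Compare the primed cells with the cells that actually fired next tick.

    Returns (confirmed, missed, unexpected). Anything in `missed` or
    `unexpected` is a surprise that could trigger a shift of attention."""
    primed = primed_cells(active, velocity, n_cells)
    return primed & observed, primed - observed, observed - primed
```

A line at cell 3 moving two cells per tick primes cell 5. If cell 5 fires, the prediction is confirmed; if the line stops at cell 3 instead, the function reports cell 5 as missed and cell 3 as unexpected.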

      I’m just about to go on to think about predictive and associative links in general, so I’ll bear in mind what you say about this lower level. I’m sure the same general principles apply throughout.

  10. Paul Almond says:

    Hello, Steve

    I understand what you are after. You want to cheat enough with vision to allow it to be workable in the game, but you do not want to cheat so much that you end up processing high-level sensory data and your brain then has to start containing variables corresponding to high-level aspects of the world, forcing how it can work.

    Here is an idea.

    Have vision, but make it much easier to determine distance and shape by overlaying a “standard texture” onto every object. Make this texture very simple, and instead of “seeing” every “pixel” in the image of this texture, have your eye see only representative parts of it.

    An example will better illustrate what I mean.

    Suppose someone painted every object in your field of view black. Then they stuck little white dots everywhere, evenly spaced out in a regular pattern. If you imagine overlaying a grid texture onto every surface, the white dots are placed where the intersections of the lines would be. You only feed information about the white dots into your creatures’ eyes.

    This would simplify things a lot. Distance would be easier to deal with. If an object is further away, the dots would tend to be closer together. However, it presents the creature with some issues. If a surface is presented at an angle, the dots are also closer together. If the dots are close together is that a surface at an angle or one a long way away? The creature would need to look at the context to get some idea. It seems to me that this might be a positive thing, because it makes vision a bit more tractable, but only to a degree – and I think that is what you wanted.
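Paul's ambiguity can be checked numerically with a simplified pinhole-eye model. This is a sketch under stated assumptions: a small surface patch, dots a fixed distance apart on it, and the small-angle approximation in which the gap along the direction of tilt shrinks with the cosine of the tilt and with the inverse of the distance.

```python
import math

def apparent_spacing(dot_spacing_m: float, distance_m: float, tilt_deg: float) -> float:
    """Approximate angular gap (radians) between neighbouring texture dots.

    Small-patch approximation: along the direction of tilt, a gap of
    `dot_spacing_m` on a surface `distance_m` away, tilted `tilt_deg` from
    facing the eye, subtends about dot_spacing * cos(tilt) / distance."""
    return dot_spacing_m * math.cos(math.radians(tilt_deg)) / distance_m

near_but_tilted = apparent_spacing(0.1, 2.0, 60.0)  # 2 m away, tilted 60 degrees
far_and_facing = apparent_spacing(0.1, 4.0, 0.0)    # 4 m away, facing the eye
```

Both calls give a gap of about 0.025 radians, so from this one measurement a creature cannot tell a near, steeply tilted surface from a far one that faces it.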

    If you wanted a more sophisticated version, you could make the dots “different colors” (of course, I just mean that each dot is identified as belonging to a particular class), and make the colors relate to “what things are supposed to be made out of”. Vegetation might be one color, and animals might be another, for example.

    One way of helping the creature a bit might be to have a low density of dots on objects in most of the field of view, but a higher density on objects near the center of the field of view – or in some part of the view that the creature’s brain selects. The creature could decide what it “wanted” to look at in more detail, with more information processing.
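The variable-density idea can be sketched in one dimension. Assumptions for illustration only: the attended point is `focus`, and the gap between successive samples grows linearly with distance from it (`base_step * (1 + growth * offset)`); a real scheme could use any falloff.

```python
from typing import List

def sample_positions(focus: float, half_width: float, base_step: float,
                     growth: float) -> List[float]:
    """Sample positions across [focus - half_width, focus + half_width].

    The gap between successive samples is base_step * (1 + growth * offset),
    where offset is the distance from the attended point, so samples are
    dense at the focus and thin out towards the edges."""
    if base_step <= 0:
        raise ValueError("base_step must be positive")
    right = [focus]
    offset = 0.0
    while True:
        offset += base_step * (1.0 + growth * offset)
        if offset > half_width:
            break
        right.append(focus + offset)
    # Mirror the right-hand samples to the left of the focus.
    left = [2 * focus - x for x in reversed(right[1:])]
    return left + right
```

`sample_positions(0.0, 10.0, 1.0, 0.5)` yields nine samples whose gaps widen from 1 to 3.375 on each side of the focus; moving `focus` moves the high-density region with the creature's attention.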

    One possibility is to give creatures an extended and/or simplified sense of touch. Maybe allow them to use a rangefinder, but have some disadvantage associated with it. Maybe it is only short range. Maybe it costs energy to use. Maybe it can only be moved around slowly. The idea is that clever creatures would quickly associate the visual data with the rangefinder information and start to rely on the visual data more.
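The rangefinder could serve as a teacher for vision. A minimal sketch, assuming the visual cue is the apparent dot spacing of a frontal surface (so distance is proportional to 1/spacing under the small-angle approximation) and that the creature learns the constant of proportionality with the normalised least-mean-squares (delta) rule. The function name, learning rate and data are hypothetical.

```python
from typing import Iterable, Tuple

def train_distance_estimator(samples: Iterable[Tuple[float, float]],
                             lr: float = 0.5, epochs: int = 20) -> float:
    """Learn a gain w such that distance ≈ w / apparent_spacing.

    `samples` holds (apparent dot spacing in radians, rangefinder distance
    in metres) pairs. Uses the normalised least-mean-squares (NLMS) rule,
    which converges for 0 < lr < 2."""
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        for spacing, measured in data:
            x = 1.0 / spacing                # visual feature: inverse dot spacing
            error = measured - w * x         # rangefinder reports the mistake
            w += lr * error * x / (x * x)    # delta-rule step, normalised by x**2
    return w

# Hypothetical training data: 0.1 m dots on frontal surfaces 1 to 5 m away.
calibration = [(0.1 / d, d) for d in (1.0, 2.0, 3.0, 4.0, 5.0)]
w = train_distance_estimator(calibration)
estimate = w / 0.025   # distance guessed from vision alone
```

After training on five rangefinder readings, `w` settles at 0.1 (the dot spacing in metres), and the creature can then estimate a distance of about 4 m from an apparent spacing of 0.025 rad without using the rangefinder at all.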

    I think the issue here might be whether a visual system like this is abstract enough – whether it sufficiently reduces the amount of computation required – but it may at least help in forming an idea of how much abstraction is wanted.

    • stevegrand says:

      Thanks Paul. Yes, that’s exactly the issue. I think the dots would probably still be too much on the realistic side for what I need, though. I have such limited computer power available and a lot of creatures to process in realtime (even those that are off-camera need to keep functioning). But it’s an interesting scheme. It would be fun to know how much a human being could see, presented with such arrays of dots (and no edges). My guess is it would be difficult for us unless we or the scene were in motion. Have you seen people with motion-capture suits on? It’s easily possible to understand what they’re doing, even though all we can see is a handful of moving dots. Arrays of static dots have ambiguities, as you say, which would be resolved if the eye was able to scan or move into the scene. It would be a challenge to think of a mechanism that could handle this, though. The fact that humans could probably do it easily is very suggestive. I just wish I knew what it was suggestive OF!

      • Gryphon says:

        Isn’t it just suggestive of how good we are at interpreting and finding the patterns in ANY kind of data stream? I mean, people can learn to “see” with their tongues, if you stick a grid of microelectrodes on there hooked up to a camera. It doesn’t really matter what the channel for the data is, once we realize how we ought to be processing it and what patterns we should be looking for.

        Dots are awesome; I think the furthest we can reduce a person, and still have the avatar be socially meaningful in a full-richness-of-human-experience, 3D kind of sense, is three dots; one on the head to convey gaze, which is super important for shared attention and social interaction etc., and one on each hand to give a couple of channels to infer the kinetics of the rest of the body by and convey agency and intent and various gestural verbs and stuff. So long as they were very honest and high-fidelity–say, generated from someone in front of a Kinect–I think there’s a vast amount of bandwidth in three dots, both for social intent and physics-based perception of the other person as somebody also existing in space.

        I think it’s all in the patterns. Once we ferret out the pattern and recognize it, it doesn’t matter whether it’s dots or a series of electric shocks to the tongue; we see, “that’s human,” or “that’s behaving like a physics object,” or “wait, let’s try processing this as visual information” and the pattern in the data speaks.

        (Of dots, though, even a single point is not to be underestimated. There’s this freeware little game called Transformice, where everyone bounds around in 2D space collaborating to survive various obstacles to get to the cheese. It’s astounding what you can gather, in a social sense, about the intentions of other players from just their cutesy little points. You do get a big helping hand from context, obviously, of the stage and the obstacles and everybody’s shared goals (get cheese, show off, don’t die). But still! Even one dot, so long as it displays some kind of agency, can speak to the social centers of the brain.)

      • stevegrand says:

        Absolutely. I just wish I knew how we do it!

        One of the many things that baffle me is how the brain manages to create what I can only describe as “vector” representations from pixel inputs. All that stuff about moving dots seems to require information about spatial relationships in a scalable form, and yet almost all our sensory inputs are in pixel form. Two dots side by side ten degrees apart look very much like two dots side by side 11 degrees apart to our minds, and yet on the retina they have nothing whatsoever in common – one scene triggers neurons 12,345 and 18,543, while the other triggers 11,765 and 19,873. How do we relate things to each other across pixel spaces without looping, pointers and all the other “moving parts” that computer programs have? Neurons just sit there.
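To make the puzzle concrete, here is a toy one-hot retina. Assumptions: a hypothetical 1D retina of 1,000 cells at 0.1 degrees per cell, with each dot exciting exactly one cell. The two scenes share no active cells, yet the relationship between the dots differs by only one degree. That similarity only appears after `separation_deg` compares the active cells with each other, which is exactly the kind of sorting and subtracting that a sheet of static neurons has no obvious way to do. The sketch illustrates the problem; it is not a proposed solution.

```python
from typing import Iterable, Set

def retina(dot_angles_deg: Iterable[float], n_cells: int = 1000,
           deg_per_cell: float = 0.1) -> Set[int]:
    """Set of active cells on a hypothetical 1D retina with a one-hot
    'pixel' code: each dot excites exactly one photoreceptor."""
    active = set()
    for angle in dot_angles_deg:
        cell = round(angle / deg_per_cell)   # the photoreceptor the dot lands on
        if 0 <= cell < n_cells:
            active.add(cell)
    return active

def separation_deg(cells: Set[int], deg_per_cell: float = 0.1) -> float:
    """A relational feature: angular gap between the two active cells."""
    first, second = sorted(cells)
    return (second - first) * deg_per_cell

scene_a = retina([20.0, 30.0])   # two dots 10 degrees apart
scene_b = retina([25.0, 36.0])   # two dots 11 degrees apart, elsewhere
```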

        How do we know that someone’s just traced out the letter S on our leg with a finger, when we learned about S’s by looking at them with our eyes? How do we know that the two dots we see now are the SAME two dots that we saw a moment ago in a slightly different position? Why do people sometimes feel like their hands or other objects have swollen up to huge proportions? How do we rotate an image in our minds? AAARGH! I’m missing something somewhere. This is one of those things where people tell me “it’s obvious innit?”, until I ask them to show me exactly how it works. Somehow information can be related across neural space in an incredibly powerful way. I’ve tried all sorts of ideas, including Fourier transforms and convolution, and still can’t figure it out. AARGH! AARGH!…

      • Vegard says:

        I don’t think we’re as good as you believe we are.

        You can read a book upside down, sure, but you’ll be very slow. In my own experience, the problem is that I can no longer recognise whole words, so I have to resort to decoding the words letter by letter.

        Now, you can ask, “but how do you turn the letters around in your head”? Well, I cannot say for sure. But what I know is that I still have to think a bit about it in most cases — especially for characters which are similar when they are rotated (p/d and b/p are some of the most difficult, but also u/n, also depending on the font).

        I also find that my reading gets slower with the number of degrees the text is rotated at. I can read moderately fast at 45 degrees, but not as fast as at 0 degrees. At more than 90 degrees, I need to resort to the letter-by-letter analysis.

        I am pretty sure it is possible to learn how to read upside down at full speed. Just like some people learned to speak in reverse (yep, they did: Here’s an episode of what I believe is the Norwegian equivalent of “Britain’s got talent”: — If you can’t see it, I can try to provide a separate download). But you need to learn it, there’s no magic trick of the brain.

        The fact that we have to think about it when doing the “slow decoding” suggests to me that there’s something going on at the higher levels of the brain. And by higher levels, I mean conscious thought. So when I see an upside-down Q, I see “O with a diagonal strike on the top”. And I believe I recognise this diagonal strike as the tail that we usually put in a different place and, together with the fact that “things appearing in the lower right corner move to the upper right corner when we rotate it by 90 degrees”, my conscious mind is able to work it out. It’s difficult to explain exactly; I suggest experimenting on yourself and seeing how you react to upside-down letters (or even pictures: try your loved ones, and you’ll surely see new things about their faces that you hadn’t realised before) or to reverse speech.

      • stevegrand says:

        Yes, although as it happens, letters and faces are both special cases, so I think they’re a bit misleading. Like you say, we have difficulty recognising letters beyond about 45 degrees, and there are some nice illusions to show that we have specialized face-recognition circuitry.

        But mental rotation is different from recognition. I can imagine a car, say, and spin it around in my head. In 3D. No problem. Come to that I have no problem rotating ‘b’ and ‘d’ either. Visual intelligence tests usually require us to rotate somewhat complex shapes and decide which of a set of other shapes is the same as the test one. I think people probably vary in their ability to do this, depending on whether they are visual thinkers – hence its appearance in IQ tests – but for myself I find it pretty easy and the key thing is, I CAN do it. I don’t do it intellectually, by saying “well, this part should now be over there”. I simply rotate it. Experiments show that the further a shape has to be rotated to make a match, the longer it takes, and the time is proportional to angle.

        It’s not a real image, in the sense that the details kind of come and go as they’re required, like most mental imagery. But there is some rotation going on – the salient points are actually being rotated – it’s not some kind of logical puzzle where I DEDUCE the new positions. I can just see it happen. And yet none of my neurons is moving and at least part of my mental reconstruction is happening in cortical maps that we know are retinotopic. I imagine this data is being projected back into retinotopic space from a more abstract frame, but what frame??? How do we store rotation-invariant representations in the brain? And how do we tag them with their current rotation?
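The behavioural result, that matching time grows in proportion to angle, is what one would expect if the salient points were turned in small fixed increments. This is a sketch of that analogy only, not a model of the neural mechanism (which is exactly what remains unknown). It assumes 2D points listed in corresponding order, rotated counterclockwise 5 degrees per step.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def rotate(points: List[Point], degrees: float) -> List[Point]:
    """Rotate 2D salient points counterclockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def matches(a: List[Point], b: List[Point], tol: float = 1e-6) -> bool:
    """True if corresponding points (same list order) coincide within tol."""
    return all(math.dist(p, q) <= tol for p, q in zip(a, b))

def steps_to_match(shape: List[Point], target: List[Point],
                   step_deg: float = 5.0, max_steps: int = 72) -> Optional[int]:
    """Rotate `shape` in small fixed increments until it lines up with
    `target`; return the number of increments taken, or None.

    Each increment stands for a fixed amount of 'mental' time, so the
    count grows in proportion to the angle separating the two shapes."""
    current = shape
    for n in range(max_steps + 1):
        if matches(current, target):
            return n
        current = rotate(current, step_deg)
    return None
```

A 30-degree offset takes 6 steps and a 90-degree offset takes 18, so the step count is proportional to the angle; an offset that is not a multiple of the step size returns `None`.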

        I think faces and words are unusual because we so rarely see them from odd angles and we have a powerful need to read them very quickly. I bet our ability to decode upside-down letters is about the same speed as our ability to decode non-letters.

      • Vegard says:

        Ok, I agree that mental rotation is different from recognition.

        Though I still think it’s useful to use simple and concrete examples to begin to understand the more complex things.

        To be honest, I can’t really “see” things in my mind. At least not the way I see with my eyes. I can catch glimpses of things, but it’s more like a suggestion of shapes. They are glimpses because they disappear almost as soon as I’ve thought about them. Therefore I can’t really rotate an imaginary car around smoothly, as if it was animated, but I can catch these glimpses of a car from whatever angle I choose. Well, I am actually having some trouble with that too. I can’t seem to make small adjustments to the angles. It’s very easy to imagine the car at a straight angle from any of the sides. I can also sort of make a stop-motion animation in my head with the car rotating around a single axis. And I can view the car from head height at many angles. But if I try to imagine the car mostly from below, rotated slightly around several axes, I fail. Maybe it’s different for you. I think it has to do with what angles I’m used to seeing cars at.

        (About this specific example, I think you may actually have an advantage because of the 3D modelling you’ve been doing.)

        A better example than I came up with in my last post, and simpler than yours, is that of music. I can hear melodies in my head, and once I know a melody, I can imagine it at different speeds (tempo) and pitches. So I believe that the melodies I know must somehow be represented in a way that makes the speed and pitch easy to “modify” whenever I play them back in my head. (By the way, I believe our ears are wired up so that each frequency corresponds to a nerve or a group of nerves, so that the input to the brain is effectively a spectrum analysis. This should be similar to your retinotopic map, except that the frequency spectrum is 1-dimensional instead of 2-dimensional.)

        I don’t know how the brain encodes melodies, but I suppose it must be some sort of sequence of pitch intervals. (I don’t know much about how the brain encodes anything at all, so take the following with a grain of salt.) First of all, let’s say we represent each interval (1 half-tone, 2 half-tones, 3 half-tones, etc.) by a single distinct neuron. That’s pretty easy; each one is wired up such that when we charge it, by negative feedback, we lock onto a specific frequency with our voice (be it our real voice or our “inner” voice). Exactly how the frequency is determined can be influenced by other factors, such as the root note/starting pitch (this is exactly what allows us to project the sequence of notes onto different pitch ranges); this should be easy to do with some kind of scaling of the output of the interval neurons. We can also encode durations in a similar way; eighth notes, quarter notes, half notes, whole notes, etc. all get their own neuron. Then we need some sort of generic circuitry for holding a note. Each of the duration neurons is connected to this “holding a note” circuitry. The melodies themselves could be encoded as a sequence of neurons (one neuron for each note). When a specific-note neuron spikes, it signals one of the pitch-interval neurons and one of the duration neurons, and when the “timer” expires, it signals the next note. (Hm, I agree this looks more like a digital circuit than a brain, and I have no clue how the brain could actually build this on its own. And it would require a lot of neurons for just a single melody.)

        Anyway, my point in all that was to show that pitches and durations are actually very easy to scale in an analog fashion in the “projection routine” of (the internal representation of) a melody.
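Vegard's scheme can be sketched without the neurons. Assumptions: a tune is stored as (semitones from the previous note, duration in beats) pairs, pitches are MIDI note numbers in equal temperament, and key and tempo are supplied only at playback. Names are hypothetical.

```python
from typing import List, Tuple

# A tune is a list of (semitones from the previous note, duration in beats).
Tune = List[Tuple[int, float]]

def play(tune: Tune, root_midi: int, bpm: float) -> List[Tuple[int, float]]:
    """Project a stored tune onto a concrete key and tempo.

    Returns (MIDI note number, duration in seconds) pairs. Changing
    `root_midi` transposes the tune; changing `bpm` rescales every
    duration. Neither touches the stored intervals themselves."""
    seconds_per_beat = 60.0 / bpm
    pitch = root_midi
    notes = []
    for interval, beats in tune:
        pitch += interval
        notes.append((pitch, beats * seconds_per_beat))
    return notes

# Opening of "Happy Birthday" (G G A G C B when rooted on G).
happy_birthday: Tune = [(0, 0.75), (0, 0.25), (2, 1.0), (-2, 1.0), (5, 1.0), (-1, 2.0)]
```

Playing the same stored tune with `root_midi=67` (G) or `root_midi=62` (D) transposes it, and changing `bpm` rescales every duration, while the stored intervals never change.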

        Maybe the same thing holds for your mental pictures? Maybe your mental pictures of the car are not really rasterised like a 3D scene is rendered on a computer, maybe your mental picture of the car really is something simpler and more abstract than a bunch of pixels, and this is why you can rotate it so easily. You also wrote that details come and go as they’re required — maybe this means that you’re simply viewing one “part” of the image at a time, as a time sequence, and the car that you thought you saw all at once is only really there in very small parts (much like our actual vision!).

        Enough nonsense from me for a while 🙂

      • stevegrand says:

        > Maybe it’s different for you.

        I think it is. Some people are visual thinkers and some focus more on other aspects. I know people who don’t see pictures inside their heads at all, but are more able with words. If anything I’d say I can do 3D modeling BECAUSE I find it easy to rotate images in my head. I’m good at moving things in my head but lousy at remembering visual details. A painter would be much better than me at that.

        Music’s a great example to think about. It’s interesting that we can transpose tunes really easily and imagine them in a different key, but we can’t scale them, in the sense of changing the intervals between the notes. It’s easy to “hear” them played back using different instruments, too. I read somewhere that we seem to store tunes we’ve only heard played in one way (Bohemian Rhapsody, say) in a fixed key and with fixed timbre, whereas tunes like Happy Birthday are stored independently of any key signature, because we’ve heard them sung in many keys. Primary auditory cortex is a spectrum analysis, as you say, although somehow it handles both timbre and pitch. Transposing a tune in our minds would simply require us to shift the pattern of harmonics left or right. But it still leaves the question of how patterns can be projected onto and moved around in the brain. If the memory of a sound is stored in specific neurons then it has to be in a way that can reconstruct a pattern in a much less localised and fixed way. One conclusion I’ve come to is that it’s not really the neurons themselves that do the computing in the brain – it’s the patterns of activity. I can think of several ways to project and reconstruct complex patterns of nerve activity – for instance by constructive interference like in a fixed-array radar – but I can’t figure out a general scheme that fits all the evidence. It drives me nuts trying to think of a way, because I think that’s the key to everything. Meanwhile, for my game I’m using a less bothersome method…
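Steve's "shift the pattern of harmonics left or right" is literally a shift only if the frequency axis is logarithmic; on a linear axis in hertz, changing key multiplies every frequency instead. A sketch assuming semitone-wide bins on a log-frequency axis referenced to 440 Hz, with a generic 1/k harmonic rolloff standing in for timbre (both assumptions):

```python
import math
from typing import Dict

def harmonic_pattern(f0_hz: float, n_harmonics: int = 6,
                     ref_hz: float = 440.0) -> Dict[int, float]:
    """Map the harmonics of a note onto a log-frequency axis in semitone bins.

    Bin 0 is `ref_hz`; harmonic k of f0 lands in bin round(12*log2(k*f0/ref_hz)).
    Amplitudes fall off as 1/k (an assumed, generic timbre)."""
    pattern: Dict[int, float] = {}
    for k in range(1, n_harmonics + 1):
        bin_index = round(12 * math.log2(k * f0_hz / ref_hz))
        pattern[bin_index] = pattern.get(bin_index, 0.0) + 1.0 / k
    return pattern

def transpose(pattern: Dict[int, float], semitones: int) -> Dict[int, float]:
    """Transposition = sliding the whole pattern along the log-frequency axis."""
    return {b + semitones: a for b, a in pattern.items()}
```

Transposing the A440 pattern up two semitones gives exactly the pattern computed directly for B (about 493.9 Hz). Every bin moves by the same amount, so the gaps between partials, and between successive notes, stay the same.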

  11. torea says:

    Thanks for the answers to my previous post and for your “open process” of creation. That’s very interesting to follow!

    >The honest, not-cheating-at-all way would be to attach a virtual camera (or two) to each creature’s head and use the 3D engine to render the scene from the creature’s perspective onto a bitmap.

    If, instead of projecting the real scene, you project several simplified versions of the world that include some non-visible information, it’s probably possible to cheat “gently”.
    For example, the first projection gives you approximate 3D range data for the scene. The second gives you color-coded information about the visible objects: each object is assigned its own specific color.
    With simplified objects in the scene, you can render the scene multiple times from multiple viewpoints without overloading the GPU too much.
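torea's two projections correspond to what a renderer would call a depth buffer and an object-ID buffer. A minimal sketch, assuming a flat 2D world of circular objects, an eye outside all of them, and a 1D retina with one ray per pixel. The scene format and function names are hypothetical, and in a real game the 3D engine would produce both buffers far more cheaply on the GPU.

```python
import math
from typing import List, Optional, Tuple

Circle = Tuple[float, float, float, int]  # (centre x, centre y, radius, object id)

def ray_circle(ox: float, oy: float, dx: float, dy: float,
               cx: float, cy: float, r: float) -> Optional[float]:
    """Distance t along the unit-direction ray (ox, oy) + t*(dx, dy) to the
    near side of a circle, or None if the ray misses it."""
    fx, fy = ox - cx, oy - cy
    b = fx * dx + fy * dy            # half the linear coefficient of the quadratic
    c = fx * fx + fy * fy - r * r
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)         # nearer of the two intersections
    return t if t > 0 else None

def render_buffers(eye: Tuple[float, float], heading_deg: float, fov_deg: float,
                   n_pixels: int, scene: List[Circle]) -> Tuple[List[float], List[int]]:
    """Cast one ray per pixel; return (depth buffer, object-ID buffer).

    Pixels that see nothing get depth = inf and id = -1."""
    depth: List[float] = []
    ids: List[int] = []
    for p in range(n_pixels):
        angle = math.radians(heading_deg - fov_deg / 2 + fov_deg * (p + 0.5) / n_pixels)
        dx, dy = math.cos(angle), math.sin(angle)
        nearest, nearest_id = math.inf, -1
        for cx, cy, r, obj_id in scene:
            t = ray_circle(eye[0], eye[1], dx, dy, cx, cy, r)
            if t is not None and t < nearest:
                nearest, nearest_id = t, obj_id
        depth.append(nearest)
        ids.append(nearest_id)
    return depth, ids
```

For an eye at the origin looking along x with a 60-degree field of view and three pixels, a circle of radius 1 at (5, 0) occupies the centre pixel at depth 4 and hides a larger circle behind it; the side pixels see nothing (`inf`, id `-1`).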

    It may be interesting to couple the analysis of the scene rendering with the attention mechanism: don’t bother analyzing some parts of the visible image if the creature focuses on a specific part.

    > In the brain, we know that there are “place fields” in the hippocampus (an older, simpler, curly fringe of the cortex). As far as I know, there’s no evidence (and it doesn’t seem awfully likely) that these “points of best representation” (see Brainstorm #2) are arranged geographically.

    Are you referring to grid cells?

    • stevegrand says:

      > Are you referring to grid cells?

      Ooh! Wow! Hmm… Gosh…

      No, I wasn’t – I was referring to place cells – the cognitive maps of O’Keefe and Nadel. But I hadn’t heard of these other cells until you told me – they’re fascinating!

      I’m already starting to drool about some possibilities there. Thanks very much for putting me on to this. I said I wasn’t all that keen about being told to look at so-and-so’s work, but what I meant by that was that I’m not really interested in other theoreticians’ ideas. Experimentalists’ DATA, on the other hand, is invaluable! This looks like very interesting data.

      I’ll report back!

      • torea says:

        You’re welcome!

        >I said I wasn’t all that keen about being told to look at so-and-so’s work, but what I meant by that was that I’m not really interested in other theoreticians’ ideas. Experimentalists’ DATA, on the other hand, is invaluable!

        I agree that work in neuroscience or experimental psychology is usually more interesting than the many theories that are based on very little scientific data.
        Actually I’m surprised that not much work has been done following the discovery of grid cells, given their possible role in the formation of memories.

      • stevegrand says:

        Yes, these grid cells are fascinating. I keep getting the feeling that many people are thinking about them back-to-front, though. If you map the receptive field in terms of rat location you get a grid, but that seems a bit misleading. If you think of them as MOVEMENT-sensitive cells then they just become active repetitively, as the rat moves a specific distance in any direction. A bit like a trundle-wheel. It’s obviously not that simple because we have to explain the hexagonal lattice, but it seems conceptually better to think of them as recording movement, rather than absolute location. Absolute location then emerges from this in a further step (perhaps the multiple RF diameters are metaphorically similar to decimal places). They seem related to other neurons that respond to head direction, and I bet we have others that encode eye direction and body direction. I like the notion that they relate to episodic memory by enabling the association of memories with specific locations. I’ll try to incorporate some of this in my project (whilst trying not to get too sucked into playing around with the neuroscience!).
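The "decimal places" metaphor can be made literal in a toy 1D model. Assumptions for illustration only: three modules with spatial periods of 10 m, 1 m and 10 cm (a factor of ten apart purely to match the metaphor), each of which sees only movement and can resolve its phase into just ten coarse bins. Names are hypothetical.

```python
from typing import List

def integrate_moves(moves_cm: List[int], periods_cm: List[int]) -> List[int]:
    """Each module only ever sees movement: it advances its own phase by
    every step and wraps around at its spatial period (like a trundle-wheel)."""
    phases = [0] * len(periods_cm)
    for move in moves_cm:
        phases = [(ph + move) % p for ph, p in zip(phases, periods_cm)]
    return phases

def coarse_readout(phases: List[int], periods_cm: List[int], levels: int = 10) -> List[int]:
    """Each module can resolve its phase into only `levels` coarse bins."""
    return [ph * levels // p for ph, p in zip(phases, periods_cm)]

def decode(readout: List[int], periods_cm: List[int], levels: int = 10) -> int:
    """Read the bins largest period first, like the digits of a number."""
    return sum(r * p // levels for r, p in zip(readout, periods_cm))
```

Moves of +120 cm, +350 cm and -33 cm leave the modules at phases of 437, 37 and 7 cm; their coarse readouts are 4, 3 and 7, which decode to 437 cm. Beyond 10 m the largest module wraps around, so a still coarser module would be needed.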

      • torea says:

        From what I’ve read (scholarpedia has a nice entry on that too), grid cells seem to be coding a relative location of the rat according to some reference point.
        I see them as a way to do self-localization in a closed space with landmarks: you integrate odometry and some visual feedback about your current distance to known landmarks in order to recover your position in the room.
        This approach is probably the most efficient to do self-localization with robots (except when you use not-so-human sensory devices like GPS.)

        Basically, I would put the motion sensitive cells in another part of the brain which provides an input to the grid cells.

        Actually it would be interesting to see if the grid cells react if the rat is moving in some large and completely white space. I haven’t checked if it has been done.

        >I like the notion that they relate to episodic memory by enabling the association of memories with specific locations.

        This reminds me of the method of loci: a way to remember various things easily by imagining them visually in some known location.

      • stevegrand says:

        > grid cells seem to be coding a relative location of the rat according to some reference point.

        Yes, but what I was suggesting is that this may be the wrong way to look at it. It’s true that the cell seems to code for a grid of specific points in space. But how does the rat compute these points? And why a grid? It does make it sound like rats have GPS. But if you turn it inside-out and think of the cell as a motion detector then its properties become obvious and logical (up to a point).

        Imagine a computer mouse of the old-fashioned kind with a ball in it. Drill a hole in the ball and fill it with ink, and then run the mouse around a sheet of paper at random. It won’t actually create a grid (which is why I said “up to a point”), but it will show the general periodicity and apparent ability to “understand” absolute space that grid cells show. Whenever the hole in the ball touches the paper it’ll leave a mark. All it is actually doing is recording a motion – it knows nothing about where the mark is in space. But if you were a neuroscientist recording the receptive field of this mouse you’d come to the conclusion that the mouse knew where it was in space, because it only leaves ink at certain locations. And then you’d be puzzled by why it was a grid.
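Steve's inked-ball mouse can be simulated in one dimension. This is a simplification: in 1D the marks do fall on a regular lattice, whereas in 2D, as he says, they would not form a grid. The ball tracks only its own rotation; the position variable exists only for the observer. The circumference of 7 units and the names are arbitrary assumptions.

```python
import random
from typing import Set

def ink_marks(n_steps: int = 10_000, circumference: int = 7, seed: int = 1) -> Set[int]:
    """Roll a 1D 'ball' back and forth at random and record where it leaves ink.

    The ball only tracks its own rotation phase (distance rolled modulo its
    circumference); `position` is bookkeeping for the observer and is never
    used to decide when ink is left."""
    rng = random.Random(seed)
    phase = 0       # the ball's rotation, in units of distance rolled
    position = 0    # known to the experimenter, not to the ball
    marks: Set[int] = set()
    for _ in range(n_steps):
        step = rng.choice((-1, 1))
        phase = (phase + step) % circumference
        position += step
        if phase == 0:          # the inked hole is touching the paper
            marks.add(position)
    return marks
```

Every mark lands on a multiple of the circumference, so an observer mapping the marks would conclude that the ball "knows" where those points are, even though it only ever recorded motion.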

        I agree, it would be very interesting to know what it does in an unbounded space. It has been done in the dark, so they know that landmarks help to synchronize the system but aren’t needed for it to work. Again it’s evidence that they’re movement cells, not location cells.

      • torea says:

        Well, I understand your mouse ball example as equivalent to what is described in the article: the rat motion is integrated which results in a distance from a starting point.
        The distance is deduced by using speed, direction and time. All three of these measurements are made internally and thus may include errors, which is why it is better to use external perceptions, such as the distance to some landmarks, to make corrections.
        For the landmarks to be useful, you need to integrate the motions to obtain a location at some point so that your distance to the landmarks at different time steps can be useful.
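Dead reckoning with landmark correction, as torea describes, can be sketched in one dimension. Assumptions: speed is integrated over time (direction is the sign of the speed), the odometry is miscalibrated by a fixed fraction, a landmark at a known position returns a noisy range at every step, and the correction is a simple fixed-gain blend (a complementary filter rather than anything more principled, such as a Kalman filter). Names and numbers are hypothetical.

```python
import random
from typing import List, Tuple

def track(speeds: List[float], dt: float, odometry_error: float,
          landmark: float, correction: float, noise: float = 0.05,
          seed: int = 0) -> Tuple[float, float]:
    """Integrate speed * time into a 1D position estimate.

    Odometry over-reads every speed by the fraction `odometry_error`. At each
    step a noisy range to a landmark at a known position pulls the estimate
    back by the fraction `correction` (0 means pure dead reckoning).
    Returns (true position, estimated position)."""
    rng = random.Random(seed)
    true_pos = est_pos = 0.0
    for v in speeds:
        true_pos += v * dt
        est_pos += v * (1.0 + odometry_error) * dt      # biased internal estimate
        if correction > 0.0:
            measured_range = (landmark - true_pos) + rng.gauss(0.0, noise)
            implied_pos = landmark - measured_range      # where the landmark says we are
            est_pos += correction * (implied_pos - est_pos)
    return true_pos, est_pos
```

With a 10% odometry error over 10 m, pure integration ends up about 1 m off, while a correction gain of 0.3 keeps the error well under 20 cm.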

        Anyway, it may be better to wait for more experimental data in order to understand these cells a little better.

  12. Jason Holm says:

    In the News:

    Researchers develop new brain-like molecular processor

    “a molecular circuit that can evolve continuously”

    “The massively parallel circuit contains a layer of molecular switches (monolayer) that simultaneously interact in a manner similar to the information processing performed by the neurons in the human brain. That is, they can evolve to tackle complex problems. That’s because information processing circuits in digital computers are static, and operate serially.”

    “The molecular processor can also heal itself if there is a defect because of the self-organizing ability of the molecular monolayer. Like the brain, if a neuron dies, another neuron takes over its function.”

    • stevegrand says:

      Thanks, I’ll check that out.

      I’m a bit baffled by those pictures, though. The implication seems to be that, because the patterns in this molecular layer look similar to the patterns in the fMRI scans above, the molecular layer is doing something akin to what a brain is doing. But half the reason the brain patterns look as they do is that the brain has some stonking great ventricles and the spinal cord and suchlike getting in the way. Surely any similarity between the pattern in the molecular thingy and the pattern in the fMRI is entirely coincidental! Are they saying that they’ve hooked their molecular thingy up to a hundred million sensory inputs, arranged in the same way as the thalamocortical connections, and even taken the trouble to erase molecules in the pattern of the cephalic ventricles? I think not. So who is that picture meant to fool? And there are those damn memristors again – methinks someone is perhaps hyping up their research!

      • Jason Holm says:

        Not a fan of memristors? Interesting — maybe I’ve been buying into some kind of hype then, because I’ve been reading so much about them lately it’s sounded like the jump from vacuum tubes to transistors. Care to expand on your thoughts? Something we should know about them — how they don’t measure up to the claims? Or are they all they claim to be and you’re just sick of hearing about them every time you turn around?

      • stevegrand says:

        Oh, memristors are interesting enough, in their place. I just get irritated by the hype about how they’re the missing link in being able to create artificial brains. They may well turn out to be useful, but just their mere existence doesn’t mean they’re going to spontaneously assemble into an intelligent system, nor that we weren’t able to do some things before they were invented (it’s always been both possible and commonplace to perform similar functions in software, or even in hardware using a small circuit). There’s a lot of reductionist claptrap going around about them. I wrote a post on this a while ago:

  13. Paul Almond says:

    I have been following this discussion for a while. One observation I would make (which doesn’t seem very contentious to me – but we will see) is that we should expect the system to follow some kind of geometrical analogy in its organization most strongly near the inputs/outputs, as these are geometrically related – and as we go further away from the inputs/outputs, the geometrical analogy should get weaker. Things like light-sensitive cells clearly follow a very strong geometry – each corresponds to some “real-world” position, dependent on its actual position. The layer of processing just “above” this should still exhibit some geometry – coordinates should tell us something useful about neurons – but it should be getting weaker. As we go further away from the input/output level, and things get more abstracted, the relevance of geometry to any description of how the system is working should get progressively weaker, until ultimately the coordinates of a cell mean little – we will just be interested in what it connects to and how far away it is from other cells (what I would call a “weak” geometry – one where distance might matter, because it determines what connections can be made, but coordinates don’t count for much and don’t correspond to anything about the outside world).

    • stevegrand says:

      I think the key thing here is that the MEANING of the geometry changes. At the periphery, geometry is literal – somatosensory maps, retinotopic maps, tonotopic maps (slightly abstract, but directly related to the geometry of the cochlea). As you progress further in, these explicit geometries fade out, but I suspect they’re replaced by abstract geometries – where percepts are grouped by similarity of features, or similarity of meaning. So I think the coordinates of the cell remain very important, it’s just that the maps use ever more abstract coordinate frames. There’s quite a bit of evidence for this.

      As a general principle, I think the whole of intelligence is about coordinate transforms: the sensory maps are in literal coordinates (body-related or scene-related) and the primary motor maps are also in literal coordinates (arranged by joint, with the head at the bottom and the feet at the top). If what we see eventually triggers a movement, this input information must logically be transformed from retinotopic coordinates into the somatotopic coordinates of the motor cortex. To do this it must pass through intermediate coordinate spaces – a morph. Some of these will be semi-literal frames (such as eye direction in relation to the body, as opposed to the more primitive eye position in relation to the head). Some will be abstract (arranged by word-form, or activity type, or affect). At each stage, a transform occurs, often a dynamic transform. If the movement depends on both what we see and what we hear, then there must be an intermediate coordinate frame at which both tonotopic and retinotopic information can meaningfully be merged.
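A toy version of the transform chain described above (retinotopic, then head-relative, then body-relative) might look like the following Python sketch. It assumes a deliberately simplified world in which only horizontal gaze angles matter, measured in degrees with positive meaning leftward, so each step reduces to adding an offset. The function names are hypothetical, and the sketch makes no claim about how neurons actually carry out the transform.

```python
# Hypothetical sketch: re-express where a seen object lies, first relative to
# the head and then relative to the body. Simplifying assumption: only the
# horizontal (azimuth) angle matters, in degrees, positive to the left.


def wrap(angle_deg: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def retina_to_head(retinal_azimuth: float, eye_in_head: float) -> float:
    """Retinotopic -> head-centred: add the eye's rotation within the head."""
    return wrap(retinal_azimuth + eye_in_head)


def head_to_body(head_azimuth: float, head_on_body: float) -> float:
    """Head-centred -> body-centred: add the head's rotation on the trunk."""
    return wrap(head_azimuth + head_on_body)


def retina_to_body(retinal_azimuth: float, eye_in_head: float,
                   head_on_body: float) -> float:
    """Chain both transforms via the intermediate head-centred frame."""
    return head_to_body(retina_to_head(retinal_azimuth, eye_in_head), head_on_body)


# Object 10 deg left on the retina, eyes turned 20 deg right, head turned 30 deg left.
print(retina_to_body(10.0, -20.0, 30.0))  # 20.0
```

Real gaze geometry is three-dimensional, and 3D rotations do not simply add, so this only shows the bookkeeping of passing through an intermediate frame.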

      • Paul Almond says:

        Hi Steve

        Well, I think I am sort of saying that really. When I said that a “weak” geometry starts to take effect, I meant that grouping and proximity can still be important, but that the relationships between elements are so abstract that the actual coordinates do not mean much. Cell A may be near cell B, and may therefore be close in “meaning” to it, but the connection may be so abstracted by this stage that the actual coordinates are not really relevant in any “human-level” description.

        A WAV file is an example of a system with a very strong geometrical analogy. Each element has a position in that file that maps onto some real world feature – in this case the time at which that part of the recording occurred – in a simple way.

        Going to the other extreme, if we encrypt that file we get something which is just a mess. The location of some part of that file would not correspond to anything in the real world in any interesting way. Now, you could say that geometry is still important, in the way that “where everything is matters”, but it is not important as a way of really understanding what the system is doing.

        Am I saying the brain or an AI is like the encrypted file? No! I agree with some of what you said. Things get grouped, and things can be in proximity to other things that do relate to them in some useful way. This just does not happen with the encrypted file. If two neurons in the brain were close together, of course I would agree it is reasonable to think they may have some kind of connection in “meaning”. However, what I do not accept is that we must therefore think we are dealing with something that is describable in any non-trivial way by a 3D coordinate system. At a very high level in the hierarchy, proximity could be important, but the actual coordinates may be much less important – the relationships between different elements being far more abstracted. In some systems, the actual proximity may come from the coordinate system, but at a very high level, only the proximity will really matter.

        This is what I mean by a “weak geometrical analogy” – a system in which proximity matters, but there is no actual coordinate system – or if there is (as will happen in systems like brains) it is not of much interest to us. What I have said may seem to be a contradiction, but some examples should make it clear.

        Suppose two people have lots of friends in common. We might say those two people are “close” in some kind of “social network”. We might imagine some way of computing the “social closeness” of people. However, this would have no relation to any simple “coordinates”. We might consider it a “coordinate mapping” – from people’s physical locations into “social network space”, but it should be pointed out that “social network space” may not be sensibly described in coordinates.
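One possible way to compute “social closeness” without any coordinates, sketched in Python under the assumption that closeness is simply the Jaccard similarity of two people’s friend sets (shared friends divided by all friends of either). The function name and the little friend list are made up for illustration.

```python
from __future__ import annotations

# Hypothetical measure: Jaccard similarity of two people's friend sets, i.e.
# shared friends divided by everyone who is a friend of either person.


def social_closeness(friends: dict[str, set[str]], a: str, b: str) -> float:
    """0.0 = no friends in common, 1.0 = identical circles of friends."""
    fa, fb = friends[a], friends[b]
    union = fa | fb
    if not union:
        return 0.0
    return len(fa & fb) / len(union)


friends = {
    "ann": {"cy", "dee", "eve", "fay"},
    "bob": {"cy", "dee", "eve", "gus"},
    "hal": {"ivy"},
}
print(social_closeness(friends, "ann", "bob"))  # 0.6
print(social_closeness(friends, "ann", "hal"))  # 0.0
```

No position for anyone appears anywhere; closeness comes purely from overlap, which is the point of the example.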

        Google keeps records of which pages link to which other pages. We could use this to compute the “closeness” of two webpages, by looking at average lengths of link-paths between them, etc. Now, we might be able to project some simplification of this onto a 2D or 3D diagram, but the true system would be non-spatial. Elements would be grouped, but it would only make sense to talk about the “location” of an element in terms of what other elements it is “near”.

        Now, I fully accept that a weak geometrical analogy like this will operate at practically every level of the brain, but I think that the really strong geometry, where the coordinates themselves are of a lot of interest, should only be near the input/output level. As we go higher, things will get more abstract, and coordinates (but not proximity and grouping) will become increasingly inadequate to describe things. I do not view it so much as a change of “coordinate mapping” but as the gradual discarding of coordinates as proximity itself becomes the only important thing – and you can build complex networks with just that.

        With a human brain you could say that the coordinates still play some basic role. They determine where things are (obviously!) and therefore ultimately determine the proximity, but that is only because neurons are actually things that exist at specific points in space. In my view, implementing that in an artificial system is just a distraction. I accept we need the weak geometrical analogy though: We need some measure of proximity. For example, if we have some kind of element making the system, and we want to connect it, experimentally, to something else, we can hardly go searching all over the system for stuff to connect it to. In this sense, the way that the brain’s physical positioning of neurons determines proximity, and therefore constrains the search-space of possible connections that you can make locally is valuable. I say we can capture that more elegantly by using the concept of proximity without the actual coordinate system.

        This is what I have done in my own work. The “bottom level” – where the inputs/outputs occur – is made of elements that map onto real-world geometry. Above that, coordinates have no meaning – and elements do not even have coordinates. All we can say about an element is what it connects to, so we may say, for example, that element A receives inputs from elements B, C, D and E, but we don’t have any coordinates for A. However, for any two elements we can apply one of several algorithms to determine a “distance”. We can determine which elements are close to A and which are not, by doing things like looking at path lengths, etc. We can create a kind of “pseudo-space” which has distance but no actual space. Now, in the lower levels of a hierarchy, this will still have a lot of resemblance to the input/output layer. You could “almost” assign coordinates to elements, but as you go further away this becomes increasingly irrelevant. At the lower levels, you could easily draw “pictures” and think of things as arranged in arrays, and it would not be too inaccurate. Higher up, you would have to distort the system a lot to make it fit something like this. However, nothing stops us using concepts like proximity to constrain the system’s various exploratory processes (which are whatever you think they are). We can say “Let’s try to make a new element, which gets inputs from two elements that are close together,” for example. The big advantage of this is that we are not imposing any “space” on the system. The equivalent of “coordinate transformation” is now just a feature of how these elements connect. This is in contrast to (for example) the HTM system proposed by Hawkins, which seems to have a very rigid spatial view of ontology.
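A minimal Python sketch of the kind of coordinate-free “pseudo-space” described above, assuming that the distance between two elements is the number of hops on the shortest connection path (direction ignored), and that a new element should take its inputs from the closest pair of candidates within a fixed radius. The function names, the `max_hops` threshold and the example network are hypothetical; path length is only one of the “several algorithms” the comment mentions.

```python
from __future__ import annotations

from collections import deque
from itertools import combinations

# Hypothetical "pseudo-space": elements have no coordinates, only connections.
# Assumed distance: number of hops on the shortest path, ignoring direction.


def undirected(inputs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Turn 'X receives inputs from Y' records into symmetric neighbour sets."""
    links: dict[str, set[str]] = {}
    for element, sources in inputs.items():
        for src in sources:
            links.setdefault(element, set()).add(src)
            links.setdefault(src, set()).add(element)
    return links


def hop_distances(links: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first search: hop count from start to every reachable element."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def propose_new_element(inputs: dict[str, list[str]], candidates: list[str],
                        max_hops: int = 2) -> tuple[str, str] | None:
    """Pick the closest pair of candidates (within max_hops) to feed a new element."""
    links = undirected(inputs)
    best = None
    for a, b in combinations(sorted(candidates), 2):
        d = hop_distances(links, a).get(b)
        if d is not None and d <= max_hops and (best is None or d < best[0]):
            best = (d, a, b)
    return None if best is None else (best[1], best[2])


# "Element A receives inputs from elements B, C, D and E"; F and G are extras.
inputs = {"A": ["B", "C", "D", "E"], "F": ["E"], "G": ["F"]}
print(hop_distances(undirected(inputs), "B")["G"])   # 4  (B-A-E-F-G)
print(propose_new_element(inputs, ["B", "C", "G"]))  # ('B', 'C')
```

Nothing in this search ever asks where an element is, only how many links separate it from another, which is the sense in which the space has distance but no coordinates.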

        I hope I made more sense with that description.

    • stevegrand says:

      Ah yes, I understand what you’re saying now. But I disagree with you. I’m willing to be proven wrong, though.

      I think perhaps there are three possibilities:

      1. The distance-only clustering you’re talking about is at one extreme – it doesn’t matter WHERE something is represented, only how close it is to something else.

      2. At the other extreme would be Cartesian or non-Cartesian (but spatial) geometries where the absolute coordinates are a vital part of meaning. I think there will be maps of this kind quite far from the periphery, which depend on the properties of geometry for their function. For example, maps of local landmarks, or maps of “reach-space”. Somehow the brain must convert information about the angle of gaze in relation to the head, into information about gaze direction relative to the body and to local space. These are internally generated maps, but they encode real spatial relationships. Perhaps these calculations are made in a non-geometric way, but given that the raw input data are often Cartesian (e.g. retinotopic) I’d guess the subsequent maps are computed geometrically, by a smooth, interpolable spatial transformation.

      3. In-between would come frames where the geometry is “emergent”. The meaning of a point would be defined, not just by how far it is from another point, but how far it is from ALL other points.
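The third option, where a point’s meaning comes from its distances to all other points, has a well-known mathematical counterpart: classical multidimensional scaling (MDS), which recovers coordinates from nothing but a table of pairwise distances. The pure-Python sketch below (function names are hypothetical; power iteration stands in for a linear-algebra library) shows that a geometry can emerge from distances alone, up to rotation, reflection and translation. It is offered as an analogy, not as a model of cortex.

```python
import math
import random

# Classical multidimensional scaling (MDS) in plain Python: given only the
# distance between every pair of points, recover coordinates whose pairwise
# distances match. The result is defined only up to rotation, reflection and
# translation, because nothing but relative distances was supplied.


def double_centre(dist):
    """Return B = -1/2 * J (D squared) J, where J subtracts row/column means."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigen(m, iters=500, seed=0):
    """Power iteration: dominant eigenvalue of a symmetric matrix and its unit eigenvector."""
    n = len(m)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
        # Rayleigh quotient v^T M v estimates the eigenvalue.
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v


def classical_mds(dist, dims=2):
    """Embed a full distance table into `dims` dimensions."""
    b = double_centre(dist)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, v = top_eigen(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate: remove this component so the next pass finds the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Only the distances of a 4 x 1 rectangle are handed to MDS, never its corners.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
table = [[math.dist(p, q) for q in rectangle] for p in rectangle]
recovered = classical_mds(table)
print([[round(x, 3) for x in row] for row in recovered])
```

Power iteration finds the eigenvalue of largest magnitude; that coincides with the largest one here because a table of genuine Euclidean distances yields a positive semidefinite B.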

      There’s evidence, for instance, that there is a map of face-recognition cells, in which the mid-point represents a “standard” face and points around it vary from this archetype in a systematic way. Perhaps long faces to the left, short faces to the right, oval faces towards the front, flatter ones towards the back, etc. The systematic clustering of such maps is bound to be relevant to their function, especially in motor cortex.

      Different tasks/actions may be organized in a systematic way, such that an action can be modulated according to several parameters by sliding around the space on different axes. Maybe ways of touching someone are organized on some kind of social axis – from a loving stroke through a friendly pat to a punch. A second axis might differentiate the target body location, so a stroke of the cheek is at X,Y, while a punch to the stomach is at X1,Y1. Having a systematic organization like this enables the brain to decide on a course of action and then have it modulated by affect or circumstance. That’s far more efficient than an arbitrary representation in which each action has to be coded independently.
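A hypothetical sketch of the stroke/pat/punch idea: a 2D action space whose x axis is social tone and whose y axis is target location, with made-up motor parameters stored only at the four corners. Any point in between is obtained by bilinear interpolation, so affect can modulate an action simply by sliding along x while the target stays fixed. The parameter names and numbers are invented for illustration.

```python
# Hypothetical action map.
#   x axis: social tone, 0.0 = loving stroke ... 1.0 = punch
#   y axis: target,      0.0 = cheek         ... 1.0 = stomach
# Motor parameters (invented, normalised units) are stored only at the corners.

CORNERS = {
    (0, 0): {"speed": 0.25, "force": 0.125, "reach": 1.0},  # stroke the cheek
    (1, 0): {"speed": 1.0, "force": 1.0, "reach": 1.0},     # punch the cheek
    (0, 1): {"speed": 0.25, "force": 0.25, "reach": 0.5},   # stroke the stomach
    (1, 1): {"speed": 1.0, "force": 1.0, "reach": 0.5},     # punch the stomach
}


def lerp(a, b, t):
    """Linear interpolation from a (t = 0) to b (t = 1)."""
    return a + (b - a) * t


def action_at(x, y):
    """Bilinearly blend the four corner prototypes at (x, y) in [0, 1] x [0, 1]."""
    out = {}
    for key in CORNERS[(0, 0)]:
        cheek = lerp(CORNERS[(0, 0)][key], CORNERS[(1, 0)][key], x)
        stomach = lerp(CORNERS[(0, 1)][key], CORNERS[(1, 1)][key], x)
        out[key] = lerp(cheek, stomach, y)
    return out


# A friendly pat on the cheek: halfway along the social axis, cheek target.
print(action_at(0.5, 0.0))  # {'speed': 0.625, 'force': 0.5625, 'reach': 1.0}
```

The design choice being illustrated is that only a handful of prototypes are stored; everything else falls out of position in the space, rather than each action being coded independently.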

      Object recognition cells may cluster systematically into a hierarchy – a set of sets – in which properties vary systematically over space. Such a geometry has much more computational power than one based on simple proximity. Things like “to the left of” can have real meaning. For instance animals might be to the left, plants to the right; the plants might be subdivided by size into trees, shrubs, flowers, and then further into different species. The brain can make a statement about a particular species, or about shrubs in general, or about living things, just by choosing which portion of the space it considers. “Name a kind of tree”, “name a kind of rose”. “Is this plant a shrub?”
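The “set of sets” layout can be sketched in Python by giving every category a contiguous interval nested inside its parent’s interval. For brevity this assumes a single axis rather than a 2D map, and the concepts and positions are invented; the point is that each of the example questions becomes a choice of region.

```python
# Hypothetical layout: one axis, every category a half-open interval [lo, hi)
# nested inside its parent's interval. All positions are invented.

CATEGORIES = {
    "living thing": (0.0, 1.0),
    "animal": (0.0, 0.5),
    "plant": (0.5, 1.0),
    "tree": (0.5, 0.7),
    "shrub": (0.7, 0.85),
    "flower": (0.85, 1.0),
}

CONCEPTS = {
    "dog": 0.2, "cat": 0.3,
    "oak": 0.55, "pine": 0.65,
    "holly": 0.75,
    "daisy": 0.9, "tulip": 0.95,
}


def members(category):
    """'Name a kind of X': every concept that falls inside X's region."""
    lo, hi = CATEGORIES[category]
    return sorted(name for name, pos in CONCEPTS.items() if lo <= pos < hi)


def is_a(concept, category):
    """'Is this plant a shrub?': is the concept's position inside the region?"""
    lo, hi = CATEGORIES[category]
    return lo <= CONCEPTS[concept] < hi


print(members("tree"))         # ['oak', 'pine']
print(is_a("holly", "shrub"))  # True
print(is_a("daisy", "shrub"))  # False
```

Asking about living things in general just means widening the region to the whole axis.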

      My hunch is that systematic division of space is very relevant to the brain. Maybe I’m wrong, but I see hints of evidence, and thinking from an engineering perspective I can see many advantages.

      • Daniel Mewes says:

        “There’s evidence, for instance, that there is a map of face-recognition cells, in which the mid-point represents a “standard” face and points around it vary from this archetype in a systematic way.”

        Ha! That reminds me of a method called Principal Components Analysis, where the variation within some class of vectors (say, vectors of distances between facial features) is represented as a multi-dimensional ellipsoid. The great thing about PCA models is that they are easy to create (by finding the eigenvectors of the data’s covariance matrix). What is interesting about faces here is that the main axis of the corresponding ellipsoid actually transforms a face between female and male (i.e. a feature which has some natural importance).

        Yeah, I know this sounds awfully academic 😉 but the underlying methods might be simple enough for our brains to actually perform something similar (it’s no more than a linear mapping in the end).

        PCA also allows reducing many-dimensional data sets to only the most significant dimensions (with a corresponding “transformation basis” + average) efficiently.
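For the two-feature case, the PCA computation described above (finding the eigenvectors of the covariance matrix, then measuring each sample along the main axis relative to the average) fits in a few lines of plain Python. The “face measurements” below are invented numbers, and real face-space models use many more dimensions; this is only a sketch of the mechanics.

```python
import math

# PCA for two features: the 2x2 covariance matrix [[a, b], [b, c]] has a
# closed-form eigen-decomposition, so no linear-algebra library is needed.


def pca_2d(samples):
    """Return (mean, unit principal axis, (largest, smallest) variance)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    a = sum((x - mx) ** 2 for x, _ in samples) / n        # variance of feature 1
    c = sum((y - my) ** 2 for _, y in samples) / n        # variance of feature 2
    b = sum((x - mx) * (y - my) for x, y in samples) / n  # their covariance
    half_gap = math.hypot((a - c) / 2, b)
    lam1 = (a + c) / 2 + half_gap
    lam2 = (a + c) / 2 - half_gap
    if b != 0:
        vx, vy = lam1 - c, b  # eigenvector for lam1
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm), (lam1, lam2)


def project(sample, mean, axis):
    """Position of a sample along the principal axis; 0 is the average."""
    return (sample[0] - mean[0]) * axis[0] + (sample[1] - mean[1]) * axis[1]


# Invented (face length, face width) measurements in arbitrary units.
faces = [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]
mean, axis, variances = pca_2d(faces)
print(mean)                             # (4.0, 2.0)
print(project((6.0, 3.0), mean, axis))  # about 2.236, i.e. sqrt(5)
```

Keeping only the projection onto the main axis (plus the mean) is the dimensionality reduction mentioned above, in its smallest possible form.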

  14. Feel free to disagree with this, since I’ve never been too good with neuroscience, but I’d like to add a thought to this discussion. You mentioned the ability to know what shape or letter is being traced on our skin when those shapes are learned visually. What if the brain builds shapes when it learns them? The reason we can manipulate images in our mind could be because we build them in our mind. This could then let us “build” a shape as it is traced on our skin allowing us to recognize it.

    This doesn’t apply to every question you’re asking, but it may help in some way; even if all it does is dismiss an idea and lead you to a better idea. 🙂

    • stevegrand says:

      I think that’s exactly right! Except I don’t know HOW. I’ve long felt that when we see a curved line we “feel” it, as if we’re moving along it. We feel the acceleration profile of it. And so in that sense the shape is a motor memory, as if we’re tracing it out with our finger. I’m not sure about letters, but for curves in general that may well be how we store them. That works, because as a memory of movement it’s in a scalable and rotatable form, and it’s an active form of recognition. But it still leaves open the question of how we get from pixels in visual cortex or “taxels” in somatosensory cortex to this more vector-like representation (and back again), and what the representation actually looks like on a neural surface. I keep banging my head against this, but I think you’re probably right that we build these shapes and they exist in a common frame that’s neither visual nor somatosensory.
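One simple stand-in for a “motor memory” of a curve is the sequence of turning angles met while tracing it, a discrete cousin of the acceleration profile mentioned above. The sketch below is an assumption-laden illustration rather than a proposed neural code: it shows that turning angles stay the same when a shape is rotated, scaled or shifted, which is the property the comment highlights. Function names and the sample shape are hypothetical.

```python
import math

# Hypothetical "traced" code for a shape: the signed turn in heading at each
# step, as if following the curve with a fingertip, instead of point positions.


def turning_angles(points):
    """Signed change of heading (radians) at each interior vertex of a polyline."""
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(points, points[1:])]
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap into [-pi, pi) so a small left turn is never read as a large right one.
        turns.append((h1 - h0 + math.pi) % (2 * math.pi) - math.pi)
    return turns


def transform(points, angle, scale, dx=0.0, dy=0.0):
    """Rotate by `angle` radians, scale uniformly, then shift by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]


shape = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 3.0), (1.0, 4.0)]
moved = transform(shape, angle=0.7, scale=2.5, dx=3.0, dy=-1.0)
print([round(t, 6) for t in turning_angles(shape)])  # [0.785398, 0.785398, 0.785398]
print([round(t, 6) for t in turning_angles(moved)])  # [0.785398, 0.785398, 0.785398]
```

Turning angles discard absolute size as well as orientation, so a small and a large version of the same curve give the same sequence; how such a code would be computed from pixels or taxels is exactly the open question raised in the comment.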
