In a previous post, I talked about how we don’t understand the brain. We’ve built machines that mimic it, but these machines are opaque and don’t give us a view into how the brain works. Really understanding the brain is not building machines that can build the brain; it’s actually building the brain ourselves.
The first step in building a brain is building a brain-augmenting interface. What’s that? It’s hypothetical. Remember Google Glass? (Uh, no?) Google Glass projected stuff onto a screen that you had outside your eyes, and your eyes would then see more stuff than they would see without Google Glass. That’s a type of augmenting. But the augmenting is happening outside of your eye. It’s happening on the light that hits your eye. In terms of brain-augmenting technology, that’s the same as a flashlight. So when the light from the world + Google Glass hits your eye, your eye sees the world + Google Glass cookie recipes, and it transmits that stuff to the brain. A brain-augmenting interface would let the light from the world hit your eye normally, then augment the signal AFTER your eye transmitted it to the brain. If you could build that, it would be a brain-augmenting interface.
Have you thought about what it means to build this interface with a brain? Like in Terminator. When you see those red wireframe overlays on motorbikes, clothes, boots and underwear as Schwarzenegger passes his eyes over them? Think about what it would take to get those red overlays.
I started thinking about this, and despite the fact that I know nothing about even the terminology for the parts of the eye, even less about terminology for nerves, and very little about what you call parts or properties of light like photos or whatever, I realized that just by thinking logically, you can get really far into designing the system you’d need to have a visual brain-augmenting interface. A brain-augmenting interface behind the eye. Let’s think that through. Let’s design that system logically.
We want you to look at something with your eyes: say you look at a cement truck. We want you to fully see that cement truck very clearly. Whether you need glasses or not it doesn’t matter: we want the image that you get in your brain to be clear, like it is today if you have good eyesight or good glasses. And… we want you to see a red wireframe overlay on the tank part of the truck, with statistics running down the side in a tiny font analyzing capacity, speed of rotation, probable weight, et cetera. Let’s assume we have a phone that already did all the statistics and figured out coordinates to draw the right shape of the truck. What would it take for you to see that red overlay on the truck?
Here’s what it would take. We would need to cut the cable between your brain and your eye. Now that cable has two dangling ends: the eye end and the brain end. We would need to attach the dangling eye end to the input of a chip, let’s assume a chip we could comfortably embed in your skull. And we would need to attach the brain end of that cable to the output of that chip. So let’s assume this is all small and nicely hidden in your head. Bone is thick, so there’s plenty of room to hide this in there, right? (I’ve actually read it’s not thick at all, but we’re not really thinking through aesthetics today.) So we’ve got a chip intermediating in the “eye cable.”
What would that chip need to do? That chip would need to see what you see, add red wireframe overlays, and output exactly what you see plus the red wireframe overlays perfectly aligned over what you see.
We’re not finished yet, but if we had that, we could “easily” let you flip on and off your red wireframe overlays anytime, and you could still see perfectly clearly but with augmented stuff whenever you wanted. And notice: nobody would be able to know what you were looking at unless you decided to somehow hook up to a projector and show it to them. What I’m saying is that this would be a pretty perfect augmented vision system.
But to actually get that chip to do what it had to do, we would need two fundamental things: 1) we would need to perfectly decode the signal that your eyes send down that cable to your brain. To JPEG. And 2) we would need to perfectly encode the JPEG image that the chip produced (original image plus red wireframe overlay) in the same way that your eyes encoded images sent down to your brain. We need to decode to be able to modify the JPEG… and then we need to encode because by changing it, we’ve produced a new JPEG.
A JPEG would probably be a bad choice, by the way. It’s just a placeholder for any image format that we can read and edit.
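To make the chip's job concrete, here is a minimal sketch of the decode, overlay, encode loop. Everything in it is an assumption made up for illustration: the "signal" is pretended to be a flat list of red, green and blue values, and the names `decode`, `encode`, `add_wireframe` and `chip` are hypothetical. The real eye signal is exactly the thing we don't know how to read yet.

```python
from typing import List, Tuple

Pixel = Tuple[int, ...]    # (red, green, blue), each 0-255
Image = List[List[Pixel]]  # rows of pixels
Signal = List[int]         # made-up stand-in for whatever the eye really sends

RED: Pixel = (255, 0, 0)


def decode(signal: Signal, width: int) -> Image:
    """Toy decoder: assumes the signal is flat R, G, B values, row by row."""
    pixels = [tuple(signal[i:i + 3]) for i in range(0, len(signal), 3)]
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


def encode(image: Image) -> Signal:
    """Toy encoder: the exact inverse of the toy decoder above."""
    return [channel for row in image for pixel in row for channel in pixel]


def add_wireframe(image: Image, top: int, left: int, bottom: int, right: int) -> Image:
    """Draw the outline of a red rectangle; pixels inside it are left untouched."""
    out = [list(row) for row in image]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if y in (top, bottom) or x in (left, right):
                out[y][x] = RED
    return out


def chip(signal: Signal, width: int, box: Tuple[int, int, int, int],
         overlay_on: bool = True) -> Signal:
    """Sit in the eye cable: decode, optionally draw the overlay, re-encode."""
    image = decode(signal, width)
    if overlay_on:
        image = add_wireframe(image, *box)
    return encode(image)


# A 4x4 grey "cement truck" as the eye would send it in this toy world.
eye_signal = [128] * (4 * 4 * 3)
to_brain = chip(eye_signal, width=4, box=(0, 0, 3, 3))
```

With `overlay_on=False` the chip passes the signal through unchanged, which is the "flip it off anytime" behaviour described above. The hard part in reality is that `decode` and `encode` are not one-liners we get to invent.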
Your eye on a TV
Fine. So we want to make a JPEG out of what your eye is pointing at. Before it hits your brain. What would we need to write a decoder for images coming from your eyes and going to the brain? Well, we would know that we had a decoder once we could attach a TV to that chip connected to your eye, and on that TV we could see a perfectly clear image of a cement truck.
The thing is that it couldn’t just be clear; it would have to be perfectly framed and aligned exactly the way you were seeing it. And since we had disconnected the cable going from your eye to your brain, you wouldn’t be seeing it! We would have no reference point to compare the image we were decoding on the TV with the image you were seeing. Oops!
Just picture exactly the laboratory set-up here: There’s a girl sitting there with a bloody vein-tube pulled out of her head and attached to a metal box which is taped to some hospital IV drip rack so that it stays near her head, and there are cables coming out of the other side of the metal box attached to a TV. The girl moves her eyes around, and when she does the image on the TV changes. But damn, it doesn’t look like a cement truck.
There are a bunch of programmers sitting on cheap swivel chairs with laptops, uploading code to the metal box, and each time they upload a new patch the image on the TV also changes, but while some blobs seem to solidify and some colors become more grey, only the programmers’ moms say that it’s looking more like a cement truck.
They keep trying. They keep trying because although it’s really really hard, the day that they actually do get an image of a cement truck on that TV screen, they will then hit the button with that 3½" diskette icon on it and they will never have to do that work again. They will have themselves one smoking fresh algorithm for decoding the signal from that girl’s eye-cable. And maybe, just maybe, that algorithm, written in code that they can understand and debug and step through, might just have a chance of working on decoding other people’s eye-signals. If not, they rinse and repeat on a bunch of other people until they have something generic.
A picture for your brain
Now getting this image back into your brain is the “encoding” part.
Encoding would be similar. Given the image of the cement truck, we would need to create a signal that made the brain think it was seeing a cement truck. In this case the programmers can see the cement truck clearly. But what they don’t know is what the brain is seeing once they feed it their signal.
You could cheat. You have the cable coming out of the eye. Well, if you splice it off before it gets to the metal box, and re-attach it to the brain, then you can be sure (providing you connected all the bloody veins and nerves and synapses) that the brain is seeing what the eye is sending it. But that doesn’t help because that wouldn’t let you add the red wireframe Terminator overlays. You need to pass the image through the chip to do that.
So you could do this: You could take a new person—one that does not currently have any cable sticking out of their head—and once more snip the cable that is connecting their eye to their brain. Now that vein in the guy’s head has two bloody ends again: the eye end and the brain end. You let him look at a cement truck, and you record what is coming out of the eye end. It’s probably just electricity, so recording the strength and frequency of whatever pulses you detect should be possible. The recording has to have perfect precision of course.
If you were to then remove the cement truck, but play the recorded signal back into the brain end, the person should say “I see a cement truck.” Their brain should see exactly the image that they were looking at before you made the recording.
The only thing is: the guy didn’t see the cement truck while you were recording, because you had cut his eye cable. So the first time he sees the cement truck is when you play the recording. And thus there’s a problem: what if the original cement truck was yellow, but when you play the recording the guy says the cement truck is blue? Something is wrong with your recording. But there’s no way to know what.
Sure, there’s a way. You try changing every bit in every single combination and permutation of signals until you make the guy really see a yellow cement truck. But how do you know that his yellow cement truck is really the same size and proportion and shading and brand as the original cement truck? You don’t know because you don’t see what his brain is seeing.
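Here is a toy version of that brute-force search, just to show why the guy's answer is not enough. It assumes a completely made-up 8-bit signal where only the first two bits decide the colour he reports; the other six bits stand for everything he can't put into words (size, proportion, shading). The function `subject_reports` is hypothetical, a stand-in for asking him what he sees.

```python
from itertools import product

SIGNAL_BITS = 8  # a real signal would be astronomically larger


def subject_reports(signal):
    """Hypothetical test subject: he only tells us the colour he sees.

    Assumption: the first two bits pick the colour. The remaining bits
    change things he cannot describe precisely, so they never show up
    in his answer.
    """
    colours = {(0, 0): "blue", (0, 1): "grey", (1, 0): "yellow", (1, 1): "red"}
    return colours[signal[:2]]


def search(target_colour):
    """Try every possible signal and keep the ones he calls the right colour."""
    return [
        signal
        for signal in product((0, 1), repeat=SIGNAL_BITS)
        if subject_reports(signal) == target_colour
    ]


candidates = search("yellow")
print(len(candidates), "different signals all get the answer 'yellow'")
```

In this toy world a quarter of all possible signals pass the test, and nothing in the answers tells you which one is the real cement truck.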
We could compare your left eye to the right eye: they should be generating similar signals I guess. But what if they’re not? What if a minuscule difference in the signal from the right eye results in static when projected from the left eye? What if it has to be perfect? We don’t know if we have a margin of error or not. Even though we know the brain can adapt over time to crappy signals, there would still be no point in augmenting perfect vision if the end result was worse than what we had started with.
And the recording, although it’s probably just electricity, is probably not as simple as checking what comes out of one cable. There are probably a thousand little veins all co-operating to send signals between the eye and the brain and the pulses they send are probably more or less important. Some might send colour information and others might send brightness or contrast information. Or they might send even more complex things like transformation information: maybe eyes know when they’re sending things in the center or edges of vision and they send signals to indicate the radius of each pixel. Or even more complicated: some signals might send state information saying things like “Ok, now we’re going to tone the green of the entire image down 60% for all following colour signals until further notice.” There could be information for movement versus static images, too. All this could easily be encoded in electric pulses, and that means that your recording would not only have to have the perfect signal at any one moment, but it would have to be a series of perhaps millions of perfect signals for just one second of viewing. Our eyes are a permanent movie, after all, aren’t they?
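The "until further notice" idea is worth a tiny sketch, because it changes what a decoder has to be. This assumes an invented message format with two kinds of messages, `"pixel"` and `"set_green_gain"`; nothing here claims that eyes actually work this way.

```python
def decode_stream(messages):
    """Decode a stream where some messages change how later ones are read.

    Each message is a (kind, value) pair. Both kinds are invented:
      ("pixel", (r, g, b))     a colour sample
      ("set_green_gain", x)    scale green by x for every pixel that follows
    """
    green_gain = 1.0  # state the decoder has to remember between messages
    pixels = []
    for kind, value in messages:
        if kind == "set_green_gain":
            green_gain = value
        elif kind == "pixel":
            r, g, b = value
            pixels.append((r, round(g * green_gain), b))
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
    return pixels


stream = [
    ("pixel", (100, 200, 50)),
    ("set_green_gain", 0.4),    # "tone the green down 60% until further notice"
    ("pixel", (100, 200, 50)),  # same raw values, different meaning now
]
print(decode_stream(stream))
```

The same raw pixel message decodes to two different colours depending on what came before it. So if a recording starts one message too late, or drops a single state message, everything after it is wrong.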
So that’s a problem. There is no reference image. The output of your chip is an image in brain-code. It’s just a … pulse. Or a series of pulses. Your chip can produce a brain code, but you don’t know what a cement truck looks like in brain code. You have nothing to compare it to.
Hmm. What if you did something funny, though? What if you had spent tons of time on your decoder (the one attached to the eye end of the cable) and you nailed it and you really had a perfect decoder? Well… if you really trusted your decoder you could then snip the cable coming out of the chip and going into your brain, and instead of the brain you connect it to a second decoder.
The setup you want in the end is this:
eye –– vein –– decoder –– chip with red wireframe overlays –– encoder –– vein –– brain
But you could do this:
eye –– vein –– 1st decoder –– chip –– encoder –– 2nd decoder –– TV!
The 1st decoder and 2nd decoder are running the same code.
In that setup, you could project the image that you were making with the chip onto the TV! You would be able to see your cement truck and the red wireframe overlays on it.
If you did not see the image you were expecting in this setup, and you trusted your decoder, you would fiddle with the encoder only. But you really have to have a perfect decoder first.
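That fiddling loop can be sketched in a few lines. The assumptions are loud ones: the "image" is just a list of brightness values, the trusted decoder is a made-up rule (4 units of pulse strength per unit of brightness), and the encoder has a single unknown knob called `gain`. All the names are hypothetical.

```python
def trusted_decoder(pulses):
    """The decoder we assume is perfect: 4 units of pulse = 1 unit of brightness."""
    return [pulse // 4 for pulse in pulses]


def candidate_encoder(image, gain):
    """An encoder we are still guessing at, with one unknown setting."""
    return [brightness * gain for brightness in image]


def tune_encoder(image, gains):
    """Loop the encoder's output back through the trusted decoder.

    Keep only the settings where the TV shows exactly the image we put in.
    """
    return [
        gain
        for gain in gains
        if trusted_decoder(candidate_encoder(image, gain)) == image
    ]


test_image = [0, 10, 200, 255]  # a few brightness values, dark to bright
working_gains = tune_encoder(test_image, range(1, 9))
print(working_gains)
```

Notice the decoder is never touched: only the encoder's knob moves. Also notice the test image matters. An all-black image like `[0]` would pass with every gain, so you need varied pictures before you believe the encoder.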
Without knowing terminology for: eyes, optics, brains, veins, electricity or even cables, this is a basic logical run-through of the kind of laboratory you would need to build augmented brain-vision.
It also shows the difference between letting a machine learn how to program vision and understanding yourself how to program vision. Because although this laboratory depends on a perfect decoder, getting that perfect decoder to work is a huge trial-and-error effort.
The advantage is that once you have done the trial-and-error, you have code that you can debug and change and understand. You can rewrite it as simpler modules and write comments in it and release it on GitHub. With code made by machine learning, you can’t do that. If a machine learns wrong, you have to start the learning all over again.
There’s still something nagging at me though: we have coded the decoder, but have we really understood our encoder? If we’ve set up our laboratory like this, then yes, we have, because we have manually programmed our encoder, also through trial-and-error, until we could—at will—send any image we wanted at all to the brain just by showing it on a computer. It’s just that programming this encoder, seeing as it depended on having programmed an absolutely perfect decoder, would be astronomically difficult. And yes, if we really did it, we would come out of it with an understanding of brain-code. At least the part for the eyes. We’d have to deal with the other senses later.
So who's in?