Deep Learning – So Simple, it’s Stupid

Have you heard the term “deep learning” and wondered what it means? The answer is so simple, it’s almost counterintuitive.

Deep learning is more like teaching a 4-year-old to read than solving a math problem.

For those of us most familiar with the world of if…then statements and algorithmic computer processes, deep learning can seem especially foreign at first glance.

Deep learning systems lack the complex explicit logic and millions of lines of code that normally accompany a large system built to solve our most interesting problems. The Hollywood theatrics of code flying by on the screen at break-neck speed, or faces flipping onscreen as a matching algorithm hunts for a hit, are attempts to visualize complex algorithms at work. To be honest, they aren’t good mental models even for traditional if…then programs, let alone for the other-worldly constructs of deep learning.

The secret really lies between your ears. Not your brain’s actual structure of cells and chemistry, but the computational process your brain uses to solve problems and learn. Your own grey matter, and the model your head uses to recognize this very word on this screen, are a much closer approximation of deep learning than the computer programming we typically write to execute commands in an application.

You see, deep learning is another name for a neural network. A neural network, as the name suggests, is programmed to simulate the neural pathways present in your brain. This isn’t about rearranging logic gates or inventing new CPUs with biological juices powering problem solving. It is done in software, albeit in many cases running on incredibly powerful hardware fine-tuned for this specific task.

Deep learning, in fact, is modeled on the way you and I think. When you see your mother, father, brother, sister, son or daughter, friend or foe, you recognize their face. There is no “screen” flipping in your mind, indexing a list of known faces and looking for a match the way they do in the TV series 24 when trying to identify “the bad guy”. This process is the most familiar to all of us, even if we can’t explain how it actually works. There are no algorithms or binary templates; it is all based on learning and neural pathways.

Learning for humans is fundamental to our development. Learning is also the key to neural networks. Deep learning is one of the most revolutionary trends in computing and will transform the way you and I interact with machines.

Deep learning has already accomplished, and will accomplish, tasks we once thought only humans could do: image recognition, driving cars, full human-to-machine conversations, live speech translation, and complex decision making. IBM Watson, one of the most advanced systems of its kind in the world, can already help diagnose certain diseases with accuracy that rivals experienced doctors.

The premise behind deep learning is so simple, it really is stupid simple.

That isn’t to say the actual implementation and creation of a system capable of deep learning is simple; quite the contrary. But understanding the bits and bytes is beyond the scope of this post. (If you are interested, here is a link to an interesting technical article.)

At a high level, here is how it works.

Inputs are provided to the system. Outputs are mapped to them. The more varied the inputs mapped to outputs, the more robust the system becomes. Eventually, given enough inputs mapped to outputs, the system can start to predict outputs for inputs it has never been presented with.
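As a rough illustration of that training loop, here is a minimal sketch in plain Python: a single artificial neuron (a perceptron) learning a mapping from example inputs to outputs, then predicting an output for an input it has never seen. The data points and learning rate are made up for the demo; real deep learning stacks many layers of units like this one.

```python
# A single artificial "neuron" trained on labeled examples, in plain Python.
# No rules are written down; the weights are learned from input/output pairs.

def train(data, lr=0.1, epochs=50):
    """Perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = y - pred
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# Inputs mapped to outputs: points near (0,0) -> 0, points near (1,1) -> 1.
examples = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.0), 0),
            ((1.0, 1.0), 1), ((0.9, 1.0), 1), ((1.0, 0.8), 1)]
w, b = train(examples)

def predict(x1, x2):
    """Apply the learned weights to a new input."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# Points the system has never seen still get the right output.
print(predict(0.9, 0.8))  # -> 1
print(predict(0.1, 0.1))  # -> 0
```

The interesting part is that nothing in the code says *why* (0.9, 0.8) belongs with the (1,1) group; that behavior emerged entirely from the examples.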

Here is a very simple example. Say you provide an algorithmic system the equation “2+2=”. A traditional algorithm would look at the text, find the 2+2, and simply add it together to make 4. But what happens when you pass the system “2+2” and omit the “=”? A traditional system would need to “know” about the possibility of the missing “=” and account for it in logic. However, in the real world, there are many, many possible inputs. Things like “two plus two equals” and “2 + two equals” and “2+2 equals”, and the sound wave of “two plus two” in binary form. You get the picture. Being explicit about every option with rules and algorithms is barely feasible even for a simple problem like this one, let alone for our incredibly complex ones.

A neural network is different. Instead of algorithmically handling the input “2+2=”, you map it to the answer “4”. Then you give it A LOT of inputs. All of the variations from above get fed into the system and mapped to “4”: “tw plus two”, “2 pls 2”, and so on.

Here is the punchline.

Let’s say you’ve done 1000 variations of this mapping, and along comes the 1001st. The system has never encountered this variation. The input is new, but most likely not completely unique; it is surprisingly similar, with small differences, to many of the previous variations. A neural network, or deep learning system, can still provide the right output, 4, with a high degree of certainty. An algorithmic system could never do this without explicit rules and logic to handle the variation in some way. Deep learning can deal with uncertain inputs, just like people.
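The 1001st-variation idea can be sketched with a toy similarity matcher. To be clear, this is not a real neural network; it simply stores the labeled variations and returns the output of the most similar known input, which is enough to show how an unseen-but-similar input still lands on “4”. The second set of mappings (“3+3” variants to “6”) is a hypothetical addition to make the matching non-trivial.

```python
# Classify new inputs by similarity to stored examples, not by explicit rules.
# (A stand-in for a trained network, to illustrate generalization.)

def bigrams(text):
    """The set of two-character sequences in a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def similarity(a, b):
    """Jaccard similarity between the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0

# The messy variations from above mapped to "4", plus a hypothetical
# second mapping of "3+3" variants to "6".
training = {
    "2+2=": "4", "two plus two equals": "4", "2 + two equals": "4",
    "2+2 equals": "4", "tw plus two": "4", "2 pls 2": "4",
    "3+3=": "6", "3+3 equals": "6", "three plus three": "6",
    "three plus three equals": "6",
}

def classify(query):
    """Return the output mapped to the most similar known input."""
    best = max(training, key=lambda known: similarity(query, known))
    return training[best]

# Inputs never seen during "training" still land on the right output.
print(classify("2 plus 2"))  # -> 4
print(classify("3+3 ="))     # -> 6
```

Neither “2 plus 2” nor “3+3 =” appears in the training data, yet both resolve correctly because they overlap heavily with variations that do.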

This is a paradigm shift in computing. This is the crux of deep learning’s advantage. It learns. It’s trained. Once trained, it can produce the right outputs for inputs it has never before been presented with. Pretty cool, huh? (Yeah, of course there is also a creepy side to this. But every technological advancement has two sides. I’m an optimist. This will be a net positive for humankind.)

Here is a more real-world example: a sound recognition task. Imagine recognizing the English words “hello world”. An audio file is created that stores a binary representation of the waveform of someone saying “hello world”. That file is fed into the neural network, and the output is mapped to the text “hello world”. Given one sample, the system in theory should be able to recognize that exact audio file every time. It’s like “labeling” the file with the metadata “hello world”. But that isn’t really an interesting problem to solve. The next audio file that is slightly different, perhaps a different speaker, cadence, pitch, or background noise, would be unrecognizable. For the system to function, it must be fed an enormous set of audio samples of the words “hello world”: hundreds of thousands, if not millions. Some from women, some from men, some from children, some recorded in a busy cafe. The system is then trained to map all of that input to the output “hello world”.
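To make that concrete, here is a toy sketch. Real systems would extract features from the actual waveform; the four-number “feature vectors” below are invented stand-ins, and the random jitter simulates differences in speaker, pitch, and background noise.

```python
import random

random.seed(0)  # deterministic demo

HELLO = [0.9, 0.1, 0.8, 0.2]    # invented stand-in for a "hello world" clip
GOODBYE = [0.1, 0.9, 0.2, 0.8]  # a contrasting phrase, for comparison

def noisy(vec, amount=0.2):
    """Simulate speaker, pitch, and background-noise variation."""
    return [v + random.uniform(-amount, amount) for v in vec]

# "Feed the system an enormous set of samples": here, 200 jittered clips per phrase.
samples = [(noisy(HELLO), "hello world") for _ in range(200)] + \
          [(noisy(GOODBYE), "goodbye") for _ in range(200)]

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(clip):
    """The nearest labeled sample wins; no explicit rules about pitch or cadence."""
    _, label = min((distance(clip, vec), label) for vec, label in samples)
    return label

# A clip the system has never heard before is still recognized.
print(recognize(noisy(HELLO)))  # -> hello world
```

The point of the 200 samples is coverage: the more variation the system has seen, the more likely a brand-new clip falls close to something it already knows.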

Algorithmic processing falls far short in this environment of uncertainty. Deep learning, on the other hand, is perfectly suited to this type of task. The same goes for image recognition.

Let’s drive home the concept with one last example by looking at self-driving cars (see what I did there?).

Have you heard of Tesla’s new self-driving car? Do you think the requirement to have your hand on the steering wheel at all times is a safety measure meant to ensure you don’t fly off the road to your death? Ok, trick question; it is indeed a safety measure meant to keep you alive. But it is so much more than that. Every time you touch the wheel and correct the car, that is another mapping of input to output. The car is learning thousands of inputs against thousands of outputs. You are teaching it everything about what driving a car means. As hundreds, then thousands, then hundreds of thousands and millions of us eventually control self-driving cars, we will all teach them how to drive on unexpected curves, avoid pedestrians, slow down when ice is likely, and, most importantly, avoid accidents.



Consumers are the key to teaching these new deep learners. Humans will be the teachers. Eventually, the combined training of the neural network will be more accurate than any individual human.

A self-driving car will eventually be superior to a human-driven car, once we have all had enough time to train it and to share that data across all self-driving cars.

That won’t stop the car from having accidents or fender-benders, or even from killing people. Those outcomes will simply become far less likely.

The question is: when will people be ready for computers that make mistakes? But that’s a topic for another day.