An overview of the future
Machine learning, artificial intelligence (AI), self-aware spaceships. All of these come to mind when the term is brought up, but how does it all work? At a high level, machine learning is about transforming computer behavior from imperative to independent. What this means is training a system to make its own decisions based on its environment instead of us telling it what to do. For example, instead of telling a car to turn left 300 meters after it passes a certain street, we could let it crash into a tree, and in the future it would know to turn left when it sees something resembling a tree in front of it. While that's an extreme example, it highlights the difference in how the car's AI works. In the first case, it is just following pre-programmed instructions to turn. In the second, it is learning and using that knowledge to make future decisions. So instead of having to know every street in advance, the car just has to know about its environment to drive safely (just like a human).
Okay, so now we get why this is useful, but how does it work in 1's and 0's? One answer: neural networks. They're great for comparing and distinguishing things. Put enough neural nets together doing different things and the result may look like an actual sentient machine. There are many great books and websites that discuss this in depth, but being a minimalist, I'm just going to boil it down.
Implementation of Neural Networks
While at first you might think it's very complicated and requires multiple years of experience (okay, maybe it does), it's honestly just a bunch of weighted averages with some esoteric functions and parameters. Just like in statistics, biology, or electrical engineering, the computations are trivial; the difficult part is the concepts. Thankfully, neural networks are easy enough to digest without professional instruction. Here are the steps:
- Find some data to compare
- Get a good amount of that data to train the neural net
- Pick your functions and parameters without over-fitting to the training data
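The steps above can be sketched in a few lines of code. This is a minimal, hypothetical illustration: the cluster centers, the train/test split, and the single-threshold "model" are all made up for the sake of the example, not a real voice dataset or a real neural net.

```python
import random

random.seed(0)

# Steps 1-2: find data to compare and gather a good amount of it.
# Here, made-up "peak frequency" samples for two invented clusters.
male   = [(random.gauss(120, 10),) for _ in range(50)]  # label 0
female = [(random.gauss(210, 10),) for _ in range(50)]  # label 1
data = [(x, 0) for x in male] + [(x, 1) for x in female]
random.shuffle(data)

# Hold some data back so we can check for over-fitting (step 3):
train, test = data[:80], data[80:]

# Step 3: pick a deliberately simple function -- one frequency threshold.
threshold = sum(point[0] for point, _ in train) / len(train)

def predict(point):
    # Above the threshold -> guess female (1), otherwise male (0).
    return 1 if point[0] > threshold else 0

# Accuracy on data the model never saw during training:
accuracy = sum(predict(p) == label for p, label in test) / len(test)
```

Because the model is evaluated on held-out test points rather than the training points, a suspiciously perfect training score paired with a poor test score would be the tell-tale sign of over-fitting.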
So let's assume we want to distinguish a male voice from a female voice. Specifically, the data we want to compare are the peak frequencies of the voices. If you've ever seen one of those cool audio graphs with the frequencies plotted across a spectrum, that's the kind of data we're comparing. Remember, our end goal is a computer that can detect whether it heard a male voice or a female voice.
So let’s say we have a scatterplot of one male’s peak frequencies and one female’s peak frequencies (below):
Don't mind the units; they're arbitrary. Let's say the red points are female and the blue ones are male. In the future we want a computer to be able to tell which points came from a male and which from a female (imagine all the points were black and you had to pick out which ones were likely to be male or female).
To do this, we train it on this plot. This is done by fitting some kind of function around the data. The trick is to get a function that fits well without losing its generality. In other words, form a function that's too specific and it's just a copy of the data; pick one that's too general and it won't make accurate predictions in the future. In machine learning, we have inputs (data given), outputs (data out) and what's known as a "hidden layer". Essentially, each node in the hidden layer just checks a condition. For example, if we set the criteria: need to have X, Y and Z frequency to be male, then there you have it–3 nodes in our neural net's hidden layer.
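Here's a rough sketch of that "each hidden node checks a condition" idea. Everything in it is hypothetical: the three center frequencies, the width, and the all-or-nothing vote are stand-ins for what a trained network would learn as weights, not real speech thresholds.

```python
def hidden_node(freq, center, width=15):
    # A crude "node": fires (returns 1) when a frequency lands near
    # this node's center frequency.
    return 1 if abs(freq - center) <= width else 0

# Three made-up criteria: energy near each of these frequencies
# is required to classify a voice as male.
MALE_CENTERS = [110, 120, 130]

def is_male(peak_freqs):
    # One vote per hidden node: did any peak land near its center?
    votes = [
        max(hidden_node(f, c) for f in peak_freqs)
        for c in MALE_CENTERS
    ]
    # Output node: all three criteria must be met.
    return sum(votes) == len(MALE_CENTERS)
```

So `is_male([108, 118, 132])` comes out true while `is_male([210, 220, 230])` comes out false. A real network replaces these hard yes/no checks with weighted sums and smooth activation functions, which is what lets training adjust them gradually.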
Increasing function complexity in machine learning equates to increasing the dimensionality (or the number of nodes/criteria) within the hidden layer:
Sometimes a computer can pick the right function, but ultimately an experienced human who knows what kind of data is being worked with does best. For this case, we can use tanh(x), and we'll pick 3 nodes for the hidden layer:
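For the curious, a network like this one (2 inputs, one hidden layer of 3 tanh nodes, one output) can be trained end-to-end in a few dozen lines. This is a sketch under assumptions: the two point clusters are synthetic stand-ins for the scatterplot, and plain full-batch gradient descent on cross-entropy loss is just one simple way to fit the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D points standing in for the scatterplot:
# one invented cluster per class.
male   = rng.normal([1.0, 1.0], 0.3, size=(40, 2))    # label 1
female = rng.normal([-1.0, -1.0], 0.3, size=(40, 2))  # label 0
X = np.vstack([male, female])
y = np.array([1.0] * 40 + [0.0] * 40)

# One hidden layer of 3 tanh nodes, one sigmoid output node.
W1 = rng.normal(0, 0.5, (2, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.5, (3, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                 # hidden layer activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))     # probability of "male"
    return h, p.ravel()

lr = 0.5
for _ in range(2000):
    h, p = forward(X)
    # Gradient of cross-entropy loss w.r.t. the output logit:
    grad_out = (p - y)[:, None] / len(y)
    # Back-propagate through tanh (derivative is 1 - tanh^2):
    grad_h = (grad_out @ W2.T) * (1 - h**2)
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

_, p = forward(X)
accuracy = ((p > 0.5) == (y == 1)).mean()
```

The decision boundary this learns plays the role of the blue/red regions in the plot: points the output node pushes above 0.5 fall on the "male" side, the rest on the "female" side.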
So what this means is that points in the blue region are likely to be male frequencies and points in the red region are likely to be female. Depending on the training data and functions we pick, the curve generated will change. But what's so powerful about this process is that it works to generalize any kind of data (facial recognition, crime rates, game theory, handwriting recognition, etc.) as long as the training data is legitimate.
Tweetsense uses this kind of machine learning to make increasingly accurate predictions about what people are feeling when they post on the internet. In our case, the scatter plots we use capture whether or not a certain emotion applies to a sentence. Tweetsense aggregates several of those emotions and eventually makes a prediction about what a sentence or paragraph meant. Add political bias as a parameter and the data becomes even more predictable.
Well, that's machine learning for you: the extensible technology that's predicting the world, one election at a time. If you want to play around with your own neural networks, there's a really cool online tool called TensorFlow Playground that lets you build one right in the browser: http://playground.tensorflow.org/
(If you would like to learn more about this, I recommend a healthy dose of the internet and a Computer Science or Computer Engineering degree!)