Tag Archives: data science

Human augmentation through artificial intelligence

Last week, I published two posts that relate to a very interesting and socioeconomically relevant machine learning topic: human augmentation through artificial intelligence. Many examples of AI pushing the limits of what was possible have made their way into the mainstream media, especially since deep learning really took off. One of the big recent examples is, of course, AlphaGo.

But human augmentation isn’t about what AI can do by itself. Rather, it’s about how AI can assist a human so that he or she can be much more efficient at performing a given task, whether that means working faster or producing work of higher quality. It’s about using AI as a tool, just like any one of the hundreds of tools we all use in our everyday life.

The first post was a guest post on KDNuggets (with a companion notebook) on MLDB, showing how deep and transfer learning can be used to quickly train a model to classify whether the car in a picture is a Tesla, a BMW or an Audi. It lays some of the technical foundation used in the second post.

The second post is about DeepTeach, the interactive Deep Image Classifier Builder. DeepTeach is an open-source MLDB plugin that implements a human augmentation workflow, allowing a user to quickly build an image classifier from unlabelled data within minutes.

As the post explains, the plugin uses a series of machine learning techniques, such as deep, transfer and active learning. What is most interesting about it, though, is the workflow it implements, which lets a user go from thousands of unlabelled images to a working binary image classifier in a minute. The same workflow could be implemented for any type of data, like sound or text, and it shows concretely how AI can be used to make humans more productive.

The video below shows a demo of the plugin in action:


Actually, Marty didn’t go Back To The Future: Graphing the train sequence of BTTF3

In a hurry? Go straight to the graphs.

The dataset and notebook detailing how this was done are available in the companion repository.

Two weeks ago was Back To The Future Day. October 21st, 2015 is the day Marty and Doc Brown travel to at the beginning of the second movie. The future is now the past. There were worldwide celebrations and jokes, from the Queensland police deploying a hoverboard unit, to Universal Pictures releasing a Jaws 19 trailer, to Health Canada issuing an official recall notice for the DeLorean DMC-12 because of a flux capacitor defect that could prevent the car from traveling through time.

I love the trilogy and, as many people probably did that week, I rewatched the movies. I also wondered if there was any fun BTTF data science project I could do. While watching the climactic sequence at the end of the third movie, I realized that as the steam locomotive pushes the DeLorean down the tracks, we get many data points on the speed of the DeLorean. Marty is essentially reciting a dataset, all the way from 1885.

That made me ask the 1.21 gigawatt question: Do they really make it to 88 miles per hour before they run out of tracks?

Doc’s Plan

For those not familiar with the movies, Marty and Doc are trapped in the old West without any gas to power the gasoline engine of the DeLorean, their time machine. That means they can’t drive it to 88 miles per hour, the speed required to activate the flux capacitor, and travel back to 1985. The plan they come up with is to commandeer a steam locomotive and use it to push the DeLorean to the required speed.

Doc spells out the plan to make it back to the future:

Tomorrow night, Sunday, we’ll load the DeLorean on to the tracks here on the spur right by the old abandoned silver mine. The switch track is where the spur runs off the main line 3 miles into Clayton… Shonash Ravine. The train leaves the station at 8:00 Monday morning. We’ll stop it here, uncouple the cars from the tender, throw the switchtrack, and hijack – borrow the locomotive and use it to push the time machine. According to my calculations we’ll hit 88 miles per hour just before we hit the edge of the ravine, at which point we’ll instantaneously arrive in 1985 and coast safely across the completed bridge.

Doc Brown's master plan

If you think about it, it’s a shame Doc didn’t equip the DeLorean with Tesla electric motors when he visited 2015. That would have made things easier, considering the DeLorean was equipped with a working Mr. Fusion generator in 1885.

The dataset

To assemble the dataset, I simply watched the train sequence and noted each time Marty called out the speed or the speedometer appeared on screen, along with the time in the movie. The tiny dataset is available in the companion repository.

Also, Doc tells us twice in the movie that they have 3 miles of tracks before they hit the ravine. Finally, as Jack Bauer would say, we assume events occur in real time. This is a critical assumption because I’ll base the distance calculations on it. So if they were to go at a steady 25 miles per hour for one hour in the movie, that would mean that they traveled 25 miles during that period.

Graphing the sequence of events

For simplicity, I assumed a linear progression between each actual data point, meaning we’re assuming a uniform acceleration between data points. The following graph shows the sequence of events as they occur in the movie.


The x-axis represents the number of minutes since the beginning of the train sequence and the y-axis is their speed. The landmarks along the tracks are labeled in red at the bottom. Finally, the periods during which each of Doc’s 3 presto logs burned are marked as horizontal lines.

We can see that the whole sequence lasts about 7 minutes and they successfully reach 88 miles per hour just before reaching the ravine, exactly as Doc predicted.

But did they?

How far did they go?

The question we have is about the actual distance it takes them to get to that critical velocity. Since we know at what speed they were going and for how long, we can essentially integrate the time vs speed graph above to get the distance they really traveled.
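As a minimal sketch of that integration: if speed is assumed to change linearly between readings, the trapezoidal rule gives the exact distance for each segment. The readings below are made-up placeholders for illustration, not the values from the companion dataset, and the function name is hypothetical.

```python
def cumulative_distance(times_s, speeds_mph):
    """Integrate a piecewise-linear speed curve with the trapezoidal rule.

    times_s: timestamps in seconds, in increasing order.
    speeds_mph: speed readings (miles per hour) at those timestamps.
    Returns the cumulative distance in miles at each timestamp.
    """
    if len(times_s) != len(speeds_mph):
        raise ValueError("times_s and speeds_mph must have the same length")
    distances = [0.0]
    for i in range(1, len(times_s)):
        dt_hours = (times_s[i] - times_s[i - 1]) / 3600.0
        mean_speed = (speeds_mph[i] + speeds_mph[i - 1]) / 2.0
        distances.append(distances[-1] + mean_speed * dt_hours)
    return distances


# Hypothetical readings for illustration only (not the movie data).
times_s = [0, 60, 120, 180]
speeds_mph = [0, 20, 30, 30]
print(cumulative_distance(times_s, speeds_mph))
```

Pairing each cumulative distance with the speed at the same timestamp gives the points of a distance vs speed curve.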

Doing this gives us the distance vs speed graph that we can use to determine if they really reached 88 miles per hour before having traveled 3 miles. In other words, does the blue speed line get to the green future line before reaching the red ravine line?


Great Scott! They actually run out of tracks just shy of 70 miles per hour. This means the ravine is actually rightly renamed Eastwood Ravine because Marty does end up at the bottom of it!

Here is another way to look at it:


Can we fix this?

Looking at their acceleration over time sheds some light on why they got in trouble. Remember that we’re assuming a linear speed progression between each pair of data points, which means a uniform acceleration within each interval.

The following graph shows the acceleration:

The very narrow spikes are when the speedometer is shown on screen and you see the speed go up by 1 mph within 2 seconds. The wider and lower periods are the result of the speedometer not being shown for a while and the speed not having gone up that much in the meantime.

My expectation would have been that as the yellow and especially red logs catch fire, we get a higher and higher acceleration. In reality, acceleration correlates with dramatic moments in the story and with when the speedometer is shown on screen. It’s a movie; I know.

Let’s give Doc a hand and figure out the acceleration his presto logs would have needed to provide in order to make this work. By assuming that we can only influence the acceleration from the point where the green presto log catches fire, we can determine the acceleration we need to get to the right speed in time and derive the following modified distance vs speed graph:
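One way to sketch that calculation, assuming a single constant acceleration from the moment the green log catches fire: the kinematic relation v² = v0² + 2·a·d gives the acceleration needed to reach the target speed within the remaining track. The speed and remaining distance used below are hypothetical placeholders, not values taken from the dataset, and the function names are illustrative.

```python
def required_acceleration(v0_mph, v_target_mph, distance_left_miles):
    """Constant acceleration, in mph per second, needed to go from v0 to
    v_target within the remaining distance (from v^2 = v0^2 + 2*a*d)."""
    if distance_left_miles <= 0:
        raise ValueError("distance_left_miles must be positive")
    accel_mph_per_hour = (v_target_mph ** 2 - v0_mph ** 2) / (2.0 * distance_left_miles)
    return accel_mph_per_hour / 3600.0


def time_to_target_s(v0_mph, v_target_mph, distance_left_miles):
    """Seconds needed to cover the remaining distance under that acceleration.
    With constant acceleration the average speed is (v0 + v_target) / 2."""
    mean_speed_mph = (v0_mph + v_target_mph) / 2.0
    return distance_left_miles / mean_speed_mph * 3600.0


# Hypothetical situation: 30 mph with 1 mile of track left.
accel = required_acceleration(30, 88, 1.0)
seconds = time_to_target_s(30, 88, 1.0)
print(accel, seconds)
```

The two functions are consistent with each other: multiplying the acceleration by the time gives back the speed gain (58 mph in this made-up case).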


This allows us to plot the new speed curve on the initial graph that showed the sequence of events.


In this scenario, Marty safely goes back to the future after 2 minutes and 8 seconds, a mere 39 seconds after the green log caught fire.

Unfortunately, since Clara came on board exactly when the green log caught fire, she most probably would have made the jump with the locomotive. In the movie, it took her 1 minute and 51 seconds to get to the locomotive’s whistle, so she would not have had time to call for help. Doc, who had to put the presto logs in the firebox while the train was moving, would have had to rush to the DeLorean but it’s possible he would have made it.

In the end…

We’re forced to conclude that Doc’s calculations were off and Marty couldn’t have made it back to the future. The fact that he did may mean that we are currently in a “time paradox, the results of which could cause a chain reaction that would unravel the very fabric of the space-time continuum and destroy the entire universe.”

But in any case, time travel can be a risky business. As a word of advice: maybe where you’re going, you don’t need roads… but where you came from, always make sure you had enough tracks.



Mapping Press Releases in the 2015 Canadian Federal Election

The 2015 Canadian federal election is in its final stretch, and a colleague and I thought it would be a great opportunity to collect some data and do some machine learning. Citizen data science in action!

We looked at the press releases of non-regional Canadian federal political parties using Datacratic’s Machine Learning Database (MLDB). The image below is a map with 620 dots each representing one English-language press release, colored by each party’s official color. The closer two dots are, the more similar the text of the press releases they represent. The white text labels were placed by hand to give a sense of what the various groupings mean.

A lot of interesting insights about each party’s communication strategy can be derived from the visualization.

Check out the complete blog post for more details as well as an interactive version of the graph.


Hacking an epic NHL goal celebration with a hue light show and real-time machine learning

See media coverage of this blog post.

In Montréal this time of year, the city literally stops and everyone starts talking, thinking and dreaming about a single thing: the Stanley Cup Playoffs. Even most of those who don’t normally care the least bit about hockey transform into die-hard fans of the Montréal Canadiens, or the Habs, as we also call them.

Below is a YouTube clip of the epic goal celebration hack in action. In a single sentence: I trained a machine learning model to detect in real time that the Habs just scored, based on the live audio feed of a game, and to trigger a light show using Philips hue lights in my living room.

The rest of this post explains each step that was involved in putting this together. A full architecture diagram is available if you want to follow along.


The hack

The original goal (no pun intended) of this hack was to program a celebratory light show using Philips hue lights and play the Habs’ goal song when they scored a goal. Everything would be triggered using a big Griffin PowerMate USB button that would need to be pushed by whoever was the closest to it when the goal occurred.

That is already pretty cool, but can we take it one step further? Wouldn’t it be better if the celebratory sequence could be triggered automatically?

As far as I could find, there is no API or website available online that can give me reliable notifications within a second or two that a goal was scored. So how can we do it very quickly?

Imagine you watch a hockey game blindfolded. I bet you would have no problem knowing when goals are scored, because a goal sounds a lot different than anything else in a game. There is of course the goal horn, if the home team scores, but also the commentator who usually yells a very intense and passionate “GOOOAAAALLLLL!!!!!”. By hooking up into the audio feed of the game and processing it in real time using a machine learning model trained to detect when a goal occurs, we could trigger the lights and music automatically, allowing all the spectators to dance and do celebratory chest-bumps without having to worry about pushing a button.


Some signal processing

The first step is to take a look at what a goal sound looks like. The Habs’ website has a listing of all previous games with ~4-minute video highlights of each game. I extracted the audio from a particular highlight and used librosa, a library for audio and music analysis, to do some simple signal processing. If you’ve never played with sounds before, you can head over to Wikipedia to read about what a spectrogram is. You can also simply think of it as taking the waveform of an audio file and creating a simple heat map over time and audio frequencies (Hz). Low-pitched sounds are at the lower end of the y-axis and high-pitched sounds are at the upper end, while the color represents the intensity of the sound.

We’re going to be using the mel power spectrogram (MPS), which is like a spectrogram with additional transformations applied on top of it.

You can use the code below to display the MPS of a sound file.
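The snippet referred to above is not reproduced here. As a stand-in, the following is a minimal sketch using librosa's current API, which may differ from the version used when this post was written. The function names, the default parameter values and the figure size are assumptions chosen to match the settings described in this post; the imports are deferred into the functions so the definitions load even where the audio libraries are not installed.

```python
def mel_power_spectrogram_db(path, sr=22050, n_mels=128, fmax=8000):
    """Load an audio file and return (mel power spectrogram in dB, sample rate)."""
    import librosa
    import numpy as np

    # Resample to `sr` while loading so every file uses the same rate.
    y, sr = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, fmax=fmax)
    # Convert power to decibels relative to the loudest bin.
    return librosa.power_to_db(S, ref=np.max), sr


def show_mel_power_spectrogram(path, fmax=8000):
    """Display the mel power spectrogram of a sound file as a heat map."""
    import librosa.display
    import matplotlib.pyplot as plt

    S_db, sr = mel_power_spectrogram_db(path, fmax=fmax)
    fig, ax = plt.subplots(figsize=(12, 4))
    img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel",
                                   fmax=fmax, ax=ax)
    fig.colorbar(img, ax=ax, format="%+2.0f dB")
    ax.set_title("Mel power spectrogram")
    plt.show()
```

Calling `show_mel_power_spectrogram("highlight.wav")` (a hypothetical file name) would then plot time on the x-axis and mel-scaled frequency on the y-axis.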

This is what the MPS of a 4-minute highlight of a game looks like:


mel power spectrogram of a 4-minute highlight


Now let’s take a look at an 8-second clip from that highlight, specifically when a goal occurred.


mel power spectrogram of a goal by the Canadiens


As you can see, there are very distinctive patterns when the commentator yells (the 4 big wavy lines), and when the goal horn goes off in the amphitheater (many straight lines). Being able to see the patterns with the naked eye is very encouraging in terms of being able to train a model to detect it.

There are tons of different audio features we could derive from the waveform to use as features for our classifier. However, I always try to start simple to create a working baseline and improve from there. So I decided to simply vectorize the MPS, which was created from 2-second clips with frequencies up to 8 kHz and 128 Mel bands, at a sampling rate of 22.05 kHz. Each MPS has a shape of 128×87, which results in a feature vector of 11,136 elements when vectorized.
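The 128×87 shape can be checked with a little arithmetic. The sketch below assumes a sampling rate of 22,050 Hz and a hop length of 512 samples with centered frames (librosa's defaults); neither the hop length nor the centering is stated in the post, so treat them as assumptions. The function name is illustrative.

```python
def mps_shape(duration_s=2.0, sr=22050, n_mels=128, hop_length=512):
    """Expected (n_mels, n_frames) of a mel spectrogram computed with
    centered frames: one frame per hop, plus one."""
    n_samples = int(duration_s * sr)
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames


n_mels, n_frames = mps_shape()
print(n_mels, n_frames, n_mels * n_frames)  # 128 87 11136
```

Flattening that matrix row by row gives the 11,136-element feature vector fed to the classifier.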


The machine learning problem

If you’re not familiar with machine learning, think of it as building algorithms that can learn from data. The type of ML task we need to do for this project is binary classification, which means telling two classes of things apart:

  • positive class: the Canadiens scored a goal
  • negative class: the Canadiens did not score a goal

Put another way, we need to train a model that can give us the probability that the Canadiens scored a goal given the last 2 seconds of audio.

A model learns to perform a task through training, which means looking at past examples of those two classes and figuring out the statistical regularities in the data that allow it to separate the classes. However, it is easy for a computer to learn things by heart. The goal of machine learning is to produce models that are able to generalize what they learn to new examples they have never seen. What this means for us is that we’ll be using past games to train the model, but what we obviously want is to make predictions for future games in real time as they are aired on TV.

Building the dataset

As with any machine learning project, there is a time when you will feel like a monkey, and that is usually when you’re either building, importing or cleaning a dataset. For this project, this took the form of recording the audio from multiple 4-minute highlights of games and noting the time in the clip when a goal was scored by the Habs or the opposing team.

Obviously, we’ll be using the Canadiens’ goals as positive examples for our classifier, since that is what we are trying to detect.

Now what about negative examples? If you think about it, the very worst thing that could happen to this system is for it to get false positives (falsely thinking there is a goal). Imagine we are playing against the Toronto Maple Leafs and they score a goal and the light show starts. Not only did we just get scored on and are bummed out, but on top of that the algorithm is trolling us about it by playing our own goal song! (This is naturally a fictitious example because the Leafs are obviously not making the playoffs once again this year.) To make sure that doesn’t happen, we’ll be using all the opposing team’s goals as explicit negatives. The hope is that the model will be able to distinguish between goals for and against because the commentator is much more enthusiastic for Canadiens’ goals.

To illustrate this, compare the MPS of the Habs’ goal above with the example below of a goal against the Habs. The commentator’s scream is much shorter and the goal horn of the opposing team’s amphitheater is at very different frequencies than the one at the Bell Center. The goal horn only goes off when the home team scores, so the MPS below is taken from a game not played in Montréal.


mel power spectrogram of a goal against the Canadiens


In addition to the opposing team’s goals, we’ll use 50 randomly selected segments from each highlight that are far enough from an actual goal as negatives, so that the model is exposed to what the uneventful portions of a game sound like.

False negatives (missing an actual goal) are still bad, but we prefer them over false positives. We’ll talk about how we can deal with them later on.

Note that I did not do any alignment of the sound files, meaning the commentator yelling does not start at exactly the same time in every clip. The dataset ended up consisting of 10 games, with 34 goals by the Habs and 17 goals against them. The randomly selected negative clips added another 500 examples.


Training and picking a classifier

As I mentioned earlier, the goal was to start simple. To that effect, the first models I tried were a simple logistic regression and an SVM with an RBF kernel over the raw vectorized MPS.

I was a bit surprised that this trivial approach yielded usable results. The logistic regression got an AUC of 0.97 and an F1 score of 0.63, while the SVM got an AUC of 0.98 and an F1 score of 0.71. Those results were obtained by holding out 20% of the training data to test on.

At this point I ran a few complete game broadcasts through the system and, each time the model detected a goal, I wrote the corresponding 2-second sound file to disk. A bunch were false positives that corresponded to commercials. The model had never seen commercials before because they are not included in game highlights. I added those false positives to the negative examples, retrained, and the problem went away.

However, the AUC and F1 scores were not accurate estimates of the performance I could expect, because I was not necessarily planning to use a single prediction as the trigger for the light show. Since I’m scoring many times per second, I could try decision rules that would look at the last n predictions to make a decision.

I ran a 10-fold cross-validation, holding out an entire game from the training set each time, and stepping through the held-out game’s highlight as if it were the real-time audio stream of a live game. That way I could test out multi-prediction decision rules.
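A minimal sketch of that kind of split, assuming each example is stored as a `(game_id, features, label)` tuple. That layout and the function name are hypothetical; the point is only that all clips from one game end up on the same side of the split.

```python
def leave_one_game_out(examples):
    """Yield (held_out_game, train, test) splits, one per game.

    examples: list of (game_id, features, label) tuples.
    All examples from the held-out game go to `test`; the rest to `train`.
    """
    game_ids = sorted({game_id for game_id, _, _ in examples})
    for held_out in game_ids:
        train = [ex for ex in examples if ex[0] != held_out]
        test = [ex for ex in examples if ex[0] == held_out]
        yield held_out, train, test


# Tiny made-up dataset: 3 games, a few clips each.
examples = [
    ("game1", [0.1], 1), ("game1", [0.2], 0),
    ("game2", [0.3], 0),
    ("game3", [0.4], 1), ("game3", [0.5], 0),
]
for game, train, test in leave_one_game_out(examples):
    print(game, len(train), len(test))
```

With 10 games this yields 10 folds, and no clip from the held-out game ever leaks into training.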

I tried two decision rules:

  1. average of last n predictions over the threshold t
  2. m positive votes in the last n predictions, where a YES vote requires a prediction over the threshold t
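The two rules can be sketched as small functions over the list of predictions seen so far. The function names are illustrative, and the handling of a window that is not yet full (the rules simply use whatever predictions are available) is an assumption.

```python
def average_rule(predictions, n, t):
    """Rule 1: fire when the mean of the last n predictions exceeds t."""
    recent = predictions[-n:]
    return len(recent) > 0 and sum(recent) / len(recent) > t


def vote_rule(predictions, m, n, t):
    """Rule 2: fire when at least m of the last n predictions exceed t."""
    recent = predictions[-n:]
    votes = sum(1 for p in recent if p > t)
    return votes >= m


# Made-up stream of probabilities: quiet play followed by a burst.
stream = [0.1] * 15 + [0.95] * 5
print(average_rule(stream, n=20, t=0.9))   # False
print(vote_rule(stream, m=5, n=20, t=0.9))  # True
```

The example shows how the two rules can disagree: a short burst of confident predictions is enough for the voting rule but is diluted in the average.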

For each combination of decision rule, hyper-parameters and classifier, there were 4 metrics I was looking at:

  1. Real Canadiens goal that the model detected (true positive)
  2. Opposing team goal that the model detected (really bad false positive)
  3. No goal but the model thought there was one (false positive)
  4. Canadiens goal the model did not detect (false negative)

SVMs ended up being able to get more true positives but did a worse job on false positives. What I ended up using was a logistic regression with the second decision rule. To trigger a goal, there need to be 5 positive votes out of the last 20, and a vote is cast when the probability of a goal is over 90%. The cross-validation results for that rule were 23 Habs goals detected, 11 not detected, 2 opposing team goals falsely detected and no other false positives.

Looking at the Habs’ 2014-15 season statistics, they scored an average of 2.61 goals per game and allowed 2.24. This means I can loosely expect the algorithm to miss about 1 Habs goal per game (0.84 to be more precise) and to go off for a goal by the opposing team once every 4 games.
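Those estimates follow from scaling the cross-validation error rates by the per-game goal averages. The short calculation below uses only the numbers quoted in this post (23 detected and 11 missed Habs goals, 2 false triggers out of 17 opposing goals); the variable names are illustrative.

```python
detected, missed = 23, 11             # Habs goals in cross-validation
opp_false_alarms, opp_goals = 2, 17   # opposing goals that triggered / total
goals_for_per_game = 2.61
goals_against_per_game = 2.24

# Expected missed Habs goals per game: miss rate times goals scored.
missed_per_game = goals_for_per_game * missed / (detected + missed)

# Expected false triggers per game on opposing goals.
false_alarms_per_game = goals_against_per_game * opp_false_alarms / opp_goals

print(round(missed_per_game, 2))            # 0.84
print(round(1 / false_alarms_per_game, 1))  # 3.8
```

So the "once every 4 games" figure is a rounding of roughly one false trigger every 3.8 games.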

Note that the trained model only works for the specific TV station and commentator I trained on. I trained on regular season games aired on TVA Sports because they are airing the playoffs. I tried testing on a few games aired on another station and basically detected no goals at all. This means performance is likely to go down if the commentator catches a cold.


Philips hue light show

Now that we’re able to do a reasonable job at identifying goals, it was time to create a light show that rivals those crazy Christmas ones we’ve all seen. This has 2 components: playing the Habs’ goal song and flashing the lights to the music.

The goal song I play is not the current one in use at the Bell Center, but the one that they used in the 2000s. It is called “Le Goal Song” by the Montréal band L’Oreille Cassée. To the best of my knowledge, the song is not available for sale and is only available on YouTube.

Philips hues are smart LED multicolor lights that can be controlled using an iPhone app. The app talks to the hue bridge that is connected to your wifi network and the bridge talks to the lights over the ZigBee Light Link protocol. In my living room, I have the 3 starter-kit hue lights, a light-strip under my kitchen island and a Bloom pointing at the wall behind my TV. Hues are not specifically meant for light shows; I usually use them to create an interesting atmosphere in my living room.

I realized the lights can be controlled using a REST API that runs on the bridge. Using the very effective phue library, we can interface with the hue bridge API from Python. At that point, it was simply a question of programming a sequence of color and intensity calls that would roughly go along with the goal song I wanted to play.

Below is an example of using phue to make each light cycle through the colors blue, white and red 10 times.
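A sketch of what such a loop can look like with phue, not the exact code used for the light show. The xy color coordinates are approximations that depend on the bulb model, and the delay, the helper names and the bridge address passed to `run_on_bridge` are assumptions. The color-cycling helper only needs objects with `on`, `brightness` and `xy` attributes, which phue's `Light` objects provide.

```python
import time

# Approximate CIE xy chromaticities; exact values depend on the bulb's gamut.
BLUE = [0.167, 0.04]
WHITE = [0.3227, 0.329]
RED = [0.675, 0.322]


def cycle_colors(lights, colors=(BLUE, WHITE, RED), times=10,
                 delay_s=0.5, sleep=time.sleep):
    """Make every light step through `colors`, repeated `times` times."""
    for light in lights:
        light.on = True
        light.brightness = 254  # maximum brightness
    for _ in range(times):
        for xy in colors:
            for light in lights:
                light.xy = xy
            sleep(delay_s)


def run_on_bridge(bridge_ip):
    """Connect to a hue bridge and run the cycle on all its lights."""
    from phue import Bridge  # imported here so cycle_colors works without phue

    bridge = Bridge(bridge_ip)
    bridge.connect()  # press the bridge's link button before the first run
    cycle_colors(bridge.lights)
```

Calling `run_on_bridge` with the bridge's IP address on the local network starts the sequence; the `sleep` parameter exists only so the timing can be replaced when experimenting.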

I deployed this as a simple REST API using bottle. This way, the celebratory light show is decoupled from the trigger. The lights can be triggered easily by calling the /goal endpoint.


Hooking up to the live audio stream

My classifier was trained on audio clips offline. To make this whole thing come together, the missing piece was the real-time scoring of a live audio feed.

I’m running all of this on OS X, and to get the live audio into my Python program, I needed two components: Soundflower and pyaudio. Soundflower acts as a virtual audio device and allows audio to be passed between applications, while pyaudio is a library that can be used to play and record audio in Python.

For this to work, the system audio output is first set to the Soundflower virtual audio device. At that point, no sound will be heard because nothing is being sent to the output device. In Python, you can then configure pyaudio to capture audio coming into the virtual audio device, process it, and then resend it to the normal output device. In my case, that is the HDMI output going to the TV.

As you can see from the code snippet below, you start listening to the stream by giving pyaudio a callback function that will be called each time the captured frames buffer is full. In the callback, I add the frames to a ring buffer that keeps 2 seconds worth of audio, because that is the size of the training examples I used to train the model. The callback gets called many times per second. Each time, I take the contents of the ring buffer and score it using the classifier. When a goal is detected by the model, this triggers a REST call to the /goal endpoint of the light show API.
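The callback and ring buffer described above can be sketched as follows. This is a simplified illustration, not the original snippet: the sampling rate, chunk size, 16-bit mono format and default audio devices are assumptions, `score_fn` stands for the trained classifier, `on_goal` stands for the REST call to the /goal endpoint, and a single-prediction threshold stands in for the voting rule.

```python
from collections import deque

RATE = 22050          # samples per second (assumed; match the training data)
CHUNK = 1024          # frames delivered per callback (assumed)
WINDOW_SECONDS = 2    # length of the training examples


class GoalListener:
    """Keeps the last 2 seconds of audio and scores it on every new chunk."""

    def __init__(self, score_fn, on_goal, rate=RATE,
                 window_seconds=WINDOW_SECONDS, threshold=0.9):
        self.buffer = deque(maxlen=rate * window_seconds)  # ring buffer
        self.score_fn = score_fn
        self.on_goal = on_goal
        self.threshold = threshold

    def feed(self, samples):
        """Add samples; once the buffer is full, score it and maybe trigger."""
        self.buffer.extend(samples)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough audio yet
        probability = self.score_fn(list(self.buffer))
        if probability > self.threshold:
            self.on_goal()
        return probability


def listen(listener, rate=RATE, chunk=CHUNK):
    """Open a pass-through pyaudio stream that feeds the listener."""
    import array
    import pyaudio  # imported here so GoalListener works without pyaudio

    def callback(in_data, frame_count, time_info, status):
        listener.feed(array.array("h", in_data))  # bytes -> 16-bit samples
        return (in_data, pyaudio.paContinue)      # resend audio to the output

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, output=True, frames_per_buffer=chunk,
                        stream_callback=callback)
    stream.start_stream()
    return audio, stream  # keep the process alive while stream.is_active()
```

Selecting the Soundflower input and the HDMI output would be done through the `input_device_index` and `output_device_index` arguments of `open`, which are left at their defaults here.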

Full architecture


My TV subscription allows me to stream the hockey games on a computer in HD. I hooked up a Mac Mini to my TV and that Mac will be responsible for running all the components of the system:

  1. displaying the game on the TV
  2. sending the game’s audio feed to the Soundflower virtual audio device
  3. running the Python goal detector that captures the sound from Soundflower, analyses it, calls the goal endpoint if necessary and resends the audio out to the HDMI output
  4. running the light show API that listens for calls to the goal endpoint

Since the algorithm is not perfect, I also hooked up the Griffin USB button that I mentioned at the very beginning of the post. It can be used to either start or stop the light show in case we get a false negative or false positive respectively. It was very easy to do this because a push of the button simply calls the /goal endpoint of the API that can decide what it should do with the trigger.
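One possible way for the endpoint to "decide what to do" is a simple toggle: start the show if it is idle, stop it if it is running. The sketch below is an assumption about that behavior, not the original implementation; the class name, the `run_show` callable, the use of GET, and the host and port are all hypothetical.

```python
import threading


class LightShow:
    """Starts the show when idle and stops it when already running."""

    def __init__(self, run_show):
        self._run_show = run_show  # callable(stop_event) that plays the show
        self._stop = threading.Event()
        self._thread = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                self._stop.set()  # ask the running show to finish early
                return "stopped"
            self._stop = threading.Event()
            self._thread = threading.Thread(
                target=self._run_show, args=(self._stop,), daemon=True)
            self._thread.start()
            return "started"


def serve(show, host="localhost", port=8080):
    """Expose the toggle as a /goal endpoint using bottle."""
    from bottle import Bottle  # imported here so LightShow works without bottle

    app = Bottle()

    @app.route("/goal")
    def goal():
        return {"status": show.trigger()}

    app.run(host=host, port=port)
```

With this design the detector and the USB button can share the same endpoint: a button press during a false positive stops the show, and a press after a missed goal starts it. The `run_show` callable is expected to check the stop event between light changes.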


Production results and beyond

After two playoff games against the Ottawa Senators, the model successfully detected 75% of the goals (missing 1 per game) and got no false positives. This is in line with the expected performance, and the USB button was there to save the day when the detection did not work.

This was done in a relatively short amount of time and represents the simplest approach at each step. To make this work better, there are a number of things that could be done: for instance, aligning the audio files of the positive examples, trying different example lengths, trying more powerful classifiers like a convolutional neural net, or doing simple image analysis of the video feed to try to determine which side of the ice the play is on.

In the meantime, enjoy the playoffs and Go Habs Go!



In the media