Boston Music Hackday
I was thrilled to attend the Boston Music Hackday this weekend. A lot of people hacked up some pretty cool projects, many of us coding until the very early morning on Sunday (aka 4am), only to get back up a few hours later (aka 8am) to keep at it until the dreaded 3:45pm deadline, when we all had to submit our demos. The organisers did a wonderful job and the event was a success at every level.
The hack I did was called the PartyLister. The goal was mainly to come up with a way to generate steerable playlists that would also be personalized for a group of people (i.e. taking into account each person's musical taste and making sure everyone gets a song they like once in a while). Given the very limited amount of time available to hack this up, I had to keep things simple, so I decided to use only social tags to do all the similarity computation. I expected the quality of the playlists to suffer, but the goal was really to develop a way to include multiple listeners in the track selection process. The algorithm should then be used in conjunction with something like the playlist generation model I presented at this year’s ISMIR.
My hack: PartyLister
Imagine you’re hosting a party and using the PartyLister as DJ for the night. Each of your guests will need to supply the software with his last.fm username and we’ll be good to go.
We go out and fetch from the last.fm API the social tags associated with the artists (and their top tracks) that our listeners know about. We also use the EchoNest API to get similar artists so we can present new artists to our listeners. From a user’s top artists, we can create a tag cloud that represents the user’s general musical taste (UMT). We’re also allowing each user to specify a set of tags that represent their current musical taste using a steerable tag cloud.
Suppose you have 3 guests at your party, where two like pop and the other likes metal. By doing a naive combination of the users’ musical taste, we’ll probably end up playing pop music, leaving our metalhead friend bored. To solve this, I added a user weight term which is determined by looking at the last 5 songs that played and computing the average similarity between the user’s musical taste and those songs. If we’re only playing pop songs, the metalhead will have a very low similarity between his taste and what played and so we’ll increase his weight and lower the pop lovers’ weights. When we pick the next song, this weighting scheme will allow the metalhead’s taste to count more than the pop lovers’, even if there are more of them. This will make us play a more metal-like track. After a while the weights will equal out and we’ll start playing pop music again.
For sparseness reasons, I operated on artists instead of tracks. In the simplified formula I used to weight each candidate artist, lambda is simply a knob that determines how much the users’ musical taste will count, cd() represents the cosine distance and UMT represents a combination of the user’s general musical taste and his steerable cloud.
The following plot represents a running average of the cosine distance (dissimilarity) between users’ musical taste and the last 5 songs that played. It represents a 160-song playlist with 3 listeners in the system.
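To make the weighting idea a bit more concrete, here is a minimal Python sketch. It is not the code from the hack: the function names, the dict-based tag clouds, the normalization of the user weights and the "continuity" term that lambda trades off against are all assumptions made for illustration. Only the ingredients (a lambda knob, the cosine distance and per-user weights based on the last few songs) come from the description above.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two tag clouds.

    Tag clouds are plain dicts mapping a tag to a non-negative weight.
    """
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty cloud is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def user_weights(umts, recent_songs):
    """Give more weight to users whose taste was poorly served recently.

    umts maps a user name to that user's tag cloud (the UMT);
    recent_songs is a list of tag clouds for the last few songs played.
    """
    if not recent_songs:
        return {user: 1.0 / len(umts) for user in umts}
    avg = {
        user: sum(cosine_distance(umt, song) for song in recent_songs)
        / len(recent_songs)
        for user, umt in umts.items()
    }
    total = sum(avg.values())
    if total == 0.0:
        return {user: 1.0 / len(umts) for user in umts}
    # Assumed normalization: weights are proportional to the running
    # average distance and sum to 1.
    return {user: dist / total for user, dist in avg.items()}


def candidate_score(candidate, last_song, umts, weights, lam):
    """Score a candidate artist; lower is better.

    lam (lambda) is the knob: 0 ignores the listeners entirely,
    1 only looks at the weighted distance to their tastes.
    """
    continuity = cosine_distance(candidate, last_song)  # assumed term
    taste = sum(
        weights[user] * cosine_distance(umt, candidate)
        for user, umt in umts.items()
    )
    return (1.0 - lam) * continuity + lam * taste


# Two pop fans, one metalhead, and five pop songs in a row.
umts = {"alice": {"pop": 1.0}, "bob": {"pop": 1.0}, "carol": {"metal": 1.0}}
weights = user_weights(umts, [{"pop": 1.0}] * 5)
```

In this toy setup all of the weight ends up on the metalhead after five pop songs, so with lambda close to 1 a metal candidate scores better than yet another pop one, which is the behaviour described in the previous paragraph.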

As you can see, as a user’s running average increases, his weight is also increased so that we start playing more songs that fit his taste. His average then decreases as the other users’ weights go up forcing a return to music that fits their taste a little more. The plot shows that the system seems to be doing what we want, that is taking into account the musical taste of multiple users and playing music that each person will like once in a while. Integrated in a real playlist generation model, I believe this could produce interesting results.
I also played with a discovery setting, where users could specify if they wanted to discover new songs or stick to what they know. This was achieved by adding a bonus or penalizing each candidate’s score, based on the discovery setting (float between 0 and 1) and the proportion of users who knew (had already listened to) the artist in question.
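One way such a bonus/penalty could be computed is sketched below. The exact formula I used is not shown here, so treat this as an illustration only: the function name, the `strength` parameter and the symmetric form of the adjustment are assumptions, and the result is meant to be added to a score where higher is better.

```python
def discovery_adjustment(discovery, known_fraction, strength=0.2):
    """Bonus (positive) or penalty (negative) for a candidate artist.

    discovery      -- the discovery setting, a float between 0 and 1
                      (0: stick to what we know, 1: discover new music)
    known_fraction -- proportion of users who already listened to the artist
    strength       -- hypothetical cap on the size of the adjustment
    """
    if not 0.0 <= discovery <= 1.0:
        raise ValueError("discovery must be between 0 and 1")
    if not 0.0 <= known_fraction <= 1.0:
        raise ValueError("known_fraction must be between 0 and 1")
    novelty = 1.0 - known_fraction
    # Both factors lie in [-1, 1]: the adjustment is positive when the
    # setting and the artist's novelty agree, negative when they disagree,
    # and zero when the discovery setting is neutral (0.5).
    return strength * (2.0 * discovery - 1.0) * (2.0 * novelty - 1.0)
```

With this form, an artist nobody knows gets the full bonus when the discovery setting is 1 and the full penalty when it is 0, and the reverse holds for an artist everybody knows.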
PartyLister was not as visually or sonically attractive a hack as some of the others, but I still managed to win a prize based on popular vote. Thanks to all the great sponsors, there were a lot of prizes and so lots of winners.
Below is the Université de Montréal delegation, Mike Mandel (who also won a prize for his Bowie S-S-S-Similarities) and myself, with our bounty.

I really hope to attend another hackday soon as it was all a lot of fun. Time to go get some sleep now.
ISMIR 2009
I had a paper accepted as an oral presentation at this year’s ISMIR held in Kobe, Japan. The paper is called Steerable Playlist Generation by Learning Song Similarity from Radio Station Playlists and is co-authored with Eck, Desjardins and Lamere. It outlines two new ideas:
- Using commercial radio station playlists to learn a similarity space from audio features
- Using a steerable tag cloud to allow the user to influence the playlist generation
Here is the abstract:
This paper presents an approach to generating steerable playlists. We first demonstrate a method for learning song transition probabilities from audio features extracted from songs played in professional radio station playlists. We then show that by using this learnt similarity function as a prior, we are able to generate steerable playlists by choosing the next song to play not simply based on that prior, but on a tag cloud that the user is able to manipulate to express the high-level characteristics of the music he wishes to listen to.
My time at Sun Labs and pyaura
Posted by mailletf in Programming, Voyage on October 9, 2009
My internship at Sun Microsystems Labs, which has been going on for about 15 months – 9 of those full time at their campus in the Boston area – is coming to an end. During the course of those months, I’ve met a lot of very smart and fun people, I’ve worked on very challenging and stimulating problems and I’ve discovered a bunch of really good New England beers.
All my work has been centered around the Aura datastore, an open-source, scalable and distributed recommendation platform. The datastore is designed to handle millions of users and items and can generate content-based recommendations based on each item’s aura (aka tag cloud).
Last summer, under the supervision of Paul Lamere, I worked a lot more on our music recommendation web application, called the Music Explaura and designed a steerable recommendation interface. (We also have a Facebook companion app to the Explaura that was created by Jeff Alexander.)
This summer, I worked with Steve Green on many different things, including what I’d like to talk about in this post, pyaura, a Python interface to the datastore.
pyaura
The idea behind pyaura is to get the best of both worlds. While the datastore is very good at what it does (storing millions of items and computing the similarity between all of them very quickly), the Java framework surrounding it is a bit too rigid to quickly hack random research code on top of it. While my actual goal was to experiment with ways of doing automatic cleanup and clustering of social tags, I felt I was missing the flexibility I wanted and was used to getting when working on projects using Python’s interactive environment.
Without going into details, since the datastore is distributed and has many different components, it uses a technology called Jini to automatically hook them all up together. Jini takes care of automatic service discovery so you don’t have to manually specify IP addresses and so on. It also allows you to publicly export functions that remote components can call. A concrete example would be the datastore head component allowing the web server component to call its getSimilarity() function on two items. The computation goes on in the datastore head and then the results get shipped across the wire to the web server so it can serve its request. However, Jini only supports Java, leaving us no direct way to connect to the datastore using Python.
After looking around for a bit, I stumbled upon a project called JPype, which essentially allows you to launch a JVM inside Python. This allows you to instantiate and use Java objects in a completely transparent way from within Python. Using JPype, I built two modules which, together, allow very simple access to the datastore through Python.
- AuraBridge: A Java implementation of the Aura datastore interface. The bridge knows about the actual datastore because it can locate it and talk to it using Jini.
- pyaura: A set of Python helper functions (mostly automatic type conversion). pyaura instantiates an AuraBridge instance using JPype and uses it as a proxy to get data to and from the datastore.
Example
To demonstrate how easy things become when using pyaura, imagine you are running an Aura datastore and have collected a lot of artist and tag information from the web. You might be interested in quickly seeing the number of artists that have been tagged with each individual tag you know about. With these few lines of code, you can get a nice histogram that answers just that question:

    import pyaura.bridge as B
    import pylab as P

    # Connect to the datastore through the Java bridge
    aB = B.AuraBridge()

    # For every tag in the datastore, count the artists it was applied to
    counts = [len(tag.getTaggedArtist()) for tag in aB.get_all_iterator("ARTIST_TAG")]

    # Plot the distribution of those counts
    P.hist(counts)
The above code produces the following plot:

This is the result we expect, as this was generated with a datastore containing 100,000 artists. As less and less popular artists are added to the datastore, the effects of sparsity in social data kick in. Less popular artists are indeed tagged with fewer tags than popular artists, leading to the situation where very few tags were applied to more than 5000 artists.
This is a small example but it shows the simplicity of using pyaura. With very few lines of code, you can do pretty much anything with the data stored in Aura. This hopefully will make the Aura datastore more accessible and attractive to projects looking to take advantage of both its scalability and raw power as well as have the flexibility to quickly hack on top of it.
Restore sparsebundle extended attributes after rsync
If you rsync a sparse bundle to another Mac without the -E flag (or if you copy it to a non-Mac system), you will lose the ability to double-click on it in the Finder to mount it. This is because the extended attributes telling the Finder that the folder is actually a bundle are lost in the transfer.
The following piece of code I found here fixed the problem for me. Simply compile and run the code, passing the path to the bundle as a parameter, and it will restore the attributes.

    /*
     * setxattrs - bubbaATbubba.org
     *
     * Properly sets sparse bundle extended attributes lost when rsyncing
     * sparse bundle data to a platform that does not support extended
     * attributes. This is only needed when restoring/retrieving the bundle.
     *
     * To use, sync/copy all bundle files, then run this tool on the
     * sparse bundle:
     *
     *   % gcc -o setxattrs setxattrs.c
     *   % ./setxattrs mybackup.sparsebundle
     *   % xattr -l mybackup.sparsebundle
     */
    #include <stdio.h>      /* printf */
    #include <sys/xattr.h>  /* setxattr, XATTR_NOFOLLOW */

    int main (int argc, const char * argv[]) {
        if (argc != 2) {
            printf("Usage: %s [sparse bundle directory]\n", argv[0]);
            printf("Sets extended attributes for sparse bundle disk image\n");
            return 0;
        }

        int sxr;
        int options = XATTR_NOFOLLOW;

        /* Value for com.apple.FinderInfo */
        char theValue1[] = {
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

        /* Value for com.apple.diskimages.fsck */
        char theValue2[] = {
            0x59, 0x48, 0x60, 0x38, 0x65, 0x7C, 0x09, 0x22,
            0x33, 0xAD, 0xA5, 0x73, 0x12, 0xAD, 0xF3, 0x7F,
            0xEE, 0x90, 0x5B, 0x92};

        size_t theSize1 = sizeof(theValue1);
        size_t theSize2 = sizeof(theValue2);

        sxr = setxattr(argv[1], "com.apple.FinderInfo", (void *)theValue1, theSize1, 0, options);
        sxr = setxattr(argv[1], "com.apple.diskimages.fsck", (void *)theValue2, theSize2, 0, options);

        return 0;
    }
trafshow : Display current network traffic
Posted by mailletf in Technology on February 26, 2009
trafshow is a simple little program that displays the current traffic on a network interface.

It listens on a given interface in promiscuous mode and displays information on each connection, its remote address and the amount of traffic going back and forth.
It can easily be installed on a Mac via MacPorts.
Make OSX’s top behave like Linux’s top
Posted by mailletf in Apple, Technology on January 24, 2009
OSX’s top program doesn’t quite behave like its Linux counterpart out of the box. For me, the two biggest problems are that processes aren’t sorted by CPU usage and the top program itself uses 10% of the CPU because it calculates all sorts of statistics about memory and shared library usage that I personally don’t care about.
There is a series of flags that you can pass to OSX’s top to bring its behavior closer to Linux’s top. I have created the following alias to that effect:

    alias mytop='top -s1 -o cpu -R -F'

The display is updated every second (-s1), processes are sorted by CPU usage (-o cpu) and no unnecessary statistics about memory regions and shared libraries are calculated (-R and -F). Instead of 10%, top uses only 2% of the CPU.
Steve Jobs and Bill Gates at the same table
Posted by mailletf in Apple, Technology on January 19, 2009
Stumbled upon this very interesting series of videos that are about 2 years old.
In their rare joint appearance at All Things Digital 5, Steve Jobs and Bill Gates discuss their contributions to the technology industry and the qualities they most respect in one another.
You rarely get to hear these two giants give their perspective on such a range of topics side by side. There are eleven videos in total. The first one is embedded below and you will get the following ones recommended at the end of each clip.
pfSense : a software alternative to your old router/firewall
Posted by mailletf in Linux, Technology on January 13, 2009
My old D-Link router, like pretty much every other router I’ve ever owned, wasn’t very reliable, so I was looking for open-source alternative firmware such as Tomato to flash it with. Given the clear lack of effort put into the official firmware, I thought it couldn’t hurt to try. Unfortunately, my router wasn’t supported by any third-party firmware.
During my search, however, I stumbled upon pfSense, a FreeBSD-based router/firewall distro. It’s small (<100 MB), runs on a 100MHz PC and includes all the features you would get on a very expensive commercial router (firewall, NAT, VPN server, usage graphs, dynamic DNS support, per-IP bandwidth usage, QoS, etc).
I already had a dedicated fileserver so I installed pfSense as a VM on it using VMware (I could also have done it with VirtualBox, a free alternative to VMware). All you need are two NICs. I now only use my old router as a wireless access point because pfSense naturally has a DHCP server. I could even completely let go of my D-Link router if I added a wireless NIC to my server.
If you have an old PC lying around or one that could be a host to a pfSense VM, all you might need is an extra NIC to get an enterprise-grade router that will cooperate a lot more than any cheap $50 D-Link/Linksys/Netgear/etc router.
Using filemerge for mercurial diffs
Posted by mailletf in Programming on January 8, 2009
A friend of mine found a script that brings up OSX’s FileMerge program instead of the text-based file comparison you get from mercurial when doing an “hg diff”.
- Download this script and make sure its location is in your PATH
- Add the following to .hg/hgrc:

    [extensions]
    hgext.extdiff =

    [extdiff]
    cmd.opendiff = fmdiff

- Now type hg opendiff <filename> (hg op is enough), instead of hg diff <filename>
Gondola tower gives way at Whistler
As you surely already know, a tower of the Excalibur gondola at Whistler gave way last Tuesday. Fortunately, there were no serious injuries.
Funny coincidence: I was skiing at Whistler that day and was about to take that gondola at the end of my day. Since there wasn’t much snow, the bottom of the mountain was closed and you had to ride the gondola both up and down the lower section. While walking down the bottom of the mountain, I could see in the distance the tower that had split. Had I come down the mountain 20 minutes earlier, I would surely have been caught in it.
I am also sharing two beautiful photos of this fantastic mountain, so that this unfortunate episode is not the only thing we remember:




