Tuesday, October 27, 2009

The way of the program



You're a scientist, maybe a biologist. Why should you become a Python coder?



Because the future of biology is data, and lots of it. Unfortunately, data does not always come in the form you want, or in the form the software you want to use expects. Massaging data into the shape you need is one of the things Python does best.

And the secret about Python is that it's very easy to use. What you do every day is hard, and you're smart. Python is easy.

Now, if you've never thought at all about how computer programs do what they do, it'll take you a little while to wrap your head around that, a few hours perhaps. But learning to use Python productively is also just a matter of a few hours. An afternoon or two, tops. Can you spare them?

Here's an example. I'm in iTunes, having just loaded a bunch of songs so that I can put them on an iPod to play in the car. I start to wonder: can iTunes export a playlist? Of course it can!

From the File menu, choose Library > Export Playlist…






You could open the exported file in Word (just kidding) or … how about TextEdit? Just drop the file on the TextEdit icon. It looks like this:



Hmm… there seem to be columns here, separated by what look like tabs. How about Numbers? Drop the same file on the Numbers icon (or on Excel, if you insist) and it looks like this:



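Since the export looks tab-separated, you could in principle skip the Numbers round trip and pull the year column out in Python directly. Here is a minimal sketch (Python 3), assuming the first row is a header row that contains a column named "Year"; check that against your own file. years_from_playlist is a hypothetical helper, and the sample text is made up.

```python
import csv
import io

def years_from_playlist(text):
    """Return the Year column of a tab-separated playlist export as ints.

    Assumption: the first row is a header row with a column named "Year".
    Songs with no year filled in are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter='\t')
    years = []
    for row in reader:
        value = (row.get('Year') or '').strip()
        if value.isdigit():
            years.append(int(value))
    return years

# Made-up sample in the same tab-separated shape.
# For the real file, something like open('playlist.txt', encoding='utf-16')
# may be needed; both the file name and the encoding are guesses.
sample = "Name\tArtist\tYear\nSong A\tBand X\t1987\nSong B\tBand Y\t\nSong C\tBand Z\t1994\n"
print(years_from_playlist(sample))  # [1987, 1994]
```

csv.DictReader looks columns up by header name, so this doesn't care where the Year column sits in the row.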
I notice a column I haven't seen in the playlist: Year (the year of recording). To make things simpler, select that column in Numbers, then copy and paste it into a new Numbers document. It looks like this:



Now what? Numbers can export too, as a CSV (comma-separated values) file. It looks like this:



Now bring that file up in TextEdit. It looks like this:



That file sure has a lot of commas in it. Well, sh*t. This is a job for … Python. Put the following in a file called script.py in the same directory as the file with the commas (your Desktop, say). I named the comma file "fileWithLotsOfCommas"; if you look with Get Info, you'll see it has a .csv extension.

The script.py file has:


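# Read the whole file exported from Numbers into one string
with open('fileWithLotsOfCommas.csv') as FH:
    data = FH.read()
# Delete every comma
data = data.replace(',', '')
# strip() returns a new string, so keep the result
data = data.strip()
print(data)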


Save it with TextEdit as a file named script.py on your Desktop. You want plain text, not rich text. (If TextEdit shows a formatting toolbar with fonts and rulers, choose Format > Make Plain Text from the menu, then Save.)
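If you'd rather not delete the header lines at the top by hand, a slightly longer script can skip them. This is a sketch (Python 3), not the method above: it assumes the year sits in the first column and that the unwanted top lines are not four-digit numbers. years_from_csv is a hypothetical helper, and the sample text is made up.

```python
import csv
import io

def years_from_csv(text):
    """Pull four-digit years out of the first column of CSV text.

    Assumptions: the year is in the first column, and the lines at the
    top (title, column header) are not four-digit numbers, so they are
    skipped instead of removed by hand.
    """
    years = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank line
        field = row[0].strip()
        if len(field) == 4 and field.isdigit():
            years.append(field)
    return years

# Made-up sample shaped like an export with empty extra columns
sample = "Table 1,,,\nYear,,,\n1987,,,\n1994,,,\n,,,\n2003,,,\n"
print("\n".join(years_from_csv(sample)))
```

To run it on the real file, read 'fileWithLotsOfCommas.csv' as before and print "\n".join(years_from_csv(data)). Using csv.reader keeps each field separate, whereas deleting every comma would glue together values from different columns if more than one column had data.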

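Now go to the Terminal, change to the Desktop (where script.py and the CSV file live), and run the script, sending its output to a file:

cd ~/Desktop
python script.py > results.txt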

The file looks like this:



Remove the two lines at the top by hand. Now, you probably don't have R installed yet. But I do. I go to R and do:

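# Move to the Desktop, where results.txt was written
setwd('~/Desktop')
# One year per line, no header row
v = read.table('results.txt', header=FALSE)
# Histogram of recording years: about 50 bins, x axis from 1960 to 2010
hist(v[,1], col='magenta', breaks=50, xlim=c(1960,2010))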
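If you don't have R, a rough text-only stand-in is possible in Python. This sketch counts songs per year with collections.Counter and prints one row per year, with lo and hi mirroring xlim=c(1960,2010). Unlike R's hist with breaks=50, it uses exactly one bin per year. year_histogram is a hypothetical helper, and the input years are made up.

```python
from collections import Counter

def year_histogram(years, lo=1960, hi=2010):
    """Return text-histogram rows, one per year from lo to hi inclusive.

    Each '#' stands for one song recorded in that year; years with no
    songs get an empty bar.
    """
    counts = Counter(int(y) for y in years)
    return ["%d %s" % (year, "#" * counts[year]) for year in range(lo, hi + 1)]

for row in year_histogram(["1987", "1987", "1994"], lo=1986, hi=1988):
    print(row)
```

On the real data you would pass in the lines of results.txt and keep the default range.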




My formula for the user's age: take the first peak and subtract 18.
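Picking the "first peak" is a judgment call made by eye. As one hypothetical way to automate it, this sketch takes the earliest year that has at least min_count songs and no fewer songs than the years on either side, then applies the subtract-18 formula. The peak definition, the min_count threshold, and the sample years are all my assumptions; real data is noisier and may need a higher threshold.

```python
from collections import Counter

def first_peak_year(years, min_count=2):
    """Return the earliest year whose song count is a local maximum.

    Assumed definition of a peak: at least min_count songs, and no fewer
    songs than the year before and the year after. Returns None if no
    year qualifies.
    """
    counts = Counter(int(y) for y in years)
    for year in sorted(counts):
        c = counts[year]
        if c >= min_count and c >= counts[year - 1] and c >= counts[year + 1]:
            return year
    return None

# Made-up years
years = [1979, 1985, 1986, 1986, 1986, 1987, 1987, 1999, 1999, 1999, 1999]
peak = first_peak_year(years)
print(peak, peak - 18)  # 1986 1968
```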

Here is what Numbers gives. It's not nearly so pretty.