Python for Bioinformatics: Python csv module

Tuesday, January 26, 2010

Python csv module

I'm trying to follow what Steven Lott recommends on SO (Stack Overflow): use the standard library. He also often says:

"Use the source, Luke."

In this case, the module of interest is csv. We don't need the source here. Normally you would use it something like this:

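import csv
fn = 'mydata.csv'
# Python 2: the csv module wants the file opened in binary mode ('rb')
with open(fn, 'rb') as FH:
    reader = csv.reader(FH)
    for line in reader:
        dosomething(line)  # placeholder for your own processing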
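For anyone reading along on Python 3, here is a minimal sketch of the same pattern. The assumptions are mine, not part of the original example: Python 3, where the file is opened in text mode with newline='' rather than 'rb', and a hypothetical helper read_rows() that stands in for the dosomething() loop by simply collecting the rows.

```python
import csv

def read_rows(fn):
    """Return every row of the CSV file fn as a list of lists of strings.

    Hypothetical helper, for illustration only.
    """
    # Python 3: text mode with newline='' lets csv handle line endings itself
    with open(fn, newline='') as FH:
        return list(csv.reader(FH))
```

Calling read_rows('mydata.csv') would then give one list of strings per line of the file.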

csv.reader is initialized with a file object (what I learned to call a file "handle"), for which I typically use the variable name FH. But it will also work with any iterable that yields one line of text at a time, such as a list of strings. So we can use it as shown here:

>>> import csv
>>> data = 'a,b\n1,2\n3,4\n'
>>> reader = csv.reader(data.split('\n'))
>>> for row in reader:
...     print row
... 
['a', 'b']
['1', '2']
['3', '4']
[]

We should have called strip() on the data before split(); that gets rid of the trailing empty string, which otherwise shows up as an empty list in the output. csv also has cool DictReader (and DictWriter) objects:

>>> import csv
>>> data = 'a,b\n1,2\n3,4\n'
>>> reader = csv.DictReader(data.split('\n'))
>>> for D in reader:
...     print D
... 
{'a': '1', 'b': '2'}
{'a': '3', 'b': '4'}
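To make the two points above concrete, here is a small sketch that runs the same data through csv.reader with and without strip(), and then through DictReader. It assumes Python 3 syntax (print as a function), unlike the Python 2 sessions in this post, and the variable names raw_rows, clean_rows and dict_rows are made up for the example.

```python
import csv

data = 'a,b\n1,2\n3,4\n'

# Without strip(), split('\n') leaves a trailing '' that csv.reader returns as []
raw_rows = list(csv.reader(data.split('\n')))

# With strip() first, the trailing empty row is gone
clean_rows = list(csv.reader(data.strip().split('\n')))

# DictReader takes its keys from the first row and skips empty rows by itself
dict_reader = csv.DictReader(data.split('\n'))
dict_rows = [dict(D) for D in dict_reader]

print(raw_rows)
print(clean_rows)
print(dict_reader.fieldnames)
print(dict_rows)
```

That DictReader skips the blank line is why its output above has no empty entry, even though the data was not stripped.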

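I don't know a way to get DictWriter to write the header in the Python I'm using here. (DictWriter.writeheader() does this, but it only arrived later, in Python 2.7 and 3.2.) Without it, you'd just do it separately with FH.write() when the file is first opened.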

>>> import csv
>>> data = 'a,b\n1,2\n3,4\n'
>>> reader = csv.DictReader(data.split('\n'))
>>> L = [D for D in reader]
>>> print L
[{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]
>>> FH = open('result.csv', 'wb')
>>> writer = csv.DictWriter(FH, fieldnames = ['a','b'])
>>> writer.writerows(L)
>>> FH.close()
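On a newer Python, the header can come from DictWriter itself. The following is only a sketch under my own assumptions: Python 3 (writeheader() exists from 2.7 and 3.2 onward), and an in-memory io.StringIO buffer standing in for the result.csv file handle so the output is easy to inspect.

```python
import csv
import io

rows = [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}]

# An in-memory text buffer stands in for the result.csv file handle
FH = io.StringIO()
writer = csv.DictWriter(FH, fieldnames=['a', 'b'])
writer.writeheader()      # writes the header line built from fieldnames
writer.writerows(rows)    # writes one line per dictionary

result = FH.getvalue()
print(repr(result))
```

The lines end in \r\n because that is the csv module's default line terminator. With a real file on Python 3, you would open it with open('result.csv', 'w', newline='') instead of using the buffer.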