Python for Bioinformatics: defaultdict

Tuesday, April 24, 2012

defaultdict

A common use case for dictionaries is to hold a count of the occurrences of a key in some other object, for example, words in a file. In this case, one needs to deal with missing keys:

>>> L = list('abcaba')
>>> D = dict()
>>> for k in L:
...     try:
...         D[k] += 1
...     except KeyError:
...         D[k] = 1
... 
>>> D
{'a': 3, 'c': 1, 'b': 2}

Another solution is to use the setdefault method of dicts:

>>> D = dict()
>>> for k in L:
...     D.setdefault(k,0)
...     D[k] += 1
... 
0
0
0
1
1
2
>>> D
{'a': 3, 'c': 1, 'b': 2}

A better way (certainly better than this second approach) is to use a defaultdict:

>>> from collections import defaultdict
>>> D = defaultdict(int)
>>> for k in L:
...     D[k] += 1
... 
>>> D
defaultdict(, {'a': 3, 'c': 1, 'b': 2})

Some of the magic involves __missing__:

>>> 'd' in D
False
>>> D.__missing__('d')
0
>>> 'd' in D
True
>>> D
defaultdict(, {'a': 3, 'c': 1, 'b': 2, 'd': 0})

For more you can go to the docs or here or here, or just do this:

>>> print defaultdict.__missing__.__doc__
__missing__(key) # Called by __getitem__ for missing key; pseudo-code:
  if self.default_factory is None: raise KeyError((key,))
  self[key] = value = self.default_factory()
  return value

[ UPDATE: The collections.Counter class was mentioned in a comment. Don't know why I didn't mention it in this post, I've used it in many other posts here. If it's convenient to convert the data to a list of keys, that's the way to go. ]