Friday, January 29, 2010

Flatten a list in Python

I asked another question on Stack Overflow, this time about "flattening" a list of lists. If the input is

L = [[[1, 2, 3], [4, 5]], 6]

the output should be

[1, 2, 3, 4, 5, 6]

The (simple) solution that I liked the best is a generator (i.e. it returns an iterator). I changed the variable names slightly:


import collections
It = collections.Iterable

def flatten(L):
for e in L:
if isinstance(e, It) and not isinstance(e, basestring):
for sub in flatten(e):
yield sub
else:
yield e


In the interpreter:

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> flatten(L)
<generator object flatten at 0x100492140>
>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]


As you can see, this works, and it also works if one of the elements is a string or unicode object (that's the reason for checking isinstance(e, basestring).

>>> list(flatten([[['abc',2,3],[4,5]],6]))
['abc', 2, 3, 4, 5, 6]


Recursion will fail if the "depth" is too great. SO user unutbu came up with another version that doesn't use recursion, which Alex Martelli simplified:

def genflat(L, ltypes=collections.Sequence):
L = list(L)
while L:
while L and isinstance(L[0], ltypes):
L[0:1] = L[0]
if L: yield L.pop(0)


There's a bit of trickery here. Suppose we're just starting with

L = [[[1, 2, 3], [4, 5]], 6]

The first element, L[0], is [[1, 2, 3], [4, 5]]

so we go into the second while and we repeatedly do

L[0:1] = L[0]

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> L[0]
[[1, 2, 3], [4, 5]]
>>> L[:1]
[[[1, 2, 3], [4, 5]]]
>>> L[1:]
[6]


We continue until L[0] is not an iterable…

>>> L[0:1] = L[0]
>>> L
[[1, 2, 3], [4, 5], 6]
>>> L[0:1] = L[0]
>>> L
[1, 2, 3, [4, 5], 6]


Assigning L[0:1] to be = L[0] removes the nesting!

Note that this will choke on a string element (we need to guard against that as we did in the first version.

And a little matter of style. For short-lived variables I prefer simple names: i (an index), N (a numerical constant), s (a string), L (a list), D (a dictionary). For many people, capitalized words (even words of a single letter) should be reserved for class objects. This instinct is so strong in them that the people posting on SO followed my usage of a single letter "l" for the list variable, but made it lowercase. Of course, this is also a bad idea, because it can be confused with the digit "one", but they probably figured that was my problem.