## Friday, January 29, 2010

### Flatten a list in Python

I asked another question on Stack Overflow, this time about "flattening" a list of lists. If the input is

`L = [[[1, 2, 3], [4, 5]], 6]`

the output should be

`[1, 2, 3, 4, 5, 6]`

The (simple) solution that I liked the best is a generator (i.e. it returns an iterator). I changed the variable names slightly:

 `import collectionsIt = collections.Iterabledef flatten(L): for e in L: if isinstance(e, It) and not isinstance(e, basestring): for sub in flatten(e): yield sub else: yield e`

In the interpreter:

 `>>> L = [[[1, 2, 3], [4, 5]], 6]>>> flatten(L)>>> list(flatten(L))[1, 2, 3, 4, 5, 6]`

As you can see, this works, and it also works if one of the elements is a string or unicode object (that's the reason for checking isinstance(e, basestring).

 `>>> list(flatten([[['abc',2,3],[4,5]],6]))['abc', 2, 3, 4, 5, 6]`

Recursion will fail if the "depth" is too great. SO user unutbu came up with another version that doesn't use recursion, which Alex Martelli simplified:

 `def genflat(L, ltypes=collections.Sequence): L = list(L) while L: while L and isinstance(L[0], ltypes): L[0:1] = L[0] if L: yield L.pop(0)`

There's a bit of trickery here. Suppose we're just starting with

`L = [[[1, 2, 3], [4, 5]], 6]`

The first element, `L[0]`, is `[[1, 2, 3], [4, 5]]`

so we go into the second while and we repeatedly do

`L[0:1] = L[0]`

 `>>> L = [[[1, 2, 3], [4, 5]], 6]>>> L[0][[1, 2, 3], [4, 5]]>>> L[:1][[[1, 2, 3], [4, 5]]]>>> L[1:][6]`

We continue until `L[0]` is not an iterable…

 `>>> L[0:1] = L[0]>>> L[[1, 2, 3], [4, 5], 6]>>> L[0:1] = L[0]>>> L[1, 2, 3, [4, 5], 6]`

Assigning `L[0:1]` to be = `L[0]` removes the nesting!

Note that this will choke on a string element (we need to guard against that as we did in the first version.

And a little matter of style. For short-lived variables I prefer simple names: i (an index), N (a numerical constant), s (a string), L (a list), D (a dictionary). For many people, capitalized words (even words of a single letter) should be reserved for class objects. This instinct is so strong in them that the people posting on SO followed my usage of a single letter "l" for the list variable, but made it lowercase. Of course, this is also a bad idea, because it can be confused with the digit "one", but they probably figured that was my problem.