In a previous post I briefly explained a script (download here: bloggerScript.py) that uses ElementTree in Python to parse a Blogger XML archive and pull out image links. If you're editing a post, or just viewing the source of a page that contains an image in your browser, you'll see something like
<a onblur="try ...
We use the string method "find" to get the indexes flanking the substring of interest. A subtle bug that I had in a previous version was that I failed to specify i+1 in the line:
j = s.find('/a>',i+1)
The result was code that apparently worked, but found only the first image if there was more than one. This happens because, on the second pass through the loop, i was correctly set to the start of the second element, but j was still the index of the first end-tag. With j < i, the slice s[i:j] is an empty string.
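To make the bug concrete, here is a minimal sketch of that kind of loop. It is not the code from bloggerScript.py: the function name find_anchors, the '<a ' start marker, and the sample string are all assumptions made up for illustration. Only the find call with i+1 comes from the line quoted above.

```python
def find_anchors(s):
    """Return every '<a ... /a>' substring of s, in order (hypothetical helper)."""
    anchors = []
    i = s.find('<a ')
    while i != -1:
        # Search for the end tag starting after i. Dropping the i+1
        # argument reproduces the bug: j is then always the first end tag.
        j = s.find('/a>', i + 1)
        if j == -1:
            break
        anchors.append(s[i:j + 3])   # +3 keeps the '/a>' itself
        i = s.find('<a ', j)         # continue after this element
    return anchors

# Made-up sample with two image links.
sample = ('<a href="one.jpg"><img src="one.jpg"/></a> text '
          '<a href="two.jpg"><img src="two.jpg"/></a>')
found = find_anchors(sample)

# What the buggy version saw on its second pass: i at the second
# element, j still at the first end tag, so the slice is empty.
second_start = sample.find('<a ', 1)
first_end = sample.find('/a>')
buggy_slice = sample[second_start:first_end]
```

With two links in the sample, found holds both elements, while buggy_slice is empty because the slice's end index comes before its start.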
The second part of the function finds the href for each link and saves it in a list to return. I end up with the title of each post that has at least one image, together with all of its image links. I just saved them in a text file.
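The href step can be sketched the same way, again with find. This is only an illustration: extract_hrefs is a hypothetical name, and the sketch assumes each href value is wrapped in double quotes, which may not match every link in a real archive.

```python
def extract_hrefs(anchors):
    """Collect the href value from each anchor string (hypothetical helper)."""
    key = 'href="'   # assumption: href values are double-quoted
    hrefs = []
    for a in anchors:
        i = a.find(key)
        if i == -1:
            continue                 # no href in this element
        start = i + len(key)
        j = a.find('"', start)       # closing quote of the value
        if j == -1:
            continue                 # malformed attribute, skip it
        hrefs.append(a[start:j])
    return hrefs

# Made-up input: one element with an href, one without.
links = extract_hrefs([
    '<a href="one.jpg"><img src="one.jpg"/></a>',
    '<a name="x">no link</a>',
])
```

Elements without an href are skipped rather than raising an error, so the returned list only contains actual links.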