Python for Bioinformatics: Flags, detail

Sunday, March 27, 2011

Flags, detail

A quick note about the flags. There are enough now that it's hard to be sure I got them all. So I went to the Flag Counter site (this page, and the next). Rather than do anything fancy, I just copied the text to a file and then processed it with Python. My database of flag images is from Wikipedia, and I shortened the file names to the country codes. Since I can't remember a number of the codes, I wrote a Python script to harvest the Flag Counter entries and match them with the country codes (from here).

I checked the directory with the flag images by eye, which is almost certainly a mistake.
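A check like this can also be scripted. The following is a minimal sketch, assuming each image is named by its country code plus an extension (for example `AE.png`); the function name `missing_flags` and the directory layout are hypothetical, not taken from the actual setup.

```python
import os

def missing_flags(codes, flag_dir):
    """Return the sorted country codes that have no image file in flag_dir."""
    # Strip extensions and upper-case the stem, so that
    # 'ae.png' and 'AE.gif' both count as 'AE'.
    present = set(os.path.splitext(name)[0].upper()
                  for name in os.listdir(flag_dir))
    return sorted(set(codes) - present)
```

Called with the codes from the list below and the path to the image directory, it returns the codes that still need a flag, or an empty list if every code has a file.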

Here is the list of countries from which visitors to this site have come, sorted alphabetically by country code. The script is at the end. It shows a number of typical issues you run into with this kind of processing.

AE  United Arab Emirates
AL  Albania
AN  Netherlands Antilles
AR  Argentina
AT  Austria
AU  Australia
BB  Barbados
BE  Belgium
BG  Bulgaria
BH  Bahrain
BR  Brazil
BY  Belarus
CA  Canada
CH  Switzerland
CL  Chile
CN  China
CO  Colombia
CR  Costa Rica
CS  Serbia
CV  Cape Verde
CY  Cyprus
CZ  Czech Republic
DE  Germany
DK  Denmark
DZ  Algeria
EC  Ecuador
EE  Estonia
EG  Egypt
ES  Spain
FI  Finland
FR  France
GH  Ghana
GR  Greece
HK  Hong Kong
HR  Croatia
HU  Hungary
ID  Indonesia
IE  Ireland
IL  Israel
IN  India
IS  Iceland
IT  Italy
JM  Jamaica
JP  Japan
KR  South Korea
LT  Lithuania
LU  Luxembourg
MA  Morocco
MD  Moldova
MT  Malta
MU  Mauritius
MX  Mexico
MY  Malaysia
NL  Netherlands
NO  Norway
NZ  New Zealand
PA  Panama
PE  Peru
PH  Philippines
PK  Pakistan
PL  Poland
PR  Puerto Rico
PT  Portugal
QA  Qatar
RO  Romania
RU  Russia
SA  Saudi Arabia
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SV  El Salvador
TH  Thailand
TN  Tunisia
TR  Turkey
TT  Trinidad and Tobago
TW  Taiwan
UA  Ukraine
UK  United Kingdom
US  United States
UY  Uruguay
VE  Venezuela
VN  Vietnam
ZA  South Africa

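The script:

# load_data comes from a local utils module (not shown here); the code
# below treats its return value as one string holding the file's contents
from utils import load_data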

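# names as they appear on Flag Counter (keys) mapped to
# the names used in country-codes.txt (values)
specials = { 'South_Korea':'Korea_(South)',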
             'Russia':'Russian_Federation',
             'New_Zealand':'New_Zealand_(Aotearoa)',
             'Serbia':'Serbia_and_Montenegro',
             'Croatia':'Croatia_(Hrvatska)',
             'Vietnam':'Viet_Nam' }

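# build D: country name (words joined by '_') -> country code
# each line of country-codes.txt is the code followed by the name
data = load_data('country-codes.txt')
D = dict()
for line in data.strip().split('\n'):
    L = line.strip().split()
    D['_'.join(L[1:])] = L[0]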

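# collect the country names from the scraped Flag Counter text
cL = list()
data = load_data('scraped.txt')
for line in data.strip().split('\n'):
    L = line.strip().split()
    # skip the first field and the last four; the rest is the name
    i = len(L) - 4
    country = '_'.join(L[1:i])
    cL.append(country)
    # for the special cases, add an entry under the Flag Counter name
    if country in specials:
        k = specials[country]
        D[country] = D[k]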

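# sort by country code, then print code, tab, name
# (the parenthesized print runs under both Python 2 and Python 3)
def f(k):  return D[k]
for country in sorted(cL, key=f):
    print(D[country] + '\t' + country.replace('_',' '))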
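
One of the typical issues is that the name Flag Counter displays does not always match the name in the country-code list, which is what the `specials` dictionary patches up. A quick way to find such mismatches is to list the scraped names that have no entry in the lookup table. This is a minimal sketch with a hypothetical helper, `unmatched`, run on small made-up inputs rather than the real files:

```python
def unmatched(names, table, specials=None):
    """Return the names found neither in table nor among the keys of specials."""
    specials = specials or {}
    return sorted(n for n in names if n not in table and n not in specials)

# Illustrative data only; the real table is built from country-codes.txt.
table = {'Korea_(South)': 'KR', 'Viet_Nam': 'VN', 'France': 'FR'}
scraped = ['France', 'South_Korea', 'Vietnam']

print(unmatched(scraped, table))
# ['South_Korea', 'Vietnam']

# Once a name is covered by specials, it drops out of the report.
print(unmatched(scraped, table, {'South_Korea': 'Korea_(South)'}))
# ['Vietnam']
```

Every name the function reports either needs an entry in `specials` or points to a parsing problem in the scraped text.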