Wednesday, June 19, 2013

matplotlib followup for Python 3 on OS X 10.8

In my last post, I reported success in installing matplotlib on OS X. I realize now that this is not anything new to report, since the currently recommended install process (which I read belatedly today) in the README is to use Homebrew (or MacPorts), which is what I did. I'm just happy to know that it works.

I thought I'd say a word about Python 3 here.

I did

$ brew install python3 --framework

which first installed readline and sqlite (linked into /usr/local/opt), as well as gdbm and xz, and also Distribute and pip.

So now we have:

$ ls /usr/local/lib/python3.3/site-packages/
__pycache__    setuptools-0.6c11-py3.3.egg-info
distribute-0.6.45-py3.3.egg  setuptools.pth
easy-install.pth   site.py
pip-1.3.1-py3.3.egg   sitecustomize.py


$ python3
Python 3.3.2 (default, Jun 19 2013, 15:41:32) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from test import autotest
== CPython 3.3.2 (default, Jun 19 2013, 15:41:32) [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
==   Darwin-12.4.0-x86_64-i386-64bit little-endian
==   /usr/local/bin
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1)
[  1/372] test_grammar
[  2/372] test_opcodes
[  3/372] test_dict
[  4/372] test_builtin

The tests hang at this point, and I haven't figured out why yet. They do fine with my system Python. Then I did:

$ pip3 install numpy
$ pip3 install nose

since we need numpy for matplotlib (and don't get it for free as with System Python).

Following this advice, I did:

$ python3
..
>>> import numpy
>>> numpy.test('full')

Ran 4808 tests in 64.611s

OK (KNOWNFAIL=6, SKIP=6)
<nose.result.TextTestResult run=4808 errors=0 failures=0>

Now to matplotlib

$ git clone git://github.com/matplotlib/matplotlib.git
$ cd matplotlib
$ python3 setup.py build
$ sudo python3 setup.py install

(Grabs pyparsing, dateutil, tornado).

$ cd..
$ python3
..
>>> import matplotlib.pyplot as plt
>>> Y = [1,4,9,16]
>>> plt.scatter(range(len(Y)),Y,s=250,color='r')
<matplotlib.collections.PathCollection object at 0x1094e4650>
>>> plt.savefig('example.png')

works!

$ pip3 install cython

had a permissions problem, so I did

$ sudo chmod -R 755 /usr/local/lib/python3.3/site-packages/
$ pip3 install cython

Now, for scipy

$ git clone git://github.com/scipy/scipy.git scipy
$ cd scipy
$ python3 setup.py build
$ sudo python3 setup.py install

$ cd ..
$ python3
..
>>> from scipy.stats import norm
>>> norm.cdf(2)
0.97724986805182079

>>> import scipy
>>> scipy.test('full')
..

Ran 7486 tests in 725.812s

FAILED (KNOWNFAIL=42, SKIP=296, failures=82)


Some failures, but all in all it looks good, and a very easy install.

Tuesday, June 18, 2013

Matplotlib (and SciPy) on OS X Mountain Lion

Yesterday I installed matplotlib on my MacBook---for about the 10th time. This can be a complex undertaking, but a basic installation is relatively easy, so I thought I would outline it here.

Some things I did that made it easier:

• I used a clean install of OS X. I did this to make sure cruft (links, alternative versions of libraries from old installs) does not interfere. Also, this way I can be sure that things work for the reasons I have written about here.

To do the install I just made a second partition on my hard drive and installed OS X on it from a USB installer. If you've never done this before, good instructions for the USB part are here.

It's quite straightforward. Use the App store to update when the install is complete. I now have OS X 10.8.4.

Although it's not necessarily good practice, I used the same username and password for my accounts on OS X on both partitions, this allows me to read and write files in both accounts easily using the Finder.

• I used the System Python. This doesn't appear to be a very popular choice, but it avoids the headache of having multiple Pythons and various $PATH issues, etc. If you do install a second Python, make sure it's a framework version, since even saving a plot as a png using matplotlib will fail otherwise. I may do some more experiments later, but I think this is the right choice

My Python is

$ which python
/usr/bin/python

(well, it's not really the whole of Python on OS X, but that is a whole 'nother story).

$ python
Python 2.7.2 (default, Oct 11 2012, 20:14:37)

A bonus is that we already have numpy

>>> import numpy
>>> numpy.__version__
'1.6.1'

• I used Homebrew to get the most important matplotlib prerequisites, libpng and freetype. zlib is also a prerequisite but comes with OS X.

The instructions for Homebrew are here. Weirdly, I got a prompt for an admin password, which happened because the directory where Homebrew puts stuff, /usr/local, did not exist yet. So I bailed from the process and first did:

sudo mkdir /usr/local

Now, look for issues revealed by brew doctor and fix them if needed. Finally:

$ brew doctor
Your system is ready to brew.
$ brew update
Already up-to-date.
$ brew install pkgconfig libpng freetype
..
$ brew list
freetype libpng  pkg-config

the "Bottles" mentioned in the output I cut means these are pre-built.

I got pkgconfig because it helped with building/linking problems previously, but should not really be necessary when the libraries are in /usr/local. Not sure why it is some times pkg-config and other times pkgconfig, but OK:

$ ls /usr/local/bin/pkg-config 
/usr/local/bin/pkg-config
$ ls /usr/local/lib/pkgconfig/
freetype2.pc libpng.pc libpng15.pc

(If you use e.g. the XQuartz versions of these two libraries, you can set the PKG_CONFIG_PATH environment variable to point to the correct directory, and pkg-config will do the rest).

Now we're ready for matplotlib.

$ git clone git://github.com/matplotlib/matplotlib.git
..
$ cd matplotlib
$ python setup.py build
$ sudo python setup.py install

And test it:

cd ..
python 
>>> import matplotlib.pyplot as plt
>>> Y = [1,4,9,16]
>>> plt.scatter(range(len(Y)),Y,s=250,color='r')
<matplotlib.collections.PathCollection object at 0x103963110>
>>> plt.savefig('example.png')

Works!

>>> import matplotlib
>>> matplotlib.test()
..KK.KK.....KK.KK..KK.KK.KK.KK.KK  etc.

I didn't test extensively but there don't seem to be massive failures. And matplotlib has lots of optional prerequisites I didn't install that would explain some failures.

It's worth pointing out that the matplotlib install process grabs several additional packages:

$ ls /Library/Python/2.7/site-packages
README
distribute-0.6.28-py2.7.egg
easy-install.pth
matplotlib-1.4.x-py2.7-macosx-10.8-intel.egg
nose-1.3.0-py2.7.egg
pyparsing-1.5.7-py2.7.egg
setuptools.pth
tornado-3.1-py2.7.egg
$

Let's get a few more things while we're at it:

$ sudo easy_install pip
Password: ****
Searching for pip
Reading http://pypi.python.org/simple/pip/
Best match: pip 1.3.1
..
Installing pip script to /usr/local/bin
Installing pip-2.7 script to /usr/local/bin
..

Come to think of it, SciPy would be a great addition. For that we need gfortran (Fortran compiler) and Cython:

$ brew install gfortran
(takes a while, several dependencies)..

Homebrew doesn't have Cython so:

$ sudo pip install cython

And now for SciPy:

$ git clone git://github.com/scipy/scipy.git scipy

$ cd scipy
$ python setup.py build
$ sudo python setup.py install

$ cd ..
$ python
..
>>> from scipy.stats import norm
>>> norm.cdf(2)
0.97724986805182079
If you check for the libraries loaded (I used export DYLD_PRINT_LIBRARIES=1), you'll see we're using Accelerate.

Finally, get PyCogent too..

$ git clone git://github.com/pycogent/pycogent.git
$ cd pycogent
$ python setup.py build
$ sudo python setup.py install

That was pretty easy!

Thursday, August 9, 2012

See ya

Posting has become much rarer here recently, as my energy has flagged.

And this morning I have a rather preemptory request that I fix the old links that were broken as a result of Apple's ditching iDisk. While I still have the files, somewhere, and I could theoretically put them on Dropbox, I'm really not ready to fix what must be several hundred links.

So I thought I would post this warning as what will probably be my last post. It's been fun, and I'm grateful to my readers for having taken an interest. We had more than 50,000 unique visitors and over 150 flags. I just wish someone from Tonga had come by.

If you really must have an old file, I can probably find it.

[ UPDATE: I made a zip of the directory tree which was used to be on iDisk under
http://web.mac.com/telliott99/Python/

It is on Dropbox here and weighs in at 17.8 MB. If you check the old links that don't work, the file should be in that archive. Let me know if you can't find something you'd like.

Tuesday, July 31, 2012

More fun with geometry

I ran into a fun problem with parallelograms the other day.


The graphic shows a parallelogram broken up into pieces by its diagonals.

Since the opposing sides of a parallelogram are parallel and equal, it's easy to show that the top and bottom triangles are congruent, just rotated, likewise with the other two.

The problem posed was to express the total area in terms of the obtuse angle at the center, the one which is > 90 degrees. Let's call that angle A, and its supplementary angle (the acute angle at the center) B.

The reason why I've drawn the figure in this way is that, as shown below, the triangles can be rearranged to give a different paralellogram with the same area.



x = length of long diagonal (orange + maroon)
y = length of short diagonal
y/2 = 1/2 length of short diagonal (blue)

Area = x (y/2) sin B

The sine values for supplementary angles are equal.

sin (π/2 - θ) = sin θ (ref)

So:

Area = x (y/2) sin A

Nice! Much better than using the law of sines, and for the vectorists out there, expressed even more compactly as the cross-product.

Saturday, June 16, 2012

variadic arguments

I got curious about functions with variadic arguments like in NSArray:

+ (id)arrayWithObjects:(id)firstObj, ...


Here's are two C examples. In either case, it's required to have a named argument that precedes the variable part. In the first method, that variable gives a count of the arguments. In the second, it's not used for anything except to show where to start. A sentinel value shows the end of the variable length argument list. I also incorporated __func__, which I came across in my reading and was new to me.

// clang variadic.c -o prog
#include <stdio.h>
#include <stdarg.h>

int f(int count, ...)
{
  printf("%s%s", __func__, "\n");
  int n, sum; 
  va_list ap;
  va_start(ap, count);
  sum = 0;
  for ( ; count > 0; count--) {
      n = va_arg(ap, int);
      printf("n = %3d\n", n);
      sum += n;
  }
  va_end(ap);
  return sum;
}

int g(int x, ...)
{
  printf("%s%s", __func__, "\n");
  fflush(stdout);
  int n, sum; 
  va_list ap;
  va_start(ap, x);
  n = x;
  sum = 0;
  while (!(0)) {
      printf("n = %3d\n", n);
      sum += n;
      n = va_arg(ap, int);
      if (n == 0) { break; }
  }
  va_end(ap);
  return sum;
}

int main(int argc, char * argv[]) {
    int result = f(3,1,2,3);
    printf ("sum = %d\n", result);
    result = g(1,2,3,0);
    printf ("sum = %d\n", result);
    return 0;
}

> ./prog
f
n =   1
n =   2
n =   3
sum = 6
g
n =   1
n =   2
n =   3
sum = 6

NSString (1)



Test harness:

// clang strings1.m -o prog -framework Foundation -fobjc-gc-only
#include <Foundation/Foundation.h>

int main(int argc, char * argv[]) {

    // ..code here..

    return 0;
}

NSString *s;
    NSMutableArray *ma = [NSMutableArray arrayWithCapacity:10];

    char *cstr = "abc";   // real C string w/ "\0" also works
    uint ascii = NSASCIIStringEncoding;
    uint utf8 = NSUTF8StringEncoding;
    [ma addObject:[NSString stringWithCString:cstr encoding:ascii]];
    [ma addObject:[NSString stringWithCString:cstr encoding:utf8]];
    [ma addObject:[NSString stringWithUTF8String:cstr]];
    NSMutableString *ms = [[NSMutableString alloc] init];
    [ms appendString:@"abde"];
    [ms insertString:@"c" atIndex:2];
    [ma addObject:[ms copy]];  // otherwise code below will alter
    [ms appendString:@"fghij"];
    [ms replaceOccurrencesOfString:@"B"
                        withString:@"*"
                           options:NSCaseInsensitiveSearch
                             range:NSMakeRange(0,2) ];
    [ma addObject:ms];
    for (id obj in ma) { NSLog(@"%@ %@", obj, [obj class]); }
    [ma removeAllObjects];
    printf("-----------------\n");



> ./prog
2012-05-29 16:45:03.624 prog[444:707] abc __NSCFString
2012-05-29 16:45:03.626 prog[444:707] abc __NSCFString
2012-05-29 16:45:03.627 prog[444:707] abc __NSCFString
2012-05-29 16:45:03.627 prog[444:707] abcde __NSCFString
2012-05-29 16:45:03.628 prog[444:707] a*cdefghij __NSCFString
-----------------

[ma addObject:[NSString stringWithFormat:@"abcd"]];
    [ma addObject:[NSString stringWithFormat:@"ab%@", @"cd" @"ef"]];
    [ma addObject:[NSString stringWithFormat:@"a%C%C", 0x62, 0x63]];
    char b[] = { 0x61, 0x62, 0x63, 0x64, 0x65 };
    NSData *d = [NSData dataWithBytes:b length:5];
    s = [[NSString alloc] initWithData:d encoding:utf8];
    [ma addObject:s];
    for (id obj in ma) { NSLog(@"%@ %@", obj, [obj class]); }
    printf("-----------------\n");

2012-05-29 16:45:03.629 prog[444:707] abcd __NSCFString
2012-05-29 16:45:03.629 prog[444:707] abcdef __NSCFString
2012-05-29 16:45:03.630 prog[444:707] abc __NSCFString
2012-05-29 16:45:03.630 prog[444:707] abcde __NSCFString
-----------------

d = [NSData dataWithBytes:b length:5];
    [d writeToFile:@"x" atomically:YES];
    NSError *err = nil;
    NSStringEncoding enc = NSUTF8StringEncoding;
    s = [NSString stringWithContentsOfFile:@"x" 
                              usedEncoding:&enc
                                     error:&err];
    if (err != nil) { NSLog(@"%@", [err userInfo]); }
    else {  NSLog(@"%@", s );  }

    b[4] = 0xff;
    d = [NSData dataWithBytes:b length:5];
    [d writeToFile:@"x" atomically:YES];
    s = [NSString stringWithContentsOfFile:@"x" 
                              usedEncoding:&enc 
                                     error:&err];
    if (err != nil) { NSLog(@"%@", [err localizedDescription]); } 
    else {  NSLog(@"%@", s );  }

2012-05-29 16:45:03.641 prog[444:707] abcde
2012-05-29 16:45:03.658 prog[444:707] The file “x” couldn’t be opened because the text encoding of its contents can’t be determined.


Thursday, May 31, 2012

Burroughs-Wheeler Transform-BWT (3)

I'm exploring the Burroughs-Wheeler Transform, which has become important for fast methods of matching short reads to large genomes. The first two posts are here and here.

In the first we saw what BWT is and showed how to reverse or invert it. In the second, we saw how to use BWT to do a search. Here, I show a fairly efficient way to generate the BWT for a sequence.

Last time, we used a script which does a rotation on a deque followed by sorting to generate the BWT of a sequence:

GAATTCAAGCTTGGATCCGGAAAGATCTGATC

and obtained

GACGAAGGGATTCTGTG^GATACTTAAACTACC

The result matches what the visualization tool on this page generates. However, our script does a lot more work (and uses more memory) than is really necessary. If we look again at the first few lines of output from bwt.py:

20 AAAGATCTGATC^GAATTCAAGCTTGGATCCGG
21 AAGATCTGATC^GAATTCAAGCTTGGATCCGGA
 6 AAGCTTGGATCCGGAAAGATCTGATC^GAATTC

Each number is the index into the sequence where the row (the suffix) begins (0-based indexing). The array of numbers from top to bottom of the output is the suffix array.

As you can see from the last part of the script below, the BWT is easily generated from the suffix array, it's just text[j-1] for each j in the array. That's because the BWT is the last column and the sequences wrap, so each character in the BWT is -1 with respect to the character in the first column for any row.

The Python script shown here sorts the rows but only looks at as many of the characters as are needed to do the sort. We have a list L to contain the growing suffix array. This consists of indexes j into the text, one corresponding to each row, which are in sorted order in the suffix array.

We move an index i along the text and for the suffix starting at each i, find the place where it belongs in the suffix array, then insert it.

I'm sure there are even faster ways to generate the BWT for a text. The most glaring problem with this attempt is that repeated insertion into a list is expensive. We need a data structure that combines ease of insertion with ease of traversal (like a linked list).

Output:

> python fast_bwt3.py 
GACGAAGGGATTCTGTGGATACTTAAACTACC 

 20  21   6   1  22   7  14  24  29   2
  5  16  17  26   9  31  19  13  23  28
  8  18  12   4  15  25  30  27  11   3
 10  32

If you compare this with what we got last time you'll see it matches.

fast_bwt.py

#text = 'BANANA'
text = 'GAATTCAAGCTTGGATCCGGAAAGATCTGATC' 
t = text + '^'
N = len(t)

# L is the suffix array
L = list()

# i: index to walk current suffix along text
# j: indexes for sorted suffix already in suffix array
# k: enumeration of L, to give insertion point

def doOne(i,L):
    for k,j in enumerate(L):
        n = 0
        while t[i+n] == t[j+n]:
            n += 1
        if t[i+n] < t[j+n]:
            L.insert(k,i)
            return
    L.append(i)

for i in range(1,N):
    doOne(i,L)

bwt = ''.join([t[j-1] for j in L])
print bwt, '\n'
for k,j in enumerate(L):
    if k and not k % 10:
        print
    print '%3d' % j,