Sunday, December 13, 2009

Matplotlib in OS X (1)


Although matplotlib is a sophisticated, Python-based plotting library, I have used R instead for two reasons: Bioconductor uses R, and I couldn't get matplotlib installed properly. Since PyCogent uses matplotlib and I've now got it working, I thought I'd take a look.

The immediate goal is to generate some numbers for a simple worked example of Principal Coordinates Analysis (PCoA) with PyCogent, and to remind myself how it differs from Principal Component Analysis (PCA). According to the abstract of this book chapter:

The problem is that PCA is based on the correlation or covariance coefficient, and this may not always be the most appropriate measure of association. Principal coordinate analysis (PCoA) is a method that, just like PCA, is based on an eigenvalue equation, but it can use any measure of association (Chapter 10). Just like PCA, the axes are plotted against each other in a Euclidean space, but the PCoA does not produce a biplot (a joint plot of the variables and observations).
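To make the quoted description concrete, here is a minimal sketch of classical (Torgerson) PCoA in plain NumPy. It is not PyCogent's implementation, and the function name pcoa is my own, hypothetical choice. It starts from a square distance matrix, double-centers the squared distances, and takes the top eigenvectors scaled by the square roots of their eigenvalues as coordinates. With ordinary Euclidean distances the result matches PCA scores up to sign; with other distance measures some eigenvalues can come out negative, which the sketch simply clips to zero for the kept axes.

```python
import numpy as np

def pcoa(D, k=2):
    """Classical principal coordinates from a square distance matrix D (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # centering matrix J = I - (1/n) * (matrix of ones)
    J = np.eye(n) - np.ones((n, n)) / n
    # double-center the squared distances
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # keep the top k axes; clip small negative eigenvalues (round-off or non-Euclidean D)
    top = np.clip(vals[:k], 0, None)
    return vecs[:, :k] * np.sqrt(top), vals

# Usage: four corners of a 3 x 4 rectangle, Euclidean distances between them
pts = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
coords, eigvals = pcoa(D)

# distances between the recovered 2-D coordinates reproduce the input matrix
D2 = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(D, D2))
```

The recovered configuration is only defined up to rotation and reflection, which is why the check compares distances rather than raw coordinates.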


This should all be similar to what we did previously in R. Here is a Python script in two parts. The first part generates the points plotted in red as separate lists of x and y values and makes a scatterplot. The argument 's' sets the size of the plotting symbol, measured as an area in points squared (weirdly, 'size' doesn't work).
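A small sketch of the size argument, assuming a reasonably recent matplotlib: scatter's s is the marker area in points squared, while plot() takes markersize as a diameter in points, so a markersize of about sqrt(s) gives dots of similar size. The matplotlib.use("Agg") line is only there so the snippet runs without any GUI.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; nothing is shown on screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# scatter: marker size is 's', an area in points**2
sc = ax.scatter([0, 1, 2], [0, 1, 0], s=250, color='r', marker='o')
# plot: marker size is 'markersize', a diameter in points,
# so sqrt(250) gives dots of roughly the same size
ax.plot([0, 1, 2], [1, 0, 1], 'bo', markersize=250 ** 0.5)
print(sc.get_sizes())
```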

The second part uses numpy arrays to rotate these points by 45 degrees. The transformation matrix is t, and the multiplication is done by dot(t,a). The matplotlib OS X GUI doesn't work properly for me, so I just save the output as a PDF. (In contrast, the R OS X GUI is beautiful and always works perfectly for me. I suspect the matplotlib developers are mostly Windows or Unix users, which would also explain why the Makefile is out of date.)
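Here is a quick check of what t does, as a sketch using a hypothetical helper rotation(theta) that builds the standard counterclockwise rotation matrix. With theta = pi/4 it reproduces the post's t, since cos and sin of 45 degrees both equal 1/sqrt(2). Because the points are stored as the columns of a, one call to dot(t, a) rotates all of them at once, and since t is orthogonal every point keeps its distance from the origin.

```python
from math import cos, sin, pi, sqrt
import numpy as np

def rotation(theta):
    """2-D counterclockwise rotation by theta radians (hypothetical helper)."""
    return np.array([[cos(theta), -sin(theta)],
                     [sin(theta),  cos(theta)]])

t = rotation(pi / 4)
z = 1.0 / sqrt(2)
print(np.allclose(t, [[z, -z], [z, z]]))  # the post's t is a 45-degree rotation

# each column of 'a' is one (x, y) point
a = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, 4.0]])
b = np.dot(t, a)
print(b[:, 0])  # the point (1, 0) lands on (z, z)

# t is orthogonal: lengths are preserved and t times its transpose is the identity
print(np.allclose(np.linalg.norm(a, axis=0), np.linalg.norm(b, axis=0)))
print(np.allclose(t @ t.T, np.eye(2)))
```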

The figure is a little off because I haven't figured out how to set the plot limits yet.
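One way to pin down the limits, sketched under the assumption that the data are generated the same way as in the script: ax.set_xlim and ax.set_ylim fix the ranges, and ax.set_aspect('equal') makes one unit on x as long as one unit on y, so a 45-degree rotation actually looks like 45 degrees on the page. The bounds of ±120 and the file name example_limits.pdf are my own choices: the original x values are at most 100 in magnitude, and rotated points stay within about 78 units of the origin on each axis. Selecting the Agg backend before importing pyplot is one way to skip the interactive GUI entirely.

```python
import random
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no interactive window
import matplotlib.pyplot as plt

random.seed(1357)
xs = [random.choice(range(-100, 100)) for i in range(30)]
ys = [random.choice(range(-10, 10)) for i in range(30)]

fig, ax = plt.subplots()
ax.scatter(xs, ys, s=250, color='r', marker='o')

# fixed, symmetric limits large enough for both the original and rotated points
ax.set_xlim(-120, 120)
ax.set_ylim(-120, 120)
# equal scaling on both axes so rotations are not visually distorted
ax.set_aspect('equal')
ax.grid(True)
fig.savefig('example_limits.pdf')  # hypothetical output file name
```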

Here's the code:

import random
from math import sqrt

import matplotlib.pyplot as plt
import numpy as np

random.seed(1357)

# x values are spread widely and y values narrowly, giving an elongated cloud
R = range(-100, 100)
def fx(): return random.choice(R)
r = range(-10, 10)
def fy(): return random.choice(r)

xL = [fx() for i in range(30)]
yL = [fy() for i in range(30)]

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(xL, yL, s=250, color='r', marker='o')

#----------------------------------------------

# a 2 x N array: row 0 holds the x values, row 1 the y values
a = np.array([xL, yL])

# 45-degree counterclockwise rotation, since cos(45) = sin(45) = 1/sqrt(2)
z = 1.0 / sqrt(2)
t = np.array([[z, -z], [z, z]])
xL, yL = np.dot(t, a)

# draw the rotated points on the same axes as the originals
ax.scatter(xL, yL, s=250, color='b', marker='o')
ax.grid(True)
plt.savefig('example.pdf')
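To tie the rotation back to the PCA/PCoA goal, here is a minimal NumPy sketch (not PyCogent) that regenerates comparable points, applies the same t, and finds the first principal axis before and after rotating. The helper pc1_angle is hypothetical. The original cloud's main axis is close to, but not exactly on, the x axis, so the sketch compares the two angles: PCA follows the data, so the axis should turn by exactly the 45 degrees that t applies.

```python
import random
import numpy as np

random.seed(1357)
xL = [random.choice(range(-100, 100)) for i in range(30)]
yL = [random.choice(range(-10, 10)) for i in range(30)]
a = np.array([xL, yL], dtype=float)

z = 1.0 / np.sqrt(2)
t = np.array([[z, -z], [z, z]])  # same 45-degree rotation as in the script
rotated = np.dot(t, a)

def pc1_angle(points):
    """Angle in degrees, in [0, 180), of the first principal axis of a 2 x N array (hypothetical helper)."""
    cov = np.cov(points)              # rows are treated as variables
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    v = vecs[:, np.argmax(vals)]      # direction of greatest variance
    # an eigenvector's sign is arbitrary, so fold the angle into [0, 180)
    return np.degrees(np.arctan2(v[1], v[0])) % 180

before = pc1_angle(a)
after = pc1_angle(rotated)
# the last value should come out at 45, the rotation applied above
print(before, after, (after - before) % 180)
```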

3 comments:

aonlazio said...

Superb

aonlazio said...

Do you know how to plot PCoA using a distance matrix like there are 10 samples and then plot those samples in 2D?

telliott99 said...

I don't have step-by-step instructions, but you could start with this post and the next one: http://telliott99.blogspot.com/2010/02/unifrac-6-understanding-pcoa-setup.html