Tuesday, November 10, 2009

Xgrid: conclusions

I've been posting about my experiments with Xgrid on a plain Mac OS X 10.6 installation on my one-year-old MacBook. I was getting to the point of closing out these experiments because, although I did have BLAST working on Sunday, that was only accomplished by using a security-compromising hack. I've concluded there is no way to use Xgrid for anything serious without Server.

The bottom line seems to be that you cannot believe the hype: see this shiny web page, entitled Xgrid: High Performance Computing for the Rest of Us. It contains 68 matches for xgrid and 0 matches for Server. Apple certainly does not advertise the Server requirement.

To summarize, what I wanted to do is:

• activate Xgrid on my machine (no Server)
• run BLAST via Xgrid on my machine plus other lab machines

There are still unresolved issues with starting up xgridcontrollerd from the command line, although the Snow Leopard Server Tools utility Xgrid Admin helps with that, and these are not deal-killers. However, there are very serious issues with authentication and permissions, which mean that you cannot use Xgrid as anything but a toy without OS X Server. There doesn't seem to be any technical reason that this needs to be so; it is just Apple's choice. The restriction is enforced through the sandbox (e.g. see here).

Adding insult to injury, today I started working to re-run the tests for the final posts (about BLAST), but none of the things that I did previously is working. I'm fully into Cargo Cult mode, messing with XgridLite and passwords, or trying to set them with the copy hack:


sudo cp /etc/xgrid/agent/controller-password \
/etc/xgrid/controller/client-password

sudo cp /etc/xgrid/agent/controller-password \
/etc/xgrid/controller/agent-password
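One cheap sanity check after the copy hack is to confirm that the password files really are byte-identical. A minimal sketch, rehearsed on stand-in files in a scratch directory so nothing under /etc/xgrid is touched. The `same_password` helper, the scratch layout and the stand-in secret are all hypothetical; on a real machine the paths would be the ones above, and reading them would presumably need sudo.

```shell
# Hypothetical helper: succeeds only if the two files have identical contents.
same_password() {
    cmp -s "$1" "$2"
}

# Stand-in files in a scratch directory (not the real /etc/xgrid tree).
scratch=$(mktemp -d)
mkdir -p "$scratch/agent" "$scratch/controller"
printf 'stand-in-secret' > "$scratch/agent/controller-password"
cp "$scratch/agent/controller-password" "$scratch/controller/client-password"
cp "$scratch/agent/controller-password" "$scratch/controller/agent-password"

# Report whether each copied file matches the agent's controller-password.
for f in client-password agent-password; do
    if same_password "$scratch/agent/controller-password" "$scratch/controller/$f"; then
        echo "$f: matches"
    else
        echo "$f: differs"
    fi
done
```

If a file shows up as "differs", the copy did not take, which at least rules one thing in or out before restarting the daemons.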


Jobs submitted using xgrid make it to the controller; they just don't run.

It's like Feynman describes in the last chapter of "Surely You're Joking, Mr. Feynman!":

During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So, they've arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for the man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas---he's the controller---and they wait for the airplanes to land. They're doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn't work. No airplanes land. So I call these things cargo cult science ...


The xgrid jobs just hang. It is not an authentication problem, at least not with the controller, because without the correct password, they are rejected with:

BEEPError 535 (Authentication failure)

The jobs are visible in Xgrid Admin, where they're listed as pending. They remain pending even after a stop and restart.
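The same pending state can also be checked from the command line with the xgrid client. This is only a sketch: the `xg` wrapper, the `XGRID_HOST`, `XGRID_PASS` and `DRY_RUN` variables, the placeholder password and the job id 0 are all assumptions for illustration. By default it only prints the commands it would run.

```shell
# Assumptions: controller on localhost; password and job id are placeholders.
XGRID_HOST=${XGRID_HOST:-localhost}
XGRID_PASS=${XGRID_PASS:-changeme}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = actually run xgrid

# Hypothetical wrapper around the xgrid command-line client.
xg() {
    if [ "$DRY_RUN" = 1 ]; then
        # Print the command with the password masked.
        echo "xgrid -h $XGRID_HOST -p *** $*"
    else
        xgrid -h "$XGRID_HOST" -p "$XGRID_PASS" "$@"
    fi
}

xg -job list                 # identifiers of the jobs the controller knows about
xg -job attributes -id 0     # attributes of one job (0 is a placeholder id)
```

Setting `DRY_RUN=0` on a machine with a running controller would run the real commands; the attributes of a stuck job should show the same pending status that Xgrid Admin reports.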

Perhaps it has to do with updating to 10.6.2 an hour ago...

Moving on, you may still be interested in playing with xgrid. You can run code installed in various places (even /Users/Shared). But to have that code open another file, there are three options on paper, and only one of them is actually available:

• give user "nobody" the same access as "somebody"

These aren't options:

• run in what's called the "sandbox" as user "nobody"

• use Kerberos and run as "somebody"

Particularly since agent-to-controller authentication is not working, the first option (the only solution), while it allows testing, is not recommended. Option 2 doesn't work if you need a database, and option 3 requires Server.

What I did, based on the link:


sudo mv /usr/share/sandbox/xgridagentd_task_nobody{,.orig}.sb
sudo cp /usr/share/sandbox/xgridagentd_task_{some,no}body.sb


I was able to run BLAST using a database in /Users/Shared after doing this.
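For readers unfamiliar with brace expansion, the two commands above can be spelled out explicitly. The sketch below rehearses them in a scratch directory with stand-in profile files (the file contents and the `restore_sandbox` undo helper are hypothetical); the real files live in /usr/share/sandbox and need sudo.

```shell
# Rehearsal in a scratch directory; the real files live in /usr/share/sandbox.
sb=$(mktemp -d)
echo 'nobody profile'   > "$sb/xgridagentd_task_nobody.sb"
echo 'somebody profile' > "$sb/xgridagentd_task_somebody.sb"

# nobody{,.orig}.sb expands to: nobody.sb nobody.orig.sb (keep a backup)
mv "$sb/xgridagentd_task_nobody.sb" "$sb/xgridagentd_task_nobody.orig.sb"

# {some,no}body.sb expands to: somebody.sb nobody.sb (overwrite nobody's profile)
cp "$sb/xgridagentd_task_somebody.sb" "$sb/xgridagentd_task_nobody.sb"

# Hypothetical undo step: put the original "nobody" profile back.
restore_sandbox() {
    mv "$1/xgridagentd_task_nobody.orig.sb" "$1/xgridagentd_task_nobody.sb"
}
```

Calling `restore_sandbox "$sb"` moves the backup back into place; pointing it at /usr/share/sandbox under sudo would be the way to return to the stock, more restrictive profile after testing.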

You win, Apple. How about sending me one of these?



Is this also just a toy?