Thursday, November 5, 2009

Xgrid baby steps

I got interested in using Xgrid for "distributed multiprocessing." Although there is quite a bit of information out there, much of it was written for OS X 10.4 Tiger, and there have clearly been changes (often steps backward in ease of use) with Leopard and now Snow Leopard. In particular, Apple would like me to buy OS X Server, but I don't have $500 to spend when I'm just exploring a solution to a potential problem I don't have yet.

Sources I've found include:

• an extensive set of tutorials by Charles Parnot at MacResearch
• an earlier tutorial by Drew McCormack
• Apple docs (guide and update)
• an article at Mac OS X Hints

From what I can tell, all the technology to do this comes with a standard OS X installation; it just doesn't have a pretty, time-saving GUI. First, some terminology. There are three roles:

• Client: originates the request for a job
• Controller: maintains a queue of jobs and distributes them to the agents
• Agents: receive individual jobs and return the results
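To make the three roles concrete, here's a toy analogy in plain bash. It is not Xgrid and uses none of its tools. A "controller" function hands each queued job to one of N "agents". The agents run as background processes and report back. The "client" is simply whoever calls the controller. The round-robin assignment is my own simplification, not how Xgrid actually schedules work.

```shell
#!/usr/bin/env bash
# Toy analogy of the Xgrid roles using only bash; this is NOT Xgrid.

# Agent: receives one job, runs it, and returns the result tagged with its name.
agent() {
  local name=$1
  shift
  printf '%s: %s\n' "$name" "$("$@")"
}

# Controller: takes a queue of jobs and hands them round-robin to n_agents agents.
# Each dispatched job runs in its own background process, so jobs execute in parallel.
controller() {
  local n_agents=$1
  shift
  local i=0 job
  for job in "$@"; do
    agent "agent$(( i % n_agents ))" bash -c "$job" &
    i=$(( i + 1 ))
  done
  wait  # block until every agent has reported back
}

# Client: originates the requests.
controller 2 'echo $((2 + 2))' 'echo hello' 'date +%Y'
```

Because the agents finish in no particular order, the result lines can come back in any order. This is one reason a real controller tracks jobs by identifier.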

In a high-end setup the Controller would run OS X Server; in this exploration, my MacBook plays all three roles.

I would have liked to follow the Parnot tutorial exactly, but he provides some tools for setting up the controller that aren't explained. Since they are not scripts, I had no way to take them apart and see what they do. (And since they were written for Tiger, who knows whether they still work?)

Let's try to do it ourselves. The first issue to deal with is the firewall. Xgrid uses certain ports for communication: 4111-4120 plus a few others according to some sources, but only 4111 according to Apple. There are no guarantees, and in any event Snow Leopard's System Prefs don't allow fine-grained control over ports. No doubt there is a command-line tool or some other way to control this, but I don't know about it. So, for now, I just disconnected from the internet and turned off the firewall.
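Whatever you do with the firewall, it helps to know whether anything is actually listening on Xgrid's port. Here's a small check sketched in bash. It relies on bash's built-in /dev/tcp redirection, and I'm assuming Apple's /bin/bash includes it, so verify that first. Port 4111 is simply the one Apple names. Once the controller is running, the same check should report a listener.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: exit status 0 if a TCP connection to HOST:PORT succeeds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
port_open() {
  local host=$1 port=$2
  # Open (and immediately drop) a connection in a subshell; errors are hidden.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# 4111 is the Xgrid port according to Apple.
if port_open localhost 4111; then
  echo "something is listening on port 4111"
else
  echo "nothing is listening on port 4111"
fi
```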

Next, set up Xgrid Sharing in System Prefs. Under Configure..., use localhost as the controller and set it to always accept tasks. In the main window, I set the Computer Name to localhost and set a password.

It's not really clear to me what we are setting here. In the small, inner window, it feels like we're doing Client settings, while in the big window, we're doing Agent settings. The password is what the Controller will use to authenticate with us when it sends us a job.

In the second comment to this installment of the tutorial, there is a suggestion to copy the stored hashes of passwords around. I did this on my first attempts, and have not yet tested whether it's necessary.

As you can see from the filenames below, the agent directory has a file named controller-password, which holds the password we set in System Prefs. We're copying that file (even though it is a hash, not plain text) to the controller, for use as both the client-password and the agent-password.

The commenter recommends that we do this in Terminal:

sudo cp /etc/xgrid/agent/controller-password \

sudo cp /etc/xgrid/agent/controller-password \

We'll use the same password to authenticate (as Client) to the Controller.

Now, it's just a few more commands in Terminal. Start up the daemon:

sudo /usr/libexec/xgrid/xgridcontrollerd

This prints six warnings and failures, including

BEEPError 600 (could not open local port)

Hmm... Well, let's try anyway. Start the controller:

sudo xgridctl controller start

You can check its status:

sudo xgridctl controller status

daemon              state               pid
======              =====               ===
xgridcontrollerd    running | enabled   22
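If you want a script to confirm the controller is up before submitting anything, the status table can be checked with grep. This sketch assumes the output keeps the layout shown above, with the state word right after the daemon name.

```shell
#!/usr/bin/env bash
# controller_running: reads `xgridctl controller status` output on stdin and
# succeeds only if the xgridcontrollerd row reports the "running" state.
controller_running() {
  grep -qE '^xgridcontrollerd[[:space:]]+running([[:space:]]|$)'
}

# Typical use (requires the real xgridctl on OS X):
#   sudo xgridctl controller status | controller_running && echo "controller is up"
```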

Now we can either "submit" a job (asynchronous) or "run" it (synchronous). Substitute the password you set in System Prefs:

sudo xgrid -h localhost -p <password> -job submit /usr/bin/cal

With "run" you get back results (almost) immediately, and with "submit" you should get back something like this:

jobIdentifier = 1;

Then ask for the results using that identifier:

sudo xgrid -h localhost -p <password> -job results -id 1

   November 2009
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
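Since "submit" hands back only an identifier, the natural next step is a script that submits a job, waits for it, and then fetches the results. This is a sketch under a few assumptions:

• the controller is reachable as localhost
• XGRID_PASSWORD is a variable name I made up to hold the System Prefs password
• `xgrid -job attributes -id N` reports a line like `jobStatus = Finished;` when the job is done (check this against your own output)

```shell
#!/usr/bin/env bash
# Sketch: submit an Xgrid job, poll until it finishes, then print its results.
# Assumptions: controller on localhost; XGRID_PASSWORD (a hypothetical variable)
# holds the password from System Prefs; status lines look like "jobStatus = Finished;".

# Extract the number from submit output such as "{ jobIdentifier = 1; }".
job_id_from_submit() {
  sed -n 's/.*jobIdentifier = \([0-9][0-9]*\);.*/\1/p'
}

# Extract the status word from "-job attributes" output.
job_status() {
  sed -n 's/.*jobStatus = \([A-Za-z]*\);.*/\1/p'
}

submit_and_wait() {
  local id status tries=0
  id=$(xgrid -h localhost -p "$XGRID_PASSWORD" -job submit "$@" | job_id_from_submit)
  if [ -z "$id" ]; then
    echo "submit failed" >&2
    return 1
  fi
  # Poll once a second, giving up after about 5 minutes.
  while [ "$tries" -lt 300 ]; do
    status=$(xgrid -h localhost -p "$XGRID_PASSWORD" -job attributes -id "$id" | job_status)
    case $status in
      Finished) break ;;
      Failed|Canceled) echo "job $id ended as $status" >&2; return 1 ;;
    esac
    sleep 1
    tries=$(( tries + 1 ))
  done
  if [ "$status" != Finished ]; then
    echo "gave up waiting for job $id" >&2
    return 1
  fi
  xgrid -h localhost -p "$XGRID_PASSWORD" -job results -id "$id"
}

# Usage: XGRID_PASSWORD='your password' submit_and_wait /usr/bin/cal
```

Polling with a cap keeps the script from hanging forever if the controller never reports a final state.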

And finally, stop the controller:

sudo xgridctl controller stop

My most pressing need is the firewall: I want to open only the ports that are necessary, and preferably only to Xgrid.

According to the macosxhints article, you can turn off authentication by changing these two settings:

AgentAuthentication = None
ClientAuthentication = None

Now it's time to work through the tutorials step by step.

I also downloaded the Server Admin Tools as described, but I haven't used them yet. There is also a "goodies" page on Parnot's site at Stanford, which looks promising. And finally, there is Apple's Xgrid-users mailing list; for example, this advice might fix some of the warnings I got when starting up the daemon.

Baby steps.


charles said...

Regarding the scripts used for the setup in the tutorial: these are AppleScripts, which should open in AppleScript Editor if you are curious.

ives said...

Nice and simple tutorial to get people up and running.

However, I think you meant to say "submit" instead of "run" in the following line:

sudo xgrid -h -p password -job run /usr/bin/cal

"run" will just dump the results immediately.

In order to make use of "-job results -id #", you would have had to specify the "submit" action instead of outright running it.

telliott99 said...

You are right! It'll be fixed in a minute. Thanks.

ives said...

Another bit to include in your post is a note about running Xgrid on OS X 10.6.x.

Up to at least 10.6.2 the Xgrid agent doesn't correctly detect the number of cores (whereas 10.5.x does). Instead it goes by the number of processors. My guess is that hyper-threading isn't taken into account either.

Since the Xgrid controller polls that information from the agents, the controller won't issue one task per core: a dual-processor, dual-core Mac Pro will get 2 tasks instead of 4.

The fix is illustrated here:


This allows manually overriding the number of "processors" in the xgrid agent property file. Set this to the total number of cores in your system, or double if you have hyper-threading capable processors.

Assuming the above is eventually fixed, the following link shows how to limit the number of concurrent tasks, should this be necessary (for Mac Pro systems for example, where at least one core would be needed for user tasks).