Python for Bioinformatics: March 2012

Saturday, March 31, 2012

launch control to monitor my server

This post integrates the previous post with the series that came just before it on an Ubuntu server running on OS X Lion under VirtualBox. I'm a complete amateur at this (really just using Blogger as a scientific notebook), so if you have suggestions I'd be very happy to hear from you.

The idea is that the first thing a black hat will likely do is wipe (or sanitize) the logs, so we should move the data off the server periodically. To do this I take advantage of the ability to ssh into the server from OS X, and do a scp to Lion. After doing this, the script compares the current logfile with previous versions, and only logs the changes. The listing for the script is at the end of the post.

I changed the plist as shown by the diff. The change runs the script every 60 seconds rather than watching a path on OS X.

> diff com.TE.script.plist com.TE.script.old.plist 
14,15d13
<         <key>StartInterval</key>
<         <integer>60</integer>
19a18,21
>         <key>WatchPaths</key>
>         <array>
>         <string>/Volumes/HD/Users/telliott_admin/Desktop/y.txt</string>
>         </array>

To make things easier, I worked on the script from the Desktop until I thought things were OK, then ran this shell script retry.sh (using ./retry.sh):

sudo cp ~/Desktop/watch.py ~/Library/Scripts/watch.py
launchctl unload com.TE.script.plist
launchctl load com.TE.script.plist

The plist contains the key RunAtLoad with a value of true, so reloading the job runs the script.
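For anyone who would rather generate the plist than edit the XML by hand, here is a minimal sketch using the standard plistlib module. It is Python 3, unlike the listings in these posts. The keys and values mirror the plist from the launchctl post, with StartInterval in place of WatchPaths. The helper name build_plist is made up for illustration.

```python
import plistlib

def build_plist(interval_seconds=60):
    """Return launchd job XML (as bytes) that runs the script on a timer."""
    job = {
        "Label": "com.TE.script",
        "LowPriorityIO": True,
        "Program": "/Users/telliott_admin/Library/Scripts/script.py",
        "RunAtLoad": True,
        "ProgramArguments": ["script.py"],
        # StartInterval takes the place of the WatchPaths key in the old plist
        "StartInterval": interval_seconds,
    }
    return plistlib.dumps(job)

print(build_plist(60).decode("utf-8"))
```

The output would still need to be saved as com.TE.script.plist in ~/Library/LaunchAgents and reloaded with launchctl.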

To test, I just do:

> date
Sat Mar 31 16:10:25 EDT 2012
> curl http://10.0.1.2:8082
^Z
[21]+  Stopped                 curl http://10.0.1.2:8082

Unfortunately, marking the time doesn't help because the server and OS X are out of synch. :)

And then I watch the tmp directory for the appearance of 201126.31.log (that's hour minute second.day), which contains the forensic evidence, 7 repeats of the following entry with intervals of a second or two (reformatted):

Mar 31 16:18:06 U32 kernel: [12503.991135] [UFW BLOCK] IN=eth1 
OUT= MAC=08:00:27:d7:ba:0e:00:26:b0:fa:75:7f:08:00 
SRC=10.0.1.3 DST=10.0.1.2 LEN=48 TOS=0x00 PREC=0x00 TTL=64 ID=50692 DF 
PROTO=TCP SPT=49982 DPT=8082 WINDOW=65535 RES=0x00 SYN URGP=0
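Entries like this are mostly KEY=VALUE pairs, so they are easy to pick apart. A minimal Python 3 sketch follows, assuming every ufw entry has the shape shown above. The function name parse_ufw_line is hypothetical.

```python
import re

def parse_ufw_line(line):
    """Split one ufw kernel log line into (action, fields, flags)."""
    match = re.search(r"\[UFW ([A-Z ]+)\]\s*(.*)", line)
    if match is None:
        return None                  # not a ufw entry
    action, rest = match.group(1), match.group(2)
    fields, flags = {}, []
    for token in rest.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value      # OUT= gives an empty string
        else:
            flags.append(token)      # bare words such as DF or SYN
    return action, fields, flags

sample = ("Mar 31 16:18:06 U32 kernel: [12503.991135] [UFW BLOCK] IN=eth1 "
          "OUT= MAC=08:00:27:d7:ba:0e:00:26:b0:fa:75:7f:08:00 "
          "SRC=10.0.1.3 DST=10.0.1.2 LEN=48 TOS=0x00 PREC=0x00 TTL=64 ID=50692 DF "
          "PROTO=TCP SPT=49982 DPT=8082 WINDOW=65535 RES=0x00 SYN URGP=0")
action, fields, flags = parse_ufw_line(sample)
print(action, fields["SRC"], "->", fields["DST"], "port", fields["DPT"], flags)
```

With the fields in a dictionary it becomes simple to count blocked attempts per source address or per destination port.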

I would be very interested to know of other solutions. For example, it might be cleaner to watch the log files on Ubuntu, then email them or something when things change. But I would not want to have Ubuntu able to ssh to OS X. That wouldn't be a good design for a "honeypot."

watch.py

#! /usr/bin/python
import sys, os, subprocess
from time import gmtime, strftime

# you'd want year first to sort properly for real
tm = strftime("%H%M%S.%d", gmtime())
tmpdir = '/Users/telliott_admin/Desktop/tmp/'

# scp the log over to OS X in tmp/uwf.log
scp = 'scp'
src = 'telliott@10.0.1.2:/var/log/ufw.log'
dst = tmpdir + 'uwf.log'
cmd = ' '.join((scp, src, dst))
obj = subprocess.call(cmd,shell=True)
# perhaps this fails (e.g. server is down)
if not obj == 0:
    sys.exit()

# check for no previous entry
src = dst
prev = tmpdir + 'prev.log'
try:
    os.stat(prev) 
except OSError:    # prev.log does not exist yet
    dst = prev
    cmd = ' '.join(('cp', src, dst))
    obj = subprocess.call(cmd,shell=True)
    
    # write to a time-stamped file
    dst = tmpdir + tm + '.log'
    cmd = ' '.join(('cp', src, dst))
    obj = subprocess.call(cmd,shell=True)
    sys.exit()

# load data:  current logfile
FH = open(tmpdir + 'uwf.log','r')
current = FH.read().strip().split('\n')
FH.close()

# load data:  previous logfile
FH = open(tmpdir + 'prev.log','r')
previous = FH.read().strip().split('\n')
FH.close()

# only keep data that is new
t = previous[-1].split()[2]

tL = [e.split()[2] for e in current]
if t in tL:
    i = tL.index(t)
    current = current[i+1:]
    
if not current:
    sys.exit()

print 'logging', len(current), 'items'

# write to a time-stamped file
s = '\n'.join(current)
dst = tmpdir + tm + '.log'
FH = open(dst,'w')
FH.write(s)
FH.close()

cmd = ' '.join(('cp', dst, prev))
obj = subprocess.call(cmd,shell=True)
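The "only keep data that is new" step in the listing matches on the time-of-day field alone, so two entries with the same clock time on different days could confuse it. One alternative is to match on the whole last line of the previous log, which includes the kernel's own timestamp. This is a Python 3 sketch rather than a drop-in change to the Python 2 listing, and the name new_entries is made up.

```python
def new_entries(previous, current):
    """Return the lines of `current` that follow the last line of `previous`."""
    if not previous:
        return list(current)
    last = previous[-1]
    # search backwards so that the most recent match wins
    for i in range(len(current) - 1, -1, -1):
        if current[i] == last:
            return current[i + 1:]
    # last line not found (e.g. the log was rotated): treat everything as new
    return list(current)

previous = ["entry 1", "entry 2"]
current = ["entry 1", "entry 2", "entry 3", "entry 4"]
print(new_entries(previous, current))
```

If the last line of the previous log is missing from the current one, the sketch logs everything again. That seems a safer default than logging nothing.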

Basics of launchctl on Lion

I made a start on learning how to use launchd to run scripts. (An old) MacResearch tutorial is here. There is also a Wikipedia page.

The basic usage is quite simple. We just need two files.

watch.py

#! /usr/bin/python
from time import gmtime, strftime

fn = '/Users/telliott_admin/Desktop/time.txt'
FH = open(fn,'w')
FH.write(strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime()))
FH.close()

Make sure the script is executable (I did chmod 755), and copy it to ~/Library/Scripts.

The other file is a plist that shows the details for what we want launchd to do:

com.TE.script.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>Label</key>
        <string>com.TE.script</string>
        <key>LowPriorityIO</key>
        <true/>
        <key>Program</key>
        <string>/Users/telliott_admin/Library/Scripts/script.py</string>
        <key>RunAtLoad</key>
        <true/>
        <key>ProgramArguments</key>
        <array>
                <string>script.py</string>
        </array>
        <key>WatchPaths</key>
        <array>
        <string>/Volumes/HD/Users/telliott_admin/Desktop/y.txt</string>
        </array>
</dict>
</plist>

The plist goes in ~/Library/LaunchAgents.

There are many additional things you could do---the man page is here.

The script just writes the current time to a file on my Desktop. It is set to RunAtLoad (normally that would be login, but see below), and whenever the WatchPath changes. Here that is just a file which is also on my Desktop, but it could be a directory.

We can see if it's been loaded by doing:

> launchctl list | grep "com.TE"
>

Not yet.

We can manually (un)load it by doing:

> launchctl load ~/Library/LaunchAgents/com.TE.script.plist
> launchctl list | grep "com.TE"
-       0       com.TE.script

When we do that we should see the file time.txt appear on the Desktop because we set RunAtLoad to true. Also, if the file y.txt is altered, say by:

> touch ~/Desktop/y.txt

The script should execute again.

Ubuntu on Lion under VirtualBox (7)

This post is mostly about logfiles for Apache and the ufw firewall.

Security is such a big topic that it's hard to know where to start. To begin with, we should probably have the document root and the script directory moved to less predictable places. But how to redirect those client requests is a goal for the future. We should also review carefully the configuration details for Apache, including which Apache modules are enabled.

There are a number of modules that are not recommended for one reason or another, as described here:

userdir – Mapping of requests to user-specific directories. i.e ~username in URL will get translated to a directory in the server
autoindex – Displays directory listing when no index.html file is present
status – Displays server stats
env – Clearing/setting of ENV vars
setenvif – Placing ENV vars on headers
cgi – CGI scripts
actions – Action triggering on requests
negotiation – Content negotiation
alias – Mapping of requests to different filesystem parts
include – Server Side Includes
filter – Smart filtering of request
version – Handling version information in config files using IfVersion
as-is – as-is filetypes

I can check whether any are among the modules compiled into the Apache I'm running by doing apache2 -l. Investigating its use with man I see:

In general, apache2 should not be invoked directly, but rather should be invoked via /etc/init.d/apache2 or apache2ctl.

Nevertheless:

telliott@U32:/var/log$ /usr/sbin/apache2 -l
Compiled in modules:
  core.c
  mod_log_config.c
  mod_logio.c
  worker.c
  http_core.c
  mod_so.c

It seems OK.

telliott@U32:~$ sudo find / -name "apachectl"
[sudo] password for telliott: 
/usr/sbin/apachectl


telliott@U32:~$ /usr/sbin/apachectl
ulimit: 88: error setting limit (Operation not permitted)
Usage: /usr/sbin/apachectl start|stop|restart|graceful|graceful-stop|configtest|status|fullstatus|help
       /usr/sbin/apachectl 
       /usr/sbin/apachectl -h            (for help on )

Never mind

The log files for Apache are in /var/log/apache2. There are three:

telliott@U32:~$ ls /var/log/apache2/
access.log  error.log  other_vhosts_access.log

For example, we can see who has made requests to the server recently:

telliott@U32:/var/log/apache2$ cd
telliott@U32:~$ cd /var/log/apache2
telliott@U32:/var/log/apache2$ tail -1 access.log
10.0.1.3 - - [30/Mar/2012:11:36:10 -0400] "GET /cgi-bin/test.py HTTP/1.1" 200 801 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.54.16 (KHTML, like Gecko) Version/5.1.4 Safari/534.54.16"

The first field is the IP address of the client making the request. We can use grep with the -v flag to exclude patterns (e.g. the known hosts 10.0.1.* and 127.0.0.1):

telliott@U32:/var/log/apache2$ cat access.log | grep -v "10.0.1" | grep -v "127.0.0"
telliott@U32:/var/log/apache2$

There is undoubtedly a more elegant way to do what's above, but that works. No one but 10.0.1.* and 127.0.0.1 has asked for anything (yet).
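One possibly more elegant way is to compare the first field against the known networks as real addresses. The grep pattern "10.0.1" would also hide a client such as 210.0.1.5, or any line that has that text somewhere else in it. Here is a Python 3 sketch using the standard ipaddress module. The function name unknown_clients and the choice of 10.0.1.0/24 for "my network" are assumptions.

```python
import ipaddress

KNOWN_NETWORKS = [ipaddress.ip_network("10.0.1.0/24"),
                  ipaddress.ip_network("127.0.0.0/8")]

def unknown_clients(lines, known=KNOWN_NETWORKS):
    """Yield access-log lines whose client address is outside the known networks."""
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue                  # skip blank lines
        try:
            addr = ipaddress.ip_address(parts[0])
        except ValueError:
            yield line                # first field is not an IP: worth a look
            continue
        if not any(addr in net for net in known):
            yield line

# The second line is made up for illustration (192.0.2.1 is a documentation address).
sample = [
    '10.0.1.3 - - [30/Mar/2012:11:36:10 -0400] "GET /cgi-bin/test.py HTTP/1.1" 200 801',
    '192.0.2.1 - - [30/Mar/2012:12:00:00 -0400] "GET / HTTP/1.1" 200 177',
]
for line in unknown_clients(sample):
    print(line)
```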

I'd like to write these files to the OS X host. Since I set up ssh, I can use that to do it from a Terminal in OS X:

> scp telliott@10.0.1.2:/var/log/apache2/error.log error.log
> scp telliott@10.0.1.2:/var/log/apache2/access.log access.log

The error log shows a number of entries like this:

[Fri Mar 30 07:14:37 2012] [error] [client 127.0.0.1] File does not exist: /var/www/favicon.ico

Apparently we're supposed to have a favicon. This is annoying, but I can fix it as described here.

However, it's not just Apache we need to worry about. There are system logs like /var/log/syslog.

For the future, we should set up a way to systematically monitor the logfiles for unusual activity. I'd rather not write such tools from scratch, though it's a possibility. I'll have to look into it.

There is also the firewall (ufw), which I haven't actually set up yet. (The default state is off). It really is time to think about that. The docs for ufw are here.

telliott@U32:~$ sudo ufw default deny
[sudo] password for telliott: 
Default incoming policy changed to 'deny'
(be sure to update your rules accordingly)
telliott@U32:~$ sudo ufw allow from 10.0.1.3 to any port 8080
Rule added
telliott@U32:~$ sudo ufw allow from 10.0.1.3 to any port 22
Rule added
telliott@U32:~$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       10.0.1.3
8080                       ALLOW       10.0.1.3

telliott@U32:~$ sudo ufw enable
Firewall is active and enabled on system startup
telliott@U32:~$ sudo ufw logging on 
Logging enabled

Restart Ubuntu. Check the server still works. From OS X Terminal:

> curl http://10.0.1.2:8080
<html><body><h1>It works!</h1>
..
> curl http://10.0.1.2:8080/cgi-bin/test.py
<head>
<p> HTTP_ACCEPT */* </p>
..

Now, if an attempt is made on a different port, ufw should log it. For example, OS X has a utility called stroke, which can carry out a port scan:

> cd "/Applications/Utilities/Network Utility.app/Contents/Resources"
> ./stroke 10.0.1.2 22 22
Port Scanning host: 10.0.1.2

  Open TCP Port:  22       ssh
> ./stroke 10.0.1.2 8080 8080
Port Scanning host: 10.0.1.2

  Open TCP Port:  8080     http-alt
> ./stroke 10.0.1.2 8081 8081

The scan command hangs, so do CTL-Z to suspend it (CTL-C would kill it outright). That last scan should trigger a logfile entry on Ubuntu.
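The hang is probably because a deny policy drops the packets silently, so the client keeps waiting for a reply that never comes. That is an assumption about how ufw behaves here. The Python 3 sketch below does a single-port probe with a timeout, which tells the three cases apart. The function name probe and the two-second timeout are arbitrary choices.

```python
import socket

def probe(host, port, timeout=2.0):
    """Try one TCP connection; return 'open', 'closed' or 'no answer'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"        # the host answered, but nothing is listening
    except OSError:
        return "no answer"     # timed out or unreachable, e.g. packets dropped
```

Calling probe("10.0.1.2", 8081) from the OS X side should then come back after the timeout instead of hanging. As with stroke, only point this at your own machines.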

telliott@U32:~$ tail -1 /var/log/ufw.log 
Mar 30 14:53:20 U32 kernel: [  528.390507] [UFW BLOCK] IN=eth1 OUT= MAC=08:00:27:d7:ba:0e:XX:XX:XX:XX:XX:XX:08:00 SRC=10.0.1.3 DST=10.0.1.2 LEN=48 TOS=0x00 PREC=0x00 TTL=64 ID=40207 DF PROTO=TCP SPT=54520 DPT=8081 WINDOW=65535 RES=0x00 SYN URGP=0

The first part of that MAC address is for Ubuntu on the VM, and the second part is for my OS X airport card. I don't know about the last part (08:00). But the SRC and DST IP addresses are clear, as is the DPT=8081. Just to check, try it again with 8082.

Yes:

telliott@U32:~$ tail -1 /var/log/ufw.log
Mar 30 15:05:49 U32 kernel: [ 1277.422229] [UFW BLOCK] IN=eth1 OUT= MAC=08:00:27:d7:ba:0e:XX:XX:XX:XX:XX:XX:08:00 SRC=10.0.1.3 DST=10.0.1.2 LEN=48 TOS=0x00 PREC=0x00 TTL=64 ID=21050 DF PROTO=TCP SPT=54579 DPT=8082 WINDOW=65535 RES=0x00 SYN URGP=0
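Looking back at the MAC= field: one plausible reading is that it is the Ethernet header, i.e. the destination MAC (6 bytes), the source MAC (6 bytes) and the two-byte EtherType. 0x0800 is the EtherType for IPv4, which would account for the trailing 08:00. Treat that reading as an assumption, not something checked against the kernel source. A small Python 3 sketch that splits the field accordingly (split_mac_field is a made-up name):

```python
def split_mac_field(mac_field):
    """Split the 14-byte MAC= value into (destination, source, ethertype)."""
    octets = mac_field.split(":")
    if len(octets) != 14:
        raise ValueError("expected 14 colon-separated bytes, got %d" % len(octets))
    dst = ":".join(octets[0:6])          # the receiving interface (the VM here)
    src = ":".join(octets[6:12])         # the sending interface
    ethertype = "".join(octets[12:14])   # "0800" would mean IPv4
    return dst, src, ethertype

print(split_mac_field("08:00:27:d7:ba:0e:XX:XX:XX:XX:XX:XX:08:00"))
```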

I have misplaced the logfile, but earlier I had an access attempt from someone unknown: 199.47.218.151. Just ask google: "who is" 199.47.218.151. It's Dropbox. Huh. Didn't know they were in Wichita.

http://www.ip-adress.com/ip_tracer/199.47.218.151

I had installed Dropbox on Ubuntu for file transfer, then deleted it. It seems that they were still trying to contact me for some time after that.

Finally, there are other tools out there. For example: nmap. This is another sophisticated tool that will require a more serious investigation. I can run nmap from Ubuntu on itself:

telliott@U32:~$ nmap -sT 10.0.1.2

Starting Nmap 5.21 ( http://nmap.org ) at 2012-03-30 15:11 EDT
Nmap scan report for 10.0.1.2
Host is up (0.00076s latency).
Not shown: 997 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
80/tcp   open  http
8080/tcp open  http-proxy

Nmap done: 1 IP address (1 host up) scanned in 0.21 seconds

This does not trigger a logfile entry by ufw. It is curious that although I only set up rules for ports 22 and 8080, port 80 still is open (as I'd like for Firefox to work properly). Note that these port scanning tools shouldn't be used against other people's servers. That will be interpreted as aggressive behavior (probing for weaknesses), and will surely make folks upset. It's the sort of thing we want to be on the lookout for on our server.

Immediate goals for the future: understand the detailed setup of ufw and get a good book about Apache.

Thursday, March 29, 2012

Ubuntu on Lion under VirtualBox (6)

Another thing you'd want for a real server is to allow remote logon using SSH.

I'm going to follow my old post, but try to be a little more organized about everything.

The first step is to generate an RSA key pair. We'll use a key length of 1024 bits, although for a real application you'd want something substantially longer.

On OS X:

> ssh-keygen -b 1024 -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/telliott_admin/.ssh/id_rsa): 
Created directory '/Users/telliott_admin/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /Users/telliott_admin/.ssh/id_rsa.
Your public key has been saved in /Users/telliott_admin/.ssh/id_rsa.pub.
The key fingerprint is:
c0:67:0f:f5:08:08:af:33:17:40:e2:c5:80:5a:0a:e5 telliott_admin@Toms-Mac-mini.local
The key's randomart image is:
+--[ RSA 1024]----+
| o++=. .. .      |
|oo.o.+.  o o     |
|ooE   = + . .    |
|o    . = o       |
|    + . S .      |
|     +           |
|                 |
|                 |
|                 |
+-----------------+
>

passphrase: xxxxxxx

The purpose of the passphrase is to protect the private key on my machine (I think). The key files are in:

.ssh/id_rsa
.ssh/id_rsa.pub

At a later point there will be other files here like:

.ssh/known_hosts

It's convenient to refer to a key by its digest:

> ssh-keygen -l -f ~/.ssh/id_rsa.pub
1024 c0:67:0f:f5:08:08:af:33:17:40:e2:c5:80:5a:0a:e5 
/Users/telliott_admin/.ssh/id_rsa.pub (RSA)

(I wrapped the output line).
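The digest that ssh-keygen -l prints here appears to be the MD5 hash of the base64-decoded key blob, written as colon-separated hex. A Python 3 sketch of that calculation is below. The function name md5_fingerprint is made up, and the sketch assumes a one-line public key in the usual "type blob comment" layout.

```python
import base64
import hashlib

def md5_fingerprint(public_key_line):
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line."""
    # a public key line looks like: "<type> <base64 blob> [comment]"
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Passing the single line from ~/.ssh/id_rsa.pub to md5_fingerprint should reproduce the digest above. Newer OpenSSH releases print a SHA256 fingerprint by default, so the two will not match there unless ssh-keygen is asked for MD5.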

On U32 (I already did this)

sudo apt-get install openssh-server

We need to edit /etc/ssh/sshd_config. Make sure Port 22 is uncommented and make the following changes:

PermitRootLogin no
ChallengeResponseAuthentication yes
PasswordAuthentication yes   # we'll set it to no eventually

telliott@U32:/etc/ssh$ diff sshd_config sshd_config.orig
27c27
< PermitRootLogin no
---
> PermitRootLogin yes
48c48
< ChallengeResponseAuthentication yes
---
> ChallengeResponseAuthentication no
51c51
< PasswordAuthentication yes
---
> #PasswordAuthentication yes

The ssh keys are also in the /etc/ssh directory:

/etc/ssh/ssh_host_rsa_key.pub

and so on. For example:

telliott@U32:/etc/ssh$ ssh-keygen -l -f ssh_host_rsa_key.pub
2048 9c:a3:65:70:81:1e:d9:47:75:de:09:87:88:4e:cd:8f 
ssh_host_rsa_key.pub (RSA)

Restart the server.

On OS X:

> ssh telliott@10.0.1.2
The authenticity of host '10.0.1.2 (10.0.1.2)' can't be established.
RSA key fingerprint is 9c:a3:65:70:81:1e:d9:47:75:de:09:87:88:4e:cd:8f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.1.2' (RSA) to the list of known hosts.
telliott@10.0.1.2's password: 
Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-17-generic i686)
..
telliott@U32:~$

Be sure to check that the fingerprint the "host" 10.0.1.2 gives us is the same as we get in Ubuntu for ssh_host_rsa_key.pub before you answer "yes" above.

In a new Terminal window or tab:

> ssh-keygen -lvf ~/.ssh/known_hosts
2048 9c:a3:65:70:81:1e:d9:47:75:de:09:87:88:4e:cd:8f 10.0.1.2 (RSA)
+--[ RSA 2048]----+
|       +..=.o.o. |
|      + .+.+ +o..|
|     ...+.  o ...|
|      .+ o E .   |
|        S        |
|       + .       |
|      .          |
|                 |
|                 |
+-----------------+

At this point we want to copy our public key over to the server.

The way I did this is:

> scp ~/.ssh/id_rsa.pub telliott@10.0.1.2:~/.ssh/authorized_keys
telliott@10.0.1.2's password: 
id_rsa.pub                                      100%  248     0.2KB/s   00:00

Note: the docs say to do:

ssh-copy-id username@remotehost
chmod 600 .ssh/authorized_keys

Having done this, my home directory in Ubuntu should have a file of authorized keys:

telliott@U32:~$ cd .ssh
telliott@U32:~/.ssh$ ls
authorized_keys  known_hosts
telliott@U32:~/.ssh$ ssh-keygen -l -f ~/.ssh/authorized_keys
1024 c0:67:0f:f5:08:08:af:33:17:40:e2:c5:80:5a:0a:e5 
/home/telliott/.ssh/authorized_keys (RSA)

The fingerprint matches my public key generated on OS X.

Change the config file to PasswordAuthentication no.

telliott@U32:~/.ssh$ cd /etc/ssh
telliott@U32:/etc/ssh$ sudo nano sshd_config
telliott@U32:/etc/ssh$ diff sshd_config sshd_config.orig 
27c27
< PermitRootLogin no
---
> PermitRootLogin yes
48c48
< ChallengeResponseAuthentication yes
---
> ChallengeResponseAuthentication no
51c51
< PasswordAuthentication no
---
> #PasswordAuthentication yes
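Since a commented-out line is easy to misread, it can help to check the settings that are actually in effect. The Python 3 sketch below works on the text of sshd_config. It assumes the first occurrence of a keyword wins, and it ignores Match blocks. The names effective_settings, check and WANTED are all made up.

```python
def effective_settings(config_text):
    """Map lower-cased keyword -> value, ignoring comments and blank lines."""
    settings = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments, including trailing ones
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        settings.setdefault(keyword, value)   # assumption: first occurrence wins
    return settings

WANTED = {"permitrootlogin": "no", "passwordauthentication": "no"}

def check(config_text, wanted=WANTED):
    """Return the keywords whose effective value differs from what we want."""
    found = effective_settings(config_text)
    return sorted(k for k, v in wanted.items() if found.get(k, "").lower() != v)

sample = """PermitRootLogin no
ChallengeResponseAuthentication yes
#PasswordAuthentication yes
"""
print(check(sample))
```

With the commented-out line the sample reports passwordauthentication as not set the way we want. Once the line reads PasswordAuthentication no, the list comes back empty.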

Finally, from OS X

> ssh telliott@10.0.1.2

Identity added: /Users/telliott_admin/.ssh/id_rsa (/Users/telliott_admin/.ssh/id_rsa)
Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-17-generic i686)

 * Documentation:  https://help.ubuntu.com/

Last login: Thu Mar 29 13:20:00 2012 from toms-mac-mini.local
telliott@U32:~$

Logging out and retrying does not require the passphrase again, nor does quitting Terminal and starting again. I did not save to the Keychain, so what's the deal? Maybe it has something to do with Lion apps remembering their state between runs.

A re-boot of the machine does force the prompt for the passphrase.

Ubuntu on Lion under VirtualBox (5)

In the last post we set up our Apache server running in Ubuntu under VirtualBox to serve pages to the host OS (OS X Lion). The next step was to check it works from another machine running on my wireless network, which it does.

http://10.0.1.2:8080
http://10.0.1.2:8080/cgi-bin/test.py

Now, I want to try it from the internet. At least that's what I think I'm doing. I google "ip address" and get XX.XXX.XX.XXX (obfuscated)

which I actually recognize because, although it's supposed to be dynamic, it never changes. Now if I do say:

http://XX.XXX.XX.XXX:8080

Safari complains that it can't connect to the server. What we need to do is turn on port forwarding for the Airport, as shown in these screenshots.

Now it works.

(Thanks to this reader's comment).

I'm going to turn this off for now, until I learn more about security in Apache, though I am curious to know how long it would take for a black hat to find me.

Ubuntu on Lion under VirtualBox (4)

The VirtualBox docs (Chapter 6) discuss several networking modes. The default is NAT

Network Address Translation (NAT)
If all you want is to browse the Web, download files and view e-mail inside the guest, then this default mode should be sufficient for you, and you can safely skip the rest of this section. Please note that there are certain limitations when using Windows file sharing (see the section called “NAT limitations” for details).

Bridged networking
This is for more advanced networking needs such as network simulations and running servers in a guest. When enabled, VirtualBox connects to one of your installed network cards and exchanges network packets directly, circumventing your host operating system's network stack.

I've done what we're about to do (serve pages to the host) using NAT, and I thought I had it all figured out (e.g. this post). But even though it was working yesterday, it's not working today, and I've tried everything I can think of. So, we'll leave that aside and go with "bridged" mode, noting also that this is what the docs recommend (above).

There are three steps to set it up. In VirtualBox Manager, select the correct VM and click on Settings, then Network. As shown in the screenshot, we want Bridged Adapter.

Make a note of the MAC address; this changes every time you change the setting. Currently it's 08:00:27:41:a6:2c.

The second step is to configure the DHCP server (on my Airport) using Airport Utilities. We want to reserve a particular IP address for the VM.

And the final step is to configure networking in Ubuntu to use manual IP address assignment. Under Networking > Edit Connections you'll see this:

The VM acts like it's wired. Under Wired edit Wired connection 1 to give IPv4 Settings as shown.

That's it.

[UPDATE: In the screenshot, you'll see that the DNS server is not specified. We need to do that. Enter 10.0.1.1 ]

Now restart the server and from Terminal in OS X do:

> curl http://10.0.1.2:8080/cgi-bin/test.py
<head></head>
HTTP_ACCEPT */* 
HTTP_USER_AGENT curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5 
SERVER_NAME 10.0.1.2 
REMOTE_ADDR 10.0.1.3 
SERVER_PROTOCOL HTTP/1.1 
SCRIPT_FILENAME /usr/lib/cgi-bin/test.py 
..

A note about the port we're using. As the VirtualBox docs say:

On Unix-based hosts (e.g. Linux, Solaris, Mac OS X) it is not possible to bind to ports below 1024 from applications that are not run by root.

An interesting discussion of why this limitation persists can be found here.

Ubuntu on Lion under VirtualBox (3)

So far, we've been pretty self-serving :)
The next step is to serve pages to the host as well. We first need to modify two different files so that we can listen and send to port 8080. These files are:

/etc/apache2/ports.conf
/etc/apache2/sites-available/default

The edit to the first file adds two lines as shown by the diff. We save a copy before doing anything, in case we need a reference later:

telliott@U32:~$ sudo cp /etc/apache2/ports.conf /etc/apache2/ports.conf.orig
telliott@U32:~$ sudo nano /etc/apache2/ports.conf
telliott@U32:~$ diff /etc/apache2/ports.conf /etc/apache2/ports.conf.orig
11,13d10
< NameVirtualHost *:8080
< Listen 8080
<

The edit to /etc/apache2/sites-available/default is to duplicate the entire text of the file, then change the first line of the second half to read:

<VirtualHost *:8080>

Normally I'd think about using cat to do this, but because this file is owned by root, the OS won't allow it. I did this to open the file in nano:

cd /etc/apache2/sites-available
sudo nano default

With the cursor at the beginning of the file, mark the spot with OPT-A, then move the cursor to the end of the file and do in succession CTL-K (the text will disappear but don't worry), then CTL-U followed by CTL-U again. Make the edit described above, then save the file.
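The same duplicate-and-edit step can be sketched in Python 3, working on the text of the file instead of in nano. This assumes the file holds a single VirtualHost block, as the default one seems to. The function name add_vhost_copy is made up.

```python
def add_vhost_copy(config_text, new_port=8080):
    """Return the config followed by a copy of itself that listens on new_port."""
    lines = config_text.rstrip("\n").split("\n")
    copy = list(lines)
    for i, line in enumerate(copy):
        if line.strip().startswith("<VirtualHost"):
            # only the opening line of the copy changes
            copy[i] = "<VirtualHost *:%d>" % new_port
            break
    return "\n".join(lines + [""] + copy) + "\n"

sample = "<VirtualHost *:80>\n\tDocumentRoot /var/www\n</VirtualHost>\n"
print(add_vhost_copy(sample))
```

Writing the result back to /etc/apache2/sites-available/default still needs root, so the script would have to be run with sudo.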

Now our tests from before should work even if we specify the new port.

Ubuntu on Lion under VirtualBox (2)

In the last post we installed Ubuntu on OS X Lion as the host using VirtualBox, and downloaded and installed the Apache server (version 2.2, apache2).

The basic commands are:

sudo /etc/init.d/apache2 start
sudo /etc/init.d/apache2 restart
sudo /etc/init.d/apache2 stop

although after the install, I found it was already running. Another useful command to get basic information is /usr/sbin/apache2 (run with sudo), for example:

telliott@U32:~$ sudo /usr/sbin/apache2 -l
Compiled in modules:
  core.c
  mod_log_config.c
  mod_logio.c
  worker.c
  http_core.c
  mod_so.c

telliott@U32:~$ sudo /usr/sbin/apache2 -v
Server version: Apache/2.2.20 (Ubuntu)
Server built:   Feb 14 2012 17:55:17

Files for apache are scattered in various places, governed partly by Unix tradition. For the moment, to see pages from Ubuntu using Firefox, we don't need to change anything about the configuration. Just point Firefox at "localhost":

http://en.wikipedia.org/wiki/Localhost

Or, do sudo apt-get install curl and then:

telliott@U32:~$ curl localhost
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>

This page comes from the "document root" directory, which is /var/www. Since we didn't specify a filename, we got index.html

telliott@U32:~$ cat /var/www/index.html
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>

This directory is set in /etc/apache2/sites-available/default. Another useful directory is the default directory (or directories) for scripts, also set in the same file as:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
 <Directory "/usr/lib/cgi-bin">
  AllowOverride None
  Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
  Order allow,deny
  Allow from all
 </Directory>

So, let's write a script. The example in the docs uses PHP, but we're Python folks. We improve our Unix foo by doing:

cd Desktop
cat > test.py

and then pasting this text after, followed by CTL-D:

#!/usr/bin/python
import os

s = '''Content-type: text/html

<head></head>'''

print s
D = os.environ
for k in D:
    print '<p>', k, D[k], '</p>'

print '</body></html>'

Now, make the script executable, and copy it to /usr/lib/cgi-bin:

telliott@U32:~/Desktop$ sudo cp test.py /usr/lib/cgi-bin/test.py
telliott@U32:/usr/lib/cgi-bin$ sudo chmod 755 test.py

And paste this into Firefox:

http://localhost/cgi-bin/test.py

We obtain:

SERVER_SOFTWARE Apache/2.2.20 (Ubuntu)
SCRIPT_NAME /cgi-bin/test.py
SERVER_SIGNATURE
Apache/2.2.20 (Ubuntu) Server at localhost Port 80
REQUEST_METHOD GET
SERVER_PROTOCOL HTTP/1.1
QUERY_STRING
PATH /usr/local/bin:/usr/bin:/bin
HTTP_USER_AGENT Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:11.0) Gecko/20100101 Firefox/11.0
HTTP_CONNECTION keep-alive
SERVER_NAME localhost
REMOTE_ADDR 127.0.0.1
SERVER_PORT 80
SERVER_ADDR 127.0.0.1
DOCUMENT_ROOT /var/www
SCRIPT_FILENAME /usr/lib/cgi-bin/test.py
SERVER_ADMIN webmaster@localhost
HTTP_HOST localhost
HTTP_CACHE_CONTROL max-age=0
REQUEST_URI /cgi-bin/test.py
HTTP_ACCEPT text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
GATEWAY_INTERFACE CGI/1.1
REMOTE_PORT 58815
HTTP_ACCEPT_LANGUAGE en-us,en;q=0.5
HTTP_ACCEPT_ENCODING gzip, deflate
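The test.py above is Python 2 (print statements). Below is a sketch of the same idea for Python 3. It also emits the opening html and body tags, which the original leaves out, and it escapes the values, since some of them (the user agent, for example) come from the client. The function name build_page is made up, and the shebang line would need to point at a Python 3 interpreter.

```python
import html
import os

def build_page(environ):
    """Return a complete CGI response: header, blank line, then HTML."""
    rows = ["<p>%s %s</p>" % (html.escape(k), html.escape(v))
            for k, v in sorted(environ.items())]
    body = "<html><head></head><body>\n%s\n</body></html>" % "\n".join(rows)
    return "Content-type: text/html\n\n" + body

if __name__ == "__main__":
    print(build_page(os.environ))
```

Sorting the keys makes the output easier to compare between requests.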

Wednesday, March 28, 2012

Found today

codepad "is an online compiler/interpreter, and a simple collaboration tool"

Charles Edge (and here and here) knows more about sysadmin than any person really should

Ubuntu on Lion under VirtualBox (1)

I posted a few times previously about running Ubuntu in a virtual machine in OS X (here), but I had some difficulty in retracing my steps six months later.

So I thought I would cover the ground again, and expand it. This time I will talk mostly about things that do work, not about my trials and tribulations on the way.

I don't have the luxury of a separate computer to run Linux. So it's very nice that there is a free virtual machine available for me to play with called VirtualBox (wikipedia). Thanks, guys. (It's an Oracle product these days).

I downloaded and installed VirtualBox 4.1.10 for OS X.

For Ubuntu, there is a choice of 32-bit or 64-bit and Desktop or Server. For example:

ubuntu-11.10-desktop-i386.iso
ubuntu-11.10-server-amd64.iso

These designations stand for Intel 80386 32-bit, and AMD x86-64, 64-bit. Either one will work. I chose standard Linux Ubuntu version 11.10 ("Oneiric Ocelot") 32-bit this time. It does not matter that my machine has an Intel chip.

I want to do server-like stuff, but I do find it handy sometimes to have a GUI (e.g. for Firefox and Dropbox), and also I thought it might be instructive to add the relevant software one piece at a time, starting with Apache2. For the Desktop edition, the 32-bit version is recommended.

The .iso file type is described here. It is basically an archive file format for an optical disc.

After installing VirtualBox I ran the Oracle VM VirtualBox Manager, which first led me through setup of a new virtual machine, where I accepted all defaults except that I boosted the RAM to 2048 MB. (I don't plan to do much work on the OS X side while the server is running). Starting the VM gives a "First Start Wizard" which has a dialog where you should navigate to the stored .iso file. I am always too impatient to actually read the instructions, but in this case it would be good to read chapter one of the VirtualBox docs or at least this part of it: Starting VirtualBox.

The install doesn't take long, 10 minutes or so. I had trouble (on several installs) where I had wandered away to the OS X side, and when the install finished and wanted to restart Ubuntu it did bad things to OS X, forcing a hard shutdown with the power switch. So, make yourself a sandwich or something.

At the end, bring up the Terminal. How to achieve this was puzzling at first. A shortcut I found on the web is to do CTL-OPT-t. Another simple method is to search for Terminal after clicking the "dash" icon at the top of the "launcher"---a dock lookalike.

There are methods to add the Terminal to the launcher, but these involve right-clicking, which seems problematic with this setup. I'll have to investigate and get back to this later. I got it to work on my laptop by the two-finger-tap method, but it's not working now on the mini.

Now, the very first thing is to do:

sudo apt-get update

A GUI Update Manager will come up when this finishes, with many items to update (≈ 350). I did this too, though I suspect it's unnecessary for us and it took a long time.

In order to get cut-and-paste to work, we will need what are called the "Guest Additions" from VirtualBox. First, type in this command:

sudo apt-get install dkms build-essential linux-headers-generic

When it's done, restart Ubuntu. Then, from the VirtualBox menu above the Ubuntu window, from Devices choose mount Guest Additions. This "mounts" a "CD" in Ubuntu; when you click on that it will ask you whether you want to run the software, which you should do. When it's done, reboot Ubuntu again, and then "eject" the disk.

Now, cut-and-paste should work between OS X and Ubuntu. The only trick is that you need to click in the Ubuntu/Terminal window twice, once to shift focus to Ubuntu, and then to shift focus to Terminal. For Terminal, we do CTL-SHIFT-v and -c. (If you're in the Text Editor, just do CTL without the SHIFT).

Now, we'll grab two more packages:

sudo apt-get install apache2
sudo apt-get install openssh-server

And at this point, I saved a snapshot of the VM from VirtualBox (under Machine > Take Snapshot). To shut down (in OS X), just click on the red close button (top left of window) and then choose "Save State."

go run go

Go version 1 is released today.

Monday, March 26, 2012

Unix spoken here (5)

A link to a great review of Unix text processing tools.

Sunday, March 25, 2012

matplotlib on 10.7.3

I upgraded my home "Desktop" (a Mac mini that had been running OS X Server 10.6) to Lion (10.7.3) yesterday. This post is just to note that the pkgconfig method still works for installing matplotlib. To recap, I used Homebrew to install pkgconfig, which can then be used by the matplotlib build process to find zlib, libpng and libfreetype. It's as simple as:

git clone git://github.com/matplotlib/matplotlib.git
cd matplotlib/
python setup.py build
sudo python setup.py install

There were a few hiccups, of course. The motivation for changing to 10.7 was that my MobileMe email (and iDisk storage) are going away at the end of June. So I "upgraded" to iCloud, after installing 10.7.3 on my laptop. However, this made Mail extremely sluggish on the machines with 10.6. Mail showed the little triangle (which I think means it's having trouble) for 15 minutes at a time and wouldn't get email from the server.

I had various troubles with the install that I believe relate to having bought Lion using my first Apple ID (which used my work email) rather than my second (which uses my .mac email). It irritated me, but seemed that the simplest solution was to just buy Lion again. So I did that, and everything started working fine.

Another issue was getting Xcode (Developer Tools) from the App Store. This is free, but involves various hoops to jump through, the weirdest of which is that gcc is no longer a part of the default install. You have to start Xcode and go to Preferences > Downloads > Command Line Tools. Why would they do that? Anyway, after that I have gcc:

> ls -al /usr/bin/gcc
lrwxr-xr-x  1 root  wheel  12 Mar 24 23:14 /usr/bin/gcc -> llvm-gcc-4.2

I used Homebrew to get pkgconfig, and then built matplotlib as shown above. Simple scripts work. I used easy_install nose (with sudo) to install nose, which is needed by matplotlib's tests, then did

python tests.py -v -d

from within the matplotlib directory. The result looks good:

Ran 1086 tests in 290.786s

OK (KNOWNFAIL=546)

Now I just have to re-install my favorite Safari extensions (forgot to save them), and find the license key for TextMate. I'm making it work, but all these hoops to jump through, and Apple's Big Brother mentality, are making me think seriously about switching to Linux for my hobbyist programming.

Sunday, March 18, 2012

iTunes backup playlist

Since they removed the DRM from music downloads at the iTunes store, I've been buying songs from Apple using that approach. It's great for impulse buying. Also, it's nice in combination with an app for the iPhone called Shazam, which can identify a track that is playing wherever you are at the moment. I thought I was fairly modest in my purchases, but now find that I have over 500 songs after 18 months. So the question then is how to archive these purchases so they don't go **kabluie** in the night.

One strategy is to just let Apple do it, but I don't trust them.

The next idea is to manually copy all 500 songs to a backup disk. This would be easy, except that (i) I don't want the whole library, just a playlist, and (ii) iTunes uses nested folders to preserve the artist:album:songtitle information. I need to merge u/v/w with u/v/x to create a directory u/v containing both w and x. You can't do that by just copying x.

The playlist info is stored in XML format in the file 'iTunes Music Library.xml', but I decided instead to export the playlist to disk from within iTunes ('File > Library > Export Playlist...'). Each entry in the exported text file (tab-separated, one entry per line) ends with the file path as its last field, something like this:

HD:Users:Shared:iTunes Music:Bob Marley:Natty Dread:01 Lively Up Yourself.m4a

The following script finds each song on the playlist and copies it to a directory temp in the directory where the script is run. Simple. The result is 3.77 GB of .m4a files with the desired directory structure.

A detail that is (or should be) embarrassing: iTunes (on OS X) uses '\r' (CR) as newline. Talk about the constraints of backward compatibility.

One last thing: please, please, please back-up and test before using this. YMMV. Caveat lector. No warranty express or implied. Don't blame me if your library vanishes.

import os, subprocess

name = 'playlist'
FH = open(name + '.txt', 'r')
data = FH.read().strip()
FH.close()

# iTunes uses '\r' (CR) as newline!
data = data.split('\r')

# data[0] is metadata (column names)
data.pop(0)

# file path is the last value
L = [item.split('\t')[-1] for item in data]

# ':' is path separator
L = [item.replace(':','/') for item in L]

for item in L:
    # remove HD name from file path
    item = item.split('/', 1)[1]
    
    # file path has spaces
    # must be quoted for shell command below
    src = '"/' + item + '"'

    artist, album, songfile = item.split('/')[-3:]
    # construct directory tree if it doesn't exist
    path = '/'.join(('temp', artist, album))
    try:
        os.stat(path)
    except OSError:
        os.makedirs(path)
    
    dst = '"' + '/'.join((path, songfile)) + '"'    
    cmd = ' '.join(('cp', src, dst))
    obj = subprocess.call(cmd,shell=True)
    if obj != 0:
        print 'e',
    else:
        print '*',
    print dst
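
The listing above is Python 2 and builds a quoted cp command for the shell, which could trip on a file name containing a double quote or a dollar sign. Here is a minimal Python 3 sketch of the same idea that copies with shutil instead of the shell. The function names playlist_paths and copy_tracks are made up for this illustration, and it assumes the export looks like the example entry above (tab-separated, path in the last field, volume name first). Same caveats as before: test on a copy first.

```python
import os
import shutil

def playlist_paths(text):
    """Return '/'-separated source paths from an exported playlist (hypothetical helper)."""
    # splitlines() copes with the '\r' line endings iTunes writes,
    # as well as '\n' and '\r\n'.
    lines = text.strip().splitlines()
    paths = []
    for line in lines[1:]:                 # lines[0] holds the column names
        if not line.strip():
            continue                       # skip any blank lines
        mac_path = line.split('\t')[-1]    # file path is the last field
        # drop the volume name (e.g. 'HD'), then turn ':' into '/'
        rest = mac_path.split(':', 1)[1]
        paths.append('/' + rest.replace(':', '/'))
    return paths

def copy_tracks(paths, dest='temp'):
    """Copy each file to dest/artist/album/ and return the paths that failed."""
    failed = []
    for src in paths:
        artist, album, songfile = src.split('/')[-3:]
        folder = os.path.join(dest, artist, album)
        os.makedirs(folder, exist_ok=True)  # build the tree if it is missing
        try:
            shutil.copy2(src, os.path.join(folder, songfile))
        except OSError:
            failed.append(src)
    return failed

if __name__ == '__main__':
    # newline='' keeps the '\r' characters intact when reading
    with open('playlist.txt', newline='') as fh:
        songs = playlist_paths(fh.read())
    for path in copy_tracks(songs):
        print('e', path)
```

Because no shell is involved, spaces and quotes in song titles need no special handling, and the list returned by copy_tracks plays the role of the 'e' markers in the original output.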

Monday, March 12, 2012

Airport Express

I saw a blurb (it must be several years ago) about using an Apple wireless device called Airport Express to play music files from my computer over a WiFi network on remote speakers. I bought an AE recently, plus a miniature amplifier from Amphony and some bookshelf speakers from Polk Audio through Newegg.

I was surprised, and quite disappointed, to find that in order to use this device in the normal way one must be a member of the "Apple club." That is to say, in order to send audio from my computer to Airport Express and have it come out through speakers, I need to have an Apple ID so that I can do "Home Sharing." I think this is outrageous. However, I do think it was pretty cool, in the end, to control this setup with my iPhone running an App called Remote.

I've never owned a non-Apple computer. That's a lot of boxes, starting from 1984. And I love OS X and its integration of the pretty Mac side with Unix underneath. However, it's clear that the future is iOS, i.e. not folks like me. So I was pleased to find, in a quick Google search, a somewhat dated article which explains how to stream audio in this way with Linux hardware.

About the amp: it is fun to produce music with so little hardware on site. There is no ON/OFF switch for the output, though, and the volume dial, even when turned all the way down, still lets a nasty signal through to the speakers when powering down. Still, I do like the sound a lot. Nice speakers!

Wednesday, March 7, 2012

Go Snail Go

I came across a post which solves what it calls the "snail" problem in Go. We've seen some very pretty examples of the same thing in Python here.

This is the output of my version in Go for S = 9.

 1  2  3  4  5  6  7  8  9
32 33 34 35 36 37 38 39 10
31 56 57 58 59 60 61 40 11
30 55 72 73 74 75 62 41 12
29 54 71 80 81 76 63 42 13
28 53 70 79 78 77 64 43 14
27 52 69 68 67 66 65 44 15
26 51 50 49 48 47 46 45 16
25 24 23 22 21 20 19 18 17

You can see why he called it the snail. Anyway, I noticed a chance to use a goroutine for this problem. In any cycle starting with [left] followed by [down] we go n steps in each direction, then n+1 steps in the [right,up] directions. In the code below, we obtain these step values from a Go channel.

A couple of other Go-like things about this code. We stash the 2D array (as a 1D array with a shape parameter) in a struct, and then attach a func to that struct to pretty print it. The details of the pprint function could doubtless be improved---I'm not too swift with formatting. The other Goish thing is to modify both the row and column indices at once, returning what we'd call in Python a tuple value from the step function.

Fun.

package main

import (
    "fmt"
    "os"
    "strings"
)

func dist(ch chan int) {
    var a int = 1
    for { 
        ch <- a
        ch <-a
        a++ 
    }
}

var m = map[string]string{"L":"D","D":"R","R":"U","U":"L"}

func step(r, c int, dir string)(rn, cn int) {
    switch dir {
        case "L": { c-- }
        case "D": { r++ } 
        case "R": { c++ }
        case "U": { r-- }
    }
    return r,c
}

type A struct { arr []int;  SZ int }
func (a *A) pprint() {
    d := a.SZ
    for i:= 0; i < len(a.arr); i += d {
        S := []string{}
        for _,f := range a.arr[i:i+d] {
            S = append(S,fmt.Sprintf("%2d", f))
        }
        fmt.Println(strings.Join(S," "))
    }
}

func main() {
    S := 9
    if S%2 != 1 { os.Exit(1) }
    N := S*S
    a := A{make([]int,N), S}
    ch := make(chan int)
    go dist(ch)
    dir := "L"
    var r, c int;  r = S/2 + 1;  c = r
    n := <- ch
    for {
        for i := 0; i < n; i++ {
            //fmt.Println(r, c, N)
            a.arr[(r-1)*S + (c-1)] = N
            N--
            if N == 0 { break }
            r,c = step(r, c, dir)
        }
        n = <- ch
        dir = m[dir]
        if N == 0 { break }
    }
    a.pprint()
}
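
For comparison, here is a rough Python sketch of the same walk. A generator stands in for the goroutine and channel, handing out step counts 1, 1, 2, 2, 3, 3 and so on. The names step_lengths, MOVES, TURN and snail are just for this illustration; it is not one of the Python versions mentioned at the top of the post.

```python
from itertools import count

def step_lengths():
    """Yield 1, 1, 2, 2, 3, 3, ... like the Go dist() goroutine."""
    for n in count(1):
        yield n
        yield n

# direction -> (row change, column change)
MOVES = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}
# direction -> the direction we turn to next
TURN = {"L": "D", "D": "R", "R": "U", "U": "L"}

def snail(size):
    """Fill a size x size grid, counting down from size*size at the center."""
    if size % 2 != 1:
        raise ValueError("size must be odd")
    grid = [[0] * size for _ in range(size)]
    r = c = size // 2              # zero-based index of the center cell
    value = size * size
    direction = "L"
    for n in step_lengths():
        for _ in range(n):
            grid[r][c] = value
            value -= 1
            if value == 0:
                return grid
            dr, dc = MOVES[direction]
            r, c = r + dr, c + dc
        direction = TURN[direction]

if __name__ == "__main__":
    for row in snail(9):
        print(" ".join("%2d" % v for v in row))
```

The logic mirrors the Go code step for step, so for a size of 9 it should reproduce the grid shown earlier.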

Sunday, March 4, 2012

Go, again

This is a brief, updated report on my exploration of the Go language. (First post here). For an idea about what Go can do I encourage you to check out another video with Rob Pike (and Russ Cox). Not to disrespect Russ, but as a great example, check out the segment starting about 21:25. It's amazing.

The source of that program is here.

In order to play with this stuff, you will have to download the compiler source and build it. I assume you can do that; if not, drop me a line.

I've spent a total of about 40 hours on Go over the last week, and I can say that I believe this is all quite correct:

The Go programming language is an open source project to make programmers more productive. Go is expressive, concise, clean, and efficient. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel type system enables flexible and modular program construction. Go compiles quickly to machine code yet has the convenience of garbage collection and the power of run-time reflection. It’s a fast, statically typed, compiled language that feels like a dynamically typed, interpreted language.

The more I explore, the better I like it. Go does "feel like a dynamically typed, interpreted language." Exactly. And using TextMate I can just do CMD-R and the build, linking and execution happen painlessly.

I wrote about 30 or so short programs to explore simple tasks in Go. I also wrote two different versions of the PSSM code discussed here, and in subsequent posts. And I played with Bruce Eckel's code from here. A zipped folder of all this stuff is on Dropbox. It's not very well documented but only the intrepid will follow this lead anyway.

I'm hooked. I have a lot of work to do figuring out the interface and concurrency stuff. And it is great fun!

Thursday, March 1, 2012

Go fib

I've been having fun with a new programming language called Go, developed by Rob Pike and friends at Google (video). Somehow I missed hearing about it until now. I'm not leaving my true love, Python, but there's a lot to like about Go as a replacement for C and C++.

Go has

garbage collection, so there are no worries about managing memory

no classes, but simply attaches methods to types

a simple syntax compared to C---eliminating most semicolons

much less complexity than C++

goroutines to launch concurrent functions that communicate using channels

maps and string support, and slices as a kind of resizable array

multiple return values from a function

combined variable declaration and assignment (:=), with the type taken from the right-hand side

pointers but no pointer arithmetic

the ability to construct and return a local variable from a function.

great docs, and it is compiled and fast.

Go has interfaces, which I don't understand very well, but they can be used to give polymorphic behavior.

Here is a simple, familiar Go program using a goroutine and a channel. Notice that the type declaration comes after the variable name, and the arrow symbol for flow into and out of the channel, c.

package main

import "fmt"

func fib(c chan int) {
    var a, b int = 1, 0
    for {
        c <- a
        a, b = a + b, a
    }
}

func main() {
    c := make(chan int)
    go fib(c)
    for i := 1; i < 12; i++ {
        f := <- c
        fmt.Println(f)
    }
}

fib() will generate values as long as we want, similar to a Python generator. Output:

1
1
2
3
5
8
13
21
34
55
89

I found a Go "bundle" for TextMate here.
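
Since the Go fib() was compared to a Python generator, here is a small sketch of what that generator might look like, for readers who want to see the two side by side. islice plays the part of the counting loop in main().

```python
from itertools import islice

def fib():
    """Yield Fibonacci numbers forever, like the Go goroutine feeding its channel."""
    a, b = 1, 0
    while True:
        yield a
        a, b = a + b, a

# take the first 11 values, as the Go main() loop does
for f in islice(fib(), 11):
    print(f)
```

The main difference is that the generator only runs when the caller asks for the next value, whereas the goroutine runs on its own and blocks on the unbuffered channel until main() is ready to receive.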