Wednesday, December 23, 2009

BLAST: .ncbirc file

Continuing from last time, here are some notes on the .ncbirc configuration file used by BLAST.

First, as long as the database has been formatted for BLAST, the .ncbirc file is not strictly necessary. For example, having formatted a FASTA database file with formatdb, I can run blastall by giving full paths to the executable, the input file, and the database:

~/Software/blast/programs/blast-2.2.22/bin/blastall \
-i ~/Desktop/temp/inseqs.fasta -p blastn \
-d ~/Desktop/temp/refseqs.fasta
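Before running blastall it can help to confirm that the database really has been formatted. A minimal sketch, assuming a nucleotide database formatted by formatdb, which for a single-volume database writes PREFIX.nhr, PREFIX.nin and PREFIX.nsq, where PREFIX is the same string passed to -d. The function name check_blast_db is hypothetical, and multi-volume or alias (.nal) databases are not handled.

```shell
# Hypothetical helper (POSIX sh): report whether a formatted nucleotide
# BLAST database exists at the given prefix (the value given to blastall -d).
# Assumes a single-volume formatdb database: PREFIX.nhr, PREFIX.nin, PREFIX.nsq.
check_blast_db() {
    prefix=$1
    missing=0
    for ext in nhr nin nsq; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (path from the example above):
# check_blast_db ~/Desktop/temp/refseqs.fasta && echo "database looks formatted"
```

For a protein database the extensions would be .phr, .pin and .psq instead.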


The .ncbirc file is supposed to make things easier. According to the docs (also included with the download as blast-2.2.22/doc/blast.html):

To ensure the smooth execution of blast programs, we should set up a BLAST configuration file named .ncbirc to provide the path information for the data and db directories. If we place the blast-#.#.# directory under the home directory of j_smith, we can specify the path to data and db directories using lines below.

[NCBI]
DATA=/home/j_smith/data

[BLAST]
BLASTDB=/home/j_smith/db

This had me confused for quite a while. First, they do not mean that I should literally write DATA=/home/te... Rather, under OS X, where home directories live under /Users, the path should be something like DATA=/Users/te... Second, they do not mean that the db or data directory has to sit directly under my home directory (or even directly inside the blast-#.#.# directory).

Third, the path should point to the directory containing the database and should not include the name of the database file itself. Using .ncbirc does not relieve you of the need to specify the database with -d (except when you want the default, which is the nr database).
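Point three can be put another way: a full database path splits into a directory part, which goes into BLASTDB, and a final component, which is still passed to -d. A minimal sketch using the standard dirname and basename utilities; the function name split_db_path is hypothetical.

```shell
# Hypothetical helper (POSIX sh): split a full database path into the
# directory that belongs in BLASTDB (.ncbirc or environment) and the
# name that is still given to blastall -d.
split_db_path() {
    db_path=$1
    printf 'BLASTDB=%s\n' "$(dirname "$db_path")"
    printf -- '-d %s\n' "$(basename "$db_path")"
}

# Usage:
# split_db_path /Users/te/Desktop/temp/refseqs.fasta
# prints:
# BLASTDB=/Users/te/Desktop/temp
# -d refseqs.fasta
```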

I put the following into .ncbirc in my home directory:

[BLAST]
BLASTDB=/Users/te/Desktop/temp


Then, working from ~/Desktop, I ran:

blastall -i temp/inseqs.fasta -p blastn -d refseqs.fasta


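This works! However, the ~ shorthand for the home directory does not work inside .ncbirc (presumably because ~ is expanded by the shell, and BLAST reads the file without doing that expansion):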

BLASTDB=~/Desktop/temp
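One workaround is to let the shell write the file, so that $HOME is expanded to an absolute path before it ever reaches .ncbirc. A minimal sketch; the function name write_ncbirc is hypothetical, and it overwrites whatever file it is given, so back up an existing ~/.ncbirc first.

```shell
# Hypothetical helper (POSIX sh): write a minimal .ncbirc whose BLASTDB
# entry is an absolute path. The heredoc delimiter is unquoted, so the
# shell substitutes $db_dir before writing; no "~" ends up in the file.
# Overwrites $1 if it already exists.
write_ncbirc() {
    rc_file=$1
    db_dir=$2
    cat > "$rc_file" <<EOF
[BLAST]
BLASTDB=$db_dir
EOF
}

# Usage (the shell expands $HOME, giving e.g. BLASTDB=/Users/te/Desktop/temp):
# write_ncbirc "$HOME/.ncbirc" "$HOME/Desktop/temp"
```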


All of this can also be done without a .ncbirc file by setting environment variables. For example:

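rm ~/.ncbirc   # take the config file out of the picture
export BLASTDB=~/Desktop/temp   # directory containing the formatted database
export BLASTMAT=~/Software/blast/programs/blast-2.2.22/data   # scoring matrices (the data directory)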
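Here ~ works because the shell expands it in the assignment before BLAST sees the value, so both variables end up holding absolute paths. A small pre-flight check can catch a typo in either path before blastall runs. A minimal sketch; the function names check_dir_var and check_blast_env are hypothetical, and the check only confirms that the directories exist, not that they contain a formatted database or matrices.

```shell
# Hypothetical helper (POSIX sh): $1 is a variable name (used in messages),
# $2 is its value. Fails if the value is empty or not an existing directory.
check_dir_var() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    elif [ ! -d "$2" ]; then
        echo "$1=$2 is not a directory" >&2
        return 1
    fi
}

# Check both BLAST variables; report every problem, not just the first.
check_blast_env() {
    status=0
    check_dir_var BLASTDB "${BLASTDB:-}" || status=1
    check_dir_var BLASTMAT "${BLASTMAT:-}" || status=1
    return "$status"
}

# Usage:
# check_blast_env && blastall -i temp/inseqs.fasta -p blastn -d refseqs.fasta
```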


And this works:

blastall -i temp/inseqs.fasta -p blastn -d refseqs.fasta