Readme for NCBI BLAST ftp site

Last updated on February 15, 2004

This file lists the subdirectories and files found on the NCBI BLAST
ftp site (ftp://ftp.ncbi.nlm.nih.gov/blast/). It provides basic
information on file content and on how the files should be used.

The NCBI BLAST ftp site provides standalone blast, client-server blast,
and wwwblast packages for different platforms. It also provides
commonly used blast databases in preformatted as well as FASTA format.
Some documents on the blast executables and other related subjects are
also provided.

2. File list and content

Descriptions of the files are listed in the tables below, one table
for each directory or subdirectory.

2.1 ftp://ftp.ncbi.nlm.nih.gov/blast/ directory content

The blast ftp directory contains several subdirectories, each for a
specific set of files.

+------------------+-------------------------------------------------+
 Name               Content
+------------------+-------------------------------------------------+
 blastftp.txt       this file

 db                 subdirectory with databases, in preformatted or
                    FASTA format

 demo               demonstration programs and documents from blast
                    developers

 documents          documents for programs in the standalone blast,
                    netblast, and wwwblast packages

 executables        archives for binary distribution of blast programs

 matrices           protein and nucleotide score matrices, only a
                    subset of which is supported by blast

 temp               temporary directory for miscellaneous files
+------------------+-------------------------------------------------+

2.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/ subdirectory

Databases larger than two gigabytes (2 GB) are formatted in multiple
volumes, which are named using the "database.##.tar.gz" convention.
All relevant volumes are required. An alias file is provided so that
the database can be called using the alias name without the extension
(.nal or .pal). For example, to call the est database, simply use the
"-d est" option on the command line (without the quotes).

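Because every volume of a multi-volume database is required, it can
help to confirm that none is missing after a download. The following
is a minimal Python sketch, not part of the BLAST distribution. The
function name missing_volumes and its parameters are hypothetical, and
it assumes that volumes are numbered consecutively from 00, as in the
est.00, est.01, est.02 entries in the table below.

```python
import re
from pathlib import Path

# Matches names such as "est.00.tar.gz" (the database.##.tar.gz convention).
VOLUME_RE = re.compile(r"^(?P<db>.+)\.(?P<vol>\d{2})\.tar\.gz$")


def missing_volumes(directory, database, expected_count):
    """Return the volume archive names of `database` not found in `directory`.

    Assumes volumes are numbered 00 .. expected_count - 1.
    """
    present = set()
    for path in Path(directory).iterdir():
        match = VOLUME_RE.match(path.name)
        if match and match.group("db") == database:
            present.add(int(match.group("vol")))
    return [
        f"{database}.{vol:02d}.tar.gz"
        for vol in range(expected_count)
        if vol not in present
    ]
```

For example, missing_volumes("downloads", "est", 3) returns an empty
list only when est.00.tar.gz, est.01.tar.gz, and est.02.tar.gz are all
in the (hypothetical) downloads directory.
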
Certain databases are subsets of a larger parent database. For those
databases, mask files, rather than actual databases, are provided. A
mask file needs the parent database to function properly, and the
parent database should be generated on the same day as the mask file.
For example, to use the preformatted swissprot database,
swissprot.tar.gz, one will need to get the nr.tar.gz with the same
date stamp.

To use a preformatted blast database file, first inflate the file
using gzip (Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac),
then extract the component files from the resulting tar file using
tar (Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac). The
resulting files are ready for BLAST.

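The two steps above (inflate, then extract) can also be done in one
pass with a script. The following is a minimal Python sketch using
only the standard library. The function name unpack_database is
hypothetical, and the sketch assumes the archive holds plain files
that can be written directly into one destination directory.

```python
import shutil
import tarfile
from pathlib import Path


def unpack_database(archive, dest_dir):
    """Inflate and untar a .tar.gz archive; return the extracted file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Mode "r:gz" decompresses the gzip layer while reading the tar layer.
    with tarfile.open(archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # Keep only the base name so nothing is written outside dest.
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target.name)
    return sorted(extracted)
```

For a multi-volume database, the function would be called once per
volume with the same destination directory.
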
+---------------------+----------------------------------------------+
 Name                  Content
+---------------------+----------------------------------------------+
 FASTA                 subdirectory with databases in FASTA format

 blastdb.txt           content list of the blast databases

 est.00.tar.gz         first volume of the est database
 est.01.tar.gz         second volume of the est database
 est.02.tar.gz         third volume of the est database
                       all volumes are needed to reconstitute
                       the complete est database

 est_human.tar.gz      human est database, a mask file that requires
                       all volumes of est to work

 est_mouse.tar.gz      mouse est database, a mask file that needs
                       all volumes of est to work

 est_others.tar.gz     est database without human/mouse entries, a
                       mask file that requires all volumes of est

 gss.tar.gz            genomic survey sequence database

 htgs.00.tar.gz        first volume of the htgs database
 htgs.01.tar.gz        second volume of the htgs database
 htgs.02.tar.gz        all volumes are needed to reconstitute
 htgs.03.tar.gz        complete htgs database

 human_genomic.tar.gz  human chromosome database containing
                       concatenated contigs with adjusted gaps

 nr.tar.gz             non-redundant protein database

 nt.00.tar.gz          first volume of the nucleotide nr database
 nt.01.tar.gz          second volume of the nucleotide nr database
 nt.02.tar.gz          all volumes are needed to reconstitute
                       the complete nt database

 other_genomic.tar.gz  chromosome database for organisms other than
                       human

 pataa.tar.gz          patent protein database

 patnt.tar.gz          patent nucleotide database

 pdbaa.tar.gz          protein sequence database for pdb entries. It
                       is a mask file and requires nr.tar.gz

 pdbnt.tar.gz          nucleotide sequence database for pdb entries.
                       They are not coding sequences for the
                       corresponding protein structure entries!

 sts.tar.gz            sequence tagged site database

 swissprot.tar.gz      swissprot sequence database, last major
                       release. It is a mask file and requires
                       nr.tar.gz to work properly

 taxdb.tar.gz          taxonomy id database for use with new version
                       of blast database (not fully implemented yet)

 wgs.00.tar.gz         first volume of the wgs assembly database
 wgs.01.tar.gz         second volume of the wgs assembly database
 wgs.02.tar.gz         third volume of the wgs assembly database
 wgs.03.tar.gz         fourth volume of the wgs assembly database
 wgs.04.tar.gz         fifth volume of the wgs assembly database
 wgs.05.tar.gz         sixth volume of the wgs assembly database
                       all volumes are needed.
+---------------------+----------------------------------------------+

2.2.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA

The FASTA database files are now stored in this subdirectory. It also
contains some additional databases that are not available via the NCBI
BLAST pages. Due to file size issues, the full est database is not
provided. One needs to get the three subsets and concatenate them
to get the complete est database.

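As an illustration of the concatenation step, the following minimal
Python sketch decompresses several .gz files in order and writes them
into a single FASTA file. The function name concatenate_fasta and the
output file name used in the note after it are hypothetical, and the
sketch assumes each subset ends with a newline so that records from
adjacent files do not run together.

```python
import gzip
import shutil


def concatenate_fasta(gz_parts, output_path):
    """Decompress each .gz part in order and append it to one FASTA file."""
    with open(output_path, "wb") as out:
        for part in gz_parts:
            # gzip.open streams the decompressed bytes, so large files
            # are never held in memory in full.
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return output_path
```

For the est database this would be called as
concatenate_fasta(["est_human.gz", "est_mouse.gz", "est_others.gz"],
"est"), using the three subset files listed in the table below.
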
These databases need to be formatted using the formatdb program found
in the standalone blast executable package. The recommended
command lines are:

  formatdb -i input_db -p F -o T    for nucleotide

  formatdb -i input_db -p T -o T    for protein

For additional information on formatdb, please see the formatdb.txt
document under the /blast/documents/ directory.

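When many FASTA files need formatting, the recommended command lines
can be assembled by a script. The following minimal Python sketch
builds the argument list shown above and, optionally, runs it. The
function names are hypothetical, and running the command assumes that
the formatdb executable from the standalone package is on the PATH.

```python
import subprocess


def formatdb_command(fasta_path, is_protein):
    """Build the recommended formatdb argument list for one FASTA file."""
    # -p T marks a protein database, -p F a nucleotide database.
    protein_flag = "T" if is_protein else "F"
    return ["formatdb", "-i", fasta_path, "-p", protein_flag, "-o", "T"]


def run_formatdb(fasta_path, is_protein):
    """Run formatdb and raise CalledProcessError if it exits with an error."""
    subprocess.run(formatdb_command(fasta_path, is_protein), check=True)
```

Passing the arguments as a list rather than a single string avoids
shell quoting problems when a file name contains spaces.
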
+--------------------+--------------------------------------------------+
 Name                 Content
+--------------------+--------------------------------------------------+
 alu.a.gz             proteins translated from alu.n

 alu.n.gz             alu repeat sequences

 drosoph.aa.gz        Drosophila protein from genome annotation

 drosoph.nt.gz        Drosophila genome

 ecoli.aa.gz          E. coli K-12 proteins from genome annotation

 ecoli.nt.gz          E. coli K-12 genomic contigs

 est_human.gz         human subset of the est database

 est_mouse.gz         mouse subset of the est database

 est_others.gz        subset of est other than human or mouse entries

 gss.gz               Genomic Survey Sequences (mostly BAC ends)

 htgs.gz              High Throughput Genomic Sequences

 human_genomic.gz     Human chromosomes formed by concatenating genomic
                      contig assemblies (NT_######) and adjusting the
                      gaps

 igSeqNt.gz           Immunoglobulin nucleotide sequences

 igSeqProt.gz         Immunoglobulin protein sequences

 mito.aa.gz           protein from the annotated mitochondrial genomes

 mito.nt.gz           mitochondrial genomes
                      sequences released or updated in the past 30 days

 month.est_human.gz   human subset of EST released/updated in the past
                      30 days

 month.est_mouse.gz   mouse subset of EST released/updated in the past
                      30 days

 month.est_others.gz  EST, without entries from human or mouse, released
                      or updated in the past 30 days

 month.gss.gz         gss entries released/updated in the past 30 days

 month.htgs.gz        htgs entries released/updated in the past 30 days

 month.nt.gz          subset of nt released/updated in the past 30 days

 nr.gz                non-redundant protein sequence database

 nt.gz                nucleotide database from GenBank excluding the
                      batch division htgs, est, gss, sts, pat divisions,
                      and wgs entries. Not non-redundant.

 other_genomic.gz     Chromosome entries other than human

 pataa.gz             Patent protein sequence database

 patnt.gz             Patent nucleotide sequence database

 pdbaa.gz             protein sequences for pdb entries

 pdbnt.gz             nucleotide entries for pdb entries. They are NOT
                      the coding sequences for the corresponding
                      protein structure entries

 sts.gz               Sequence Tagged Sites database

 swissprot.gz         swissprot database, last major release

 vector.gz            vector sequences from synthetic (syn) division

 wgs.gz               Whole Genome Shotgun sequence assembly

 yeast.aa.gz          protein translations from yeast genome annotation

 yeast.nt.gz          yeast genomic sequence
+--------------------+--------------------------------------------------+

2.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/demo/ directory

This directory contains some technical presentations from the BLAST
developers along with some demo tools or documentation relevant to BLAST.

+------------------------+-----------------------------------------------+
 Name                     Content
+------------------------+-----------------------------------------------+
 README.blast_demo        readme for blast_demo package

 README.first             readme for this directory

 README.parse_blast_xml   readme for parse_blast_xml package

 blast_demo.tar.gz        blast_demo package on blast db, blast object,
                          and reformatting blast alignment from

 blast_exercises.doc      blast exercise questions and answers

 blast_programming.ppt    PowerPoint presentation on BLAST programming

 blast_talk.ppt           PowerPoint presentation (O'Reilly conference)

 ieee_blast.final.ppt     PowerPoint presentation (IEEE conference)

 ieee_talk.pdf            Above IEEE presentation in PDF format

 parse_blast_xml.tar.gz   demo package on parsing xml styled blast output

 splitd.ppt               PowerPoint presentation on NCBI BLAST server's
                          splitd implementation

 test_suite.tar.gz        test package
+------------------------+-----------------------------------------------+

2.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/documents/ directory

This directory contains copies of the documentation on different BLAST
programs distributed from this ftp site under the /blast/executables/
directory. The blast.txt file also contains a detailed release history.

+------------------------+-----------------------------------------------+
 Name                     Content
+------------------------+-----------------------------------------------+
 blast.txt                readme for blastall and blastpgp

 blastclust.txt           readme for blastclust

 developer                subdirectory with additional documentation

 blast_seqalign.txt       describing seqalign function

 readdb.txt               describing readdb function

 urlapi.txt               a short introduction to the BLAST URL API,
                          which supersedes blasturl

 formatdb.txt             readme for formatdb program

 impala.txt               readme for impala

 megablast.txt            readme for megablast

 netblast.txt             readme for netblast (blastcl3)

 rpsblast.txt             readme for rpsblast

 xml                      subdirectory with .dtd and .mod field
                          description files for blast xml output

 xml/NCBI_BlastOutput.dtd dtd file for blast xml output
 xml/NCBI_BlastOutput.mod mod file for blast xml output
 xml/NCBI_Entity.mod      mod file for NCBI xml file
 xml/README.blxml         readme on blast xml output
+------------------------+-----------------------------------------------+

2.5 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/

This directory contains several subdirectories, each for a specific
subset of executable BLAST programs:

/LATEST-BLAST subdirectory contains the standalone blast binaries from
the latest major versioned release.

/LATEST-NETBLAST subdirectory contains the netblast binaries from the
latest major versioned release.

/LATEST-WWWBLAST subdirectory contains the wwwblast binaries from the
latest major versioned release.

/release different releases, with the last one linked to LATEST

/snapshot subdirectory contains patches or intermediate updates put up
between major releases. For previous releases, go to the release
subdirectory, where the old major releases are archived back to
version 2.0.10.

2.5.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST-BLAST,
/LATEST-NETBLAST, and /LATEST-WWWBLAST subdirectories

All three subdirectories link to the latest release directory, which
contains the standalone BLAST executables package (archives whose names
begin with "blast"), the blastcl3 client (archives beginning with
"netblast"), and server blast (archives beginning with "wwwblast").

The standalone archive is needed to set up BLAST locally on the user's own
machine. It also provides the tools necessary to prepare custom databases
and retrieve sequences from these prepared databases. Different archives
for commonly used platforms are available.

The blast client archive contains the blastcl3 program, which functions by
formulating a BLAST search locally and forwarding the search to the NCBI
BLAST server for processing. The search results returned by the NCBI BLAST
server are saved to a user-specified file on the local disk.

The server blast archive contains web pages with embedded blast search
forms, similar to those of NCBI, that can process BLAST search requests
against a local set of databases and return the results to a browser
window. wwwblast is now in sync with the NCBI toolkit and the other two
packages.

+------------------------------------+-------------------------------+
 Name                                 Content
+------------------------------------+-------------------------------+
 blast-2.2.8-alpha-osf1.tar.gz        Standalone for COMPAQ/HP alpha
                                      machine (OSF 5.1 and above)

 blast-2.2.8-amd64-linux.tar.gz       Standalone for AMD 64-bit PC
                                      (Linux)

 blast-2.2.8-ia32-freebsd.tar.gz      Standalone for Intel Pentium PC
                                      (FreeBSD)

 blast-2.2.8-ia32-linux.tar.gz        Standalone for Intel Pentium PC
                                      (Linux)

 blast-2.2.8-ia32-win32.exe           Standalone for Intel Pentium PC
                                      (Windows)

 blast-2.2.8-ia64-linux.tar.gz        Standalone for Intel Itanium PC
                                      (Linux)

 blast-2.2.8-mips-irix-32-bit.tar.gz  Standalone for 32-bit SGI

 blast-2.2.8-mips-irix.tar.gz         Standalone for 64-bit SGI

 blast-2.2.8-powerpc-macosx.tar.gz    Standalone for MacOSX (terminal)

 blast-2.2.8-sparc-solaris.tar.gz     Standalone for Sun Sparc station

 netblast-2.2.8-alpha-osf1.tar.gz     netblast for COMPAQ/HP alpha
                                      machine (OSF 5.1 and above)

 netblast-2.2.8-amd64-linux.tar.gz    netblast for AMD 64-bit PC
                                      (Linux)

 netblast-2.2.8-ia32-freebsd.tar.gz   netblast for Intel Pentium PC
                                      (FreeBSD)

 netblast-2.2.8-ia32-linux.tar.gz     netblast for Intel Pentium PC
                                      (Linux)

 netblast-2.2.8-ia32-win32.exe        netblast for Intel Pentium PC
                                      (Windows)

 netblast-2.2.8-ia64-linux.tar.gz     netblast for Intel Itanium PC
                                      (Linux)

 netblast-2.2.8-mips-irix.tar.gz      netblast for SGI 32-bit system

 netblast-2.2.8-powerpc-macosx.tar.gz netblast for MacOSX

 netblast-2.2.8-sparc-solaris.tar.gz  netblast for Sun Sparc station

 wwwblast-2.2.8-alpha-osf1.tar.gz     wwwblast for COMPAQ/HP alpha
                                      machine (OSF 5.1 and above)

 wwwblast-2.2.8-amd64-linux.tar.gz    wwwblast for AMD 64-bit PC
                                      (Linux)

 wwwblast-2.2.8-ia32-freebsd.tar.gz   wwwblast for Intel Pentium PC
                                      (FreeBSD)

 wwwblast-2.2.8-ia32-linux.tar.gz     wwwblast for Intel Pentium PC
                                      (Linux)

 wwwblast-2.2.8-ia64-linux.tar.gz     wwwblast for Intel Itanium PC
                                      (Linux)

 wwwblast-2.2.8-mips-irix.tar.gz      wwwblast for SGI 32-bit system

 wwwblast-2.2.8-powerpc-macosx.tar.gz wwwblast for MacOSX

 wwwblast-2.2.8-sparc-solaris.tar.gz  wwwblast for Sun Sparc station
+------------------------------------+-------------------------------+

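The archive names in the table above appear to follow a
package-version-platform pattern. The following minimal Python sketch
splits a name into those parts, which can help when selecting the
right download in a script. The pattern is inferred from the listed
file names only and is not a documented NCBI convention; the names
ArchiveName and parse_archive_name are hypothetical.

```python
import re
from typing import NamedTuple


class ArchiveName(NamedTuple):
    package: str    # "blast", "netblast", or "wwwblast"
    version: str    # for example "2.2.8"
    platform: str   # for example "ia32-linux"
    extension: str  # "tar.gz" or "exe"


# Pattern inferred from the file names in the table, e.g.
# "blast-2.2.8-ia32-linux.tar.gz" or "netblast-2.2.8-ia32-win32.exe".
ARCHIVE_RE = re.compile(
    r"^(?P<package>blast|netblast|wwwblast)"
    r"-(?P<version>\d+(?:\.\d+)*)"
    r"-(?P<platform>.+?)"
    r"\.(?P<extension>tar\.gz|exe)$"
)


def parse_archive_name(name):
    """Split an archive file name into package, version, platform, extension."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized archive name: {name!r}")
    return ArchiveName(**match.groupdict())
```

For example, parse_archive_name("blast-2.2.8-ia32-linux.tar.gz") gives
the package "blast", the version "2.2.8", and the platform
"ia32-linux".
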
2.5.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release

This directory contains past major releases of BLAST, as far back as
version 2.0.10. Each release is in its own subdirectory.

2.5.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/snapshot

This subdirectory contains intermediate enhanced or patched archives
released after the last major release. They are organized by date
and contain only the binaries for the affected platforms.

2.5.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/special

From time to time, we make binaries for some rare platforms under
special circumstances. Those files are archived here.

2.6 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/matrices directory

This directory contains the scoring matrices, which are files that can
be used by BLAST for alignment assessment. The files are text files in
a special format that can be viewed directly in a browser.

For valid statistical analysis, blastn uses only an identity matrix, and
blastp supports only a limited subset of the BLOSUM and PAM matrices:
BLOSUM 45, 62, 80, plus PAM30 and 70.

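A script that lets users choose a matrix for blastp can check the
choice against this subset before starting a search. The following is
a minimal Python sketch. The spellings BLOSUM45, BLOSUM62, BLOSUM80,
PAM30, and PAM70 are assumed from the list above, and the function
name is hypothetical.

```python
# Matrix names assumed from the list "BLOSUM 45, 62, 80, plus PAM30 and 70".
SUPPORTED_BLASTP_MATRICES = frozenset(
    {"BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"}
)


def is_supported_blastp_matrix(name):
    """Return True if `name` is one of the matrices listed for blastp."""
    # Ignore spaces and letter case so "blosum 62" and "BLOSUM62" both match.
    normalized = name.replace(" ", "").upper()
    return normalized in SUPPORTED_BLASTP_MATRICES
```
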
2.7 File content of the ftp://ftp.ncbi.nlm.nih.gov/blast/temp directory

A left-over subdirectory of miscellaneous files or tools.

3. Technical Support

Additional questions/comments on this ftp site should be directed to
the NCBI blast-help group at:
blast-help@ncbi.nlm.nih.gov

Other questions on general NCBI resources should be directed to:
info@ncbi.nlm.nih.gov