BLAST and PSI-BLAST now permit calculated E-values to take into account the amino acid composition of the individual database sequences involved in reported
alignments. This improves E-value accuracy, thereby reducing the number of false positive results.

The improved statistics are achieved with a scaling procedure [1,2] which in effect employs a slightly different scoring system for each database sequence. As a result,
raw BLAST alignment scores in general will not correspond precisely to those implied by any standard substitution matrix. Furthermore, identical alignments can receive
different scores, based upon the compositions of the sequences they involve. The improved statistics are now used by default for all rounds of searching on the
PSI-BLAST page, but not on the BLAST page. Therefore, if one uses default settings, the results of the first round of searching will be different on the BLAST and
PSI-BLAST pages.

In addition, adjustments have been made to two PSI-BLAST parameters: the pseudocount constant default has been changed from 10 to 7, and the E-value threshold for
including matches in the PSI-BLAST model has been changed from 0.001 to 0.002.

1. Altschul, S.F. et al. (1997) Nucl. Acids Res. 25:3389-3402.
2. Schäffer, A.A. et al. (1999) Bioinformatics 15:1000-1011.

Notes for 2.0.14 release:

Bug fixes:

1.) Extra line returns between sequences in a FASTA file
caused formatdb to produce corrupted databases.

2.) ";" at the beginning of a line was not being treated as a comment.

3.) A problem with the formatter caused blast to core-dump if
the FASTA definition line contained only an identifier and
no description.

4.) A flaw in the ungapped extension for protein sequences
caused a rare problem.

5.) The '-U' option, which causes lower-case sequence to be masked,
did not work correctly for blastx.
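The FASTA input cases behind fixes 1 to 3 above can be illustrated with a small reader. This is only a sketch under stated assumptions, not the toolkit's parser: the function names read_fasta and split_defline are hypothetical, blank lines are simply skipped, lines beginning with ";" are treated as comments, and a definition line may consist of an identifier with no description.

```python
def read_fasta(lines):
    """Yield (definition_line, sequence) pairs from an iterable of text lines.

    Illustrative only: blank lines are ignored wherever they occur and
    lines starting with ';' are treated as comments.
    """
    defline, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip extra line returns and ';' comment lines
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)


def split_defline(defline):
    """Split a definition line into (identifier, description).

    The description is an empty string when only an identifier is present.
    """
    parts = defline.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


if __name__ == "__main__":
    example = [">seq1 first protein", "MKV", "", "; comment", "LLA", ">seq2", "ACGT"]
    for defline, seq in read_fasta(example):
        print(split_defline(defline), seq)
```

Keeping the identifier and description as separate values, with an empty description allowed, avoids special-casing definition lines that carry no free text.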

Notes for 2.0.13 release:

Enhancements:

1.) The output format for pairwise alignments was changed to
put each new gi (if the sequence has redundant gi's) on a
new line. If HTML output is specified then each gi is hyperlinked.

Bug fixes:

1.) An NCBI toolkit problem parsing the new RefSeq format in FASTA files
(two bars instead of three) was fixed. This fix applies to all
BLAST binaries (formatdb, blastall, blastpgp, etc.).

2.) A problem that caused BLAST version 2.0.12 under NT to freeze in
multithreaded mode has been fixed.

Notes for 2.0.12 release:

Enhancements:

1.) Bl2seq can now perform nucleotide-protein (blastx style) comparisons.
This necessitated changing the '-p' option from a Boolean to a
string. Valid arguments are "blastn", "blastp", or "blastx".

Bug fixes:

1.) A problem in the NCBI threads library that sometimes caused BLAST to
hang was corrected. Many thanks to Haruna Cofer and colleagues at SGI
for providing a fix.

2.) A problem that caused BLAST to core-dump (especially on long queries)
has been fixed. Many thanks to Gary Williams for providing examples.

3.) A problem that prevented the search of multiple multivolume databases
has been fixed.

Notes for 2.0.11 release:

Enhancements:

1.) Optimizations were contributed by Chris Joerg of COMPAQ. These changes
reduce the number of cache misses, unroll loops, and make some instructions
unnecessary. These improvements can speed up BLAST for long sequences
several-fold.

2.) A database is now only memory-mapped while being searched. If multiple databases
are searched and their total size exceeds the allowed memory-map limit, this allows
all databases to be searched as memory-mapped files. If a database cannot
be memory-mapped it is read as an ordinary file, rather than causing an error.

Bug fixes:

1.) Formatdb was fixed to correct a problem with FASTA string identifiers under NT.

2.) Blastpgp was fixed to prevent a core-dump under LINUX.

3.) BLASTN was found to miss some hits near the expect value cutoff. This has been
corrected.

Notes for 2.0.10 release:

Enhancements:

1.) Bl2seq, a utility to compare two sequences using the blastn or blastp approach,
is included in the archive. See the full description in README.bls for details.

2.) A 'sparse' option ('-s') has been added to formatdb. This option limits the indices
for the string identifiers (used by formatdb) to accessions (i.e., no locus names).
This is especially useful for sequence sets like the EST's where the accession and locus
names are identical. Formatdb runs faster and produces smaller temporary files if this
option is used. It is strongly recommended for EST's, STS's, GSS's, and HTGS's.

3.) A volume option ('-v') has been added to formatdb. This option breaks up large
FASTA files into 'volumes' (each with a maximum size of 2 billion letters).
As part of creating the volumes, formatdb writes a new type of BLAST database file,
called an alias file, with the extension 'nal' or 'pal'. This option
should be used if one wishes to format large databases (e.g., over 2 billion
base pairs) with formatdb.

4.) It is now possible to jump start the command line version of PSI-BLAST (blastpgp)
from a multiple alignment that includes the query sequence using the -B option. Details
are in README.bls.

5.) The maximum wordsize limit for BLASTN has been removed.
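The volume splitting described in item 3 above can be pictured as grouping sequences, in input order, so that no group exceeds the letter limit. The sketch below is an illustration of that idea only and is not formatdb's code: the function names, the rule that a sequence is never split across volumes, and the numbered naming scheme are all assumptions made for the example.

```python
MAX_LETTERS_PER_VOLUME = 2_000_000_000  # the limit quoted in item 3


def assign_volumes(seq_lengths, max_letters=MAX_LETTERS_PER_VOLUME):
    """Group sequence lengths, in order, into volumes of at most max_letters.

    Assumption for this sketch: a sequence is never split across volumes.
    """
    volumes, current, used = [], [], 0
    for length in seq_lengths:
        if length > max_letters:
            raise ValueError("a single sequence exceeds the volume limit")
        if current and used + length > max_letters:
            volumes.append(current)  # close the full volume, start a new one
            current, used = [], 0
        current.append(length)
        used += length
    if current:
        volumes.append(current)
    return volumes


def volume_names(base, count):
    """Hypothetical naming scheme for the volumes: base.00, base.01, ..."""
    return [f"{base}.{i:02d}" for i in range(count)]


if __name__ == "__main__":
    groups = assign_volumes([4, 4, 4], max_letters=10)
    print(groups, volume_names("est", len(groups)))
```

An alias file would then only need to list the volume names, so a search can address the whole set under a single database name.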

Bug fixes:

1.) A problem that occurred if the database length, set by the '-z' option, was greater than
2 billion was fixed.

2.) A core-dump that resulted from the use of coiled-coil masking
('-F C') was fixed by including a file needed for the data directory.

3.) A bug was fixed that caused some very short alignments to be assigned incorrect
expect values.

4.) A bug was fixed that caused formatdb to produce incorrect BLAST databases if
the input was ASN.1.

5.) A serious performance problem with BLASTN and longer words (greater than 16)
was fixed.

Notes for 2.0.9 release:

Enhancements:

1.) Two new options have been added to blastall: to produce output in HTML and
to search a subset of the database based upon a list of GI's. Please see
the options section for full information.

2.) Two new options have been added to blastpgp: to produce HTML output and to
produce an ASCII version of the PSI-BLAST Matrix. Please see the options section
for more information.

3.) Formatdb has a new option to allow specification of a 'base' name. See the options
section for full details.

4.) It is possible to mask only during the phase when the lookup table is being built,
but not during the extensions. See the options section for full details.

Bug fixes:

1.) A problem that occurred when too many HSP's aligned to the same part
of the query from one database sequence has been fixed.

2.) A problem that caused seedtop to not perform pattern-matching for DNA
sequences has been fixed.

3.) The number of HSP's saved for ungapped BLAST and tblastx is now limited to
200 to prevent problems with memory and speed.

4.) A missing thread join that caused problems under DEC Alpha has been added.

5.) A formatting problem with the database summary at the beginning of the
BLAST output (when multiple databases totaling over 2 Gig are searched) has been fixed.

6.) A bug in formatdb that caused a core-dump if the total number of sequences was an
exact multiple of 100000 was fixed.

Notes for 2.0.8 release:

Enhancements:

1.) Frame and strand information was added to the output. Examples of the
new output format may be found at http://www.ncbi.nlm.nih.gov/BLAST/example.html.

2.) An option that specifies the query strand to be searched (for blastn, blastx, and tblastx)
has been added. The option is '-S'.

Bug fixes:

1.) The problem with the 'too-wide' parameter input screen under NT was fixed.

2.) BLAST no longer core-dumps when the query is NULL.

3.) BLAST no longer core-dumps when the query contains an '@' and blastx or tblastx is selected.

Notes for 2.0.7 release:

Bug fixes:

1.) BLAST now multi-threads properly under LINUX.

2.) A problem with very redundant databases and psi-blast was fixed.

3.) A problem with the formatting of the number of identities and positives
was fixed. This affected results on the minus strand only and did not
affect the expect value or scores.

4.) A problem that caused tblastn to core-dump very occasionally was corrected.

5.) A problem with multiple patterns in PHI-BLAST was fixed.

6.) A limit on the number of HSP's that were saved (100) was removed.

Notes for 2.0.6 release:

Enhancements:

1.) PHI-BLAST is included in this release. Please see notes on PHI-BLAST for
details.

2.) SEG has become an integral part of the NCBI toolkit and it is no longer necessary
to install it separately. It is also now supported under non-UNIX platforms.

3.) Access to filtering options.

If one uses "-F T" then normal filtering by seg or dust (for blastn)
occurs (likewise "-F F" means no filtering whatsoever). The seg options
can be changed by using:

    -F "S 10 1.0 1.5"

which specifies a window of 10, locut of 1.0 and hicut of 1.5. One may
also specify coiled-coil filtering by specifying:

    -F "C"

There are three parameters for this: window, cutoff (probability of a coiled coil), and
linker (distance between two coiled-coil regions that should be linked
together). These are now set to

    window: 22
    cutoff: 40.0
    linker: 32

One may also change the coiled-coil parameters in a manner analogous to
that of seg:

    -F "C 28 40.0 32" will change the window to 28.

One may also run both seg and coiled-coil filtering together by using a ";":

    -F "C;S"

4.) BLAST has been changed to reduce the number of redundant hits that a user
may see. This is achieved by keeping track of the number of hits completely
contained in a certain region and eliminating those lower scoring hits that
are redundant with others. This behavior may be controlled with the -K and -L
options:

    -K Number of best hits from a region to keep [Integer]
       default = 50
    -L Length of region used to judge hits [Integer]
       default = 20

Setting -K to zero turns off this feature. This is the default only on blastall.
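To make the structure of the -F values in item 3 above concrete, the following sketch splits such a string into its parts. It is an illustration only and does not reproduce the toolkit's own option parsing: the function name parse_filter_string and the dictionary layout are hypothetical. The coiled-coil defaults are the ones listed above; the seg defaults are not stated in these notes, so an "S" given without parameters is returned as an empty entry.

```python
# Coiled-coil defaults as listed in item 3 of these notes.
COIL_DEFAULTS = {"window": 22, "cutoff": 40.0, "linker": 32}


def parse_filter_string(arg):
    """Parse a -F style value such as 'C 28 40.0 32;S 10 1.0 1.5'.

    Returns a dict keyed by filter name. 'T' and 'F' are reported as a
    simple on/off flag for the normal filtering.
    """
    arg = arg.strip()
    if arg in ("T", "F"):
        return {"default_filtering": arg == "T"}
    filters = {}
    for part in arg.split(";"):
        fields = part.split()
        if not fields:
            continue
        kind, values = fields[0], fields[1:]
        if values and len(values) != 3:
            raise ValueError(f"expected three parameters in {part!r}")
        if kind == "S":
            # seg: window, locut, hicut (defaults are not given in these notes)
            filters["seg"] = (
                {"window": int(values[0]), "locut": float(values[1]),
                 "hicut": float(values[2])} if values else {}
            )
        elif kind == "C":
            # coiled-coil: window, cutoff, linker
            filters["coiled_coil"] = (
                {"window": int(values[0]), "cutoff": float(values[1]),
                 "linker": int(values[2])} if values else dict(COIL_DEFAULTS)
            )
        else:
            raise ValueError(f"unknown filter code {kind!r}")
    return filters


if __name__ == "__main__":
    print(parse_filter_string("C 28 40.0 32;S 10 1.0 1.5"))
```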

Bug fixes:

1.) There was a problem with the procedure that called the external utility seg.
The need to fix this was obviated by the integration of seg into the toolkit.
This showed up under LINUX.

2.) There was a memory problem with formatdb that has been fixed. This showed up
mostly under NT and LINUX.

3.) A problem with running in multi-processing mode under IRIX6.5 (as a non-root user)
was fixed.

Notes for 2.0.5 release:

Enhancements:

1.) The BLAST version is printed by formatdb in its log file.

2.) Multi-database searches no longer require that the -o option be used when
preparing the databases (i.e., with formatdb).

Bugs fixed:

1.) A serious bug with multi-database iterative searches was fixed (thanks to
Steve Brenner for providing an example).

2.) 'lcl' is not formatted in the BLAST report when the sequence identifier
is a local identifier or does not contain a bar ("|").

3.) A large memory leak in formatdb was fixed.

4.) An unnecessary cast that caused formatdb to fail on Solaris 2.5 machines
if the binary was made under 2.6 was fixed.

5.) Better error checking was added to protect against core-dumps.

6.) Some problems with the sum statistics treatment of the blastx and tblastn
programs reported by D. Rozenbaum were fixed. The number of alignments
involved in a sum group was misrepresented. Also the incorrect length for
the database sequence was used, sometimes causing a slight change in the
value reported.

7.) A problem with blastpgp was fixed that reported incorrect values for
matrices other than BLOSUM62 during iterative searches.

Notes for 2.0.4 release:

Enhancements:

1.) Multiple database searches:

Version 2.0.4 will accept multiple database names (bracketed by quotations).
An example would be

    -d "nr est"

which will search both the nr and est databases, presenting the results as if one
'virtual' database consisting of all the entries from both were searched. The
statistics are based on the 'virtual' database.

2.) New options:

    -W Word size, default if zero [Integer]
       default = 0
    -z Effective length of the database (use zero for the real size) [Integer]
       default = 0

3.) The numbers of identities, positives, and gaps are now printed out before the
alignments for gapped blastx, tblastn, and tblastx. Additionally this feature is
now also enabled for ungapped BLAST.

4.) Formatdb now accepts ASN.1, as well as FASTA, as input.
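Items 1 and 2 above both affect the database length that enters the statistics. As a rough illustration, the sketch below uses the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. It is a simplified picture, not BLAST's actual calculation: the function names are hypothetical, the lambda and K values are placeholders chosen for the example, and the length corrections that BLAST applies are ignored.

```python
import math


def virtual_db_length(db_lengths, effective_length=0):
    """Length used for the statistics of a multi-database search.

    Sums the individual database lengths, unless a non-zero override
    (in the spirit of the -z option, where zero means the real size) is given.
    """
    return effective_length if effective_length > 0 else sum(db_lengths)


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Karlin-Altschul expect value; lam and k are placeholder values."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    nr_len, est_len = 1_000_000, 3_000_000  # made-up sizes for illustration
    total = virtual_db_length([nr_len, est_len])
    print(expect_value(60, 250, nr_len))   # against one database alone
    print(expect_value(60, 250, total))    # against the 'virtual' database
```

Because the expect value in this form is proportional to the database length, the same alignment score is reported with a larger E-value against the combined 'virtual' database than against either database alone.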

Bugs fixed:

1.) In blastx, tblastn, and tblastx a codon was incorrectly formatted as a start codon in
some cases.

2.) The last alignment of the last sequence being presented was incorrectly dropped
in some cases. This change could affect the statistical significance of the last database
sequence if the dropped alignment had a lower e-value than any other alignments from the
same database sequence.