<TITLE>PHI-BLAST Pattern Description</TITLE>
<BODY BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#660099" ALINK="#660099">
<h2>Rules for pattern syntax for PHI-BLAST.</h2>
The syntax for patterns in PHI-BLAST follows the conventions
of PROSITE. When using the stand-alone program, a file may
contain multiple patterns, separated from one another by a
blank line. When using the Web page, only one pattern is
allowed per query.

Valid protein characters for PHI-BLAST patterns:

ABCDEFGHIKLMNPQRSTVWXYZU

Valid DNA characters for PHI-BLAST patterns:

Other useful delimiters:

[ ] means any one of the characters enclosed in the brackets,
e.g., [LFYT] means one occurrence of L or F or Y or T
- means nothing (this is a spacer character used by PROSITE)
x with nothing following means any single residue
x(5) means 5 positions in which any residue is allowed (and similarly for any other
single number in parentheses after x)
x(2,4) means 2 to 4 positions in which any residue is allowed
(and similarly for any other two numbers separated by a comma;
the first number should be less than the second number)
> can occur only at the end of a pattern and means nothing
(another spacer used by PROSITE); it may occur before a period
. may be used at the end of the pattern and means nothing

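The rules above can be sketched as a small translator from a protein pattern to a Python regular expression. This is only an illustrative sketch, not PHI-BLAST's own parser: the function name `pattern_to_regex` is hypothetical, only the elements listed above are handled (any other PROSITE construct is rejected), and treating an uppercase X as a literal residue code rather than a wildcard is an assumption.

```python
import re

# Residue letters listed above as valid in PHI-BLAST protein patterns.
PROTEIN_CHARS = set("ABCDEFGHIKLMNPQRSTVWXYZU")

# Matches x, x(5), or x(2,4); the counts are captured when present.
WILDCARD = re.compile(r"x(?:\((\d+)(?:,(\d+))?\))?")


def pattern_to_regex(pattern):
    """Translate a PHI-BLAST pattern string into a Python regex string."""
    # Trailing '>' and '.' characters mean nothing, so drop them all.
    core = pattern.strip().rstrip(">.")
    parts = []
    # '-' is only a spacer, so splitting on it yields the pattern elements.
    for element in core.split("-"):
        if not element:
            continue
        wildcard = WILDCARD.fullmatch(element)
        if wildcard:
            low, high = wildcard.groups()
            if low is None:
                parts.append(".")  # bare x: any one residue
            elif high is None:
                parts.append(".{%d}" % int(low))  # x(5)
            else:
                if int(low) >= int(high):
                    raise ValueError("first number must be less than the second: " + element)
                parts.append(".{%d,%d}" % (int(low), int(high)))  # x(2,4)
        elif (element.startswith("[") and element.endswith("]")
              and len(element) > 2 and set(element[1:-1]) <= PROTEIN_CHARS):
            parts.append(element)  # [LFYT] is already valid regex syntax
        elif element in PROTEIN_CHARS:
            parts.append(element)  # a single literal residue (assumed to include X)
        else:
            raise ValueError("unsupported pattern element: " + element)
    return "".join(parts)


print(pattern_to_regex("[KRHQSA]-[DENQ]-E-L>."))  # [KRHQSA][DENQ]EL
```

The resulting string can be passed to `re.search` or `re.finditer` to check a pattern against a sequence before running a search.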
When using the stand-alone program, the pattern should
be in a file, with the first line starting:

ID

followed by 2 spaces and a text string giving the pattern a name.

There should also be a line starting

PA

followed by 2 spaces followed by the pattern description.

All other PROSITE codes in the first two columns are allowed,
but only the HI code, described below, is relevant to PHI-BLAST.

Here is an example from PROSITE.

ID  CNMP_BINDING_2; PATTERN.
DT  OCT-1993 (CREATED); OCT-1993 (DATA UPDATE); NOV-1995 (INFO UPDATE).
DE  Cyclic nucleotide-binding domain signature 2.
PA  [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].
NR  /TOTAL=57(36); /POSITIVE=57(36); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR  /FALSE_NEG=1; /PARTIAL=1;
CC  /TAXO-RANGE=??EP?; /MAX-REPEAT=2;

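In this example, the ID line gives the pattern a name.
The lines starting with AC, DT, DE, NR, and CC
are relevant to PROSITE users, but irrelevant to PHI-BLAST.
These lines are tolerated, but ignored by PHI-BLAST.

The PA line describes the pattern as:

one of LIVMF, followed by
G, followed by
E, followed by
any single character, followed by
one of GAS, followed by
one of LIVM, followed by
any 5 to 11 characters, followed by
R, followed by
one of STAQ, followed by
A, followed by
any single character, followed by
one of LIVMA, followed by
any single character, followed by
one of STACV

In this case the pattern ends with a period.
It can also end with nothing after the last specifying symbol,
or with any number of > signs or periods, or any combination thereof.
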
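The file layout described above (a two-column code, then the content, with a blank line between patterns) could be read with a few lines of Python. This is a hedged sketch rather than the stand-alone program's actual reader: `parse_pattern_file` is a hypothetical name, HI lines are kept as raw text because their exact layout is not shown here, and concatenating repeated PA lines within one record is an assumption.

```python
def parse_pattern_file(text):
    """Split pattern-file text into records with name, pattern, and HI lines."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates one pattern from the next.
            if current:
                records.append(current)
                current = {}
            continue
        # The code sits in the first two columns; the rest is the content.
        code, value = line[:2], line[2:].strip()
        if code == "ID":
            current["name"] = value
        elif code == "PA":
            # Assumption: several PA lines in one record are joined together.
            current["pattern"] = current.get("pattern", "") + value
        elif code == "HI":
            current.setdefault("hi", []).append(value)
        # AC, DT, DE, NR, CC and any other codes are tolerated but ignored.
    if current:
        records.append(current)
    return records


EXAMPLE = """\
ID  CNMP_BINDING_2; PATTERN.
DE  Cyclic nucleotide-binding domain signature 2.
PA  [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].

ID  ER_TARGET; PATTERN.
PA  [KRHQSA]-[DENQ]-E-L>.
"""

records = parse_pattern_file(EXAMPLE)
for record in records:
    print(record["name"], "->", record["pattern"])
```

Keeping the ignored codes out of the record mirrors the statement above that such lines are tolerated but play no role in PHI-BLAST.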
Here is another example, illustrating the use of an HI line.

ID  ER_TARGET; PATTERN.
PA  [KRHQSA]-[DENQ]-E-L>.

In this example, the HI lines specify that the pattern
occurs twice, once from positions 19 through 22 in the
sequence and once from positions 201 through 204 in the
sequence.
These specifications are relevant when stand-alone PHI-BLAST is
used with the "seedp" option, in which the interesting occurrences
of the pattern in the sequence are specified. In this case the
HI lines specify which occurrence(s) of the pattern
should be used to find good alignments.

In general, the seedp option is more useful than the
standard patternp option ONLY when the
pattern occurs K > 1 times in the sequence AND
the user is interested in matching to J < K of those
occurrences.
Then using the HI lines enables the user to specify which
occurrences are of interest.
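
To decide whether HI lines are worth writing, it helps to know how many times the pattern occurs in the query (the K above) and at which positions. The sketch below lists occurrences with 1-based, inclusive positions. It is an illustration under stated assumptions, not PHI-BLAST's own matching code: `find_occurrences` is a hypothetical helper, the regular expression is a hand translation of the ER_TARGET pattern, and the sequence is a made-up toy string.

```python
import re


def find_occurrences(regex, sequence):
    """Return (start, end) pairs, 1-based and inclusive, for every match."""
    # A lookahead lets matches that overlap each other be reported too;
    # at most one match (the greedy one) is reported per start position.
    finder = re.compile("(?=(%s))" % regex)
    return [(m.start(1) + 1, m.end(1)) for m in finder.finditer(sequence)]


# Hand translation of the pattern [KRHQSA]-[DENQ]-E-L>. shown above.
ER_TARGET_REGEX = "[KRHQSA][DENQ]EL"

# Toy sequence invented for this illustration only.
toy_sequence = "MKDELAAAAHDELGG"

occurrences = find_occurrences(ER_TARGET_REGEX, toy_sequence)
print(occurrences)  # [(2, 5), (10, 13)]
```

Here K is 2, so HI lines would only be needed if just one of the two occurrences is of interest.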