50
58
//-------------------------------------------------------------
53
@page TOPP_OMSSAAdapter OMSSAAdapter
61
@page TOPP_OMSSAAdapter OMSSAAdapter
55
@brief Identifies peptides in MS/MS spectra via OMSSA (Open Mass Spectrometry Search Algorithm).
63
@brief Identifies peptides in MS/MS spectra via OMSSA (Open Mass Spectrometry Search Algorithm).
60
<td ALIGN = "center" BGCOLOR="#EBEBEB"> pot. predecessor tools </td>
61
<td VALIGN="middle" ROWSPAN=2> \f$ \longrightarrow \f$ OMSSAAdapter \f$ \longrightarrow \f$</td>
62
<td ALIGN = "center" BGCOLOR="#EBEBEB"> pot. successor tools </td>
65
<td VALIGN="middle" ALIGN = "center" ROWSPAN=1> any signal-/preprocessing tool @n (in mzML format)</td>
66
<td VALIGN="middle" ALIGN = "center" ROWSPAN=1> @ref TOPP_IDFilter or @n any protein/peptide processing tool</td>
68
<td ALIGN = "center" BGCOLOR="#EBEBEB"> pot. predecessor tools </td>
69
<td VALIGN="middle" ROWSPAN=2> \f$ \longrightarrow \f$ OMSSAAdapter \f$ \longrightarrow \f$</td>
70
<td ALIGN = "center" BGCOLOR="#EBEBEB"> pot. successor tools </td>
73
<td VALIGN="middle" ALIGN = "center" ROWSPAN=1> any signal-/preprocessing tool @n (in mzML format)</td>
74
<td VALIGN="middle" ALIGN = "center" ROWSPAN=1> @ref TOPP_IDFilter or @n any protein/peptide processing tool</td>
71
@em OMSSA must be installed on the system to be able to use the @em OMSSAAdapter. See pubchem.ncbi.nlm.nih.gov/omssa/
72
for further information on how to download and install @em OMSSA on your system. You might find that the latest OMSSA version
73
does not run on your system (to test this, run @em omssacl in your OMMSA/bin/ directory and see if it crashes). If you encounter
74
an error message, try another OMSSA version
76
Sequence databases in FASTA format must be converted into the NCBI format before OMSSA can read them. Therefore, use the program formatdb
77
of the NCBI-tools suite (see ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.13/ for a working version).
78
The latest NCBI BLAST distribution does not contain the formatdb executable any longer!).
79
Use @em formatdb @em -i @em SwissProt_TargetAndDecoy.fasta @em -o to create
80
additional files, which will be used by @em OMSSA. The database option of the @em OMSSAAdapter should contain the name of the psq file
81
, e.g., 'SwissProt_TargetAndDecoy.fasta.psq'. The '.psq' suffix can also be omitted, e.g. 'SwissProt_TargetAndDecoy.fasta' and will be added
83
This makes it easy to specify a common TOPPAS input node (using only the FASTA suffix) for many adapters.
85
This adapter supports relative database filenames, which (when not found in the current working directory) is looked up in
86
the directories specified by 'OpenMS.ini:id_db_dir' (see @subpage TOPP_advanced).
88
The options that specify the protease specificity (@em e) are directly taken from OMSSA. A complete list of available
89
proteases can be found by executing @em omssacl @em -el.
91
This wrapper has been tested successfully with OMSSA, version 2.x.
79
@em OMSSA must be installed on the system to be able to use the @em OMSSAAdapter. See pubchem.ncbi.nlm.nih.gov/omssa/
80
for further information on how to download and install @em OMSSA on your system. You might find that the latest OMSSA version
81
does not run on your system (to test this, run @em omssacl in your OMMSA/bin/ directory and see if it crashes). If you encounter
82
an error message, try another OMSSA version
84
Sequence databases in FASTA format must be converted into the NCBI format before OMSSA can read them.
85
For this, you can use the program 'formatdb' (old releases) or 'makeblastdb' (recent release) of the NCBI-tools suite, which is freely available for download.
86
Use @em formatdb @em -i @em SwissProt_TargetAndDecoy.fasta @em -o to create a BLAST database, which actually consists of multiple files.
87
The more recent 'makeblastdb' has a similar syntax, e.g., @em makeblastdb @em -dbtype @em prot @em -in @em SwissProt_TargetAndDecoy.fasta .
89
Make sure that your FASTA file (which you convert into BLAST database files) is properly formatted, especially
90
that it conforms to the FASTA Defline format and that there is a description(!), i.e. '>ID DESCRIPTION'. Otherwise you might get a
93
"..\..\..\..\..\..\..\src\algo\ms\omssa\omssacl.cpp", line 282: Fatal: COMSSA::Run() - Exception in COMSSA::Run: NCBI C++ Exception:
94
"..\..\..\..\..\src\serial\serialobject.cpp", line 228: Error: NCBI-BlastDL::Blast-def-line.title
97
As database parameter for the OMSSAAdapter you can either specify the .psq file as generated by formatdb/makeblastdb
98
(e.g., 'SwissProt_TargetAndDecoy.fasta.psq') or the original FASTA file (e.g., 'SwissProt_TargetAndDecoy.fasta').
99
This makes it easy to specify a common TOPPAS input node (using only the FASTA suffix) for many adapters.
100
Just make sure that the .psq and .fasta file reside in the same directory!
102
This adapter supports relative database filenames, which (when not found in the current working directory) is looked up in
103
the directories specified by 'OpenMS.ini:id_db_dir' (see @subpage TOPP_advanced).
105
The options that specify the protease specificity (@em e) are directly taken from OMSSA. A complete list of available
106
proteases can be found by executing @em omssacl @em -el.
108
This wrapper has been tested successfully with OMSSA, version 2.x.
93
110
@note OMSSA search is much faster when the database (.psq files etc.) is accessed locally, rather than over a network share (we measured 10x speed increase in some cases).
95
<B>The command line parameters of this tool are:</B>
96
@verbinclude TOPP_OMSSAAdapter.cli
112
<B>The command line parameters of this tool are:</B>
113
@verbinclude TOPP_OMSSAAdapter.cli
114
<B>INI file documentation of this tool:</B>
115
@htmlinclude TOPP_OMSSAAdapter.html
98
@improvement modes to read OMSSA output data and save in idXML format (Andreas)
117
@improvement modes to read OMSSA output data and save in idXML format (Andreas)
101
120
// We do not want this class to show up in the docu:
102
121
/// @cond TOPPCLASSES
105
class TOPPOMSSAAdapter
124
class TOPPOMSSAAdapter :
110
: TOPPBase("OMSSAAdapter","Annotates MS/MS spectra using OMSSA.")
129
TOPPBase("OMSSAAdapter", "Annotates MS/MS spectra using OMSSA.")
138
omssa_major(0), omssa_minor(0), omssa_patch(0)
141
OMSSAVersion(Int maj, Int min, Int pat) :
142
omssa_major(maj), omssa_minor(min), omssa_patch(pat)
149
bool operator<(const OMSSAVersion & v) const
119
: omssa_major(0), omssa_minor(0), omssa_patch(0)
122
OMSSAVersion (Int maj, Int min, Int pat)
123
: omssa_major(maj), omssa_minor(min), omssa_patch(pat)
130
bool operator < (const OMSSAVersion& v) const
151
if (omssa_major > v.omssa_major) return false;
152
else if (omssa_major < v.omssa_major) return true;
132
if (omssa_major > v.omssa_major) return false;
133
else if (omssa_major < v.omssa_major) return true;
155
if (omssa_minor > v.omssa_minor) return false;
156
else if (omssa_minor < v.omssa_minor) return true;
136
if (omssa_minor > v.omssa_minor) return false;
137
else if (omssa_minor < v.omssa_minor) return true;
140
return (omssa_patch < v.omssa_patch);
159
return omssa_patch < v.omssa_patch;
147
bool getVersion_(const String& version, OMSSAVersion& omssa_version_i) const
149
// we expect three components
150
IntList nums = IntList::create(StringList::create(version,'.'));
151
if (nums.size()!=3) return false;
153
omssa_version_i.omssa_major =nums[0];
154
omssa_version_i.omssa_minor =nums[1];
155
omssa_version_i.omssa_patch =nums[2];
159
void registerOptionsAndFlags_()
163
addText_("Common Identification engine options");
165
registerInputFile_("in", "<file>", "", "Input file ");
166
setValidFormats_("in",StringList::create("mzML"));
167
registerOutputFile_("out", "<file>", "", "Output file ");
168
setValidFormats_("out",StringList::create("idXML"));
170
registerDoubleOption_("precursor_mass_tolerance", "<tolerance>", 1.5, "Precursor mass tolerance (Default: Dalton)", false);
171
registerFlag_("precursor_mass_tolerance_unit_ppm", "If this flag is set, ppm is used as precursor mass tolerance unit");
172
registerDoubleOption_("fragment_mass_tolerance", "<tolerance>", 0.3, "Fragment mass error in Dalton", false);
173
registerInputFile_("database", "<psq-file>", "", "NCBI formatted FASTA files. Only the .psq filename should be given, e.g. 'SwissProt.fasta.psq'. If the filename does not end in '.psq' the suffix will be added automatically. Non-existing relative file-names are looked up via'OpenMS.ini:id_db_dir'", true, false, StringList::create("skipexists"));
174
//setValidFormats_("database",StringList::create("psq")); // 'psq' not supported yet, and not required in this case
175
registerIntOption_("min_precursor_charge", "<charge>", 1, "Minimum precursor ion charge", false);
176
registerIntOption_("max_precursor_charge", "<charge>", 3, "Maximum precursor ion charge", false);
177
vector<String> all_mods;
178
ModificationsDB::getInstance()->getAllSearchModifications(all_mods);
179
registerStringList_("fixed_modifications", "<mods>", StringList::create(""), "Fixed modifications, specified using UniMod (www.unimod.org) terms, e.g. 'Carbamidomethyl (C)' or 'Oxidation (M)'", false);
180
setValidStrings_("fixed_modifications", all_mods);
181
registerStringList_("variable_modifications", "<mods>", StringList::create(""), "Variable modifications, specified using UniMod (www.unimod.org) terms, e.g. 'Carbamidomethyl (C)' or 'Oxidation (M)'", false);
182
setValidStrings_("variable_modifications", all_mods);
185
addText_("OMSSA specific input options");
188
//-d <String> Blast sequence library to search. Do not include .p* filename suffixes.
189
//-pc <Integer> The number of pseudocounts to add to each precursor mass bin.
190
//registerStringOption_("d", "<file>", "", "Blast sequence library to search. Do not include .p* filename suffixes", true);
191
registerInputFile_("omssa_executable", "<executable>", "omssacl", "The 'omssacl' executable of the OMSSA installation", true, false, StringList::create("skipexists"));
192
registerInputFile_("omssa_user_mods", "<file>", "", "additional <MSModSpec> subtrees of user modifications.\nSubtrees will be pasted into OMSSAAdapter generated user mod files.\nSee http://www.ncbi.nlm.nih.gov/data_specs/schema/OMSSA.mod.xsd for details about user mod file definition.", false, true, StringList::create("input file"));
193
registerIntOption_("pc", "<Integer>", 1, "The number of pseudocounts to add to each precursor mass bin", false, true);
195
//registerFlag_("omssa_out", "If this flag is set, the parameter 'in' is considered as an output file of OMSSA and will be converted to IdXML");
196
//registerStringOption_("omssa_out_format", "<type>", "", "Specifies the output format of OMSSA, if not given the format will be estimated", false);
199
//Input format and filename
200
//-f <String> single dta file to search
201
//-fx <String> multiple xml-encapsulated dta files to search
202
//-fb <String> multiple dta files separated by blank lines to search
203
//-fm <String> mgf formatted file
204
//-fp <String> pkl formatted file
205
//-hs <Integer> the minimum number of m/z values a spectrum must have to be searched
206
//-fxml <String> omssa xml search request file (contains search parameters and spectra. overrides command line)
207
//-pm <String> search parameter input in xml format (contains search parameters but no spectra. overrides command line except for name of file containing spectra)
208
// input options are not all necessary as TOPP tools only accept mzML
209
registerIntOption_("hs", "<Integer>", 4, "the minimum number of m/z values a spectrum must have to be searched", false, true);
210
//registerStringOption_("pm", "<file>", "", "search parameter input in xml format", false);
213
//-o <String> filename for text asn.1 formatted search results
214
//-ob <String> filename for binary asn.1 formatted search results
215
//-ox <String> filename for xml formatted search results
216
//-oc <String> filename for comma separated value (excel .csv) formatted search results
217
// output options of OMSSA are not necessaryOMSSA
219
//The following options output the search parameters and search spectra in the output results. This is necessary for viewing result in the OMSSA browser:
220
//-w include spectra and search params in search results
222
//To turn off informational messages (but not error messages), use:
223
//-ni don't print informational messages
225
//Mass type and tolerance
226
//-to <Real> product ion mass tolerance in Da
227
//-te <Real> precursor ion mass tolerance in Da
228
//-tez <Integer> scaling of precursor mass tolerance with charge (0 = none, 1= linear)
229
//registerDoubleOption_("to", "<Real>", 0.8, "product ion mass tolerance in Da", false);
230
//registerDoubleOption_("te", "<Real>", 2.0, "precursor ion mass tolerance in Da", false);
231
registerIntOption_("tez", "<Integer>", 1, "scaling of precursor mass tolerance with charge (0 = none, 1= linear)", false, true);
233
//A precursor ion is the ion before fragmentation and the product ions are the ions generated after fragmentation. These values are specified in Daltons +/- the measured value, e.g. a value of 2.0 means +/- 2.0 Daltons of the measured value.
234
//The tez value allows you to specify how the mass tolerance scales with the charge of the precursor. For example, you may search a precursor assuming that it has a charge state of 2+ and 3+. If you set tez to 1, then the mass tolerance for the +2 charge state will be 2 times the precursor mass tolerance, and for the 3+ charge state it will be 3 times the precursor mass tolerance. If you set tez to 0, the mass tolerance will always be equal to the precursor mass tolerance, irrespective of charge state.
236
//-tom <Integer> product ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact.
237
//-tem <Integer> precursor ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact.
238
registerIntOption_("tom", "<Integer>", 0, "product ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact", false, true);
239
registerIntOption_("tem", "<Integer>", 0, "precursor ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact", false, true);
241
//Monoisotopic searching searches spectral peaks that correspond to peptides consisting entirely of carbon-12. Average mass searching searches on the average natural isotopic mass of peptides. Exact mass searches on the most abundant isotopic peak for a given mass range.
243
//-tex <Double> threshold in Da above which the mass of a neutron should be added in an exact mass search.
244
registerDoubleOption_("tex", "<Real>", 1446.94, "threshold in Da above which the mass of a neutron should be added in an exact mass search", false, true);
247
//Preprocessing is the process of eliminating noise from a spectrum. Normally, you do not need to adjust these options as OMSSA automatically adjusts its preprocessing for best results.
249
//-cl <Real> low intensity cutoff as a fraction of max peak
250
//-ch <Real> high intensity cutoff as a fraction of max peak
251
//-ci <Real> intensity cutoff increment as a fraction of max peak
252
//-w1 <Integer> single charge window in Da
253
//-w2 <Integer> double charge window in Da
254
//-h1 <Integer> number of peaks allowed in single charge window
255
//-h2 <Integer> number of peaks allowed in double charge window
256
//-cp <Integer> eliminate charge reduced precursors in spectra (0=no, 1=yes). Typically turned on for ETD spectra.
259
//Determination of precursor charge and product ion charges. Presently, OMSSA estimates which precursors are 1+. All other precursors are searched with charge from the minimum to maximum precursor charge specified.
260
//-zl <Integer> minimum precursor charge to search when not 1+
261
//-zh <Integer> maximum precursor charge to search when not 1+
262
//-zt <Integer> minimum precursor charge to start considering multiply charged products
263
//-z1 <Double> the fraction of peaks below the precursor used to determine if the spectrum is charge +1
264
//-zc <Integer> should charge +1 be determined algorithmically (1=yes)
265
//-zcc <Integer> how should precursor charges be determined? (1=believe the input file,2=use the specified range)
266
//-zoh <Integer> set the maximum product charge to search
268
//registerIntOption_("zl", "<Integer>", 1, "minimum precursor charge to search when not 1+", false);
269
//registerIntOption_("zh", "<Integer>", 3, "maximum precursor charge to search when not 1+", false);
270
registerIntOption_("zt", "<Integer>", 3, "minimum precursor charge to start considering multiply charged products", false, true);
271
registerDoubleOption_("z1", "<Real>", 0.95, "the fraction of peaks below the precursor used to determine if the spectrum is charge +1", false, true);
272
registerIntOption_("zc", "<Integer>", 1, "should charge +1 be determined algorithmically (1=yes)", false, true);
273
registerIntOption_("zcc", "<Integer>", 2, "how should precursor charges be determined? (1=believe the input file,2=use the specified range)", false, true);
274
registerIntOption_("zoh", "<Integer>", 2, "set the maximum product charge to search", false, true);
276
//Enzyme specification
277
//Additional enzymes can be added upon request.
278
//-v <Integer> number of missed cleavages allowed
279
//-e <Integer> id number of enzyme to use (trypsin is the default)
280
//-el print a list of enzymes and their corresponding id number
281
//-no <Integer> minimum size of peptides for no-enzyme and semi-tryptic searches
282
//-nox <Integer> maximum size of peptides for no-enzyme and semi-tryptic searches
283
registerIntOption_("v", "<Integer>", 1, "number of missed cleavages allowed", false);
284
registerIntOption_("e", "<Integer>", 0, "id number of enzyme to use (trypsin is the default)", false);
285
registerIntOption_("no", "<Integer>", 4, "minimum size of peptides for no-enzyme and semi-tryptic searches", false, true);
286
registerIntOption_("nox", "<Integer>", 40, "maximum size of peptides for no-enzyme and semi-tryptic searches", false, true);
289
//OMSSA searches two ions series, both of which can be specified. Normally one of the ion series specified is a forward ion series and the other is a reverse ion series.
290
//-il print a list of ions and their corresponding id number
291
//-i comma delimited list of id numbers of ions to search
292
//-sp <Integer> number of product ions to search
293
//-sb1 <Integer> should first forward (e.g. b1) product ions be searched (1 = no, 0 = yes)
294
//-sct <Integer> should c terminus ions (e.g. y1) be searched (1 = no, 0 = yes)
295
registerStringOption_("i", "<Num>,<Num>,<Num>", "1,4", "comma delimited list of id numbers of ions to search", false, true);
296
registerIntOption_("sp", "<Integer>", 100, "number of product ions to search", false, true);
297
registerIntOption_("sb1", "<Integer>", 1, "should first forward (e.g. b1) product ions be searched (1 = no, 0 = yes)", false, true);
298
registerIntOption_("sct", "<Integer>", 0, "should c terminus ions (e.g. y1) be searched (1 = no, 0 = yes)", false, true);
301
//By default, OMSSA searches without limiting by taxonomy. By specifying an NCBI taxonomy id, you can limit your search to a particular organism. The taxonomy id can by found by searching the NCBI tax browser (enter the scientific name of the organism of interest in the search box and then click the correct search result and then the scientific name in the taxonomy browser to get the numeric taxonomy id).
302
//-x comma delimited list of NCBI taxonomy ids to search (0 = all. This is the default)
303
registerStringOption_("x", "<Num>,<Num>,<Num>", "0", "comma delimited list of NCBI taxonomy ids to search (0 = all. This is the default)", false, true);
305
//Search heuristic parameters
306
//These are options that can speed up the search. They can result in decreased sensitivity
307
//-hm <Integer> the minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded
308
//-ht <Integer> number of m/z values corresponding to the most intense peaks that must include one match to the theoretical peptide
309
registerIntOption_("hm", "<Integer>", 2, "the minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded", false, true);
310
registerIntOption_("ht", "<Integer>", 6, "number of m/z values corresponding to the most intense peaks that must include one match to the theoretical peptide", false, true);
313
//-hl <Integer> maximum number of hits retained for one spectrum
314
//-he <Double> the maximum e-value allowed in the hit list
315
registerIntOption_("hl", "<Integer>", 30, "maximum number of hits retained for one spectrum. Note: even when set to 1 OMSSA may report multiple hits with different charge states", false);
316
registerDoubleOption_("he", "<Real>", 1, "the maximum e-value allowed in the hit list", false);
318
//Post translational modifications
319
//To specify modifications, first type in "omssacl -ml" to see a list of modifications available and their corresponding id number. Then when running the search, specify the id numbers of the modification you wish to apply, e.g. "omssacl -mf 5 -mv 1,8 ...". Multiple PTMs can be specified by placing commas between the numbers without any spaces. At the present time, the list of allowed post translational modifications will be expanded over time.
320
//-mf comma delimited list of id numbers for fixed modifications
321
//-mv comma delimited list of id numbers for variable modifications
322
//-ml print a list of modifications and their corresponding id number
323
//registerStringOption_("mf", "<Num>,<Num>,<Num>", "", "comma delimited list of id numbers for fixed modifications", false);
324
//registerStringOption_("mv", "<Num>,<Num>,<Num>", "", "comma delimited list of id numbers for variable modifications", false);
326
//registerStringOption_("mux", "<file>", "", "use the given file which contains user modifications in OMSSA modifications xml format", false);
329
//To add your own user defined modifications, edit the usermod0-29 entries in the mods.xml file. If it is common modification, please contact NCBI so that it can be added to the standard list.
330
//To reduce the combinatorial expansion that results when specifying multiple variable modifications, you can put an upper bound on the number of mass ladders generated per peptide using the -mm option. The ladders are generated in the order of the least number of modification to the most number of modifications.
331
//-mm <Integer> the maximum number of mass ladders to generate per database peptide
332
registerIntOption_("mm", "<Integer>", 128, "the maximum number of mass ladders to generate per database peptide", false, true);
334
//There is an upper bound on the number of combinations of variable mods that can be applied to a peptide from the sequence library. The hard upper bound is 1024, which effectively limits the number of variable modification sites per peptide for an exhaustive search to 10. If you set this number too low, you will miss highly modified peptides. If you set it too high, it will make the e-values less significant by searching for too many possible modifications.
335
//OMSSA treats cleavage of the initial methionine in each protein record as a variable modification by default. To turn off this behavior use the command line option
336
//-mnm n-term methionine should not be cleaved
337
registerFlag_("mnm", "n-term methionine should not be cleaved", true);
339
//Iterative searching
340
//-is <Double> evalue threshold to include a sequence in the iterative search, 0 = all
341
//-ir <Double> evalue threshold to replace a hit, 0 = only if better
342
//-ii <Double> evalue threshold to iteratively search a spectrum again, 0 = always
343
registerDoubleOption_("is", "<Real>", 0.0, "evalue threshold to include a sequence in the iterative search, 0 = all", false, true);
344
registerDoubleOption_("ir", "<Real>", 0.0, "evalue threshold to replace a hit, 0 = only if better", false, true);
345
registerDoubleOption_("ii", "<Real>", 0.0, "evalue threshold to iteratively search a spectrum again, 0 = always", false, true);
348
//-foms <String> read in search result in .oms format (binary asn.1).
349
//-fomx <Double> read in search result in .omx format (xml).
350
//Iterative searching is the ability to re-search search results in hopes of increasing the number of spectra identified. To accomplish this, an iterative search may change search parameters, such as using a no-enzyme search, or restrict the sequence search library to sequences already hit.
354
ExitCodes main_(int , const char**)
357
// path to the log file
358
String logfile(getStringOption_("log"));
359
String omssa_executable(getStringOption_("omssa_executable"));
360
String unique_name = QDir::toNativeSeparators(String(File::getTempDirectory() + "/" + File::getUniqueName()).toQString()); // body for the tmp files
361
String unique_input_name = unique_name + "_OMSSA.mgf";
362
String unique_output_name = unique_name + "_OMSSA.xml";
363
String unique_usermod_name = unique_name + "_OMSSA_user_mod_file.xml";
365
//-------------------------------------------------------------
366
// parsing parameters
367
//-------------------------------------------------------------
369
// get version of OMSSA
371
String call = omssa_executable + " -version";
372
qp.start(call.toQString(), QIODevice::ReadOnly); // does automatic escaping etc...
373
bool success = qp.waitForFinished();
374
String output (QString(qp.readAllStandardOutput ()));
375
String omssa_version;
376
OMSSAVersion omssa_version_i;
377
if (!success || qp.exitStatus() != 0 || qp.exitCode()!=0)
379
writeLog_("Warning: unable to determine the version of OMSSA - the process returned an error. Call string was: '" + call + "'. Make sure that the path to the OMSSA executable is correct!");
380
return ILLEGAL_PARAMETERS;
167
bool getVersion_(const String & version, OMSSAVersion & omssa_version_i) const
169
// we expect three components
170
IntList nums = IntList::create(StringList::create(version, '.'));
171
if (nums.size() != 3) return false;
173
omssa_version_i.omssa_major = nums[0];
174
omssa_version_i.omssa_minor = nums[1];
175
omssa_version_i.omssa_patch = nums[2];
179
void registerOptionsAndFlags_()
181
registerInputFile_("in", "<file>", "", "Input file ");
182
setValidFormats_("in", StringList::create("mzML"));
183
registerOutputFile_("out", "<file>", "", "Output file ");
184
setValidFormats_("out", StringList::create("idXML"));
186
registerDoubleOption_("precursor_mass_tolerance", "<tolerance>", 1.5, "Precursor mass tolerance (Default: Dalton)", false);
187
registerFlag_("precursor_mass_tolerance_unit_ppm", "If this flag is set, ppm is used as precursor mass tolerance unit");
188
registerDoubleOption_("fragment_mass_tolerance", "<tolerance>", 0.3, "Fragment mass error in Dalton", false);
189
registerInputFile_("database", "<psq-file>", "", "NCBI formatted FASTA files. Only the .psq filename should be given, e.g. 'SwissProt.fasta.psq'. If the filename does not end in '.psq' the suffix will be added automatically. Non-existing relative file-names are looked up via'OpenMS.ini:id_db_dir'", true, false, StringList::create("skipexists"));
190
setValidFormats_("database",StringList::create("psq,fasta"));
191
registerIntOption_("min_precursor_charge", "<charge>", 1, "Minimum precursor ion charge", false);
192
registerIntOption_("max_precursor_charge", "<charge>", 3, "Maximum precursor ion charge", false);
193
vector<String> all_mods;
194
ModificationsDB::getInstance()->getAllSearchModifications(all_mods);
195
registerStringList_("fixed_modifications", "<mods>", StringList::create(""), "Fixed modifications, specified using UniMod (www.unimod.org) terms, e.g. 'Carbamidomethyl (C)' or 'Oxidation (M)'", false);
196
setValidStrings_("fixed_modifications", all_mods);
197
registerStringList_("variable_modifications", "<mods>", StringList::create(""), "Variable modifications, specified using UniMod (www.unimod.org) terms, e.g. 'Carbamidomethyl (C)' or 'Oxidation (M)'", false);
198
setValidStrings_("variable_modifications", all_mods);
201
//-d <String> Blast sequence library to search. Do not include .p* filename suffixes.
202
//-pc <Integer> The number of pseudocounts to add to each precursor mass bin.
203
//registerStringOption_("d", "<file>", "", "Blast sequence library to search. Do not include .p* filename suffixes", true);
204
registerInputFile_("omssa_executable", "<executable>", "omssacl", "The 'omssacl' executable of the OMSSA installation", true, false, StringList::create("skipexists"));
205
registerIntOption_("pc", "<Integer>", 1, "The number of pseudocounts to add to each precursor mass bin", false, true);
207
//registerFlag_("omssa_out", "If this flag is set, the parameter 'in' is considered as an output file of OMSSA and will be converted to idXML");
208
//registerStringOption_("omssa_out_format", "<type>", "", "Specifies the output format of OMSSA, if not given the format will be estimated", false);
211
//Input format and filename
212
//-f <String> single dta file to search
213
//-fx <String> multiple xml-encapsulated dta files to search
214
//-fb <String> multiple dta files separated by blank lines to search
215
//-fm <String> mgf formatted file
216
//-fp <String> pkl formatted file
217
//-hs <Integer> the minimum number of m/z values a spectrum must have to be searched
218
//-fxml <String> omssa xml search request file (contains search parameters and spectra. overrides command line)
219
//-pm <String> search parameter input in xml format (contains search parameters but no spectra. overrides command line except for name of file containing spectra)
220
// input options are not all necessary as TOPP tools only accept mzML
221
registerIntOption_("hs", "<Integer>", 4, "the minimum number of m/z values a spectrum must have to be searched", false, true);
222
//registerStringOption_("pm", "<file>", "", "search parameter input in xml format", false);
225
//-o <String> filename for text asn.1 formatted search results
226
//-ob <String> filename for binary asn.1 formatted search results
227
//-ox <String> filename for xml formatted search results
228
//-oc <String> filename for comma separated value (excel .csv) formatted search results
229
// output options of OMSSA are not necessaryOMSSA
231
//The following options output the search parameters and search spectra in the output results. This is necessary for viewing result in the OMSSA browser:
232
//-w include spectra and search params in search results
234
//To turn off informational messages (but not error messages), use:
235
//-ni don't print informational messages
237
//Mass type and tolerance
238
//-to <Real> product ion mass tolerance in Da
239
//-te <Real> precursor ion mass tolerance in Da
240
//-tez <Integer> scaling of precursor mass tolerance with charge (0 = none, 1= linear)
241
//registerDoubleOption_("to", "<Real>", 0.8, "product ion mass tolerance in Da", false);
242
//registerDoubleOption_("te", "<Real>", 2.0, "precursor ion mass tolerance in Da", false);
243
registerIntOption_("tez", "<Integer>", 1, "scaling of precursor mass tolerance with charge (0 = none, 1= linear)", false, true);
245
//A precursor ion is the ion before fragmentation and the product ions are the ions generated after fragmentation. These values are specified in Daltons +/- the measured value, e.g. a value of 2.0 means +/- 2.0 Daltons of the measured value.
246
//The tez value allows you to specify how the mass tolerance scales with the charge of the precursor. For example, you may search a precursor assuming that it has a charge state of 2+ and 3+. If you set tez to 1, then the mass tolerance for the +2 charge state will be 2 times the precursor mass tolerance, and for the 3+ charge state it will be 3 times the precursor mass tolerance. If you set tez to 0, the mass tolerance will always be equal to the precursor mass tolerance, irrespective of charge state.
248
//-tom <Integer> product ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact.
249
//-tem <Integer> precursor ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact.
250
registerIntOption_("tom", "<Integer>", 0, "product ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact", false, true);
251
registerIntOption_("tem", "<Integer>", 0, "precursor ion search type, with 0 = monoisotopic, 1 = average, 2 = monoisotopic N15, 3 = exact", false, true);
253
//Monoisotopic searching searches spectral peaks that correspond to peptides consisting entirely of carbon-12. Average mass searching searches on the average natural isotopic mass of peptides. Exact mass searches on the most abundant isotopic peak for a given mass range.
255
//-tex <Double> threshold in Da above which the mass of a neutron should be added in an exact mass search.
256
registerDoubleOption_("tex", "<Real>", 1446.94, "threshold in Da above which the mass of a neutron should be added in an exact mass search", false, true);
259
//Preprocessing is the process of eliminating noise from a spectrum. Normally, you do not need to adjust these options as OMSSA automatically adjusts its preprocessing for best results.
261
//-cl <Real> low intensity cutoff as a fraction of max peak
262
//-ch <Real> high intensity cutoff as a fraction of max peak
263
//-ci <Real> intensity cutoff increment as a fraction of max peak
264
//-w1 <Integer> single charge window in Da
265
//-w2 <Integer> double charge window in Da
266
//-h1 <Integer> number of peaks allowed in single charge window
267
//-h2 <Integer> number of peaks allowed in double charge window
268
//-cp <Integer> eliminate charge reduced precursors in spectra (0=no, 1=yes). Typically turned on for ETD spectra.
271
//Determination of precursor charge and product ion charges. Presently, OMSSA estimates which precursors are 1+. All other precursors are searched with charge from the minimum to maximum precursor charge specified.
272
//-zl <Integer> minimum precursor charge to search when not 1+
273
//-zh <Integer> maximum precursor charge to search when not 1+
274
//-zt <Integer> minimum precursor charge to start considering multiply charged products
275
//-z1 <Double> the fraction of peaks below the precursor used to determine if the spectrum is charge +1
276
//-zc <Integer> should charge +1 be determined algorithmically (1=yes)
277
//-zcc <Integer> how should precursor charges be determined? (1=believe the input file,2=use the specified range)
278
//-zoh <Integer> set the maximum product charge to search
280
//registerIntOption_("zl", "<Integer>", 1, "minimum precursor charge to search when not 1+", false);
281
//registerIntOption_("zh", "<Integer>", 3, "maximum precursor charge to search when not 1+", false);
282
registerIntOption_("zt", "<Integer>", 3, "minimum precursor charge to start considering multiply charged products", false, true);
283
registerDoubleOption_("z1", "<Real>", 0.95, "the fraction of peaks below the precursor used to determine if the spectrum is charge +1", false, true);
284
registerIntOption_("zc", "<Integer>", 1, "should charge +1 be determined algorithmically (1=yes)", false, true);
285
registerIntOption_("zcc", "<Integer>", 2, "how should precursor charges be determined? (1=believe the input file,2=use the specified range)", false, true);
286
registerIntOption_("zoh", "<Integer>", 2, "set the maximum product charge to search", false, true);
288
//Enzyme specification
289
//Additional enzymes can be added upon request.
290
//-v <Integer> number of missed cleavages allowed
291
//-e <Integer> id number of enzyme to use (trypsin is the default)
292
//-el print a list of enzymes and their corresponding id number
293
//-no <Integer> minimum size of peptides for no-enzyme and semi-tryptic searches
294
//-nox <Integer> maximum size of peptides for no-enzyme and semi-tryptic searches
295
registerIntOption_("v", "<Integer>", 1, "number of missed cleavages allowed", false);
296
registerIntOption_("e", "<Integer>", 0, "id number of enzyme to use (0 (i.e. trypsin) is the default, 17 would be no enzyme (i.e. unspecific digestion), for more please refer to omssacl -help).", false);
297
registerIntOption_("no", "<Integer>", 4, "minimum size of peptides for no-enzyme and semi-tryptic searches", false, true);
298
registerIntOption_("nox", "<Integer>", 40, "maximum size of peptides for no-enzyme and semi-tryptic searches", false, true);
301
//OMSSA searches two ions series, both of which can be specified. Normally one of the ion series specified is a forward ion series and the other is a reverse ion series.
302
//-il print a list of ions and their corresponding id number
303
//-i comma delimited list of id numbers of ions to search
304
//-sp <Integer> number of product ions to search
305
//-sb1 <Integer> should first forward (e.g. b1) product ions be searched (1 = no, 0 = yes)
306
//-sct <Integer> should c terminus ions (e.g. y1) be searched (1 = no, 0 = yes)
307
registerStringOption_("i", "<Num>,<Num>,<Num>", "1,4", "comma delimited list of id numbers of ions to search", false, true);
308
registerIntOption_("sp", "<Integer>", 100, "number of product ions to search", false, true);
309
registerIntOption_("sb1", "<Integer>", 1, "should first forward (e.g. b1) product ions be searched (1 = no, 0 = yes)", false, true);
310
registerIntOption_("sct", "<Integer>", 0, "should c terminus ions (e.g. y1) be searched (1 = no, 0 = yes)", false, true);
313
//By default, OMSSA searches without limiting by taxonomy. By specifying an NCBI taxonomy id, you can limit your search to a particular organism. The taxonomy id can by found by searching the NCBI tax browser (enter the scientific name of the organism of interest in the search box and then click the correct search result and then the scientific name in the taxonomy browser to get the numeric taxonomy id).
314
//-x comma delimited list of NCBI taxonomy ids to search (0 = all. This is the default)
315
registerStringOption_("x", "<Num>,<Num>,<Num>", "0", "comma delimited list of NCBI taxonomy ids to search (0 = all. This is the default)", false, true);
317
//Search heuristic parameters
318
//These are options that can speed up the search. They can result in decreased sensitivity
319
//-hm <Integer> the minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded
320
//-ht <Integer> number of m/z values corresponding to the most intense peaks that must include one match to the theoretical peptide
321
registerIntOption_("hm", "<Integer>", 2, "the minimum number of m/z matches a sequence library peptide must have for the hit to the peptide to be recorded", false, true);
322
registerIntOption_("ht", "<Integer>", 6, "number of m/z values corresponding to the most intense peaks that must include one match to the theoretical peptide", false, true);
325
//-hl <Integer> maximum number of hits retained for one spectrum
326
//-he <Double> the maximum e-value allowed in the hit list
327
registerIntOption_("hl", "<Integer>", 30, "maximum number of hits retained for one spectrum. Note: even when set to 1 OMSSA may report multiple hits with different charge states", false);
328
registerDoubleOption_("he", "<Real>", 1000, "the maximum e-value allowed in the hit list. If you set this parameter too small (e.g., he=1), this will effectively introduce FDR filtering."
329
" Thus, allowing a less stringent FDR during post-processing will nevertheless return the (better) FDR introduced here, since mediocre hits are not even reported.", false);
331
//Post translational modifications
332
//To specify modifications, first type in "omssacl -ml" to see a list of modifications available and their corresponding id number. Then when running the search, specify the id numbers of the modification you wish to apply, e.g. "omssacl -mf 5 -mv 1,8 ...". Multiple PTMs can be specified by placing commas between the numbers without any spaces. At the present time, the list of allowed post translational modifications will be expanded over time.
333
//-mf comma delimited list of id numbers for fixed modifications
334
//-mv comma delimited list of id numbers for variable modifications
335
//-ml print a list of modifications and their corresponding id number
336
//registerStringOption_("mf", "<Num>,<Num>,<Num>", "", "comma delimited list of id numbers for fixed modifications", false);
337
//registerStringOption_("mv", "<Num>,<Num>,<Num>", "", "comma delimited list of id numbers for variable modifications", false);
339
//registerStringOption_("mux", "<file>", "", "use the given file which contains user modifications in OMSSA modifications xml format", false);
342
//To add your own user defined modifications, edit the usermod0-29 entries in the mods.xml file. If it is common modification, please contact NCBI so that it can be added to the standard list.
343
//To reduce the combinatorial expansion that results when specifying multiple variable modifications, you can put an upper bound on the number of mass ladders generated per peptide using the -mm option. The ladders are generated in the order of the least number of modification to the most number of modifications.
344
//-mm <Integer> the maximum number of mass ladders to generate per database peptide
345
registerIntOption_("mm", "<Integer>", 128, "the maximum number of mass ladders to generate per database peptide", false, true);
347
//There is an upper bound on the number of combinations of variable mods that can be applied to a peptide from the sequence library. The hard upper bound is 1024, which effectively limits the number of variable modification sites per peptide for an exhaustive search to 10. If you set this number too low, you will miss highly modified peptides. If you set it too high, it will make the e-values less significant by searching for too many possible modifications.
348
//OMSSA treats cleavage of the initial methionine in each protein record as a variable modification by default. To turn off this behavior use the command line option
349
//-mnm n-term methionine should not be cleaved
350
registerFlag_("mnm", "n-term methionine should not be cleaved", true);
352
//Iterative searching
353
//-is <Double> evalue threshold to include a sequence in the iterative search, 0 = all
354
//-ir <Double> evalue threshold to replace a hit, 0 = only if better
355
//-ii <Double> evalue threshold to iteratively search a spectrum again, 0 = always
356
registerDoubleOption_("is", "<Real>", 0.0, "evalue threshold to include a sequence in the iterative search, 0 = all", false, true);
357
registerDoubleOption_("ir", "<Real>", 0.0, "evalue threshold to replace a hit, 0 = only if better", false, true);
358
registerDoubleOption_("ii", "<Real>", 0.0, "evalue threshold to iteratively search a spectrum again, 0 = always", false, true);
361
//-foms <String> read in search result in .oms format (binary asn.1).
362
//-fomx <Double> read in search result in .omx format (xml).
363
//Iterative searching is the ability to re-search search results in hopes of increasing the number of spectra identified. To accomplish this, an iterative search may change search parameters, such as using a no-enzyme search, or restrict the sequence search library to sequences already hit.
367
ExitCodes main_(int, const char **)
369
StringList parameters;
370
// path to the log file
371
String logfile(getStringOption_("log"));
372
String omssa_executable(getStringOption_("omssa_executable"));
373
String unique_name = QDir::toNativeSeparators(String(File::getTempDirectory() + "/" + File::getUniqueName()).toQString()); // body for the tmp files
374
String unique_input_name = unique_name + "_OMSSA.mgf";
375
String unique_output_name = unique_name + "_OMSSA.xml";
376
String unique_usermod_name = unique_name + "_OMSSA_user_mod_file.xml";
378
//-------------------------------------------------------------
379
// parsing parameters
380
//-------------------------------------------------------------
382
// get version of OMSSA
384
qp.start(omssa_executable.toQString(), QStringList() << "-version", QIODevice::ReadOnly); // does automatic escaping etc...
385
bool success = qp.waitForFinished();
386
String output(QString(qp.readAllStandardOutput()));
387
String omssa_version;
388
OMSSAVersion omssa_version_i;
389
if (!success || qp.exitStatus() != 0 || qp.exitCode() != 0)
391
writeLog_("Warning: unable to determine the version of OMSSA - the process returned an error. Call string was: '" + omssa_executable + " -version'. Make sure that the path to the OMSSA executable is correct!");
392
return ILLEGAL_PARAMETERS;
396
vector<String> version_split;
397
output.split(' ', version_split);
398
if (version_split.size() == 2 && getVersion_(version_split[1], omssa_version_i))
400
omssa_version = version_split[1].removeWhitespaces();
401
writeDebug_("Setting OMSSA version to " + omssa_version, 1);
384
vector<String> version_split;
385
output.split(' ', version_split);
386
if (version_split.size() == 2 && getVersion_(version_split[1], omssa_version_i))
388
omssa_version = version_split[1].removeWhitespaces();
389
writeDebug_("Setting OMSSA version to " + omssa_version, 1);
393
writeLog_("Warning: OMSSA version output (" + output + ") not formatted as expected!");
397
String inputfile_name = getStringOption_("in");
398
String outputfile_name = getStringOption_("out");
399
String db_name = String(getStringOption_("database"));
400
// @todo: find DB for OMSSA (if not given) in OpenMS_bin/share/OpenMS/DB/*.fasta|.pin|...
403
if (db_name.suffix('.') != "psq")
408
if (!File::readable(db_name))
413
full_db_name = File::findDatabase(db_name);
418
return ILLEGAL_PARAMETERS;
420
db_name = full_db_name;
423
db_name = db_name.substr(0,db_name.size()-4); // OMSSA requires the filename without the .psq part
425
parameters += " -d " + String(db_name).quote('"', String::NONE);
426
parameters += " -to " + String(getDoubleOption_("fragment_mass_tolerance")); //String(getDoubleOption_("to"));
427
parameters += " -hs " + String(getIntOption_("hs"));
428
parameters += " -te " + String(getDoubleOption_("precursor_mass_tolerance")); //String(getDoubleOption_("te"));
429
if (getFlag_("precursor_mass_tolerance_unit_ppm"))
431
if (omssa_version_i < OMSSAVersion(2,1,8))
433
writeLog_("This OMSSA version (" + omssa_version + ") does not support the 'precursor_mass_tolerance_unit_ppm' flag."
434
+" Please disable it and set the precursor tolerance in Da."
435
+" Required version is 2.1.8 and above.\n");
436
return ILLEGAL_PARAMETERS;
438
parameters += " -teppm "; // only from OMSSA 2.1.8 on
440
parameters += " -zl " + String(getIntOption_("min_precursor_charge")); //String(getIntOption_("zl"));
441
parameters += " -zh " + String(getIntOption_("max_precursor_charge")); //String(getIntOption_("zh"));
442
parameters += " -zt " + String(getIntOption_("zt"));
443
parameters += " -zc " + String(getIntOption_("zc"));
444
parameters += " -zcc " + String(getIntOption_("zcc"));
445
parameters += " -zoh " + String(getIntOption_("zoh"));
446
parameters += " -no " + String(getIntOption_("no"));
447
parameters += " -nox " + String(getIntOption_("nox"));
448
parameters += " -sp " + String(getIntOption_("sp"));
449
parameters += " -sb1 " + String(getIntOption_("sb1"));
450
parameters += " -sct " + String(getIntOption_("sct"));
451
parameters += " -x " + getStringOption_("x");
452
parameters += " -hl " + String(getIntOption_("hl"));
453
parameters += " -hm " + String(getIntOption_("hm"));
454
parameters += " -ht " + String(getIntOption_("ht"));
455
parameters += " -tex " + String(getDoubleOption_("tex"));
456
parameters += " -i " + getStringOption_("i");
457
parameters += " -z1 " + String(getDoubleOption_("z1"));
458
parameters += " -v " + String(getIntOption_("v"));
459
parameters += " -e " + String(getIntOption_("e"));
460
parameters += " -tez " + String(getIntOption_("tez"));
463
parameters += " -tom " + String(getIntOption_("tom"));
464
parameters += " -tem " + String(getIntOption_("tem"));
466
parameters += " -mm " + String(getIntOption_("mm"));
467
parameters += " -is " + String(getDoubleOption_("is"));
468
parameters += " -ir " + String(getDoubleOption_("ir"));
469
parameters += " -ii " + String(getDoubleOption_("ii"));
470
parameters += " -nt " + String(getIntOption_("threads"));
474
parameters += " -mnm ";
477
parameters += " -fm " + String(unique_input_name).quote('"', String::NONE);
478
parameters += " -ox " + String(unique_output_name).quote('"', String::NONE);
480
if (getIntOption_("debug") == 0)
482
parameters += " -ni ";
484
parameters += " -he " + String(getDoubleOption_("he"));
487
// read mapping for the modifications
488
String file = File::find("CHEMISTRY/OMSSA_modification_mapping");
490
TextFile infile(file);
491
Map<String, UInt> mods_map;
492
for (TextFile::ConstIterator it = infile.begin(); it != infile.end(); ++it)
494
vector<String> split;
495
it->split(',', split);
497
if (it->size() > 0 && (*it)[0] != '#')
499
if (split.size() < 2)
501
throw Exception::ParseError(__FILE__, __LINE__, __PRETTY_FUNCTION__, "parse mapping file line: '" + *it + "'", "");
503
vector<ResidueModification> mods;
504
for (Size i = 2; i != split.size(); ++i)
506
String tmp(split[i].trim());
509
mods_map[tmp] = split[0].trim().toInt();
515
writeDebug_("Evaluating modifications", 1);
516
ModificationDefinitionsSet mod_set(getStringList_("fixed_modifications"), getStringList_("variable_modifications"));
517
writeDebug_("Setting modifications", 1);
518
UInt user_mod_num(119);
519
vector<pair<UInt, String> > user_mods;
520
// fixed modifications
521
if ( !getStringList_("fixed_modifications").empty())
523
set<String> mod_names = mod_set.getFixedModificationNames();
525
for (set<String>::const_iterator it = mod_names.begin(); it != mod_names.end(); ++it)
527
if (mods_map.has(*it))
529
mod_list.push_back(String(mods_map[*it]));
533
mod_list.push_back(String(user_mod_num));
534
// add this to the usermods
535
user_mods.push_back(make_pair(user_mod_num++, *it));
536
writeDebug_("Inserting unknown fixed modification: '" + *it + "' into OMSSA", 1);
539
if (mod_list.size() > 0)
541
parameters += " -mf " + mod_list.concatenate(",");
545
if ( !getStringList_("variable_modifications").empty() )
547
set<String> mod_names = mod_set.getVariableModificationNames();
550
for (set<String>::const_iterator it = mod_names.begin(); it != mod_names.end(); ++it)
552
if (mods_map.has(*it))
554
mod_list.push_back(String(mods_map[*it]));
558
mod_list.push_back(String(user_mod_num));
559
// add this to the usermods
560
user_mods.push_back(make_pair(user_mod_num++, *it));
561
writeDebug_("Inserting unknown variable modification: '" + *it + "' into OMSSA", 1);
565
if (mod_list.size() > 0)
567
parameters += " -mv " + mod_list.concatenate(",");
571
String additional_user_mods_filename = getStringOption_("omssa_user_mods");
572
// write unknown modifications to user mods file
573
if ( !user_mods.empty() || additional_user_mods_filename != "")
575
writeDebug_("Writing usermod file to " + unique_usermod_name, 1);
576
parameters += " -mux " + (File::absolutePath(unique_usermod_name));
577
ofstream out(unique_usermod_name.c_str());
578
out << "<?xml version=\"1.0\"?>" << endl;
579
out << "<MSModSpecSet xmlns=\"http://www.ncbi.nlm.nih.gov\" xmlns:xs=\"http://www.w3.org/2001/XMLSchema-instance\" xs:schemaLocation=\"http://www.ncbi.nlm.nih.gov OMSSA.xsd\">" << endl;
581
UInt user_mod_count(1);
582
for (vector<pair<UInt, String> >::const_iterator it = user_mods.begin(); it != user_mods.end(); ++it)
584
writeDebug_("Writing information into user mod file of modification: " + it->second, 1);
585
out << "<MSModSpec>" << endl;
586
out << "\t<MSModSpec_mod>" << endl;
587
out << "\t\t<MSMod value=\"usermod" << user_mod_count++ << "\">" << it->first << "</MSMod>" << endl;
588
out << "\t</MSModSpec_mod>" << endl;
589
out << "\t<MSModSpec_type>" << endl;
592
0 modaa - at particular amino acids
593
1 modn - at the N terminus of a protein
594
2 modnaa - at the N terminus of a protein at particular amino acids
595
3 modc - at the C terminus of a protein
596
4 modcaa - at the C terminus of a protein at particular amino acids
597
5 modnp - at the N terminus of a peptide
598
6 modnpaa - at the N terminus of a peptide at particular amino acids
599
7 modcp - at the C terminus of a peptide
600
8 modcpaa - at the C terminus of a peptide at particular amino acids
601
9 modmax - the max number of modification types
604
ResidueModification::Term_Specificity ts = ModificationsDB::getInstance()->getModification(it->second).getTermSpecificity();
605
String origin = ModificationsDB::getInstance()->getModification(it->second).getOrigin();
606
if (ts == ResidueModification::ANYWHERE)
608
out << "\t\t<MSModType value=\"modaa\">0</MSModType>" << endl;
610
if (ts == ResidueModification::C_TERM)
612
if (origin == "" || origin == "X")
614
out << "\t\t<MSModType value=\"modcp\">7</MSModType>" << endl;
618
out << "\t\t<MSModType value=\"modcpaa\">8</MSModType>" << endl;
621
if (ts == ResidueModification::N_TERM)
623
if (origin == "" || origin == "X")
625
out << "\t\t<MSModType value=\"modnp\">5</MSModType>" << endl;
629
out << "\t\t<MSModType value=\"modnpaa\">6</MSModType>" << endl;
632
out << "\t</MSModSpec_type>" << endl;
634
out << "\t<MSModSpec_name>" << it->second << "</MSModSpec_name>" << endl;
635
out << "\t<MSModSpec_monomass>" << ModificationsDB::getInstance()->getModification(it->second).getDiffMonoMass() << "</MSModSpec_monomass>" << endl;
636
out << "\t<MSModSpec_averagemass>" << ModificationsDB::getInstance()->getModification(it->second).getDiffAverageMass() << "</MSModSpec_averagemass>" << endl;
637
out << "\t<MSModSpec_n15mass>0</MSModSpec_n15mass>" << endl;
641
out << "\t<MSModSpec_residues>" << endl;
642
out << "\t\t<MSModSpec_residues_E>" << origin << "</MSModSpec_residues_E>" << endl;
643
out << "\t</MSModSpec_residues>" << endl;
645
/* TODO: Check why these are always 0
646
DoubleReal neutral_loss_mono = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossMonoMass();
647
DoubleReal neutral_loss_avg = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossAverageMass();
649
DoubleReal neutral_loss_mono = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossDiffFormula().getMonoWeight();
650
DoubleReal neutral_loss_avg = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossDiffFormula().getAverageWeight();
652
if (fabs(neutral_loss_mono) > 0.00001)
654
out << "\t<MSModSpec_neutralloss>" << endl;
655
out << "\t\t<MSMassSet>" << endl;
656
out << "\t\t\t<MSMassSet_monomass>" << neutral_loss_mono << "</MSMassSet_monomass>" << endl;
657
out << "\t\t\t<MSMassSet_averagemass>" << neutral_loss_avg << "</MSMassSet_averagemass>" << endl;
658
out << "\t\t\t<MSMassSet_n15mass>0</MSMassSet_n15mass>" << endl;
659
out << "\t\t</MSMassSet>" << endl;
660
out << "\t</MSModSpec_neutralloss>" << endl;
663
out << "</MSModSpec>" << endl;
667
// Add additional MSModSPec subtrees to generated user mods
668
ifstream additional_user_mods_file(additional_user_mods_filename.c_str());
670
if(additional_user_mods_file.is_open())
672
while (additional_user_mods_file.good())
674
getline(additional_user_mods_file, line);
677
additional_user_mods_file.close();
679
out << "</MSModSpecSet>" << endl;
683
//-------------------------------------------------------------
685
//-------------------------------------------------------------
687
MzMLFile mzml_infile;
405
writeLog_("Warning: OMSSA version output (" + output + ") not formatted as expected!");
409
String inputfile_name = getStringOption_("in");
410
String outputfile_name = getStringOption_("out");
411
String db_name = String(getStringOption_("database"));
412
// @todo: find DB for OMSSA (if not given) in OpenMS_bin/share/OpenMS/DB/*.fasta|.pin|...
415
if (db_name.suffix('.') != "psq")
420
if (!File::readable(db_name))
425
full_db_name = File::findDatabase(db_name);
429
LOG_ERROR << "Unable to find database '" << db_name << "' (searched all folders). Did you mistype its name? Is the .psq-formatted version available?" << std::endl;
430
return ILLEGAL_PARAMETERS;
432
db_name = full_db_name;
435
db_name = db_name.substr(0, db_name.size() - 4); // OMSSA requires the filename without the .psq part
437
bool db_name_contains_space = false;
438
if (db_name.hasSubstring(" "))
440
db_name_contains_space = true;
442
// This is a workaround for a bug in the NCBI libraries.
443
// They internally don't support spaces in path or file names.
444
if (db_name_contains_space)
446
#ifdef OPENMS_WINDOWSPLATFORM
447
// Windows: use doubly escaped double quotes (and do a system call instead of QProcess later)
448
parameters << "-d" << String("\"\\\"") + String(db_name) + String("\\\"\"");
450
// Linux/Mac: wrap into singly escaped double quotes
451
parameters << "-d" << String("\"") + String(db_name) + String("\"");
456
parameters << "-d" << String(db_name);
459
parameters << "-to" << String(getDoubleOption_("fragment_mass_tolerance")); //String(getDoubleOption_("to"));
460
parameters << "-hs" << String(getIntOption_("hs"));
461
parameters << "-te" << String(getDoubleOption_("precursor_mass_tolerance")); //String(getDoubleOption_("te"));
462
if (getFlag_("precursor_mass_tolerance_unit_ppm"))
464
if (omssa_version_i < OMSSAVersion(2, 1, 8))
466
writeLog_("This OMSSA version (" + omssa_version + ") does not support the 'precursor_mass_tolerance_unit_ppm' flag."
467
+ " Please disable it and set the precursor tolerance in Da."
468
+ " Required version is 2.1.8 and above.\n");
469
return ILLEGAL_PARAMETERS;
471
parameters << "-teppm"; // only from OMSSA 2.1.8 on
473
parameters << "-zl" << String(getIntOption_("min_precursor_charge")); //String(getIntOption_("zl"));
474
parameters << "-zh" << String(getIntOption_("max_precursor_charge")); //String(getIntOption_("zh"));
475
parameters << "-zt" << String(getIntOption_("zt"));
476
parameters << "-zc" << String(getIntOption_("zc"));
477
parameters << "-zcc" << String(getIntOption_("zcc"));
478
parameters << "-zoh" << String(getIntOption_("zoh"));
479
parameters << "-no" << String(getIntOption_("no"));
480
parameters << "-nox" << String(getIntOption_("nox"));
481
parameters << "-sp" << String(getIntOption_("sp"));
482
parameters << "-sb1" << String(getIntOption_("sb1"));
483
parameters << "-sct" << String(getIntOption_("sct"));
484
parameters << "-x" << getStringOption_("x");
485
parameters << "-hl" << String(getIntOption_("hl"));
486
parameters << "-hm" << String(getIntOption_("hm"));
487
parameters << "-ht" << String(getIntOption_("ht"));
488
parameters << "-tex" << String(getDoubleOption_("tex"));
489
parameters << "-i" << getStringOption_("i");
490
parameters << "-z1" << String(getDoubleOption_("z1"));
491
parameters << "-v" << String(getIntOption_("v"));
492
parameters << "-e" << String(getIntOption_("e"));
493
parameters << "-tez" << String(getIntOption_("tez"));
496
parameters << "-tom" << String(getIntOption_("tom"));
497
parameters << "-tem" << String(getIntOption_("tem"));
499
parameters << "-mm" << String(getIntOption_("mm"));
500
parameters << "-is" << String(getDoubleOption_("is"));
501
parameters << "-ir" << String(getDoubleOption_("ir"));
502
parameters << "-ii" << String(getDoubleOption_("ii"));
503
parameters << "-nt" << String(getIntOption_("threads"));
507
parameters << "-mnm";
510
parameters << "-fm" << unique_input_name;
511
parameters << "-ox" << unique_output_name;
513
if (getIntOption_("debug") == 0)
517
parameters << "-he" << String(getDoubleOption_("he"));
520
// read mapping for the modifications
521
String file = File::find("CHEMISTRY/OMSSA_modification_mapping");
523
TextFile infile(file);
524
Map<String, UInt> mods_map;
525
for (TextFile::ConstIterator it = infile.begin(); it != infile.end(); ++it)
527
vector<String> split;
528
it->split(',', split);
530
if (it->size() > 0 && (*it)[0] != '#')
532
if (split.size() < 2)
534
throw Exception::ParseError(__FILE__, __LINE__, __PRETTY_FUNCTION__, "parse mapping file line: '" + *it + "'", "");
536
for (Size i = 2; i != split.size(); ++i)
538
String tmp(split[i].trim());
541
mods_map[tmp] = split[0].trim().toInt();
547
writeDebug_("Evaluating modifications", 1);
548
ModificationDefinitionsSet mod_set(getStringList_("fixed_modifications"), getStringList_("variable_modifications"));
549
writeDebug_("Setting modifications", 1);
550
UInt user_mod_num(119);
551
vector<pair<UInt, String> > user_mods;
552
// fixed modifications
553
if (!getStringList_("fixed_modifications").empty())
555
set<String> mod_names = mod_set.getFixedModificationNames();
557
for (set<String>::const_iterator it = mod_names.begin(); it != mod_names.end(); ++it)
559
if (mods_map.has(*it))
561
mod_list.push_back(String(mods_map[*it]));
565
mod_list.push_back(String(user_mod_num));
566
// add this to the usermods
567
user_mods.push_back(make_pair(user_mod_num++, *it));
568
writeDebug_("Inserting unknown fixed modification: '" + *it + "' into OMSSA", 1);
571
if (mod_list.size() > 0)
573
parameters << "-mf" << mod_list.concatenate(",");
577
if (!getStringList_("variable_modifications").empty())
579
set<String> mod_names = mod_set.getVariableModificationNames();
582
for (set<String>::const_iterator it = mod_names.begin(); it != mod_names.end(); ++it)
584
if (mods_map.has(*it))
586
mod_list.push_back(String(mods_map[*it]));
590
mod_list.push_back(String(user_mod_num));
591
// add this to the usermods
592
user_mods.push_back(make_pair(user_mod_num++, *it));
593
writeDebug_("Inserting unknown variable modification: '" + *it + "' into OMSSA", 1);
597
if (mod_list.size() > 0)
599
parameters << "-mv" << mod_list.concatenate(",");
603
// write unknown modifications to user mods file
604
if (!user_mods.empty())
606
writeDebug_("Writing usermod file to " + unique_usermod_name, 1);
607
parameters << "-mux" << File::absolutePath(unique_usermod_name);
608
ofstream out(unique_usermod_name.c_str());
609
out << "<?xml version=\"1.0\"?>" << "\n";
610
out << "<MSModSpecSet xmlns=\"http://www.ncbi.nlm.nih.gov\" xmlns:xs=\"http://www.w3.org/2001/XMLSchema-instance\" xs:schemaLocation=\"http://www.ncbi.nlm.nih.gov OMSSA.xsd\">" << "\n";
612
UInt user_mod_count(1);
613
for (vector<pair<UInt, String> >::const_iterator it = user_mods.begin(); it != user_mods.end(); ++it)
615
writeDebug_("Writing information into user mod file of modification: " + it->second, 1);
616
out << "<MSModSpec>" << "\n";
617
out << "\t<MSModSpec_mod>" << "\n";
618
out << "\t\t<MSMod value=\"usermod" << user_mod_count++ << "\">" << it->first << "</MSMod>" << "\n";
619
out << "\t</MSModSpec_mod>" << "\n";
620
out << "\t<MSModSpec_type>" << "\n";
623
0 modaa - at particular amino acids
624
1 modn - at the N terminus of a protein
625
2 modnaa - at the N terminus of a protein at particular amino acids
626
3 modc - at the C terminus of a protein
627
4 modcaa - at the C terminus of a protein at particular amino acids
628
5 modnp - at the N terminus of a peptide
629
6 modnpaa - at the N terminus of a peptide at particular amino acids
630
7 modcp - at the C terminus of a peptide
631
8 modcpaa - at the C terminus of a peptide at particular amino acids
632
9 modmax - the max number of modification types
635
ResidueModification::Term_Specificity ts = ModificationsDB::getInstance()->getModification(it->second).getTermSpecificity();
636
String origin = ModificationsDB::getInstance()->getModification(it->second).getOrigin();
637
if (ts == ResidueModification::ANYWHERE)
639
out << "\t\t<MSModType value=\"modaa\">0</MSModType>" << "\n";
641
if (ts == ResidueModification::C_TERM)
643
if (origin == "" || origin == "X")
645
out << "\t\t<MSModType value=\"modcp\">7</MSModType>" << "\n";
649
out << "\t\t<MSModType value=\"modcpaa\">8</MSModType>" << "\n";
652
if (ts == ResidueModification::N_TERM)
654
if (origin == "" || origin == "X")
656
out << "\t\t<MSModType value=\"modnp\">5</MSModType>" << "\n";
660
out << "\t\t<MSModType value=\"modnpaa\">6</MSModType>" << "\n";
663
out << "\t</MSModSpec_type>" << "\n";
665
out << "\t<MSModSpec_name>" << it->second << "</MSModSpec_name>" << "\n";
666
out << "\t<MSModSpec_monomass>" << ModificationsDB::getInstance()->getModification(it->second).getDiffMonoMass() << "</MSModSpec_monomass>" << "\n";
667
out << "\t<MSModSpec_averagemass>" << ModificationsDB::getInstance()->getModification(it->second).getDiffAverageMass() << "</MSModSpec_averagemass>" << "\n";
668
out << "\t<MSModSpec_n15mass>0</MSModSpec_n15mass>" << "\n";
672
out << "\t<MSModSpec_residues>" << "\n";
673
out << "\t\t<MSModSpec_residues_E>" << origin << "</MSModSpec_residues_E>" << "\n";
674
out << "\t</MSModSpec_residues>" << "\n";
676
/* TODO: Check why these are always 0
677
DoubleReal neutral_loss_mono = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossMonoMass();
678
DoubleReal neutral_loss_avg = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossAverageMass();
680
DoubleReal neutral_loss_mono = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossDiffFormula().getMonoWeight();
681
DoubleReal neutral_loss_avg = ModificationsDB::getInstance()->getModification(it->second).getNeutralLossDiffFormula().getAverageWeight();
683
if (fabs(neutral_loss_mono) > 0.00001)
685
out << "\t<MSModSpec_neutralloss>" << "\n";
686
out << "\t\t<MSMassSet>" << "\n";
687
out << "\t\t\t<MSMassSet_monomass>" << neutral_loss_mono << "</MSMassSet_monomass>" << "\n";
688
out << "\t\t\t<MSMassSet_averagemass>" << neutral_loss_avg << "</MSMassSet_averagemass>" << "\n";
689
out << "\t\t\t<MSMassSet_n15mass>0</MSMassSet_n15mass>" << "\n";
690
out << "\t\t</MSMassSet>" << "\n";
691
out << "\t</MSModSpec_neutralloss>" << "\n";
694
out << "</MSModSpec>" << "\n";
698
out << "</MSModSpecSet>" << "\n";
702
//-------------------------------------------------------------
704
//-------------------------------------------------------------
706
{ // local scope to free memory after conversion to OMSSA format is done
708
FileTypes::Type in_type = fh.getType(inputfile_name);
689
mzml_infile.setLogType(log_type_);
690
ProteinIdentification protein_identification;
691
vector<PeptideIdentification> peptide_ids;
692
mzml_infile.load(inputfile_name, map);
694
writeDebug_("Read " + String(map.size()) + " spectra from file", 5);
696
vector<ProteinIdentification> protein_identifications;
697
//-------------------------------------------------------------
699
//-------------------------------------------------------------
701
writeDebug_("Storing input file: " + unique_input_name, 5);
702
MascotInfile omssa_infile;
703
omssa_infile.store(unique_input_name, map, "OMSSA search tmp file");
705
// @todo find OMSSA if not given
706
// executable is stored in OpenMS_bin/share/OpenMS/3rdParty/OMSSA/omssacl(.exe)
709
writeDebug_("omssa_executable " + parameters, 5);
710
Int status = QProcess::execute((omssa_executable+" "+parameters).toQString());
713
writeLog_("Error: OMSSA problem! (Details can be seen in the logfile: \"" + logfile + "\")");
714
writeLog_("Note: This message can also be triggered if you run out of space in your tmp directory");
715
if (getIntOption_("debug") <= 1)
717
QFile(unique_input_name.toQString()).remove();
718
QFile(unique_output_name.toQString()).remove();
720
if ( !user_mods.empty() || additional_user_mods_filename!="")
722
QFile(unique_usermod_name.toQString()).remove();
724
return EXTERNAL_PROGRAM_ERROR;
728
writeDebug_("Reading output of OMSSA", 10);
729
OMSSAXMLFile omssa_out_file;
730
omssa_out_file.setModificationDefinitionsSet(mod_set); // TODO: add modifications from additional user mods subtree
731
omssa_out_file.load(unique_output_name, protein_identification, peptide_ids);
733
// OMSSA does not write fixed modifications so we need to add them to the sequences
734
set<String> fixed_mod_names = mod_set.getFixedModificationNames();
735
vector<String> fixed_nterm_mods, fixed_cterm_mods;
736
Map<String, String> fixed_residue_mods;
737
writeDebug_("Splitting modification into N-Term, C-Term and anywhere specificity", 1);
738
for (set<String>::const_iterator it = fixed_mod_names.begin(); it != fixed_mod_names.end(); ++it)
740
ResidueModification::Term_Specificity ts = ModificationsDB::getInstance()->getModification(*it).getTermSpecificity();
741
if (ts == ResidueModification::ANYWHERE)
743
fixed_residue_mods[ModificationsDB::getInstance()->getModification(*it).getOrigin()] = *it;
745
if (ts == ResidueModification::C_TERM)
747
fixed_cterm_mods.push_back(*it);
749
if (ts == ResidueModification::N_TERM)
751
fixed_nterm_mods.push_back(*it);
754
writeDebug_("Assigning modifications to peptides", 1);
755
for (vector<PeptideIdentification>::iterator it = peptide_ids.begin(); it != peptide_ids.end(); ++it)
757
vector<PeptideHit> hits = it->getHits();
758
for (vector<PeptideHit>::iterator pit = hits.begin(); pit != hits.end(); ++pit)
760
AASequence seq = pit->getSequence();
761
for (vector<String>::const_iterator mit = fixed_nterm_mods.begin(); mit != fixed_nterm_mods.end(); ++mit)
763
seq.setNTerminalModification(*mit);
765
for (vector<String>::const_iterator mit = fixed_cterm_mods.begin(); mit != fixed_cterm_mods.end(); ++mit)
767
seq.setCTerminalModification(*mit);
770
for (AASequence::Iterator mit = seq.begin(); mit != seq.end(); ++mit, ++pos)
772
if (fixed_residue_mods.has(mit->getOneLetterCode()))
774
seq.setModification(pos, fixed_residue_mods[mit->getOneLetterCode()]);
777
pit->setSequence(seq);
782
// delete temporary files
783
if (getIntOption_("debug") <= 1)
785
writeDebug_("Removing temporary files", 10);
786
QFile(unique_input_name.toQString()).remove();
787
QFile(unique_output_name.toQString()).remove();
788
if ( !user_mods.empty() )
790
QFile(unique_usermod_name.toQString()).remove();
794
// handle the search parameters
795
ProteinIdentification::SearchParameters search_parameters;
796
search_parameters.db = getStringOption_("database");
797
//search_parameters.db_version =
798
search_parameters.taxonomy = getStringOption_("x");
799
search_parameters.charges = "+" + String(getIntOption_("min_precursor_charge")) + "-+" + String(getIntOption_("max_precursor_charge"));
800
ProteinIdentification::PeakMassType mass_type = ProteinIdentification::MONOISOTOPIC;
802
if (getIntOption_("tom") == 1)
804
mass_type = ProteinIdentification::AVERAGE;
808
if (getIntOption_("tom") != 0)
810
writeLog_("Warning: unrecognized mass type: " + String(getIntOption_("tom")));
813
search_parameters.mass_type = mass_type;
814
search_parameters.fixed_modifications = getStringList_("fixed_modifications");
815
search_parameters.variable_modifications = getStringList_("variable_modifications");
816
ProteinIdentification::DigestionEnzyme enzyme = ProteinIdentification::TRYPSIN;
818
UInt e(getIntOption_("e"));
821
writeLog_("Warning: cannot handle enzyme: " + getIntOption_("e"));
824
search_parameters.enzyme = enzyme;
825
search_parameters.missed_cleavages = getIntOption_("v");
826
search_parameters.peak_mass_tolerance = getDoubleOption_("fragment_mass_tolerance");
827
search_parameters.precursor_tolerance = getDoubleOption_("precursor_mass_tolerance");
830
protein_identification.setSearchParameters(search_parameters);
831
protein_identification.setSearchEngineVersion(omssa_version);
832
protein_identification.setSearchEngine("OMSSA");
834
//-------------------------------------------------------------
836
//-------------------------------------------------------------
838
protein_identifications.push_back(protein_identification);
839
IdXMLFile().store(outputfile_name, protein_identifications, peptide_ids);
710
fh.getOptions().addMSLevel(2);
711
fh.loadExperiment(inputfile_name, map, in_type, log_type_);
713
writeDebug_("Read " + String(map.size()) + " spectra from file", 5);
714
writeDebug_("Storing input file: " + unique_input_name, 5);
715
MascotGenericFile omssa_infile;
716
omssa_infile.store(unique_input_name, map);
718
//-------------------------------------------------------------
720
//-------------------------------------------------------------
723
ProteinIdentification protein_identification;
724
vector<PeptideIdentification> peptide_ids;
725
vector<ProteinIdentification> protein_identifications;
727
// @todo find OMSSA if not given
728
// executable is stored in OpenMS_bin/share/OpenMS/3rdParty/OMSSA/omssacl(.exe)
732
for (Size i = 0; i < parameters.size(); ++i)
734
qparam << parameters[i].toQString();
738
#ifdef OPENMS_WINDOWSPLATFORM
739
if (db_name_contains_space)
741
// for some reason QProcess doesn't handle escaped " in arguments properly so we use a system call
742
// see http://www.ncbi.nlm.nih.gov/books/NBK1763/ for the format the ncbi library is expecting internally if spaces are in file/path names
743
String call_string = omssa_executable + " " + parameters.concatenate(" ");
744
writeDebug_(call_string, 5);
745
status = system(call_string.c_str());
748
writeDebug_(omssa_executable + " " + parameters.concatenate(" "), 5);
749
status = QProcess::execute(omssa_executable.toQString(), qparam);
752
writeDebug_(omssa_executable + " " + parameters.concatenate(" "), 5);
753
status = QProcess::execute(omssa_executable.toQString(), qparam);
758
writeLog_("Error: OMSSA problem! (Details can be seen in the logfile: \"" + logfile + "\")");
759
writeLog_("Note: This message can also be triggered if you run out of space in your tmp directory");
760
if (getIntOption_("debug") <= 1)
762
QFile(unique_input_name.toQString()).remove();
763
QFile(unique_output_name.toQString()).remove();
765
if (!user_mods.empty())
767
QFile(unique_usermod_name.toQString()).remove();
769
return EXTERNAL_PROGRAM_ERROR;
773
writeDebug_(String("Reading output of OMSSA from ") + unique_output_name, 10);
774
OMSSAXMLFile omssa_out_file;
775
omssa_out_file.setModificationDefinitionsSet(mod_set);
776
omssa_out_file.load(unique_output_name, protein_identification, peptide_ids);
778
// OMSSA does not write fixed modifications so we need to add them to the sequences
779
set<String> fixed_mod_names = mod_set.getFixedModificationNames();
780
vector<String> fixed_nterm_mods, fixed_cterm_mods;
781
Map<String, String> fixed_residue_mods;
782
writeDebug_("Splitting modification into N-Term, C-Term and anywhere specificity", 1);
783
for (set<String>::const_iterator it = fixed_mod_names.begin(); it != fixed_mod_names.end(); ++it)
785
ResidueModification::Term_Specificity ts = ModificationsDB::getInstance()->getModification(*it).getTermSpecificity();
786
if (ts == ResidueModification::ANYWHERE)
788
fixed_residue_mods[ModificationsDB::getInstance()->getModification(*it).getOrigin()] = *it;
790
if (ts == ResidueModification::C_TERM)
792
fixed_cterm_mods.push_back(*it);
794
if (ts == ResidueModification::N_TERM)
796
fixed_nterm_mods.push_back(*it);
799
writeDebug_("Assigning modifications to peptides", 1);
800
for (vector<PeptideIdentification>::iterator it = peptide_ids.begin(); it != peptide_ids.end(); ++it)
802
vector<PeptideHit> hits = it->getHits();
803
for (vector<PeptideHit>::iterator pit = hits.begin(); pit != hits.end(); ++pit)
805
AASequence seq = pit->getSequence();
806
for (vector<String>::const_iterator mit = fixed_nterm_mods.begin(); mit != fixed_nterm_mods.end(); ++mit)
808
seq.setNTerminalModification(*mit);
810
for (vector<String>::const_iterator mit = fixed_cterm_mods.begin(); mit != fixed_cterm_mods.end(); ++mit)
812
seq.setCTerminalModification(*mit);
815
for (AASequence::Iterator mit = seq.begin(); mit != seq.end(); ++mit, ++pos)
817
if (fixed_residue_mods.has(mit->getOneLetterCode()))
819
seq.setModification(pos, fixed_residue_mods[mit->getOneLetterCode()]);
822
pit->setSequence(seq);
827
// delete temporary files
828
if (getIntOption_("debug") <= 1)
830
writeDebug_("Removing temporary files", 10);
831
QFile(unique_input_name.toQString()).remove();
832
QFile(unique_output_name.toQString()).remove();
833
if (!user_mods.empty())
835
QFile(unique_usermod_name.toQString()).remove();
839
// handle the search parameters
840
ProteinIdentification::SearchParameters search_parameters;
841
search_parameters.db = getStringOption_("database");
842
//search_parameters.db_version =
843
search_parameters.taxonomy = getStringOption_("x");
844
search_parameters.charges = "+" + String(getIntOption_("min_precursor_charge")) + "-+" + String(getIntOption_("max_precursor_charge"));
845
ProteinIdentification::PeakMassType mass_type = ProteinIdentification::MONOISOTOPIC;
847
if (getIntOption_("tom") == 1)
849
mass_type = ProteinIdentification::AVERAGE;
853
if (getIntOption_("tom") != 0)
855
writeLog_("Warning: unrecognized mass type: " + String(getIntOption_("tom")));
858
search_parameters.mass_type = mass_type;
859
search_parameters.fixed_modifications = getStringList_("fixed_modifications");
860
search_parameters.variable_modifications = getStringList_("variable_modifications");
861
ProteinIdentification::DigestionEnzyme enzyme = ProteinIdentification::TRYPSIN;
863
UInt e(getIntOption_("e"));
866
writeLog_("Warning: cannot handle enzyme: " + String(getIntOption_("e")));
869
search_parameters.enzyme = enzyme;
870
search_parameters.missed_cleavages = getIntOption_("v");
871
search_parameters.peak_mass_tolerance = getDoubleOption_("fragment_mass_tolerance");
872
search_parameters.precursor_tolerance = getDoubleOption_("precursor_mass_tolerance");
875
protein_identification.setSearchParameters(search_parameters);
876
protein_identification.setSearchEngineVersion(omssa_version);
877
protein_identification.setSearchEngine("OMSSA");
879
//-------------------------------------------------------------
881
//-------------------------------------------------------------
883
protein_identifications.push_back(protein_identification);
884
IdXMLFile().store(outputfile_name, protein_identifications, peptide_ids);
846
int main( int argc, const char** argv )
892
int main(int argc, const char ** argv)
848
TOPPOMSSAAdapter tool;
894
TOPPOMSSAAdapter tool;
850
return tool.main(argc,argv);
896
return tool.main(argc, argv);