
<p>Two applications have been described in the following publications:<br>
Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Statistical learning of peptide retention behavior in chromatographic separations: A new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8:468.<br>
Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Improving Peptide Identification in Proteome Analysis by a Two-Dimensional Retention Time Filtering Approach. J. Proteome Res. 2009, 8(8):4109-15.</p>

<p>The predicted retention time can be used in IDFilter to filter out false identifications. Assume you have data from several identification runs. First, align the data using MapAligner. Then use one of the identification wrappers, such as MascotAdapter or OMSSAAdapter, to get the identifications. To train a model with RTModel, run IDFilter on one of the runs to select the high-scoring identifications (40 to 200 distinct peptides should be enough), and then train a model for these spectra with RTModel as described in its documentation. With this model, RTPredict can predict the retention times for the remaining runs. The predicted retention times are stored in the idXML files and can then be used by IDFilter to filter out false identifications.</p>
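<p>When there are many runs, the command lines for this workflow can be generated rather than typed by hand. The following is a minimal sketch only: it uses just the tool names, flags and file naming pattern that appear in this tutorial, assumes that MapAligner takes comma-separated file lists covering all runs, and does not execute anything. The helper name <code>build_commands</code> is hypothetical and not part of OpenMS.</p>

```python
from typing import List


def build_commands(runs: List[str], training_run: str) -> List[str]:
    """Return the TOPP command lines for the RT filtering workflow.

    Hypothetical helper: it only builds strings and never runs a tool.
    File names follow the <run>_aligned / _predicted / _filtered pattern.
    """
    if training_run not in runs:
        raise ValueError("training_run must be one of runs")

    # Step 1: align all runs in a single MapAligner call.
    inputs = ",".join(f"{r}.mzML" for r in runs)
    outputs = ",".join(f"{r}_aligned.mzML" for r in runs)
    commands = [f"MapAligner -in {inputs} -out {outputs}"]

    # Step 2: identify peptides in every aligned run.
    for r in runs:
        commands.append(
            f"MascotAdapter -in {r}_aligned.mzML -out {r}_aligned.idXML -ini Mascot.ini"
        )

    # Step 3: keep the best hits of one run and train the RT model on them.
    t = training_run
    commands.append(
        f"IDFilter -in {t}_aligned.idXML -out {t}_best_hits.idXML -pep_fraction 1 -best_hits"
    )
    commands.append(f"RTModel -in {t}_best_hits.idXML -out {t}.model -ini RT.ini")

    # Step 4: predict RTs for the remaining runs, then filter by RT.
    remaining = [r for r in runs if r != t]
    for r in remaining:
        commands.append(
            f"RTPredict -in {r}_aligned.idXML -out {r}_predicted.idXML -svm_model {t}.model"
        )
    for r in remaining:
        commands.append(
            f"IDFilter -in {r}_predicted.idXML -out {r}_filtered.idXML -rt_filtering"
        )
    return commands


commands = build_commands(["Run1", "Run2", "Run3", "Run4"], training_run="Run1")
print("\n".join(commands))
```

<p>The generated lines can be reviewed and then pasted into a shell or written to a script file; the training run is excluded from the prediction and filtering steps because its best hits were used to build the model.</p>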

<p>A typical sequence of TOPP tools would look like this: </p>

<div class="fragment"><div class="line">MapAligner -in Run1.mzML,...,Run4.mzML -out Run1_aligned.mzML,...,Run4_aligned.mzML</div>

<div class="line">MascotAdapter -in Run1_aligned.mzML -out Run1_aligned.idXML -ini Mascot.ini</div>

<div class="line">MascotAdapter -in Run2_aligned.mzML -out Run2_aligned.idXML -ini Mascot.ini</div>

<div class="line">MascotAdapter -in Run3_aligned.mzML -out Run3_aligned.idXML -ini Mascot.ini</div>

<div class="line">MascotAdapter -in Run4_aligned.mzML -out Run4_aligned.idXML -ini Mascot.ini</div>

<div class="line">IDFilter -in Run1_aligned.idXML -out Run1_best_hits.idXML -pep_fraction 1 -best_hits</div>

<div class="line">RTModel -in Run1_best_hits.idXML -out Run1.model -ini RT.ini</div>

<div class="line">RTPredict -in Run2_aligned.idXML -out Run2_predicted.idXML -svm_model Run1.model</div>

<div class="line">RTPredict -in Run3_aligned.idXML -out Run3_predicted.idXML -svm_model Run1.model</div>

<div class="line">RTPredict -in Run4_aligned.idXML -out Run4_predicted.idXML -svm_model Run1.model</div>

<div class="line">IDFilter -in Run2_predicted.idXML -out Run2_filtered.idXML -rt_filtering</div>

<div class="line">IDFilter -in Run3_predicted.idXML -out Run3_filtered.idXML -rt_filtering</div>

<div class="line">IDFilter -in Run4_predicted.idXML -out Run4_filtered.idXML -rt_filtering</div>

</div><p>If you have a file of confidently identified peptides and want to train a model for RT prediction, you can also use the IDs directly. For this, the file must contain one peptide sequence together with its RT per line (separated by one tab or space). The file can then be loaded by RTModel using the -textfile_input flag: </p>

<div class="fragment"><div class="line">RTModel -in IDs_with_RTs.txt -out IDs_with_RTs.model -ini RT.ini -textfile_input </div>

</div><p>The likelihood of a peptide to be proteotypic can be predicted using PTModel and PTPredict. Assume we have a file PT.idXML which contains all proteotypic peptides of a set of proteins. Let's also assume we have a FASTA file called mixture.fasta containing the amino acid sequences of these proteins. To train a model with PTModel, we need negative peptides (peptides that are not proteotypic). To obtain them, one can use Digestor, which is located in the APPLICATIONS/UTILS/ folder, together with IDFilter:</p>

<div class="fragment"><div class="line">Digestor -in mixture.fasta -out all.idXML</div>

<div class="line">IDFilter -in all.idXML -out NonPT.idXML -exclusion_peptides_file PT.idXML </div>

</div><p>In this example the proteins are digested in silico, and the set of non-proteotypic peptides is created by subtracting all proteotypic peptides from the set of all possible peptides. Then one can train PTModel:</p>

<div class="fragment"><div class="line">PTModel -in_positive PT.idXML -in_negative NonPT.idXML -out PT.model -ini PT.ini</div>

</div> </div></div>
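<p>The construction of the negative set described above (digest in silico, then subtract the proteotypic peptides) can be illustrated in a few lines of Python. This is a simplified sketch and does not reproduce what Digestor or IDFilter do internally: it assumes a tryptic rule (cleave after K or R unless the next residue is P) with no missed cleavages, and the function names and the example sequence are hypothetical.</p>

```python
import re
from typing import Iterable, Set


def tryptic_digest(protein: str) -> Set[str]:
    """Split a protein after K or R unless followed by P (no missed cleavages).

    Simplified assumption for illustration; requires Python 3.7+ because
    re.split is used with a zero-width pattern.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if p}  # drop the empty piece after a C-terminal K/R


def non_proteotypic(all_peptides: Iterable[str], proteotypic: Iterable[str]) -> Set[str]:
    """Return all peptides that are not in the proteotypic set."""
    return set(all_peptides) - set(proteotypic)


# Made-up example sequence, not taken from any real protein.
all_peptides = tryptic_digest("MAGKTLLRPSEKVDE")
negatives = non_proteotypic(all_peptides, {"TLLRPSEK"})
print(sorted(negatives))
```

<p>In the TOPP workflow the same subtraction is done on idXML files, with all.idXML as the set of all peptides and PT.idXML as the set to exclude.</p>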

<TR>

<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>

<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:25 using doxygen 1.8.5</font></TD>

</TR>

</TABLE>

</BODY>

</HTML>
