Quantitative LC-MS use case

Comparison of multiple samples by LC-MS is a proteomics workflow that should be supported. In the general setup, samples are separated in several dimensions using 1D gels, free flow electrophoresis, isoelectric focusing, or LC, before or after digestion with trypsin. The final step is LC separation with direct MS analysis. The LC-MS raw data files are then analysed with software that extracts peaks, and the files from multiple samples are aligned to produce a matrix of aligned peaks and their intensities. The process can be divided into:

1) Conversion of raw data files into a standard format (e.g. mzXML or mzData). This is normally performed on the instrument computer.

2) Analysis of the LC-MS files with software that extracts peaks. This generates one tabular report file per sample with apex retention time, m/z, charge, max intensity, total intensity etc. msInspect is one program that could be used.

3) Alignment of multiple files to generate a matrix for comparison across samples. This compensates for the fact that retention times vary in a non-linear fashion between acquisitions. m/z also varies, but to a lesser extent, since the workflow assumes high mass accuracy data.

4) Statistical analysis, similar to what is performed for mRNA expression analysis or for gel spot volume comparisons.

5) Identification of the relevant peaks discovered in step 4. If MS/MS was performed in parallel with MS during the initial data collection, the MS/MS identifications can be aligned to the peaks to assign protein identities. Otherwise, directed MS/MS on the interesting peaks can be performed if there is sample remaining.
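
To make steps 2 and 3 more concrete, the following is a minimal sketch of how per-sample peak reports could be combined into a peak-by-sample intensity matrix. It is not the msInspect algorithm or any existing Proteios plugin. The names Peak, same_feature and build_matrix, and the default tolerances, are assumptions chosen for illustration. The sketch also assumes that retention times have already been corrected, or drift less than the tolerance, so it does not itself handle the non-linear shifts mentioned in step 3.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """One row of a (hypothetical) per-sample peak report from step 2."""
    mz: float         # mass-to-charge ratio
    rt: float         # apex retention time in seconds
    charge: int
    intensity: float  # e.g. total intensity


def same_feature(a, b, ppm_tol, rt_tol):
    """True if two peaks agree in charge, m/z (in ppm) and retention time."""
    if a.charge != b.charge:
        return False
    ppm = abs(a.mz - b.mz) / a.mz * 1e6
    return ppm <= ppm_tol and abs(a.rt - b.rt) <= rt_tol


def build_matrix(samples, ppm_tol=10.0, rt_tol=30.0):
    """Greedily group peaks across samples.

    samples maps a sample name to its list of Peak objects.
    Returns (names, features, matrix), where matrix[i][j] is the intensity
    of feature i in sample j, or None if the feature was not observed.
    """
    names = list(samples)
    features = []  # the first peak seen for each row acts as its reference
    rows = []      # one dict per row: sample name -> intensity
    for name in names:
        for peak in samples[name]:
            for ref, row in zip(features, rows):
                if name not in row and same_feature(ref, peak, ppm_tol, rt_tol):
                    row[name] = peak.intensity
                    break
            else:
                # No existing row matched, so this peak starts a new one.
                features.append(peak)
                rows.append({name: peak.intensity})
    matrix = [[row.get(name) for name in names] for row in rows]
    return names, features, matrix
```

With two samples that share one peak, build_matrix returns three rows, and the unshared peaks have None in the column of the sample where they were not observed. A real alignment step would first estimate a retention time correction between acquisitions, for instance with one of the R programs mentioned below.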

The role of Proteios

Proteios should keep links, as URIs, to the raw data/mzXML files generated in step (1). From a LIMS perspective, it would be good to also record the sample origin and the fraction or gel slice from which each raw data file was generated.
It should be possible to launch the peak extraction software (2) from Proteios, and the resulting files should be stored on the Proteios server file system. Java Web Start could be used to open a file viewer and perform the peak extraction on the client. Alternatively, the peak extraction could be performed on the Proteios server. msInspect would support both modes.
It should be possible to perform the alignment process (3) using plugins in Proteios. A few R programs are available for this.
The statistical analysis (4) should be possible to perform as in BASE for multiple samples.
Finally, it should be possible to align interesting peaks to peptide identifications from separate Mascot result files. If there is no peptide identification information, one could imagine exporting inclusion lists with m/z and retention times to use for directed MS/MS on the instrument. The resulting search results could then be imported and used to annotate the peaks from the multiple alignment.
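
The annotation and inclusion list export described above could look roughly like the sketch below. It assumes that peptide identifications have already been parsed from the search results into simple records carrying precursor m/z, retention time, charge and peptide sequence. Parsing Mascot result files is not shown. The names AlignedPeak, Identification, annotate and inclusion_list_csv, the tolerances, and the CSV column layout are all hypothetical. Instrument software typically expects its own inclusion list format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class AlignedPeak:
    mz: float
    rt: float  # retention time in seconds
    charge: int


@dataclass
class Identification:
    mz: float  # precursor m/z
    rt: float  # retention time in seconds
    charge: int
    peptide: str


def annotate(peaks, idents, ppm_tol=10.0, rt_tol=30.0):
    """Pair each peak with the closest matching peptide, or None."""
    result = []
    for peak in peaks:
        best, best_drt = None, None
        for ident in idents:
            if ident.charge != peak.charge:
                continue
            ppm = abs(ident.mz - peak.mz) / peak.mz * 1e6
            drt = abs(ident.rt - peak.rt)
            # Among candidates within tolerance, keep the one closest in time.
            if ppm <= ppm_tol and drt <= rt_tol:
                if best is None or drt < best_drt:
                    best, best_drt = ident, drt
        result.append((peak, best.peptide if best else None))
    return result


def inclusion_list_csv(annotated, rt_window=60.0):
    """Write the unidentified peaks as CSV text for directed MS/MS."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["mz", "charge", "rt_start_s", "rt_end_s"])
    for peak, peptide in annotated:
        if peptide is None:
            start = max(0.0, peak.rt - rt_window / 2)
            end = peak.rt + rt_window / 2
            writer.writerow([f"{peak.mz:.4f}", peak.charge,
                             f"{start:.1f}", f"{end:.1f}"])
    return out.getvalue()
```

Peaks that receive a peptide from annotate are left out of the inclusion list, so only the peaks that still lack an identification are scheduled for directed MS/MS.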