Different mass spectrometry modes to handle
We would like to handle a variety of MS types and data analyses. The data model must be flexible enough to cover the different forms of data: LC or gel separation, MALDI or ESI ionization, and MS, MS/MS, MS/MS/MS, etc. Results from data analysis and database searching should be readily traceable back to the spectrum or peak list that was used. Results from database searches run in external software should be placed automatically at the right position in the database. PSI mzData should be importable and exportable; in our case mzData is meant as an extended peak list format (with the possibility to store raw data).
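One way to get both traceability and automatic placement is to have every imported search result carry the identifier of the peak list it was computed from, so the import step can attach it to that peak list. The sketch below assumes the external search engine output preserves some identifier of the submitted peak list (here a hypothetical `query_ref`); the class and function names are illustrative only and are not part of Proteios or mzData.

```python
from dataclasses import dataclass, field


@dataclass
class PeakList:
    """A stored peak list (hypothetical model)."""
    peak_list_id: str
    search_hits: list = field(default_factory=list)


@dataclass
class SearchHit:
    """One identification reported by an external search engine (hypothetical)."""
    query_ref: str  # identifier of the peak list that was submitted to the search
    peptide: str
    score: float


def attach_hits(peak_lists: dict, hits: list) -> list:
    """Attach each hit to the peak list it came from; return the hits that could not be placed."""
    unplaced = []
    for hit in hits:
        target = peak_lists.get(hit.query_ref)
        if target is None:
            # No matching peak list: report it instead of guessing a location.
            unplaced.append(hit)
        else:
            target.search_hits.append(hit)
    return unplaced
```

Returning unplaced hits rather than silently dropping them keeps the "right position" requirement checkable: any result that cannot be traced back to a stored peak list is surfaced to the user.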
We do not want to store raw data in the database, but rather keep a reference to the file name and, ideally, a URI. Peak lists should be stored in the database. mzData can hold processed peak lists, which are basically equivalent to PKL or DTA files. The advantage of mzData is that it also contains all the necessary information about the MS acquisition and the data processing from raw file to peak list. Every spectrum in an mzData file is actually a peak list, which can be derived from more than one scan in the raw data file. This is described under acqSpecification, with the contributing scans listed as acquisition elements. If the peak list is MS/MS, there is a reference to the spectrum in which the precursor ion peak was found, and the precursor mass is given. It is not clear how parent ion scans are to be described, but it should work.
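The relationships described above (a spectrum built from several scans, a raw file kept outside the database, and an MS/MS spectrum pointing back to the spectrum holding its precursor) can be sketched as a small object model. This is a minimal sketch, assuming hypothetical class and field names; it is not the Proteios schema. The helper follows precursor references from an MSn spectrum back to the survey spectrum.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawFileRef:
    """Reference to a raw data file kept outside the database."""
    file_name: str
    uri: Optional[str] = None  # ideally present, but not always available


@dataclass
class Precursor:
    """For MS/MS: the spectrum in which the precursor peak was found, and its m/z."""
    spectrum_id: int
    mz: float


@dataclass
class Spectrum:
    """One mzData spectrum, i.e. a peak list possibly derived from several scans."""
    spectrum_id: int
    ms_level: int
    raw_file: RawFileRef
    scan_numbers: list = field(default_factory=list)  # scans listed under acqSpecification
    precursor: Optional[Precursor] = None
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs


def precursor_chain(spectra: dict, spectrum_id: int) -> list:
    """Follow precursor references from a spectrum back to the MS1 survey spectrum."""
    chain = []
    current = spectra.get(spectrum_id)
    # The membership check guards against malformed data with circular references.
    while current is not None and current.spectrum_id not in chain:
        chain.append(current.spectrum_id)
        next_id = current.precursor.spectrum_id if current.precursor else None
        current = spectra.get(next_id) if next_id is not None else None
    return chain
```

For an MS/MS/MS spectrum 3 whose precursor was found in spectrum 2, which in turn points to survey spectrum 1, `precursor_chain` returns `[3, 2, 1]`.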
The drawback of mzData for our purposes is that peak lists are stored as binary arrays rather than as peak objects. However, if we keep our Peak elements and drop the binary arrays from our model, the data can easily be converted at import or export time. There is probably no advantage in keeping binary arrays in the database, since the peaks are stored as binary numbers anyway.
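The import/export conversion is a matter of decoding the m/z and intensity arrays and zipping them into peaks. This sketch assumes the mzData convention of base64-encoded IEEE-754 floats whose `data` element carries `precision` (32 or 64), `endian` and `length` attributes; XML parsing is left out, and the function names are hypothetical.

```python
import base64
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float


def _fmt(precision: int, endian: str, count: int) -> str:
    """Build a struct format string, e.g. '<3d' for three little-endian doubles."""
    code = {32: "f", 64: "d"}[precision]
    order = "<" if endian == "little" else ">"
    return f"{order}{count}{code}"


def decode_array(b64: str, precision: int, endian: str, length: int) -> list:
    """Decode one base64 binary array (m/z or intensity) into Python floats."""
    return list(struct.unpack(_fmt(precision, endian, length), base64.b64decode(b64)))


def encode_array(values: list, precision: int = 64, endian: str = "little") -> str:
    """Encode floats as a base64 string for export."""
    packed = struct.pack(_fmt(precision, endian, len(values)), *values)
    return base64.b64encode(packed).decode("ascii")


def to_peaks(mz_b64: str, inten_b64: str, precision: int, endian: str, length: int) -> list:
    """Import: turn the two parallel binary arrays into Peak objects."""
    mzs = decode_array(mz_b64, precision, endian, length)
    intensities = decode_array(inten_b64, precision, endian, length)
    return [Peak(m, i) for m, i in zip(mzs, intensities)]


def from_peaks(peaks: list, precision: int = 64, endian: str = "little") -> tuple:
    """Export: split Peak objects back into two base64 arrays (m/z, intensity)."""
    return (encode_array([p.mz for p in peaks], precision, endian),
            encode_array([p.intensity for p in peaks], precision, endian))
```

Note that exporting with 32-bit precision can round m/z values that were stored with more precision, so 64-bit is the safer default for export.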
When a peak list is filtered or recalibrated, the most straightforward solution seems to be to generate a new mzData entry in the database whose source file is the old peak list (mzData).
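Creating a new peak list for each processing step, rather than modifying the old one, keeps a provenance chain that can be walked back to the imported data. A minimal sketch, assuming a hypothetical immutable `PeakList` with a `source_id` field; the intensity filter and the constant-ppm recalibration are illustrative stand-ins for whatever processing is actually applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeakList:
    peak_list_id: int
    peaks: tuple  # ((m/z, intensity), ...)
    source_id: Optional[int] = None  # the peak list this one was derived from
    processing: str = "imported"


def filter_intensity(source: PeakList, min_intensity: float, new_id: int) -> PeakList:
    """Return a new peak list without peaks below min_intensity; the source is untouched."""
    kept = tuple(p for p in source.peaks if p[1] >= min_intensity)
    return PeakList(new_id, kept, source.peak_list_id, f"filtered (>= {min_intensity})")


def recalibrate(source: PeakList, ppm_offset: float, new_id: int) -> PeakList:
    """Return a new, recalibrated peak list.

    Assumes measured m/z = true m/z * (1 + ppm_offset * 1e-6), a constant ppm error.
    """
    factor = 1.0 + ppm_offset * 1e-6
    corrected = tuple((mz / factor, inten) for mz, inten in source.peaks)
    return PeakList(new_id, corrected, source.peak_list_id, f"recalibrated ({ppm_offset:+.1f} ppm)")


def lineage(store: dict, peak_list_id: int) -> list:
    """Walk source references from a derived peak list back to the imported one."""
    chain = []
    current = store.get(peak_list_id)
    while current is not None and current.peak_list_id not in chain:
        chain.append(current.peak_list_id)
        current = store.get(current.source_id) if current.source_id is not None else None
    return chain
```

Filtering an imported peak list (id 1) into id 2 and then recalibrating into id 3 gives a lineage of `[3, 2, 1]`, while peak list 1 itself is unchanged.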
The Proteios 1 data model is probably largely adequate. It would be possible to remove the PeakList element and use the spectrum element as the peak container.
Schematic pictures of the different MS setups, as described in our data model, can be found in the attachments:
For the gel-based proteomics workflow, the LC-MSMS, MALDI-MS and MALDI-MSMS setups are applicable. For a non-gel or 1D-gel workflow, the LC-MS, LC-MSMS and MALDI-MSMS setups can be employed.
Attachments
- LCMS.pdf (14.6 kB) - added by gregory on 02/23/06 09:12:13.
- LCMSMS.pdf (12.4 kB) - added by gregory on 02/23/06 09:12:49.
- MALDIMS.pdf (14.1 kB) - added by gregory on 02/23/06 09:13:23.
- MALDIMSMS.pdf (12.4 kB) - added by gregory on 02/23/06 09:13:48.
