Mass Spectrometry part of the Proteios database

The database model is meant to be as simple as possible while still holding all the information in the mzData structure. Only peak lists go into the database; mzData files representing raw data will not be imported. The mzData containers that can hold cvParam or userParam elements inherit from the AnnotableData class in Base2. This means each of them can carry a set of parameters, where each parameter is either free-form or a controlled vocabulary parameter.
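A minimal sketch of the annotation idea follows. The Python names (`CvParam`, `UserParam`, `AnnotableData`, `annotate`) are hypothetical stand-ins, not the actual Base2 classes, and the accession `EX:0001` in the usage note is made up for illustration. The field names follow the attributes of the mzData cvParam and userParam elements.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class CvParam:
    """Controlled vocabulary parameter (modelled on mzData cvParam)."""
    cv_label: str    # which vocabulary the term comes from
    accession: str   # accession of the term within that vocabulary
    name: str
    value: str = ""


@dataclass(frozen=True)
class UserParam:
    """Free-form parameter (modelled on mzData userParam)."""
    name: str
    value: str = ""


Param = Union[CvParam, UserParam]


@dataclass
class AnnotableData:
    """Hypothetical stand-in for the annotable base class."""
    params: List[Param] = field(default_factory=list)

    def annotate(self, param: Param) -> None:
        # Both kinds of parameter live in the same set.
        self.params.append(param)

    def cv_params(self) -> List[CvParam]:
        return [p for p in self.params if isinstance(p, CvParam)]

    def user_params(self) -> List[UserParam]:
        return [p for p in self.params if isinstance(p, UserParam)]
```

Under these assumptions, a container would be annotated with `data.annotate(CvParam("EX", "EX:0001", "example term", "1"))` or `data.annotate(UserParam("note", "free text"))`, and both end up in the same parameter set.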

Peak List Set

This class corresponds to the mzData top element. Unlike the current mzData, it allows multiple dataProcessing steps.

Peak List

This corresponds to the spectrum element of mzData and Proteios1. It was renamed to make clear that we work with peak lists only. It contains peaks instead of the intenArrayBinary and mzArrayBinary elements of mzData. Some of the most frequently used parameters are kept as attributes for faster database searches. The doublePrecision attribute records whether the imported peaks were of float or double precision. (They are saved as double, but it can be useful to know the precision of the imported data.)
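The conversion from the two mzData binary arrays to a list of peaks could look roughly like the sketch below. It assumes the arrays arrive as base64 text with a known precision (32 or 64 bits) and byte order, as declared on the mzData data element. The names `Peak`, `PeakList`, `decode_array` and `build_peak_list` are hypothetical and not taken from the Proteios code.

```python
import base64
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class Peak:
    mz: float
    intensity: float


@dataclass
class PeakList:
    double_precision: bool  # precision of the imported data, not of storage
    peaks: List[Peak] = field(default_factory=list)


def decode_array(b64_text: str, precision: int, endian: str) -> List[float]:
    """Decode one base64 array into Python floats (which are doubles)."""
    raw = base64.b64decode(b64_text)
    width = {32: 4, 64: 8}[precision]
    code = {32: "f", 64: "d"}[precision]
    if len(raw) % width != 0:
        raise ValueError("array length does not match the declared precision")
    order = "<" if endian == "little" else ">"
    count = len(raw) // width
    return list(struct.unpack(f"{order}{count}{code}", raw))


def build_peak_list(mz_b64: str, inten_b64: str, precision: int,
                    endian: str = "little") -> PeakList:
    """Pair up m/z and intensity values and remember the source precision."""
    mzs = decode_array(mz_b64, precision, endian)
    intens = decode_array(inten_b64, precision, endian)
    if len(mzs) != len(intens):
        raise ValueError("m/z and intensity arrays differ in length")
    return PeakList(
        double_precision=(precision == 64),
        peaks=[Peak(m, i) for m, i in zip(mzs, intens)],
    )
```

Values are always widened to double on import, so the `double_precision` flag is the only place where the original precision survives.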

Peak

The basic Peak contains only m/z and intensity. It can be extended with an Annotation Set for information such as charge state.

Ms Annotable Data

The idea is to let this class know the MS ontology only (if possible). Otherwise it is standard annotable data.

Peak list part of database


MS analysis part

Input Spectra

This class contains database links to the peak lists (entire peak list sets and/or individual peak lists) that were combined into the peak list file used as input for the search. If the file with the combined peak lists is available, it is referenced.

Spectrum Search

This element represents a database search that was performed with peak lists, either an mzData file (PeakListSet) or combined peak lists. Search parameters are stored in the annotation set. If the search used a file that represents a single complete peak list set already in the database, that peak list set is referenced directly and inputSpectra is normally not used. If the peak lists have not been imported into the database, inputSpectra can hold a reference to the file. For files made up of any combination of individual peak lists and peak list sets in the database, the inputSpectra class is used.
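The three input cases described above can be sketched as follows. All class, field and function names here (`InputSpectra`, `SpectrumSearch`, `describe_input`, the integer ids) are illustrative assumptions and not the real Proteios schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InputSpectra:
    # Database links to whole peak list sets and/or individual peak lists.
    peak_list_set_ids: List[int] = field(default_factory=list)
    peak_list_ids: List[int] = field(default_factory=list)
    # Reference to the combined peak list file, if it is available.
    combined_file: Optional[str] = None


@dataclass
class SpectrumSearch:
    # Set when a single complete peak list set in the database was searched.
    peak_list_set_id: Optional[int] = None
    input_spectra: Optional[InputSpectra] = None
    # Search parameters; stands in for the annotation set.
    parameters: Dict[str, str] = field(default_factory=dict)


def describe_input(search: SpectrumSearch) -> str:
    """Classify which of the three input cases a search falls into."""
    if search.peak_list_set_id is not None:
        return "direct"      # one complete peak list set, referenced directly
    spectra = search.input_spectra
    if spectra is None:
        raise ValueError("search has no input reference at all")
    if spectra.peak_list_set_ids or spectra.peak_list_ids:
        return "combined"    # file assembled from peak lists in the database
    if spectra.combined_file is not None:
        return "file-only"   # peak lists were never imported
    raise ValueError("inputSpectra is empty")
```

In this sketch a search over an imported mzData file only sets `peak_list_set_id`, while a search over a file that was never imported only sets `input_spectra.combined_file`.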

Result

A result refers either to a specific peak list (peptide mass fingerprint, PMF, or peptide fragment fingerprint, PFF) or to previous results (protein scores derived from peptide results). This means that a PMF search will have multiple results that all refer to the same PeakList. A PFF search with multiple peak lists will contain many results that refer to peak lists, and sometimes also results for each protein assembled from the peptide results.

Poly Peptide

A Peptide result is used for a matched mass in a PMF search (no score in the Result) and a Protein result for the actual result with a score. For PFF searches, Peptide is used for individual matches and Protein for the protein-level scores. The sequence is normally given for a peptide; for proteins it is usually retrieved from an external source if the LSID can be found. The accessionNumber and LSID in a Peptide entry are those of the parent molecule (Protein).
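To illustrate how results can point either at a peak list or at earlier results, here is a small sketch. The `Result` fields, the `peak_lists_behind` helper and the ids and scores in the usage note are invented for the example; the real model may store these links differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Result:
    kind: str                                # "Peptide" or "Protein"
    peak_list_id: Optional[int] = None       # direct link to a peak list
    derived_from: List["Result"] = field(default_factory=list)
    score: Optional[float] = None            # absent for PMF mass matches


def peak_lists_behind(result: Result) -> Set[int]:
    """Collect the peak lists a result rests on, directly or indirectly."""
    ids: Set[int] = set()
    if result.peak_list_id is not None:
        ids.add(result.peak_list_id)
    for parent in result.derived_from:
        ids |= peak_lists_behind(parent)
    return ids
```

For a PFF search, two peptide results such as `Result("Peptide", peak_list_id=1, score=30.0)` and `Result("Peptide", peak_list_id=2, score=40.0)` could feed a `Result("Protein", derived_from=[...], score=70.0)`. In a PMF search, every result, scored or not, would carry the same `peak_list_id`.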

Search results part of database


This was the first version; it is kept for reference:
