Overview of the Generated PRIDE XML Export File

Introduction

This page gives an overview of which data from the Hits Report table are exported to the generated PRIDE XML export file, and where in that file they appear.

Terms used

PRIDE
An acronym for "PRoteomics IDEntifications database".
FDR
"False Discovery Rate": the number of false discoveries divided by the number of discoveries.
CV
"Controlled Vocabulary": terms defined in the ontology used.
Gel-based PRIDE XML Export
Here: an export for which a gel external id is specified.
Non-Gel-based PRIDE XML Export
Here: an export for which a local sample id is specified instead of a gel external id.
First-score protein hit
For gel-based export, the first processed protein hit with a specific score value (or E-value, depending on the search engine used). For non-gel export, a protein hit that has score type "Proteios Protein" and is primary combined.
Indistinguishable protein hit
For gel-based export, a processed protein hit with the same score value (or E-value, depending on the search engine used) as the first-score protein hit. For non-gel export, a protein for a combined hit whose primary hit is the first-score protein hit.
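
The gel-based definitions above can be read as a grouping rule over the protein hits in processing order. The following is a minimal sketch of that reading, not the Proteios SE implementation: the function name group_gel_hits, the (external id, score) pair layout, and the exact-equality comparison of scores are all assumptions made for illustration.

```python
def group_gel_hits(hits):
    """Group protein hits by score, in processing order.

    `hits` is an iterable of (external_id, score) pairs (hypothetical layout).
    The first hit seen with a given score becomes the first-score hit; later
    hits with exactly the same score are recorded as indistinguishable.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for external_id, score in hits:
        if score not in groups:
            groups[score] = {"first": external_id, "indistinguishable": []}
        else:
            groups[score]["indistinguishable"].append(external_id)
    return list(groups.values())


# Hypothetical external ids and scores, for illustration only.
example_hits = [("PROT_A", 57.0), ("PROT_B", 57.0), ("PROT_C", 31.5)]
example_groups = group_gel_hits(example_hits)
```

Under these assumptions PROT_A and PROT_C would each get their own identification block in the export, and PROT_B would be listed as an indistinguishable protein hit under PROT_A.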

Overview of XML Blocks in the Generated PRIDE XML File

  1. "ExperimentCollection". Standard outer XML block.
    1. "Experiment". Standard outer XML block.
      1. "Title". A general title string with a time stamp, of the form <Title>PRIDE 2.1 XML file generated by Proteios SE 2.5.0 build 3074 (2009-02-03 09:08:09)</Title>.
      2. "ShortLabel". A string, here identical to the "Title" string.
      3. "Protocol". A protocol XML block.
        1. If a PRIDE Protocol XML file was specified, the protocol XML block is copied to the exported PRIDE XML file. Otherwise a simple inner tag <ProtocolName>No Name</ProtocolName> is added.
      4. "mzData". An mzData XML block copied from the mzData file in the Hits Report table (one PRIDE XML export file is generated for each mzData file for the selected hits).
      5. One XML block for each first-score protein hit. For gel-based export, a "TwoDimensionalIdentification" XML block, otherwise a "GelFreeIdentification" XML block.
        1. "Accession". An XML tag with the external id of the first-score protein hit.
        2. "Database". An XML tag with information on the search engine database, inferred from a SpectrumSearch database query, or from the accession number.
        3. XML block(s) for each indistinguishable protein hit related to the first-score protein hit.
          1. "additional". An "additional" XML block with information on the indistinguishable protein hit.
            1. A cvParam tag with information on the external id of the indistinguishable protein hit, e.g. <cvParam cvLabel="PRIDE" accession="PRIDE:0000098" name="Indistinguishable alternative protein accession" value="IPI00008670.2" />.
            2. A userParam tag with the name of the indistinguishable protein hit, taken from the hit description.
          2. "additional" (Optional). An optional "additional" XML block if the indistinguishable protein hit is primary combined.
            1. "Score". A "Score" XML tag with the value of the combined FDR for the indistinguishable protein hit.
            2. A cvParam or userParam tag with information on the score or E-value of the indistinguishable protein hit, depending on the search engine used.
        4. XML block(s) for each peptide hit that has the same external id as the first-score protein hit and, for non-gel-based export, is primary combined.
          1. "PeptideItem". A "PeptideItem" XML block with information on the peptide hit.
            1. "Sequence". A "Sequence" XML tag with the peptide sequence in one-letter amino acid code from (the first part of) the hit description, e.g. <Sequence>LLEGEEQR</Sequence>.
            2. "SpectrumReference". A "SpectrumReference" XML tag with the spectrum id of the peptide hit, e.g. <SpectrumReference>11</SpectrumReference>.
            3. "additional". An "additional" XML block with information on the score/E-value of the peptide hit.
              1. A cvParam tag with information on the score or E-value of the peptide hit, depending on the search engine used, e.g. <cvParam cvLabel="PRIDE" accession="PRIDE:0000069" name="Mascot score" value="2.57" />.
              2. If data exist for the peptide item from other search engines, one cvParam tag with information on the score or E-value of the peptide hit for each other search engine.
              3. A cvParam tag with information obtained from a SpectrumSearch query on a fixed modification if the peptide sequence contains the modified amino acid, e.g. <cvParam cvLabel="PRIDE" accession="PRIDE:0000072" name="Fixed modification setting" value="Carbamidomethyl (C)" />.
              4. A userParam tag with information obtained from a SpectrumSearch query on a fixed modification if the peptide sequence contains the modified amino acid, e.g. <userParam name="Fixed modification setting" value="Name = 'Carbamidomethyl (C)', Terminal specificity = none, Amino acid specificity = C" />.
              5. A cvParam tag with information obtained from a SpectrumSearch query on a variable modification for the search results file, e.g. <cvParam cvLabel="PRIDE" accession="PRIDE:0000073" name="Variable modification setting" value="Oxidation (M)" />.
              6. A userParam tag with information obtained from a SpectrumSearch query on a variable modification for the search results file, e.g. <userParam name="Variable modification setting" value="Name = 'Oxidation (M)', Terminal specificity = none, Amino acid specificity = M" />.
              7. A cvParam tag with information obtained from the hit description on a variable modification, e.g. <cvParam cvLabel="PRIDE" accession="MOD:00719" name="oxidation to L-methionine sulfoxide" />. If the multiplicity of the variable modification is greater than one, the identical cvParam tag is written as many times as the multiplicity.
              8. A userParam tag with information obtained from the hit description on a variable modification, e.g. <userParam name="Mascot modification string" value="Oxidation (M)" />, or <userParam name="Mascot modification string" value="3 Oxidation (M)" /> (multiplicity = 3).
        5. "additional" (Optional). An "additional" XML block, written only if the first-score protein hit is primary combined.
          1. A cvParam or userParam tag with information on the score or E-value of the first-score protein hit, depending on the search engine used.
          2. A userParam tag with the name of the first-score protein hit, taken from the hit description.
        6. "Score". A "Score" XML tag with the value of the combined FDR for the first-score protein hit.
        7. "SearchEngine". A "SearchEngine" XML tag with information on the search engine used, obtained from the score type of the first-score protein hit, e.g. <SearchEngine>Mascot protein score</SearchEngine>. If the score type equals "Proteios Protein", the search engines that were combined are reported, e.g. <SearchEngine>Proteios SE, combination of k-score, Mascot and Tandem</SearchEngine>.
        8. "Gel" (Optional). In case of gel-based export, a "Gel" XML block with information on the gel.
          1. "GelLink". A "GelLink" XML tag, currently empty.
        9. "GelLocation" (Optional). In case of gel-based export, a "GelLocation" XML block with the spot position on the gel.
          1. "XCoordinate". An "XCoordinate" XML tag with the SpotXPixel value for the first-score protein hit, e.g. <XCoordinate>1645.0</XCoordinate>.
          2. "YCoordinate". A "YCoordinate" XML tag with the SpotYPixel value for the first-score protein hit, e.g. <YCoordinate>251.0</YCoordinate>.
        10. "MolecularWeight". A "MolecularWeight" XML tag with the molecular weight for the first-score protein hit in daltons, e.g. <MolecularWeight>39031.0</MolecularWeight>.
        11. "pI". A "pI" XML tag with the pI value for the first-score protein hit, e.g. <pI>5.08</pI>.
      6. "additional". An "additional" XML block with information on the software used to generate the PRIDE XML file.
        1. A cvParam tag with information on the software used to generate the PRIDE XML file, e.g. <cvParam cvLabel="PRIDE" accession="PRIDE:0000175" name="XML generation software" value="Proteios SE 2.5.0 build 3074" />.
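
To make the nesting described above concrete, the following sketch walks a small hand-written file and lists each identification block with its peptides. It is an illustration only: the sample document follows the order of the outline above but has not been validated against the PRIDE schema, it assumes the elements carry no XML namespace, and the Title, Database and Score values as well as the helper name summarize are hypothetical. The remaining values are taken from the examples on this page.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample that mirrors the outline above.
# Title, Database and Score values are hypothetical placeholders.
SAMPLE = """\
<ExperimentCollection>
  <Experiment>
    <Title>Example title</Title>
    <ShortLabel>Example title</ShortLabel>
    <Protocol>
      <ProtocolName>No Name</ProtocolName>
    </Protocol>
    <mzData/>
    <GelFreeIdentification>
      <Accession>IPI00008670.2</Accession>
      <Database>hypothetical-database</Database>
      <PeptideItem>
        <Sequence>LLEGEEQR</Sequence>
        <SpectrumReference>11</SpectrumReference>
        <additional>
          <cvParam cvLabel="PRIDE" accession="PRIDE:0000069" name="Mascot score" value="2.57" />
        </additional>
      </PeptideItem>
      <Score>0.01</Score>
      <SearchEngine>Mascot protein score</SearchEngine>
    </GelFreeIdentification>
    <additional>
      <cvParam cvLabel="PRIDE" accession="PRIDE:0000175" name="XML generation software" value="Proteios SE 2.5.0 build 3074" />
    </additional>
  </Experiment>
</ExperimentCollection>
"""

# One of these two block types is written per first-score protein hit.
IDENTIFICATION_TAGS = ("TwoDimensionalIdentification", "GelFreeIdentification")


def summarize(xml_text):
    """Return one dict per identification block found in the XML text."""
    root = ET.fromstring(xml_text)
    summary = []
    for experiment in root.iter("Experiment"):
        for block in experiment:
            if block.tag not in IDENTIFICATION_TAGS:
                continue
            # Collect (sequence, spectrum id) for every PeptideItem child.
            peptides = [
                (item.findtext("Sequence"), item.findtext("SpectrumReference"))
                for item in block.findall("PeptideItem")
            ]
            summary.append({
                "kind": block.tag,
                "accession": block.findtext("Accession"),
                "search_engine": block.findtext("SearchEngine"),
                "peptides": peptides,
            })
    return summary


result = summarize(SAMPLE)
```

For a real export, the same function could be given the text of the generated file; a gel-based export would yield "TwoDimensionalIdentification" in the "kind" field instead.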