Ticket #430 (closed: fixed)

Opened 4 years ago

Last modified 3 years ago

Support compressed mzML

Reported by: fredrik Owned by: olle
Milestone: Proteios SE 2.4 Keywords:
Cc:

Description

Refs #394. The mzML reader should support zlib compressed spectra. An example:  http://proteowizard.svn.sourceforge.net/viewvc/*checkout*/proteowizard/trunk/pwiz/example_data/small_zlib.pwiz.mzML

Furthermore, eventually the reader could make use of the index in indexed mzML for fast spectrum access.

Change History

comment:1 Changed 4 years ago by olle

  • Status changed from new to assigned

Ticket accepted.

comment:2 Changed 4 years ago by olle

Traceability note: Previous ticket related to the mzML reader was ticket #394 (mzML reader).

comment:3 Changed 4 years ago by olle

Differences in mzML spectrum tags for uncompressed and zlib compressed spectra are described below using example mzML files.

Example of mzML spectrum tag for uncompressed spectra:

<spectrum id="S20" ...>
   ...
   <binaryDataArray arrayLength="43" ...>
      <cvParam cvLabel="MS" accession="MS:1000523" name="64-bit float" value=""/>
      <cvParam cvLabel="MS" accession="MS:1000576" name="no compression" value=""/>
      <cvParam cvLabel="MS" accession="MS:1000514" name="m/z array" value=""/>
      ...
      <binary>AAAAwN ... KCYgEA=</binary>
   </binaryDataArray>
   <binaryDataArray arrayLength="43" ...>
      <cvParam cvLabel="MS" accession="MS:1000523" name="64-bit float" value=""/>
      <cvParam cvLabel="MS" accession="MS:1000576" name="no compression" value=""/>
      <cvParam cvLabel="MS" accession="MS:1000515" name="intensity array" value=""/>
      ...
      <binary>AAAAAA ... ABgnUA=</binary>
   </binaryDataArray>
</spectrum>

Example of mzML spectrum tag for zlib compressed spectra:

<spectrum index="0" id="S1" nativeID="1" defaultArrayLength="19914" ...>
   ...
   <binaryDataArray encodedLength="54280" ...>
      <cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
      <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
      <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value=""/>
      ...
      <binary>eJwU11 ... ybl9Do=</binary>
   </binaryDataArray>
   <binaryDataArray encodedLength="54936" ...>
      <cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
      <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
      <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value=""/>
      ...
      <binary>eJzsvX ... 8Hv6TP</binary>
   </binaryDataArray>
</spectrum>

Byte ordering for mzML is always little endian.

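Apart from other differences between the two example files (such as float precision and attribute names), the cvParam with "name" property value "no compression" (MS:1000576) is replaced by a cvParam with "name" property value "zlib compression" (MS:1000574).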

comment:4 Changed 4 years ago by olle

(In [2721]) Refs #430. Refs #394. First revision of support of zlib compressed data in the mzML reader.

  1. Class/file io/Base64Util.java in api/core/ updated:
       1. New public static method List<Double> decode(boolean doublePrecision, boolean bigEndian, boolean zLibCompression, String dataString) with optional zlib decompression of the decoded byte array before conversion to a list of double values.
       2. Previous public static method List<Double> decode(boolean doublePrecision, boolean bigEndian, String dataString) updated to call the new method with argument "zLibCompression" set to false, in order to avoid duplication of code.
  2. Class/file io/MzMLFileReader.java in api/core/ updated:
       1. New String instance variable "compression" with accessor methods.
       2. Private method void processStartElement(XMLStreamReader parser) updated to set the value of the new "compression" String instance variable based on cvParam name property values related to the data compression.
       3. New private utility method boolean isZLibCompression() that returns true or false depending on the value of the "compression" String instance variable. Default is false.
       4. Private method List<Double> dataItem(...) updated to accept a boolean argument indicating if zlib compression is used: List<Double> dataItem(boolean doublePrecision, boolean bigEndian, boolean zLibCompression, String dataBase64Raw). The compression flag is used when calling the updated decode(...) method in class Base64Util.
       5. Private method void processEndElement(XMLStreamReader parser) updated to call the updated method List<Double> dataItem(...) with the compression flag obtained from the new utility method isZLibCompression().
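For illustration, the decoding steps described above can be sketched as follows. This is a minimal sketch, not the code committed in [2721]: the class name BinaryArraySketch and the helper inflate(...) are hypothetical, and it assumes Java 8 or later for java.util.Base64. It also assumes the order Base64 decode, then zlib inflate, then conversion of bytes to numbers, which mirrors how mzML writers encode binary arrays (numbers to bytes, optional compression, then Base64).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical sketch class; not the actual Proteios Base64Util.
final class BinaryArraySketch {

    // Same parameter list as the decode(...) method named in this comment.
    // Assumed order: Base64 decode -> optional zlib inflate -> numeric conversion.
    static List<Double> decode(boolean doublePrecision, boolean bigEndian,
                               boolean zLibCompression, String dataString) {
        // The basic decoder rejects whitespace, so surrounding whitespace is trimmed.
        byte[] bytes = Base64.getDecoder().decode(dataString.trim());
        if (zLibCompression) {
            bytes = inflate(bytes);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int width = doublePrecision ? 8 : 4;
        List<Double> values = new ArrayList<>(bytes.length / width);
        while (buffer.remaining() >= width) {
            values.add(doublePrecision ? buffer.getDouble() : (double) buffer.getFloat());
        }
        return values;
    }

    // Inflates a complete zlib stream held in memory (hypothetical helper).
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or invalid zlib stream");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Invalid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

With the example files in comment:3, the zlib compressed arrays would be decoded with doublePrecision set to false, bigEndian set to false, and zLibCompression set to true.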

comment:5 Changed 4 years ago by olle

  • severity changed from 16 to 4

Severity set to 4 since the order in which data decompression and other conversions should be performed was not clear.

comment:6 Changed 4 years ago by olle

Note on the example file  small_zlib.pwiz.mzML in the ticket description.

File small_zlib.pwiz.mzML contains continuous spectra, which may take some time to display graphically. In addition, the current routine for selecting peaks to annotate with mass values is intended for discrete spectra. The data points describing a single peak in a continuous spectrum will be interpreted as many individual peaks with nearly the same mass and intensity values, with the result that none of them is considered to dominate the mass value search window used to find peaks to annotate (for more details, see Ticket #425, "Show mass values for some peaks in mass spectra"). Therefore continuous mass spectra will normally only have the most intense peak annotated with its mass value.

comment:7 Changed 3 years ago by olle

Traceability note: Next ticket related to the mzML reader is ticket #450 (mzML reader should support referencable param groups).

comment:8 Changed 3 years ago by olle

(In [2792]) Refs #450. Refs #430. Refs #394. First revision of support of referenceable param groups in the mzML reader.

  1. Class/file io/MzMLFileReader.java in api/core/ updated:
       1. New private list variable List<ReferenceableParamGroup> referenceableParamGroupList for storing data for referenceable param groups. The elements are instances of new private inner class ReferenceableParamGroup.
       2. Private method void processStartElement(XMLStreamReader parser) updated to store values for referenceable param group data in a "referenceableParamGroup" XML block, and use the appropriate values if a "referenceableParamGroupRef" is encountered.
       3. Private method void processEndElement(XMLStreamReader parser) updated to support "referenceableParamGroup" XML blocks.

comment:9 Changed 3 years ago by olle

(In [2793]) Refs #450. Refs #430. Refs #394. MzML reader updated for safer management of referenceable param groups.

  1. Class/file io/MzMLFileReader.java in api/core/ updated in private method void processStartElement(XMLStreamReader parser) to create a new, empty referenceableParamGroupList when a "referenceableParamGroupList" XML tag is encountered.
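The bookkeeping for referenceable param groups described in [2792] and [2793] could look roughly like the sketch below. It is an illustration only, not the committed code: the class ParamGroupSketch, its nested CvParam class, and the method names are hypothetical, and a map keyed by group id is used here instead of the list mentioned above. It assumes that a "referenceableParamGroup" is identified by its "id" attribute and that a "referenceableParamGroupRef" points to it through its "ref" attribute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not taken from MzMLFileReader.
final class ParamGroupSketch {

    // Minimal stand-in for one cvParam stored inside a group.
    static final class CvParam {
        final String accession;
        final String name;

        CvParam(String accession, String name) {
            this.accession = accession;
            this.name = name;
        }
    }

    private final Map<String, List<CvParam>> groups = new LinkedHashMap<>();

    // On a "referenceableParamGroupList" start tag: begin with an empty store,
    // so groups from a previously read file cannot leak into this one.
    void startGroupList() {
        groups.clear();
    }

    // On each cvParam found inside a "referenceableParamGroup" block.
    void addToGroup(String groupId, CvParam param) {
        groups.computeIfAbsent(groupId, k -> new ArrayList<>()).add(param);
    }

    // On a "referenceableParamGroupRef": return the stored cvParams for that id,
    // or an empty list if the id is unknown.
    List<CvParam> resolve(String ref) {
        List<CvParam> params = groups.get(ref);
        return params == null
                ? Collections.<CvParam>emptyList()
                : Collections.unmodifiableList(params);
    }
}
```

The resolved cvParams can then be handled exactly like cvParams that appear directly inside a binaryDataArray, which is how a compression setting declared once in a group would reach the decoder.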

comment:10 Changed 3 years ago by olle

(In [2802]) Refs #454. Refs #450. Refs #430. Refs #394. First revision of use of accession number property values instead of name property values in the mzML reader:

  1. Class/file io/MzMLFileReader.java in api/core/ updated in private method void processStartElement(XMLStreamReader parser) to use "accession" instead of "name" property values for cvParam XML tags when obtaining data for array type, precision, and compression. To make the code more readable, the obtained accession strings are compared against new string constants defined for the class, whose names indicate what each accession number represents.
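A sketch of the accession-based matching, for illustration. The accession values are the ones shown in the example spectra in comment:3. The class name CvAccessionSketch, the constant names, the fields, and the read(...) helper are hypothetical and not taken from the committed code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; constant and field names are illustrative only.
final class CvAccessionSketch {
    static final String ACC_FLOAT_32 = "MS:1000521";
    static final String ACC_FLOAT_64 = "MS:1000523";
    static final String ACC_ZLIB_COMPRESSION = "MS:1000574";
    static final String ACC_NO_COMPRESSION = "MS:1000576";
    static final String ACC_MZ_ARRAY = "MS:1000514";
    static final String ACC_INTENSITY_ARRAY = "MS:1000515";

    boolean doublePrecision = false;
    boolean zLibCompression = false;
    String arrayType = null;

    // Inspects one start tag; only cvParam elements are of interest here.
    void processStartElement(XMLStreamReader parser) {
        if (!"cvParam".equals(parser.getLocalName())) {
            return;
        }
        String accession = parser.getAttributeValue(null, "accession");
        if (ACC_FLOAT_64.equals(accession)) {
            doublePrecision = true;
        } else if (ACC_FLOAT_32.equals(accession)) {
            doublePrecision = false;
        } else if (ACC_ZLIB_COMPRESSION.equals(accession)) {
            zLibCompression = true;
        } else if (ACC_NO_COMPRESSION.equals(accession)) {
            zLibCompression = false;
        } else if (ACC_MZ_ARRAY.equals(accession)) {
            arrayType = "m/z";
        } else if (ACC_INTENSITY_ARRAY.equals(accession)) {
            arrayType = "intensity";
        }
    }

    // Convenience driver: runs the start-tag handler over an XML fragment.
    static CvAccessionSketch read(String xml) {
        CvAccessionSketch state = new CvAccessionSketch();
        try {
            XMLStreamReader parser =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    state.processStartElement(parser);
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return state;
    }
}
```

Matching on accession rather than name keeps the reader working when a file uses a different spelling of the human-readable name for the same controlled vocabulary term.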

comment:11 Changed 3 years ago by olle

(In [2805]) Refs #454. Refs #450. Refs #430. Refs #394. Support for obtaining retention time values in minutes added in the mzML reader:

  1. Class/file io/MzMLFileReader.java in api/core/ updated in private method void processStartElement(XMLStreamReader parser) to use "unitAccession" property values for cvParam XML tags when obtaining data for retention times ("scan time"). To make the code more readable, the obtained accession strings are compared against new string constants defined for the class, whose names indicate what each accession number represents.
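One possible reading of this change is that the reader uses the unit accession to bring scan times to a common unit. The sketch below illustrates that idea only. The class ScanTimeSketch, the constant names, and the choice of minutes as the target unit are assumptions, and the two Unit Ontology accessions (UO:0000010 for second, UO:0000031 for minute) do not appear in this ticket; files written against older mzML versions may use different unit accessions.

```java
// Hypothetical sketch; not the code committed in [2805].
final class ScanTimeSketch {
    // Unit Ontology accessions (assumed, not quoted from this ticket).
    static final String UNIT_ACC_SECOND = "UO:0000010";
    static final String UNIT_ACC_MINUTE = "UO:0000031";

    // Converts a scan time to minutes based on the cvParam "unitAccession" value.
    static double toMinutes(double value, String unitAccession) {
        if (UNIT_ACC_MINUTE.equals(unitAccession)) {
            return value;
        }
        if (UNIT_ACC_SECOND.equals(unitAccession)) {
            return value / 60.0;
        }
        throw new IllegalArgumentException(
                "Unsupported time unit accession: " + unitAccession);
    }
}
```

Rejecting unknown unit accessions, instead of silently assuming a default unit, avoids retention times that are wrong by a factor of 60.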

comment:12 Changed 3 years ago by olle

  • Status changed from assigned to closed
  • Resolution set to fixed

Ticket closed as the desired feature has been implemented.
