ROC data when parsing pepXML #7

Owen-Duncan · 2017-10-25T11:27:41Z

Hi, MSFTBX has been great; I've started using it extensively in an analysis pipeline. When parsing pepXML I'd like to retrieve the roc_data_point entries to determine FDRs at given probabilities. When I parse pepXML into an msmsPipelineAnalysis object, the ROC data doesn't seem to be present, even though the library contains RocErrorData types. I'm using InterProphet analysis from TPP 5.0.

chhh · 2017-10-26T00:38:19Z

@Owen-Duncan I've looked into this, and here's what I found.
The pepXML schema doesn't specify where elements such as peptideprophet_summary can appear, i.e. which elements may contain them. It does, however, define what peptideprophet_summary itself looks like, which is why you see RocData... and friends in MSFTBX.

This means the automatic parser has no way of knowing where to expect peptideprophet_summary, so it never parses it on its own. BUT you can still point a parser manually at that block of XML and unmarshal it. Here's a snippet that does that; the complete example further down prints all ROC info from a file.

// prepare the input stream
final XMLStreamReader xsr = JaxbUtils.createXmlStreamReader(p, false);
// advance the input stream to the beginning of <peptideprophet_summary>
final boolean foundPepProphSummary = XmlUtils.advanceReaderToNext(xsr, "peptideprophet_summary");
if (!foundPepProphSummary)
    throw new IllegalStateException("Could not advance the reader to the beginning of a peptideprophet_summary tag.");

// unmarshal
final PeptideprophetSummary ps = JaxbUtils.unmarshal(PeptideprophetSummary.class, xsr);
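For anyone curious what "advance the reader, then read the block" amounts to without MSFTBX, here is a minimal sketch using only the JDK's StAX API. It is an illustration under assumptions, not how JaxbUtils or XmlUtils are actually implemented: RocBlockReader, advanceTo, readRocPoints and demo are hypothetical names, elements are matched by local name only (namespace ignored), and the XML in demo is a hand-written toy with made-up values, not real TPP output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of MSFTBX.
final class RocBlockReader {
    private RocBlockReader() {}

    /** Moves the reader forward to the next START_ELEMENT whose local name (namespace ignored) matches. */
    static boolean advanceTo(XMLStreamReader xsr, String localName) throws XMLStreamException {
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.START_ELEMENT && localName.equals(xsr.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    /** With the reader on a START_ELEMENT, collects the attributes of every nested roc_data_point. */
    static List<Map<String, String>> readRocPoints(XMLStreamReader xsr) throws XMLStreamException {
        List<Map<String, String>> points = new ArrayList<>();
        int depth = 1; // we start on the block's opening tag
        while (depth > 0 && xsr.hasNext()) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if ("roc_data_point".equals(xsr.getLocalName())) {
                    Map<String, String> attrs = new LinkedHashMap<>();
                    for (int i = 0; i < xsr.getAttributeCount(); i++) {
                        attrs.put(xsr.getAttributeLocalName(i), xsr.getAttributeValue(i));
                    }
                    points.add(attrs);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--; // the block's own closing tag brings depth back to 0
            }
        }
        return points;
    }

    /** Runs both steps on a tiny hand-written document (toy values, not real TPP output). */
    static List<Map<String, String>> demo() {
        String xml = "<msms_pipeline_analysis><analysis_summary analysis=\"peptideprophet\">"
                + "<peptideprophet_summary><roc_error_data charge=\"all\">"
                + "<roc_data_point min_prob=\"0.99\" sensitivity=\"0.5\" error=\"0.002\"/>"
                + "<roc_data_point min_prob=\"0.90\" sensitivity=\"0.7\" error=\"0.010\"/>"
                + "</roc_error_data></peptideprophet_summary>"
                + "</analysis_summary></msms_pipeline_analysis>";
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                if (!advanceTo(xsr, "peptideprophet_summary")) {
                    return new ArrayList<>();
                }
                return readRocPoints(xsr);
            } finally {
                xsr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the example XML", e);
        }
    }
}
```

Attribute values are kept as strings here; JAXB unmarshalling, as in the snippet above, is what turns them into typed getters.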

Make sure you're using MSFTBX v1.6.1 (it's on Maven Central now); a few fixes went into that release.

I know this is waaay suboptimal, but I never noticed the issue because nobody had ever needed to access that part of the file. Too bad the pepXML XSD schema is flawed. Here's a complete example:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLStreamReader;
// The MSFTBX classes used below (JaxbUtils, XmlUtils, StringUtils, PeptideprophetSummary,
// InputFileType, RocErrorDataType, RocDataPoint, ErrorPoint) also need to be imported from the library.

public class PepXmlRocExample { // any class name works
    public static void main(String[] args) throws Exception {

        // input file
        String pathIn = args[0];
        Path p = Paths.get(pathIn).toAbsolutePath();
        if (!Files.exists(p))
            throw new IllegalArgumentException("File doesn't exist: " + p.toString());

        //////////////////////////////////
        //
        //      Relevant part start
        //
        //////////////////////////////////

        // prepare the input stream
        final XMLStreamReader xsr = JaxbUtils.createXmlStreamReader(p, false);
        // advance the input stream to the beginning of <peptideprophet_summary>
        final boolean foundPepProphSummary = XmlUtils.advanceReaderToNext(xsr, "peptideprophet_summary");
        if (!foundPepProphSummary)
            throw new IllegalStateException("Could not advance the reader to the beginning of a peptideprophet_summary tag.");

        // unmarshal
        final PeptideprophetSummary ps = JaxbUtils.unmarshal(PeptideprophetSummary.class, xsr);
        // free the parser's resources once the block has been read
        xsr.close();

        //////////////////////////////////
        //
        //      Relevant part end
        //
        //////////////////////////////////

        // use the unmarshalled object
        StringBuilder sb = new StringBuilder();
        sb.append("Input files:");
        for (InputFileType inputFile : ps.getInputfile()) {
            sb.append("\n\t").append(inputFile.getName());
            if (!StringUtils.isNullOrWhitespace(inputFile.getDirectory()))
                sb.append(" @ ").append(inputFile.getDirectory());
        }
        for (RocErrorDataType rocErrorData : ps.getRocErrorData()) {
            sb.append("\n");
            sb.append(String.format("ROC Error data (charge '%s'): \n", rocErrorData.getCharge()));
            // roc_data_points
            for (RocDataPoint rocDataPoint : rocErrorData.getRocDataPoint()) {
                sb.append(String.format("ROC min_prob=\"%.3f\" sensitivity=\"%.3f\" error=\"%.3f\" " +
                                "num_corr=\"%d\" num_incorr=\"%d\"\n",
                        rocDataPoint.getMinProb(), rocDataPoint.getSensitivity(), rocDataPoint.getError(),
                        rocDataPoint.getNumCorr(), rocDataPoint.getNumIncorr()));
            }
            // error_points
            for (ErrorPoint errorPoint : rocErrorData.getErrorPoint()) {
                sb.append(String.format("ERR error=\"%.3f\" min_prob=\"%.3f\" num_corr=\"%d\" num_incorr=\"%d\"\n",
                        errorPoint.getError(), errorPoint.getMinProb(), errorPoint.getNumCorr(), errorPoint.getNumIncorr()));
            }
        }

        System.out.println(sb.toString());
    }
}
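The original question was about the FDR at a given probability. Once the roc_data_point values are in hand (for instance from rocDataPoint.getMinProb() and rocDataPoint.getError() in the loop above), a lookup could look like the minimal sketch below. RocFdrLookup, Point and errorAt are hypothetical helpers, not MSFTBX API, and the step rule (use the nearest tabulated cutoff at or below the requested probability) is just one reasonable choice; interpolating between neighbouring points would be another.

```java
import java.util.List;

// Hypothetical helper, not part of MSFTBX.
final class RocFdrLookup {
    private RocFdrLookup() {}

    /** A cutoff (min_prob) and the error rate estimated for keeping everything at or above it. */
    static final class Point {
        final double minProb;
        final double error;

        Point(double minProb, double error) {
            this.minProb = minProb;
            this.error = error;
        }
    }

    /**
     * Estimated error (FDR) when keeping all PSMs with probability >= prob, taken from the
     * ROC point with the largest min_prob that does not exceed prob. Assuming error does not
     * rise as min_prob rises, this never under-estimates. Returns NaN if prob is below every
     * tabulated min_prob.
     */
    static double errorAt(List<Point> points, double prob) {
        double bestMinProb = Double.NEGATIVE_INFINITY;
        double error = Double.NaN;
        for (Point pt : points) {
            if (pt.minProb <= prob && pt.minProb > bestMinProb) {
                bestMinProb = pt.minProb;
                error = pt.error;
            }
        }
        return error;
    }
}
```

The points don't need to be sorted, since the loop keeps whichever qualifying cutoff is closest to the requested probability.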

Owen-Duncan · 2017-10-31T03:33:32Z

Thank you! That worked perfectly.

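For anyone following, I needed to make two modifications to the code: use XmlUtils.advanceReaderToNextRunSummary and JaxbUtils.unmarshall instead. Since my data comes from InterProphet, I also read interprophet_summary into an InterprophetSummary rather than peptideprophet_summary: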

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
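import javax.xml.stream.XMLStreamReader;
// plus the MSFTBX classes used below (JaxbUtils, XmlUtils, StringUtils, InterprophetSummary,
// InputFileType, RocErrorDataType, RocDataPoint, ErrorPoint)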


public class JAXBPEPXMLFDR {
    public static void main(String[] args) throws Exception{
        // input file
        String pathIn = args[0];
        Path p = Paths.get(pathIn).toAbsolutePath();
        if (!Files.exists(p))
            throw new IllegalArgumentException("File doesn't exist: " + p.toString());
        // prepare the input stream
        final XMLStreamReader xsr = JaxbUtils.createXmlStreamReader(p, false);
        // advance the reader to the beginning of <interprophet_summary>
        final boolean foundIProphSummary = XmlUtils.advanceReaderToNextRunSummary(xsr, "interprophet_summary");
        if (!foundIProphSummary)
            throw new IllegalStateException("Could not advance the reader to the beginning of an interprophet_summary tag.");
        final InterprophetSummary ps = JaxbUtils.unmarshall(InterprophetSummary.class, xsr);
        // use the unmarshalled object
        StringBuilder sb = new StringBuilder();
        sb.append("Input files:");
        for (InputFileType inputFile : ps.getInputfile()) {
            sb.append("\n\t").append(inputFile.getName());
            if (!StringUtils.isNullOrWhitespace(inputFile.getDirectory()))
                sb.append(" @ ").append(inputFile.getDirectory());
        }
        for (RocErrorDataType rocErrorData : ps.getRocErrorData()) {
            sb.append("\n");
            sb.append(String.format("ROC Error data (charge '%s'): \n", rocErrorData.getCharge()));
            // roc_data_points
            for (RocDataPoint rocDataPoint : rocErrorData.getRocDataPoint()) {
                sb.append(String.format("ROC min_prob=\"%.3f\" sensitivity=\"%.3f\" error=\"%.3f\" " +
                                "num_corr=\"%d\" num_incorr=\"%d\"\n",
                        rocDataPoint.getMinProb(), rocDataPoint.getSensitivity(), rocDataPoint.getError(),
                        rocDataPoint.getNumCorr(), rocDataPoint.getNumIncorr()));
            }
            // error_points
            for (ErrorPoint errorPoint : rocErrorData.getErrorPoint()) {
                sb.append(String.format("ERR error=\"%.3f\" min_prob=\"%.3f\" num_corr=\"%d\" num_incorr=\"%d\"\n",
                        errorPoint.getError(), errorPoint.getMinProb(), errorPoint.getNumCorr(), errorPoint.getNumIncorr()));
            }
        }
        System.out.println(sb.toString());
    }
}
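The inverse question, which probability cutoff keeps the estimated FDR at or below a target such as 1%, can be answered from the same ROC points. The error_point entries printed above already tabulate cutoffs for particular error rates, so check those first. Otherwise, the hypothetical helper below (RocThreshold and minProbForError are illustrative names, not MSFTBX API) scans the roc_data_point values for the most permissive cutoff whose estimated error meets the target. Because each point's error describes its own cutoff, this doesn't rely on the error being monotonic.

```java
// Hypothetical helper, not part of MSFTBX.
final class RocThreshold {
    private RocThreshold() {}

    /**
     * Returns the lowest min_prob among ROC points whose estimated error is at most maxError,
     * i.e. the most permissive probability cutoff that still meets the FDR target.
     * minProbs[i] and errors[i] describe the same roc_data_point. Returns NaN if no point qualifies.
     */
    static double minProbForError(double[] minProbs, double[] errors, double maxError) {
        if (minProbs.length != errors.length) {
            throw new IllegalArgumentException("minProbs and errors must have the same length");
        }
        double best = Double.NaN;
        for (int i = 0; i < minProbs.length; i++) {
            if (errors[i] <= maxError && (Double.isNaN(best) || minProbs[i] < best)) {
                best = minProbs[i];
            }
        }
        return best;
    }
}
```

Parallel arrays keep the sketch free of extra types; in practice you would fill them from rocDataPoint.getMinProb() and rocDataPoint.getError().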

chhh · 2017-10-31T18:58:56Z

@Owen-Duncan in 1.6.1 I renamed those methods (to advanceReaderToNext and unmarshal) to better reflect what they do. Glad it's working for you.

chhh added the "wiki" label (Informative questions with answers that might help with lib usage) on Jul 29, 2019
