Secrets of the universe

The secrets of the bio/chemical universe are written in high-resolution mass spec data. Tomorrow’s diagnostics and treatments lie there waiting to be deciphered, but our best analytical tools remain too fragmented to work together. Too complex to learn quickly. Too messy to produce the same answer every time. And so we live with workflow frustration and poor reproducibility. We should expect more in this age of search and AI.

Introducing the MS Orchestra Graph™

The MSO Graph™ is a next-generation data analysis platform that functions as a multidimensional plugin to MZmine, curated* MoNA, MetFrag, SIRIUS, and Spec2Vec.
It unites all those tools to streamline nontargeted analyses (NTA) and accelerate compound ID.

You’ve got MS² spectra for your unidentified compound of interest. What’s that compound? Which spectral library to search? MoNA? MetFrag? SIRIUS? GNPS? Learning and searching all those platforms is a slog. Not searching them seems irresponsible. Reconciling all their search results is nearly impossible. This sort of friction and compromise is nonsensical in the 21st century.

Unify & harmonize

In the MSO Graph™ you can search MetFrag, SIRIUS, and newly curated versions of MoNA all at once. Query empirical reference libraries along with a vast universe of theoretical spectra. The curation engine in the MSO Graph™ has corrected, integrated, and expanded metadata to enable powerful new search logic. Then the platform deconvolves and recodes search results to enable reproducible comparisons across analyses, experiments, and lab groups (article). The time has come to unify and harmonize.
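To make the reconciliation idea concrete, here is a minimal sketch of one way to line up candidate lists from several tools. It is an illustration under stated assumptions, not a description of how the MSO Graph™ works internally: the `reconcile` function, the placeholder InChIKeys, and the scores are all invented for the example, and it assumes each tool reports an InChIKey and a score where higher is better.

```python
from collections import defaultdict

def reconcile(hits):
    """Group candidate hits from several tools by structure skeleton.

    `hits` is a list of (tool, inchikey, score) tuples. Scores are stored
    per tool and never mixed, because each tool scores on its own scale.
    Assumes a higher score is better within any one tool.
    """
    grouped = defaultdict(dict)
    for tool, inchikey, score in hits:
        # The first InChIKey block encodes atom connectivity only.
        skeleton = inchikey.split("-")[0]
        best = grouped[skeleton].get(tool)
        if best is None or score > best:
            grouped[skeleton][tool] = score
    # Candidates proposed by more tools come first; ties sort by key.
    return sorted(grouped.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Placeholder keys and scores, invented for illustration only.
hits = [
    ("MetFrag", "AAAAAAAAAAAAAA-UHFFFAOYSA-N", 0.91),
    ("SIRIUS", "AAAAAAAAAAAAAA-UHFFFAOYSA-N", -42.0),
    ("MoNA", "AAAAAAAAAAAAAA-ZZZZZZZZSA-N", 870),
    ("MetFrag", "BBBBBBBBBBBBBB-UHFFFAOYSA-N", 0.95),
]
ranked = reconcile(hits)
```

In this toy case the first candidate is supported by three tools and the second by one. Ranking by agreement rather than by raw score avoids comparing numbers that were never on the same scale.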

State of the union for molecular annotation

Three perspectives on the problem.

① ASMS task force laments trouble in Annotationville.

ASMS Special Task Force on Data Analysis in Metabolomics

The American Society for Mass Spectrometry (ASMS) supports discussion about data analysis challenges in metabolomics and other nontargeted application domains. On page 1 of this workshop summary, Erin Baker and Gary Patti report that metabolomics deploys a most promising discovery paradigm, but:
① “The ability to process and interpret data is still severely limited.”
② “The informatic burden can require days, weeks, or months.”
③ “Even after extensive data analysis, there may be large numbers of ‘unknowns’ that cannot be characterized.”
④ “These obstacles are preventing the full potential of metabolomics from being realized.”

The challenges with data analysis are considerable and chronic. The MS Orchestra Graph™ unsticks some of these digital logjams to accelerate discovery science around the world.

② Three labs analyzed the same apple juice, couldn't agree on composition. We have a reproducibility problem.

Leaders at the Periodic Table of Foods Initiative sent 3 aliquots of the same apple juice to 3 mass spec labs and asked a simple question: What’s in there? What’s the molecular composition of this juice? Altogether those 3 groups found 927 compounds, but agreement among labs was abysmal: Fully 92% of those compounds were found by only one lab, and just 1.5% (14 compounds) were found by all three.

Three labs (A, B, C) produced seemingly incongruous lists of compounds from a sample of apple juice. Graph from Steve Watkins’ presentation to the BP4NTA working group.
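For readers who want to run the same kind of tally on their own results, here is a minimal sketch of how such overlap figures can be computed. The lab lists are invented placeholders, not the apple juice data; only the totals of 927 and 14 come from the study described above. The sketch also assumes all labs name compounds with the same identifiers, which in practice is a large part of the problem.

```python
from collections import Counter

def overlap_summary(lab_lists):
    """Count how many labs reported each compound and summarize agreement."""
    # A compound counts once per lab, however often that lab lists it.
    hits = Counter(c for compounds in lab_lists.values() for c in set(compounds))
    total = len(hits)
    by_n_labs = Counter(hits.values())
    return {
        "total": total,
        "single_lab_pct": 100 * by_n_labs[1] / total,
        "all_labs_pct": 100 * by_n_labs[len(lab_lists)] / total,
    }

# Hypothetical placeholder identifiers, for illustration only.
labs = {
    "A": ["cmpd1", "cmpd2", "cmpd3"],
    "B": ["cmpd1", "cmpd4"],
    "C": ["cmpd1", "cmpd2", "cmpd5"],
}
summary = overlap_summary(labs)

# Sanity check on the reported figure: 14 of 927 compounds in all three labs.
reported_all_three_pct = round(100 * 14 / 927, 1)
```

With the placeholder lists, three of five compounds appear in a single lab and one appears in all three. The last line confirms that 14 out of 927 rounds to the 1.5% quoted above.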

This tangle illustrates the existence of a chronic problem, but it leaves us guessing about what that problem might be… Do those results represent disagreement about actual molecular composition? Separations? Measurement? Data processing? A scientific discipline mired in this level of ambiguity has to clean some things up. The MS Orchestra Graph™ implements a new standardized framework to realize and reconcile results across experiments, labs, and continents. Read more.

③ ChatGPT on the challenging annotation landscape.

Hey ChatGPT, in the world of nontargeted analysis with mass spectrometry, I notice people complaining about complexity and confusion around ANNOTATING unknown molecular entities that emerge as compounds-of-interest during an experiment. How would you describe the state of the union in that important research space?

ChatGPT:
You’ve put your finger on one of the thorniest issues in nontargeted analysis (NTA) with mass spectrometry: what to do once a “feature of interest” pops out of the data? My assessment of the general state of affairs is that the field has made tremendous progress in detection and computational prioritization, but the bottleneck is still in reliable annotation and identification. Here’s a breakdown of the landscape:

1. Explosion of Features, Few Confirmed Structures

High-resolution MS can pick up tens of thousands of features in a single run.
Only a small fraction can realistically be annotated or identified, given current spectral libraries and computational tools.
Many interesting features remain “known unknowns” (probable compound classes) or “unknown unknowns” (no plausible matches).

2. Annotation Levels and Standards

The Metabolomics Standards Initiative (MSI) and later efforts (like Schymanski’s levels of confidence framework) tried to impose some order, defining levels from “exact structure confirmed with reference standard” down to “exact mass only.”
Despite that, reporting remains uneven: some papers overstate certainty, others bury useful but low-confidence information.
This creates frustration and mistrust in cross-study comparisons.

3. Fragmentation Spectra & Computational Matching

MS/MS fragmentation is key, but libraries are sparse compared to the chemical universe.
Public repositories like GNPS, MassBank, and mzCloud are growing but still cover only a fraction of relevant chemical space.
Computational tools (SIRIUS, CSI:FingerID, MetFrag, MS-FINDER, etc.) are powerful, but their outputs are probabilistic and often yield long candidate lists.
Users sometimes misinterpret scores as ground truth, which feeds confusion.

4. Complexity from Matrix Effects & Isomers

Co-elution, in-source fragments, adducts, and isotopologues all add “phantom” features that complicate annotation.
Even when you’re sure you have the right formula, distinguishing structural isomers without a reference standard is often impossible.

5. Community Practices and Culture

Data sharing is improving, especially through repositories like MetaboLights and GNPS, but metadata quality is uneven.
Collaboration between labs is hindered because annotation pipelines vary widely.
There’s an ongoing tension between the desire for “big-picture” exploratory insights and the scientific need for rigor in structural claims.

6. Emerging Directions

In silico spectral prediction (using machine learning and quantum chemistry) is bridging the library gap.
Molecular networking and other contextualization tools are helping researchers cluster related unknowns.
Some groups advocate for “unknown management” practices: don’t ignore them, but document their properties systematically so future work can revisit them.
Integration with orthogonal data (NMR, ion mobility, retention time prediction) shows promise in reducing ambiguity.

📌 My overall assessment:
The field is vibrant and innovating rapidly, but annotation remains the rate-limiting step. Complexity and confusion aren’t just growing pains—they’re intrinsic to the mismatch between the vast chemical universe and our comparatively small spectral databases. The cultural challenge now is to balance humility (“we don’t know the exact structure”) with usefulness (sharing annotations at an appropriate level of confidence).

What is the MS Orchestra Graph™?

Graph theory holds that, in a world full of discrete data objects, there can be troves of untapped information in the relationships between and among the data points themselves. We’ve left that valuable information on the table far too long.
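As a toy illustration of that idea, the sketch below treats spectra as nodes and draws an edge wherever two spectra look alike. It is not the MSO Graph™ engine, which is proprietary; the function names, the crude rounding-based binning, the 0.7 threshold, and the peak lists are all assumptions made for the example.

```python
import math
from itertools import combinations

def cosine(spec_a, spec_b, decimals=1):
    """Cosine similarity between two spectra given as {mz: intensity} dicts,
    after rounding m/z to `decimals` places (a crude binning step)."""
    def binned(spec):
        out = {}
        for mz, intensity in spec.items():
            key = round(mz, decimals)
            out[key] = out.get(key, 0.0) + intensity
        return out

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_graph(spectra, threshold=0.7):
    """Return {(name_a, name_b): similarity} for pairs at or above threshold."""
    edges = {}
    for (na, sa), (nb, sb) in combinations(spectra.items(), 2):
        sim = cosine(sa, sb)
        if sim >= threshold:
            edges[(na, nb)] = sim
    return edges

# Invented peak lists, for illustration only.
spectra = {
    "s1": {91.054: 100.0, 119.049: 50.0},
    "s2": {91.054: 90.0, 119.049: 60.0, 65.039: 10.0},
    "s3": {150.104: 80.0},
}
edges = build_graph(spectra)
```

Here `s1` and `s2` share most of their peaks and end up connected, while `s3` stays isolated. The edge carries information that neither spectrum holds on its own: the two unknowns are probably related.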

The MSO Graph™ is an integration platform that unifies and harmonizes mass spec algorithms, reference libraries, and epiphanies in order to expand analytical horsepower. It’s a data standardization framework that allows complementary components to work together. To stake out a new standard of MS computing, we re-imagined data formats just for mass spec to support bigger data sets, more complex math, and faster search. We built the analytical engine in a proprietary graph computing platform and deployed it on cloud servers that support more parallel processing and scale resources up and down to match computational loads (on-demand supercomputing). We built the front end inside a desktop application that organizes NTA workflows artfully and renders visualizations beautifully (it’s a customized fork of the eminently brilliant and globally beloved MZmine application). On top of all that, we developed whole new layers of MS informatics that integrate chemistry and computer science to harmonize critical operations that didn’t previously connect. The platform is designed to obviate mathematical compromise and prioritize analytical performance. Version 0.1 is a down payment on things to come.

Wait, are we an MS Orchestra?

For decades, smart mass spectrometrists around the world have composed the algorithms, applications, and libraries that allowed the science to grow through its early chapters. But, as data and expectations grew, we could see that those assets were just too scattered and disconnected to get us all the way to the promised land: parameterless, accurate, reproducible, annotated NTA results. If first-generation tools could be unified and harmonized, they might become more useful to everyone. They might become something bigger than the sum of their parts. Like an orchestra. A worldwide MS Orchestra.

Grab your instrument.

We’re currently β testing the MSO Graph™ in order to refine server configurations and collect user feedback. Would you like to join the band of pioneers and advisors? Be among the first to evaluate the technology? Share your input with the developers? Apply here to be a β tester.

In memoriam: Kathy Peter