How TraceBase Handles Data

Types of Values

TraceBase stores three basic types of information:

  • Experimental
  • Standardized
  • Calculated

[Figure: TraceBase data organization diagram]

Experimental information is the metadata that describes the conditions of the experiment. It provides context for the Calculated values; without it, the results are meaningless. This data is organized hierarchically by Study, Animal, and Sample, each of which has a globally unique name or identifier. That is, individual Samples are associated with an Animal, which is itself part of one or more Studies. A Study is a collection of Animals as defined by the researcher who uploaded them. The Researcher provides this information in the Samples sheet of the Study Doc. Although samples are organized in this way, data from different studies can be searched, browsed, or downloaded together in TraceBase.
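The hierarchy described above can be pictured with a small data-structure sketch. This is only an illustration, not TraceBase's actual schema: the class names mirror the terms used here, but every field and helper (add_animal_to_study, add_sample, samples_in_studies) is a hypothetical name chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# eq=False keeps identity-based comparison, which avoids recursive
# equality checks between objects that reference each other.
@dataclass(eq=False)
class Study:
    name: str  # assumed globally unique
    animals: list[Animal] = field(default_factory=list)


@dataclass(eq=False)
class Animal:
    name: str  # assumed globally unique
    studies: list[Study] = field(default_factory=list)
    samples: list[Sample] = field(default_factory=list)


@dataclass(eq=False)
class Sample:
    name: str  # assumed globally unique
    animal: Animal
    tissue: str


def add_animal_to_study(animal: Animal, study: Study) -> None:
    """Link an animal to a study; an animal may be part of several studies."""
    if study not in animal.studies:
        animal.studies.append(study)
        study.animals.append(animal)


def add_sample(animal: Animal, name: str, tissue: str) -> Sample:
    """Create a sample that belongs to exactly one animal."""
    sample = Sample(name=name, animal=animal, tissue=tissue)
    animal.samples.append(sample)
    return sample


def samples_in_studies(studies: list[Study]) -> list[Sample]:
    """Collect samples across studies, visiting shared animals only once."""
    seen: set[int] = set()
    result: list[Sample] = []
    for study in studies:
        for animal in study.animals:
            if id(animal) not in seen:
                seen.add(id(animal))
                result.extend(animal.samples)
    return result
```

In this sketch an Animal that belongs to two Studies contributes its Samples only once when both Studies are queried together, which is the behavior one would want when browsing data across studies.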

Standardized data refers to Sample and Compound attributes that are kept consistent across datasets in TraceBase. Examples include compound names, tissue names, researcher names, and key animal attributes such as diet, age, sex, and infusion information, as well as protocols for Animal Treatments and mass spectrometry (MS). This consistency ensures that data can be compiled, compared, and searched across different studies. It is maintained by developers who review data submitted for upload. Researchers can add or modify Standardized data during a study submission to TraceBase, but changes are subject to curator review to preserve consistency and a standard nomenclature throughout the database.

Calculated data (or "derived data") is generated by TraceBase from the originally uploaded data. TraceBase dynamically maintains these calculated values so that they remain accurate as the underlying data changes. Calculated data can be found in three types of output: PeakData, PeakGroups, and FCirc. Some calculations rely on other calculated data (e.g. Normalized Labeling of a measured compound in a PeakGroup from one sample uses the Enrichment Fraction of the tracer compound in the last serum sample from the same animal). Calculated values are comparable across experiments, and they can be affected by changes to or additions of records in the database. For example, suppose a researcher discovers previously overlooked serum sample data after a submission has already been completed and loaded. When the researcher uploads the additional data for that serum sample, TraceBase updates the previously missing FCirc value, along with any other calculated values related to the new data.
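One way to picture a calculated value that depends on other records is to compute it on demand from whatever data is currently loaded. The sketch below is illustrative only and makes loud assumptions: the class and field names are hypothetical, and Normalized Labeling is modeled as a simple ratio of the PeakGroup's enrichment fraction to the tracer's enrichment fraction in the last serum sample. The text above says only that the calculation uses that serum value, so the exact formula and the update mechanism inside TraceBase may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SerumSample:
    collection_minutes: float  # hypothetical: time of collection
    tracer_enrichment: float   # enrichment fraction of the tracer compound


@dataclass
class Animal:
    name: str
    serum_samples: list[SerumSample] = field(default_factory=list)

    def last_serum_sample(self) -> Optional[SerumSample]:
        """Return the most recently collected serum sample, if any."""
        if not self.serum_samples:
            return None
        return max(self.serum_samples, key=lambda s: s.collection_minutes)


@dataclass
class PeakGroup:
    compound: str
    enrichment_fraction: float
    animal: Animal

    @property
    def normalized_labeling(self) -> Optional[float]:
        """Derived on each access, so newly loaded serum data is picked up.

        ASSUMPTION: modeled as a plain ratio purely for illustration.
        """
        serum = self.animal.last_serum_sample()
        if serum is None or serum.tracer_enrichment == 0:
            return None  # value is missing until serum data is loaded
        return self.enrichment_fraction / serum.tracer_enrichment
```

With this design, a PeakGroup loaded before any serum data reports a missing value, and the same PeakGroup reports a number once a serum sample is added to its Animal, with no change to the PeakGroup record itself.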

Raw Versus Curated Data

TraceBase provides the option to upload raw mass spectrometry files (e.g. .raw and .mzXML files) for each Sample. Importantly, TraceBase does not directly associate the curated data (PeakData or PeakGroups) with the exact .mzXML file from which it came. Instead, TraceBase associates the mass spectrometry file(s) with the Sample (which is a description of the biological entity, like the quadriceps from animal 123). The exact relationship within the database is more complex, but it is beyond the scope of this description.

In practice, you can download the mzXML files associated with the Samples found in an Advanced Search of PeakData or PeakGroups. The download is organized uniformly, based on the sequence and on information extracted from the mzXML file (e.g. polarity and scan range).

Searching for Raw Data Files

The mzXML files from which curated peaks were derived can be found using the Advanced Search. In the rare case where a Sample was uploaded without any curated data, its mzXML file exists in TraceBase but cannot be found using the Advanced Search page. However, any mzXML file can be found on the Archive Files page.
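The difference between the two lookup routes can be sketched as follows. This is a toy model, not TraceBase code: it assumes that raw files hang off the Sample and that the Advanced Search reaches them only through Samples that have curated peaks. The Sample fields and both function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sample:
    name: str
    mzxml_files: list[str] = field(default_factory=list)  # raw files
    peak_groups: list[str] = field(default_factory=list)  # curated data


def mzxml_via_peak_search(samples: list[Sample]) -> list[str]:
    """Files reachable by searching curated peaks (assumed search route)."""
    return sorted(
        f for s in samples if s.peak_groups for f in s.mzxml_files
    )


def mzxml_in_archive(samples: list[Sample]) -> list[str]:
    """Every stored file, regardless of curated data (archive-style listing)."""
    return sorted(f for s in samples for f in s.mzxml_files)


samples = [
    Sample("quad_123", ["quad_123_pos.mzXML"], ["glucose"]),
    Sample("liver_123", ["liver_123_pos.mzXML"]),  # no curated data
]
found_by_search = mzxml_via_peak_search(samples)
found_in_archive = mzxml_in_archive(samples)
```

Here the file for the Sample without curated data is absent from found_by_search but present in found_in_archive, mirroring the rare case described above.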

Support for improved access to all mzXML files regardless of the existence of curated peaks will be added in the future.