How to Fill In the Study Doc
The Researcher uploading data supplies sample information into a submission template (a.k.a. the "Study Doc") to describe the samples to be included in TraceBase.
Sample information in TraceBase is kept consistent with existing data in TraceBase. This document describes in detail where sample information should be stored and how it should be formatted. It also describes what happens if you have new sample information (e.g. a new Diet, Compound, etc.).
General Tips
-
When in doubt, wing it
If you are unsure of how to label something and the Study Doc's header comment seems ambiguous, enter whatever you think fits, and leave it for the validation step to see if you get any errors. If you get an error during validation that still doesn't clarify what is needed, indicate what you are unsure about in the final google submission form. A curator will check the data and help you work out what is needed.
-
Fill in the sheets from left to right
Each sheet can reference one or more other sheets, usually by the other sheet's first column.
Example: The Samples sheet's Animal column references the Name column of the Animals sheet.
Those referencing columns, unless they were automatically filled from the Start tab, will have drop-downs that are populated by the contents of the referenced sheet, but those drop-downs will be empty if the referenced sheet has not yet been filled in. Thus, the order in which you fill in the sheets in the Study Doc affects how easy it is to fill in other sheets.
Pro Tip: The inter-sheet referencing columns allow each sheet to stand alone. A Study Doc can contain any subset of the sheets and still be loaded by itself, as long as the data it references in any other sheet has been previously loaded.
-
If you forgot a peak annotation file or made a mistake on the Start page, start over
If during the process of filling in the Study Doc, you discover that you omitted a peak annotation file (e.g. AccuCor file) or made a mistake in your original submission, it is recommended that you start over^ and upload all peak annotations on the Start page together with the corrected information again. If you had already spent time filling in the study doc, carefully copy over data from the previous version.
This is recommended for 2 reasons.
- The Start page performs cross-peak annotation checks that the validation page does not do, the main one being checks for multiple representations of compounds in a sample.
- Due to the auto-filled inter-sheet references, adding new samples and/or compounds will likely end up causing incomplete sheets that are laborious to fix manually.
^ There is a feature planned that will allow the Start page to update an existing Study Doc, but until that is implemented, starting over is much less error-prone.
-
Fill in blue, ignore gray, and pay attention to columns that affect FCirc Calculations
Columns with blue headers are required.
Gray columns are controlled by Excel formulas. Formulas usually only extend a few rows past the auto-filled rows, so if you need more, use Excel's fill-down feature. The validation process on the next page does not preserve formulas, so it can be helpful to keep the original Start page download to retrieve formulas.
While some columns are optional for loading, TraceBase can only complete its FCirc calculations and display accurate FCirc rates if the optional columns mentioned in FCirc Rates are filled in.
-
Don't fill in the mzXML column in the Peak Annotation Details sheet
mzXML filenames can be automatically matched to the sample headers in the peak annotation (e.g. AccuCor) files. The loading code can even handle filenames that were modified to add "pos", "neg", "scan1", "scan2", etc., which are referred to as "scan labels". This column is not automatically populated^ due to the frequent presence of empty mzXML files and the impracticality of uploading those files for mapping, but the loading code works this all out on the fly. The only time you have to fill in mzXML file names is when the filenames differ from the sample headers outside of the scan labels.
^ There is a feature planned that will allow a user to drop an entire Study directory on the Start page to map just the mzXML file names to the peak annotation file sample headers based on common parent directories, and auto-fill the mzXML column in the Peak Annotation Details sheet.
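The scan-label matching described above can be pictured with a small sketch. This is not TraceBase's loading code: the regular expression, the assumption that labels are separated by "_" or "-", and both function names are hypothetical. Only the labels named in the text (pos, neg, scan1, scan2, ...) are handled.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: a "_" or "-" separator followed by pos, neg, or scanN,
# where the label ends the name or is followed by another separator.
SCAN_LABEL = re.compile(r"[_-](?:pos|neg|scan\d+)(?=[_-]|$)", re.IGNORECASE)


def strip_scan_labels(name: str) -> str:
    """Drop a trailing .mzXML extension (if any) and remove scan labels."""
    if name.lower().endswith(".mzxml"):
        name = name[: -len(".mzxml")]
    return SCAN_LABEL.sub("", name)


def needs_manual_mzxml(mzxml_file: str, sample_header: str) -> bool:
    """True when the file name and header still differ once scan labels are removed."""
    file_stem = strip_scan_labels(PurePosixPath(mzxml_file).name)
    return file_stem != strip_scan_labels(sample_header)
```

Under these assumptions, a file such as `run1/liver_01_pos.mzXML` would match the header `liver_01` without any entry in the mzXML column, while a file named `L1.mzXML` would have to be entered by hand.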
Study Doc Sheet and Column Details
Unless your study includes some novel compounds, tissues, or protocols, or didn't use the Mass Spec fields on the Upload
Start page, you will likely only need to fill in the first 5 sheets (the first 3 of which should be pretty
lightweight). Thus, the main focus of your efforts will be the Animals and Samples sheets.
Study Sheet
-
Name
A name/identifier for an "experiment" or collection of animals.
This column is used to populate drop-downs in the Study column of the Animals sheet, so fill this sheet out before filling out the Animals sheet.
-
Study Description
A long form description of the study.
Describe the experimental design, citations (if the data is published), and any other relevant information that a researcher might need to consider when looking at the data from this study.
Tracers Sheet
Note: Individual tracer definitions can be spread across multiple rows, depending on how many different kinds of
labeled elements they have. The thing that links the rows together for a single tracer is the value in the
Tracer Row Group column.
The tracers sheet is pre-populated with existing TraceBase tracer entries whose compounds match the compounds extracted
from your peak annotation files, but it is not uncommon for those tracers to not include the tracers in your study, so
you may need to enter them manually. You may remove any tracer rows that are unrelated to your study, if you wish. In
doing so, you do not need to ensure that the Tracer Row Groups are sequential, but if you remove a row, make sure to
remove every row with the same Tracer Row Group.
-
Tracer Row Group
Arbitrary number that identifies every row containing a label that belongs to a tracer. Each row defines 1 label and this value links them together.
The values in this column are not loaded into the database. They are only used to populate the Tracer Name column using an Excel formula. All rows having the same Tracer Row Group are used to build the Tracer Name column values.
-
Compound
Primary name of the compound for which this is a tracer.
The dropdown menus in this column are populated by the Compound column in the Compounds sheet. If the compound you need is not in the dropdown, go to the Compounds sheet to enter it, then come back to select it in this column's automatically updated drop-down menu.
-
Mass Number
The sum of the number of protons and neutrons of the labeled atom (its 'isotope'), e.g. 14 for Carbon 14. The number of protons identifies the element that this tracer is an isotope of. Isotopes of the same element differ only in their number of neutrons, so the mass number distinguishes the labeled isotope (e.g. Carbon 13) from the most abundant form of the element (e.g. Carbon 12). Note, this differs from the 'atomic number', which indicates the number of protons only.
-
Element
The type of atom that is labeled in the tracer compound.
Select an 'Element' from the dropdowns in this column. Valid values are: C, N, H, O, S, P.
For Deuterium, use H and ensure the Mass Number is accurate.
-
Label Count
The number of labeled atoms (M+) in the tracer compound supplied to this animal. Note that the count must be greater than or equal to the number of positions.
-
Label Positions
A comma-delimited string of integers indicating the labeled atom positions in the compound. The number of known labeled positions must be less than or equal to the Label Count.
The positions of Deuterium atoms are relative to the atoms they are covalently bonded to (which means that the position numbers can repeat when multiple deuterium atoms are bonded to the same Carbon).
-
Tracer Name
This is a read-only column that is populated by Excel formula, representing a unique name or lab identifier of the tracer, e.g. leucine-[13C6].
The values in this column are referenced by the Tracer column in the Infusates sheet.
Infusates Sheet
Note: Individual infusate definitions can be spread across multiple rows, depending on how many different tracers an
infusate has. The thing that links the rows together for a single infusate is the value in the Infusate Row Group
column.
The infusates sheet is pre-populated with existing TraceBase infusate entries whose compounds match the compounds
extracted from your peak annotation files, but it is not uncommon for those infusates to not include the infusates in
your study, so you may need to enter them manually. You may remove any infusate rows that are unrelated to your study,
if you wish. In doing so, you do not need to ensure that the Infusate Row Groups are sequential, but if you remove a
row, make sure to remove every row with the same Infusate Row Group.
-
Infusate Row Group
Arbitrary number that identifies every row containing a tracer that belongs to a single infusate. Each row defines 1 tracer (at a particular concentration) and this value links them together.
The values in this column are not loaded into the database. They are only used to populate the Infusate Name column using an Excel formula. All rows having the same Infusate Row Group are used to build the Infusate Name column values.
-
Tracer Group Name
A short name or lab identifier referring to a group of tracer compounds, e.g. 6eaas. There may be multiple infusate records with this group name, each referring to the same tracers at different concentrations.
You can select a Tracer Group Name from the dropdowns in this column, which contain existing values in TraceBase, or enter a new value.
-
Tracer
Name of a tracer in this infusate at a specific Tracer Concentration.
Select a 'Tracer' from the dropdowns in this column. The dropdowns are populated by the Tracer Name column in the Tracers sheet, so if the dropdowns are empty, add rows to the Tracers sheet.
-
Tracer Concentration
The millimolar (mM) concentration of the tracer in a specific infusate 'recipe'.
-
Infusate Name
This is a read-only column that is populated by Excel formula, representing a unique name or lab identifier of the infusate 'recipe' containing 1 or more tracer compounds at specific concentrations.
While this column is automatically populated by Excel formula, the following describes the formula output.
Individual tracer compounds will be formatted as:
compound_name-[weight element count,weight element count]
example:
valine-[13C5,15N1]
Mixtures of compounds will be formatted as:
tracer_group_name {tracer[conc]; tracer[conc]}
example:
BCAAs {isoleucine-[13C6,15N1][23.2];leucine-[13C6,15N1][100];valine-[13C5,15N1][0.9]}
Note that the concentrations in the name are limited to 3 significant figures, but the saved value is as entered.
The values in this column are referenced by the Infusate column in the Animals sheet.
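As an illustration of the mixture format described for the Infusate Name column, the sketch below assembles a name from a tracer group name and a list of tracers with concentrations. It is only an approximation of what the Excel formula produces: the function name is hypothetical, tracers are emitted in the order given, and the `;` separator without a space follows the BCAAs example rather than any stated rule.

```python
def infusate_name(tracer_group_name: str, tracers: list[tuple[str, float]]) -> str:
    """Format 'group {tracer[conc];tracer[conc]}' for a mixture of tracers.

    Each concentration is shown with at most 3 significant figures. Note that
    the 'g' format switches to exponent notation for values of 1000 or more,
    which may differ from what the Study Doc formula does.
    """
    parts = ";".join(f"{name}[{conc:.3g}]" for name, conc in tracers)
    return f"{tracer_group_name} {{{parts}}}"
```

Because only the displayed name is rounded, two infusates whose concentrations differ beyond the third significant figure could end up with identical names, so it is worth checking the Tracer Concentration values themselves when two rows look the same.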
Animals Sheet
-
Animal Name
A unique identifier for the animal. See recommendations for how to name an animal in Recommended Practices for Organizing Data.
-
Age
Age in weeks when the infusion started
-
Sex
"male" or "female"
-
Genotype
Most specific genotype possible. The column in the Animals sheet will have a drop-down containing the existing options on TraceBase. You can also consult the Animals page on TraceBase, where the Genotype column also contains a select list with the current unique genotypes in TraceBase. If necessary, indicate genotype as "unknown" (e.g. if the animal is a mixed background wildtype).
-
Weight in grams of the animal at the start time of infusion.
-
Infusate
The cells in this column are entered via drop-down and the content of the drop-down is populated using the Infusates and (indirectly) Tracers sheets. Consult the notes on those sheets for details on how to add new infusates/tracers. Keep reading to understand the values in the dropdown.
The infusate values in this column are a formatted description of the infusion solution (a.k.a. cocktail) given to an animal, including a shorthand name for the included tracers, an encoded tracer that includes the compound with a description of its labels, and the Millimolar (mM) concentration of each tracer in the solution.
Consult the description of the Infusate Name column in the Infusates sheet documentation above for a format description.
-
Volume of infusate solution infused (microliters (ul) per minute per gram of animal body weight).
-
Diet
Description of animal diet used. Include the manufacturer identification and short description where possible. The column in the Animals sheet will have a drop-down containing the existing options on TraceBase. You can also consult the Animals page on TraceBase, where the Diet column also contains a select list with the current unique diets in TraceBase.
-
Feeding Status
- fasted
- fed
- refed
Indicate the length of fasting/feeding in the Treatments sheet's Treatment Description column or in the Study sheet's Description column.
-
Treatment
A short, unique identifier for the animal treatment protocol. Details are provided in the "Treatment Description" field on the "Treatments" sheet.
Example: T3 in drinking water
Default: no treatment
Note that unique diets and feeding status are indicated elsewhere, and considered distinct from "animal treatments".
-
Study
The cells in this column are entered via drop-down and the content of the drop-down is populated using the Name column of the Study sheet.
A name/identifier for the "experiment" that an animal belongs to.
If an animal belongs to multiple studies in this submission, manually enter them in one cell, delimited by semicolons (;). Every row in the Animals sheet should represent a unique animal.
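A quick pre-check of a few Animals sheet values can catch problems before the validation step. The sketch below is a hypothetical helper, not part of TraceBase: it only checks the values spelled out above ("male"/"female", fasted/fed/refed, and semicolon-delimited Study names), skips empty cells, and assumes each row is available as a plain dict keyed by column header.

```python
SEXES = {"male", "female"}
FEEDING_STATUSES = {"fasted", "fed", "refed"}


def split_studies(cell: str) -> list[str]:
    """Split a semicolon-delimited Study cell into trimmed study names."""
    return [name.strip() for name in cell.split(";") if name.strip()]


def check_animal_row(row: dict, known_studies: set[str]) -> list[str]:
    """Return a list of human-readable problems found in one Animals row."""
    problems = []
    sex = row.get("Sex", "")
    if sex and sex not in SEXES:
        problems.append(f"Sex {sex!r} is not one of {sorted(SEXES)}")
    status = row.get("Feeding Status", "")
    if status and status not in FEEDING_STATUSES:
        problems.append(f"Feeding Status {status!r} is not one of {sorted(FEEDING_STATUSES)}")
    for study in split_studies(row.get("Study", "")):
        if study not in known_studies:
            problems.append(f"Study {study!r} is not in the Study sheet")
    return problems
```

Here `known_studies` would be the set of values from the Name column of the Study sheet, which is one more reason to fill that sheet in first.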
Samples Sheet
-
Sample
Unique identifier for the biological sample. Generally, the sample names should match the sample headers in the AccuCor/IsoCor files, but a sample header often differs from one peak annotation file to the next, because the mzXML filename was modified for uniqueness or to indicate the scan's polarity or range. TraceBase removes these "scan labels" from the sample names when it automatically populates this column. The original sample header in each peak annotation file is preserved in the Peak Annotation Details sheet.
See Recommended Practices for Organizing Data for suggestions on how to name samples.
-
Date Collected
Date sample was collected (YYYY-MM-DD).
-
Researcher Name
FIRST LAST
Researcher primarily responsible for collection of this sample.
Secondary people (PI, collaborator, etc) should be mentioned in the study description.
-
Type of tissue. A tissue can be selected via drop-down menu. If the desired tissue is not in the drop-down, enter it in the Tissues sheet, then come back and select it in the automatically updated drop-down in this column.
The list of tissues in TraceBase can also be viewed on the TraceBase site's Tissues page.
-
Collection Time
Minutes after the start of the infusion when the tissue was collected.
Collection Time for samples collected before the infusion should be <= 0.
-
Animal
The animal from which this sample was collected. The animal is selected via drop-down menu. If the desired animal is not in the drop-down, enter it in the Name column of the Animals sheet, then come back and select it in the automatically updated drop-down in this column.
Sequences Sheet
-
Sequence Name
This is a read-only column that is populated by Excel formula, representing a unique name for an MS Run Sequence.
Note that an MS Run Sequence is unique to a researcher, protocol, instrument (model), and date. If a researcher performs multiple such Mass Spec Runs on the same day, this single MS Run Sequence record will represent multiple runs.
Comma-delimited string combining the values from these columns in this order:
- Operator
- LC Protocol Name
- Instrument
- Date
The values in this column are referenced by the 'Default Sequence' column in the 'Peak Annotation Files' sheet and the Sequence column in the Peak Annotation Details sheet.
If you used all of the metadata fields on the Upload Start page, this column will have been automatically populated. If you edit any information in the columns controlled by the Excel formula, this column will update, unless the formulas were stripped by using the downloaded file from the Upload Validate page, in which case, you may be able to copy a formula from the first empty row and paste it into the cell with the stale value.
-
Operator
FIRST LAST
Researcher who operated the Mass Spec instrument. If you used the metadata fields on the Upload Start page, this column will have been automatically populated.
Select an 'Operator' from the dropdowns in this column or enter a new researcher. If the new researcher was also the sample handler, ensure the names match.
-
LC Protocol Name
Unique laboratory-defined name of the liquid chromatography method (e.g. polar-HILIC-25-min). If you used the metadata fields on the Upload Start page, this column will have been automatically populated.
Select an 'LC Protocol Name' from the dropdowns in this column. The dropdowns are populated by the Name column in the LC Protocols sheet, so if the dropdowns are empty, add rows to the LC Protocols sheet.
-
Instrument
The model name of the mass spectrometer.
Select an instrument from the dropdowns in this column. Valid values are:
- QE
- QE2
- QEPlus
- QEHF
- Exploris240
- Exploris480
- ExplorisMX
- unknown
You may enter a new model, if necessary.
-
Date
The date that the mass spectrometer was run.
Format: YYYY-MM-DD
-
Notes
Freeform notes on this mass spectrometer run sequence.
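If the Sequence Name formula has been stripped from a validated download, the value can be rebuilt by hand from its four parts. The following sketch shows the idea; it is not the Study Doc's formula. The function name is hypothetical, and the exact delimiter spacing (a comma followed by a space) is an assumption, so compare against an auto-filled row before relying on it.

```python
import re
from datetime import datetime


def sequence_name(operator: str, lc_protocol_name: str, instrument: str,
                  date: str, delimiter: str = ", ") -> str:
    """Join Operator, LC Protocol Name, Instrument, and Date, in that order."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        raise ValueError(f"Date {date!r} is not in YYYY-MM-DD format")
    # strptime rejects impossible calendar dates such as 2024-02-30.
    datetime.strptime(date, "%Y-%m-%d")
    return delimiter.join([operator, lc_protocol_name, instrument, date])
```

For instance, `sequence_name("Jane Doe", "polar-HILIC-25-min", "QE2", "2024-03-01")` yields one string that identifies the researcher, protocol, instrument, and date together, mirroring how a single MS Run Sequence record can stand for several runs on the same day. The operator name here is made up for illustration.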
Peak Annotation Files Sheet
-
Peak Annotation File
Peak annotation file, e.g. AccuCor, IsoCorr, etc.
If the file will not be in the top level of the study directory, include a POSIX path (where the path delimiter is a forward slash /) relative to the study directory.
The values in this column are referenced by the Peak Annotation File Name column in the Peak Annotation Details sheet.
-
File Format
Peak annotation file format. Default: automatically detected.
Select a format from the dropdowns in this column. Valid values are:
- isocorr
- accucor
- isoautocorr
- unicorr^
^ unicorr is an internal format that the common elements of the other formats are converted into for loading. There is currently no way to save an Excel file in this format, so please ignore this option.
-
Default Sequence
The default Sequence to use to associate peak groups with the file they were derived from, when loading a Peak Annotation File. This default can be overridden by values supplied in the Peak Annotation File Name column in the Peak Annotation Details sheet.
Refer to the Sequence Name column in the Sequences sheet for format details.
Select a Default Sequence from the dropdowns in this column. The dropdowns are populated by the Sequence Name column in the Sequences sheet, so if the dropdowns are empty, add rows to the Sequences sheet.
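For files that sit in a subdirectory of the study directory, the Peak Annotation File column expects a forward-slash path relative to that directory. A small sketch of how such a value could be normalized, for example when the path was copied from Windows Explorer, follows. The function name is hypothetical, and rejecting absolute paths and ".." components is an assumption about what "relative to the study directory" should mean.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_study_relative(path: str) -> str:
    """Return a forward-slash path relative to the study directory."""
    windows_view = PureWindowsPath(path)
    if windows_view.is_absolute() or PurePosixPath(path).is_absolute():
        raise ValueError(f"{path!r} is absolute; give a path relative to the study directory")
    # PureWindowsPath accepts both separators; as_posix() emits forward slashes.
    posix = PurePosixPath(windows_view.as_posix())
    if ".." in posix.parts:
        raise ValueError(f"{path!r} points outside the study directory")
    return str(posix)
```

A value such as `peaks\neg\accucor.xlsx` would become `peaks/neg/accucor.xlsx`, while a file in the top level of the study directory stays a bare file name.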
Peak Annotation Details Sheet
-
Sample Name
A sample that was injected at least once during a mass spectrometer sequence.
Select a Sample Name from the dropdowns in this column. The dropdowns are populated by the Sample column in the Samples sheet, so if the dropdowns are empty, add rows to the Samples sheet.
-
Sample Data Header
Sample header from the Peak Annotation File.
Note, this column is only conditionally required with mzXML File Name. I.e. one of these 2 columns is required.
-
mzXML File Name
A file representing a subset of data extracted from the raw file (e.g. an mzXML file).
Note, you can load any/all mzXML File Names for a Sample Name before the Peak Annotation File is ready to load, in which case you can just leave this value empty.
Note, this column is only conditionally required with Sample Data Header. I.e. an mzXML File Name can be loaded without a Peak Annotation File Name value.
-
Peak Annotation File Name
Name of the peak annotation file. If the sample on any given row was included in a Peak Annotation File, add the name of that file here.
Select a Peak Annotation File Name from the dropdowns in this column. The dropdowns are populated by the Peak Annotation File column in the Peak Annotation Files sheet, so if the dropdowns are empty, add rows to the Peak Annotation Files sheet.
-
Sequence
The Sequence associated with the Sample Name, Sample Data Header, and/or mzXML File Name on this row.
Refer to the Sequence Name column in the Sequences sheet for format details.
Select a Sequence from the dropdowns in this column. The dropdowns are populated by the Sequence Name column in the Sequences sheet, so if the dropdowns are empty, add rows to the Sequences sheet.
-
Skip
Whether to skip loading the data associated with this sample, e.g. for a blank sample.
Enter 'skip' to skip loading of the sample and peak annotation data. The mzXML file will be saved if supplied, but it will not be associated with an MSRunSample or MSRunSequence, since the Sample record will not be created. Note that the Sample Name, Sample Data Header, and Sequence columns must still have a unique combo value (for file validation, even though they won't be used).
Boolean: skip or '' (i.e. empty).
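The row-level rules for the Peak Annotation Details sheet (one of Sample Data Header or mzXML File Name is required, Skip is either skip or empty, and the Sample Name, Sample Data Header, and Sequence combination must be unique) can be checked with a few lines of Python before uploading. This is a hypothetical pre-check under the assumption that rows are plain dicts keyed by column header; it does not reproduce TraceBase's own validation.

```python
def check_details_rows(rows: list[dict]) -> list[str]:
    """Return human-readable problems found in Peak Annotation Details rows."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        header = (row.get("Sample Data Header") or "").strip()
        mzxml = (row.get("mzXML File Name") or "").strip()
        skip = (row.get("Skip") or "").strip()

        if not header and not mzxml:
            problems.append(f"row {number}: Sample Data Header or mzXML File Name is required")
        if skip not in ("", "skip"):
            problems.append(f"row {number}: Skip must be 'skip' or empty, not {skip!r}")

        # Uniqueness applies to skipped rows as well.
        key = (row.get("Sample Name"), header, row.get("Sequence"))
        if key in seen:
            problems.append(f"row {number}: duplicate Sample Name/Sample Data Header/Sequence")
        seen.add(key)
    return problems
```

An empty list means none of these three rules were violated; it says nothing about the checks that only the Upload Validate page performs.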
Peak Group Conflicts Sheet
This sheet is hidden unless peak group conflicts were detected when the Upload Start page generated the Study Doc template. TraceBase will accept only one peak group measurement for each compound in a given sample. Sometimes a compound can show up in multiple scans (e.g. in positive and negative mode scans). If the same compound was picked for the same sample in El Maven and used to generate multiple peak annotation files, the preferred peak annotation file to represent that compound must be selected. That's the purpose this sheet serves.
-
Peak Group Conflict
Peak group name, composed of 1 or more compound synonyms, delimited by /, e.g. citrate/isocitrate. (Note, synonym(s) may convey information about the compound that is not recorded in the compound record, such as a specific stereoisomer.)
A peak group that exists in multiple peak annotation files containing common samples. Only 1 peak group may represent each compound per sample. Note that different synonyms of the same compound are treated as qualitatively different compounds (to support, for example, stereoisomers).
Note that the order and case of the compound synonyms could differ in each file.
-
Selected Peak Annotation File
TraceBase will accept only one peak group measurement for each compound in a given sample. Sometimes a compound can show up in multiple scans (e.g. in positive and negative mode scans). You must select the file containing the best representation of each compound. Using the provided drop-downs, select the peak annotation file from which this peak group should be loaded for the listed samples. That compound in the remaining files will be skipped for those samples. Note, each drop-down contains only the peak annotation files containing the peak group compound for that row.
The values in this column are referenced by the Peak Annotation File column in the Peak Annotation Files sheet.
-
Common Sample Count
The number of Common Samples among the files listed for the given peak group compound.
-
Example Samples
This column contains a sampling of the Common Samples between the files in the Selected Peak Annotation File drop-down.
A string of sample names delimited by ;.
-
Common Samples
This column contains a sorted list of the sample names that multiple peak annotation files have in common, where each of those files measures the same peak group compound.
A string of sample names delimited by ;.
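The way a peak group conflict arises can be sketched in a few lines. The code below is an illustration, not the Upload Start page's implementation: it assumes the peak annotation data has already been read into a nested dict, treats peak group names as equal when their /-delimited synonyms match regardless of order and case, and reports the samples that appear in more than one file for that peak group. Both function names are hypothetical.

```python
from collections import defaultdict


def peak_group_key(name: str) -> tuple[str, ...]:
    """Order- and case-insensitive key for a /-delimited peak group name."""
    return tuple(sorted(part.strip().lower() for part in name.split("/")))


def find_conflicts(files: dict[str, dict[str, set[str]]]) -> dict:
    """files maps file name -> {peak group name -> set of sample names}."""
    by_group = defaultdict(dict)
    for file_name, groups in files.items():
        for peak_group, samples in groups.items():
            by_group[peak_group_key(peak_group)][file_name] = set(samples)

    conflicts = {}
    for key, per_file in by_group.items():
        if len(per_file) < 2:
            continue
        # Count how many files contain each sample for this peak group.
        counts = defaultdict(int)
        for samples in per_file.values():
            for sample in samples:
                counts[sample] += 1
        common = sorted(s for s, n in counts.items() if n > 1)
        if common:
            conflicts[key] = {"files": sorted(per_file), "common_samples": common}
    return conflicts
```

The length of `common_samples` corresponds to the Common Sample Count column, and the `files` list to the choices offered in the Selected Peak Annotation File drop-down. In this simplified version, `files` lists every file containing the peak group, even one that shares no samples with the others.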
Treatments Sheet
-
Animal Treatment
Short, unique identifier for animal treatment protocol. Must match the value in the Treatment column of the Animals sheet.
-
Treatment Description
A thorough description of an animal treatment protocol. This will be useful for searching and filtering, so use standard terms and be as complete as possible.
Any difference in treatment should be indicated by a new Animal Treatment.
Example: different doses of drug
| Animal Treatment | Treatment Description |
| --- | --- |
| no treatment | No treatment was applied. Animal was housed at room temperature with a normal light cycle. |
| T3 in drinking water | T3 was provided in drinking water at 0.5 mg/L for two weeks prior to infusion. |
| T3 in drinking water (1.5 mg/L) | T3 was provided in drinking water at 1.5 mg/L for two weeks prior to infusion. |
Tissues Sheet
-
Tissue
Short identifier used by TraceBase. Use the most specific identifier applicable to your samples. If your data contains a tissue not already listed here, create a new row.
-
Description
Long form description of TraceBase tissue.
Compounds Sheet
Note that the Upload Start page will add the Compound and Formula, as extracted from the peak annotation files
if the compound name does not exist in TraceBase as either a primary compound name or synonym. It will also add all^
complete compound records with matching formulas so that you can check if any novel compound name from the
peak annotation files should be a synonym of an existing compound.
^ "all" existing compounds from TraceBase are added to the Compounds sheet, based on the Formula with one
caveat: The formulas in peak annotation files often represent the ionized version of the compound, in which case,
an existing compound in TraceBase may not be included in the Compounds sheet because its formula is not an exact
match. In this case, after adding an HMDB ID, you will encounter a duplicate record error. Unfortunately, the
only current way to resolve this is to consult the TraceBase site to copy over the record to the Compounds sheet.
-
Compound
A unique compound name that is commonly used in the laboratory (e.g. glucose, C16:0, etc.).
The values in this column are referenced by the Compound column in the Tracers sheet.
-
HMDB ID
A unique identifier for this compound in the Human Metabolome Database.
-
Formula
The molecular formula of the compound (e.g. C6H12O6, C16H32O2, etc.).
-
Synonyms
A semicolon-delimited list of unique synonymous names for a compound that are commonly used within the laboratory (e.g. palmitic acid, hexadecanoic acid, C16, and palmitate might also be synonyms for C16:0).
LC Protocols Sheet
-
LC Protocol
This is a read-only column that is populated by Excel formula, representing a unique laboratory-defined name for a liquid chromatography method that also indicates the run length. E.g. polar-HILIC-25-min
While this column is automatically populated by Excel formula, the following describes the formula output, if you wish to manually enter it.
E.g. 'LC Protocol'-'Run Length'-min
The values in this column are referenced by the Sequence Name column in the Sequences sheet.
-
Run Length
Time duration to complete a sample run through the liquid chromatography method.
Units: minutes. Example: 25
Select a Run Length from the dropdowns in this column or enter a new value.
-
Description
Unique full-text description of the liquid chromatography method.
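If you need to type an LC protocol name by hand, the 'LC Protocol'-'Run Length'-min pattern described above can be reproduced as follows. This is a minimal sketch with a hypothetical function name, not the Excel formula itself, and "polar-HILIC" is used only as an example protocol label.

```python
def lc_protocol_name(lc_protocol: str, run_length_minutes: float) -> str:
    """Combine a protocol label and run length as 'label-length-min'."""
    # The 'g' format drops a trailing ".0", so 25.0 is written as 25.
    return f"{lc_protocol}-{run_length_minutes:g}-min"
```

With these inputs, `lc_protocol_name("polar-HILIC", 25)` gives `polar-HILIC-25-min`, which is the form expected in the LC Protocol Name column of the Sequences sheet.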