How to Fill In the Study Doc
The Researcher uploading data supplies sample information into a submission template (a.k.a. the "Study Doc") to describe the samples to be included in TraceBase.
Sample information in TraceBase is kept consistent with existing data in TraceBase. This document describes in detail where sample information should be stored and how it should be formatted. It also describes what happens if you have new sample information (e.g. a new Diet, Compound, etc.).
General Tips
-
When in doubt, wing it
If you are unsure of how to label something and the Study Doc's header comment seems ambiguous, enter whatever you think fits, and leave it for the validation step to see if you get any errors. If you get an error during validation that still doesn't clarify what is needed, indicate what you are unsure about in the final google submission form. A curator will check the data and help you work out what is needed.
-
Fill in the sheets from left to right
Each sheet can reference one or more other sheets, usually by the other sheet's first column.
Example: The Samples sheet's Animal column references the Name column of the Animals sheet.
Those referencing columns, unless they were automatically filled from the Start tab, will have drop-downs that are populated by the contents of the referenced sheet, but those drop-downs will be empty if the referenced sheet has not yet been filled in. Thus, the order in which you fill in the sheets in the Study Doc affects how easy it is to fill in other sheets.
Pro Tip: The inter-sheet referencing columns allow each sheet to stand alone. A Study Doc can contain any subset of the sheets and still be loaded by itself, as long as the data it references in any other sheet has been previously loaded.
-
If you forgot a peak annotation file or made a mistake on the Start page, start over
If during the process of filling in the Study Doc, you discover that you omitted a peak annotation file (e.g. AccuCor file) or made a mistake in your original submission, it is recommended that you start over^ and upload all peak annotations on the Start page together with the corrected information again. If you had already spent time filling in the study doc, carefully copy over data from the previous version.
This is recommended for 2 reasons.
- The Start page performs cross-peak annotation checks that the validation page does not do, the main one being checks for multiple representations of compounds in a sample.
- Due to the auto-filled inter-sheet references, adding new samples and/or compounds will likely end up causing incomplete sheets that are laborious to fix manually.
^ There is a feature planned that will allow the Start page to update an existing Study Doc, but until that is implemented, starting over is much less error-prone.
-
Fill in blue, ignore gray, and pay attention to columns that affect FCirc Calculations
Columns with blue headers are required.
Gray columns are controlled by Excel formulas. Formulas usually only extend a few rows past the auto-filled rows, so if you need more, use Excel's fill-down feature. The validation process on the next page does not preserve formulas, so it can be helpful to keep the original Start page download to retrieve formulas.
Some columns are optional for loading, but TraceBase needs them to calculate and display accurate FCirc rates. Fill in the optional columns mentioned in FCirc Rates so that the FCirc calculations can be completed.
-
Don't fill in the mzXML column in the Peak Annotation Details sheet
mzXML filenames can be automatically matched to the sample headers in the peak annotation (e.g. AccuCor) files. The loading code is even smart enough to handle filenames that were modified to add "pos", "neg", "scan1", "scan2", etc., which are referred to as "scan labels". This column is not automatically populated^ due to the frequent presence of empty mzXML files and the impracticality of uploading those files for mapping, but the loading code works this all out on the fly. The only time you have to fill mzXML files in is when the filenames differ from the sample headers outside of the scan labels.
^ There is a feature planned that will allow a user to drop an entire Study directory on the Start page to map just the mzXML file names to the peak annotation file sample headers based on common parent directories, and auto-fill the mzXML column in the Peak Annotation Details sheet.
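To judge whether an mzXML filename differs from a sample header "outside of the scan labels", a rough check like the following can help. This is only a sketch: the regular expression and both function names are assumptions made for illustration, and the actual TraceBase loading code may recognize scan labels differently.

```python
import re

# Assumed scan-label patterns ("pos", "neg", "scan1", "scan2", ...) preceded by
# "_" or "-". The real loader's rules may differ.
SCAN_LABEL = re.compile(r"[_\-](?:pos|neg|scan\d+)(?=[_\-.]|$)", re.IGNORECASE)


def strip_scan_labels(name: str) -> str:
    """Drop a trailing .mzXML extension and any scan labels from a name."""
    stem = re.sub(r"\.mzxml$", "", name, flags=re.IGNORECASE)
    return SCAN_LABEL.sub("", stem)


def needs_manual_mzxml(sample_header: str, mzxml_filename: str) -> bool:
    """True when the two names differ by more than scan labels."""
    return strip_scan_labels(sample_header) != strip_scan_labels(mzxml_filename)
```

If `needs_manual_mzxml` returns True for a sample header and its file, that row is a candidate for filling in the mzXML column by hand.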
Study Doc Sheet and Column Details
Unless your study includes some novel compounds, tissues, or protocols, or you didn't use the Mass Spec fields on the Upload Start page, you will likely only need to fill in the first 5 sheets (the first 3 of which should be pretty lightweight). Thus, the main focus of your efforts will be the Animals and Samples sheets.
Study Sheet
-
Name
A name/identifier for an "experiment" or collection of animals.
This column is used to populate drop-downs in the Study column of the Animals sheet, so fill this sheet out before filling out the Animals sheet.
-
Study Description
A long form description of the study.
Describe the experimental design, citations (if the data are published), and any other relevant information that a researcher might need to consider when looking at the data from this study.
Tracers Sheet
Note: Individual tracer definitions can be spread across multiple rows, depending on how many different kinds of labeled elements they have. The thing that links the rows together for a single tracer is the value in the Tracer Row Group column.
The Tracers sheet is pre-populated with existing TraceBase tracer entries whose compounds match the compounds extracted from your peak annotation files, but it is not uncommon for those tracers to not include the tracers in your study, so you may need to enter them manually. You may remove any tracer rows that are unrelated to your study, if you wish. In doing so, you do not need to ensure that the Tracer Row Group values are sequential, but if you remove a row, make sure to remove every row with the same Tracer Row Group.
-
Tracer Row Group
Arbitrary number that identifies every row containing a label that belongs to a tracer. Each row defines 1 label and this value links them together.
The values in this column are not loaded into the database. They are only used to populate the Tracer Name column using an Excel formula. All rows having the same Tracer Row Group are used to build the Tracer Name column values.
-
Compound
Primary name of the compound for which this is a tracer.
The dropdown menus in this column are populated by the Compound column in the Compounds sheet. If the compound you need is not in the dropdown, go to the Compounds sheet to enter it, then come back to select it in this column's automatically updated drop-down menu.
-
Mass Number
The sum of the number of protons and neutrons in the labeled atom (its isotope), e.g. 14 for Carbon 14. The number of protons identifies the element; isotopes of the same element differ only in their number of neutrons. Note, this differs from the 'atomic number', which indicates the number of protons only.
-
Element
The type of atom that is labeled in the tracer compound.
Select an 'Element' from the dropdowns in this column. Valid values are: C, N, H, O, S, P.
For Deuterium, use H and ensure the Mass Number is accurate.
-
Label Count
The number of labeled atoms (M+) in the tracer compound supplied to this animal. Note that the count must be greater than or equal to the number of positions.
-
Label Positions
A comma-delimited string of integers indicating the labeled atom positions in the compound. The number of known labeled positions must be less than or equal to the Label Count.
The positions of Deuterium atoms are relative to the atoms they are covalently bonded to (which means that the position numbers can repeat when multiple deuterium atoms are bonded to the same Carbon).
-
Tracer Name
This is a read-only column that is populated by Excel formula, representing a unique name or lab identifier of the tracer, e.g. leucine-[13C6].
The values in this column are referenced by the Tracer column in the Infusates sheet.
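The way rows sharing a Tracer Row Group combine into one Tracer Name can be sketched in Python. This is an illustrative approximation of what the Excel formula produces, not the formula itself: the function name, the dict-per-row input, the ordering of labels by element, and the choice to leave label positions out of the name are all assumptions.

```python
from collections import defaultdict


def build_tracer_names(rows):
    """Map each Tracer Row Group to a name like 'valine-[13C5,15N1]'.

    Each row is a dict keyed by the sheet's column names. Label positions are
    validated but (as an assumption) not included in the generated name.
    """
    groups = defaultdict(list)
    for row in rows:
        positions = row.get("Label Positions") or []
        # The number of known positions may not exceed the Label Count.
        if len(positions) > row["Label Count"]:
            raise ValueError(
                f"{row['Compound']}: {len(positions)} positions exceed "
                f"Label Count {row['Label Count']}"
            )
        groups[row["Tracer Row Group"]].append(row)

    names = {}
    for group, members in groups.items():
        compounds = {m["Compound"] for m in members}
        if len(compounds) != 1:
            raise ValueError(f"Row group {group} mixes compounds: {sorted(compounds)}")
        # Assumed ordering: labels sorted by element symbol.
        labels = ",".join(
            f"{m['Mass Number']}{m['Element']}{m['Label Count']}"
            for m in sorted(members, key=lambda m: m["Element"])
        )
        names[group] = f"{compounds.pop()}-[{labels}]"
    return names
```

Two rows for valine in the same row group (13C with 5 labels, 15N with 1 label) yield `valine-[13C5,15N1]`, matching the format described for the Infusate Name column.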
Infusates Sheet
Note: Individual infusate definitions can be spread across multiple rows, depending on how many different tracers an infusate has. The thing that links the rows together for a single infusate is the value in the Infusate Row Group column.
The Infusates sheet is pre-populated with existing TraceBase infusate entries whose compounds match the compounds extracted from your peak annotation files, but it is not uncommon for those infusates to not include the infusates in your study, so you may need to enter them manually. You may remove any infusate rows that are unrelated to your study, if you wish. In doing so, you do not need to ensure that the Infusate Row Group values are sequential, but if you remove a row, make sure to remove every row with the same Infusate Row Group.
-
Infusate Row Group
Arbitrary number that identifies every row containing a tracer that belongs to a single infusate. Each row defines 1 tracer (at a particular concentration) and this value links them together.
The values in this column are not loaded into the database. They are only used to populate the Infusate Name column using an Excel formula. All rows having the same Infusate Row Group are used to build the Infusate Name column values.
-
Tracer Group Name
A short name or lab identifier referring to a group of tracer compounds, e.g. 6eaas. There may be multiple infusate records with this group name, each referring to the same tracers at different concentrations.
You can select a Tracer Group Name from the dropdowns in this column, which contain existing values in TraceBase, or enter a new value.
-
Tracer
Name of a tracer in this infusate at a specific Tracer Concentration.
Select a 'Tracer' from the dropdowns in this column. The dropdowns are populated by the Tracer Name column in the Tracers sheet, so if the dropdowns are empty, add rows to the Tracers sheet.
-
Tracer Concentration
The millimolar (mM) concentration of the tracer in a specific infusate 'recipe'.
-
Infusate Name
This is a read-only column that is populated by Excel formula, representing a unique name or lab identifier of the infusate 'recipe' containing 1 or more tracer compounds at specific concentrations.
While this column is automatically populated by Excel formula, the following describes the formula output.
Individual tracer compounds will be formatted as:
compound_name-[weight element count,weight element count]
example:
valine-[13C5,15N1]
Mixtures of compounds will be formatted as:
tracer_group_name {tracer[conc]; tracer[conc]}
example:
BCAAs {isoleucine-[13C6,15N1][23.2];leucine-[13C6,15N1][100];valine-[13C5,15N1][0.9]}
Note that the concentrations in the name are limited to 3 significant figures, but the saved value is as entered.
The values in this column are referenced by the Infusate column in the Animals sheet.
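The mixture format and the 3-significant-figure rule can be illustrated with a short Python sketch. It is not the Excel formula itself: the function names are hypothetical, the tracers are sorted alphabetically and joined with ";" (no space) because that is what the BCAAs example shows, and a Tracer Group Name is assumed to be present.

```python
def format_conc(value) -> str:
    """Render a concentration with at most 3 significant figures."""
    return f"{float(value):.3g}"


def infusate_name(group_name: str, tracers) -> str:
    """Build a mixture name from (tracer_name, concentration_mM) pairs."""
    parts = [f"{name}[{format_conc(conc)}]" for name, conc in sorted(tracers)]
    return f"{group_name} {{{';'.join(parts)}}}"


name = infusate_name(
    "BCAAs",
    [
        ("leucine-[13C6,15N1]", 100),
        ("valine-[13C5,15N1]", 0.9),
        ("isoleucine-[13C6,15N1]", 23.2),
    ],
)
print(name)
```

Only the displayed name is rounded here; as noted above, the concentration saved in TraceBase is the value as entered.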
Animals Sheet
-
Animal Name
A unique identifier for the animal. See recommendations for how to name an animal in Recommended Practices for Organizing Data.
-
Age
Age in weeks when the infusion started
-
Sex
"male" or "female"
-
Genotype
Most specific genotype possible. The column in the Animals sheet will have a drop-down containing the existing options on TraceBase. You can also consult the Animals page on TraceBase, where the Genotype column also contains a select list with the current unique genotypes in TraceBase. If necessary, indicate genotype as "unknown" (e.g. if the animal is a mixed background wildtype).
-
Weight
Weight in grams of the animal at the start time of infusion.
-
Infusate
The cells in this column are entered via drop-down and the content of the drop-down is populated using the Infusates and (indirectly) Tracers sheets. Consult the notes on those sheets for details on how to add new infusates/tracers. Keep reading to understand the values in the dropdown.
The infusate values in this column are a formatted description of the infusion solution (a.k.a. cocktail) given to an animal, including a shorthand name for the included tracers, an encoded tracer that includes the compound with a description of its labels, and the millimolar (mM) concentration of each tracer in the solution.
Consult the description of the Infusate Name column in the Infusates sheet documentation above for a format description.
-
Infusion Rate
Volume of infusate solution infused (microliters (ul) per minute per gram of animal body weight).
-
Diet
Description of animal diet used. Include the manufacturer identification and short description where possible. The column in the Animals sheet will have a drop-down containing the existing options on TraceBase. You can also consult the Animals page on TraceBase, where the Diet column also contains a select list with the current unique diets in TraceBase.
-
Feeding Status
fasted
fed
refed
Indicate the length of fasting/feeding in the Treatments sheet's Treatment Description column or in the Study sheet's Description column.
-
Treatment
A short, unique identifier for the animal treatment protocol. Details are provided in the Treatment Description column of the Treatments sheet.
Example:
T3 in drinking water
Default:
no treatment
Note that unique diets and feeding status are indicated elsewhere, and considered distinct from "animal treatments".
-
Study
The cells in this column are entered via drop-down and the content of the drop-down is populated using the Name column of the Study sheet.
A name/identifier for the "experiment" that an animal belongs to.
If an animal belongs to multiple studies in this submission, manually enter them in one cell, delimited by semicolons (;). Every row in the Animals sheet should represent a unique animal.
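Before submitting, a few of the Animals sheet rules above can be spot-checked with a script. The sketch below assumes the rows have already been read into dicts keyed by column name; the function names and messages are hypothetical, and this is not a substitute for the validation page.

```python
VALID_SEX = {"male", "female"}
VALID_FEEDING = {"fasted", "fed", "refed"}


def split_studies(cell: str) -> list[str]:
    """Split a semicolon-delimited Study cell into individual study names."""
    return [name.strip() for name in cell.split(";") if name.strip()]


def check_animals(rows: list[dict], study_names: set[str]) -> list[str]:
    """Return human-readable problems found in Animals sheet rows."""
    problems = []
    seen = set()
    for row in rows:
        name = row["Animal Name"]
        # Every row should represent a unique animal.
        if name in seen:
            problems.append(f"{name}: appears on more than one row")
        seen.add(name)
        sex = row.get("Sex")
        if sex and sex not in VALID_SEX:
            problems.append(f"{name}: Sex must be 'male' or 'female'")
        feeding = row.get("Feeding Status")
        if feeding and feeding not in VALID_FEEDING:
            problems.append(f"{name}: unknown Feeding Status {feeding!r}")
        for study in split_studies(row.get("Study") or ""):
            if study not in study_names:
                problems.append(f"{name}: Study {study!r} is not in the Study sheet")
    return problems
```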
Samples Sheet
-
Sample
Unique identifier for the biological sample. Generally, the sample names should match the sample headers in the AccuCor/IsoCor files, but a sample header often differs from one peak annotation file to another, because the mzXML filename was modified for uniqueness or to indicate the scan's polarity or range. TraceBase removes these "scan labels" from the sample names when it automatically populates this column. The original sample header in each peak annotation file is preserved in the Peak Annotation Details sheet.
See Recommended Practices for Organizing Data for suggestions on how to name samples.
-
Date Collected
Date sample was collected (YYYY-MM-DD).
-
Researcher Name
FIRST LAST
Researcher primarily responsible for collection of this sample.
Secondary people (PI, collaborator, etc) should be mentioned in the study description.
-
Tissue
Type of tissue. A tissue can be selected via drop-down menu. If the desired tissue is not in the drop-down, enter it in the Tissues sheet, then come back and select it in the automatically updated drop-down in this column.
The list of tissues in TraceBase can also be viewed on the TraceBase site's Tissues page.
-
Collection Time
Minutes after the start of the infusion when the tissue was collected.
Collection Time for samples collected before the infusion should be <= 0.
-
Animal
The animal from which this sample was collected. The animal is selected via drop-down menu. If the desired animal is not in the drop-down, enter it in the Name column of the Animals sheet, then come back and select it in the automatically updated drop-down in this column.
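The Date Collected format and the numeric Collection Time can be checked ahead of validation with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the row is a dict keyed by column name, and a blank Collection Time is simply not checked.

```python
import re
from datetime import datetime


def check_sample(row: dict) -> list[str]:
    """Return problems with one Samples sheet row's date and time fields."""
    problems = []
    collected = str(row.get("Date Collected") or "")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", collected):
        problems.append("Date Collected must look like YYYY-MM-DD")
    else:
        try:
            datetime.strptime(collected, "%Y-%m-%d")
        except ValueError:
            problems.append("Date Collected is not a real calendar date")
    minutes = row.get("Collection Time")
    if minutes not in (None, ""):
        try:
            # Zero or negative values are fine: they mark pre-infusion samples.
            float(minutes)
        except (TypeError, ValueError):
            problems.append("Collection Time must be a number of minutes")
    return problems
```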
Sequences Sheet
-
Sequence Name
This is a read-only column that is populated by Excel formula, representing a unique name for an MS Run Sequence.
Note that an MS Run Sequence is unique to a researcher, protocol, instrument (model), and date. If a researcher performs multiple such Mass Spec Runs on the same day, this single MS Run Sequence record will represent multiple runs.
Comma-delimited string combining the values from these columns in this order:
Operator
LC Protocol Name
Instrument
Date
The values in this column are referenced by the Default Sequence column in the Peak Annotation Files sheet and the Sequence column in the Peak Annotation Details sheet.
If you used all of the metadata fields on the Upload Start page, this column will have been automatically populated. If you edit any information in the columns that feed the Excel formula, this column will update, unless the formulas were stripped because you are using the file downloaded from the Upload Validate page. In that case, you may be able to copy a formula from the first empty row and paste it into the cell with the stale value.
-
Operator
FIRST LAST
Researcher who operated the Mass Spec instrument. If you used the metadata fields on the Upload Start page, this column will have been automatically populated.
Select an 'Operator' from the dropdowns in this column or enter a new researcher. If the new researcher was also the sample handler, ensure the names match.
-
LC Protocol Name
Unique laboratory-defined name of the liquid chromatography method (e.g. polar- HILIC-25-min). If you used the metadata fields on the Upload Start page, this column will have been automatically populated.
Select an 'LC Protocol Name' from the dropdowns in this column. The dropdowns are populated by the Name column in the LC Protocols sheet, so if the dropdowns are empty, add rows to the LC Protocols sheet.
-
Instrument
The model name of the mass spectrometer.
Select an instrument from the dropdowns in this column. Valid values are:
- QE
- QE2
- QEPlus
- QEHF
- Exploris240
- Exploris480
- ExplorisMX
- unknown
You may enter a new model, if necessary.
-
Date
The date that the mass spectrometer was run.
Format:
YYYY-MM-DD
-
Notes
Freeform notes on this mass spectrometer run sequence.
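If the Sequence Name formula has been stripped and a value has to be rebuilt by hand, the composition described above can be sketched as follows. The separator (a comma followed by a space) and the function names are assumptions; compare against an existing auto-populated value in your own Study Doc before relying on the exact spacing.

```python
from datetime import date


def sequence_name(operator: str, lc_protocol_name: str, instrument: str,
                  run_date: date, sep: str = ", ") -> str:
    """Join Operator, LC Protocol Name, Instrument, and Date, in that order."""
    return sep.join([operator, lc_protocol_name, instrument, run_date.isoformat()])


def parse_sequence_name(name: str) -> dict:
    """Split a Sequence Name back into its four parts (none may contain a comma)."""
    operator, lc_protocol_name, instrument, day = [p.strip() for p in name.split(",")]
    return {
        "Operator": operator,
        "LC Protocol Name": lc_protocol_name,
        "Instrument": instrument,
        "Date": date.fromisoformat(day),
    }
```

For a hypothetical operator "Jane Doe", `sequence_name("Jane Doe", "polar-HILIC-25-min", "QE2", date(2024, 3, 7))` gives `Jane Doe, polar-HILIC-25-min, QE2, 2024-03-07`.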
Peak Annotation Files Sheet
-
Peak Annotation File
Peak annotation file, e.g. AccuCor, IsoCorr, etc.
If the file will not be in the top level of the study directory, include a POSIX path (where the path delimiter is a forward slash /) relative to the study directory.
The values in this column are referenced by the Peak Annotation File Name column in the Peak Annotation Details sheet.
-
File Format
Peak annotation file format. Default: automatically detected.
Select a format from the dropdowns in this column. Valid values are:
isocorr
accucor
isoautocorr
unicorr^
^ unicorr is an internal format that the common elements of the other formats are converted into for loading. There is currently no way to save an Excel file in this format, so please ignore this option.
-
Default Sequence
The default Sequence to use to associate peak groups with the file they were derived from, when loading a Peak Annotation File. This default can be overridden by values supplied in the Peak Annotation File Name column in the Peak Annotation Details sheet.
Refer to the Sequence Name column in the Sequences sheet for format details.
Select a Default Sequence from the dropdowns in this column. The dropdowns are populated by the Sequence Name column in the Sequences sheet, so if the dropdowns are empty, add rows to the Sequences sheet.
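For the Peak Annotation File column, a POSIX path relative to the study directory can be derived with Python's standard pathlib module. The helper name and the example paths below are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def study_relative_path(file_path: str, study_dir: str) -> str:
    """Return file_path relative to study_dir, using forward slashes."""
    return PurePosixPath(file_path).relative_to(PurePosixPath(study_dir)).as_posix()


# A file in a subdirectory of the study directory.
print(study_relative_path("/data/study1/pos/accucor_pos.xlsx", "/data/study1"))

# A relative path copied from Windows can be converted to forward slashes.
print(PureWindowsPath(r"pos\accucor_pos.xlsx").as_posix())
```

`relative_to` raises a `ValueError` when the file is not inside the given study directory, which is a useful early warning.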
Peak Annotation Details Sheet
-
Sample Name
A sample that was injected at least once during a mass spectrometer sequence.
Select a Sample Name from the dropdowns in this column. The dropdowns are populated by the Sample column in the Samples sheet, so if the dropdowns are empty, add rows to the Samples sheet.
-
Sample Data Header
Sample header from the Peak Annotation File.
Note, this column is only conditionally required with mzXML File Name. I.e. one of these 2 columns is required.
-
mzXML File Name
A file representing a subset of data extracted from the raw file (e.g. an mzXML file).
Note, you can load any/all mzXML File Names for a Sample Name before the Peak Annotation File is ready to load, in which case you can just leave this value empty.
Note, this column is only conditionally required with Sample Data Header. I.e. an mzXML File Name can be loaded without a Peak Annotation File Name value.
-
Peak Annotation File Name
Name of the peak annotation file. If the sample on any given row was included in a Peak Annotation File, add the name of that file here.
Select a Peak Annotation File Name from the dropdowns in this column. The dropdowns are populated by the Peak Annotation File column in the Peak Annotation Files sheet, so if the dropdowns are empty, add rows to the Peak Annotation Files sheet.
-
Sequence
The Sequence associated with the Sample Name, Sample Data Header, and/or mzXML File Name on this row.
Refer to the Sequence Name column in the Sequences sheet for format details.
Select a Sequence from the dropdowns in this column. The dropdowns are populated by the Sequence Name column in the Sequences sheet, so if the dropdowns are empty, add rows to the Sequences sheet.
-
Skip
Whether to skip loading the data associated with this sample, e.g. a blank sample.
Enter 'skip' to skip loading of the sample and peak annotation data. The mzXML file will be saved if supplied, but it will not be associated with an MSRunSample or MSRunSequence, since the Sample record will not be created. Note that the Sample Name, Sample Data Header, and Sequence columns must still have a unique combo value (for file validation, even though they won't be used).
Boolean: skip or '' (i.e. empty).
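The row-level rules of this sheet (one of Sample Data Header or mzXML File Name is required, Skip is either skip or empty, and the Sample Name, Sample Data Header, and Sequence combination is unique) can be expressed as a small check. This is a sketch with hypothetical names that assumes rows are dicts keyed by column name; it is not the TraceBase validation code.

```python
def check_details_rows(rows: list[dict]) -> list[str]:
    """Return problems found in Peak Annotation Details rows."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        header = (row.get("Sample Data Header") or "").strip()
        mzxml = (row.get("mzXML File Name") or "").strip()
        skip = (row.get("Skip") or "").strip()
        # One of these 2 columns is required.
        if not header and not mzxml:
            problems.append(f"row {number}: need Sample Data Header or mzXML File Name")
        if skip not in ("", "skip"):
            problems.append(f"row {number}: Skip must be 'skip' or empty")
        # The combination must be unique even on skipped rows.
        key = (row.get("Sample Name"), header, row.get("Sequence"))
        if key in seen:
            problems.append(f"row {number}: duplicate Sample Name/Header/Sequence")
        seen.add(key)
    return problems
```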
Peak Group Conflicts Sheet
This sheet is hidden unless peak group conflicts were detected when the Upload Start page generated the Study Doc template. TraceBase will accept only one peak group measurement for each compound in a given sample. Sometimes a compound can show up in multiple scans (e.g. in positive and negative mode scans). If the same compound was picked for the same sample in El Maven and used to generate multiple peak annotation files, the preferred peak annotation file to represent that compound must be selected. That's the purpose this sheet serves.
-
Peak Group Conflict
Peak group name, composed of 1 or more compound synonyms, delimited by /, e.g. citrate/isocitrate. (Note, synonym(s) may convey information about the compound that is not recorded in the compound record, such as a specific stereoisomer.)
A peak group that exists in multiple peak annotation files containing common samples. Only 1 peak group may represent each compound per sample. Note that different synonyms of the same compound are treated as qualitatively different compounds (to support, for example, stereoisomers).
Note that the order and case of the compound synonyms could differ in each file.
-
Selected Peak Annotation File
TraceBase will accept only one peak group measurement for each compound in a given sample. Sometimes a compound can show up in multiple scans (e.g. in positive and negative mode scans). You must select the file containing the best representation of each compound. Using the provided drop-downs, select the peak annotation file from which this peak group should be loaded for the listed samples. That compound in the remaining files will be skipped for those samples. Note, each drop-down contains only the peak annotation files containing the peak group compound for that row.
The values in this column are referenced by the Peak Annotation File column in the Peak Annotation Files sheet.
-
Common Sample Count
The number of Common Samples among the files listed for the given peak group compound.
-
Example Samples
This column contains a sampling of the Common Samples between the files in the Selected Peak Annotation File drop-down.
A string of sample names delimited by ;.
-
Common Samples
This column contains a sorted list of the sample names that multiple peak annotation files have in common, where each file measures the same peak group compound.
A string of sample names delimited by ;.
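How a conflict arises can be illustrated with a short Python sketch: peak group names are normalized so that the order and case of the synonyms do not matter, and a conflict is reported when the same peak group appears for the same sample in more than one file. The input structure and function names are assumptions for illustration; the Upload Start page's actual detection logic may differ.

```python
from collections import defaultdict


def normalize_peak_group(name: str) -> str:
    """Lowercase and sort the /-delimited synonyms of a peak group name."""
    return "/".join(sorted(part.strip().lower() for part in name.split("/")))


def find_conflicts(files: dict) -> dict:
    """files maps filename -> {peak group name -> set of sample names}."""
    by_group = defaultdict(dict)
    for filename, groups in files.items():
        for peak_group, samples in groups.items():
            by_group[normalize_peak_group(peak_group)][filename] = set(samples)

    conflicts = {}
    for peak_group, per_file in by_group.items():
        counts = defaultdict(int)
        for samples in per_file.values():
            for sample in samples:
                counts[sample] += 1
        # Common samples are those measured for this peak group in 2+ files.
        common = sorted(s for s, n in counts.items() if n > 1)
        if common:
            conflicts[peak_group] = {
                "files": sorted(f for f, s in per_file.items() if s & set(common)),
                "common_samples": ";".join(common),
                "count": len(common),
            }
    return conflicts
```

Each entry in the result corresponds to one row of this sheet: the candidate files for the drop-down, the Common Samples string, and the Common Sample Count.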
Treatments Sheet
-
Animal Treatment
Short, unique identifier for the animal treatment protocol. Must match the value in the Treatment column of the Animals sheet.
-
Treatment Description
A thorough description of an animal treatment protocol. This will be useful for searching and filtering, so use standard terms and be as complete as possible.
Any difference in treatment should be indicated by a new Animal Treatment.
Example: different doses of drug

| Animal Treatment | Treatment Description |
| --- | --- |
| no treatment | No treatment was applied. Animal was housed at room temperature with a normal light cycle. |
| T3 in drinking water | T3 was provided in drinking water at 0.5 mg/L for two weeks prior to infusion. |
| T3 in drinking water (1.5 mg/L) | T3 was provided in drinking water at 1.5 mg/L for two weeks prior to infusion. |
Tissues Sheet
-
Tissue
Short identifier used by TraceBase. Use the most specific identifier applicable to your samples. If your data contains a tissue not already listed here, create a new row.
-
Description
Long form description of TraceBase tissue.
Compounds Sheet
Note that the Upload Start page will add the Compound and Formula, as extracted from the peak annotation files, if the compound name does not exist in TraceBase as either a primary compound name or synonym. It will also add all^ complete compound records with matching formulas so that you can check if any novel compound name from the peak annotation files should be a synonym of an existing compound.
^ "all" existing compounds from TraceBase are added to the Compounds sheet, based on the Formula, with one caveat: The formulas in peak annotation files often represent the ionized version of the compound, in which case, an existing compound in TraceBase may not be included in the Compounds sheet because its formula is not an exact match. In this case, after adding an HMDB ID, you will encounter a duplicate record error. Unfortunately, the only current way to resolve this is to consult the TraceBase site to copy over the record to the Compounds sheet.
-
Compound
A unique compound name that is commonly used in the laboratory (e.g. glucose, C16:0, etc.).
The values in this column are referenced by the Compound column in the Tracers sheet.
-
HMDB ID
A unique identifier for this compound in the Human Metabolome Database.
-
Formula
The molecular formula of the compound (e.g. C6H12O6, C16H32O2, etc.).
-
Synonyms
A semicolon-delimited list of unique synonymous names for a compound that are commonly used within the laboratory (e.g. palmitic acid, hexadecanoic acid, C16, and palmitate might also be synonyms for C16:0).
LC Protocols Sheet
-
LC Protocol
This is a read-only column that is populated by Excel formula, representing a unique laboratory-defined name for a liquid chromatography method that also indicates the run length. E.g. polar- HILIC-25-min
While this column is automatically populated by Excel formula, the following describes the formula output, if you wish to manually enter it.
E.g. 'LC Protocol'-'Run Length'-min
The values in this column are referenced by the Sequence Name column in the Sequences sheet.
-
Run Length
Time duration to complete a sample run through the liquid chromatography method.
Units: minutes. Example: 25
Select a Run Length from the dropdowns in this column or enter a new value.
-
Description
Unique full-text description of the liquid chromatography method.
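If an LC protocol name has to be entered manually, the 'LC Protocol'-'Run Length'-min pattern can be sketched in Python as below. The function names and the example protocol type "polar-HILIC" are assumptions for illustration; check the result against an auto-populated name in your own Study Doc.

```python
import re


def lc_protocol_name(lc_type: str, run_length_min: float) -> str:
    """Combine a protocol type and run length into '<type>-<minutes>-min'."""
    return f"{lc_type}-{run_length_min:g}-min"


def parse_lc_protocol_name(name: str) -> tuple[str, float]:
    """Split '<type>-<minutes>-min' back into its type and run length."""
    match = re.fullmatch(r"(.+)-(\d+(?:\.\d+)?)-min", name)
    if match is None:
        raise ValueError(f"not an LC protocol name: {name!r}")
    return match.group(1), float(match.group(2))
```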