How to Fill In the Study Doc

The Researcher uploading data supplies sample information into a submission template (a.k.a. the "Study Doc") to describe the samples to be included in TraceBase.

Sample information in TraceBase is kept consistent with existing data in TraceBase. This document describes in detail where sample information should be stored and how it should be formatted. It also describes what happens if you have new sample information (i.e. new Diet, Compound, etc).

General Tips

When in doubt, wing it

If you are unsure of how to label something and the Study Doc's header comment seems ambiguous, enter whatever you think fits, and leave it for the validation step to see if you get any errors. If you get an error during validation that still doesn't clarify what is needed, indicate what you are unsure about in the final google submission form. A curator will check the data and help you work out what is needed.
Fill in the sheets from left to right

Each sheet can reference one or more other sheets, usually by the other sheet's first column.

Example: The Samples sheet's Animal column references the Name column of the Animals sheet.

Those referencing columns, unless they were automatically filled from the Start tab, will have drop-downs that are populated by the contents of the referenced sheet, but those drop-downs will be empty if the referenced sheet has not yet been filled in. Thus, the order in which you fill in the sheets in the Study Doc affects how easy it is to fill in other sheets.

**Pro Tip: The inter-sheet referencing columns allow each sheet to stand alone. A study doc can contain any subset of the contained sheets and still be loaded by itself, as long as the data it references in any other sheet has been previously loaded.
If you forgot a peak annotation file or made a mistake on the Start page, start over

If during the process of filling in the Study Doc, you discover that you omitted a peak annotation file (e.g. AccuCor file) or made a mistake in your original submission, it is recommended that you start over^ and upload all peak annotations on the Start page together with the corrected information again. If you had already spent time filling in the study doc, carefully copy over data from the previous version.

This is recommended for 2 reasons.
- The Start page performs cross-peak annotation checks that the validation page does not do, the main one being checks for multiple representations of compounds in a sample.
- Due to the auto-filled inter-sheet references, adding new samples and/or compounds will likely end up causing incomplete sheets that are laborious to fix manually.
^ There is a feature planned that will allow the Start page to update an existing Study Doc, but until that is implemented, starting over is much less error-prone.
Fill in blue, ignore gray, and pay attention to columns that affect FCirc Calculations

Columns with blue headers are required.

Gray columns are controlled by Excel formulas. Formulas usually only extend a few rows past the auto-filled rows, so if you need more, use excel's fill-down feature. The validation process on the next page does not preserve formulas, so it can be helpful to keep the original Start page download to retrieve formulas.

While some columns are optional for loading, in order for TraceBase to display accurate FCirc Rates, make sure to fill in the optional columns mentioned in FCirc Rates to ensure that the FCirc calculations will be able to be completed.
Don't fill in the mzXML column in the Peak Annotation Details sheet

mzXML filenames can be automatically matched to the sample headers in the peak annotation (e.g. AccuCor) files. The loading code is even smart enough to handle filenames that were modified to add "pos", "neg", "scan1", "scan2", etc - which are referred to as "scan labels". This column is not automatically populated^ due to the frequent presence of empty mzXML files and the impracticality of uploading those files for mapping, but the loading code works this all out on the fly. The only time you have to fill mzXML files in, is when the filenames differ from the sample headers outside of the scan labels.

^ There is a feature planned that will allow a user to drop an entire Study directory on the Start page to map just the mzXML file names to the peak annotation file sample headers based on common parent directories, and auto-fill the mzXML column in the Peak Annotation Details sheet.

Study Doc Sheet and Column Details

Unless your study includes some novel compounds, tissues, or protocols, or didn't use the Mass Spec fields on the Upload Start page, you will likely only need to fill in the first 5 sheets (the first 3 of which should be pretty lightweight). Thus, the main focus of your efforts will be the Animals and Samples sheets.

Study Sheet

Name

A name/identifier for an "experiment" or collection of animals.

This column is used to populate drop-downs in the Study column of the Animals sheet, so fill this sheet out before filling out the Animals sheet.
Study Description

A long form description of the study.

Describe here, the experimental design, citations (if the data is published), or any other relevant information that a researcher might need to consider when looking at the data from this study.

Tracers Sheet

Note: Individual tracer definitions can be spread across multiple rows, depending on how many different kinds of labeled elements they have. The thing that links the rows together for a single tracer is the value in the Tracer Row Group column._

The tracers sheet is pre-populated with extisting TraceBase tracer entries whose compounds match the compounds extracted from your peak annotation files, but it is not uncommon for those tracers to not include the tracers in your study, so you may need to enter them manually. You may remove any tracer rows that are unrelated to your study, if you wish. In doing so, you do not need to ensure that the Tracer Row Groups are sequential, but if you remove a row, make sure to remove every row with the same Tracer Row Group.

Tracer Row Group

Arbitrary number that identifies every row containing a label that belongs to a tracer. Each row defines 1 label and this value links them together.

The values in this column are not loaded into the database. It is only used to populate the Tracer Name column using an excel formula. All rows having the same Tracer Row Group are used to build the Tracer Name column values.
Compound

Primary name of the compound for which this is a tracer.

The dropdown menus in this column are populated by the Compound column in the Compounds sheet. If the compound you need is not in the dropdown, go to the Compounds sheet to enter it, then come back to select it in this column's automatically updated drop-down menu.
Mass Number

The sum of the number of protons and neutrons of the labeled atom, a.k.a. 'isotope', e.g. Carbon 14. The number of protons identifies the element that this tracer is an isotope of. The number of neutrons in the element equals the number of protons, but in an isotope, the number of neutrons will be less than or greater than the number of protons. Note, this differs from the 'atomic number' which indicates the number of protons only.
Element

The type of atom that is labeled in the tracer compound.

Select a 'Element' from the dropdowns in this column. Valid values are: C, N, H, O, S, P.

For Deuterium, use H and ensure the Mass Number is accurate.
Label Count

The number of labeled atoms (M+) in the tracer compound supplied to this animal. Note that the count must be greater than or equal to the number of positions.
Label Positions

A comma-delimited string of integers indicating the labeled atom positions in the compound. The number of known labeled positions must be less than or equal to the Label Count.

The positions of Deuterium atoms are relative to the atoms they are covalently bonded to, (which means that the position numbers can repeat when multiple deuterium atoms are bonded to the same Carbon).
Tracer Name

This is a read-only column that is populated by Excel formula, representing a unique name or lab identifier of the tracer, e.g. leucine-[13C6].

The values in this column are referenced by the Tracer column in the Infusates sheet.

Infusates Sheet

Note: Individual infusate definitions can be spread across multiple rows, depending on how many different tracers an infusate has. The thing that links the rows together for a single tracer is the value in the Infusate Row Group column.

The infusates sheet is pre-populated with extisting TraceBase infusate entries whose compounds match the compounds extracted from your peak annotation files, but it is not uncommon for those infusates to not include the infusates in your study, so you may need to enter them manually. You may remove any infusate rows that are unrelated to your study, if you wish. In doing so, you do not need to ensure that the Infusate Row Groups are sequential, but if you remove a row, make sure to remove every row with the same Infusate Row Group.

Infusate Row Group

Arbitrary number that identifies every row containing a tracer that belongs to a single infusate. Each row defines 1 tracer (at a particular concentration) and this value links them together.

The values in this column are not loaded into the database. It is only used to populate the Infusate Name column using an excel formula. All rows having the same Infusate Row Group are used to build the Infusate Name column values.
Tracer Group Name

A short name or lab identifier of refering to a group of tracer compounds, e.g 6eaas. There may be multiple infusate records with this group name, each referring to the same tracers at different concentrations.

You can select a Tracer Group Name from the dropdowns in this column, which contains existing values in TraceBase, or enter a new value.
Tracer

Name of a tracer in this infusate at a specific Tracer Concentration.

Select a 'Tracer' from the dropdowns in this column. The dropdowns are populated by the Tracer Name column in the Tracers sheet, so if the dropdowns are empty, add rows to the Tracers sheet.
Tracer Concentration

The millimolar (mM) concentration of the tracer in a specific infusate 'recipe'.
Infusate Name

This is a read-only column that is populated by Excel formula, representing a unique name or lab identifier of the infusate 'recipe' containing 1 or more tracer compounds at specific concentrations.

While this column is automatically populated by excel formula, the following describes the formula output.

Individual tracer compounds will be formatted as: compound_name-[weight element count,weight element count]

example: valine-[13C5,15N1]

Mixtures of compounds will be formatted as: tracer_group_name {tracer[conc]; tracer[conc]}

example: BCAAs {isoleucine-[13C6,15N1][23.2];leucine-[13C6,15N1][100];valine-[13C5,15N1][0.9]}

Note that the concentrations in the name are limited to 3 significant figures, but the saved value is as entered.

The values in this column are referenced by the Infusate column in the Animals sheet.

Animals Sheet

Animal Name

A unique identifier for the animal. See recommendations for how to name an animal in Recommended Practices for Organizing Data.
Age

Age in weeks when the infusion started
Sex

"male" or "female"
Genotype

Most specific genotype possible. The column in the Animals sheet will have a drop-down containing the existing options on TraceBase. You can also consult the Animals page on TraceBase, where the Genotype column also contains a select list with the current unique genotypes in TraceBase. If necessary, indicate genotype as "unknown" (e.g. if the animal is a mixed background wildtype).
Weight

Weight in grams of the animal at the start time of infusion.
Infusate

The cells in this column are entered via drop-down and the content of the drop-down is populated using the Infusates and (indirectly) Tracers sheets. Consult the notes on those sheets for details on how to add new infusates/tracers. Keep reading to understand the values in the dropdown.

The infusate values in this column are a formatted description of the infusion solution (a.k.a. cocktail) given to an animal, including a shorthand name for the included tracers, an encoded tracer that includes the compound with a description of its labels, and the Millimolar (mM) concentration of each tracer in the solution.

Consult the description of the Infusate Name column in the Infusates sheet documentation above for a format description.
Infusion Rate

Volume of infusate solution infused (microliters (ul) per minute per gram of animal body weight).
Diet

Description of animal diet used. Include the manufacturer identification and short description where possible. The column in the Animals sheet will have a drop-down containing the existing options on TraceBase. You can also consult the Animals page on TraceBase, where the Diet column also contains a select list with the current unique diets in TraceBase.
Feeding Status
- fasted
- fed
- refed
Indicate the length of fasting/feeding in the Treatments sheet's Treatment Description column or in the Study sheet's Description column.
Treatment

A Short, unique identifier for animal treatment protocol. Details are provided in the "Treatment Description" field on the "Treatments" sheet.

Example: T3 in drinking water

Default: no treatment

Note that unique diets and feeding status are indicated elsewhere, and considered distinct from "animal treatments".
Study

The cells in this column are entered via drop-down and the content of the drop-down is populated using the Name column of the Study sheet.

A name/identifier for the "experiment" that an animal belongs to.

If an animal belongs to multiple studies in this submission, manually enter them in one cell, delimited by semicolons (;). Every row in the Animals sheet should represent a unique animal.

See Recommended Practices for Organizing Data

Samples Sheet

Sample

Unique identifier for the biological sample. Generally, the sample names should match the sample headers in the AccuCor/IsoCor files, but often, such a sample header may differ from peak annotation file to peak annotation file, due to modifications of the mzXML filename for uniqueness, or it indicate the scan's polarily or range. TraceBase removes these "scan labels" from the sample names when it automatically populates this column. The original sample header in each peak annotation file is preserved in the Peak Annotation Details sheet.

See Recommended Practices for Organizing Data for suggestions on how to name samples
Date Collected

Date sample was collected (YYYY-MM-DD).
Researcher Name

FIRST LAST

Researcher primarily responsible for collection of this sample.

Secondary people (PI, collaborator, etc) should be mentioned in the study description.
Tissue

Type of tissue. A tissue can be selected via drop-down menu. If the desired tissue is not in the drop-down, enter it in the Tissues sheet, then come back and select it in the automatically updated drop-down in this column.

The list of tissues in TraceBase can also be viewed on the TraceBase site's Tissues page.
Collection Time

Minutes after the start of the infusion when the tissue was collected.

Collection Time for samples collected before the infusion should be <= 0.
Animal

The animal from which this sample was collected. The animals is selected via drop-down menu. If the desired animal is not in the drop-down, enter it in the Name column of the Animals sheet, then come back and select it in the automatically updated drop-down in this column.

Sequences Sheet

Sequence Name

This is a read-only column that is populated by Excel formula, representing a unique name for an MS Run Sequence.

Note that an MS Run Sequence is unique to a researcher, protocol, instrument (model), and date. If a researcher performs multiple such Mass Spec Runs on the same day, this single MS Run Sequence record will represent multiple runs.

Comma-delimited string combining the values from these columns in this order:
- Operator
- LC Protocol Name
- Instrument
- Date
The values in this column are referenced by the 'Default Sequence' column in the 'Peak Annotation Files' sheet and the Sequence column in the Peak Annotation Details sheet.

If you used all of the metadata fields on the Upload Start page, this column will have been automatically populated. If you edit any information in the columns controlled by the excel formula, this column will update, unless the formulas were stripped by using the downloaded file from the Upload Validate page, in which case, you may be able to copy a formula from the first enpty row and paste it into the stale valued cell.
Operator

FIRST LAST

Researcher who operated the Mass Spec instrument. If you used the metadata fields on the Upload Start page, this column will have been automatically populated.

Select an 'Operator' from the dropdowns in this column or enter a new researcher. If the new researcher was also the sample handler, ensure the names match.
LC Protocol Name

Unique laboratory-defined name of the liquid chromatography method.(e.g. polar- HILIC-25-min). If you used the metadata fields on the Upload Start page, this column will have been automatically populated.

Select an 'LC Protocol Name' from the dropdowns in this column. The dropdowns are populated by the Name column in the LC Protocols sheet, so if the dropdowns are empty, add rows to the LC Protocols sheet.
Instrument

The model name of the mass spectrometer.

Select an instrument from the dropdowns in this column. Valid values are:
- QE
- QE2
- QEPlus
- QEHF
- Exploris240
- Exploris480
- ExplorisMX
- unknown
You may enter a new model, if necessary.
Date

The date that the mass spectrometer was run.

Format: YYYY-MM-DD
Notes

Freeform notes on this mass spectrometer run sequence.

Peak Annotation Files Sheet

Peak Annotation File

Peak annotation file, e.g. AccuCor, IsoCorr, etc.

If the file will not be in the top level of the study directory, include a POSIX path (where the path delimiter is a forward slash /) relative to the study directory.

The values in this column are referenced by the Peak Annotation File Name column in the Peak Annotation Details sheet.
File Format

Peak annotation file format. Default: automatically detected.

Select a format from the dropdowns in this column. Valid values are:
- isocorr
- accucor
- isoautocorr
- unicorr^
^ unicorr is an internal format that the common elements of the other formats are converted into for loading. There is currently no way to save an excel file in this format, so please ignore this option.
Default Sequence

The default Sequence to use to associate peak groups with the file they were derived from, when loading a Peak Annotation File. This default can be overridden by values supplied in the Peak Annotation File Name column in the Peak Annotation Details sheet.

Refer to the Sequence Name column in the Sequences sheet for format details.

Select a Default Sequence from the dropdowns in this column. The dropdowns are populated by the Sequence Name column in the Sequences sheet, so if the dropdowns are empty, add rows to the Sequences sheet.

Peak Annotation Details Sheet

Sample Name

A sample that was injected at least once during a mass spectrometer sequence.

Select a Sample Name from the dropdowns in this column. The dropdowns are populated by the Sample column in the Samples sheet, so if the dropdowns are empty, add rows to the Samples sheet.
Sample Data Header

Sample header from the Peak Annotation File.

Note, this column is only conditionally required with mzXML File Name. I.e. one of these 2 columns is required.
mzXML File Name

A file representing a subset of data extracted from the raw file (e.g. an mzXML file).

Note, you can load any/all mzXML File Names for a Sample Name before the Peak Annotation File is ready to load, in which case you can just leave this value empty.

Note, this column is only conditionally required with Sample Data Header. I.e. an mzXML File Name can be loaded without a Peak Annotation File Name value.
Peak Annotation File Name

Name of the peak annotation file. If the sample on any given row was included in a Peak Annotation File, add the name of that file here.

Select a Peak Annotation File Name from the dropdowns in this column. The dropdowns are populated by the Peak Annotation File column in the Peak Annotation Files sheet, so if the dropdowns are empty, add rows to the Peak Annotation Files sheet.
Sequence

The Sequence associated with the Sample Name, Sample Data Header, and/or mzXML File Name on this row.

Refer to the Sequence Name column in the Sequences sheet for format details.

Select a Sequence from the dropdowns in this column. The dropdowns are populated by the Sequence Name column in the Sequences sheet, so if the dropdowns are empty, add rows to the Sequences sheet.
Skip

Whether to load data associated with this sample, e.g. a blank sample.

Enter 'skip' to skip loading of the sample and peak annotation data. The mzXML file will be saved if supplied, but it will not be associated with an MSRunSample or MSRunSequence, since the Sample record will not be created. Note that the Sample Name, Sample Data Header, and Sequence columns must still have a unique combo value (for file validation, even though they won't be used).

Boolean: skip or '' (i.e. empty).

Peak Group Conflicts Sheet

This sheet is hidden unless peak group conflicts were detected when the Upload Start page generated the Study Doc template. TraceBase will accept only one peak group measurement for each compound in a given sample. Sometimes a compound can show up in multiple scans (e.g. in positive and negative mode scans). If the same compound was picked for the same sample in El Maven and used to generate multiple peak annotation files, the preferred peak annotation file to represent that compound must be selected. That's the purpose this sheet serves.

Peak Group Conflict

Peak group name, composed of 1 or more compound synonyms, delimited by /, e.g. citrate/isocitrate. (Note, synonym(s) may confer information about the compound that is not recorded in the compound record, such as a specific stereoisomer.)

A peak group that exists in multiple peak annotation files containing common samples. Only 1 peak group may represent each compound per sample. Note that different synonymns of the same compound are treated as qualitatively different compounds (to support for example, stereo-isomers).

Note that the order and case of the compound synonyms could differ in each file.
Selected Peak Annotation File

TraceBase will accept only one peak group measurement for each compound in a given sample. Sometimes a compound can show up in multiple scans (e.g. in positive and negative mode scans). You must select the file containing the best representation of each compound. Using the provided drop-downs, select the peak annotation file from which this peak group should be loaded for the listed samples. That compound in the remaining files will be skipped for those samples. Note, each drop-down contains only the peak annotation files containing the peak group compound for that row.

The values in this column are referenced by the Peak Annotation File column in the Peak Annotation Files sheet.
Common Sample Count

The number of Common Samples among the files listed for the given peak group compound.
Example Samples

This column contains a sampling of the Common Samples between the files in the Selected Peak Annotation File drop-down.

A string of sample names delimited by ;.
Common Samples

This column contains a sorted list of sample names that multiple peak annotation files have in common, and each measure the same peak group compound.

A string of sample names delimited by ;.

Treatments Sheet

Animal Treatment

Short, unique identifier for animal treatment protocol. Must match the same value in Samples sheet.

Treatment Description

A thorough description of an animal treatment protocol. This will be useful for searching and filtering, so use standard terms and be as complete as possible.

Any difference in treatment should be indicated by a new Animal Treatment.

Example: different doses of drug

Animal Treatment	Treatment Description
no treatment	No treatment was applied. Animal was housed at room temperature with a normal light cycle.
T3 in drinking water	T3 was provided in drinking water at 0.5 mg/L for two weeks prior to infusion.
T3 in drinking water (1.5 mg/L)	T3 was provided in drinking water at 1.5 mg/L for two weeks prior to infusion.

Tissues Sheet

Tissue

Short identifier used by TraceBase. Use the most specific identifier applicable to your samples. If your data contains a tissue not already listed here, create a new row.
Description

Long form description of TraceBase tissue.

Compounds Sheet

Note that the Upload Start page will add the Compound and Formula, as extracted from the peak annotation files if the compound name does not exist in TraceBase as either a primary compound name or synonym. It will also add all^ complete compound records with matching formulas so that you can check if any novel compound name from the peak annotation files should be a synonym of an existing compound._

^ "all" existing compounds from TraceBase are added to the Compounds sheet, based on the Formula with one caveat: The formulas in peak annotation files often represent the ionized version of the compound, in which case, an existing compound in TraceBase may not be included in the Compounds sheet because its formula is not an exact match. In this case, after adding an HMDB ID, you will encounter a duplicate record error. Unfortunately, the only current way to resolve this is to consult the TraceBase site to copy over the record to the Compounds sheet.

Compound

A unique compound name that is commonly used in the laboratory (e.g. glucose, C16:0, etc.).

The values in this column are referenced by the Compound column in the Tracers sheet.
HMDB ID

A unique identifier for this compound in the Human Metabolome Database.
Formula

The molecular formula of the compound (e.g. C6H12O6, C16H32O2, etc.).
Synonyms

A semicolon-delimited list of unique synonymous names for a compound that is commonly used within the laboratory. (e.g. palmitic acid, hexadecanoic acid, C16, and palmitate might also be synonyms for C16:0).

LC Protocols Sheet

LC Protocol

This is a read-only column that is populated by Excel formula, representing a unique laboratory-defined name for a liquid chromatography method that also indicates the run length. E.g. polar- HILIC-25-min

While this column is automatically populated by Excel formula, the following describes the formula output, if you wish to manually enter it.

E.g. 'LC Protocol'-'Run Length'-min

The values in this column are referenced by the Sequence Name column in the Sequences sheet.
Run Length

Time duration to complete a sample run through the liquid chromatography method.

Units: minutes. Example: 25

Select a Run Length from the dropdowns in this column or enter a new value.
Description

Unique full-text description of the liquid chromatography method.