Uploading FAQ
How "ready" does my data have to be to upload to TraceBase?
Every data submission to TraceBase (sample metadata, peak annotation files (AccuCor/IsoCor/Iso-AutoCor), and RAW/mzXML
files) is described and organized in a submission template we refer to as a Study Doc (an Excel spreadsheet). You can
create a Study Doc that contains the samples/animals associated with as few as one peak annotation file, an entire MS
Run, a whole Study, or even multiple studies. We recommend that as soon as you have a peak annotation file, you draft a
submission to TraceBase.
The submission process uses the peak annotation files to automate the entry of a large portion of the metadata when
you download the template, such as sample names and compounds, but some manual metadata entry (for example, describing
the animals and samples) is required. The required^ columns are highlighted in blue in the downloaded Study Doc. See
How to Upload Data to TraceBase for details.
^ Note that in order for FCirc calculations to be displayed on TraceBase, some otherwise optional columns, described at the top of the FCirc Rates page, are required.
The upload process ensures that the data integrity is preserved from study to study and from sample to sample. For example, the process ensures:
- Samples are labeled accurately
- Animal, Sample, and Study names are unique
- Consistent nomenclature is used
Your data is initially uploaded to a private folder, where a curator checks the data to ensure it is formatted correctly before it is loaded. When all checks have passed, the curator adds the data to TraceBase. This means it is OK (and expected) for your data to be imperfectly labeled when you initially submit it for upload. The process is designed so that you can solve problems on your own: as the author of the data, you are the most knowledgeable person to fix issues that come up. You can, however, choose how much you want to engage in the validation of your data.
Do my compound names need to match TraceBase compound names?
No. TraceBase maintains a list of primary compound names associated with synonyms. If you upload data with a new compound name, we will contact you to resolve the difference. If it is a new compound, then your name becomes the primary compound name. If your name matches an existing compound in TraceBase, then your name is added as a synonym, and your next upload will not have any issues.
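The primary name and synonym lookup described above can be sketched as follows. This is a minimal illustration, not TraceBase's implementation: the `PRIMARY_TO_SYNONYMS` table, the function names, and the case-insensitive matching are all assumptions made for the example.

```python
# Hypothetical stand-in for TraceBase's compound list (illustrative data only).
PRIMARY_TO_SYNONYMS = {
    "glucose": {"D-glucose", "dextrose"},
    "lactate": {"lactic acid"},
}


def resolve_compound(name, compounds=PRIMARY_TO_SYNONYMS):
    """Return the primary name matching `name`, or None if it is unknown.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    wanted = name.strip().lower()
    for primary, synonyms in compounds.items():
        known = {primary.lower()} | {s.lower() for s in synonyms}
        if wanted in known:
            return primary
    return None


def unresolved_names(names, compounds=PRIMARY_TO_SYNONYMS):
    """List the names that match neither a primary name nor a synonym."""
    return sorted({n for n in names if resolve_compound(n, compounds) is None})
```

Running `unresolved_names` over the compound names in your peak annotation files before submitting would give you the names a curator is likely to ask about.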
Ideally, every new compound will have an HMDB ID associated with it. If HMDB does not have a record for your compound,
enter a fake HMDB ID in the form FakeHMDB0000 in order to validate associated data (because it's a required value), and
add a compound synonym with the compound's PubChem ID in the form PubChem0000000.^
Note, however, that tracer/infusate names are currently always converted to TraceBase's primary compound name for consistent search results, whereas PeakGroup names always use whatever synonym is present in the peak annotation files. The original design thinking was that this would support distinct stereoisomers, but this may change in the future.
^ Support for PubChem is a planned feature that will make either an HMDB or PubChem ID required, eliminating the need for the "fake" HMDB ID.
I have a new Tissue. How do I upload?
In the sample information workbook, on the Tissues tab, add your new Tissue name to the list. This will update the tissue dropdown in the tissue column of the Samples sheet, allowing you to select your new Tissue name. When you submit the Google form, tell the developer you are adding a new Tissue.
Can I upload multiple data files at once?
Yes! Upload as many data files as you want. Ideally, use only one Study Doc. This will allow the software to catch multiple representations of the same compound picked for the same sample(s) in multiple peak annotation files (e.g. the same compound picked in positive versus negative mode). TraceBase allows only one representation of a compound in a sample.
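To illustrate the check described above, here is a minimal sketch that finds compounds represented more than once for the same sample across several peak annotation files. The function name, the input shape (a dict mapping a file name to `(sample, compound)` pairs), and the file names are assumptions for the example; TraceBase performs its own check when you submit.

```python
from collections import defaultdict


def find_multiple_representations(peak_groups_by_file):
    """Map each (sample, compound) pair found in more than one file to those files."""
    files_by_pair = defaultdict(set)
    for filename, pairs in peak_groups_by_file.items():
        for sample, compound in pairs:
            files_by_pair[(sample, compound)].add(filename)
    return {
        pair: sorted(files)
        for pair, files in files_by_pair.items()
        if len(files) > 1
    }


# Hypothetical example: glutamine was picked in both ionization modes.
example = {
    "pos_mode.xlsx": [("Mouse1_Q", "glutamine"), ("Mouse1_Q", "glucose")],
    "neg_mode.xlsx": [("Mouse1_Q", "glutamine"), ("Mouse1_Q", "lactate")],
}
conflicts = find_multiple_representations(example)
```

Each key in `conflicts` is a sample and compound pair for which you would need to choose a single representation.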
My sample names in one AccuCor/IsoCor file are not unique. Can I upload these together?
E.g. Samples Mouse1_Q, Mouse2_Q, Mouse3_Q are in one data file for one experiment, and Mouse1_Q, Mouse2_Q, Mouse3_Q are
in a second data file for a second experiment.
Yes, but this may require some special attention. Ideally, every sample name in the data file should correspond to one
unique biological sample in a Study. If that's not the case, and the 2 files containing this name collision are
uploaded together on the Start page, TraceBase will assume they are the same biological samples and create a single
sample row for each in the Samples sheet. This can be fixed manually, but in this case, it is far easier to create
separate Study Docs to avoid the errors.
If you overlooked this case (same-named but different biological samples) and generated a single Study Doc, you may or
may not see errors, even though the underlying problem (missing distinct samples) exists. The errors you might see are
MultipleRepresentation errors on the Start page and a Peak Group Conflicts sheet in the downloaded Study Doc. Whether
this happens depends on the compounds in the peak annotation files. If 2 same-named but different samples were analyzed
for the same compounds (i.e. you picked the same peaks), TraceBase, believing there is a single sample, assumes that you
picked peaks for the same compound twice. Only 1 such compound representation is allowed per sample, so TraceBase issues
the error and prompts you to pick one of the 2 compound representations in the Peak Group Conflicts sheet. But since the
samples are actually different biological samples, picking a representative compound will only make the problem worse.
The ultimate fix is to modify one or both of each sample name pair in both the Samples sheet and the sample name column
in the Peak Annotation Details sheet, then remove the associated rows in the Peak Group Conflicts sheet, but this can
all be avoided if you create separate Study Docs for the commonly named samples.
Let us know when you have this issue and a curator can make the sample name modification for you after your submission is received.
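If you want to look for this situation before submitting, the following sketch finds sample names shared between files and proposes unique replacements. It is only an illustration: the function names, the input shape, and the suffix-based renaming scheme are assumptions, and the code cannot tell whether two same-named samples are the same biological sample, which only you can judge.

```python
from collections import defaultdict


def shared_sample_names(samples_by_file):
    """Map each sample name that occurs in more than one file to those files."""
    files_by_name = defaultdict(set)
    for filename, names in samples_by_file.items():
        for name in names:
            files_by_name[name].add(filename)
    return {
        name: sorted(files)
        for name, files in files_by_name.items()
        if len(files) > 1
    }


def propose_renames(samples_by_file, suffix_by_file):
    """Suggest a new name for every colliding sample by appending a per-file suffix."""
    renames = {}
    for name, files in shared_sample_names(samples_by_file).items():
        for filename in files:
            renames[(filename, name)] = f"{name}_{suffix_by_file[filename]}"
    return renames


# Hypothetical file names and suffixes.
samples = {
    "exp1.xlsx": ["Mouse1_Q", "Mouse2_Q", "Mouse3_Q"],
    "exp2.xlsx": ["Mouse1_Q", "Mouse2_Q", "Mouse3_Q", "Mouse4_Q"],
}
renames = propose_renames(samples, {"exp1.xlsx": "exp1", "exp2.xlsx": "exp2"})
```

Any rename you adopt would have to be applied consistently in both the Samples sheet and the sample name column of the Peak Annotation Details sheet.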
I added or edited sample rows manually. Can I upload these files?
Yes. Any study-specific data can be manually edited. Note that corresponding edits should be made in related sheets.
For example, if you add or edit a sample row, the corresponding rows in the Peak Annotation Details sheet must also be
added or edited.
Edits to the following sheets are subject to stricter nomenclature control and curator approval:
- Tissues
- Compounds
- LC Protocols
Additions to these sheets are simpler, as long as they are not redundant. If you make modifications to these sheets, let us know. We will help you come up with an easy solution for uploading modified files. Just upload what you have and we will contact you to confirm our solution is OK.
Can I upload some data now, and upload more data from the same samples later?
Yes. TraceBase will add new data to existing samples. If the same compound is uploaded a second time, TraceBase will use the latest upload. The same is true of all data in the Study Doc.
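The "latest upload wins" behavior described above can be pictured with a short sketch. The data shape here (a dict keyed by `(sample, compound)`) and the function name are assumptions for illustration, not TraceBase's internal model.

```python
def merge_uploads(uploads):
    """Combine uploads given oldest first; later values replace earlier ones."""
    merged = {}
    for upload in uploads:
        merged.update(upload)
    return merged


# Hypothetical uploads: glucose is uploaded twice for the same sample.
first = {("Mouse1_Q", "glucose"): "upload 1", ("Mouse1_Q", "lactate"): "upload 1"}
second = {("Mouse1_Q", "glucose"): "upload 2", ("Mouse1_Q", "alanine"): "upload 2"}
merged = merge_uploads([first, second])
```

In this example, `merged` keeps lactate from the first upload, adds alanine from the second, and takes glucose from the second.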
Edited rows (outside of compound synonyms) that were already loaded are a different story and require special curator attention. Let us know if you need to modify any previously loaded data and a TraceBase curator will make the update.