Microbial Genomics Gold Found in Old Data

C. Titus Brown is associate professor in the UC Davis School of Veterinary Medicine and Genome Center.

There’s gold in those old databases. Analyses of genomic data often miss a large amount of information, but genome scientists at UC Davis have now created an automated analysis pipeline to dig out this hidden information. 

In a new study published in the journal GigaScience, the researchers mine a huge marine microbial dataset from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) and uncover new results.

Previous work on the MMETSP sequenced 678 transcriptomes (the set of all RNA sequences expressed by an organism) and assembled genes spanning 396 different strains of marine eukaryotes. This dataset has been an invaluable resource for ocean science, greatly expanding the genetic information available on marine protists.

But in the five years since the original analysis, tools, techniques and databases have all improved. Reanalyzing previously generated data with new tools is not commonplace, and it is unclear what best practice should be. Running analyses again produces different results, and the effects of using different workflows, or “pipelines”, are poorly understood, making it difficult to judge how useful the new results are relative to the previous findings.

C. Titus Brown, associate professor in the Department of Population Health and Reproduction, UC Davis School of Veterinary Medicine, graduate student Lisa Johnson and postdoctoral researcher Harriet Alexander went back to the original raw data from MMETSP and created an automated pipeline to assemble and annotate it. The resulting new transcriptome assemblies were then automatically evaluated and compared against previously generated results from the original assembly pipeline developed by the National Center for Genome Resources. Because there is no one-size-fits-all protocol for transcriptome assembly, and because software tools are constantly improving, Brown and colleagues’ pipeline made it possible to test and quantify improvements.

Provided by Scott Edmunds, executive editor of GigaScience. Adapted from a blog post published by the journal. 
