Developing a Reproducible WDL-Based Workflow for RNASeq Data Using Modular, Software Engineering-Based Approaches
Time: Sunday, November 11th, 10:30am - 11am
Description: Computational workflows have become standard in many disciplines, including bioinformatics and genomics. Workflow languages such as the Workflow Description Language (WDL) and the Common Workflow Language (CWL) have been developed to describe the processing steps of a workflow. These languages can be highly expressive and customizable; however, this flexibility can perpetuate a complex tangle of code that is difficult to maintain and comprehend. The Moffitt Cancer Center participates in the ORIEN Avatar project, a multi-center project that has generated molecular profiles (DNASeq, RNASeq) on ~1,000 tissues to date. To support reproducibility in the analysis of RNASeq data for this project, we have implemented an RNA sequencing genomics analysis pipeline in our HPC environment using Cromwell, a workflow engine that executes WDL. By constraining the language to specific structural conventions and emphasizing modularity, we have built a pipeline suited to operational use and easy to maintain. We implemented individual tasks with built-in unit testing and nested levels of workflow integration for successively more complex integration testing. Bioinformatics staff at Moffitt Cancer Center have used this pipeline successfully with minimal training.