Source code

All of the following code can be found in /home/ubuntu/scripts. Some of these scripts are used by the pipeline; others are not currently incorporated but may still prove useful for your 16S-related work.

Python modules

Formatting.py - Module to house miscellaneous formatting methods, e.g. conversion from classic dense format to BIOM format, OTU table transposition, etc.
QualityControl.py - Methods for quality control diagnostics on a dataset.
preprocessing_16S.py - Methods and wrappers for raw 16S sequence data processing.
Taxonomy.py - Methods for taxonomy-related feature extraction and analytics. Includes functions for tasks such as adding Latin names to a GreenGenes-referenced OTU table and collapsing abundances at different taxonomic levels.
Phylogeny.py - Methods for phylogenetic feature extraction, e.g. left/right (LR) abundance ratios at each node of a phylogenetic tree.
Analytics.py - Generic statistical analysis tools, e.g. Wilcoxon tests across all available taxa.
Regressions.py - Performs different types of regressions en masse.
PipelineFilesInterface.py - Methods for reading and moving around raw data files and for reading specific groups of attributes from the summary file.
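
To make the idea of collapsing abundances at a taxonomic level (as described for Taxonomy.py) more concrete, here is a minimal sketch. It does not reproduce the actual Taxonomy.py code: the function name collapse_abundances, the dictionary-based table layout, and the GreenGenes-style lineage strings (e.g. "k__Bacteria; p__Firmicutes") are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_abundances(otu_table, taxonomy, level):
    """Sum OTU counts per sample for all OTUs sharing a taxon at `level`.

    otu_table: dict mapping OTU ID -> {sample ID: count}   (hypothetical layout)
    taxonomy:  dict mapping OTU ID -> lineage string such as
               "k__Bacteria; p__Firmicutes; c__Clostridia"  (assumed format)
    level:     one-letter rank prefix, e.g. "p" for phylum
    """
    prefix = level + "__"
    collapsed = defaultdict(lambda: defaultdict(float))
    for otu_id, sample_counts in otu_table.items():
        lineage = taxonomy.get(otu_id, "")
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        # Fall back to "unclassified" when the rank is missing or empty (e.g. "p__").
        taxon = "unclassified"
        for rank in ranks:
            if rank.startswith(prefix) and len(rank) > len(prefix):
                taxon = rank
                break
        for sample, count in sample_counts.items():
            collapsed[taxon][sample] += count
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {taxon: dict(samples) for taxon, samples in collapsed.items()}


# Toy data, invented purely for this example.
otu_table = {
    "OTU1": {"S1": 10, "S2": 0},
    "OTU2": {"S1": 5, "S2": 3},
    "OTU3": {"S1": 1, "S2": 7},
    "OTU4": {"S1": 2, "S2": 2},
}
taxonomy = {
    "OTU1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "OTU2": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "OTU3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
    "OTU4": "k__Bacteria; p__",
}
phylum_table = collapse_abundances(otu_table, taxonomy, "p")
```

In this toy case the two Firmicutes OTUs are merged into a single row, and OTU4, which has no phylum assignment, ends up under "unclassified".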

Scripts and routines

Master.py - Master script that calls relevant processing pipelines, e.g. raw2otu.py.
raw2otu.py - Pipeline for converting raw 16S FASTQ sequence files to OTU tables. Handles parallelization requirements in these processing steps automatically. Takes as input a directory that contains a summary file and the raw data.
dbotu.py - Module for doing distribution-based OTU calling.
rdp_classify.py - Wrapper for calling the RDP classifier jar file, found at /home/ubuntu/tools/RDPTools/classifier.jar.
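
As a rough illustration of what a wrapper around the RDP classifier jar might look like, the sketch below builds and runs a java command line. It is not the actual rdp_classify.py: the function names build_rdp_command and run_rdp are hypothetical, and the classify subcommand with the -c (confidence cutoff) and -o (output file) options is assumed from the RDPTools command-line interface, so check it against the version installed on your machine.

```python
import subprocess

# Jar location as given in this documentation.
RDP_JAR = "/home/ubuntu/tools/RDPTools/classifier.jar"


def build_rdp_command(fasta_path, output_path, jar_path=RDP_JAR, confidence=0.5):
    """Return the argument list for one RDP classifier run.

    The "classify" subcommand and the -c / -o options are assumptions
    based on the RDPTools command-line interface.
    """
    return [
        "java", "-jar", jar_path,
        "classify",
        "-c", str(confidence),   # assumed: bootstrap confidence cutoff
        "-o", output_path,       # assumed: path of the assignment output file
        fasta_path,
    ]


def run_rdp(fasta_path, output_path, **kwargs):
    """Run the classifier and raise CalledProcessError if java exits non-zero."""
    cmd = build_rdp_command(fasta_path, output_path, **kwargs)
    subprocess.run(cmd, check=True)
    return output_path


# Building the command has no side effects, so it can be inspected before running.
example_cmd = build_rdp_command("otu_seqs.fasta", "rdp_assignments.txt")
```

Passing the command as a list rather than a single shell string avoids quoting problems when file paths contain spaces.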