Source code
The following code can all be found in /home/ubuntu/scripts. Some of these
scripts are used by the pipeline; others are not currently incorporated
but may still prove useful to your 16S-related work.
Python modules
- Formatting.py - Module to house miscellaneous formatting methods, e.g. conversion from classic dense format to BIOM format, OTU table transposition, etc.
- QualityControl.py - Methods for quality control diagnostics on a dataset.
- preprocessing_16S.py - Methods and wrappers for raw 16S sequence data processing.
- Taxonomy.py - Methods for taxonomy-related feature extraction and analytics. Includes functions for things like:
  - adding Latin names to a GreenGenes-referenced OTU table
  - collapsing abundances at different taxonomic levels
- Phylogeny.py - Methods for phylogenetic feature extraction, e.g. left/right (LR) abundance ratios at each node of a phylogenetic tree.
- Analytics.py - Generic statistical analysis tools, e.g. Wilcoxon tests across all available taxa.
- Regressions.py - Performs different types of regressions en masse.
- PipelineFilesInterface.py - Methods for reading and moving around raw data files and for reading specific groups of attributes from the summary file.
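To make one of the operations listed above concrete, here is a minimal sketch of collapsing OTU abundances at a chosen taxonomic level, as described for Taxonomy.py. It is not taken from Taxonomy.py: the function name `collapse_abundances`, the nested-dict table layout, and the semicolon-separated `k__...;p__...` lineage strings are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_abundances(otu_counts, otu_taxonomy, level):
    """Sum counts of all OTUs that share the same lineage down to `level`.

    Hypothetical helper for illustration only (not the Taxonomy.py API).

    otu_counts:   dict of OTU ID -> dict of sample ID -> count
    otu_taxonomy: dict of OTU ID -> lineage string, e.g. "k__Bacteria; p__Firmicutes"
    level:        number of ranks to keep (1 = kingdom, 2 = phylum, ...)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for otu_id, sample_counts in otu_counts.items():
        # Split the lineage into ranks and keep only the first `level` of them.
        ranks = [rank.strip() for rank in otu_taxonomy[otu_id].split(";")]
        lineage = ";".join(ranks[:level])
        for sample_id, count in sample_counts.items():
            collapsed[lineage][sample_id] += count
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {lineage: dict(samples) for lineage, samples in collapsed.items()}


# Toy data (made up for this example).
otu_counts = {
    "otu1": {"s1": 5, "s2": 0},
    "otu2": {"s1": 3, "s2": 2},
    "otu3": {"s1": 1, "s2": 4},
}
otu_taxonomy = {
    "otu1": "k__Bacteria; p__Firmicutes",
    "otu2": "k__Bacteria; p__Firmicutes",
    "otu3": "k__Bacteria; p__Bacteroidetes",
}
phylum_table = collapse_abundances(otu_counts, otu_taxonomy, level=2)
```

With the toy data, `otu1` and `otu2` are merged into a single Firmicutes row, while `otu3` remains on its own. OTUs whose lineage is shorter than `level` are grouped by whatever ranks they do have, which may or may not be the behavior you want for unclassified reads.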
Scripts and routines
- Master.py - Master script that calls relevant processing pipelines, e.g. raw2otu.py.
- raw2otu.py - Pipeline for converting raw 16S FASTQ sequence files to OTU tables. Handles parallelization requirements in these processing steps automatically. Takes as input a directory that contains a summary file and the raw data.
- dbotu.py - Module for doing distribution-based OTU calling.
- rdp_classify.py - Wrapper for calling the RDP classifier jar file, found at /home/ubuntu/tools/RDPTools/classifier.jar.
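For readers who want to see what a wrapper around the classifier jar might look like, the following is a minimal sketch. It is not the contents of rdp_classify.py: the function names `build_rdp_command` and `run_rdp` are hypothetical, and the `classify -c ... -o ...` arguments are assumed from the commonly documented RDPTools command line, so check them against your installed version before relying on them.

```python
import subprocess

# Jar location as given in the list above.
RDP_JAR = "/home/ubuntu/tools/RDPTools/classifier.jar"


def build_rdp_command(fasta_path, output_path, jar_path=RDP_JAR,
                      confidence=0.5, max_memory="1g"):
    """Build the argument list for one classifier run (hypothetical helper).

    Assumes the RDPTools 'classify' subcommand with -c (confidence cutoff)
    and -o (output file); verify these flags for your version of the jar.
    """
    return [
        "java", f"-Xmx{max_memory}", "-jar", jar_path,
        "classify",
        "-c", str(confidence),
        "-o", output_path,
        fasta_path,
    ]


def run_rdp(fasta_path, output_path, **kwargs):
    """Run the classifier and raise CalledProcessError on a non-zero exit."""
    cmd = build_rdp_command(fasta_path, output_path, **kwargs)
    subprocess.run(cmd, check=True)
    return output_path


# Building the command does not require Java or the jar to be present.
cmd = build_rdp_command("seqs.fasta", "rdp_assignments.txt")
```

Keeping command construction separate from execution makes the wrapper easy to inspect and test without invoking Java; passing the arguments as a list (rather than a shell string) avoids quoting problems with file paths.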