HmmUFOtu is an HMM-based, ultra-fast OTU assignment tool for bacterial 16S and amplicon sequencing research. It has two core algorithms: the CSFM-index (Consensus Sequence FM-index) powered banded-HMM alignment algorithm, and the SEP (Seed-Estimate-Place) taxonomy assignment algorithm, which is based on local phylogenetic placement.
The main program hmmufotu takes single or paired-end NGS reads in FASTA/FASTQ format and generates a taxonomy assignment for every read. The program hmmufotu-sum then generates phylogeny-based OTUs, an OTU tree based on the reference tree, and consensus-based representative sequences for the OTUs. See the details on GitHub.
HmmUFOtu supports all major DNA substitution models and an optional Discrete Gamma (dΓ) model (Yang 1994) for capturing among-site rate variation.
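As background on how the substitution models named in the pre-built databases relate to each other, here is a minimal Python sketch. It is not HmmUFOtu code, and the function names are hypothetical. It builds a GTR rate matrix from base frequencies and symmetric exchangeabilities, and treats TN93 and HKY85 as special cases of GTR with fewer free parameters. The Discrete Gamma model is not covered here.

```python
# Illustrative sketch only; this is NOT taken from the HmmUFOtu source.
# All names below are hypothetical and chosen for this example.

BASES = "ACGT"


def gtr_rate_matrix(freqs, exch):
    """Return a 4x4 GTR rate matrix Q, scaled to one expected substitution per unit time.

    freqs: dict mapping each base to its equilibrium frequency (should sum to 1).
    exch:  dict mapping frozenset({x, y}) to a symmetric exchangeability.
    """
    n = len(BASES)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                # Off-diagonal rate: exchangeability times target base frequency.
                q[i][j] = exch[frozenset((a, b))] * freqs[b]
        # Diagonal makes each row sum to zero (q[i][i] is still 0.0 here).
        q[i][i] = -sum(q[i])
    # Expected substitution rate at equilibrium, used for normalization.
    mu = -sum(freqs[a] * q[i][i] for i, a in enumerate(BASES))
    return [[x / mu for x in row] for row in q]


def tn93_exchangeabilities(kappa_purine, kappa_pyrimidine):
    """TN93: one transversion rate plus separate A<->G and C<->T transition rates."""
    exch = {frozenset(pair): 1.0 for pair in ("AC", "AT", "CG", "GT")}
    exch[frozenset("AG")] = kappa_purine
    exch[frozenset("CT")] = kappa_pyrimidine
    return exch


def hky85_exchangeabilities(kappa):
    """HKY85: TN93 with both transition rates set to the same kappa."""
    return tn93_exchangeabilities(kappa, kappa)


if __name__ == "__main__":
    # Example base frequencies and kappa are made up for illustration.
    freqs = {"A": 0.3, "C": 0.2, "G": 0.25, "T": 0.25}
    for row in gtr_rate_matrix(freqs, hky85_exchangeabilities(2.0)):
        print(["%.4f" % x for x in row])
```

The nesting shown here (HKY85 inside TN93 inside GTR) is the standard textbook relationship between these models. GTR is the most general of the three, and TN93 and HKY85 restrict it to fewer free parameters.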
Please download the source code (written in pure C++98) or pre-compiled binaries from GitHub.
You need an HmmUFOtu database before you can assign taxonomies to your 16S or other amplicon sequencing reads. You can build your own database using hmmufotu-build (which may take about 10 minutes with 6 processors), or download one of the pre-built databases below.
- gg_97_otus_GTR GreenGenes (v13.8) species-level (97% OTU) reference + GTR DNA model. This is recommended for most bacterial 16S studies.
- gg_97_otus_TN93 GreenGenes (v13.8) species-level (97% OTU) reference + TN93 DNA model
- gg_97_otus_HKY85 GreenGenes (v13.8) species-level (97% OTU) reference + HKY85 DNA model
- gg_79_otus_GTR GreenGenes (v13.8) middle-level (79% OTU) reference + GTR DNA model
- gg_79_otus_TN93 GreenGenes (v13.8) middle-level (79% OTU) reference + TN93 DNA model
- gg_79_otus_HKY85 GreenGenes (v13.8) middle-level (79% OTU) reference + HKY85 DNA model
HmmUFOtu is currently under peer review at Genome Biology.