Bioinformatic Tools

In the Grice Lab, we develop state-of-the-art computational tools for skin microbiome, molecular ecology, and general translational genomics research.

AlignerBoost

AlignerBoost is a tool for boosting the precision and sensitivity of NextGen-seq aligners.

Introduction
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present.

Main function
AlignerBoost is a generalized software toolkit for boosting the mapping accuracy of modern NGS aligners. It uses a Bayesian-based framework to accurately estimate the mapping quality of ambiguously mapped NGS reads.
We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, and especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising sensitivity, even without mapping quality filters. With higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while maintaining comparable or higher sensitivity than the aligners' default modes, significantly boosting the detection power of NGS aligners even at extreme thresholds.
AlignerBoost is also SNP-aware and can achieve higher-quality alignments when provided with known SNPs. AlignerBoost’s algorithm is computationally efficient and can process one million alignments within 30 seconds on a typical desktop computer.
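
The general idea behind a Bayesian mapping-quality estimate can be illustrated with a minimal Python sketch. This is not AlignerBoost's implementation: the function name, the uniform prior over candidate hits, the quality cap, and the example log-likelihoods are all assumptions made for illustration. Each candidate location of a read gets a posterior probability of being the true origin, and the mapping quality is the Phred-scaled probability that the hit is wrong.

```python
import math


def mapping_qualities(log10_likelihoods, max_q=60):
    """Phred-scaled mapping quality for each candidate hit of one read.

    log10_likelihoods: log10 P(read | hit) for every candidate location
    (hypothetical inputs; a real aligner would derive them from base
    qualities, mismatches, and indels).
    A uniform prior over the candidate locations is assumed.
    max_q is an arbitrary cap chosen for this sketch.
    """
    top = max(log10_likelihoods)
    # Rescale by the best hit so the weights stay in a safe numeric range.
    weights = [10.0 ** (ll - top) for ll in log10_likelihoods]
    total = sum(weights)
    quals = []
    for i in range(len(weights)):
        # Posterior probability that hit i is NOT the true origin:
        # the summed weight of all other hits over the total weight.
        others = sum(weights[:i]) + sum(weights[i + 1:])
        p_err = others / total
        if p_err <= 0.0:
            quals.append(max_q)  # unique hit: no competing location
        else:
            quals.append(min(max_q, int(round(-10.0 * math.log10(p_err)))))
    return quals


# One hit that is 1000x more likely than a second, repeat-derived hit.
print(mapping_qualities([-2.0, -5.0]))
```

Under these assumptions, a read with two equally likely hits receives a low quality for both, while a read whose best hit is far more likely than the alternatives receives a high quality for that hit only. Downstream filtering by a quality cutoff then removes the ambiguous hits.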

Download
AlignerBoost is implemented as a uniform Java application and is freely available at GitHub.

Citations
If you use this tool, please cite AlignerBoost on PubMed.

Contact us
Please contact Qi Zheng or Elizabeth Grice with any questions.

HmmUFOtu

HmmUFOtu is an HMM and phylogenetic placement-based Ultra-fast OTU assignment tool for bacterial 16S and amplicon sequencing research. 

Introduction

HmmUFOtu has two core algorithms: the CSFM-index (Consensus Sequence FM-index) powered banded-HMM algorithm, and the SEP (Seed-Estimate-Place) local phylogenetic placement-based taxonomy assignment algorithm.
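
For readers unfamiliar with FM-indexes, the sketch below shows exact-match lookup by backward search on a plain FM-index built over a single toy sequence. It is not the CSFM-index and is not taken from the HmmUFOtu source: the function names and the example sequence are hypothetical, and the naive suffix sort is only suitable for short inputs.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, Occ table) for text.

    The text must not contain '$', which is used as the end sentinel.
    """
    text += "$"
    # Naive suffix array: fine for a demo, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that sort strictly before c.
    C, running = {}, 0
    for c in alphabet:
        C[c] = running
        running += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def find_occurrences(pattern, index):
    """Return sorted start positions of pattern, using backward search."""
    sa, C, occ = index
    lo, hi = 0, len(sa)
    # Extend the match one character at a time, from the pattern's end.
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGA")
print(find_occurrences("ACG", index))
```

Each character of the pattern narrows a suffix-array interval in constant time, so lookup cost grows with the pattern length rather than the reference length. That property is what makes FM-index seeding attractive before a more expensive alignment step.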

The main program hmmufotu takes single or paired-end NGS FASTA/FASTQ reads and generates a taxonomy assignment for every read. A second program, hmmufotu-sum, then generates phylogeny-based OTUs, a reference tree-based OTU-tree, and consensus-based representative sequences for the OTUs. See the details on GitHub.

Supported models

HmmUFOtu supports all major DNA substitution models and an optional Discrete Gamma (dΓ) model (Yang 1994) for capturing among-site rate variation.
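
To illustrate what a discrete Gamma model provides, the sketch below computes mean-one rate categories in the manner of Yang (1994), using only the Python standard library. It is not taken from the HmmUFOtu source: the function names, the shape value alpha = 0.5, and the choice of four categories are assumptions made for illustration.

```python
import math


def reg_lower_gamma(a, x, tol=1e-12, max_terms=10000):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(p, shape, rate):
    """Quantile of a Gamma(shape, rate) distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, rate * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, rate * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability categories.

    Uses Gamma(shape=alpha, rate=alpha), which has mean 1, so the
    category rates also average to 1.
    """
    # Category boundaries are the i/k quantiles of the distribution.
    cuts = [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)]
    # The integral of x*f(x) from 0 to b equals P(alpha + 1, alpha * b).
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * b) for b in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


rates = discrete_gamma_rates(0.5, 4)
print([round(r, 4) for r in rates])
```

In a likelihood calculation, each site's likelihood is evaluated once per rate category and averaged with equal weight 1/k. A small alpha yields strongly unequal rates (most sites slow, a few fast), while a large alpha approaches a single uniform rate.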

Download

Please download the source code (written in pure C++98) or pre-compiled binaries from GitHub.

Pre-built databases

You need an HmmUFOtu database before assigning taxonomies to your 16S or other amplicon sequencing reads. You can build your own database using hmmufotu-build (which may take about 10 minutes with 6 processors), or download one of the pre-built databases below. Note: HmmUFOtu is backward compatible, so you do not need to download the databases again even if HmmUFOtu has been updated since your last download.

  • gg_97_otus_GTR GreenGenes (v13.8) species-level (97% OTU) reference + GTR DNA model. This is recommended for most bacterial 16S studies.

  • gg_97_otus_TN93 GreenGenes (v13.8) species-level (97% OTU) reference + TN93 DNA model

  • gg_97_otus_HKY85 GreenGenes (v13.8) species-level (97% OTU) reference + HKY85 DNA model

  • gg_79_otus_GTR GreenGenes (v13.8) middle-level (79% OTU) reference + GTR DNA model

  • gg_79_otus_TN93 GreenGenes (v13.8) middle-level (79% OTU) reference + TN93 DNA model

  • gg_79_otus_HKY85 GreenGenes (v13.8) middle-level (79% OTU) reference + HKY85 DNA model

  • gg_99_otus_GTR (part0, part1) GreenGenes (v13.8) strain-level (99% OTU) reference + GTR DNA model. Warning: you may need at least 48 GB of free memory to use this database.

Citations
If you use this tool, please cite HmmUFOtu on PubMed.

Contact us

Please contact Qi Zheng or Elizabeth Grice with any questions.