Exclusive: Soil Ecology

Analytical methods for metagenomic technology and microbial community diversity

  • PENG Xi
  • FENG Kai
  • LI Shuzhen
  • DENG Ye
  • 1. CAS Key Laboratory of Environmental Biology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China;
    2. College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Key Laboratory of Industrial Ecology and Environmental Engineering, Ministry of Education, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China

Received date: 2021-06-21
Revised date: 2021-12-17
Online published: 2022-03-25

Abstract

The rapid development of omics technologies, represented by metagenomics, has greatly advanced our understanding of the diversity, composition, structure, and function of microorganisms in natural ecosystems. However, the large volumes of data these technologies generate pose a considerable challenge to researchers' ability to analyze and mine them. Drawing on two technical approaches, amplicon sequencing and whole-genome shotgun sequencing, this review summarizes the metagenomic analysis workflows used to characterize microbial communities. It then explains the concepts of microbial community diversity, the principles of the associated statistical analyses, and how the results of statistical tests should be interpreted. Finally, the paper points out that coping with the complexity and sheer volume of metagenomic data, and using big-data analysis to make analytical results interpretable, remain common challenges for environmental microbiologists, bioinformaticians, and statisticians.
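To make the diversity concepts and statistical testing mentioned above concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It computes common alpha-diversity measures (Shannon index, Gini-Simpson index, Hill numbers of order q), Bray-Curtis beta-diversity dissimilarity, and a one-way PERMANOVA pseudo-F with a permutation p-value in the style of Anderson (2001). All function names, the `otu_table` variable, and the toy read counts are hypothetical and chosen only for illustration; real analyses would normally use established tools such as the R package vegan (`diversity()`, `vegdist()`, `adonis2()`).

```python
import math
import random
from itertools import combinations


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), natural log; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance that two random reads differ in taxon."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def hill_number(counts, q):
    """Hill number of order q (effective number of taxa); q=0 is richness."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # The limit as q -> 1 is exp(Shannon entropy).
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def permanova(dist, groups, permutations=999, seed=0):
    """One-way PERMANOVA: pseudo-F from a distance matrix plus a permutation p-value."""
    n = len(groups)

    def pseudo_f(labels):
        levels = sorted(set(labels))
        a = len(levels)
        # Total sum of squares: squared distances over all pairs, divided by n.
        ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
        # Within-group sum of squares: same quantity per group, divided by group size.
        ss_within = 0.0
        for g in levels:
            idx = [i for i in range(n) if labels[i] == g]
            ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (n - a))

    observed = pseudo_f(groups)
    rng = random.Random(seed)  # seeded so the example is reproducible
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(labels) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy OTU table: each sample maps to read counts for four taxa.
otu_table = {
    "A1": [10, 5, 0, 1],
    "A2": [12, 4, 1, 0],
    "A3": [9, 6, 0, 2],
    "B1": [1, 2, 15, 8],
    "B2": [0, 3, 12, 10],
    "B3": [2, 1, 14, 9],
}
names = list(otu_table)
groups = [name[0] for name in names]  # treatment label "A" or "B"
dist = [[bray_curtis(otu_table[a], otu_table[b]) for b in names] for a in names]

for name in names:
    c = otu_table[name]
    hills = [round(hill_number(c, q), 2) for q in (0, 1, 2)]
    print(name, round(shannon(c), 3), round(gini_simpson(c), 3), hills)

f_stat, p_value = permanova(dist, groups)
print(f"pseudo-F = {f_stat:.2f}, permutation p = {p_value:.3f}")
```

The example also bears on interpreting statistical tests: with six samples split 3/3 there are only 20 possible group labelings, and two of them reproduce the observed split exactly, so the permutation p-value cannot fall much below 0.1 however distinct the groups are. Small designs therefore limit the attainable significance regardless of effect size.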

Cite this article

PENG Xi, FENG Kai, LI Shuzhen, DENG Ye. Analytical methods for metagenomic technology and microbial community diversity[J]. Science & Technology Review, 2022, 40(3): 99-111. DOI: 10.3981/j.issn.1000-7857.2022.03.009
