Even with today's most sophisticated sequencing technologies, any microbial diversity study only scratches the surface: these communities are simply too rich for us to detect any substantial part of them. As a result, the actual number of microbial species in a sample, environment, or biosphere remains largely unknown. Several years ago, we asked how one could statistically estimate the whole on the basis of a small sample of that whole. The result was an incredible journey into the jungle of statistical tools and methods, and in collaboration with John Bunge from Cornell University we made important advances in this area. We developed a novel methodology that uses parametric modeling to predict the number of species that must exist in a sample to account for what microbiologists empirically observe in it (Hong et al. 2006). With support from NSF, we are building a web-based, freely available tool kit so that any microbiologist can take the results of an admittedly modest survey of a specific environment and predict its real richness. We are also using this methodology to address some of the most basic questions in microbial diversity studies: how many species exist on our planet, how they are distributed, how and where they originate, and what ecological and evolutionary forces drive this diversity.

Examples of species frequency distributions used by Hong et al. (2006) to develop the new set of parametric tools for sizing the microbial world.
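
To give a feel for what estimating the whole from a small sample involves, here is a minimal sketch in Python. It is not the methodology of Hong et al. (2006), which is described here only as parametric modeling. Instead it swaps in the simplest parametric model one could choose: every species is assumed to be equally abundant, so the number of times a species is observed follows a Poisson distribution truncated at zero (unseen species leave no record). The function name and the example frequency counts are hypothetical and serve illustration only.

```python
import math


def estimate_richness_poisson(freq_counts):
    """Estimate total species richness under a zero-truncated Poisson model.

    freq_counts maps an abundance j (j >= 1) to the number of species
    observed exactly j times in the survey.
    Returns (estimated_total_species, fitted_poisson_rate).

    Assumption for illustration: all species share one Poisson rate.
    Real microbial communities are far more uneven than this.
    """
    s_obs = sum(freq_counts.values())                  # species actually seen
    n = sum(j * f for j, f in freq_counts.items())     # individuals sampled
    mean = n / s_obs                                   # mean count per seen species
    if mean <= 1.0:
        raise ValueError("only singletons observed; this model cannot be fitted")

    # Maximum likelihood for the zero-truncated Poisson solves
    #     lam / (1 - exp(-lam)) = mean.
    # The left side increases with lam, so bisection on (0, mean) works.
    lo, hi = 1e-9, mean
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2

    p_detect = -math.expm1(-lam)    # probability a species is seen at least once
    return s_obs / p_detect, lam


# Hypothetical survey: 30 species seen once, 15 seen twice, 5 seen three times.
survey = {1: 30, 2: 15, 3: 5}
n_hat, lam = estimate_richness_poisson(survey)
print(f"observed species: {sum(survey.values())}, estimated total: {n_hat:.1f}")
```

The estimate exceeds the observed count because the fitted model implies that some species were present but never sampled. An equal-abundance assumption tends to understate richness in uneven communities, which is one reason more flexible parametric models are needed for microbial data.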