Why We Should Care About the Numbers

This chapter sets the stage for what unfolds in the chapters that follow. If you feel the urge to find some number of interest now, you can jump to any vignette in the book and return later to this chapter, which presents both the overall logic and the basic tools used to craft biological numeracy. Each of the ≈102 vignettes in the book can be read as a stand-alone answer to a quantitative question in cell biology. The remainder of the book is organized according to different classes of biological numbers, ranging from the sizes of things (Chapter 1) to the quantitative rules of information management in living organisms (Chapter 5) and everything in between. The goal of this first chapter is decidedly more generic: it lays out the case for biological numeracy and provides general guidelines for arriving at these numbers using simple estimates. We also consider how to properly handle the uncertainty associated with both biological measurements and estimates. We build on principles developed in the physical sciences, where estimates and uncertainties are common practice, but adapt them to the messiness of biological systems.

What is gained by adopting the perspective of biological numeracy that we have called “cell biology by the numbers”? The answer can be argued along several different lines. One enriching approach is to appeal to the many historical examples in which the quantitative dissection of a problem provided the key to its ultimate solution. Examples abound, from the classic discoveries in genetics that culminated in Sturtevant’s map of the geography of the Drosophila genome to Hodgkin and Huxley’s discovery of the quantitative laws that govern the dynamics of nerve impulses. More recently, sharply posed quantitative questions have yielded insights into the limits of biological information transmission in processes ranging from bacterial chemotaxis to embryonic development, and have helped establish the nature of biological proofreading, which makes possible higher-fidelity copying of the genetic material than thermodynamics alone would allow (some of these examples appear in our paper “A feeling for the numbers in biology”, PNAS 106:21465, 2009).

A second view of the importance of biological numeracy centers on the way a quantitative formulation of a biological phenomenon lets us make sharp, falsifiable claims about how it works. Specifically, biological measurements are beginning to reach the level of reproducibility, precision and accuracy at which discrepancies between theoretical expectations and measurements can uncover new and unexpected phenomena. Further, biological numeracy gives scientists an “extra sense”, as Darwin himself already appreciated, for deciding whether a given biological claim actually makes sense. More generally, in the early stages of any science there is great emphasis on establishing the key facts of the field. In astronomy, for example, it was only through the advanced naked-eye observations of Tycho Brahe that the orbit of Mars became well enough known to reveal central facts, such as that Mars travels around the sun in an elliptical path with the sun at one of the foci. But the maturing of such facts brings a new theoretical imperative, namely, to explain them on the basis of some underlying theoretical framework. In the case of the elliptical orbits of the planets, it was an amazing insight to understand that this and other features of planetary orbits are the natural consequence of the inverse-square law of gravitation. We believe that biology has accumulated enough solid quantitative facts that it too can seek overarching principles, expressed mathematically, that serve as theory to explain those facts and to reveal irregularities when they occur. In the chapters that follow, we provide a compendium of such biological facts, often presented with an emphasis intended as a call to arms for new kinds of theoretical analysis.

Another way to think about this quest for biological numeracy is to imagine an alien visitor coming to Earth and wishing to learn what our society and daily lives look like. If we could give this friendly alien a single publication, which one would prove most useful? Different readers will come up with ideas of their own, but our favorite suggestion would be a report from a bureau of statistics, detailing everything from income to age at marriage to level of education to the distribution of people between cities and the countryside. The United Nations posts such statistics on its website: https://unstats.un.org/unsd/default.htm.

Hans Rosling became an internet sensation through the clever and interesting ways he found not only to organize such data, but also to extract unexpected meaning from it. Our goal is to provide a kind of bureau of statistics report for the cell and to search for the hidden and unexpected meaning in the economy and geography of the cell.

As an example of the kind of surprising insight that might emerge from this exercise, we ask the reader to join us in considering mRNA, the “blueprint” for the real workhorses of the cell, the proteins. Quickly, ask yourself: which is larger, the blueprint or the thing being blueprinted? Intuition draws on the blueprint for a giant skyscraper, where it is immediately obvious that the blueprint is but a tiny, flattened caricature of the building it “codes for”. But what of an mRNA molecule and the protein it codes for? What is your instinct about the relative size of these two molecules? As we will show in the vignette “What is larger, mRNA or the protein it codes for?”, most people’s intuition is way off: the mRNA molecule is actually substantially larger than the protein it codes for. This conclusion has ramifications, for example, for whether it is easier to transport the blueprint or the machine it codes for.
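The direction of this size comparison can be anticipated with back-of-the-envelope arithmetic. The sketch below is our own illustration, not the calculation from the vignette; it assumes rounded average residue masses of about 110 Da per amino acid and about 330 Da per ribonucleotide, and it counts only the coding sequence, ignoring untranslated regions and the poly-A tail. Because each amino acid is specified by a three-nucleotide codon, the ratio of coding-mRNA mass to protein mass comes out near 3 × 330 / 110 ≈ 9, whatever the protein length.

```python
# Back-of-the-envelope comparison of a protein's mass with the mass of the
# coding region of the mRNA that encodes it.
# Assumed, rounded average masses (illustrative values, not from the text):
AA_MASS_DA = 110      # average amino acid residue in a protein, daltons
NT_MASS_DA = 330      # average ribonucleotide residue in RNA, daltons
NT_PER_CODON = 3      # one codon (three nucleotides) per amino acid


def protein_mass_da(n_aa: int) -> int:
    """Approximate protein mass from its length in amino acids."""
    return n_aa * AA_MASS_DA


def coding_mrna_mass_da(n_aa: int) -> int:
    """Approximate mass of the coding sequence only (UTRs and poly-A tail ignored)."""
    return n_aa * NT_PER_CODON * NT_MASS_DA


if __name__ == "__main__":
    n_aa = 300  # hypothetical protein of typical length
    p = protein_mass_da(n_aa)
    m = coding_mrna_mass_da(n_aa)
    print(f"protein ~{p / 1000:.0f} kDa, coding mRNA ~{m / 1000:.0f} kDa, ratio ~{m / p:.0f}")
```

For the hypothetical 300-residue protein this gives about 33 kDa for the protein and about 297 kDa for its coding mRNA; adding the untranslated regions would only widen the gap.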

Finally, we look forward to a day when biology relies increasingly on numerical anomalies as an engine of discovery. As the measurements that characterize a field mature and become reproducible using distinct methodologies, it becomes possible to ask reliably when a particular result is anomalous. Until the work of Charles David Keeling in the 1950s, no one could even agree on the level of CO2 in the atmosphere, let alone determine whether it was changing. Once Keeling showed the rhythmic variations in CO2 over the course of a year, questions about small overall changes in atmospheric CO2 concentration over time could be addressed. Perhaps more compellingly, Newton was repeatedly confounded by the 20% discrepancy between his calculated value for the speed of sound and measured values. Only many years later did workers such as Laplace realize that treating the problem as an adiabatic rather than an isothermal process explains that discrepancy. The recent explosion of newly discovered extrasolar planets is yet another example in which small numerical anomalies are trusted enough to serve as a tool of discovery. In our view, there is no reason to doubt that similar insights await those studying living matter once our measurements have been codified to the point that we know what is irregular when we see it. When different answers to the same question, such as how many proteins are in an E. coli cell, differ by a factor of 100, there is little chance of discerning even regularities, let alone of having confidence that anomalies are indeed anomalous. Often, the great “effects” in science are so named because they were flagged as anomalies. For example, the change in wavelength of the siren of an approaching ambulance is the famed Doppler effect. Biochemistry has effects of its own, such as the Bohr effect, the shift in the oxygen-binding curve of hemoglobin with pH. We suspect that many such effects await discovery in biology as a result of reproducibly quantifying the properties of cells and then paying close attention to what those numbers can tell us.
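The size of Newton's discrepancy follows from a few lines of arithmetic. Newton effectively treated sound as isothermal, giving c = √(P/ρ); Laplace's correction treats it as adiabatic, giving c = √(γP/ρ), which is larger by a factor √γ. The sketch below assumes textbook values for dry air at 0 °C and one atmosphere (P = 101325 Pa, ρ ≈ 1.293 kg/m³, γ ≈ 1.40); these inputs are our own illustrative choices rather than numbers quoted in this chapter.

```python
import math

# Assumed textbook values for dry air at 0 degrees C and 1 atm (illustrative inputs).
PRESSURE_PA = 101_325.0   # atmospheric pressure, Pa
DENSITY_KG_M3 = 1.293     # density of dry air, kg/m^3
GAMMA_AIR = 1.40          # ratio of specific heats cp/cv for air


def speed_isothermal(p: float, rho: float) -> float:
    """Newton's estimate: sound treated as isothermal, c = sqrt(P / rho)."""
    return math.sqrt(p / rho)


def speed_adiabatic(p: float, rho: float, gamma: float) -> float:
    """Laplace's correction: sound treated as adiabatic, c = sqrt(gamma * P / rho)."""
    return math.sqrt(gamma * p / rho)


if __name__ == "__main__":
    c_newton = speed_isothermal(PRESSURE_PA, DENSITY_KG_M3)
    c_laplace = speed_adiabatic(PRESSURE_PA, DENSITY_KG_M3, GAMMA_AIR)
    print(f"isothermal (Newton): {c_newton:.0f} m/s")
    print(f"adiabatic (Laplace): {c_laplace:.0f} m/s")
    print(f"relative gap: {100 * (c_laplace / c_newton - 1):.0f}%")
```

With these inputs the isothermal estimate is roughly 280 m/s and the adiabatic one roughly 331 m/s, close to the measured value for air at 0 °C. The gap of just under 20% is of the same order as the discrepancy that troubled Newton.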

 

The BioNumbers resource

As a reminder of how hard certain biological numbers are to come by, we recommend the following quick exercise. Pick a topic of particular interest from molecular or cell biology and then seek out the corresponding numbers through an internet search or by browsing your favorite textbooks. For example, how many ribosomes are there in a human cell? Or, what is the binding affinity of a celebrated transcription factor for DNA? Or, how many copies per cell are there of a famous receptor, such as the chemotaxis receptors of bacteria or the receptors for growth hormones in mammalian cells? Our experience is that such searches are at best time consuming, and often inconclusive or even futile. As an antidote to this problem, essentially all of the numbers presented in this book can be found in a single source, namely, the BioNumbers website (http://bionumbers.hms.harvard.edu/). This internet resource is intended as an easy jumping-off point for accessing the vast biological literature in which quantitative data are archived. In particular, the data in the BioNumbers database have been manually curated, come with full references to the primary literature from which they are derived, and include a brief description of the method used to obtain them.

As signposts for the reader, every time we quote a number, it is tied to a corresponding BioNumbers Identification Number (BNID). Just as many of our biologist readers will be familiar with the PMID, a unique identifier assigned to published articles in the biological and medical literature, the BNID serves as a unique identifier for quantitative biological data. For example, BNID 103023 points to one of several determinations of the number of mRNA molecules per yeast cell. The reader will find that both our vignettes and the data tables are filled with BNIDs; pasting such a number into the BioNumbers website (or simply Googling “BNID 103023”) uncovers the details associated with that particular quantity.