Ben Lambert’s “A Student’s Guide to Bayesian Statistics” offers a straightforward, pedagogical introduction to theoretical and applied Bayesian statistics. The book is written in elegant prose, adorned with a typically British sense of humour. Mathematics is kept to a minimum, and the examples appeal to a wide audience.
Lambert offers many useful analogies and memorable examples (e.g. the Bayesian coastguard in 7.4.1, which shows how Bayesian updating works in search and rescue, or the explanation of Hamiltonian Monte Carlo with a sledge in 15.3). These should help Bayesian beginners better grasp difficult concepts. The chapters have a clear structure and finish with two pedagogically practical devices: a chapter summary and chapter outcomes. In our reading group, we did not attempt the problem sets, but having skimmed through them, there are a lot of interesting datasets to play with. Lambert has also produced terrific videos that accompany the book.
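To make the coastguard idea concrete, here is a minimal sketch of discrete Bayesian updating in R (my own toy illustration, not Lambert’s example or code): a boat is lost in one of four search zones, and an unsuccessful search of one zone shifts the posterior probability towards the others.

```r
prior <- c(A = 0.4, B = 0.3, C = 0.2, D = 0.1)  # initial beliefs about each zone
p_find <- 0.8  # chance of spotting the boat if we search the correct zone

# Zone A is searched and nothing is found. The likelihood of "not found" is
# (1 - p_find) if the boat is in A, and 1 otherwise; Bayes' rule renormalises.
likelihood <- c(A = 1 - p_find, B = 1, C = 1, D = 1)
posterior <- prior * likelihood / sum(prior * likelihood)
round(posterior, 3)  # mass shifts away from zone A towards B, C and D
```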
This is the perfect book to start a Bayesian adventure or to refresh one’s memory. It is also an excellent complement to other introductory books on Bayesian statistics (e.g. the ones I know: Gill, Jackman, Kruschke, McElreath) and a stepping stone to more complex books (BDA3). Given the “Gelmanian” flavour of the book, it is a good addition to the “Stanverse” and the newly published “Regression and Other Stories”.
After an engaging introduction that spells out, among other things, the benefits of the Bayesian approach, Lambert divides his book into four parts.
Part I introduces Bayesian inference and contrasts it with frequentism. The discussions about the nature of parameters, probability and subjective/objective inference are particularly enlightening for beginner statisticians because they raise deeper, justified questions that are typically dismissed as “philosophy”. Chapter 3 delves right into the technicalities of probability distributions, providing useful clarifications and visualizations along the way (e.g. the difference between probability and likelihood, between probability mass and probability density, and what makes a valid probability distribution).
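The probability/likelihood distinction in particular lends itself to a quick demonstration. Here is a minimal sketch in R (my own toy binomial example, not taken from the book): the same function is a probability distribution over the data for a fixed parameter, but a likelihood over the parameter for fixed data, and the latter need not integrate to 1.

```r
n <- 10; theta <- 0.3
# As a distribution over the possible data x (theta fixed), it sums to 1:
sum(dbinom(0:n, size = n, prob = theta))
# As a likelihood over theta (data x = 3 fixed), it does not integrate to 1:
x <- 3
integrate(function(t) dbinom(x, size = n, prob = t), 0, 1)$value  # 1/(n + 1) here
```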
Part II of the book decomposes (and demystifies) the intimidating Bayesian formula across several chapters: the likelihood (where there is also an illuminating discussion of maximum likelihood estimation and how it differs from Bayesian analysis), the prior, the denominator and the posterior. While this part can seem lengthy and daunting for beginners, it works very well and deserves to be studied in depth.
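As a quick illustration of that contrast (my own toy example, with variable names of my choosing): maximum likelihood returns the single parameter value that maximises the likelihood, whereas a Bayesian analysis returns a whole posterior distribution over the parameter.

```r
succ <- 3; n <- 10  # data: 3 successes out of 10 trials
loglik <- function(p) dbinom(succ, size = n, prob = p, log = TRUE)
optimize(loglik, interval = c(1e-3, 1 - 1e-3), maximum = TRUE)$maximum  # MLE ~ succ/n = 0.3

# The Bayesian answer with a Beta(2, 2) prior is a full distribution,
# here Beta(2 + succ, 2 + n - succ), whose mean is pulled towards the prior:
(2 + succ) / (4 + n)  # posterior mean ~ 0.357, vs the MLE of 0.3
```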
Part III focuses on analytic Bayesian methods (i.e. when the posterior can be calculated rather than estimated). While this part of the book opens with a much needed “introduction to distributions for the mathematically uninclined”, I am not sure about the usefulness of the chapter on conjugate priors, given that they are not widely used. Lambert admits as much and argues that the issue needs discussing to see why MCMC is so useful. Other books omit this issue altogether and manage to keep the exposition of MCMC understandable for beginners. The space devoted to this chapter could be relocated to Part V of the book, where more applied examples of Bayesian regression could be shown, which is always useful for students.
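For readers who have not met conjugacy before, the textbook case is the Beta-Binomial pair: with a Beta prior on a binomial proportion, the posterior is again a Beta and can be written down exactly, no sampling required. A minimal sketch in R (my own illustration of the standard result):

```r
succ <- 3; n <- 10  # data: 3 successes in 10 trials
a <- 2; b <- 2      # Beta(2, 2) prior
# Conjugacy: the posterior is Beta(a + succ, b + n - succ), in closed form.
curve(dbeta(x, a, b), from = 0, to = 1, lty = 2, ylab = "density")  # prior
curve(dbeta(x, a + succ, b + n - succ), add = TRUE)                 # exact posterior
```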
Chapter 10 on model evaluation introduces the handy idea of posterior predictive checks (comparing simulated data generated by the model with the actual data), which is well explained with the example of mosquito recaptures, showing how a binomial model compares to a beta-binomial one. Lambert also presents various information criteria and cross-validation measures (AIC, DIC, WAIC and LOO-CV). It would have been nice if these measures had also been discussed in the context of model averaging or stacking. The sensitivity analysis (10.8) seems to be more of a robustness analysis: in the next edition (and hopefully there will be one), it would be instructive for students to have an example of sensitivity analysis to confounders from a causal inference perspective. This part finishes with a (long) discussion of how to use priors to make Bayesian statistics objective through weakly informative priors.
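A posterior predictive check is easy to show in a few lines. Here is a hedged sketch in R (my own toy version of the idea, reusing the Beta-Binomial example above rather than the book’s mosquito data): draw parameters from the posterior, simulate replicated datasets, and see whether the observed data look typical of them.

```r
set.seed(1)
succ <- 3; n <- 10; a <- 2; b <- 2
theta_draws <- rbeta(4000, a + succ, b + n - succ)   # draws from the posterior
y_rep <- rbinom(4000, size = n, prob = theta_draws)  # replicated datasets
mean(y_rep >= succ)  # how extreme is the observed count among the replicates?
hist(y_rep); abline(v = succ, lwd = 2)  # visual check: observed vs simulated
```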
Part IV on computational Bayes introduces the main variants of Markov chain Monte Carlo, the algorithms used to explore the posterior distribution (Random Walk Metropolis, Gibbs sampling, and Hamiltonian Monte Carlo). Devoting several chapters to this issue is perhaps too much for beginners to stomach and can test the patience of many readers. Other books on Bayesian statistics have explored this issue in fewer chapters, to great effect. But this longish excursion into parameter estimation is compensated by Lambert’s talent for finding very good analogies that help students understand what this is all about (e.g. MCMC parameter estimation is like trying to map out mountains) and some very good explanations of key concepts (e.g. divergent iterations or effective sample size). A very minor point: perhaps the chapter on Stan could explain where the name of the software comes from (the mathematician Stanisław Ulam).[1]
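To give a flavour of what these chapters build up to, here is a minimal Random Walk Metropolis sampler in base R (my own generic sketch, not Lambert’s code), targeting the Beta posterior from the toy example above so the result can be checked against the analytic answer.

```r
set.seed(1)
log_post <- function(p) dbeta(p, 5, 9, log = TRUE)  # Beta(5, 9) target posterior
n_iter <- 10000
draws <- numeric(n_iter)
current <- 0.5
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = 0.1)  # symmetric random-walk step
  log_ratio <- log_post(proposal) - log_post(current)
  if (is.finite(log_ratio) && log(runif(1)) < log_ratio) current <- proposal
  draws[i] <- current  # keep the (possibly repeated) draw
}
mean(draws)  # should be close to the analytic posterior mean 5/14 ~ 0.357
```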
Part V on regression models follows an original pattern: stats books usually start with simple models and build towards more complex ones, like hierarchical models. Lambert reverses this order by suggesting that hierarchical models should be the default option. Many beginners can be scared by complex concepts like partial pooling and non-centered parameterizations, but again, Lambert does a brilliant job explaining these ideas. Sadly, given Lambert’s pedagogical skills, there are only three chapters dealing with applied regression analysis. More examples would be welcome (there is only one small section dedicated to interactions, the chapter on GLMs is very short, and examples with a time dimension, missing data, and measurement problems are absent), and perhaps more material on causal inference could be really useful for social scientists (political science, international relations, public policy).
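The non-centered idea itself fits in a few lines. A hedged sketch in R (my own illustration of the reparameterization, not a full hierarchical model): instead of drawing group effects directly from Normal(mu, tau), draw standardised effects and rescale; the two formulations define the same distribution, but the second is typically easier for Hamiltonian Monte Carlo to sample when tau is small.

```r
set.seed(1)
mu <- 1; tau <- 0.5
theta_centered <- rnorm(1e5, mean = mu, sd = tau)  # centered: theta ~ Normal(mu, tau)
theta_raw <- rnorm(1e5)                            # non-centered: theta_raw ~ Normal(0, 1)
theta_noncentered <- mu + tau * theta_raw          # then rescale: same distribution
c(mean(theta_centered), mean(theta_noncentered))   # both means ~ mu
```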
To summarize, Lambert’s book is certainly an instant classic among introductory Bayesian books. From the point of view of students in political science, international relations, public policy and so on, the book could be improved on several levels in future editions. For instance, it is too long on certain topics (e.g. conjugate priors, various MCMC algorithms). The use of raw Stan is commendable but can be daunting and quite demanding for beginners (especially in non-economic social sciences at the MA level) who have to learn statistical and Bayesian concepts as well as R at the same time. Perhaps the next edition could convert the code to R packages like rstanarm or brms to expand the universe of examples (extremely important for students), building for instance on the wonderful work of Solomon Kurz. Such improvements would make this marvellous book more approachable for students in social sciences - that is, where it is needed most.
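To show what that would look like in practice, here is the kind of one-liner such packages allow (a hedged sketch; the dataset d and the variables score, treatment and school are hypothetical names of my choosing): a hierarchical regression that takes a page of raw Stan becomes a single formula in brms.

```r
library(brms)
# Varying-intercept model: pupils' scores, with a random intercept per school.
# (d, score, treatment and school are hypothetical placeholders.)
fit <- brm(score ~ treatment + (1 | school), data = d, family = gaussian())
summary(fit)
```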
PS: I wish to thank Alex Moise and Aki Suzuki for feedback and discussions!
[1] English readers can peek into the memoirs of Stanisław Ulam, while Polish readers can learn about the mathematical school where Ulam started.