Tackling Selection Bias in Sentencing Data Analysis



For reasons of methodological convenience, statistical models of judicial decisions tend to focus on the duration of custodial sentences. Such sentences are, however, quite rare (8% of the total in England and Wales). Restricting the analysis to them generates a problem of selection bias and raises questions about the external validity of much of the literature on key criminological and legal topics (e.g. discrimination, deterrence, court cultures).
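The mechanism can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, severity is assumed to depend linearly on a single hypothetical case characteristic `x`, and custody is assumed to be given to the most severe 8% of cases.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: sentence severity depends
# linearly on one case characteristic x, plus random noise.
x = rng.normal(size=n)
true_slope = 1.0
severity = true_slope * x + rng.normal(size=n)

# Assumption: only the most severe 8% of cases receive custody.
threshold = np.quantile(severity, 0.92)
custodial = severity > threshold

# Least-squares slope using every sentence vs. custodial sentences only.
slope_all, _ = np.polyfit(x, severity, 1)
slope_custody, _ = np.polyfit(x[custodial], severity[custodial], 1)

print(f"slope using all sentences:  {slope_all:.2f}")
print(f"slope using custodial only: {slope_custody:.2f}")
```

In this setup the slope estimated from custodial cases alone falls well below the true value, because the sample has been selected on the outcome itself.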

While this problem has been acknowledged for more than four decades, no adequate solutions are presently available to sentencing data researchers. Some have relied on left-censored Tobit models to specify the duration of custodial sentences while simultaneously incorporating non-custodial outcomes as if they were somehow equivalent to negative days in prison. Distributions of custodial sentences, however, do not really resemble a left-censored distribution, which violates the parametric assumptions of this approach. Another group of researchers has relied on two-stage Heckman adjustments, implicitly assuming that the sentencing process can be divided into two stages: a decision to imprison, followed by a choice of duration. Once again, this is not a realistic assumption, as demonstrated by the impossibility of finding a valid auxiliary variable that affects the probability of receiving a custodial sentence while being unrelated to the duration of that sentence.
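For reference, the two approaches can be written in their usual textbook form. The notation below is introduced here for illustration and is not taken from the project: $y_i$ is sentence length, $x_i$ and $z_i$ are case characteristics, and $\phi$ and $\Phi$ are the standard normal density and distribution function.

```latex
% Tobit, left-censored at zero: non-custodial outcomes enter as y_i = 0
y_i^* = x_i^\top \beta + \varepsilon_i, \qquad
\varepsilon_i \sim N(0, \sigma^2), \qquad
y_i = \max(0,\, y_i^*)

\ell(\beta, \sigma) =
  \sum_{i:\, y_i > 0} \log\!\left[ \frac{1}{\sigma}\,
      \phi\!\left( \frac{y_i - x_i^\top \beta}{\sigma} \right) \right]
  + \sum_{i:\, y_i = 0} \log \Phi\!\left( -\frac{x_i^\top \beta}{\sigma} \right)

% Heckman two-stage: selection into custody, then duration
s_i = \mathbf{1}\{ z_i^\top \gamma + u_i > 0 \}, \qquad
y_i = x_i^\top \beta + \varepsilon_i \ \text{ observed only if } s_i = 1

E[\, y_i \mid x_i, z_i, s_i = 1 \,] =
  x_i^\top \beta + \rho\,\sigma\,\lambda(z_i^\top \gamma), \qquad
\lambda(a) = \frac{\phi(a)}{\Phi(a)}
```

Here $u_i$ has unit variance and $\rho$ is its correlation with $\varepsilon_i$. The Tobit likelihood depends on the latent $y_i^*$ being normally distributed, and the Heckman correction is only credibly identified when $z_i$ contains at least one variable that is excluded from $x_i$, which is the auxiliary variable discussed above.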

This project has developed an original approach, based on Bayesian statistics, aggregated views from judges, and the new sentencing guidelines, that is capable of modelling custodial and non-custodial outcomes simultaneously. Specifically, different distributions of the relative severity of four major sentence outcomes (fines, community orders, suspended sentences, and custodial sentences) are specified within the same model. This solution not only eliminates the problem of selection bias; by making use of the information available on non-custodial outcomes (e.g. duration of suspended sentences, fine amounts) it is also more efficient than any of the alternative approaches used in the literature.
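The general idea of placing several outcome-specific distributions in one model can be sketched as follows. This is not the project's actual specification, which is not given here. It assumes that severity scores on a common scale are already available, that each outcome type has a normal distribution with its own location and spread, and that a single coefficient `beta` is shared across types; every number and name in the code is hypothetical.

```python
import numpy as np

# Hypothetical outcome codes: 0 fine, 1 community order,
# 2 suspended sentence, 3 custodial sentence.

def log_posterior(beta, severity, outcome, x, mu, sigma, prior_sd=10.0):
    """Unnormalised log posterior for a shared slope `beta`.

    Each case contributes a normal log-density whose location and
    spread depend on its outcome type; a weakly informative normal
    prior (an assumption of this sketch) is placed on `beta`.
    """
    loc = mu[outcome] + beta * x
    sd = sigma[outcome]
    z = (severity - loc) / sd
    log_lik = np.sum(-0.5 * z**2 - np.log(sd) - 0.5 * np.log(2 * np.pi))
    log_prior = -0.5 * (beta / prior_sd) ** 2
    return log_lik + log_prior

# Synthetic data with made-up parameters, for illustration only.
rng = np.random.default_rng(1)
n = 2_000
mu = np.array([0.0, 1.0, 2.0, 3.0])      # type-specific locations
sigma = np.array([0.5, 0.5, 0.4, 0.8])   # type-specific spreads
outcome = rng.integers(0, 4, size=n)
x = rng.normal(size=n)
true_beta = 0.5
severity = mu[outcome] + true_beta * x + sigma[outcome] * rng.normal(size=n)

# Evaluate the posterior on a grid and take its mode.
grid = np.linspace(-1.0, 2.0, 301)
log_post = np.array(
    [log_posterior(b, severity, outcome, x, mu, sigma) for b in grid]
)
beta_map = grid[np.argmax(log_post)]
print(f"posterior mode for beta: {beta_map:.2f}")
```

Because every case contributes to the likelihood, whatever its outcome type, no observations are discarded and the shared coefficient is estimated from the full sample.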


The Sentencing Council for England and Wales has adopted the scale of sentence severity developed in this project. This scale is currently used in every assessment of the impact of their sentencing guidelines on sentence severity.

Publications and outputs

Pina-Sánchez, J., Gosling, J. P., Chung, H., Geneletti, S., Bourgeois, E., and Marder, I. (2019). Have the England and Wales guidelines influenced sentence severity? An empirical analysis using a scale of sentence severity and time-series analyses. British Journal of Criminology.

Pina-Sánchez, J. and Gosling, J. P. Tackling selection bias in sentencing data analysis: A new approach based on the estimation of scale of severity and Bayesian statistics. (under review)
