Why am I seeing a negative R^2 value?

For nonlinear regression models where the distinction between dependent and independent variables is unambiguous, the calculator will display the coefficient of determination, \(R^{2}.\)

In most cases this value lies between \(0\) and \(1\) (inclusive), but it is technically possible for \(R^{2}\) to fall below \(0\). It can never exceed \(1\).

This might initially appear strange. First, a common way to interpret \(R^{2}\) is as the fraction of variability in the dependent variable that the model accounts for, and this interpretation only makes sense for values between \(0\) (accounts for none of the variability) and \(1\) (accounts for all of it). Second, the name itself gives the impression that some quantity \(R\) is being squared to produce a result. Either way, it seems that \(R^{2}\) should probably lie in \([0,1]\), or at the very least it should be nonnegative.

The computational definition of \(R^{2}\), however, is divorced from both the notation and this common interpretation. Apart from the special case of a linear regression model with an intercept term, \(R^{2}\) is not actually equal to the square of any particular quantity. It is calculated by taking the mean of the squared errors, dividing by the variance of the dependent variable, and subtracting this ratio from \(1\). Since there is no limit to how bad a model’s predictions can be—and thus no limit to how big the errors can get—it’s possible for this ratio to become arbitrarily large, and \(1\) minus a large value is negative.
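Written out, the calculation described above looks roughly like this. The symbol names are chosen here for illustration and are not taken from the calculator: \(y_{i}\) are the observed values of the dependent variable, \(\hat{y}_{i}\) are the model's predictions, \(\bar{y}\) is the mean of the observed values, and \(n\) is the number of data points.

\[
R^{2} = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}}
\]

The numerator is the mean of the squared errors and the denominator is the variance of the dependent variable. Both are nonnegative, so the ratio is at least \(0\) and \(R^{2}\) is at most \(1\), but nothing keeps the ratio from exceeding \(1\).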

In practice, \(R^{2}\) will be negative whenever your model’s predictions are worse than a constant function that always predicts the mean of the data.
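As a rough illustration, the following Python sketch applies the same calculation (one minus the mean squared error divided by the variance) to a small made-up data set. The function name `r_squared` and the sample values are hypothetical and are not part of the calculator. It compares a constant prediction equal to the mean with a deliberately poor set of predictions.

```python
def r_squared(observed, predicted):
    """Return 1 - (mean squared error) / (variance of the observed values)."""
    n = len(observed)
    mean_observed = sum(observed) / n
    # Mean of the squared errors between observations and predictions.
    mean_squared_error = sum(
        (y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)
    ) / n
    # Variance of the dependent variable (dividing by n, not n - 1).
    variance = sum((y - mean_observed) ** 2 for y in observed) / n
    return 1 - mean_squared_error / variance


# Made-up data for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]

# A constant model that always predicts the mean of the data (3.0).
mean_predictions = [3.0] * len(observed)

# A poor model whose predictions trend in the opposite direction.
poor_predictions = [5.0, 4.0, 3.0, 2.0, 1.0]

r2_mean_model = r_squared(observed, mean_predictions)
r2_poor_model = r_squared(observed, poor_predictions)

print(r2_mean_model)  # 0.0
print(r2_poor_model)  # -3.0
```

The constant model scores exactly \(0\), and the poor model, whose mean squared error is four times the variance of the data, scores \(-3\).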
