Refining Our 3PT Shooting Prediction Model
This time, we're even more confident Drew Eubanks isn't the best 3PT shooter
In our last post, we detailed our model for predicting 3PT shooting. The starting point was Basketball Reference's list of top 3PT shooting percentages from last year (Drew Eubanks topped it with 100% 3PT shooting), from which we developed a model that takes shot volume into account.
How can we get a better prediction of Drew Eubanks’s 3PT shooting when he’s only taken 2 shots?
Our previous model looked around the league and said that if Drew Eubanks only attempted two shots (and made both of them), his 3PT% is highly uncertain. It could be anywhere from 25% to 50%, but it’s most likely somewhere around 32-37%.
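To see why two made shots say so little, here is a minimal sketch of shrinkage using a Beta-Binomial update. This is a swapped-in illustration and not necessarily the model from our last post: the Beta(35, 65) prior (a league average around 35%) and the high-volume shooter's numbers are made-up assumptions.

```python
import math


def shrunk_estimate(makes, attempts, prior_a=35.0, prior_b=65.0):
    """Posterior mean and standard deviation of a shooting percentage.

    Conjugate Beta-Binomial update. The default prior is an assumed
    stand-in for "what the rest of the league looks like".
    """
    a = prior_a + makes
    b = prior_b + (attempts - makes)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Drew Eubanks: 2 makes on 2 attempts (from the post).
low_mean, low_sd = shrunk_estimate(2, 2)

# Hypothetical high-volume shooter: 420 makes on 1000 attempts (made up).
high_mean, high_sd = shrunk_estimate(420, 1000)

print(f"2-for-2:      {low_mean:.3f} +/- {low_sd:.3f}")
print(f"420-for-1000: {high_mean:.3f} +/- {high_sd:.3f}")
```

With only two attempts the estimate stays close to the prior and keeps a wide spread. With a thousand attempts the player's own shots dominate and the spread shrinks.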
But we know more about Drew Eubanks. We know he’s a center. That alone might give us an indication his 3PT shooting is less likely to be amazing. We know his free throw shooting. In 2021, he shot 72.6% at the line, also indicating his 3PT% is not likely to be amazing. All of his other stats can help our model give a better prediction for his 3PT%.
Let’s start by making a “Refined Model” that incorporates free throw shooting into the 3PT% prediction.
At first glance, the 3PT% predictions in our refined model look very similar to those of our original model. In both models, Drew Eubanks is very likely a worse shooter than Steph (obviously), and the models are much less certain about Drew’s 3PT% than Steph’s (since they have less data to go on). But the key thing to notice here is the gap between Steph and Drew in the refined model. The Refined Model is more certain that Steph is better than Drew Eubanks. Essentially, the model looked at Drew Eubanks’s free throw shooting and said: “There’s no way you’re as good as Steph at 3PT shooting”.
Now, let’s directly compare the original and refined models for each player.
The three key things to notice are:
For Drew Eubanks, the refined model lowers his 3PT% prediction. This makes sense: his free throw shooting indicated he probably shoots worse on 3 pointers.
The refined model has a tighter prediction of Drew Eubanks’s 3PT% than the original model (the spread is smaller). Again, this makes sense: the model is using more data in the prediction, so it can be more certain.
The refined model didn’t have much of an effect on Steph. This is because Steph has such a high volume of 3 pointers that incorporating his free throw percentage doesn’t really move the needle.
As an extreme example, let’s look at Steven Adams. In 2021, he took three 3PT attempts and made none of them. The original model didn't have much to go on, but after incorporating his 44% free throw shooting, the refined model is fairly certain he is a bad 3PT shooter.
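As a rough sketch of how free throw shooting feeds into the prediction, the Stan model at the end of this post computes each player's estimate as inv_logit(alpha + beta * x). The snippet below mimics that link with a single shared intercept. The ALPHA and BETA values are hypothetical placeholders, not fitted results, and the real model also gives every player their own intercept informed by their 3PT attempts.

```python
import math


def inv_logit(x):
    """Map a value on the log-odds scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_three_pt_pct(ft_pct, alpha, beta):
    """Predicted 3PT% from a free throw percentage via a logistic link."""
    return inv_logit(alpha + beta * ft_pct)


# Hypothetical coefficients chosen only for illustration (not fitted values).
ALPHA = -1.5
BETA = 1.2

# Free throw percentages quoted in the post.
eubanks = predicted_three_pt_pct(0.726, ALPHA, BETA)
adams = predicted_three_pt_pct(0.44, ALPHA, BETA)

print(f"Eubanks (72.6% FT): {eubanks:.3f}")
print(f"Adams   (44% FT):   {adams:.3f}")
```

Because beta is constrained to be non-negative in the model, a lower free throw percentage can only pull the predicted 3PT% down, which is the direction of the Steven Adams result.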
Looking ahead
We can keep adding more stats into the model to help refine our 3PT predictions. We fully intend to just dump everyone’s entire stat line into the model in the next iteration. But we have so many other models we want to show you guys that we may save that for a future post. Let us know if you prefer further refining this model or entirely new models.
Stan Model
You can stop reading here. This section is only for people curious about the underlying probability model, either because they want to understand the details or because they want to expand on it themselves. As you can see, this model has grown dramatically in complexity since last time. Modeling the noise in free throw shooting was based on this.
// A binomial regression model.
// Models the Three Point Percentage
// using Free Throw data.
// Two extensions:
// 1) Measurement Error in free throw data modeled
// 2) Hierarchical intercepts for each player
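data {
  int<lower=0> players;  // number of players
  // n_*: attempts and makes for the outcome the regression predicts
  // (array[...] declarations need Stan 2.26 or newer)
  array[players] int<lower=0> n_attempts;
  array[players] int<lower=0> n_successes;
  // t_*: attempts and makes for the predictor that is measured with error
  array[players] int<lower=0> t_attempts;
  array[players] int<lower=0> t_successes;
}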
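parameters {
  vector[players] alpha;     // per-player intercepts
  real alpha_bar;            // mean of the intercepts
  real<lower=0> sigma_bar;   // spread of the intercepts
  real<lower=0> beta;        // slope on the latent rate
  real mu;                   // mean of the latent rates
  real<lower=0> sigma;       // spread of the latent rates
  // latent "true" rate behind the noisy t_* counts
  vector<lower=0, upper=1>[players] three_point_pct_true;
}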
model {
  // Priors
  mu ~ normal(0.3, 10);
  sigma ~ cauchy(0, 5);
  alpha_bar ~ normal(0, 10);
  sigma_bar ~ cauchy(0, 5);
  three_point_pct_true ~ normal(mu, sigma);
  alpha ~ normal(alpha_bar, sigma_bar);
  beta ~ normal(0, 10);
  // Measurement Error: the t_* counts are noisy observations of the latent rate
  t_successes ~ binomial(t_attempts, three_point_pct_true);
  // Regression: the latent rate predicts the n_* counts on the logit scale
  n_successes ~ binomial_logit(n_attempts, alpha + beta * three_point_pct_true);
}
generated quantities {
  // Per-player predicted success rate on the probability scale
  vector<lower=0, upper=1>[players] player_estimate;
  for (player in 1:players) {
    player_estimate[player] = inv_logit(alpha[player] + beta * three_point_pct_true[player]);
  }
}
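If you want to run the model yourself, the inputs have to match the names in the data block above. Here is a minimal sketch of a helper that assembles them. The function name `build_stan_data` is hypothetical, and we are assuming the n_* fields hold 3PT counts and the t_* fields hold free throw counts, which is how we read the model block. The free throw counts below are made up to roughly match the percentages quoted in the post.

```python
def build_stan_data(rows):
    """Assemble a dict whose keys match the Stan data block.

    rows: list of dicts with integer counts under the keys
    n_attempts, n_successes, t_attempts, t_successes.
    """
    fields = ["n_attempts", "n_successes", "t_attempts", "t_successes"]
    data = {"players": len(rows)}
    for field in fields:
        data[field] = [int(row[field]) for row in rows]

    # A binomial count can never exceed its number of trials.
    for i in range(len(rows)):
        for prefix in ("n", "t"):
            attempts = data[f"{prefix}_attempts"][i]
            successes = data[f"{prefix}_successes"][i]
            if not 0 <= successes <= attempts:
                raise ValueError(f"player {i}: invalid {prefix}_* counts")
    return data


rows = [
    # Drew Eubanks: 2-for-2 on 3PT (from the post); FT counts are made up.
    {"n_attempts": 2, "n_successes": 2, "t_attempts": 100, "t_successes": 73},
    # Steven Adams: 0-for-3 on 3PT (from the post); FT counts are made up.
    {"n_attempts": 3, "n_successes": 0, "t_attempts": 100, "t_successes": 44},
]
stan_data = build_stan_data(rows)
print(stan_data)
```

The resulting dict could then be handed to whichever Stan interface you prefer. With CmdStanPy, for example, that would be something like `CmdStanModel(stan_file=...).sample(data=stan_data)`, where the file name is whatever you saved the model as.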