If you sort basketball-reference by 3PT% for last season, you’ll see Drew Eubanks listed as the best 3PT shooter at 100%.
Instinctively, everyone knows he’s not the best three-point shooter; the 100% is just an artifact of him taking 2 shots and making both. Because of this, it’s common to see qualifying statements like “Joe Harris was the best 3PT shooter last season, among players with at least 50 attempts”. But where does 50 come from? Why not 20? Why not 200?
Predicting 3PT%, considering shot attempts
We can use a principled model that predicts a player’s 3PT% while taking into account how many attempts they’ve taken so far.
If a player is making 50% of their shots on 300 attempts, we’re probably pretty confident they’re a 50% shooter. But if a player has taken 2 shots and made 1 of them, what do we do? Our model accounts for the uncertainty that comes with low shot volume: it looks at what the rest of the league typically shoots and pulls the prediction toward that baseline. As a player shoots more, it automatically weights the player’s actual numbers more heavily than the league’s baseline.
Simultaneously, the model is learning how much variation there is across players in the league. If a player makes 80% of 20 shots, it looks across every other player, notices that 80% isn’t a realistic 3PT%, attributes much of his hot shooting to luck, and dials down the prediction accordingly.
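To make the "weight the player more as he shoots more" idea concrete, here is a minimal sketch in Python. It is not the model from this post: it swaps in a simpler Beta-Binomial shortcut, where the league baseline acts like a fixed number of "pseudo-attempts". The baseline of 35% and the strength of 100 pseudo-attempts are assumptions picked for illustration; the real model learns both from the data.

```python
def shrunk_estimate(makes, attempts, league_rate=0.35, prior_strength=100):
    """Posterior mean of 3PT% under a Beta prior centered on the league rate.

    league_rate and prior_strength are illustrative assumptions,
    not values estimated by the model in this post.
    """
    prior_makes = league_rate * prior_strength
    return (makes + prior_makes) / (attempts + prior_strength)


def player_weight(attempts, prior_strength=100):
    """Share of the estimate that comes from the player's own shots."""
    return attempts / (attempts + prior_strength)


# Three shooters: tiny sample, hot streak on 20 shots, large sample.
for makes, attempts in [(1, 2), (16, 20), (150, 300)]:
    raw = makes / attempts
    est = shrunk_estimate(makes, attempts)
    weight = player_weight(attempts)
    print(f"{makes}/{attempts}: raw={raw:.3f} shrunk={est:.3f} weight={weight:.2f}")
```

Under these assumed settings, the 16-of-20 shooter is pulled from 80% down to about 42.5%, while the 150-of-300 shooter only moves from 50% to about 46%, because three quarters of his estimate comes from his own shots.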
Going back to Drew Eubanks, our model predicts he’s a 36% 3PT shooter.
But there are two key things to notice:
The uncertainty in the model’s prediction of his 3PT shooting is massive. This makes sense though! He’s only taken 2 shots, and made them both. So we don’t really have much to go on.
Even though he has a high 3PT percentage, the model is confident he’s worse than Steph Curry. Also note that the model is more certain about Steph Curry’s 3PT%, since he’s taken a lot more shots.
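Both observations can be reproduced with a rough sketch. Again, this uses a Beta-Binomial shortcut rather than the model from this post, with an assumed league baseline of 35% worth 100 pseudo-attempts. The 300-of-700 line is a hypothetical high-volume shooter, not Steph Curry’s actual stat line.

```python
import random


def posterior_draws(makes, attempts, league_rate=0.35, prior_strength=100,
                    n_draws=20000, seed=0):
    """Sample plausible 3PT% values from a Beta posterior.

    league_rate and prior_strength are illustrative assumptions.
    """
    rng = random.Random(seed)
    a = makes + league_rate * prior_strength
    b = (attempts - makes) + (1 - league_rate) * prior_strength
    return [rng.betavariate(a, b) for _ in range(n_draws)]


def interval(draws, level=0.9):
    """Central interval containing `level` of the sorted draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi


low_volume = posterior_draws(2, 2, seed=1)        # 2-for-2, like Eubanks
high_volume = posterior_draws(300, 700, seed=2)   # hypothetical star shooter

low_lo, low_hi = interval(low_volume)
high_lo, high_hi = interval(high_volume)

# Fraction of paired draws where the 2-for-2 shooter is the worse shooter.
prob_worse = sum(l < h for l, h in zip(low_volume, high_volume)) / len(low_volume)

print(f"2/2 shooter:     {low_lo:.3f} to {low_hi:.3f}")
print(f"300/700 shooter: {high_lo:.3f} to {high_hi:.3f}")
print(f"P(2/2 shooter is worse) = {prob_worse:.2f}")
```

The 2-for-2 shooter gets the wider interval, and most of the paired draws still put him below the high-volume shooter, even though his raw percentage is a perfect 100%.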
So who are the top 3PT shooters in the 2022 season? We don’t need a hacky qualifying statement to answer that question anymore.
I think everyone would agree this list makes a lot more sense than the basketball-reference screenshot at the top.
Looking Ahead
We can extend this model in principled ways. The most obvious way is to use other things we know about the player to refine their prediction.
For example, Drew Eubanks is a center, and we know centers typically shoot a lower percentage on 3-pointers, so we should update our prediction of his three-point shooting based on position alone. Also, his free throw percentage was 79% last season, which must tell us something about his 3-point shooting. We’re going to incorporate all of this into the model in a principled way.
Stan Model
You can stop reading. This section is only for people curious about the underlying probability model, either because they want to understand the details or because they want to expand on it themselves. We’re going to expand on this model in the future, so this is a good starting point for understanding the simpler model.
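// Hierarchical model of 3PT%
data {
  int<lower=0> players;                      // number of players
  // array syntax for Stan 2.26 and later
  array[players] int<lower=0> n_attempts;    // 3PT attempts per player
  array[players] int<lower=0> n_successes;   // 3PT makes per player
}
parameters {
  vector<lower=0, upper=1>[players] theta;   // each player's underlying 3PT%
  real<lower=0, upper=1> theta_bar;          // league-wide average 3PT%
  real<lower=0> sigma_bar;                   // spread of 3PT% across players
}
model {
  // Weak priors on the league-level parameters
  theta_bar ~ normal(0, 10);
  sigma_bar ~ cauchy(0, 5);
  // Player abilities vary around the league average
  theta ~ normal(theta_bar, sigma_bar);
  // Observed makes, given attempts and ability
  for (player in 1:players) {
    n_successes[player] ~ binomial(n_attempts[player], theta[player]);
  }
}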
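If you want to run the model yourself, the data block expects three inputs: players, n_attempts and n_successes. Here is a small Python sketch that assembles them into a dictionary and checks them first. The helper name build_stan_data and the three stat lines are hypothetical, made up for illustration.

```python
def build_stan_data(attempts, successes):
    """Build the inputs named in the Stan data block.

    Hypothetical helper: checks that makes never exceed attempts,
    then returns a dict keyed by the data block's variable names.
    """
    if len(attempts) != len(successes):
        raise ValueError("attempts and successes must have the same length")
    for n, k in zip(attempts, successes):
        if n < 0 or k < 0 or k > n:
            raise ValueError(f"invalid stat line: {k} makes on {n} attempts")
    return {
        "players": len(attempts),
        "n_attempts": list(attempts),
        "n_successes": list(successes),
    }


# Made-up stat lines: 2-for-2, 16-for-20, 150-for-300.
stan_data = build_stan_data([2, 20, 300], [2, 16, 150])
print(stan_data)
```

A dictionary like this can then be handed to a Stan interface. With CmdStanPy, for example, that would be the data argument of CmdStanModel.sample, assuming the model above is saved to a .stan file.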