Playoff plus-minus data Part I: The value and the noise

During the Greatest Peaks series, you probably noticed I referenced 3-year playoff plus-minus quite a bit. And some of you may have wondered “why 3 years?” or “how valuable is that data?” To which I’d say:

  1. Plus-minus data is super valuable
  2. Playoff plus-minus data is also super valuable
  3. Plus-minus data is noisy
  4. Playoff plus-minus data is super noisy

A ton of words have been written on how noisy plus-minus is, why it isn’t a player ranker, and how it’s applied without understanding role and context. These issues still apply in the playoffs, especially the sample size caveat, which is why I’ve defaulted to multiple years of data. But that doesn’t entirely “solve” the issue of sample size.

The multi-year tradeoff

To increase sample, just add more games. But hundreds of games can’t be played in a short period of time, so changes over time become a new confounding variable.

For instance, sample size is not much of an issue in a 10-year adjusted plus-minus study…but player aging will be. Studies like that blend together someone’s play at 21 with his play at 31 and effectively treat it as the “same” player. This is a major issue if we want to figure out how high someone peaked, to say nothing of possible role changes and team mutations.

It’s even trickier in the playoffs, because most players won’t play the same role for 10 consecutive playoffs. (Most players don’t make 10 consecutive playoffs.) We need a sample that spans fewer years, but as we shrink the number of seasons, we run back into the noise of a small sample.

Ugh.

Three years is just about the smallest sample I’m comfortable looking at, and even that can be a small sample for some players. 1

Single-season noise

So one-number metrics that use plus-minus data are incredibly noisy in a single postseason. To help curb this issue, we can use “regressed” plus-minus data — plus-minus that has been anchored to something more realistic for each player. Single-season Augmented Plus-Minus (AuPM), designed to estimate RAPM in smaller samples without play-by-play data, is built on the back of this regressed plus-minus data.

One-number metrics relying on raw plus-minus data are adding a ton of noise. When we see Damian Lillard’s +43.8 on/off in this year’s first round against the Nuggets, we’re really just adding randomness to something designed for a larger, steadier sample. There’s no stat that’s ever been trained on NBA data where +40 and +50 on/off values are normal.2

As an aside, what happens in small playoff samples for NBA stars is pretty interesting. They almost never have huge bumps in on-court value. Instead, it’s the super noisy off court values that create extreme results like Dame’s, or even the occasional negative on/off values for superstars like Nikola Jokic.

This year, Portland was +5.5 with Lillard and -38.8 without him. You’re probably thinking “how can a team be -39!?” and the answer is basically “they can’t be.” At least not in any remotely meaningful sample. But they can easily be in a really small sample, and it turns out that in the 50 minutes Lillard sat, Portland was outscored by 35 points. That’s not actually atypical, but it can mislead anyone who doesn’t carefully check the sample size before citing a stat.
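To see concretely why those bench numbers explode, here’s a minimal sketch in Python that just converts a raw scoring margin into a per-48 rate. The minute and point totals are illustrative, roughly in the ballpark of the Lillard example, not exact play-by-play figures:

```python
def per_48(point_diff, minutes):
    """Scale a raw scoring margin over some number of minutes to a per-48-minute rate."""
    return point_diff / minutes * 48.0

print(per_48(-35, 50))    # outscored by ~35 in ~50 bench minutes: about -33.6 per 48
print(per_48(-25, 44))    # drop one ugly 6-minute, -10 stretch and it's about -27.3
print(per_48(-35, 500))   # the same -35 spread over 500 minutes is only about -3.4
```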

Stats like RAPTOR are designed around regular season data, where -15 is just about the floor for an NBA team. The whole reason on/off has value is because it’s a reasonable indicator of how much the team improves with Lillard on the floor — it’s an estimate of his impact on the team over a reasonable sample. Whether the Blazers are truly +2 or -5 without him, they are certainly not -39 when Dame sits on the bench. But every one-number metric (including raw AuPM) “thinks” that they are. So Lillard’s astronomical, off-the-charts value in something like 538’s RAPTOR is largely coming from a bench sample that doesn’t reflect what it’s intended to reflect. 3

This is fundamentally different than box score data. If someone has an absurd four-game series and posts a BPM of something ridiculous, like +15, we can say that they were incredibly productive/good/valuable in a small sample. Even though we don’t expect it to last, there is a huge value add from white-hot production. We cannot say the same thing about plus-minus data in small samples: Lillard’s Blazers were outscored thoroughly in his 50 bench minutes, but that has nothing to do with Lillard’s play…and it might not have much to do with the play of his teammates!4

Using the regressed technique I’m currently running for single-season AuPM, Lillard’s new playoff on/off is +9.4, and most importantly, his off value shrinks to -5.5. Again, Portland might really be -15 without Dame against the Nuggets, or they might be -2, but -5.5 is a much more likely value than -39 based on everything we know, including Portland’s regular season performance with Lillard off the court.
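For readers who want the gist of the regression, here’s a toy version: a minutes-weighted blend of the playoff off sample with a prior. To be clear, this is only a sketch of the general idea, not the actual AuPM regression, and the prior value and prior weight below are numbers I’ve made up for illustration:

```python
def regress_to_prior(sample_value, sample_minutes, prior_value, prior_minutes):
    """Blend a noisy small-sample rate toward a prior; the raw sample earns more
    weight as its minutes grow."""
    total = sample_minutes + prior_minutes
    return (sample_value * sample_minutes + prior_value * prior_minutes) / total

# Toy inputs: ~50 bench minutes at roughly -38.8 per 48, anchored to a prior of -2
# (say, informed by regular-season play without the star), weighted like 300 minutes.
print(regress_to_prior(-38.8, 50, -2.0, 300))   # about -7.3: much closer to -5.5 than to -39
```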

But why is playoff plus-minus data so valuable?

For the same reason it’s so key in the regular season. Counting stats (points, rebounds, assists) don’t necessarily equal impact. But changes in the scoreboard are everything — it’s really the thing we care about when we reference “impact.” Real-world plus-minus data can never nail this perfectly, but that’s because we don’t shuffle teammates around in a million-game sample. (I can dream, right?)

The scoreboard5 has shown us the value of rim protectors, floor spacers, shot creators and more. But almost all of that insight is regular-season based. That probably wasn’t that big of an issue until the last decade. But what if isolation midrange scorers (“Hero Ballers”) are slightly more valuable in the playoffs? What if shot blocking at the rim is less valuable in the playoffs? We need to look at non-box-score playoff data to understand these trends, and to do that we need to make some sense of the noisy postseason results on the scoreboard.

Before we take our deep dive, let’s make some other methodological notes:

1. The Off sample is the noisiest component

It’s hard to get a big off sample for most players in the playoffs. Even after 500 off minutes, if a team has a bad quarter — let’s say 35-15 — a player’s off value will dip by almost 2 points per 48 minutes, and if a team wins a quarter like that with a player on the bench, the off value can swing the other way by nearly 2 points.
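The arithmetic behind that “almost 2 points” claim, assuming the off sample sat at exactly 0 per 48 before that quarter:

```python
before = 0 / 500 * 48               # 500 off minutes at a dead-even margin: 0.0 per 48
after = (0 - 20) / (500 + 12) * 48  # add one 35-15 quarter (a -20 margin in 12 minutes)
print(after - before)               # about -1.9 per 48
```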

Regressing plus-minus data in the postseason helps most with stabilizing the off sample. It’s still influenced by the postseason results (the more off minutes, the more weight the playoff sample takes on), but a ton of the noise is removed.

2. Opponents are not all equal

Is it easier to outscore a 40-win team by 7 points per game or a 60-win team by 7 points per game? We know we should think about opponent quality when comparing ON values, but we should also think about it when comparing net on/off, because some players can be difference makers against an 8th seed with a good team in place, but those same players won’t be swinging the game quite as much against a 1st seed on the road.

Adding some kind of adjustment for opponent quality is a clear next step for future work on this data in my estimation.
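A naive version of that adjustment, sketched here only to show the shape of the idea (not something I’ve actually run): shift each net rating by the opponent’s regular-season strength before comparing players.

```python
def opponent_adjusted(net_rating, opponent_strength):
    """Credit the same margin more against a stronger opponent. opponent_strength is
    the opponent's regular-season net rating (or SRS), on the same scale as
    net_rating, with 0 as league average."""
    return net_rating + opponent_strength

print(opponent_adjusted(7.0, -3.0))  # +7 against a below-average team -> +4 adjusted
print(opponent_adjusted(7.0, 6.0))   # +7 against a 60-win-caliber team -> +13 adjusted
```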

3. HCA

Technically, we could even adjust for home-court advantage, because higher-seeded teams can often play 5-game series where they quietly rack up a few extra home games, and home court is another minor factor that can skew on-court values and even on/off numbers for some players. (E.g., maybe role players perform better at home than on the road, so extra home games drive up a star’s off-court value.)

4. Look for interactions and collinearity with teammates

Which is more impressive, having a +10 on court value when two other teammates have a similar value, or having a +10 on court value when the next closest teammates are at +3?

As far as I can tell, this is a relatively rare issue, since most teams don’t platoon lineups around a single star anymore, and so huge differences like this are a bit rarer to find. Still, there are some teams with three or four starters posting similar plus-minus profiles while playing similar minutes, and we should take note of that.

Augmented Plus-Minus adjusts for these kinds of effects to some degree, but in its current version, it doesn’t detect the “platoon squad” very well, which means Frank Vogel’s old Indiana Pacers make each other look good because they so often shared the court. 6
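One cheap way to spot the “platoon squad” problem in the data: from stint-level lineup data, check what share of one player’s minutes come with another player also on the floor. This is just a diagnostic I’m sketching for illustration, not how AuPM handles it, and the stint format and numbers below are toy examples:

```python
from itertools import combinations

def overlap_share(stints, a, b):
    """Fraction of player a's on-court minutes in which player b is also on the floor.
    High mutual overlap means their plus-minus values are hard to disentangle."""
    a_minutes = sum(m for players, m in stints if a in players)
    shared = sum(m for players, m in stints if a in players and b in players)
    return shared / a_minutes if a_minutes else 0.0

# Toy stints: (players on court, minutes)
stints = [
    ({"George", "West", "Hill"}, 20),
    ({"George", "West"}, 10),
    ({"George"}, 5),
    ({"Hill"}, 13),
]
for a, b in combinations(["George", "West", "Hill"], 2):
    print(a, b, round(overlap_share(stints, a, b), 2))
```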

5. Beware of when injuries occur

This is the Steph Curry corollary.7 If a star misses first round games — especially when his team is a higher seed — that team can beat up on weaker teams, which inflates the off court value for that star. On the flip side, if a star misses later round games in the playoffs, we could see the opposite effect, with a supporting cast that’s overrun in a ton of minutes against a top-seed’s starters. The Kawhi Leonard corollary.8

In Curry’s case, he returned for good in the ninth game of the 2016 playoffs and the seventh game of the 2018 playoffs, and the difference between his raw 4-year on/off (+8.3) and his on/off in only the games he played (+17.4) is substantial. Later in the series we’ll look at how this impacted his single-season AuPM.
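Computing the “only the games he played” version is mostly a filtering exercise. A minimal sketch, assuming game-level on/off totals are available as simple records (the field names and toy numbers here are mine, not from any particular data source):

```python
def on_off_per_48(games, require_played=False):
    """Pool game-level totals into a net on/off per 48 minutes. If require_played is
    True, games the star sat out entirely are dropped, so blowouts he never appeared
    in don't pad (or sink) his off-court value."""
    used = [g for g in games if g["on_min"] > 0] if require_played else games
    on_min = sum(g["on_min"] for g in used)
    off_min = sum(g["off_min"] for g in used)
    on = sum(g["on_diff"] for g in used) / on_min * 48 if on_min else 0.0
    off = sum(g["off_diff"] for g in used) / off_min * 48 if off_min else 0.0
    return on - off

games = [
    {"on_min": 0, "off_min": 48, "on_diff": 0, "off_diff": 18},   # a blowout win he missed
    {"on_min": 36, "off_min": 12, "on_diff": 10, "off_diff": -4}, # a game he played
]
print(on_off_per_48(games))                       # raw: the missed blowout inflates his off value
print(on_off_per_48(games, require_played=True))  # games-played-only: much higher, like the Curry split
```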

6. Watch out for uneven samples

This final tricky trap is related to the last one, but it’s more of a mathematical oddity. If a player logs a ton of minutes on a great team one season with an on/off of exactly 0 (the team is the same with or without him), and then the next season he ages into a bench role on a team that isn’t good, but again posts an on/off of 0, what happens to his net on/off over the multiple seasons?

It looks great! Despite the player having a 0 on/off in each individual season!

Normally this isn’t a big deal because weak teams don’t advance far in the playoffs, but in more extreme cases it would skew someone’s results if their role changed drastically. Imagine Journeyman Bob. In 2020 he posts a +4 on court and +4 off court value in the playoffs, but his team is bounced in the second round. He’s then traded to a weaker team that doesn’t play him much, and he posts a -4 on court and -4 off court value before they’re eliminated in the second round. Journeyman Bob has made no impact on either team, and his overall on/off would be…that’s right, +6.6.9
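Here’s the arithmetic behind that number, assuming footnote 9’s minute split plus roughly 50 minutes in the minority role each year (that last bit is my assumption, just to make the example concrete):

```python
def pooled_per_48(rate_minute_pairs):
    """Minutes-weighted per-48 rate pooled across seasons."""
    points = sum(rate * minutes / 48 for rate, minutes in rate_minute_pairs)
    minutes = sum(m for _, m in rate_minute_pairs)
    return points / minutes * 48

# 2020: +4 per 48 over ~500 on-court minutes, +4 per 48 over ~50 off-court minutes.
# 2021: -4 per 48 over ~50 on-court minutes, -4 per 48 over ~500 off-court minutes.
on = pooled_per_48([(+4, 500), (-4, 50)])    # about +3.3
off = pooled_per_48([(+4, 50), (-4, 500)])   # about -3.3
print(on - off)                              # about +6.5, despite a 0 on/off in each season
```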

This will rarely rear its head, but it’s something to look out for. (David Robinson’s 1998-2001 stretch might be a real life example.) It’s also a great reason to have single-year data to reference.

Accuracy

So how close can AuPM get us to something in the ballpark of a “postseason RAPM?”10

Right now, I’d say “kind of close.” If we make the adjustments above, we should get pretty nice value indicators from AuPM, even if those don’t perfectly match APM/RAPM.11

There are, no doubt, improvements to be made to something like this — a basic one would be some attempt to control for opponent quality in the playoffs versus the more neutral regular season environment — but at least this gives us a good start for incorporating non-box data into meaningful playoff analysis.

In Part II of this series, we’ll look at playoff plus-minus (and AuPM) among players since 1997.

  1. I like to look at more than just a 3-year sample.
  2. I erroneously said Lillard was +54.5 last week, which is exactly the point: when samples are that small, excluding a few minutes from the calculation shifts the value by 11 points.
  3. Lillard’s playoff RAPTOR was +21.6, double the value of the third-best playoff performer, Luka Doncic.
  4. For example, a cold shooting night from Portland combined with a hot shooting night from the opposition can lead to a 35 point loss.
  5. and by extension, plus-minus perspectives at the game-by-game or play-by-play level.
  6. That’s Paul George, David West, George Hill, Roy Hibbert and Lance Stephenson.
  7. From Steph’s missed early playoff games in 2016 and 2018.
  8. From his 2017 injury in Game 1 versus the juggernaut Warriors. Although we might need another candidate, since the Spurs series is balanced by the Spurs closing out Houston by 39 points in a game without Kawhi.
  9. Assuming he played about 500 minutes in 2020 and sat for 500 in the following season.
  10. Actual playoff RAPM is insanely noisy because of the sample sizes.
  11. AuPM performs well in general in regular season tests — I’d say slightly worse than PIPM from what I’ve seen. Also, Jacob Goldstein calculated multi-year PIPM in the postseason somewhat differently, so when it comes to plus-minus data specifically, I defer to playoff AuPM.