Triangulation is an important concept in the social sciences. It allows us to home in on a result without having a singular, definitive measurement. In Part III of this historical impact series, I ran two huge regressions based on 60 years of game results to determine whose presence correlated the most with his team’s improvement. The differences in those WOWYR results — presented using a “prime” and career value — demonstrated some instability in the regression. So if we want to be confident about how valuable older players were, we’ll need snapshots from different perspectives. We could use a little triangulation.
How accurate is WOWYR?
Prime WOWYR can match a 17-year regularized adjusted plus-minus (RAPM) study for predicting lineup results at the game level. WOWYR correlates well with players from that 17-year RAPM set (from 1997-2014, by Jerry Engelmann), with a correlation coefficient of 0.67 (for scaled results) and an average error (MAE) of 1.1 points. Every player was within 4.7 points of his RAPM value, although among higher-minute players the max error was 2.9 points.
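For readers who want to see the mechanics, here is a minimal sketch of how paired estimates are compared with a correlation coefficient and MAE. The numbers below are made up for illustration — they are not the actual WOWYR or RAPM values:

```python
import numpy as np

# Hypothetical paired impact estimates (points per game) -- NOT the real data
wowyr = np.array([4.1, 2.3, -0.5, 5.8, 1.2, 0.4, 3.6])
rapm = np.array([3.5, 2.9, -1.1, 6.4, 0.8, 1.0, 2.9])

r = np.corrcoef(wowyr, rapm)[0, 1]   # correlation coefficient
errors = np.abs(wowyr - rapm)
mae = errors.mean()                  # mean absolute error
worst = errors.max()                 # largest single disagreement
print(f"r={r:.2f}  MAE={mae:.2f}  max error={worst:.2f}")
```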
In other words, over long periods of time, WOWYR data and RAPM are quite similar — nearly all players will be within a few points of each other and most will be within a point. We wouldn’t expect the values to be identical, because WOWYR and RAPM are measuring two similar, but slightly different, things. Still, despite the convergence, WOWYR is plagued by two major problems.
First, its sample size isn’t large enough for every player. Some players log years with the same combination of teammates, or even a single teammate (Stockton and Malone). Although they played hundreds of games, the play-by-play analogue would be Wilt Chamberlain logging 45 minutes a game, and then trying to infer his value based on 250 minutes of time off the court. This gives rise to the dreaded collinearity issue, and we’re less confident in those kinds of results.
Removing a season or three of data can alter a player’s values by a few points per game, which isn’t always a result of him playing differently in those seasons. In order to accurately solve for “what’s the most likely impact for Larry Bird on all of his lineups?” we need to know about the value of his teammates, like Reggie Lewis. And since Lewis only played a few years, his estimate is a bit fuzzy, and that in turn affects Bird’s estimate.
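A toy simulation shows why near-collinear teammates are so troublesome: the regression can pin down the *sum* of two players’ impacts, but the *split* between them rests on a handful of separating games and swings with the regularization penalty. This is a self-contained sketch with simulated data, not the actual WOWYR model:

```python
import numpy as np

rng = np.random.default_rng(7)
n_games = 300

# Player A and B share the floor in all but 5 games (near-collinear columns)
a = np.ones(n_games)
b = np.ones(n_games)
b[:5] = 0                    # the only games that separate the two players

X = np.column_stack([a, b])
y = 3.0 * a + 3.0 * b + rng.normal(0, 4, n_games)   # true impact: +3 each

def ridge(X, y, alpha):
    """Closed-form ridge regression estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

for alpha in (1.0, 50.0):
    # the combined value stays near 6, but the individual split is unstable
    print(alpha, ridge(X, y, alpha).round(2))
```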
Second, like any RAPM study that’s too long, it smooths over differences between peak years, ignoring aging and injury. There are some ways around this — one of which is to use smaller time periods — but other potential solutions are for another post.
10-Year GPM: Another perspective
WOWYR is one perspective; it’s a bunch of weighted WOWY data that is regressed. Building off of the same idea, Backpicks reader Zachary Stone has tackled historical games with a slightly different approach that I’ll call GPM (Game-level adjusted Plus-Minus). GPM is more analogous to “pure” RAPM in that each game result is a row in the equation, whereas WOWYR combines games and weights the lineups. The details of Zach’s version of GPM:
- It uses only players who played at least 25 minutes per game during a season, so those games where Draymond Green is ejected early still count as a game played for him.
- It uses a “replacement” player cutoff of 260 games. (The other studies below use 82 games.)
- It’s run on data from 1957-2017.
- (Technical detail: This version of GPM chose a lambda using the computationally expensive generalized cross-validation, not the coarser k-fold method used for WOWYR in Part III.)
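For the curious, generalized cross-validation (GCV) for ridge regression can be computed cheaply from a single SVD, which is what makes it feasible even on large historical matrices. Here’s a sketch of the idea in numpy — my own illustration of the technique, not Zach’s actual code:

```python
import numpy as np

def gcv_lambda(X, y, lambdas):
    """Pick a ridge penalty by generalized cross-validation (GCV)."""
    n = len(y)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    best, best_score = None, np.inf
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)     # hat-matrix diagonal in the SVD basis
        fitted = U @ (shrink * Uty)      # ridge predictions for this lambda
        df = shrink.sum()                # effective degrees of freedom, tr(H)
        score = n * np.sum((y - fitted) ** 2) / (n - df) ** 2
        if score < best_score:
            best, best_score = lam, score
    return best
```

Because the SVD is computed once and reused for every candidate lambda, sweeping a large grid costs little more than a single fit.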
But there’s still the issue of time to consider. We don’t want the model thinking that Michael Jordan in his Wizard years is actually the Michael Jordan. So Zach ran the regression in 10-year slices, from 1957-66, 1958-67, 1959-68 and so on, and then grabbed each player’s best 10-year run. Finally, he scaled the results to allow for apples to apples comparisons across eras.
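Mechanically, the windowing step described above looks something like this — a sketch under my own naming conventions, not Zach’s actual code: enumerate the overlapping 10-year slices, then keep each player’s best run.

```python
def ten_year_windows(first=1957, last=2017):
    """Overlapping 10-year slices: (1957, 1966), (1958, 1967), ..., (2008, 2017)."""
    return [(start, start + 9) for start in range(first, last - 8)]

def best_run(estimates):
    """estimates: {(start, end): scaled GPM value} -> the window with the peak value."""
    window = max(estimates, key=estimates.get)
    return window, estimates[window]

windows = ten_year_windows()
print(len(windows), windows[0], windows[-1])
```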
In theory, this will yield a better ballpark estimate for players with relatively consistent 10-year primes. Combined with WOWYR, this will give us multiple snapshots of the past based on game-level results. Additionally, I’ve added an alternative version of WOWYR to the table below that uses 20 minutes per game as a cutoff for qualifying players — a version that was slightly worse at predicting lineup results than the prime WOWYR published in Part III, but contained enough variability to throw into the mix.
Together, this triangulation won’t produce retina display clarity of past players, but it’s not exactly fuzzy in most cases. Anyone who fares well in all three of these areas was likely impacting the scoreboard when they played. In the table below, I’ve averaged the three regressions and included the variability among the three as a measure of stability (smaller is better). The “GPM years” column is the period of time Zach’s model picked for each player – some of the lesser names like Don Buse have been excluded:
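The averaging itself is simple. Here’s one way to express the table’s last two columns, using the standard deviation as the variability measure — an assumption on my part, since the exact formula isn’t specified:

```python
import numpy as np

def triangulate(scaled_wowyr, alt_scaled, scaled_gpm):
    """Average the three regression estimates and report their spread."""
    vals = np.array([scaled_wowyr, alt_scaled, scaled_gpm])
    return vals.mean(), vals.std()   # smaller spread -> more stable signal

avg, spread = triangulate(5.0, 4.6, 5.7)   # hypothetical player, made-up values
print(f"Avg. {avg:.2f}, Variability {spread:.2f}")
```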
| Player | Scaled WOWYR | Alt Scaled | 10-yr Scaled GPM | GPM Years | Avg. | Variability |
| --- | --- | --- | --- | --- | --- | --- |
Because this lacks the granularity of play-by-play data, more interpretation is required per player. For instance, guys like Stockton and Malone suffer from small-sample collinearity; based largely on the 18 games Stockton missed to start the 1998 season, the models have no choice but to solve for the two of them by giving Stockton a larger share of credit. (Utah improved in its final 64 games that year.) Meanwhile, Bird has lots of instability in his result because of Reggie Lewis; in the 1955-84 set from Part II of this series, Bird was first among all players based on his first five seasons.
Next, although Zach’s GPM doesn’t have this problem — he scaled the results of each 10-year run — WOWYR does not account for varying point differentials over the years. So someone like Bill Russell requires an upward mental adjustment, while Wilt Chamberlain’s WOWYR scores are inflated a touch compared to Russell’s because of the early ’70s expansion. GPM is the only regression above that accounts for era differences, and it places Wilt’s peak in 1960-69, slotting him behind Jerry West and Oscar Robertson.
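Scaling across eras can be as simple as rescaling each window’s raw estimates by the spread of point differentials in that period. This is a hedged guess at the *kind* of adjustment involved, not Zach’s actual method, and the reference value of 4.5 is purely illustrative:

```python
import numpy as np

def scale_to_common_era(estimates, era_sd, reference_sd=4.5):
    """Rescale raw impact estimates from an era whose league-wide spread of
    point differentials was era_sd onto a common reference scale.
    (reference_sd=4.5 is an illustrative number, not a measured one.)"""
    return np.asarray(estimates, dtype=float) * (reference_sd / era_sd)
```

An era with blowout-heavy results (a large era_sd) gets its raw values shrunk; a tightly packed era gets them stretched, so values from different decades become comparable.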
About half of the MVP Shares in history belong to the first 22 players in the above table. John Havlicek, one of a few non-MVPs in the top 20, is likely aided by the collapse of Charlie Scott and premature demise of Dave Cowens. Still, his results are impressive. Paul Pierce’s are too, although his number is likely inflated by the models having no way to account for the true “replacement level” quality of Kevin Garnett’s teammates in Minnesota. And — scoring blindness alert! — I think we’ve all underestimated Dikembe Mutombo, who looks quite good in non-box metrics.
Then there are the decorated players who struggle in these regressions. Dwyane Wade’s disappointing number is likely the result of two injury-plagued seasons dragging down his value, along with two more years in physical decline. Allen Iverson, echoing his play-by-play numbers, shows no evidence of playing at an MVP level. George Gervin, dampened by a few post-prime years, posts a small value given his five consecutive top-six MVP finishes.
There are still future tweaks that can be made to these models. However, they will always have certain limitations, and at this point I’m confident in saying that there’s not much mileage left in them. These results paint a fuzzy picture for some players, and compelling arguments for others. Even for players with strong signals, the limits of the models’ precision should not be overlooked; they are not for declaring that a player was exactly 1.2 points more valuable than a contemporary. Still, for most players across NBA history, they provide a fairly accurate approximation of value.