Advanced Stats Series
couching
2018-01-09
Welcome back, and a happy holiday season to you all! Last month we left off with the advent of tracking shots on goal, and an attempt to justify the search for even the smallest competitive advantage through mathematics and analysis. Post-expansion, the NHL game passed through phases of exorbitant toughness, extreme skill gaps between players like Gretzky, Dionne, and Bossy and the players of old, and a clutch-and-grab style that drove a frustrated Mario Lemieux to retirement in the 1990s. Official statistical measurement remained stagnant through this period, however, focusing on scoring, shots and penalty minutes. But as the game evolved and the new millennium dawned, new ideas began to surface.
The Buffalo Sabres are a bit of a laughing stock around the NHL in 2017, but back in 2001, writer Tim Barnes, who wrote under the pen name Vic Ferrari, heard then-GM Darcy Regier talking about shot differentials as an evaluation metric the Sabres were exploring. Some time later, Barnes began tracking and publicizing information on a new metric encompassing all shot attempts, including blocked and missed shots. Barnes named this metric after Buffalo goaltending coach Jim Corsi, and thus Corsi was born. Over time, websites like HockeyAbstract, ExtraSkater and BehindTheNet began tracking data from games and publishing their findings in easily navigable tables, and by the time the 2010s came around, the analytics revolution was well underway.
Before we move forward, I’d like to reiterate a major point of this series of articles. There is a strong need to understand the meaning and implications of the term correlation. It seems that every week we hear of new correlations in data sets that are framed as impactful to our society. Food X may lead to brain cancer, soft drink Y leads to more obesity in children, etc. What is often left out is the idea that correlation absolutely does not imply causation. In layman’s terms, just because an experiment shows that thing A often leads to result B does not mean that thing A will always lead to result B. For example, if you smoke a pack of cigarettes a day for fifty years and you don’t develop lung cancer, it doesn’t mean that smoking doesn’t lead to lung cancer or that science is wrong. Many experiments on large populations around the world have shown a correlation between smoking and lung cancer, but that doesn’t guarantee a thing on an individual basis. The same principles apply in hockey, even if some may say the game is too noisy to analyze. I would ask the same people whether they consider the causes of something like obesity or lung cancer to be just as noisy. The gap between correlation and causation only fuels more research instead of discrediting previous work.
Research by names like Tyler Dellow, Gabriel Desjardins and Vic Ferrari has shown clear correlations between Corsi and goals-for percentage, which is itself an extremely strong correlate of wins. The mantra quickly became “if your team is shooting the puck more than the other team, you’re more likely to score more, therefore you’re more likely to win more.” Nowhere in that sentence is a guarantee, but as seen in examples like the 2013-14 Toronto Maple Leafs, the 2014-15 Colorado Avalanche, and so far the 2017-18 Ottawa Senators, it appears that teams outperforming their expected results from Corsi analysis often fall back to Earth over time.
Corsi is not without shortcomings, and its offshoots began a revolution in analysis and tracking. The first obstacle was adding context to raw measures. For example, if a player has a Corsi For percentage of 52%, but the team as a whole sits at 55%, is that the mark of a positive possession player? If that same player is generating a ton of Corsi attempts while on the ice, but many of the attempts are being blocked by the opponent, is that as valuable? Relative metrics (a player’s impact on a statistic relative to their team) and Fenwick (unblocked shot attempts) appeared in quick succession as 2010 approached to tackle such issues. Clever analysts discovered differences in play depending on whether a team is at home or away, what the score is, and the quality of the players on the ice with and against them. Score adjustments are arguably the most important of all, however: teams down two goals or more outshoot their opponents far more often, which skews the raw numbers. Because games are often tied or within a goal, Corsi Close and Fenwick Close (restricted to those situations) are often relied upon heavily. Players who could drive shot differentials in close games at even strength against top competition became a gold standard.
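To make the arithmetic behind these terms concrete, here is a minimal Python sketch. The `ShotAttempts` container, the function names and every number in it are made up for illustration and are not taken from any real player or tracking site. It also assumes one common convention for relative metrics, namely on-ice percentage minus the team's percentage while the player is off the ice; other definitions exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotAttempts:
    """Shot attempts by one side over some stretch of play (hypothetical container)."""
    on_goal: int   # goals plus saved shots
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked by the opponent

    @property
    def corsi(self) -> int:
        # Corsi counts every attempt: on goal, missed and blocked.
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        # Fenwick counts only unblocked attempts.
        return self.on_goal + self.missed


def for_percentage(events_for: int, events_against: int) -> float:
    """Share of all events that belong to 'our' side, as a percentage."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded")
    return 100.0 * events_for / total


def relative(on_ice_pct: float, off_ice_pct: float) -> float:
    """Relative metric, assumed here to be on-ice minus off-ice percentage."""
    return on_ice_pct - off_ice_pct


# Made-up counts: attempts while the player is on the ice...
on_ice_for = ShotAttempts(on_goal=30, missed=12, blocked=10)
on_ice_against = ShotAttempts(on_goal=28, missed=10, blocked=10)
# ...and attempts for the same team while the player is on the bench.
off_ice_for = ShotAttempts(on_goal=32, missed=13, blocked=10)
off_ice_against = ShotAttempts(on_goal=27, missed=9, blocked=9)

cf_pct = for_percentage(on_ice_for.corsi, on_ice_against.corsi)
ff_pct = for_percentage(on_ice_for.fenwick, on_ice_against.fenwick)
off_cf_pct = for_percentage(off_ice_for.corsi, off_ice_against.corsi)
cf_rel = relative(cf_pct, off_cf_pct)

print(f"CF%: {cf_pct:.1f}  FF%: {ff_pct:.1f}  CF% Rel: {cf_rel:+.1f}")
```

With these invented counts, the player sits above 50% in both Corsi and Fenwick but comes out negative relative to the rest of the team, which is exactly the 52% versus 55% question raised above. A "close" version of either metric would use the same arithmetic, just restricted to attempts recorded while the score is tied or within a goal.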
Next month we’ll look into the different use cases for different metrics, but for now, I would recommend exploring Corsica for as long as the website will work for you, as well as Micah Blake McCurdy’s excellent hockeyviz.com to see a graphical representation of various metrics. I will reiterate that none of these metrics guarantee results or a good player, but they do show trends in the right direction. Many teams have been picking off writers, websites and students to work in analytics departments, and most are showing clear signs of improvement. Those who have dabbled in the past but appear to have gone in a different direction, such as Edmonton, Montreal and Florida, show a clear downward trajectory. Again, these trends are just trends, and may not be linked to a reversion to older reasoning, but the changes these teams have made were almost universally condemned by the analytics community. There may not be causation at play, but there is significant correlation, and that’s really all we can provide as statisticians.
Will Scouch