Advanced Stats Series




Welcome back, and a happy holiday season to you all! Last month we left off with the advent of tracking shots on goal and an attempt to justify seeking even the smallest competitive advantage through mathematics and analysis. Post-expansion, the NHL game transitioned through phases of extreme toughness, of wide skill gaps between players like Gretzky, Dionne and Bossy and the players of old, and of a clutch-and-grab style that drove a frustrated Mario Lemieux into retirement in the 1990s. Official statistical measurement, however, remained stagnant through this period, focusing on scoring, shots and penalty minutes. As the game evolved and the new millennium dawned, new ideas began to surface.


The Buffalo Sabres are a bit of a laughingstock around the NHL in 2017, but back in 2001, writer Tim Barnes, who wrote under the pen name Vic Ferrari, heard then-GM Darcy Regier talking about shot differentials as an evaluation metric the Sabres were exploring. Sometime later, Barnes began tracking and publicizing a new metric that counted all shot attempts, including blocked and missed shots. He named the metric after Buffalo goaltending coach Jim Corsi, and thus Corsi was born. Over time, websites like HockeyAbstract, ExtraSkater and BehindTheNet began tracking game data and publishing their findings in easily navigable tables, and by the time the 2010s came around, the analytics revolution was well underway.
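To make the definition concrete, here is a minimal Python sketch of Corsi, assuming we already have per-team counts of shots on goal, missed shots and blocked shot attempts. The `ShotCounts` class and the function names are hypothetical, chosen only for illustration, and the Corsi For percentage (CF%) shown is the commonly used "share of all attempts" formulation, included here as an assumption rather than something taken from Barnes's original work.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Hypothetical per-team shot-attempt tallies for a game or span of games."""
    on_goal: int   # shots on goal (saves plus goals)
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked by an opposing skater


def corsi(counts: ShotCounts) -> int:
    """Corsi counts every shot attempt: on goal, missed, or blocked."""
    return counts.on_goal + counts.missed + counts.blocked


def corsi_for_pct(team: ShotCounts, opponent: ShotCounts) -> float:
    """Share of all shot attempts taken by `team`, as a percentage."""
    cf = corsi(team)       # Corsi For: attempts by this team
    ca = corsi(opponent)   # Corsi Against: attempts by the opponent
    total = cf + ca
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * cf / total


# Illustrative numbers only, not real game data.
home = ShotCounts(on_goal=30, missed=12, blocked=14)
away = ShotCounts(on_goal=25, missed=10, blocked=9)
print(corsi(home), corsi(away))              # 56 44
print(round(corsi_for_pct(home, away), 1))   # 56.0
```

A CF% above 50 simply means a team took more than half of the shot attempts over the stretch being measured.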


Before we move forward, I’d like to reiterate a major point of this series of articles: we need to understand what the term correlation means and what it implies. It seems that every week we hear of new correlations in data sets, framed as having a major impact on our society. Food X may lead to brain cancer, soft drink Y leads to more obesity in children, and so on. What is often left out is that correlation absolutely does not imply causation. In layman’s terms, just because an experiment shows that thing A is often followed by result B does not mean that A causes B, or that A will lead to B in any given case. For example, if you smoke a pack of cigarettes a day for fifty years and don’t develop lung cancer, it doesn’t mean that smoking doesn’t lead to lung cancer or that science is wrong. Many studies on large populations around the world have shown a correlation between smoking and lung cancer, but that correlation guarantees nothing on an individual basis. The same principles apply in hockey, even if some say the game is too noisy to analyze. I would ask those people whether they consider the causes of something like obesity or lung cancer to be just as noisy. The gap between correlation and causation should fuel more research, not discredit previous work.


Research by analysts such as Tyler Dellow, Gabriel Desjardins and Vic Ferrari has shown clear correlations between Corsi and goals-for percentage, which in turn correlates extremely strongly with winning. The mantra quickly became “if your team is shooting the puck more than the other team, you’re more likely to score more, therefore you’re more likely to win more.” Nowhere in that sentence is there a guarantee, but as examples like the 2013-14 Toronto Maple Leafs, the 2014-15 Colorado Avalanche and, so far, the 2017-18 Ottawa Senators suggest, teams that outperform the results their Corsi numbers would predict often fall back to Earth over time.
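For readers who want to see what “correlation” means numerically, here is a minimal Python sketch of the Pearson correlation coefficient, one common way to measure how closely two series (say, team CF% and goals-for percentage over a season) move together. The function name is hypothetical and the input values are made up for illustration; this is not a reproduction of the analyses by the researchers named above, who may well have used different methods.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and each series' squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up CF% and GF% values for five hypothetical teams, for illustration only.
cf_pct = [54.0, 51.5, 50.0, 48.0, 45.5]
gf_pct = [55.0, 49.0, 52.0, 47.0, 44.0]
print(round(pearson_r(cf_pct, gf_pct), 3))
```

A value near 1 means the two series tend to rise and fall together, and a value near -1 means they move in opposite directions. Even a strong r, on its own, says nothing about causation.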


Corsi is not without shortcomings, and its offshoots began a revolution in analysis and tracking. The first obstacle was adding context to raw measures. For example, if a player