7 Comments
Grant Marn

Several years ago at an analytics conference, Bill James described a process that he believed was more likely to yield greater clarity and useful insights. Specifically, you should always start with a question that is interesting or provocative (preferably a large or significant question) and THEN go find data to answer it. Too often these days, he said, he reads pieces where the author clearly went the other direction: they found or created some new data and then went hunting for a question, any question, that it answers. The results are both obvious and suboptimal.

The problems with this approach are many. Your question and data are often too narrow in scope and less likely to be truly insightful or invite further research or questions to advance the thinking. The data leads the question and too often excludes other valuable data that would be more probative either alone or in combination in searching for truth. It's all too siloed and limiting with insufficient collaboration or peer review.

Importantly, this approach reduces the analyst, at a human level, to simply "the [fill-in-the-blank acronym] person," as in "Cheryl is the xABOE% person," rather than a sought-after, broader thinker on issues. In turn, the analyst too often becomes tied to their own data and overuses it in future research to the exclusion of different and better sources or approaches.

It strikes me that the NFL analytics community is awash in this precise malady, and it continues to grow. There are too many overly narrow data sources with increasingly illogical acronyms, sold as presumptively insightful despite small sample sizes and even smaller numbers of people who can explain them, and their limitations, at a rudimentary level. It's become one black box after another that few really understand, yet people "feel" smarter just by dropping the acronym on everyone... "Terry McLaurin's xQOR is better than Justin Jefferson's. Can you believe it!?"

Well no, actually I can't. Yet somehow these sorts of aberrant conclusions, which defy common sense and what we observe week to week, ironically add mystique and credibility to the purported number instead of inviting scrutiny. We are told that, solely because it's presented in numerical form or as a "grade," it's accurate and precise... laughably, even to two decimal places.

It's one thing to have this in baseball, where performance is more isolated, but it's utterly fatal in football given all the interdependencies at play. This skepticism toward the proliferating numbers is precisely why film is still around as a counterweight, and why this debate still exists. Analytics has failed to win the day in football. Imagine people in baseball arguing that you need to review game film to reach conclusions; that football is still having this argument shows how far it has to go.

Football requires a broader lens and holistic analytical approach versus the current myopic linking together of a series of unrelated data source acronyms and hoping that it says something...about something.

Thanks for raising the issue.

Robbie Marriage

Grant, man, use the publish button! People would subscribe to read this kind of thing. We've had this conversation before, so I'm just pulling your leg, but fantastic comment here.

It's crucial to go in asking the question. If you go into an analysis with the answer, not only will you end up with a really poor question as you eloquently explained, but you'll likely end up with a worse proof of your answer than you would've ended up with if you'd gone in asking the question. It's fun to create and use data, but if you can find no use for it, then there is no obligation to use it. I'd know better than anybody. I've invented two NFL QB statistics and haven't been able to find any use for either of them LOL.

Ironically, despite me knowing this very well, I can be described as somebody who violates all of the principles I was just claiming to subscribe to. Granted, I'm a storyteller. I've never claimed to be a football analyst, but I've been known to go into a project with a narrative in mind and intentionally marginalise data that doesn't support it. The difference is that 'marginalise' does not mean 'leave out altogether,' which is what a lot of people tend to do.

I know what you mean about analysts becoming 'the [insert metric here] guy.' Sometimes this is deserved. Sometimes it isn't. For instance, despite being fantastic at what he does, Ben Baldwin became known as 'the EPA guy,' and has to spend a silly amount of his time justifying a very easily understood statistic over and over and over. I shudder to think what happens to those who use statistics with opponent adjustments, or any other component that's difficult to understand.

I think we've discussed before (in the context of WAR) how not being able to explain a stat even on a rudimentary level should disqualify you from using it. I don't know what WAR's strengths and weaknesses are. I have no idea whatsoever how it's calculated. Therefore, how can I feel comfortable using such a statistic in my analysis? To me, this also casts doubt on stats where the formula is private, like all the ESPN ones, where nobody being able to explain it is the point. I don't like that.

LOL if anybody thinks that the second decimal place on any football stat actually matters. I run into this all the time with Lamar Jackson lovers and haters, with the lovers saying 'did you know the Ravens' run offence is 0.04 EPA/Play better when Lamar is on the field?' and the haters responding that 0.04 EPA/Play doesn't matter, because it doesn't. On that particular debate, I am firmly on the hater side, but this isn't about Lamar Jackson on/off numbers.

I'm not saying data shouldn't be used to disprove misconceptions that arise from watching every week, but it should be done in an easily understandable way, not with a statistic that only stat majors can understand.

For instance, a cursory look at CJ Stroud's career CPOE (completion percentage over expected) will reveal that he is (for now) still quite a middling player. Anybody who had the Texans as a SB contender based on him being elite was either projecting at best or simply incorrect at worst. CPOE is a very simple and accurate measure of arm talent. If you can make an analysis contrary to the popular belief in such a simple way, I say go for it.
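To show why CPOE is easy to explain, here is a minimal sketch of the basic idea: for every attempt, take the actual result (1 for a completion, 0 otherwise), subtract a model's estimated completion probability, and average the gaps. This assumes each attempt already carries that expected probability (the hypothetical `cp` field) from whatever completion-probability model you trust; the class, function, and four-attempt sample are illustrative only, not real data. Real versions differ mainly in how that expected probability is modeled.

```python
from dataclasses import dataclass


@dataclass
class PassAttempt:
    completed: bool  # was the pass caught?
    cp: float        # hypothetical: model-estimated completion probability, 0 to 1


def cpoe(attempts: list[PassAttempt]) -> float:
    """Average of (actual result - expected probability), in percentage points."""
    if not attempts:
        raise ValueError("need at least one attempt")
    gaps = [(1.0 if a.completed else 0.0) - a.cp for a in attempts]
    return 100.0 * sum(gaps) / len(gaps)


# Hypothetical four-attempt sample, not real data
sample = [
    PassAttempt(True, 0.80),
    PassAttempt(False, 0.60),
    PassAttempt(True, 0.40),
    PassAttempt(True, 0.70),
]
print(round(cpoe(sample), 1))  # prints 12.5
```

A positive number means the passer completed more than the model expected on those throws; a number hovering around zero over a career is what "around average" means here.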

Your eyes can only get you so far. My eyes tell me that CJ Stroud is quite an accurate thrower, but he isn't, and he never has been. He's spent a whole career so far hovering around average. Your eyes will also tell you he's quite good at avoiding negative plays. The data says he's not especially great at that either. These are the types of cases that sports data is really good for.

All in all, I think I have more faith in numbers than you do, Grant, but we still agree on a lot of things about this whole 'proliferation of numbers in sports' debate.

Robbie Marriage

This article is twofold. There is an initial how-to on using NFL data, followed by a collection of data sources.

As far as the collection of data sources goes, all I can say is that I dig it. I respect it and thank you very much for it, but as for the how-to parts, I have a bit more to say.

I think you hit the nail right on the head that you have to delve into the data asking a question. If you go in with your opinion entrenched already, you are going to end up way out in left field as far as the results you get. I also think you're very right that you must understand the nuances of the data that you're using, or you're going to make yourself look like a fool. I have a perfect example.

I could say that Patrick Mahomes has been a negative CPOE passer for years now, and doesn't look to be improving, and therefore his arm talent is a weakness for the Chiefs moving forward. Technically, this is correct. I haven't given any misinformation here. What I have done is use NFL NGS's version of the CPOE statistic, which is crap, and dramatically worse than the Ben Baldwin version, because for some reason it elects to reward QBs for throwing into coverage. Patrick does not throw into coverage often, and therefore ends up with a ludicrously low NGS CPOE, not because Patrick is bad but because the statistic is bad.

If I didn't know that about NGS CPOE, I could very possibly have written a full-length piece saying Patrick Mahomes is overrated, all based around that one poorly designed statistic. I would've made myself look like a fool, which is why it's absolutely crucial to understand the data you're working with.

I have no explicit aversion to watching film, but do it carefully, and do it with respect for the number one rule of sports analysis (coined by the excellent Soccernomics): don't trust your eyes. This isn't to say your eyes are worthless, but it does say that your eyes see a subset of plays most of the time, and even if they do see the full sample, your brain will not remember it all. Data sees and remembers everything. Therefore, I've learned that if the data and my eyes are in a serious dispute, the data has a better than 50/50 chance of being correct, but that's just my philosophy.

Once again, thanks for the collection of data sources, Troy! I've seen that you followed me recently, which feels fantastic because you're quality, my friend. I'll surely be clicking back on this one every now and again. Keep up the good work.

Troy Chapman

Thanks for the insight and kind words! Interesting on the NGS CPOE model, I was unaware and will keep that in mind in the future. I’m still working to truly understand CPOE and what it tells us.

Grant Marn

Robbie, thank you for taking the time to read my comment here and offer such a thoughtful response. I appreciate how valuable time is, and you taking some of yours to provide your reactions here is worth a ton. My kids constantly chide me about commenting on Substack, "dude, you get no likes, no comments or even a thumbs up...and they're so ridiculously long, nobody is reading them. Why do you do it?"

I tell them that I still subscribe to the old notion that ideas are like trees - you plant them knowing that you'll likely never see them to fruition, but you remain hopeful and optimistic that they will somehow in some way positively impact another person in the future. Even if you never know it. Now, I can show them your comment as proof!

In the spirit of your mission, I'll leave you with a true story. In the early 90s, I corresponded with an award-winning writer from the Akron Beacon Journal about a particular Cleveland Cavalier player whom he was fond of, but whom I found unproductive and inefficient. I wrote a very nice letter outlining my analytical arguments using some box score data. As background, he is an amazing writer (I have read almost all his books) and, according to the many people I have spoken with who know him personally, an even better person. When his letter back to me came in the mail, I couldn't have been more excited.

When I opened it up, his response was very polite but quite short. It noted my admission that I had not seen this player play much, due to the limited number of NBA games being broadcast in those days. He said that he, on the other hand, was fortunate as a writer to have seen every game the player had played in. He concluded by noting that since I hadn't really watched him play, I wasn't really able to comment on his value.

While completely professional and polite, it still stung. Beyond that, his comment bothered me on a deeper level literally for decades - not because it was dismissive (it wasn't really - his view was the prevailing one at that time), but because it seemed so wrong from someone I viewed as so knowledgeable. It didn't seem to help that a few years later, the player at issue was washed out of the League and labelled a "bust."

What resonated long after, was the key question he raised - do you need to "see" a performance to evaluate it?

In the 25 years since his letter found its way into my mailbox, the world has moved decidedly towards my 1991 worldview. Analytics has become a huge industry in sports, and everyone now speaks of numbers as synonymous with truth. If you don't worship at the altar of numbers cloaked under the label of "analytics," you are a foolish Neanderthal not worthy of consideration.

Yet, at a moment in time when I should have been able to declare my intellectual victory, I was troubled. Part of it was that most of the numbers were not analytics at all but statistics (and nobody seemed to know the difference), while others were just crude, middle-of-the-bell-curve averages built partly from situations wholly irrelevant to the one facing the coach or manager in the game.

Beyond that, most were calculated by people who never saw a single minute of the performance that generated that number. For example, I read somewhere that people calculated updated WAR stats overnight in some nondescript office building, long after the games were complete and the stadium lights turned off. They went about their business with the utmost confidence that they were truth tellers... oracles who weren't required by the gods to watch the contests.

I wasn't so sure.

Age robs you of many things, but what it gives back is perspective. A few years ago, I found myself at an airport when my flight was delayed. On the television in the lobby was one of those morning "hot take" shows that I never watch. Held captive, I watched as one panel member trotted out some statistics that I was very familiar with and had personally used in evaluating players, the kind of analysis I might use today if writing a letter to a sportswriter.

When he finished, the other panelist simply said "nice numbers, but you act as if I haven't actually watched the games. Because I have, I know that those numbers are complete garbage." That struck me and took me back 30 years in an instant.

I agreed with him. I had watched that player's games too and found the numbers to be misleading at best. From that point forward, I started paying closer attention to how numbers were being used and how they were presented to the public.

I don't dislike numbers; quite the contrary. They are invaluable in helping us see the penumbras of truth in the complex interactions that take place within a game. What's troubling, though, is that too many of them are foisted upon the public by a far too often smug and sanctimonious media intelligentsia, without any real understanding of how they are calculated or what their limitations are.

Strangely, I increasingly disagree with the "numbers people," not because they are using numbers, but because, in my view, they are misusing them. It's taken 30-plus years, but I've finally arrived at a place of perspective where I can see the other side of the argument.

Numbers are terrific when used carefully and appropriately and offered on bended knee, but you really need to see the games too. At a minimum, I know this. Whether it changes your opinion or not, watching sports, particularly with other people, is a whole lot more fun than silently updating WAR numbers while oblivious to what happened.

And isn't that the point of sports after all? Thanks again for the time.

Troy Chapman

This was a great read Grant.

Grant Marn

Many thanks and glad you found some value in it.
