Hey everyone, TC here,
I am glad you are here! I received a follower question asking which data sources I use for my newsletter and insights. Shocker…I use a lot of different sources.
Last week I read through two articles on the use of public-facing data for NFL analysis. Both caught my attention and were filled with great insight.
The first article was by
on the latest availability of public-facing data. The second article was from Sean Clement at Sumer Sports. I’ve always advocated searching the data yourself, trying to understand it, and asking questions when you don’t understand it. This is a standard I have attempted to follow myself over the past three years. What started with a subscription to Pro Football Focus turned into a wide, branching set of data sources, both paid and free.
To power this newsletter, I take subscription payments and invest them back into the newsletter. I wanted to use this opportunity to touch on a few specific data sources and their metrics. I love how the market has expanded, and how available data has become over the past three years.
I don’t use every available data point or metric. If I don’t use one, it is usually for one of two reasons: I don’t like the metric or I don’t understand it. Regarding the latter, I will do my best to better understand the metric before using it.
Over the past two years, I have also put considerable effort into watching game film via All-22.
An interesting concept I have learned and embraced with film review is to avoid looking at data metrics before watching the film. Looking at data before the film pushed me to create bias towards certain players, instead of watching objectively.
Data Pitfalls
I often find myself in data overload paralysis. I will start with a question, start pulling in data…and more data…and more data. Then I have so much data that I’ve lost the scope of the original question.
Another pitfall I sometimes run into, as noted above, is not understanding what the data point or metric is telling me. Other times, I understand the metric but fail to identify any weaknesses with it.
One last pitfall is using data to validate your opinion. I find it very easy to pull a singular data point or metric and filter it down to ensure it validates my opinion. I try to avoid this.
Instead, I try to use the data to answer a question. Find two or three data points to help you answer that question. The question could be value-based (who is the best at…) or it could be projection-based (who will do well against…).
The Process
Below is my process for evaluating an NFL game and player performance.
Watch the game live.
Watch the All-22. (join me in the subscriber All-22 Chat Room)
Check the data, and compare it to what my eyes told me. Are they the same? If not, do more research.
Generate insights from the data and film. Develop a few data points to answer the questions.
About 95% of my data insights are developed based on data and metrics pulled from the following five websites.
Fantasy Points Data Suite (FPDS) has become my go-to for offensive data and some bits of team defensive data. Their extensive set of season filters, play filters, and data split types helps me drill down and essentially peel back some layers of the data. Want to know how often a quarterback is under pressure when the defense performs a stunt versus no stunt? Do you want to see how Nico Collins is performing against MOFC (middle-of-field closed) vs MOFO (middle-of-field open) looks? If there is an offensive metric needed, FPDS likely has it.
The ability to filter the passing and receiving data by coverage type (beyond man or zone) is invaluable to me. The 1st Read % and Highly Accurate Throw % are two metrics I reference in my valuation of the passing game and the scheme.
In the receiving game, I often utilize the split data for past games and upcoming opponents. FPDS also just released its receiver separation score, and it is available by data type: alignment, coverage, route breaks, and route types.
Within the rushing data, I utilize success rate data based on the blocking scheme type.
The defensive data is lacking with FPDS, which is to be expected as the website is primarily fantasy data-driven. There are instances where you can reverse engineer the data to generate defensive information.
This is just a small bit of what is available within FPDS.
Sports Information Solutions ($)
Sports Information Solutions (SIS) is huge in MLB and a forgotten asset within the NFL world. SIS is by far the most expensive data package I subscribe to, and it does offer valuable information. Like FPDS, the SIS data suite has an extensive library of filter options.
Forget Player Grades, Total Points Earned is where individual player assessment is at now for me.
SIS defines it as "the total of a player’s EPA responsibility on passes using the Total Points system that distributes credit among all players on the field for a given play. Totals are scaled up to map to the average points scored or allowed on a team level, with the player's snap count determining how much to adjust."
A more generalized version would be how many points on the scoreboard a player is worth. The metric incorporates offensive line play, off-target passes, broken tackles, run direction, fumbles, sacks, dropped passes, etc.
The primary area I reference with SIS is offensive line performance. Their blown block metrics in the run and pass game help me develop insights into offensive line play. SIS has interesting data points for bounce rate (running back bouncing to another running lane) and designed gap rate for run blocking.
Their boom (EPA greater than 1.0) and bust (EPA less than -1.0) data, available for position groups and individual players, helps me decipher which position groups are giving up chunk plays.
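For anyone who wants to try the boom and bust idea on their own play-level data, here is a minimal sketch in Python. It only applies the two thresholds quoted above (EPA greater than 1.0, EPA less than -1.0). The function name, the constants, and the sample EPA values are all made up for illustration; this is not SIS's actual implementation or data.

```python
# Thresholds taken from the definitions quoted in the text.
BOOM_THRESHOLD = 1.0   # a play with EPA greater than 1.0 counts as a "boom"
BUST_THRESHOLD = -1.0  # a play with EPA less than -1.0 counts as a "bust"


def boom_bust_rates(epa_values):
    """Return the share of plays that were booms and busts.

    `epa_values` is any iterable of per-play EPA numbers.
    This helper is hypothetical, not part of any data provider's tooling.
    """
    values = list(epa_values)
    if not values:
        # No plays: report zero rates rather than dividing by zero.
        return {"boom_rate": 0.0, "bust_rate": 0.0}
    booms = sum(1 for epa in values if epa > BOOM_THRESHOLD)
    busts = sum(1 for epa in values if epa < BUST_THRESHOLD)
    return {"boom_rate": booms / len(values), "bust_rate": busts / len(values)}


# Invented EPA values for eight plays (illustration only).
sample_epa = [1.8, -0.3, 0.4, -1.6, 2.2, 0.1, -1.0, 1.0]
rates = boom_bust_rates(sample_epa)
print(f"Boom rate: {rates['boom_rate']:.1%}, bust rate: {rates['bust_rate']:.1%}")
```

Note that the comparisons are strict, so a play with an EPA of exactly 1.0 or -1.0 is counted as neither a boom nor a bust.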
SIS also tracks unblocked and solo sacks. This is a critical piece in assessing pass rusher performance for me. Again, these are just tiny parcels of the data available within the SIS data hub.
SIS also has a free view of their data via the Data Hub.
Pro Football Focus (PFF) is the OG. PFF still offers a high-quality dataset. The portion of their data that I have moved away from is their individual player grades. In the past, I was reliant upon the PFF grading system instead of watching the game film. Today, if I need to pull an overall metric for how a player is doing, then I will refer over to SIS Total Points.
Due to the lack of defensive metrics available with FPDS, I will utilize PFF (SIS) for defensive data points. The PFF pass rush win rate, with the option to pull win rates in true pass sets, is very helpful. The PFF run defense metrics, including average depth of tackle, are another set of data points I like to look at for defensive line play.
PFF’s lack of filters is my biggest gripe. There is no option to filter by down, distance, coverage type, or the other filters available through other websites.
NFL Pro Premium ($)
NFL Pro Premium is a work in progress with the first iteration released at the start of the season. I use their premium plan for quick access to specific All-22 clips with their fantastic film filter options. The film room is outstanding.
I also do a high-level review of the data to help funnel where I want to go with the data, often shifting back over to FPDS for the final export. My biggest complaints with the data are the lack of export to CSV and the lack of position filters. I also avoid using defensive pressure numbers and wide receiver separation data from Premium. Their pressure numbers are based on proximity data from players' GPS tracking data. In my opinion, that can lead to false pressure numbers. I trust charted pressure data over GPS data. The separation data, while interesting, is based on GPS data 0.2 seconds before the football arrives at said receiver.
RBSDM (Free)
Ben Baldwin has built an invaluable set of tools over at RBSDM.com. The data offers a quick set of filters with CSV download. I use this source for success rates for offense and defense.
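If you download play-level data as a CSV, success rates for offense and defense can be tallied with a few lines of Python. The sketch below assumes one common convention, that a play is a "success" when its EPA is greater than zero; check the definition your source uses before relying on it. The field names `posteam`, `defteam`, and `epa` follow nflverse-style play-by-play naming, while the function name and the sample rows are invented for illustration.

```python
from collections import defaultdict


def success_rates(plays, team_key):
    """Share of plays with positive EPA, grouped by the team in `team_key`.

    Assumption: success means EPA > 0. Grouping by "posteam" gives each
    offense's success rate; grouping by "defteam" gives the success rate
    each defense allowed.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for play in plays:
        team = play[team_key]
        totals[team] += 1
        if play["epa"] > 0:
            successes[team] += 1
    return {team: successes[team] / totals[team] for team in totals}


# Invented plays for illustration only; not real game data.
sample_plays = [
    {"posteam": "HOU", "defteam": "IND", "epa": 0.6},
    {"posteam": "HOU", "defteam": "IND", "epa": -0.4},
    {"posteam": "HOU", "defteam": "IND", "epa": 1.2},
    {"posteam": "HOU", "defteam": "IND", "epa": -0.1},
    {"posteam": "IND", "defteam": "HOU", "epa": 0.3},
    {"posteam": "IND", "defteam": "HOU", "epa": -0.9},
    {"posteam": "IND", "defteam": "HOU", "epa": -0.2},
    {"posteam": "IND", "defteam": "HOU", "epa": -1.1},
]

offense = success_rates(sample_plays, "posteam")
defense_allowed = success_rates(sample_plays, "defteam")
print("Offensive success rate:", offense)
print("Success rate allowed on defense:", defense_allowed)
```

For defense, a lower number is better, since it is the rate at which opposing offenses succeeded.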
Other Sources:
FTN Fantasy ($)
FTN has grown quickly recently, especially with bringing over DVOA metrics after Football Outsiders closed down. I currently do not subscribe to FTN but I am hoping to do so in the future if the paid subscriber count increases further.
Stathead ($)
Stathead is the paywall version of Pro Football Reference. Stathead has an immense amount of data available. Using their filter tools can be a bit cumbersome. I primarily use Stathead for historical data references.
ESPN Analytics (Free)
ESPN’s receiver open score model is a good metric for providing an overall look at receiver performance. The 4th Down models are great to look at for coaching decisions.
NFELo Ratings (Free)
Robby offers some interesting data for quarterbacks and NFL Teams.
Fun Data Tools
With this tool, you can build a report using any field from the public NFL dataset.
DOUG
The DOUG (Design Outstandingly Unique Graphs) is a tool that gives you the ability to build tables and scatter plots based on data fields selected from multiple data sources.
Datawrapper is awesome. This is the website I use when building charts and tables. I don’t know how this website is still free. Datawrapper can complete about 90% of what GGPlot can do without the need for knowledge in R.
A few examples I built out recently (non-interactive versions):
NFLVerse
If you are knowledgeable in R or Python, then the NFLVerse is for you. Free data, covering hundreds of data points for every game. Packages such as nflfastR and nflreadr give you the tools to do everything for free.
Closing
There is an immense amount of public-facing data for the NFL today. Some of that data is behind a paywall. My recommendation is to use the data to develop your own opinions, with the warning that you need to use it correctly. And watch the film.
If you are unable to do this, which I understand, make sure you are following smart people on social media.
Thanks for your continued support.
-TC
Several years ago at an analytics conference, Bill James described a process that he believed was more likely to yield greater clarity and useful insights. Specifically, you should always start with a question that is interesting or provocative (preferably a large or significant question) and THEN go find data to answer it. Too often these days, he said he reads some piece where the author clearly went the other direction - they found or created some new data and then went hunting for a question - any question - that it answers. The results are both obvious and suboptimal.
The problems with this approach are many. Your question and data are often too narrow in scope and less likely to be truly insightful or invite further research or questions to advance the thinking. The data leads the question and too often excludes other valuable data that would be more probative either alone or in combination in searching for truth. It's all too siloed and limiting with insufficient collaboration or peer review.
Importantly, this approach reduces the analyst at a human level to simply "the [fill in the blank acronym] person" as in Cheryl is the xABOE% person and not a more sought after broader thinker on issues. In turn, the analyst is too often tied to their own data and overly uses it in future research to the exclusion of different and better sources or approaches.
It strikes me that the NFL analytics community is awash in this precise malady which continues to grow. Too many overly narrow data sources with increasingly illogical acronyms that are sold as presumptively insightful with small sample sizes and even smaller numbers of people who can explain them at a rudimentary level along with their limitations. It's become one black box after another that few really understand but "feel" smarter just by dropping the acronym on everyone..."Terry McLaurin's xQOR is better than Justin Jefferson's. Can you believe it!?"
Well no, actually I can't. Yet, somehow these sorts of aberrant conclusions, which belie common sense and what we observe week to week, ironically add mystique and credibility instead of scrutiny to the purported number. We are instructed that solely because it's presented in numerical form or as a "grade" it's accurate and precise...even laughably to two decimal places.
It's one thing to have this in baseball where performance is more isolated, but it's utterly fatal in football given all the interdependencies at play. This hesitancy in the belief in the proliferating numbers is precisely why film is still around as a counter, and why this debate still exists. Analytics has failed to win the day in football. Imagine people in baseball arguing the need to review game film to reach conclusions - that is a symbol of how far football has to go.
Football requires a broader lens and holistic analytical approach versus the current myopic linking together of a series of unrelated data source acronyms and hoping that it says something...about something.
Thanks for raising the issue.
Robbie, thank you for taking the time to read my comment here and offer such a thoughtful response. I appreciate how valuable time is, and you taking some of yours to provide your reactions here is worth a ton. My kids constantly chide me about commenting on Substack, "dude, you get no likes, no comments or even a thumbs up...and they're so ridiculously long, nobody is reading them. Why do you do it?"
I tell them that I still subscribe to the old notion that ideas are like trees - you plant them knowing that you'll likely never see them to fruition, but you remain hopeful and optimistic that they will somehow in some way positively impact another person in the future. Even if you never know it. Now, I can show them your comment as proof!
In the spirit of your mission, I'll leave you with a true story. In the early 90s, I communicated with an award-winning writer from the Akron Beacon Journal about a particular Cleveland Cavalier player that he was fond of, but that I found unproductive and inefficient. I wrote a very nice letter outlining my valid analytical arguments using some box score data. As background, he is an amazing writer (I have read almost all his books) and, according to the many people I have spoken with who know him personally, an even better person. When his letter back to me came in the mail, I couldn't have been more excited.
When I opened it up, his response was very polite but quite short. It noted my admission that I had not seen this player play a lot due to the limited number of NBA games being broadcast in those days. He said that he, on the other hand, was fortunate as a writer to have seen every game he had played in. He concluded by noting that since I hadn't really watched him play, I wasn't able to really comment on his value.
While completely professional and polite, it still stung. Beyond that, his comment bothered me on a deeper level literally for decades - not because it was dismissive (it wasn't really - his view was the prevailing one at that time), but because it seemed so wrong from someone I viewed as so knowledgeable. It didn't seem to help that a few years later, the player at issue was washed out of the League and labelled a "bust."
What resonated long after was the key question he raised - do you need to "see" a performance to evaluate it?
In the 25 years since his letter found its way into my mailbox, the world has moved decidedly towards my 1991 worldview. Analytics has become a huge industry in sports, and everyone speaks now of numbers as synonymous with truth. If you don't worship at the altar of numbers cloaked under the label of "analytics" you are a foolish Neanderthal not worthy of consideration.
Yet, at a moment in time when I should have been able to declare my intellectual victory, I was troubled. Some of it was that most of the numbers were not analytics at all but statistics (and nobody seemed to know the difference), while others were just middle-of-the-bell-curve crude averages, partially built from use cases wholly irrelevant to the one before the coach or manager in the game.
Beyond that, most were calculated by people who never saw a single minute of the performance that generated that number. For example, I read somewhere people calculated updated WAR stats overnight in some nondescript office building long after the games were complete, and the stadium lights turned off. They went about their business with the utmost confidence that they were truth tellers...oracles who weren't required by the gods to watch the contests.
I wasn't so sure.
Age robs you of many things, but what it gives back is perspective. A few years ago, I found myself at an airport when my flight was delayed. On the television in the lobby was one of those morning "hot take" shows that I never watch. Held captive, I watched as one panel member trotted out some statistics that I was very familiar with and had personally used in evaluating players - the kind of analysis I might use today if writing a letter to a sportswriter.
When he finished, the other panelist simply said "nice numbers, but you act as if I haven't actually watched the games. Because I have, I know that those numbers are complete garbage." That struck me and took me back 30 years in an instant.
What he said I agreed with. I had watched his games too and found the numbers to be misleading at best. From that point forward, I started paying closer attention to how numbers were being used and how they were presented to the public.
I don't dislike numbers - to the contrary. They are invaluable in helping us see the penumbras of truth in the complex interactions that take place within a game. What's troubling though is that too many of them are foisted upon the public without any real understanding of their calculation or limitations by a far too often smug and sanctimonious media intelligentsia.
Strangely, I increasingly disagree with the "numbers people," not because they are using numbers, but because they are misusing numbers in my view. It's taken 30 plus years, but I've finally arrived at a place of perspective where I see the other point - the other side of the argument.
Numbers are terrific when used carefully and appropriately and offered on bended knee, but you really need to see the games too. At a minimum, I know this. Whether it changes your opinion or not, watching sports - particularly with other people - is a whole lot more fun than silently updating WAR numbers while oblivious to what happened.
And isn't that the point of sports after all? Thanks again for the time.