Earlier this year, the NFL set a record for most accepted penalties through week 3. There has been speculation that this increase in penalties is due to fewer full-contact practices as mandated in the CBA, and further speculation that the refs are idiots that can’t manage a game clock and have no idea what a catch is.
I know that I’ve done my fair share of bitching about officiating this season, but I also recall complaining about calls every other season. This led me to wonder, is this year really any different? I know, I can use the power of STATISTICS!!
First, a round of applause (or applesauce if you prefer) and acknowledgment to the fine folk at NFL Savant for compiling play by play data and providing it free of charge to anyone with the skills to use a mouse.
This post will compare penalties from weeks 1-8 of the 2014 season to penalties from weeks 1-8 of this season. Rulings as to what constitutes a catch, or the refs’ attempts to travel through time using the game clock as a TARDIS, are not covered. This is ALL ABOUT THE FLAGS BABY!
Now, ONWARDS! TO THE DATA!
Metric | 2014 | 2015 |
---|---|---|
Games | 121 | 119 |
Plays | 21736 | 21678 |
Penalties | 1925 | 1971 |
Plays per Game | 179.6364 | 182.1681 |
Penalties per Play | 0.08856275 | 0.09092167 |
At first things don’t look so hot. With fewer games and fewer plays than last year, there are already 46 more penalties. People frequently look at the number of penalties per game, but imo that is a dumb thing to do. Penalties are assessed on a per-play basis. During (or after) any given play, a player either does, or does not get called for a penalty. Therefore it makes more sense to look at penalties per play, or the percentage chance of a penalty being called on (or after) any particular play. Since games do not all have the same number of plays, looking at penalties per game is not an apples to apples comparison.
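For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes both yardsticks from the counts in the table. The dictionary layout and the `rates` helper are just names made up for illustration; only the counts come from the data.

```python
# Counts copied from the summary table (weeks 1-8 of each season).
seasons = {
    2014: {"games": 121, "plays": 21736, "penalties": 1925},
    2015: {"games": 119, "plays": 21678, "penalties": 1971},
}


def rates(counts):
    """Return plays/game, penalties/game and penalties/play for one season."""
    return {
        "plays_per_game": counts["plays"] / counts["games"],
        "penalties_per_game": counts["penalties"] / counts["games"],
        "penalties_per_play": counts["penalties"] / counts["plays"],
    }


summary = {year: rates(counts) for year, counts in seasons.items()}

for year, r in summary.items():
    print(year, {name: round(value, 4) for name, value in r.items()})

# Relative change from 2014 to 2015 under each yardstick.
per_game_change = (
    summary[2015]["penalties_per_game"] / summary[2014]["penalties_per_game"] - 1
)
per_play_change = (
    summary[2015]["penalties_per_play"] / summary[2014]["penalties_per_play"] - 1
)
print(f"per game: {per_game_change:+.1%}, per play: {per_play_change:+.1%}")
```

The per-game jump comes out noticeably bigger than the per-play jump, which is the whole reason for preferring the per-play number.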
Looking at penalties per play evens things out a bit. There are more plays per game this year. This could be due to more teams using a hurry-up offense, but it could also be due to more teams repeating downs due to penalties. Whatever the cause of the increase, more plays gives more opportunities for penalties to be called. (I should note that a play is listed twice in the data if two penalties are called on the same play. This could bias the data, but given the infrequent nature of multi-penalty plays I’m ignoring it for now.) The two penalties per play numbers look pretty close. Given the large number of plays, is there a significant difference between them? For the craps playing degenerates here, the chance of a penalty being called on a play is about the same as making a 6 or 8 hardway.
The standard tool for comparing summary statistics is called a t-test. Technically, it’s called Student’s t-test, named after a guy (William Sealy Gosset) who published under the pseudonym “Student.” It’s a way to test simple hypotheses about the data. Here the hypothesis that we are testing is that the difference between the 2014 value of penalties per play and the 2015 value is 0.
Expand for more nerdy nerdness, you nerd
I mentioned earlier that the penalties per play number can be considered the probability of a penalty being called on any particular play. Thinking about it this way means the penalty data will be best fit using a Bernoulli distribution, with the sample value of penalties per play as the event probability. If you are remembering your Stats 101 class, you may think that we can’t use a t-test here, since the data is not normally distributed. We don’t actually need the data to be normal to apply a t-test, rather we need the value of the statistic we are testing to follow a normal distribution. Since we have a very large sample size here, the penalty probability value we are looking at will follow an approximate normal distribution.
Running a Welch Two Sample t-test on the penalty data for each year gives a p-value of 0.3899, which basically means we can’t reject the idea that the probability of a penalty is the same in both years. That there is weaselly statistics talk for “Same old shit, this year and last.” In general, people look for a 95% confidence level when making comparisons like this. The p-value works backwards from what you might think: it’s the chance of seeing a gap at least this big if the true rates were identical, so smaller means stronger evidence of a difference. We would want to see a p-value of <0.05 before considering the difference to be statistically significant.
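If you’d like to poke at that number yourself, here is a rough sketch of the same comparison built only from the summary counts. It is not necessarily how the original figure was produced: it assumes each play is a 0/1 observation, and it swaps the t distribution for the normal one, which should be nearly identical with roughly 20,000 plays per season. The function name `two_rate_test` is made up for this example.

```python
import math


def two_rate_test(hits_a, n_a, hits_b, n_b):
    """Welch-style comparison of two Bernoulli rates from summary counts.

    Returns the test statistic and a two-sided p-value. The p-value uses the
    normal distribution as a stand-in for the t distribution (an assumption
    that is reasonable only because both samples are very large).
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    # Sample variance of 0/1 data is p(1-p) * n/(n-1).
    var_a = p_a * (1 - p_a) * n_a / (n_a - 1)
    var_b = p_b * (1 - p_b) * n_b / (n_b - 1)
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    t_stat = (p_b - p_a) / std_err
    # Two-sided normal tail probability: 2 * (1 - Phi(|t|)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Penalties and plays from the summary table: 2014 first, then 2015.
t_stat, p_value = two_rate_test(1925, 21736, 1971, 21678)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The result should land very close to the 0.3899 quoted above, nowhere near the 0.05 cutoff.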
“But Zymm! It’s not just about the overall number of penalties, what about penalty yardage? What about the types of penalties?”
Excellent points Other Zymm! Let’s take a deeper look, shall we?
Previously, I was only looking at whether a penalty was called, without considering whether or not it was accepted. When looking at yardage, we limit ourselves to looking only at accepted penalties, as no penalty yards are assessed if the penalty is declined. It turns out this doesn’t really matter, as penalties were declined at basically the same rate both years. At this point last year, 13522 penalty yards were assessed for an average of 8.034462 yards per penalty. So far this year there have been 14254 penalty yards assessed for an average of 8.229792 yards per penalty.
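As a sanity check on the “declined at basically the same rate” claim, here is a small sketch that backs out the number of accepted penalties from the yardage figures. The accepted counts are not stated anywhere in this post. They are inferred here by dividing total yards by average yards, on the assumption that the average was taken over accepted penalties only.

```python
# Figures quoted in the post. "flags" is every penalty called that season.
quoted = {
    2014: {"flags": 1925, "yards": 13522, "yards_per_penalty": 8.034462},
    2015: {"flags": 1971, "yards": 14254, "yards_per_penalty": 8.229792},
}

implied = {}
for year, q in quoted.items():
    # Inferred, not given: accepted penalties = total yards / average yards.
    accepted = round(q["yards"] / q["yards_per_penalty"])
    implied[year] = {
        "accepted": accepted,
        "accept_rate": accepted / q["flags"],
    }
    print(year, accepted, f"{accepted / q['flags']:.1%}")
```

Under that assumption the share of flags that were accepted differs by well under one percentage point between the two seasons.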
Super-Secret Made-up Bonus Statistic!
If we assume that the refs are awarded a touchdown for every 100 yards they assess in penalties, the 2015 officials are leading the 2014 officials 994-945. The 2014 officials are in field goal range, but they’ll need to make some halftime adjustments if they want to win this!
Due to the way penalty yards are assigned, it’s a little more difficult to compare yards/penalty year over year. Instead, I looked at the portion of penalties over 5 yards, over 10 yards, and over 15 yards (so basically pass interference calls only). There was no significant difference between any of these groups. This kinda makes me think that there won’t be a huge difference in the types of calls either. Have I been going “WTF IS UP WITH ALL THESE OFFENSIVE PI CALLS!?” unnecessarily all year?
Penalty | 2014 | 2015 |
---|---|---|
Unspecified | 3 | 0 |
BLOCKED INTO PUNTER | 1 | NA |
CHOP BLOCK | 4 | 10 |
CLIPPING | 5 | 5 |
DEFENSIVE 12 ON-FIELD | 25 | 28 |
DEFENSIVE DELAY OF GAME | 0 | 2 |
DEFENSIVE HOLDING | 169 | 146 |
DEFENSIVE OFFSIDE | 88 | 118 |
DEFENSIVE PASS INTERFERENCE | 124 | 117 |
DELAY OF GAME | 79 | 75 |
DISQUALIFICATION | 3 | NA |
ENCROACHMENT | 26 | 15 |
FACE MASK (15 YARDS) | 40 | 46 |
FAIR CATCH INTERFERENCE | 1 | 4 |
FALSE START | 288 | 271 |
HORSE COLLAR TACKLE | 9 | 6 |
ILLEGAL BLINDSIDE BLOCK | 4 | 5 |
ILLEGAL BLOCK ABOVE THE WAIST | 44 | 66 |
ILLEGAL CONTACT | 81 | 41 |
ILLEGAL FORMATION | 35 | 51 |
ILLEGAL FORWARD PASS | 3 | 2 |
ILLEGAL MOTION | 6 | 9 |
ILLEGAL PEELBACK | 0 | 3 |
ILLEGAL SHIFT | 7 | 23 |
ILLEGAL SUBSTITUTION | 15 | 6 |
ILLEGAL TOUCH KICK | 1 | NA |
ILLEGAL TOUCH PASS | 3 | 5 |
ILLEGAL USE OF HANDS | 119 | 86 |
ILLEGAL WEDGE | 0 | NA |
INELIGIBLE DOWNFIELD KICK | 2 | 2 |
INELIGIBLE DOWNFIELD PASS | 10 | 15 |
INTENTIONAL GROUNDING | 17 | 14 |
INTERFERENCE WITH OPPORTUNITY TO CATCH | 1 | NA |
INVALID FAIR CATCH SIGNAL | 0 | 1 |
LOW BLOCK | 2 | 1 |
NEUTRAL ZONE INFRACTION | 56 | 67 |
OFFENSIVE 12 ON-FIELD | 6 | 4 |
OFFENSIVE HOLDING | 368 | 404 |
OFFENSIVE OFFSIDE | 4 | 2 |
OFFENSIVE PASS INTERFERENCE | 59 | 65 |
OFFSIDE ON FREE KICK | 12 | 8 |
PERSONAL FOUL | 32 | NA |
PLAYER OUT OF BOUNDS ON PUNT | 3 | 11 |
ROUGHING THE KICKER | 2 | 1 |
ROUGHING THE PASSER | 47 | 54 |
RUNNING INTO THE KICKER | 5 | 9 |
TAUNTING | 8 | 11 |
TRIPPING | 3 | 8 |
UNNECESSARY ROUGHNESS | 77 | 118 |
UNSPORTSMANLIKE CONDUCT | 28 | 34 |
ILLEGAL CRACKBACK | NA | 1 |
LEAPING | NA | 1 |
There are some data issues here, the main one being the “Personal Foul” category in 2014. We can probably assume these are all unnecessary roughness calls. There are also a fair number of penalties that are only called a handful of times, which we can’t really do much with. So while it’s anecdotally interesting that there have been almost 4x as many “Player out of bounds on the punt” calls this year, there’s not really much we can say about that.
Methodology and Sample Size notes. Exciting!
When comparing the types of penalties called, we’re getting much more specific, so our sample size is decreasing. There are two factors to consider when deciding if your sample size is sufficient to confidently use a t-test: the overall number of observations and the frequency of the event. For the more common penalties, we can continue to use a t-test, but for the less common penalties we can’t assume the distribution is close enough to normal to use the t-test. In this case, we treat the counts as following a Poisson distribution. We can still compare the ratio of two events, and test the hypothesis that the ratio is 1 (i.e. that the events occur at the same rate), but now we’ll be using an exact test comparing our test statistic with the binomial distribution.
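To make the exact test a bit more concrete, here is a minimal sketch of one common version of it, applied to the chop block row of the table (4 calls in 2014, 10 in 2015). It assumes plays are the right measure of exposure, and it gets a two-sided p-value by doubling the smaller tail, which is only one of several conventions, so statistical software may report a somewhat different number. The function name is made up for this example.

```python
from math import comb

# Plays in weeks 1-8 of each season, from the summary table.
PLAYS_2014, PLAYS_2015 = 21736, 21678


def exact_rate_ratio_test(count_a, count_b, exposure_a, exposure_b):
    """Two-sided exact test that two Poisson rates are equal.

    If the rates are equal then, given the combined count, count_a follows a
    Binomial(total, share_a) distribution, where share_a is group A's share
    of the exposure. The two-sided p-value doubles the smaller tail.
    """
    total = count_a + count_b
    share_a = exposure_a / (exposure_a + exposure_b)

    def pmf(k):
        return comb(total, k) * share_a**k * (1 - share_a) ** (total - k)

    lower = sum(pmf(k) for k in range(0, count_a + 1))      # P(X <= count_a)
    upper = sum(pmf(k) for k in range(count_a, total + 1))  # P(X >= count_a)
    return min(1.0, 2 * min(lower, upper))


chop_block_p = exact_rate_ratio_test(4, 10, PLAYS_2014, PLAYS_2015)
clipping_p = exact_rate_ratio_test(5, 5, PLAYS_2014, PLAYS_2015)
print(f"chop block: p = {chop_block_p:.3f}, clipping: p = {clipping_p:.3f}")
```

Even though chop blocks more than doubled, the counts are small enough that this test does not come close to the 0.05 threshold.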
Let’s look at some of the more common calls. It’ll probably surprise no one that the most common call is offensive holding. Just eyeballing it, it appears there are quite a few more offensive holding calls this year, oddly enough, counterbalanced by fewer defensive holding calls. Surprisingly enough, the difference in offensive holding isn’t significant, but the defensive holding difference is! So, our first significant result is that the refs are calling defensive holding less frequently than they did last year. Why? Who the hell knows, I don’t have a theory on that one.
There’s no significant difference in OPI or DPI calls, which is probably good news, since these particular calls usually have a pretty large impact on a drive.
There are really only two other penalties that show a significant difference from last year. Defensive offside calls have significantly increased this year, so maybe there’s something to all that “Aaron Rodgers is a genius with his hard count, blah blah blah” stuff, though I’m too lazy to actually go through and look at offside by team. The other one is the decrease in Illegal Use of Hands calls. Feel free to speculate on the reason for that one.
There’s no significant change in Roughing the Passer penalties, so despite all those “OMG, he touched the QB’s helmet, throw ALL THE FLAGS” calls, they were doing that last year too.
Last but not least, a quick break down of when penalties are called, by quarter and down.
When | 2014 | 2015 |
---|---|---|
Q1 | 408 | 412 |
Q2 | 564 | 586 |
Q3 | 463 | 460 |
Q4 | 483 | 503 |
OT | 7 | 10 |
No Down (Kickoffs, Extra Points) | 86 | 78 |
1st Down | 625 | 662 |
2nd Down | 494 | 498 |
3rd Down | 478 | 469 |
4th Down | 242 | 264 |
The main thing I find interesting here is the data for the second quarter. There are significantly more penalties called in the second quarter than any other quarter. My interpretation here is that the second quarter is frequently the most competitive part of the game. It’s rare that a team is totally out of it by the half, but the urge to keep the score close going into halftime might lead to more bending of the rules. The same pattern doesn’t emerge in the 4th quarter due to garbage time.
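One quick way to check whether the quarters really differ is a chi-square goodness-of-fit test against an even four-way split. The sketch below is not necessarily the test used for this post, and it leans on a big assumption: that every quarter has about the same number of plays. Play counts by quarter are not in the data shown here, so if the second quarter simply has more plays, some of the bump is exposure rather than behavior. Overtime is left out because it only happens in a few games.

```python
# Penalties by quarter, from the table above (overtime excluded).
quarter_flags = {
    2014: {"Q1": 408, "Q2": 564, "Q3": 463, "Q4": 483},
    2015: {"Q1": 412, "Q2": 586, "Q3": 460, "Q4": 503},
}

# 95th percentile of the chi-square distribution with 3 degrees of freedom.
CRITICAL_95_DF3 = 7.815


def chi_square_equal_share(counts):
    """Chi-square statistic against an even split across the categories.

    Assumes equal exposure (plays) in every category, which is an
    assumption of this sketch and not something the data confirms.
    """
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


stats = {year: chi_square_equal_share(c) for year, c in quarter_flags.items()}
for year, stat in stats.items():
    verdict = "uneven" if stat > CRITICAL_95_DF3 else "no clear difference"
    print(f"{year}: chi-square = {stat:.2f} ({verdict})")
```

Both seasons clear the cutoff by a wide margin under the equal-plays assumption, with the second quarter contributing the largest share of the statistic.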
I’m not really going to go into the down data. The large number for 1st down is a bit misleading, as there are more 1st downs than 2nd, 3rd and 4th.
So this seems long enough already. To conclude, officiating is pretty much as annoying this year as last. If you want to feel smarter than your friends, complain loudly next time there’s an offside call.
Fantastic. That 2nd quarter conjecture is very interesting. I’m gonna start watching games differently.
Amazing job! My engineering degree self has a boner. My reptilian brain says FUCK GOODELL AND THOSE SHITTY REFS!
Good stuff!
Damn, this makes me so har…………….. well done, great stuff.
http://38.media.tumblr.com/7064a5df00bdfbbb59f9387a3da4e6b7/tumblr_nkk6ipcYLV1ruwpoco1_500.gif
I’ll be back to bitch and rant about bitching and ranting about the live game officiating.
http://38.media.tumblr.com/d03619b66371dd6aa2fd3c9a0d1d96d6/tumblr_nkk6ipcYLV1ruwpoco7_400.gif
I’m having flashbacks to my quantitative analysis series, two classes of which were taught by a dude with a thick Cherman accent.
Now I’ll have nightmares about kriging again.
Side note: what the heck happened to the site? I’ve checked on two browsers and it seems we’re stuck in “old folks’ text point,” which is having a weird effect on my eyes.
THIS TEXT IS VERY LOUD!
It keeps changing. Either DTZM is playing with the site, or he’s having a stroke.
Seems to be just this post. I guess the actual quality content broke WordPress?
http://media0.giphy.com/media/JEGIloZ79M46c/giphy.gif
I think the tables messed with something in WP’s formatting. Everything works, even if it is a bit wonky, so I’m not bothering to go back and try and fix it.
WHY DID MY CRAPPY LAP TOP TURN INTO A JITTERBUG PHONE?!?!
I’m rather glad there’s no statistically significant change in roughing the QB penalties, as the most frequent flags I notice this season are for that pick play shit and things involving any sort of defense in long, exciting pass plays. Now I have to wonder if I can even trust my eyes based on the data presented, as it seems that such penalties on these plays are on par with last year.
Maybe the real issue here is that the penalties themselves seem random, arbitrarily enforced, and poorly explained, which also leads to interruption in the game’s flow.
Humans are really good at finding patterns, even when patterns aren’t there. This is where you get into a lot of the biases that behavioral psych people talk about. In this case, those penalties get the most attention, even though they aren’t the most frequent. Since we spend more time looking at them, thinking about them, talking about them, we think of them as more frequent than they really are.
Football pareidolia. All I needed in my life.
This entire math post has got me all hot and bothered now.
Approves:
http://media.salon.com/2013/02/Screen-Shot-2013-02-01-at-9.33.57-AM.png
That’s a lot of goddamn penalties.
Excellent work Frau Doktor!
http://cdn.meme.am/instances/500x/57953404.jpg
You know that part of your brain everyone tells you to shut off when you’re watching a truly awful movie they all insist is awesome? Just do the opposite of that.
I want to leave a comment that is relative to this article, well phrased, and meaningful – but numbers make my brain feel like The Ben after he has too many Choco Tacos and malt liquor before getting on that stair climbing machine
Damn, now I want a Choco Taco
Thank you for this. I may send this to my wife’s colleagues, who have a Cargo Cult understanding of stats: they slavishly (attempt) to perform the rituals without any understanding of WHAT THEY ACTUALLY MEAN, in hopes that it will please the god Peyr Revoower.
I am so goddamn dumb.
That’s what I took from this post.
We all knew Dok was smarter than us, now we have a handy empirical guide to HOW MUCH so.
“Needs more targeting penalties.”
-Taylor Mayes
I enjoyed my Stats 101 class so much in college, I switched my major to math and thought about becoming an actuary. Then I started taking college-level math classes and realized I’d made a huge mistake. Majoring in psych allowed me to continue my heroic level of drinking.
The point is, I respect anyone who can nerd it up like this. Well done.
Somehow, Stats 101 was added to the undergrad curriculum for environmental engineering at NC State. I took it my very last semester, after having already taken all 3 semesters of college-level calculus, plus differential equations.
Fun indeed.
Interestingly, I took stats for engineering AND for my MBA. The MBA stats class was much more fun.
This was great. I need to check that site out and see how far the data goes back and for what. I’d love to do some irresponsible and way less statistics-savvy investigations of my own.
One thing I found funny is that whole “less full practices” argument, which is being used, apparently, for everything. I’ve seen it as a very serious rationalization about injuries to teams–and specifically San Diego, even though they’ve been CONTINUOUSLY at the bottom of the league in time missed to injury.
I seem to recall reading a stats analysis of injuries that pretty much debunked the claim that injuries are related to the decreased full practices. There’s a pretty good chance I found the article because someone here linked to it though, so you may have already read it.