Let ‘Em Play! (91% of the time)

Doktor Zymm

An expert at time travel*, Doktor Zymm also has the ability to move objects with her mind** and can breathe underwater***.

*Forward only, at a preset rate
**Via her hands, usually
***When the water is contained in a glass

Earlier this year, the NFL set a record for most accepted penalties through week 3. There has been speculation that this increase in penalties is due to fewer full-contact practices as mandated in the CBA, and further speculation that the refs are idiots that can’t manage a game clock and have no idea what a catch is.

Is this dog maintaining control of the frisbee as he goes to the ground?

I know that I’ve done my fair share of bitching about officiating this season, but I also recall complaining about calls every other season. This led me to wonder, is this year really any different? I know, I can use the power of STATISTICS!!

First, a round of applause (or applesauce if you prefer) and acknowledgment to the fine folk at NFL Savant for compiling play by play data and providing it free of charge to anyone with the skills to use a mouse.

This post will compare penalties from weeks 1-8 of the 2014 season to penalties from weeks 1-8 of this season. Rulings as to what constitutes a catch, and the refs’ attempts to travel through time using the game clock as a Tardis, are not covered. This is ALL ABOUT THE FLAGS BABY!

I will haunt you with thousands of tiny yellow ghosts!

Now, ONWARDS! TO THE DATA!

                    2014          2015
Games               121           119
Plays               21736         21678
Penalties           1925          1971
Plays per Game      179.6364      182.1681
Penalties per Play  0.08856275    0.09092167

At first things don’t look so hot. With fewer games and fewer plays than last year, there are already 46 more penalties. People frequently look at the number of penalties per game, but imo that is a dumb thing to do. Penalties are assessed on a per-play basis. During (or after) any given play, a player either does, or does not get called for a penalty. Therefore it makes more sense to look at penalties per play, or the percentage chance of a penalty being called on (or after) any particular play. Since games do not all have the same number of plays, looking at penalties per game is not an apples to apples comparison.

Looking at penalties per play evens things out a bit. There are more plays per game this year. This could be due to more teams using a hurry-up offense, but it could also be due to more teams repeating downs due to penalties. Whatever the cause of the increase, more plays gives more opportunities for penalties to be called. (I should note that a play is listed twice in the data if two penalties are called on the same play. This could bias the data, but given the infrequent nature of multi-penalty plays I’m ignoring it for now.) Looking at the penalties per play number, those look pretty close. Given the large number of plays, is there a significant difference between the two numbers? For the craps playing degenerates here, the chance of a penalty being called on a play is about the same as making a 6 or 8 hardway.
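For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-game and per-play numbers, using only the counts from the table above. The variable and function names are made up for illustration, not anything from the NFL Savant data, and the hardway odds assume a standard hardway bet (wins on the hard 6, loses on any 7 or an easy 6).

```python
from fractions import Fraction

# Counts copied from the summary table (weeks 1-8 of each season).
seasons = {
    2014: {"games": 121, "plays": 21736, "penalties": 1925},
    2015: {"games": 119, "plays": 21678, "penalties": 1971},
}

def rates(counts):
    """Return (plays per game, penalties per game, penalties per play)."""
    return (
        counts["plays"] / counts["games"],
        counts["penalties"] / counts["games"],
        counts["penalties"] / counts["plays"],
    )

# Hard 6 (or hard 8): 1 winning roll (3-3) against 6 ways to roll a 7
# and 4 ways to roll an easy 6, so the bet wins 1 time in 11.
hardway = Fraction(1, 1 + 6 + 4)

if __name__ == "__main__":
    for year, counts in seasons.items():
        ppg, pen_pg, pen_pp = rates(counts)
        print(f"{year}: {ppg:.4f} plays/game, {pen_pg:.2f} penalties/game, "
              f"{pen_pp:.5f} penalties/play")
    print(f"hardway 6 or 8: {float(hardway):.5f}")
```

The per-play rates and the 1-in-11 hardway chance all sit right around 9%, which is where the craps comparison comes from.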

The standard tool for comparing summary statistics is called a t-test. Technically, it’s called Student’s t-test, named after a guy who wrote under the pseudonym “Student.” It’s a way to test simple hypotheses about the data. Here the hypothesis that we are testing is that the difference between the 2014 value of penalties per play and the 2015 value is 0.

Expand for more nerdy nerdness, you nerd

I mentioned earlier that the penalties per play number can be considered the probability of a penalty being called on any particular play. Thinking about it this way means the penalty data will be best fit using a Bernoulli distribution, with the sample value of penalties per play as the event probability. If you are remembering your Stats 101 class, you may think that we can’t use a t-test here, since the data is not normally distributed. We don’t actually need the data to be normal to apply a t-test; rather, we need the value of the statistic we are testing to follow a normal distribution. Since we have a very large sample size here, the central limit theorem says the penalty probability value we are looking at will follow an approximately normal distribution.

Running a Welch Two Sample t-test on the penalty data for each year gives a p-value of 0.3899, which basically means we can’t reject the idea that the probability of a penalty is the same in both years. That there is weaselly statistics talk for “Same old shit, this year and last.” In general, people look for a 95% confidence level when making comparisons like this. The p-value works backwards from what you might think: at a 95% confidence level, we would want to see a p-value of <0.05 before considering the difference to be statistically significant.
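Here is a rough Python sketch of that comparison, working only from the summary counts instead of the play-by-play file. It treats every play as a 0/1 observation and builds the Welch statistic from the two sample proportions. As an assumption made for this sketch, it swaps the t distribution for a normal one to get the p-value, which is harmless with more than 21,000 plays per year. The function name is made up for illustration.

```python
import math

def welch_p_value_from_counts(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value for a difference in rates between two 0/1 samples.

    Uses the Welch (unequal variance) standard error. The normal
    distribution stands in for the t distribution, which is an
    approximation that only makes sense for large samples.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    # Unbiased sample variance of 0/1 data: n / (n - 1) * p * (1 - p)
    var_a = p_a * (1 - p_a) * n_a / (n_a - 1)
    var_b = p_b * (1 - p_b) * n_b / (n_b - 1)
    std_err = math.sqrt(var_a / n_a + var_b / n_b)
    t_stat = (p_b - p_a) / std_err
    # Two-sided tail area under the standard normal curve
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value

if __name__ == "__main__":
    # Penalties and plays for 2014, then 2015
    t_stat, p_value = welch_p_value_from_counts(1925, 21736, 1971, 21678)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

With the counts from the table, this lands at a p-value of roughly 0.39, in line with the 0.3899 reported above.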

“But Zymm! It’s not just about the overall number of penalties, what about penalty yardage? What about the types of penalties?”

Excellent points Other Zymm! Let’s take a deeper look, shall we?

Previously, I was only looking at whether a penalty was called, without considering whether or not it was accepted. When looking at yardage, we limit ourselves to looking only at accepted penalties, as no penalty yards are assessed if the penalty is declined. It turns out this doesn’t really matter, as penalties were declined at basically the same rate both years. At this point last year, 13522 penalty yards were assessed for an average of 8.034462 yards per penalty. So far this year there have been 14254 penalty yards assessed for an average of 8.229792 yards per penalty.
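As a back-of-the-envelope check on the "declined at basically the same rate" point, the number of accepted penalties can be backed out by dividing total yards by average yards per penalty. The Python sketch below does that with the figures quoted above. It assumes the averages were computed over accepted penalties only, and that every called penalty in the summary table that was not accepted counts as unassessed, which lumps declined and offsetting flags together. The names are hypothetical.

```python
# Figures quoted in the paragraph above and in the summary table
called = {2014: 1925, 2015: 1971}
yards = {2014: 13522, 2015: 14254}
yards_per_penalty = {2014: 8.034462, 2015: 8.229792}

def implied_accepted(year):
    """Accepted penalties implied by total yards / average yards."""
    return round(yards[year] / yards_per_penalty[year])

def implied_unassessed_rate(year):
    """Share of called penalties that were not assessed (assumption:
    anything called but not accepted counts as unassessed)."""
    return 1 - implied_accepted(year) / called[year]

if __name__ == "__main__":
    for year in (2014, 2015):
        print(year, implied_accepted(year),
              f"{implied_unassessed_rate(year):.1%}")
```

By this count, roughly one flag in eight goes unassessed in both years.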

Super-Secret Made-up Bonus Statistic!

If we assume that the refs are awarded a touchdown for every 100 yards they assess in penalties, the 2015 officials are leading the 2014 officials 994-945. The 2014 officials are in field goal range, but they’ll need to make some halftime adjustments if they want to win this!

Due to the way penalty yards are assigned, it’s a little more difficult to compare yards/penalty year over year. Instead, I looked at the portion of penalties over 5 yards, over 10 yards, and over 15 yards (so basically pass interference calls only). There was no significant difference between any of these groups. This kinda makes me think that there won’t be a huge difference in the types of calls either. Have I been going “WTF IS UP WITH ALL THESE OFFENSIVE PI CALLS!?” unnecessarily all year?

Penalty                                 2014  2015
Unspecified                                3     0
BLOCKED INTO PUNTER                        1    NA
CHOP BLOCK                                 4    10
CLIPPING                                   5     5
DEFENSIVE 12 ON-FIELD                     25    28
DEFENSIVE DELAY OF GAME                    0     2
DEFENSIVE HOLDING                        169   146
DEFENSIVE OFFSIDE                         88   118
DEFENSIVE PASS INTERFERENCE              124   117
DELAY OF GAME                             79    75
DISQUALIFICATION                           3    NA
ENCROACHMENT                              26    15
FACE MASK (15 YARDS)                      40    46
FAIR CATCH INTERFERENCE                    1     4
FALSE START                              288   271
HORSE COLLAR TACKLE                        9     6
ILLEGAL BLINDSIDE BLOCK                    4     5
ILLEGAL BLOCK ABOVE THE WAIST             44    66
ILLEGAL CONTACT                           81    41
ILLEGAL FORMATION                         35    51
ILLEGAL FORWARD PASS                       3     2
ILLEGAL MOTION                             6     9
ILLEGAL PEELBACK                           0     3
ILLEGAL SHIFT                              7    23
ILLEGAL SUBSTITUTION                      15     6
ILLEGAL TOUCH KICK                         1    NA
ILLEGAL TOUCH PASS                         3     5
ILLEGAL USE OF HANDS                     119    86
ILLEGAL WEDGE                              0    NA
INELIGIBLE DOWNFIELD KICK                  2     2
INELIGIBLE DOWNFIELD PASS                 10    15
INTENTIONAL GROUNDING                     17    14
INTERFERENCE WITH OPPORTUNITY TO CATCH     1    NA
INVALID FAIR CATCH SIGNAL                  0     1
LOW BLOCK                                  2     1
NEUTRAL ZONE INFRACTION                   56    67
OFFENSIVE 12 ON-FIELD                      6     4
OFFENSIVE HOLDING                        368   404
OFFENSIVE OFFSIDE                          4     2
OFFENSIVE PASS INTERFERENCE               59    65
OFFSIDE ON FREE KICK                      12     8
PERSONAL FOUL                             32    NA
PLAYER OUT OF BOUNDS ON PUNT               3    11
ROUGHING THE KICKER                        2     1
ROUGHING THE PASSER                       47    54
RUNNING INTO THE KICKER                    5     9
TAUNTING                                   8    11
TRIPPING                                   3     8
UNNECESSARY ROUGHNESS                     77   118
UNSPORTSMANLIKE CONDUCT                   28    34
ILLEGAL CRACKBACK                         NA     1
LEAPING                                   NA     1

There are some data issues here, the main one being the “Personal Foul” category in 2014. We can probably assume these are all unnecessary roughness calls. There are also a fair number of penalties that are only called a handful of times, which we can’t really do much with. So while it’s anecdotally interesting that there have been almost 4x as many “Player out of bounds on the punt” calls this year, there’s not really much we can say about that.

Methodology and Sample Size notes. Exciting!

When comparing the types of penalties called, we’re getting much more specific, so our sample size is decreasing. There are two factors to consider when deciding if your sample size is sufficient to confidently use a t-test: the overall number of observations and the frequency of the event. For the more common penalties, we can continue to use a t-test, though for the less common penalties we can’t assume the distribution is close enough to normal to use the t-test. In this case, the events will follow a Poisson distribution. We can still compare the ratio of two events, and test the hypothesis that the ratio is 1 (i.e. that the events occur at the same rate), but now we’ll be using an exact test comparing our test statistic with the binomial distribution.
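A minimal sketch of that kind of exact test, in Python with only the standard library. The idea: if a penalty happens at the same per-play rate in both years, then, given the combined count, the 2014 count follows a binomial distribution whose success probability is 2014's share of the total plays. The function name and the two-sided rule (add up every outcome no more likely than the observed one) are assumptions made for this sketch, not necessarily what was run for this post.

```python
import math

def exact_rate_test(count_a, plays_a, count_b, plays_b):
    """Two-sided exact test that two Poisson rates are equal.

    Conditional on the total count, count_a ~ Binomial(total, share),
    where share is the fraction of all plays that belong to sample a.
    Meant for small counts; very large totals would overflow a float.
    """
    total = count_a + count_b
    share = plays_a / (plays_a + plays_b)

    def pmf(k):
        return math.comb(total, k) * share**k * (1 - share)**(total - k)

    observed = pmf(count_a)
    # Sum every outcome that is no more likely than the observed one.
    # The small tolerance guards against floating point ties.
    p_value = sum(pmf(k) for k in range(total + 1)
                  if pmf(k) <= observed * (1 + 1e-7))
    return min(1.0, p_value)

if __name__ == "__main__":
    # Player out of bounds on punt: 3 calls in 2014 vs 11 in 2015
    print(exact_rate_test(3, 21736, 11, 21678))
```

Because the two seasons have almost the same number of plays, the share works out to very nearly one half, so the test is close to asking whether the calls split like coin flips between the two years.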

Let’s look at some of the more common calls. It’ll probably surprise no one that the most common call is offensive holding. Just eyeballing it, it appears there are quite a few more offensive holding calls this year, oddly enough, counterbalanced by fewer defensive holding calls. Surprisingly enough, the difference in offensive holding isn’t significant, but the defensive holding difference is! So, our first significant result is that the refs are calling defensive holding less frequently than they did last year. Why? Who the hell knows, I don’t have a theory on that one.

There’s no significant difference in offensive or defensive PI calls, which is probably good news, since these particular calls usually have a pretty large impact on a drive.

There are really only two other penalties that show a significant difference from last year. Defensive offside calls have significantly increased this year, so maybe there’s something to all that “Aaron Rodgers is a genius with his hard count, blah blah blah” stuff, though I’m too lazy to actually go through and look at offside by team. The other one is the decrease in Illegal Use of Hands calls. Feel free to speculate on the reason for that one.

There’s no significant change in Roughing the Passer penalties, so despite all those “OMG, he touched the QB’s helmet, throw ALL THE FLAGS” calls, they were doing that last year too.

Last but not least, a quick breakdown of when penalties are called, by quarter and down.

                                  2014  2015
Q1                                 408   412
Q2                                 564   586
Q3                                 463   460
Q4                                 483   503
OT                                   7    10

No Down (Kickoffs, Extra Points)    86    78
1st Down                           625   662
2nd Down                           494   498
3rd Down                           478   469
4th Down                           242   264

The main thing I find interesting here is the data for the second quarter. There are significantly more penalties called in the second quarter than any other quarter. My interpretation here: the second quarter is frequently the most competitive part of the game. It’s rare that a team is totally out of it by the half, but the urge to keep the score close going into halftime might lead to more bending of the rules. The same pattern doesn’t emerge in the 4th quarter due to garbage time.
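One rough way to check the second-quarter bump is a chi-square goodness-of-fit test on the four regulation quarters. The Python sketch below assumes every quarter has the same number of plays, which is not in the data here and is probably not quite true, so treat it as a sanity check and not as the test behind the claim above. The helper names are made up, and the closed-form tail probability only works for exactly 3 degrees of freedom.

```python
import math

def chi_square_equal_split(counts):
    """Chi-square statistic against an even split across categories."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_sf_3df(x):
    """Upper tail probability of a chi-square with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

# Penalties by quarter (Q1-Q4), overtime left out
quarters = {
    2014: [408, 564, 463, 483],
    2015: [412, 586, 460, 503],
}

if __name__ == "__main__":
    for year, counts in quarters.items():
        stat = chi_square_equal_split(counts)
        print(f"{year}: chi-square = {stat:.2f}, p = {chi2_sf_3df(stat):.2g}")
```

Under that even-split assumption, both seasons come out far beyond the usual 0.05 cutoff. A fairer version would divide by the number of plays in each quarter, which would need the play-by-play file.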

I’m not really going to go into the down data. The large number for 1st down is a bit misleading, as there are more 1st downs than 2nd, 3rd and 4th.

So this seems long enough already. To conclude, officiating is pretty much as annoying this year as last. If you want to feel smarter than your friends, complain loudly next time there’s an offside call.


Don T

Fantastic. That 2nd quarter conjecture is very interesting. I’m gonna start watching games differently.

ballsofsteelandfury

Amazing job! My engineering degree self has a boner. My reptilian brain says FUCK GOODELL AND THOSE SHITTY REFS!

Good stuff!

Moose -The End Is Well Nigh

Damn, this makes me so har…………….. well done, great stuff.

http://38.media.tumblr.com/7064a5df00bdfbbb59f9387a3da4e6b7/tumblr_nkk6ipcYLV1ruwpoco1_500.gif

Moose -The End Is Well Nigh

I’ll be back to bitch and rant about bitching and ranting about the live game officiating.

http://38.media.tumblr.com/d03619b66371dd6aa2fd3c9a0d1d96d6/tumblr_nkk6ipcYLV1ruwpoco7_400.gif

Lothar of the Hill People

I’m having flashbacks to my quantitative analysis series, two classes of which were taught by a dude with a thick Cherman accent.

Now I’ll have nightmares about kriging again.

entropy

Side note: what the heck happened to the site? I’ve checked on two browsers and it seems we’re stuck in “old folks’ text point,” which is having a weird effect on my eyes.

scotchnaut

THIS TEXT IS VERY LOUD!

Beastmode Ate My Baby

It keeps changing. Either DTZM is playing with the site, or he’s having a stroke.

entropy

Seems to be just this post. I guess the actual quality content broke WordPress?

Low Commander of the Super Soldiers
Spanky Datass

WHY DID MY CRAPPY LAP TOP TURN INTO A JITTERBUG PHONE?!?!

entropy

I’m rather glad there’s no statistically significant change in roughing the QB penalties, as the most frequent flags I notice this season are for that pick play shit and things involving any sort of defense in long, exciting pass plays. Now I have to wonder if I can even trust my eyes based on the data presented, as it seems that such penalties on these plays are on par with last year.

Maybe the real issue here is that the penalties themselves seem random, arbitrarily enforced, and poorly explained, which also leads to interruption in the game’s flow.

ThePirateSloth

This entire math post has got me all hot and bothered now.

blaxabbath
laserguru

That’s a lot of goddamn penalties.
Excellent work Frau Doktor!

Beerguyrob
entropy

You know that part of your brain everyone tells you to shut off when you’re watching a truly awful movie they all insist is awesome? Just do the opposite of that.

jjfozz

I want to leave a comment that is relative to this article, well phrased, and meaningful – but numbers make my brain feel like The Ben after he has too many Choco Tacos and malt liquor before getting on that stair climbing machine

The Right Reverend Electric Mayhem

Thank you for this. I may send this to my wife’s colleagues, who have a Cargo Cult understanding of stats: they slavishly (attempt) to perform the rituals without any understanding of WHAT THEY ACTUALLY MEAN, in hopes that it will please the god Peyr Revoower.

Horatio Cornblower

I am so goddamn dumb.

That’s what I took from this post.

King Hippo

We all knew Dok was smarter than us, now we have a handy empirical guide to HOW MUCH so.

blaxabbath

“Needs more targeting penalties.”

-Taylor Mayes

SonOfSpam

I enjoyed my Stats 101 class so much in college, I switched my major to math and thought about becoming an actuary. Then I started taking college-level math classes and realized I’d made a huge mistake. Majoring in psych allowed me to continue my heroic level of drinking.

The point is, I respect anyone who can nerd it up like this. Well done.

King Hippo

Somehow, Stats 101 was added to the undergrad curriculum for environmental engineering at NC State. I took it my very last semester, after having already taken all 3 semesters of college-level calculus, plus differential equations.

Fun indeed.

ballsofsteelandfury

Interestingly, I took stats for engineering AND for my MBA. The MBA stats class was much more fun.

Old School Zero

This was great. I need to check that site out and see how far the data goes back and for what. I’d love to do some irresponsible and way less statistics-savvy investigations of my own.

One thing I found funny is that whole “less full practices” argument, which is being used, apparently, for everything. I’ve seen it as a very serious rationalization about injuries to teams–and specifically San Diego, even though they’ve been CONTINUOUSLY at the bottom of the league in time missed to injury.