To the extent anyone has ever read this site (unlikely!) and cares about it (less likely!), I am moving to a new blog: http://blababoutball.wordpress.com.
A thought experiment on aging…
When I decided to get “serious” about doing some baseball analysis I got myself a serious computer to do the heavy mathematical lifting: a prior-generation dual-processor Mac Pro. Of course I didn’t need it, but I also like to tinker with technology, and upgrading it from a pretty fast 8-core PC to a screaming 12-core machine was a fun project in itself.
Fast forward a few months and I am finally doing the kind of computational task I had in mind for this computer… basically an iterative calculation on a huge dataset, where the calculation is done row by row but each row can be done independently of all others, thus lending itself to parallelization of a form that R lets you do very easily.
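For readers who want to see the shape of this kind of job, here is a minimal sketch in Python rather than R. It is not the code from this project: `process_row`, `process_all`, and the toy data are hypothetical stand-ins, and the only thing it illustrates is that rows which do not depend on each other can be handed to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def process_row(row):
    # Hypothetical per-row calculation: it reads only its own row,
    # so no row needs to wait on any other.
    return sum(x * x for x in row)


def process_all(rows, workers=4):
    # Executor.map returns results in input order, so results[i]
    # always corresponds to rows[i] regardless of which worker ran it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_row, rows))


# Toy data, made up for illustration.
rows = [[1, 2, 3], [4, 5], [6]]
results = process_all(rows)
```

A thread pool keeps the sketch short and self-contained. For CPU-heavy work in CPython, `ProcessPoolExecutor` from the same module offers the same `map` interface and is the usual swap.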
I did some very careful tests on small data sets to figure out the “sweet spot” of how many cores I could deploy before RAM became a constraint (16 logical cores, or 8 actual), and came up with a good estimate of how long the computation would take when run on a whole year of data: 4.5 hours.
22 hours later, here I am waiting for it to finish. Still. I check the machine to make sure it’s still running 16 threads (it is), that the CPU is at about 2/3 capacity (it is), and it’s not buffering things to the hard drive (it isn’t).
So I inspect my code to see what I did wrong, and I find… a “<>” where a “=” should be. I’m running it on every year BUT the one I wanted.
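A cheap guard against this kind of mistake is to check what a filter actually selected before starting the long run. The sketch below is hypothetical (the function names, the `year` field, and the toy rows are all assumptions, not the code from this project), but it shows the idea: fail fast if the selected rows contain anything other than the requested season.

```python
def rows_for_year(rows, year):
    # The intended filter: keep only the requested season
    # (the "=" comparison, not "<>").
    return [r for r in rows if r["year"] == year]


def assert_only_year(selected, year):
    # Collect the distinct seasons present and refuse to continue
    # unless the requested one is the only one.
    years_seen = {r["year"] for r in selected}
    if years_seen != {year}:
        raise ValueError(f"expected only {year}, got {sorted(years_seen)}")
    return len(selected)


# Toy rows, made up for illustration.
rows = [
    {"year": 2012, "pa": 1},
    {"year": 2013, "pa": 2},
    {"year": 2013, "pa": 3},
    {"year": 2014, "pa": 4},
]
selected = rows_for_year(rows, 2013)
n_selected = assert_only_year(selected, 2013)
```

Run against the inverted filter, the same check raises immediately instead of 22 hours later.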
An article of mine published at The Hardball Times today:
I know I’m biased, but I think this one is pretty fun.
In response to this comment over at Tango’s blog:
[Table: OBP by base/out state (runners on 1B, 2B, 3B; 0, 1, or 2 outs) for each of four eras. Only the column headers survive; the data rows are missing.]
Note bases loaded and 2 out: across all four observed eras, this base/out state results in the lowest OBP.
Edit: updated to correct a query error.
If you’ve ever installed MySQL on Linux, you know how easy it is. If you’ve ever installed MySQL on OS X, you know what a terrible pain-in-the-ass it is. It’s a pain whether you install the latest package directly from mysql.org or install via MacPorts. To actually make it work, there are about half a dozen post-installation steps you need to follow, none of which are documented anywhere reasonable.
If you’re a Mac user and you want to use MySQL for sabermetric analysis, save yourself some headaches and use the script.
One other thing that must be controlled for is game situation and that could be significantly affecting the results. For example, when the pitching team is ahead, especially way ahead, later in the game, the pitcher is more likely to throw a fastball on all pitches, more likely to throw a strike, etc. The batting team is more likely to be taking more pitches, etc.
Unsurprisingly given the source, this is absolutely correct!
To test MGL’s assertion, I computed the percentage of fastball variants (FF, FA, FT, FC, FS, SI in PITCHf/x) thrown by every pitcher from 2008-2014, broken out by platoon, inning, count, index (i.e. whether the pitch was the 1st, 2nd, 3rd, etc. of the PA), and run differential (i.e. how many runs ahead or behind their team was at the time). I then used the delta method to compare (a) the percentage of fastballs thrown in the 7th inning or later when the pitcher’s team was ahead by 4 or more runs, to (b) the percentage of fastballs thrown in the 7th inning or later when the run differential was between 1 and -1.
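To make the comparison concrete, here is a minimal sketch of contrasting two fastball rates. It is not the delta-method calculation described above: it swaps in a plain difference of two proportions with a normal-approximation standard error, the function name is hypothetical, and the counts are made up. It also reports the gap in percentage points, which is an assumption about how to read a "higher rate".

```python
import math


def compare_fastball_rates(fb_ahead, n_ahead, fb_close, n_close):
    # Fastball share in each game situation.
    p_ahead = fb_ahead / n_ahead
    p_close = fb_close / n_close

    # Difference in rates, as a fraction (0.05 means 5 percentage points).
    diff = p_ahead - p_close

    # Normal-approximation standard error for the difference of two
    # independent proportions.
    se = math.sqrt(
        p_ahead * (1 - p_ahead) / n_ahead
        + p_close * (1 - p_close) / n_close
    )
    return diff, se, diff / se


# Made-up counts: 650 fastballs in 1000 pitches when far ahead,
# 600 fastballs in 1000 pitches in close games.
diff, se, z = compare_fastball_rates(650, 1000, 600, 1000)
```

Dividing the difference by its standard error gives a rough sense of whether a gap is larger than sampling noise, which matters most in the deeper counts where the pitch totals shrink.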
Here are some selected findings:
- On the first pitch, pitchers threw fastballs at a 5.2% higher rate when far ahead
- On the second pitch (ignoring count), pitchers threw fastballs at a 1.5% higher rate when far ahead
- Irrespective of count or pitch index, pitchers threw fastballs at a 3.6% higher rate overall when far ahead
- In 1-0 counts, pitchers threw fastballs at a 6.7% higher rate when far ahead
- In 0-1 counts, pitchers threw fastballs at a 2.8% lower rate when far ahead
In deeper counts the sample sizes quickly get small, so I’ll stop at 1-0 and 0-1. But the results are definitive: pitchers lean more on their fastball late in games when their team is far ahead. The astute reader will note that pitchers threw fastballs less frequently in 0-1 counts when far ahead (this is also the case for 1-1 and 0-2 counts, in smaller samples), but that is not enough to offset the higher fastball rates in other counts.
In fact the effect shows up whenever the pitcher’s team is far ahead, not just late in games. In innings 4 through 6 pitchers threw fastballs at a 4.1% higher rate on the first pitch, and a 2.3% higher rate overall, when far ahead than in close games.
So MGL’s observation is exactly right: to really do this kind of pitch-level analysis correctly, we need to control for much more than I did in my last article.