Adventures in Big Data

When I decided to get “serious” about doing some baseball analysis I got myself a serious computer to do the heavy mathematical lifting: a prior-generation dual-processor Mac Pro. Of course I didn’t need it, but I also like to tinker with technology, and upgrading it from a pretty fast 8-core PC to a screaming 12-core machine was a fun project in itself.

Fast forward a few months and I am finally doing the kind of computational task I had in mind for this computer: an iterative calculation on a huge dataset. The calculation is done row by row, but each row is independent of all the others, so it lends itself to a form of parallelization that R makes very easy.
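A minimal sketch of this kind of row-independent parallelism, written in Python rather than R, and not the actual code from this analysis. The names `process_row` and `run`, the stand-in arithmetic, and the default `chunksize` are all hypothetical, chosen only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def process_row(row):
    """Hypothetical per-row calculation; it depends only on its own row."""
    return sum(x * x for x in row)


def run(rows, workers=1, chunksize=1000):
    """Apply process_row to every row, optionally across several processes.

    With workers <= 1 the rows are processed serially, which is handy for
    checking results on a small sample before launching the full job.
    """
    if workers <= 1:
        return [process_row(r) for r in rows]
    # Each worker process receives batches of rows; because no row depends
    # on any other, the results can simply be collected in input order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_row, rows, chunksize=chunksize))
```

To use several processes, call something like `run(rows, workers=16)` from under an `if __name__ == "__main__":` guard, which Python needs on platforms that start worker processes by re-importing the script. The right worker count depends on how much RAM each process needs, so it is worth measuring on small data first.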

I did some very careful tests on small data sets to figure out the “sweet spot” of how many cores I could deploy before RAM became a constraint (16 logical cores, or 8 actual), and came up with a good estimate of how long the computation would take when run on a whole year of data: 4.5 hours.

22 hours later, here I am waiting for it to finish. Still. I check the machine to make sure that it’s still running 16 threads (it is), that the CPU is at about 2/3 capacity (it is), and that it’s not buffering things to the hard drive (it isn’t).

So I inspect my code to see what I did wrong, and I find… a “<>” where a “=” should be. I’m running it on every year BUT the one I wanted.

🙂
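A cheap guard against this kind of mistake is to check what the filter actually selected before starting the long run. Here is a minimal Python sketch, assuming each row is a dictionary with a `year` field; `select_year` and `check_only_year` are hypothetical helper names, not anything from the original code.

```python
def select_year(rows, year):
    """Keep only rows from the requested year (note '==' and not '!=')."""
    return [r for r in rows if r["year"] == year]


def check_only_year(selected, year):
    """Raise ValueError unless the selection contains exactly the intended year."""
    years = {r["year"] for r in selected}
    if years != {year}:
        raise ValueError(
            f"expected only year {year}, but selection covers {sorted(years)}"
        )
    return selected
```

Calling `check_only_year(select_year(rows, 2012), 2012)` before the expensive step takes a fraction of a second, and a flipped comparison fails immediately instead of 22 hours later.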

Tools: Installing MySQL on OS X

If you’ve ever installed MySQL on Linux, you know how easy it is. If you’ve ever installed MySQL on OS X, you know what a terrible pain-in-the-ass it is. It’s a pain whether you install the latest package directly from mysql.com or install via MacPorts. To actually make it work, there are about a half dozen post-installation steps you need to follow, none of which are documented anywhere reasonable.

Enter the lovely folks at Mac Mini Vault. They have created and published (via GitHub) a script that installs MySQL from beginning to end, along with several other useful scripts.

If you’re a Mac user and you want to use MySQL for sabermetric analysis, save yourself some headaches and use the script.