Applying advanced stats on German baseball: wOBA/EqA leaders Northern Division 2010
by Michael Kujoth
Hello fellow readers, and welcome to another episode of “How do you spend a whole day not working on your semester papers and doing something completely senseless instead”. For today, I’m easily gonna accomplish that feat by attempting a sabermetric look at the German Bundesliga Nord 2010. Saberwhat? Glad you asked, bold voice. Sabermetrics is the field of baseball analysis through more advanced statistical studies. In other words: Throwing stats from the century before last, like Batting Average, out of the window and trying to come up with something more telling and accurate.
While those ‘new-age’ stats are slowly catching on in the States, I couldn’t really find anything regarding European baseball, let alone German baseball. Then again, I was pretty surprised that there is an entry for it in the German Wikipedia, but it’s probably still safe to assume that most of you are not very familiar with the matter. So first of all, I’m gonna provide some basics on sabermetrics, before taking a look at the 2010 regular season through the lenses of Weighted On-Base Average (wOBA) and Equivalent Average (EqA).
In the unlikely event that you are actually conversant with those types of stats and as a result see yourself yelling at the computer screen right now, relax. I know that you can’t carry the formulas for those stats over one-to-one to German baseball. But while there probably will indeed be some inaccuracies, it’s still gonna provide a much more sophisticated look at offensive production than Batting Average or Slugging Percentage can ever give you. I’m gonna get to the problems attached to this analysis in a later section as well. What’s that? Your eyes are already halfway closed because you don’t really get a kick out of baseball statistics? Well then you should probably steer onto this last-exit lane, close the browser window and do something more productive. Like shooting stray dogs with a pellet gun or something. For the three guys still reading this, because they’re at work and can’t find anything better to do, let’s ride on to why we need advanced stats in baseball.
I. Shortcomings of traditional baseball statistics
This chapter could be an easy ten pages, but in order to save some of your finite lifetime, I’m gonna focus on three offensive stats: AVG/OBP/SLG. And I’ll make it real quick too: They are more or less worthless. OK, I lied. Not that quick. Let’s start with the most overrated stat ever: Batting Average. People love it. Guy’s hitting .300? Good player. .250? Not so much. That’s saying David Eckstein (career .281) is a better hitter than Adam Dunn (.251). By thirty points! Doood, this Dunn fella must really suck.
But what does AVG actually provide us with? It tells you how good a player is at getting on base, any base, via basehit. Strictly, it should be called hitting on-base percentage. But who would want to know that, since OBP tells you how good somebody is at getting on base overall and SLG tells you how far on the bases he gets when hitting the ball? Good question bold voice. A+. Sit down. There really is no answer to that either. People use it because people always used it because people always used it. In the end, Batting Average gives you half the answers for two different questions.
Moving on to On-Base Percentage, which is really not a bad statistic after all. I kinda like it. No, not Billy Beane-liking-it or Brad Pitt-soon-to-be-liking it, but nonetheless. It gives you a pretty good idea of how frequently a player works his way on base. The problem is, you still don’t know how good a hitter an individual is, because OBP gives all events the same value, e.g. a homerun is not worth more than a single. Which is corrected when looking at Slugging Percentage. New problem: All our Walks, Hit-By-Pitches etc. are gone again. So the smart people really smarted it out: “Hey, let’s throw on-base percentage and slugging into one pot and call it, hmmm, on-base plus slugging!” Nice try. While OPS actually is a big upgrade over the three single stats when judging offensive ability, there are still flaws attached to it. Consider two players: Player A goes 25 for 100 with 25 Homeruns. Player B goes 50 for 100 with 25 singles and 25 doubles. And just for fun, we’re gonna give him 25 steals as well. Who had more value? And by how much? AVG and OBP side with Player B, while SLG would point to Player A. Their OPSs are exactly the same, while the Stolen Bases are not even acknowledged. The outcome is neither exact nor complete.
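For anyone who wants to check the Player A / Player B arithmetic, here is a minimal sketch. It assumes “25 for 100” means 100 at-bats with no walks, hit-by-pitches or sacrifice flies; the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter (field names are illustrative)."""
    ab: int
    singles: int = 0
    doubles: int = 0
    triples: int = 0
    hr: int = 0
    bb: int = 0
    hbp: int = 0
    sf: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    def avg(self) -> float:
        # Hits per at-bat
        return self.hits / self.ab

    def obp(self) -> float:
        # Times on base per (roughly) plate appearance
        return (self.hits + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        # Total bases per at-bat
        return self.total_bases / self.ab

    def ops(self) -> float:
        return self.obp() + self.slg()


# Assumption: no walks, HBP or sac flies for either player
player_a = BattingLine(ab=100, hr=25)
player_b = BattingLine(ab=100, singles=25, doubles=25)

for name, p in (("A", player_a), ("B", player_b)):
    print(f"Player {name}: AVG {p.avg():.3f}  OBP {p.obp():.3f}  "
          f"SLG {p.slg():.3f}  OPS {p.ops():.3f}")
```

Both lines come out at an OPS of 1.250, and nothing in the calculation even has a slot for Player B’s 25 steals.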
II. Introducing: wOBA & EqA
So what’s next? I think we can skip the part where some dudes, who clearly had even more time on hand than me, went through all play-by-play data for every major league game since 200BC, give or take. Let’s just be glad they did and did provide us with some valuable information. That information is the run value of every offensive event in baseball, from Walks to Homeruns, all the way to Catchers Interference and Defensive Indifference. RV tells us how many more (or fewer) runs have been scored on average, when a certain event occurred during an inning, no matter the situation. Now we know the exact values for Walks, Intentional Walks, Hit-by-Pitches, Singles, Doubles, Triples, Homeruns, Stolen Bases and Caught Stealings. With that, we can calculate a percentage weighing the run values of those events, thus weighted on-base average. I’d argue that Reached Base on Error should be included (which is done regularly when using wOBA), while others might tell me that it’s not an accomplishment of the player, so it shouldn’t. Then I’d argue that a HBP or a four pitch Walk to the number nine batter is not much more of an accomplishment either. I’ll let the others win this time, since RBOE is not available in the Bundesliga statistics.
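To make the “weighing” part concrete, here is a rough sketch of a wOBA-style calculation. The weights are the commonly quoted MLB-derived ones from Tango, Lichtman and Dolphin’s “The Book” and are an assumption here: this post doesn’t say which set was used. The denominator (plate appearances) and the handling of intentional walks vary between versions of the formula, and the sample batter is entirely made up.

```python
# Assumed weights: the widely cited MLB values from "The Book".
# RBOE (0.92 in that version) is left out, since the Bundesliga
# statistics don't provide it.
BOOK_WEIGHTS = {
    "bb": 0.72,      # walks (the original counts only unintentional ones)
    "hbp": 0.75,
    "single": 0.90,
    "double": 1.24,
    "triple": 1.56,
    "hr": 1.95,
}


def woba(events: dict, pa: int, weights: dict = BOOK_WEIGHTS) -> float:
    """Weighted sum of offensive events divided by plate appearances."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return sum(w * events.get(name, 0) for name, w in weights.items()) / pa


# Hypothetical batter with 120 PA (numbers invented for illustration)
sample = {"bb": 15, "hbp": 2, "single": 20, "double": 8, "triple": 1, "hr": 4}
print(f"wOBA: {woba(sample, pa=120):.3f}")
```

Passing the weights in as a parameter is deliberate: if anyone ever does work out Bundesliga-specific run values, only the dictionary changes.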
Now, for the first big chunk of salt I’m going to throw into your soup: It came to my notice that the German first league is NOT the MLB. Shocking, I know. The aforementioned Wikipedia article states that those new stats like runs created (RC) do not work for German baseball, due to the differences in overall quality and the way runs are produced. Challenge accepted. Thanks Barney. RC should be within a 5% margin of the actual run total an MLB team scores during the course of a season, which is around 6000 PA. Given that teams only have around 900 PA here, I just took the whole league to get a good sample size (8753 PA). And I was off by 14%. Dang. Then I ran the 2009 St. Louis Cardinals season through the same formula (technical RC) for a cross-check. Off only 3.7%. Yankees? 8%. Hmm. Also, I couldn’t include Grounded into Double Play for the Bundesliga, since this stat isn’t available either. With it, the discrepancy would have been even higher. So what do we take from this?
1. There is a pretty good chance that the run values will be different if somebody attempted to figure them out.
2. Nobody will figure them out. If you’d take the time, you could prove that there is a difference, but since there’s no play-by-play data, you won’t get the actual run values for the Bundesliga. So just like in real life, sometimes you just have to deal with what fangraphs.com gives you.
3. Accepting that there are differences between the run values of the Bundesliga and the MLB, I’d still put forward the thesis that they are a lot smaller than the differences between the factual ones and those used in the traditional stats, thus getting us closer to the real offensive value than AVG/OBP/SLG. It’s not perfect, but it’s a step in the right direction. Also, I’ve already gone way too far to stop now.
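The runs created cross-check described above boils down to “estimate runs from counting stats, compare with actual runs”. Here is a minimal sketch of that comparison. Note the assumptions: it uses Bill James’s basic RC formula, not the “technical” version used for the numbers in this post, and the team totals are invented placeholders, not Bundesliga or MLB data.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic runs created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def percent_error(estimate: float, actual: float) -> float:
    """Signed error of the estimate, as a percentage of the actual value."""
    return (estimate - actual) / actual * 100


# Hypothetical team totals (placeholders for illustration only)
h, bb, tb, ab = 1500, 500, 2300, 5500
actual_runs = 730

estimated = runs_created_basic(h, bb, tb, ab)
print(f"RC estimate: {estimated:.1f}, actual: {actual_runs}, "
      f"error: {percent_error(estimated, actual_runs):+.1f}%")
```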
While we’re at it, let’s empty that dredger for Grain of Salt II: The Return of Sodium McChloride. Generally the sample sizes for everyday players here are around 120 PA in a season, which is really not enough for any serious statistical analysis. Then again, not much of a choice. Just keep in mind that due to the small sample, these stats neither have much predictive value nor are they a reliable indication of a player’s true (long term) skill. Look at this as a determination of who actually provided the most offensive value this year, no matter if it is due to skill, luck, random fluctuation or all of the above.
In addition to wOBA I’m also going to throw Equivalent Average (EqA) into the, hm, equation. Not to bother you with any more details, EqA is a four-way calculation that works a little differently than wOBA. Stolen Bases and Caught Stealings are included as well as Sacrifice Bunts and Sacrifice Flies. As coincidence will have it, the third step of the calculation provides us with a number called Equivalent Runs, which is similar to runs created and gives us an estimate of the runs contributed by a player/team. Here’s a quick glance at the EqRs for the eight teams and their actual runs scored:

Now, I don’t know about you, but this looks pretty darn good to me. The biggest discrepancy is 12.5% off for Dortmund. I cry small sample size and throw the variances of all teams into one account and the error comes out at 5.4%, while five of the eight teams are within less than 4% of their true run total. Yes, this still could be all coincidence, but maybe we’re really onto something here and EqA actually is a stat that can be applied to the Bundesliga. For now, I still like my salty soup enough to take a look at the top ten wOBA and EqA leaders for the Bundesliga Nord 2010.
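For the curious, the first of EqA’s steps is usually given as a “raw” rate that folds in steals and sacrifices. The sketch below implements only that step, using the commonly published raw EqA formula; treating it as the exact version behind the numbers in this post is an assumption, and the later steps (league adjustment, Equivalent Runs, final scaling) are left out. The sample line is made up.

```python
def raw_eqa(h: int, tb: int, bb: int, hbp: int, sb: int, sh: int,
            sf: int, ab: int, cs: int) -> float:
    """Raw EqA as commonly published:

    (H + TB + 1.5*(BB + HBP) + SB + SH + SF)
    / (AB + BB + HBP + SH + SF + CS + SB/3)
    """
    numerator = h + tb + 1.5 * (bb + hbp) + sb + sh + sf
    denominator = ab + bb + hbp + sh + sf + cs + sb / 3
    return numerator / denominator


# Hypothetical season line (numbers invented for illustration)
print(f"Raw EqA: {raw_eqa(h=30, tb=45, bb=10, hbp=2, sb=3, sh=1, sf=1, ab=100, cs=1):.3f}")
```

Unlike OPS, a stolen base adds to the numerator here and a caught stealing adds to the denominator, so the running game actually shows up in the result.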
III. Results
wOBA is brought up to a level similar to OBP, just to make it easier to look at, so the average player is around .330. The first chart takes into account 1B, 2B, 3B, HR, BB, IBB and HBP, while the second one also gauges SB and CS with pretty similar results within the top 10.


EqA on the other hand is being brought up to Batting Average scale to give it a familiar look. Due to the formula, your average guy should be at .260. No drastic changes again, the top 5 stay the same with some flip-flopping behind that. I decided to put the EqR leaders here as well, though as explained, this number is not an average or a percentage and thus depends on playing time. It is the estimate of how many runs a player contributed throughout the season.


Quick recap to bring this thing to an end: Our top 5 stay the same, no matter if we use EqA or wOBA/wOBA (SB). Comparing those rankings to the more traditional stats, we find that only OPS has at least the top 4 in the right order. This squares well with analyses showing the correlation between different kinds of statistics and the actual amount of runs scored by a team, where OPS easily beats out SLG, OBP and AVG (in that order). EqA tops that list, while wOBA ranks either above that or between EqA and OPS, depending on the formula used. For now, I’d personally trust EqA over the traditional stats and wOBA (for the Bundesliga), though more analyses still have to prove its applicability for this league (I might cover the southern division in a follow-up to this).
Now, if you’re still awake and have a question, detected a flaw or if you just wanna call me a nerd, please feel highly encouraged to do so, since this is the first time I’ve done something like this and any feedback would be much appreciated.












Comment by JakubVancura
September 16, 2010 | 2:17 pm
Hey man, just awesome! Love it!