Monday, September 23, 2013

US Late War Nationals 2013 Statistics and Conclusions

by Eric Riha and Steven MacLauchlan

Both Eric and I were eager to dive into the dataset provided by the LW Nationals tournament at Historicon. Being nerds to the nth power, we wanted to see what information we could glean from the round by round results of the event, and see if we could prove or disprove any of the prevailing theories on Nationality/List Choice and Player Skill. Since Eric is the real Mathemagician and I am just a lowly mathemagician’s apprentice, I will leave the heavy analysis to Eric. Right up front, I will just get some of the raw data out of the way with some simple slicing and dicing.

Well, this doesn't look good for the Russians... then again, what does nowadays?

Disclaimer: Please remember that this is a relatively small dataset, and many dimensions and relationships within the data simply are not captured. So take my analysis with a grain of salt; Eric’s is much more scientific!

The Basics


Nothing terribly revelatory here. Germans and Americans were the most represented nationalities, with Canadians falling into a fairly distant third. Germans outnumbered Soviets ten to one! This is clearly the manifestation of how recent briefings are perceived, and should come as no surprise to anyone.

Infantry lists account for more than half of all lists. Armor and Mech lists both account for just under a quarter each.

The Top and Bottom Ten


Drawing conclusions from these top and bottom rankings is dangerous. Eric’s regression analysis below much more adequately details any relationship between nationality or mobility and rank. The data is presented here only for completeness’ sake.



At the risk of editorializing, I will say that I used to be a big supporter of sportsmanship scoring in tournaments. I no longer think it’s necessary (though as a compromise, I really like the “favorite player” vote as a way to keep the spirit of sportsmanship alive!). The data here suggests that the oft-cited anecdote that only losing players receive good sportsmanship scores doesn’t hold up at all. Of course, a higher sportsmanship score automatically places one slightly higher in the rankings, but given that the average score of the top 25 and bottom 25 players differs by just a single point, one could argue that the sportsmanship scores aren’t adding any granularity to the overall rankings. The average sportsmanship score for the 100 players that completed the event was 29.18.

Thus, the majority of players are receiving their full 30 points with only a handful of people being penalized, rather than the majority of players receiving an ‘average’ score and only ‘exceptional’ sportsmen recognized with higher scores. While this phenomenon may not be representative of sportsmanship scoring across all Flames of War tournaments, it does expose the difficulties in using sportsmanship to calculate a player’s overall placement.

Mission Data


NOTE: Winning a game netted a bonus Victory point, making our results an 8 point scale rather than the standard 7.

Because this is simple averaged data, gleaning info from this is tenuous at best. Still, the data does seem to suggest that much of the anecdotal evidence about mission balance by mobility holds true.

Regression Analysis

RHQ Rankings

We also wanted to take a deeper look at the data from the event to see if any of our data points could act as predictors of performance, rather than just tell us a story of what happened at the event. In order to accomplish this, we needed something that could serve as a ‘control’ - essentially an external measure of player skill. We took a look at a few different options, but ultimately rested on RHQ rankings data; specifically, the RHQ Rank of each player before the results for the event were loaded.

For those of you that just groaned at the idea of using RHQ Rankings as a measure of player skill (you know who you are), we understood the idea would and should be questioned. Several players over the years have talked about how the system can be ‘gamed’ to a certain extent, how multiple events are ‘required’ to achieve a proper placement, etc. This meant our first step was to match it up against our internal measure of player skill - the final event rankings - to see if it was a valid (and statistically significant) control.

From a data standpoint, we had 64 players with active RHQ standings out of the 102 players in the event. I used a simple proportion to normalize the RHQ Rankings (1 to 383) against the 102 placements in the Nationals event, and ran a linear regression with the intercept constrained to zero to see if there was a correlation between a player’s RHQ Rank before the event and that player’s Nationals Placement in the results of the event.
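As a rough sketch of that procedure, the proportion-based normalization and zero-intercept fit look something like the snippet below. The arrays are randomly generated stand-ins, not the actual event data, so the printed numbers are purely illustrative:

```python
import numpy as np

# Illustrative sketch of the normalization + zero-intercept regression
# described above. The arrays below are random placeholders, NOT the
# actual Nationals/RHQ data.
rng = np.random.default_rng(0)
nationals_place = np.arange(1, 65).astype(float)   # 64 players with RHQ standings
rhq_rank = np.sort(rng.choice(np.arange(1, 384), size=64, replace=False))

# Simple proportion: rescale RHQ ranks (1-383) onto the 102-place Nationals scale
rhq_normalized = rhq_rank * (102 / 383)

# Zero-intercept least squares: fit y = b*x with no constant term
x, y = rhq_normalized, nationals_place
b = (x @ y) / (x @ x)                              # closed-form slope
ss_res = np.sum((y - b * x) ** 2)
r2 = 1 - ss_res / np.sum(y ** 2)                   # uncentered R^2 (no intercept)

print(f"slope = {b:.4f}, uncentered R^2 = {r2:.3f}")
```

Note that when the intercept is forced to zero, R² is computed against the raw (uncentered) sum of squares, which is one reason zero-intercept fits can report inflated R² values.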

As you can see from the graph, we run into some self-selection/heteroskedasticity issues in the 64 RHQ observations we have (the X values are weighted to the left side of the chart rather than spread evenly throughout it). Since the RHQ Rankings for players are now only posted if they register for the site, many players in the bottom half of the Nationals Placements were not registered and thus did not have an RHQ Rank assigned to them.

Even with the bottom half of our data sparsely populated, we still see a relatively strong correlation between RHQ Ranking and Nationals Placement: nearly 56% of the change in Nationals Rankings can be explained by changes in RHQ Ranking. With a P-value of 8.43x10^-13, we also find the predictive coefficient of X to be statistically significant at 1.0535 - meaning that a 1 Rank Increase in Normalized RHQ Ranking roughly translates to a 1 Rank Increase in Nationals Ranking. Since the bottom half of our Nationals Placement data would likely show up in the bottom half of our Normalized RHQ Rankings data, it is safe to assume that adding the missing RHQ Rankings to our data set would strengthen our correlation even further.

Bottom Line: Given the above analysis, we can safely say that a player’s RHQ Ranking predicts the majority of changes in that player’s Nationals Placement and is acceptable for use as a measure of “player skill” in our Flames of War tournament analysis.

Note on Correlation vs. Causality: For the sake of completeness, it should be noted that having a good RHQ Ranking does not make you a player with high skill, but rather players with high skill have a tendency to have good RHQ Rankings. Since we have no way to directly measure a player’s skill in, say, list construction or risk mitigation, we have to use RHQ Rank as a sort of combinatory replacement measure for all of those individual player ‘skills’.

List Performance by Mission

While Steve’s averages by mission/list type are interesting, they might be biased by player skill. Did Armor lists do poorly across the board because Armor lists are bad for those mission types? Or were the majority of armor lists run by players with “less skill” than the rest of the playing field?
To perform this analysis, I set up the equation:

When the regression is run, it will check to see which “X” factors are more or less significant in determining “Y” for each mission.
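For readers who want to follow along, a minimal version of that per-mission model might look like the following. The data, the list-type coding, and the choice of Infantry as the baseline category are all illustrative assumptions on my part, not details taken from the Nationals dataset:

```python
import numpy as np

# Hedged sketch of the per-mission regression: VP's earned modeled on
# normalized RHQ rank plus one-hot list-type dummies. All values are
# randomly generated placeholders, not the Nationals data.
rng = np.random.default_rng(1)
n = 50                                        # games in one mission round (assumed)
rhq = rng.uniform(1, 102, n)                  # normalized RHQ rank
list_type = rng.integers(0, 3, n)             # 0=Infantry, 1=Mech, 2=Armor (assumed coding)
vp = rng.integers(1, 9, n).astype(float)      # 1-8 VP scale (win bonus included)

# Design matrix: intercept, RHQ rank, and two dummies (Infantry is the baseline)
X = np.column_stack([
    np.ones(n),
    rhq,
    (list_type == 1).astype(float),           # Mech dummy
    (list_type == 2).astype(float),           # Armor dummy
])
beta, *_ = np.linalg.lstsq(X, vp, rcond=None)

# Adjusted R^2 penalizes the model for each extra regressor
resid = vp - X @ beta
r2 = 1 - (resid @ resid) / np.sum((vp - vp.mean()) ** 2)
p = X.shape[1] - 1                            # number of regressors (excluding intercept)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

In practice a package like statsmodels would also report the per-coefficient p-values used to judge significance, but the mechanics of the fit are the same.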

Round 1, Encounter: The data produces an Adjusted R² value of .08, with no Coefficients as statistically significant determinants of VP’s Earned.
Round 2, Cauldron: The data produces an Adjusted R² value of .06, with our Normalized RHQ Ranking Coefficient as the only statistically significant determinant of VP’s Earned.
Round 3, Pincer: The data produces an Adjusted R² value of .12, with our Normalized RHQ Ranking Coefficient as the only statistically significant determinant of VP’s Earned.
Round 4, Counter Attack: The data produces an Adjusted R² value of .004, with no Coefficients as statistically significant determinants of VP’s Earned.
Round 5, Hold The Line: The data produces an Adjusted R² value of -.03, with no Coefficients as statistically significant determinants of VP’s Earned.
Round 6, Free For All: The data produces an Adjusted R² value of -.007, with no Coefficients as statistically significant determinants of VP’s Earned.
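A quick aside for anyone puzzled by the negative values in Rounds 5 and 6: they are not typos. Adjusted R² applies a penalty for every regressor in the model, so when the ordinary R² is close to zero the penalty can push the adjusted figure below zero. With n observations and p regressors:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

A negative adjusted R² is effectively the statistic’s way of saying the model explains less than its complexity costs.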

I was surprised to see so little statistical significance, but soon realized that the results here followed anecdotal evidence quite well. Our regression model explained very, very little of the changes in VP as we proceed through each round, and we have two issues to consider: one data driven and one ‘luck’ driven. Opponent Data is completely absent from our model and our data set. From Steve’s averages analysis, we know that Tank and Mechanized lists outperform Infantry lists in Encounter, and anecdotally we know that Tank and Mech lists have a significant advantage when playing against Infantry lists in that scenario. Adding opponent data to our model would produce much more robust results.

I created a second model to see how Nationality combined with List Type could impact VP’s Earned,

and it achieved the exact same results - only Player Skill was statistically significant, and only in rounds 2 and 3. This helps explain some of the discrepancy between Steve’s Average VP’s by Nationality and the final results. Even though British lists earned more VP’s on average throughout the event, only 2 of 8 British lists finished in the top 25. The same can be seen across the board with other Nationalities.

Bottom Line: A player’s individual choice of Nationality or List Type did not have a statistically significant impact on VP’s earned in any single mission.

Skill vs. Luck:

However, our measure of player skill does provide us with some additional insight into the order in which each mission is played and how a tournament develops. While the results from Encounter may lean more heavily on list vs list match-ups, Round 2 and 3 showed that players with ‘higher’ skill were able to garner higher scores than those with ‘lower’ skill. As the tournament proceeded into Day 2, more and more matchups occurred between players with higher skill and the measure lost its statistical significance.

This leads us to another piece of anecdotal evidence: while the players with higher skill levels will generally filter towards the top ranked spots in an event, one must both be highly skilled and “lucky” to bring home a Nationals Title as the player field expands beyond 64 players. As the player field grows, a single, highly skilled player is less and less capable of impacting the scores of opposing players in the event. One bad match-up, round, or roll of the dice is far more difficult to recover from.

An example from this event would be the records of Bill Wilcox and Doug Rousson - both players with a high degree of skill, as recognized by their RHQ Rankings and the general player community. They played in the first round of the event, with a list matchup that was difficult on both players, in a mission that could result in a draw. The round did indeed end with a draw, and both players earned their lowest personal scores of the tournament. While they were able to climb back into the top spots in the event, neither player had a round by round VP total high enough to play James Gains and directly impact his final score. While James certainly played and defeated other highly skilled players throughout the event, the scenario still serves as an effective example of how the “luck of the draw” can and often does impact the final results.

To add more weight to this conclusion, I ran the same RHQ + List Type + Nationality regression model, but changed my Y variable to Total VP’s Earned.

With an Adjusted R² of only .058, the model did not explain much of the variation in Total VP’s earned. However, the only statistically significant determinant of Total VP’s Earned was, once again, our Normalized RHQ Ranking.

Bottom Line: While “luck” in matchup, mission, opponent, terrain, morale checks, etc. all plays a role in determining final placements in an event, Player Skill - the ability to excel at list construction, tactical maneuver, psychology, speed of play, or any other player-driven variable - is the only statistically significant predictor of overall performance in a Flames of War tournament.

Where Do We Go From Here?

The name of the game is Data, Data, Data, and then more Data. The US Nationals Results dataset I worked with lacked a number of things that may prove to be statistically significant in determining those final results. Remember that, even though Player Skill was shown to be statistically significant, the total variation I could explain with the data I had was incredibly minute - only 5.8% of the variation in VP's Earned could be explained! That leaves a lot of room for what statisticians call Omitted Variable Bias.

This is where the new Flames of War After Action Data Collection Effort comes in.

Flames of War After Action Data Collection Effort

Steve-O and Mike Haught over at BF are trying to fill in some of those gaps, and they've created a Google web-form to collect as much information as they can. The next time you get in a game with your buddy, take a picture of the game board and some notes on the game, and head over to:

and click on the Google Form link in Steve's initial post.

As we collect data, we'll be looking for additional trends that we might not be able to see with what we have so far. It will also provide an important non-tournament perspective on how the game is played and what really determines that end result.

Why Do All of This?

The other question is "What are we looking for?" That's a pretty broad question - and one I'd like to leave open for discussion.

One of these things is not like the other.

Are we looking to 'validate' the thought that Player Skill is more important than a person's List? Is it important to separate those things? Or is someone's list just a factor of their Skill as a Player? Are we looking to prove something new?

Sound off on the forums and let us know how ya feel!

"Eric Riha is a total jerk and probably doesn't want to hear your comments - no wait, this time I really do - drop them off in the WWPD Forums!"

Copyright 2009-2012 WWPD LLC. Graphics and webdesign by Arran Slee-Smith. Original Template Designed by Magpress.