
---
layout: post
title: March Madness 2017
---

For this year's March Madness tournament, I built an [algorithmic bracket picker](https://github.com/btelle/dnd-march-madness) based on Dungeons and Dragons dice-roll battles. The bracket it [generated](/data/march-madness-2017/bracket-log.txt), which I submitted for my company's bracket competition, did surprisingly well: it correctly picked UNC to win the tournament and finished first overall in the competition.

[![The Bracket](/img/march-madness-2017/btelle3331-bracket-sm.jpg)](/img/march-madness-2017/btelle3331-bracket-lg.pdf)

## How accurate was it?

To examine the generated bracket's accuracy, I loaded its predicted games and the tournament's actual games into two SQL tables, `actual_games` and `predicted_games`. Here is the schema for `actual_games`:

```sql
CREATE TABLE actual_games (
    round INT,
    division VARCHAR(7),
    game_number INT,
    team_a VARCHAR(255),
    team_b VARCHAR(255),
    winner VARCHAR(255),
    PRIMARY KEY(round, division, game_number)
);
```

To count how many games D&D picked correctly, join the two tables on their shared key and compare the winners:

```sql
SELECT picked_correctly, COUNT(*)
FROM (
    SELECT
        actual.round,
        actual.division,
        actual.game_number,
        actual.winner=predicted.winner as picked_correctly
    FROM actual_games as actual
    INNER JOIN predicted_games as predicted
        ON actual.round = predicted.round
        AND actual.division = predicted.division
        AND actual.game_number = predicted.game_number
) games
GROUP BY 1;
```
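For readers who would rather follow the join logic outside a database, here is a minimal JavaScript sketch of the same comparison. It assumes each game is a plain object carrying the `round`, `division`, `game_number` and `winner` columns; the function names `gameKey` and `tallyPicks` are hypothetical, and the sample games are made up.

```javascript
// Hypothetical in-memory equivalent of the winner-comparison query above.
// Each game is assumed to be a plain object with round, division,
// game_number and winner fields (the same columns the SQL uses).

// Build a string key from the (round, division, game_number) primary key.
function gameKey(game) {
  return [game.round, game.division, game.game_number].join("|");
}

// Join actual and predicted games on the key and count matching winners.
function tallyPicks(actualGames, predictedGames) {
  const predictedByKey = new Map(predictedGames.map((g) => [gameKey(g), g]));
  const tally = { correct: 0, incorrect: 0 };

  for (const actual of actualGames) {
    const predicted = predictedByKey.get(gameKey(actual));
    if (predicted === undefined) continue; // like an INNER JOIN, skip unmatched games
    if (actual.winner === predicted.winner) {
      tally.correct += 1;
    } else {
      tally.incorrect += 1;
    }
  }
  return tally;
}

// Made-up example data, not the 2017 results.
const actual = [
  { round: 64, division: "East", game_number: 1, winner: "Team A" },
  { round: 64, division: "East", game_number: 2, winner: "Team C" },
];
const predicted = [
  { round: 64, division: "East", game_number: 1, winner: "Team A" },
  { round: 64, division: "East", game_number: 2, winner: "Team D" },
];
console.log(tallyPicks(actual, predicted)); // { correct: 1, incorrect: 1 }
```

Indexing the predictions in a `Map` keeps the join linear in the number of games, which is plenty for a 63-game tournament.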

Games Predicted Accurately

As the tournament goes on, earlier mistakes make correct picks increasingly hard: once a picked team is knocked out, every later game it was picked to win is already a wrong pick. I broke accuracy down by round (rounds are numbered by how many teams remain, so 64 is the first round and 2 is the championship) to see how the bracket performed as the picks got harder.

```sql
SELECT
    round,
    COUNT(*),
    SUM(picked_correctly),
    SUM(picked_correctly)/COUNT(*) as percentage_correct
FROM (
    SELECT
        actual.round,
        actual.winner=predicted.winner as picked_correctly
    FROM actual_games as actual
    INNER JOIN predicted_games as predicted
        ON actual.round = predicted.round
        AND actual.division = predicted.division
        AND actual.game_number = predicted.game_number
) x
GROUP BY 1
ORDER BY 1 DESC;
```
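The same per-round breakdown can be sketched in JavaScript, assuming the joined games are already available as `{ round, pickedCorrectly }` rows (a hypothetical shape, not something the post defines). It groups by round, computes the fraction correct, and sorts rounds from 64 down to 2 like `ORDER BY 1 DESC`; `accuracyByRound` is a hypothetical name and the rows are made up.

```javascript
// Hypothetical sketch of the per-round breakdown: group joined games by round
// and compute total, correct and fraction correct, like the query above.
// Each row is assumed to look like { round: 64, pickedCorrectly: true }.
function accuracyByRound(rows) {
  const byRound = new Map();
  for (const { round, pickedCorrectly } of rows) {
    const entry = byRound.get(round) || { round, total: 0, correct: 0 };
    entry.total += 1;
    if (pickedCorrectly) entry.correct += 1;
    byRound.set(round, entry);
  }
  // Mirror ORDER BY 1 DESC: the round of 64 first, the championship (2) last.
  return Array.from(byRound.values())
    .map((e) => Object.assign({}, e, { accuracy: e.correct / e.total }))
    .sort((a, b) => b.round - a.round);
}

// Made-up rows for illustration.
const rows = [
  { round: 2, pickedCorrectly: true },
  { round: 64, pickedCorrectly: true },
  { round: 64, pickedCorrectly: false },
];
const result = accuracyByRound(rows);
// result[0] is { round: 64, total: 2, correct: 1, accuracy: 0.5 }
// result[1] is { round: 2, total: 1, correct: 1, accuracy: 1 }
```

As with the SQL, `accuracy` here is a fraction between 0 and 1, not a percentage.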

Games Predicted Accurately by Round

Accuracy stayed at or above 50% in every round, which is not bad. The 100% in the championship round is misleading, though: that round has only one game.

## The randomness/upset factor

It would be easy to rank teams by seed alone and come up with a fairly accurate bracket. The real magic of the tournament is in the Cinderella stories and huge unexpected upsets. The D&D algorithm accounts for these by introducing random events like dragon attacks, which immediately end a battle and award the victory to a random team. I wanted to see how well my random upset picks performed. The query below reads from `actual_v` and `predicted_v`, views (not defined in this post) that include each team's seed, and keeps only the games where the bracket picked the team with the larger seed number:

```sql
SELECT picked_correctly, COUNT(*)
FROM (
    SELECT
        actual_v.round,
        actual_v.division,
        actual_v.game_number,
        actual_v.winner=predicted_v.winner as picked_correctly
    FROM actual_v
    INNER JOIN predicted_v
        ON actual_v.round = predicted_v.round
        AND actual_v.division = predicted_v.division
        AND actual_v.game_number = predicted_v.game_number
    WHERE (
        (predicted_v.team_a_seed < predicted_v.team_b_seed and predicted_v.winner = predicted_v.team_b)
        OR (predicted_v.team_b_seed < predicted_v.team_a_seed and predicted_v.winner = predicted_v.team_a)
    )
) x
GROUP BY 1;
```
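The `WHERE` clause above boils down to a small predicate. This JavaScript sketch restates it, assuming game objects with the `team_a`, `team_b`, `team_a_seed`, `team_b_seed` and `winner` fields the query reads from `predicted_v`; `isPredictedUpset` is a hypothetical name and the picks are made up.

```javascript
// Sketch of the upset filter from the WHERE clause above: a pick counts as an
// upset when the predicted winner has the larger seed number. Field names
// mirror the predicted_v columns the query reads.
function isPredictedUpset(game) {
  if (game.team_a_seed < game.team_b_seed) return game.winner === game.team_b;
  if (game.team_b_seed < game.team_a_seed) return game.winner === game.team_a;
  return false; // equal seeds (e.g. two No. 1 seeds in the Final Four) never count
}

// Made-up picks for illustration.
const picks = [
  { team_a: "Alpha", team_b: "Beta", team_a_seed: 1, team_b_seed: 16, winner: "Alpha" },
  { team_a: "Gamma", team_b: "Delta", team_a_seed: 7, team_b_seed: 10, winner: "Delta" },
  { team_a: "Epsilon", team_b: "Zeta", team_a_seed: 1, team_b_seed: 1, winner: "Zeta" },
];
const upsetWinners = picks.filter(isPredictedUpset).map((g) => g.winner);
console.log(upsetWinners); // [ 'Delta' ]
```

Games between equal seeds fall through both comparisons, just as they fail both branches of the SQL condition.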

Upsets Predicted Accurately

The bracket predicted 14 upsets, games where it picked a lower seed (larger seed number) to beat a higher seed, and it got only 3 of them right, a 3-11 record. Not a great showing, but upsets are really hard to predict using only the data the algorithm works with.

## Next Year

For next year's bracket, I'll try to incorporate more metrics into the model to make the battle logic smarter. This year's iteration was based only on seed and on points scored and allowed. I'll also take a few dragons out to cut down on false upsets.
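To make the dragon trade-off concrete, here is a hypothetical JavaScript sketch of a dice-roll battle with a random dragon ending. It is not the algorithm from the linked repository: the hit-point formula, the d20 damage roll, the points-based bonus and the `dragonChance` parameter are all invented for illustration, using only the inputs the post mentions (seed, points scored, points allowed). The random source is injectable so a battle can be replayed deterministically.

```javascript
// Hypothetical sketch, NOT the code from the dnd-march-madness repository.
// Assumed team fields: name, seed, pointsScored, pointsAllowed.
// Hit points, the d20 damage roll and dragonChance are invented for illustration.

// Roll a die with the given number of sides using a random source in [0, 1).
function rollDie(sides, rng) {
  return 1 + Math.floor(rng() * sides);
}

function battle(teamA, teamB, dragonChance = 0.05, rng = Math.random) {
  // Better seeds (smaller numbers) start with more hit points.
  let hpA = 100 - teamA.seed;
  let hpB = 100 - teamB.seed;

  while (true) {
    // A dragon attack ends the battle at once and picks a winner at random.
    if (rng() < dragonChance) {
      return rng() < 0.5 ? teamA : teamB;
    }
    // Damage: a d20 roll plus a bonus when the attacker's offense beats the
    // defender's defense; at least 1, so the loop always terminates.
    hpB -= Math.max(1, rollDie(20, rng) + Math.round((teamA.pointsScored - teamB.pointsAllowed) / 5));
    if (hpB <= 0) return teamA;
    hpA -= Math.max(1, rollDie(20, rng) + Math.round((teamB.pointsScored - teamA.pointsAllowed) / 5));
    if (hpA <= 0) return teamB;
  }
}

// Made-up teams; the result varies from run to run because of Math.random.
const favorite = { name: "Favorite", seed: 1, pointsScored: 80, pointsAllowed: 60 };
const underdog = { name: "Underdog", seed: 16, pointsScored: 65, pointsAllowed: 75 };
console.log(battle(favorite, underdog).name);
```

In this sketch, "taking a few dragons out" amounts to lowering `dragonChance`: fewer battles end in a coin flip, so fewer low seeds advance by luck alone.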
```diff
@@ -47,6 +47,9 @@ h3 {
 h4 {
     font-size: 1.15em;
 }
+.chart-label {
+    text-align: center;
+}
 blockquote {
     border-left: 2px solid;
     padding: 1em 1em;
```
```csv
label,value
correct,42
incorrect,21
```
```csv
round,total_games,correct_games,accuracy_pct
64,32,26,0.8125
32,16,8,0.5
16,8,4,0.5
8,4,2,0.5
4,2,1,0.5
2,1,1,1
```
```csv
label,value
correct,3
incorrect,11
```
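As a sanity check, the per-round counts above should add up to the overall correct/incorrect split. This JavaScript snippet copies the numbers from the per-round data and the `correct,42` / `incorrect,21` data and sums them; the variable names are illustrative only.

```javascript
// Per-round rows copied from the round,total_games,correct_games columns above.
const rounds = [
  { round: 64, total: 32, correct: 26 },
  { round: 32, total: 16, correct: 8 },
  { round: 16, total: 8, correct: 4 },
  { round: 8, total: 4, correct: 2 },
  { round: 4, total: 2, correct: 1 },
  { round: 2, total: 1, correct: 1 },
];

// Sum the per-round counts to get tournament-wide totals.
const totals = rounds.reduce(
  (acc, r) => ({ total: acc.total + r.total, correct: acc.correct + r.correct }),
  { total: 0, correct: 0 }
);

console.log(totals.correct, totals.total - totals.correct); // 42 21
console.log((totals.correct / totals.total).toFixed(3)); // 0.667
```

The 63 games split into 42 correct and 21 incorrect, so the two data sets agree, for an overall hit rate of about 66.7%.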


```javascript
// Convert rounds.csv rows into the single-series format nv.models.discreteBarChart expects.
var format_bar_chart_data = function(data) {
    var ret_arr = [{ key: "Predicted games by round", values: [] }];
    data.forEach(function(row) {
        ret_arr[0].values.push({
            'label': row['round'],
            'value': row['accuracy_pct'] * 100, // fraction -> percent
            'raw': row['correct_games']
        });
    });
    return ret_arr;
};

// Donut chart: correct vs. incorrect picks across the whole tournament.
d3.csv('/data/march-madness-2017/accuracy.csv', function(data) {
    nv.addGraph(function() {
        var chart = nv.models.pieChart()
            .x(function(d) { return d.label })
            .y(function(d) { return d.value })
            .showLabels(true)     //Display pie labels
            .labelThreshold(.05)  //Configure the minimum slice size for labels to show up
            .labelType("percent") //Configure what type of data to show in the label. Can be "key", "value" or "percent"
            .donut(true)          //Turn on Donut mode. Makes pie chart look tasty!
            .donutRatio(0.35);    //Configure how big you want the donut hole size to be.

        // Tooltip text, e.g. "correct: 42".
        chart.tooltip.contentGenerator(function (data) {
            return data.data.label + ": " + data.data.value;
        });

        d3.select("#accuracy svg")
            .datum(data)
            .transition().duration(350)
            .call(chart);

        return chart;
    });
});

// Bar chart: percentage of games picked correctly in each round.
d3.csv('/data/march-madness-2017/rounds.csv', function(data) {
    nv.addGraph(function() {
        var colors = d3.scale.ordinal()
            .range(['#87B8DB', '#65A4D1', '#3F8ABF', '#1F76B4', '#085E9A', '#064877']);

        var chart = nv.models.discreteBarChart()
            .x(function(d) { return d.label })    //Specify the data accessors.
            .y(function(d) { return d.value })
            .color(colors.range());

        chart.yAxis     //Chart y-axis settings
            .axisLabel('Games Predicted Correctly (%)');

        chart.xAxis     //Chart x-axis settings
            .axisLabel('Round');

        // The label is the number of teams left in the round; name the later rounds.
        chart.tooltip.contentGenerator(function (data) {
            var round = "Round of " + data.data.label;
            if (data.data.label == 16) {
                round = 'Sweet Sixteen';
            } else if (data.data.label == 8) {
                round = "Elite Eight";
            } else if (data.data.label == 4) {
                round = "Final Four";
            } else if (data.data.label == 2) {
                round = "Championship";
            }
            return round + ": " + data.data.raw + " (" + data.data.value + "%)";
        });

        d3.select('#rounds svg')
            .datum(format_bar_chart_data(data))
            .call(chart);

        nv.utils.windowResize(chart.update);

        return chart;
    });
});

// Donut chart: correct vs. incorrect upset picks.
d3.csv('/data/march-madness-2017/upsets.csv', function(data) {
    nv.addGraph(function() {
        var chart = nv.models.pieChart()
            .x(function(d) { return d.label })
            .y(function(d) { return d.value })
            .showLabels(true)     //Display pie labels
            .labelThreshold(.05)  //Configure the minimum slice size for labels to show up
            .labelType("percent") //Configure what type of data to show in the label. Can be "key", "value" or "percent"
            .donut(true)          //Turn on Donut mode. Makes pie chart look tasty!
            .donutRatio(0.35);    //Configure how big you want the donut hole size to be.

        // Tooltip text, e.g. "correct: 3".
        chart.tooltip.contentGenerator(function (data) {
            return data.data.label + ": " + data.data.value;
        });

        d3.select("#upsets svg")
            .datum(data)
            .transition().duration(350)
            .call(chart);

        return chart;
    });
});
```