---
layout: post
title: March Madness 2017
---
For this year's March Madness tournament, I built an [algorithmic bracket picker](https://github.com/btelle/dnd-march-madness)
based on Dungeons and Dragons dice-roll battles. The bracket it [generated](/data/march-madness-2017/bracket-log.txt),
which I submitted to my company's bracket competition, did surprisingly well:
it correctly picked UNC to win the tournament and placed first overall in our bracket
competition.
[![The Bracket](/img/march-madness-2017/btelle3331-bracket-sm.jpg)](/img/march-madness-2017/btelle3331-bracket-lg.pdf)
## How accurate was it?
To examine the generated bracket's accuracy, I loaded its predicted games and the
tournament's actual games into SQL tables, `actual_games` and `predicted_games`:
```sql
CREATE TABLE actual_games (
    round INT,
    division VARCHAR(7),
    game_number INT,
    team_a VARCHAR(255),
    team_b VARCHAR(255),
    winner VARCHAR(255),
    PRIMARY KEY (round, division, game_number)
);
```
To count how many game winners the D&D bracket predicted correctly, join the two tables on
round, division, and game number and compare the winners:
```sql
SELECT picked_correctly, COUNT(*)
FROM (
    SELECT
        actual.round,
        actual.division,
        actual.game_number,
        actual.winner = predicted.winner AS picked_correctly
    FROM actual_games AS actual
    INNER JOIN predicted_games AS predicted
        ON actual.round = predicted.round
        AND actual.division = predicted.division
        AND actual.game_number = predicted.game_number
) games
GROUP BY 1;
```
<div class="chart-label"><h3>Games Predicted Accurately</h3></div>
<div id="accuracy"><svg height="400"></svg></div>
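The join in the query above pairs each predicted game with the real game in the same round,
division, and game slot, then compares the winners. Here is a minimal JavaScript sketch of the
same logic, assuming each game is a plain object with the `round`, `division`, `game_number`,
and `winner` columns from `actual_games`; the helper names `gameKey` and `countCorrectPicks`
and the sample games are hypothetical.

```javascript
// Composite key matching the table's primary key (round, division, game_number).
function gameKey(game) {
  return [game.round, game.division, game.game_number].join("|");
}

// Count correct and incorrect picks, like the INNER JOIN + GROUP BY query.
function countCorrectPicks(actualGames, predictedGames) {
  const actualByKey = new Map(actualGames.map((g) => [gameKey(g), g]));
  const counts = { correct: 0, incorrect: 0 };
  for (const predicted of predictedGames) {
    const actual = actualByKey.get(gameKey(predicted));
    if (actual === undefined) continue; // no matching game: an inner join drops it
    if (actual.winner === predicted.winner) {
      counts.correct += 1;
    } else {
      counts.incorrect += 1;
    }
  }
  return counts;
}

// Illustrative data only; the team names are placeholders.
const actual = [
  { round: 64, division: "East", game_number: 1, winner: "Team A" },
  { round: 64, division: "East", game_number: 2, winner: "Team C" },
];
const predicted = [
  { round: 64, division: "East", game_number: 1, winner: "Team A" },
  { round: 64, division: "East", game_number: 2, winner: "Team D" },
];
console.log(countCorrectPicks(actual, predicted)); // { correct: 1, incorrect: 1 }
```

Keying a `Map` on the primary key makes this a simple hash join: one pass to index the actual
games and one pass over the predictions.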
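Overall, the bracket picked 42 of 63 winners correctly (about 67%). As the tournament goes
on, earlier mistakes make correct picks harder: once a team the bracket has advancing is
knocked out, every later pick with that team winning is already wrong. I broke the accuracy
down by round to see how it held up as the picks got harder.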
```sql
SELECT
    round,
    COUNT(*) AS total_games,
    SUM(picked_correctly) AS correct_games,
    SUM(picked_correctly) / COUNT(*) AS accuracy_pct
FROM (
    SELECT
        actual.round,
        actual.winner = predicted.winner AS picked_correctly
    FROM actual_games AS actual
    INNER JOIN predicted_games AS predicted
        ON actual.round = predicted.round
        AND actual.division = predicted.division
        AND actual.game_number = predicted.game_number
) x
GROUP BY 1
ORDER BY 1 DESC;
```
<div class="chart-label"><h3>Games Predicted Accurately by Round</h3></div>
<div id="rounds"><svg height="500"></svg></div>
Accuracy stayed at or above 50% in every round and peaked at 81% (26 of 32) in the first
round. The 100% in the championship round is misleading, since that round has only one game.
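The per-round query groups the joined games by round and divides correct picks by total
games. The same aggregation as a JavaScript sketch, assuming one `{ round, picked_correctly }`
row per game like the inner `SELECT` produces (`accuracyByRound` is a hypothetical helper).
Rebuilding the rows from the per-round totals also shows why the late rounds are noisy: the
game count halves every round, down to a single championship game.

```javascript
// Group per-game rows by round and compute accuracy, mirroring
// SELECT round, COUNT(*), SUM(picked_correctly), SUM(...) / COUNT(*) ... GROUP BY 1 ORDER BY 1 DESC.
function accuracyByRound(rows) {
  const byRound = new Map();
  for (const { round, picked_correctly } of rows) {
    const entry = byRound.get(round) || { round, total_games: 0, correct_games: 0 };
    entry.total_games += 1;
    entry.correct_games += picked_correctly ? 1 : 0;
    byRound.set(round, entry);
  }
  return [...byRound.values()]
    .map((e) => ({ ...e, accuracy_pct: e.correct_games / e.total_games }))
    .sort((a, b) => b.round - a.round); // round 64 first, like ORDER BY 1 DESC
}

// Rebuild one row per game from the [round, total_games, correct_games] values in rounds.csv.
const totals = [[64, 32, 26], [32, 16, 8], [16, 8, 4], [8, 4, 2], [4, 2, 1], [2, 1, 1]];
const rows = totals.flatMap(([round, total, correct]) =>
  Array.from({ length: total }, (_, i) => ({ round, picked_correctly: i < correct }))
);
console.log(accuracyByRound(rows)[0]); // the round-64 entry: 26 of 32 correct, accuracy_pct 0.8125
```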
## The randomness/upset factor
It would be easy to just rank teams on seed number and come up with a fairly accurate
bracket. The real magic of the tournament is in the Cinderella stories and huge unexpected
upsets. The D&D algorithm accounts for these by introducing random factors like dragon
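attacks, which immediately end the battle and award a random team the victory. To see how
my upset picks performed, I queried `actual_v` and `predicted_v`, versions of the game
tables with each team's seed added (`team_a_seed`, `team_b_seed`), and kept only games
where the bracket picked the worse-seeded team to win: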
```sql
SELECT picked_correctly, COUNT(*)
FROM (
    SELECT
        actual_v.round,
        actual_v.division,
        actual_v.game_number,
        actual_v.winner = predicted_v.winner AS picked_correctly
    FROM actual_v
    INNER JOIN predicted_v
        ON actual_v.round = predicted_v.round
        AND actual_v.division = predicted_v.division
        AND actual_v.game_number = predicted_v.game_number
    -- Keep only predicted upsets: the team with the larger seed number was picked to win
    WHERE (predicted_v.team_a_seed < predicted_v.team_b_seed AND predicted_v.winner = predicted_v.team_b)
       OR (predicted_v.team_b_seed < predicted_v.team_a_seed AND predicted_v.winner = predicted_v.team_a)
) x
GROUP BY 1;
```
<div class="chart-label"><h3>Upsets Predicted Accurately</h3></div>
<div id="upsets"><svg height="400"></svg></div>
The bracket went 3-11 on its predicted upsets, games where it picked the worse-seeded team
(the larger seed number) to win. Not a great showing, but upsets are hard to predict from the
only inputs the algorithm uses: seed and points scored and allowed.
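The `WHERE` clause in the upset query keeps only games where the team with the numerically
larger (worse) seed was picked to win. Written as a standalone JavaScript predicate, assuming
rows carry the `team_a`, `team_b`, `team_a_seed`, `team_b_seed`, and `winner` columns of
`predicted_v`; `isPredictedUpset`, `upsetRecord`, and the sample games are hypothetical.

```javascript
// True when the pick is an upset: the team with the worse (numerically larger) seed wins.
// Equal seeds never count, matching the strict < comparisons in the SQL WHERE clause.
function isPredictedUpset(game) {
  return (
    (game.team_a_seed < game.team_b_seed && game.winner === game.team_b) ||
    (game.team_b_seed < game.team_a_seed && game.winner === game.team_a)
  );
}

// Record on predicted upsets, given predictions already joined with the actual winners.
function upsetRecord(joinedGames) {
  const upsets = joinedGames.filter((g) => isPredictedUpset(g.predicted));
  const correct = upsets.filter((g) => g.predicted.winner === g.actualWinner).length;
  return { correct, incorrect: upsets.length - correct };
}

// Illustrative games only: a 12-over-5 pick that hit, a 10-over-7 pick that missed,
// and a chalk 1-over-16 pick that is not an upset and is filtered out.
const joined = [
  {
    predicted: { team_a: "Team A", team_a_seed: 5, team_b: "Team B", team_b_seed: 12, winner: "Team B" },
    actualWinner: "Team B",
  },
  {
    predicted: { team_a: "Team C", team_a_seed: 7, team_b: "Team D", team_b_seed: 10, winner: "Team D" },
    actualWinner: "Team C",
  },
  {
    predicted: { team_a: "Team E", team_a_seed: 1, team_b: "Team F", team_b_seed: 16, winner: "Team E" },
    actualWinner: "Team E",
  },
];
console.log(upsetRecord(joined)); // { correct: 1, incorrect: 1 }
```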
## Next Year
For next year's bracket, I'll try to incorporate more metrics into the model to make the
battle logic smarter. This year's version used only seed, points scored, and points allowed.
I'll also take out a few dragons to cut down on false upsets.
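The post doesn't spell out the battle rules, so what follows is a hypothetical JavaScript
sketch of the general idea, not the logic in the linked repository: each team rolls a d20
plus a bonus derived from seed and per-game scoring margin, and a rare dragon attack ends the
battle with a random winner. The bonus formula, the `dragonChance` parameter, the
`pointsScored`/`pointsAllowed` fields (assumed to be per-game averages), and all function names
are assumptions. In this sketch, removing dragons simply means lowering `dragonChance`, which
cuts the number of coin-flip upsets.

```javascript
// HYPOTHETICAL rules for illustration only; not taken from the dnd-march-madness repo.

// Roll one twenty-sided die using the supplied random source (returns 1..20).
function rollD20(rng) {
  return 1 + Math.floor(rng() * 20);
}

// Assumed attack bonus: better (smaller) seeds and larger per-game scoring margins help.
function modifier(team) {
  return (17 - team.seed) / 4 + (team.pointsScored - team.pointsAllowed) / 5;
}

// One battle. With probability dragonChance a dragon attack ends it and a random team wins;
// otherwise both teams roll d20 + modifier, and ties go to the better seed.
function battle(teamA, teamB, { dragonChance = 0.05, rng = Math.random } = {}) {
  if (rng() < dragonChance) {
    return rng() < 0.5 ? teamA : teamB;
  }
  const a = rollD20(rng) + modifier(teamA);
  const b = rollD20(rng) + modifier(teamB);
  if (a !== b) return a > b ? teamA : teamB;
  return teamA.seed <= teamB.seed ? teamA : teamB;
}

// Compare how often a 16 seed survives with more dragons vs. fewer dragons.
const one = { name: "1 seed", seed: 1, pointsScored: 80, pointsAllowed: 65 };
const sixteen = { name: "16 seed", seed: 16, pointsScored: 70, pointsAllowed: 72 };
for (const dragonChance of [0.2, 0.05]) {
  let upsets = 0;
  for (let i = 0; i < 10000; i++) {
    if (battle(one, sixteen, { dragonChance }) === sixteen) upsets += 1;
  }
  console.log(`dragonChance ${dragonChance}: 16 seed won ${upsets} of 10000`);
}
```

Passing `rng` in as a parameter keeps the battles reproducible in tests; the default falls back
to `Math.random`.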
<script src="/js/posts/march-madness-2017.js"></script>
h4 {
  font-size: 1.15em;
}
.chart-label {
  text-align: center;
}
blockquote {
  border-left: 2px solid;
  padding: 1em 1em;
label,value
correct,42
incorrect,21
round,total_games,correct_games,accuracy_pct
64,32,26,0.8125
32,16,8,0.5
16,8,4,0.5
8,4,2,0.5
4,2,1,0.5
2,1,1,1
label,value
correct,3
incorrect,11
d3.csv('/data/march-madness-2017/accuracy.csv', function(data) {
  nv.addGraph(function() {
    var chart = nv.models.pieChart()
      .x(function(d) { return d.label; })
      .y(function(d) { return d.value; })
      .showLabels(true)      // Display pie labels
      .labelThreshold(.05)   // Minimum slice size for a label to show up
      .labelType("percent")  // Label content: "key", "value" or "percent"
      .donut(true)           // Turn on donut mode
      .donutRatio(0.35);     // Size of the donut hole

    chart.tooltip.contentGenerator(function(data) {
      return "<p>" + data.data.label + ": " + data.data.value + "</p>";
    });

    d3.select("#accuracy svg")
      .datum(data)
      .transition().duration(350)
      .call(chart);

    return chart;
  });
});
d3.csv('/data/march-madness-2017/rounds.csv', function(data) {
  nv.addGraph(function() {
    var colors = d3.scale.ordinal()
      .range(['#87B8DB', '#65A4D1', '#3F8ABF', '#1F76B4', '#085E9A', '#064877']);

    var chart = nv.models.discreteBarChart()
      .x(function(d) { return d.label; })  // Specify the data accessors
      .y(function(d) { return d.value; })
      .color(colors.range());

    chart.yAxis  // Chart y-axis settings
      .axisLabel('Games Predicted Correctly (%)');
    chart.xAxis  // Chart x-axis settings
      .axisLabel('Round');

    // Use the common names for the later rounds in the tooltip
    chart.tooltip.contentGenerator(function(data) {
      var round = "Round of " + data.data.label;
      if (data.data.label == 16) {
        round = 'Sweet Sixteen';
      } else if (data.data.label == 8) {
        round = "Elite Eight";
      } else if (data.data.label == 4) {
        round = "Final Four";
      } else if (data.data.label == 2) {
        round = "Championship";
      }
      return "<p>" + round + ": " + data.data.raw + " (" + data.data.value + "%)</p>";
    });

    d3.select('#rounds svg')
      .datum(format_bar_chart_data(data))
      .call(chart);

    nv.utils.windowResize(chart.update);
    return chart;
  });
});
d3.csv('/data/march-madness-2017/upsets.csv', function(data) {
  nv.addGraph(function() {
    var chart = nv.models.pieChart()
      .x(function(d) { return d.label; })
      .y(function(d) { return d.value; })
      .showLabels(true)      // Display pie labels
      .labelThreshold(.05)   // Minimum slice size for a label to show up
      .labelType("percent")  // Label content: "key", "value" or "percent"
      .donut(true)           // Turn on donut mode
      .donutRatio(0.35);     // Size of the donut hole

    chart.tooltip.contentGenerator(function(data) {
      return "<p>" + data.data.label + ": " + data.data.value + "</p>";
    });

    d3.select("#upsets svg")
      .datum(data)
      .transition().duration(350)
      .call(chart);

    return chart;
  });
});
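// Convert rounds.csv rows into the single-series format discreteBarChart expects
var format_bar_chart_data = function(data) {
  var ret_arr = [
    {
      key: "Predicted games by round",
      values: []
    }
  ];
  // Index loop instead of for...in: visits only array elements and keeps `i` local
  for (var i = 0; i < data.length; i++) {
    ret_arr[0].values.push({
      'label': data[i]['round'],
      'value': data[i]['accuracy_pct'] * 100,  // fraction -> percent
      'raw': data[i]['correct_games']
    });
  }
  return ret_arr;
};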