Dynamic Program Generalization
I've split the Bayesian bandit harness completely from the graph-based optimal solver. This makes it easy to integrate other algorithms into the harness: any algorithm that exposes a method with the following prototype can simply be dropped in:
def select_action(self, prior: Tuple[BetaPrior, ...]) -> int:
where the return value is the index of the action that should be pulled, and BetaPrior is a named tuple:
BetaPrior = namedtuple("BetaPrior", ['alpha', 'beta'])
BetaPrior is defined in bayesian_util.py; that definition should be imported and reused by other algorithm implementations.
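As an illustration of the drop-in interface, here is a minimal sketch of a Thompson-sampling agent that satisfies the select_action prototype. The BetaPrior definition is reproduced inline so the sketch is self-contained; a real implementation should import it from bayesian_util.py instead. The class name ThompsonSampling is hypothetical, not something from the harness itself.

```python
import random
from collections import namedtuple
from typing import Tuple

# Mirrors the definition in bayesian_util.py (reproduced here only so the
# sketch runs standalone; real code should import it from bayesian_util).
BetaPrior = namedtuple("BetaPrior", ["alpha", "beta"])


class ThompsonSampling:
    """Hypothetical drop-in algorithm: draw one sample from each arm's
    Beta posterior and pull the arm with the highest sampled mean."""

    def select_action(self, prior: Tuple[BetaPrior, ...]) -> int:
        # One posterior draw per arm.
        samples = [random.betavariate(p.alpha, p.beta) for p in prior]
        # Return the index of the arm with the largest draw.
        return max(range(len(samples)), key=samples.__getitem__)
```

A harness would then call it per round, e.g. `ThompsonSampling().select_action((BetaPrior(5, 2), BetaPrior(1, 1)))`, updating each arm's alpha/beta counts from the observed rewards between calls.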