Reinforcement Learning

Markov decision process
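A minimal sketch of how a finite MDP might be represented in Python, assuming rewards attached to states ($R(s)$) and a discount factor $\gamma$. The `MDP` class and the two-state "robot" instance are hypothetical, for illustration only. The transition model maps each state-action pair to a distribution over next states, and sampling depends only on that pair, not on earlier history, which is the Markov property.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MDP:
    """Hypothetical finite MDP: states, actions, P(s'|s,a), R(s), and discount gamma."""
    states: List[str]
    actions: Dict[str, List[str]]                                # actions available in each state
    transitions: Dict[Tuple[str, str], List[Tuple[float, str]]]  # (s, a) -> [(P(s'|s,a), s'), ...]
    rewards: Dict[str, float]                                    # R(s)
    gamma: float = 0.9

    def sample_next_state(self, s: str, a: str, rng: random.Random) -> str:
        """Draw s' ~ P(. | s, a); only the current (s, a) pair is consulted."""
        outcomes = self.transitions[(s, a)]
        probs = [p for p, _ in outcomes]
        next_states = [s2 for _, s2 in outcomes]
        return rng.choices(next_states, weights=probs, k=1)[0]


# Hypothetical example: a robot whose battery is either "low" or "high".
robot = MDP(
    states=["low", "high"],
    actions={"low": ["wait", "recharge"], "high": ["search", "wait"]},
    transitions={
        ("low", "wait"): [(1.0, "low")],
        ("low", "recharge"): [(1.0, "high")],
        ("high", "search"): [(0.7, "high"), (0.3, "low")],
        ("high", "wait"): [(1.0, "high")],
    },
    rewards={"low": 0.0, "high": 1.0},
)
```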

Planning and learning

Bellman equation

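$$ U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a) \, U(s') $$

where $A(s)$ is the set of actions available in state $s$ and $P(s' \mid s, a)$ is the probability of reaching $s'$ by taking action $a$ in $s$.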
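The Bellman equation defines $U$ only implicitly. When the model is known (planning), one standard way to solve it is value iteration: start from $U = 0$ and repeatedly apply the right-hand side as an update, $U(s) \leftarrow R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a) \, U(s')$, until the values stop changing. The sketch below assumes a hypothetical three-state MDP and treats a state with no actions as terminal, with $U(s) = R(s)$.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (assumption, for illustration only):
# states 0 and 1 are non-terminal; state 2 is a terminal goal with reward 1.
STATES = [0, 1, 2]
REWARD = {0: 0.0, 1: 0.0, 2: 1.0}
# TRANSITIONS[s][a] is a list of (probability, next_state) pairs.
TRANSITIONS: Dict[int, Dict[str, List[Tuple[float, int]]]] = {
    0: {"stay": [(1.0, 0)], "go": [(0.8, 1), (0.2, 0)]},
    1: {"stay": [(1.0, 1)], "go": [(0.8, 2), (0.2, 1)]},
    2: {},  # terminal: no actions
}


def value_iteration(gamma: float = 0.9, theta: float = 1e-10) -> Dict[int, float]:
    """Apply the Bellman update to every state until the largest change is below theta."""
    utility = {s: 0.0 for s in STATES}
    while True:
        new_utility = {}
        for s in STATES:
            actions = TRANSITIONS[s]
            if not actions:
                # Terminal state: utility is just its reward.
                new_utility[s] = REWARD[s]
                continue
            # Best expected successor utility over all actions available in s.
            best = max(
                sum(p * utility[s2] for p, s2 in outcomes)
                for outcomes in actions.values()
            )
            new_utility[s] = REWARD[s] + gamma * best
        delta = max(abs(new_utility[s] - utility[s]) for s in STATES)
        utility = new_utility
        if delta < theta:
            return utility
```

For this toy MDP the fixed point can be checked by hand: with $\gamma = 0.9$, taking "go" gives $U(1) = 0.9\,(0.8 \cdot 1 + 0.2\,U(1))$, so $U(1) = 36/41 \approx 0.878$, and likewise $U(0) = (36/41)^2 \approx 0.771$.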

Q-learning
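Q-learning estimates action values $Q(s, a)$ from sampled transitions, without a transition model, using the update $Q(s,a) \leftarrow Q(s,a) + \alpha \, [\, r + \gamma \max_{a'} Q(s', a') - Q(s,a) \,]$. A minimal tabular sketch, assuming a hypothetical five-state chain in which moving right from state 3 into state 4 pays 1 and ends the episode; the step size `alpha`, the seed, and the episode count are arbitrary choices. Because Q-learning is off-policy, the agent can act uniformly at random and still learn the values of the greedy policy.

```python
import random
from typing import List, Tuple

N_STATES = 5      # states 0..4 of a hypothetical chain; state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy deterministic environment: reaching state 4 pays reward 1 and ends the episode."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes: int = 500, alpha: float = 0.5,
               gamma: float = 0.9, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behaviour policy: uniformly random (off-policy learning of the greedy policy).
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Target r + gamma * max_a' Q(s', a'); no bootstrapping from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After training, the greedy policy $\arg\max_a Q(s, a)$ should choose "right" in every non-terminal state.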

$\epsilon$-greedy action selection
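A sketch of $\epsilon$-greedy action selection over a list of estimated action values; the function name and signature are assumptions. With probability $\epsilon$ the agent picks a uniformly random action, otherwise an action with the highest estimate. Because the random branch can also land on the greedy action, a single best action is chosen with total probability $1 - \epsilon + \epsilon / |A|$.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index: random with probability epsilon, otherwise greedy."""
    if rng.random() < epsilon:
        # Explore: any action, uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: break ties randomly so equal-valued actions are not biased toward index 0.
    best = max(q_values)
    best_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(best_actions)
```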

Exploration and exploitation
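The trade-off shows up on a hypothetical two-armed bandit with fixed payouts of 1 and 2 per pull. A purely greedy agent ($\epsilon = 0$) with estimates initialized to 0 pulls arm 0 first, sees a positive estimate, and never tries arm 1 at all; a small amount of exploration lets the agent discover the better arm and then exploit it. The payouts, step count, and $\epsilon = 0.1$ are illustrative assumptions.

```python
import random
from typing import List


def run_bandit(arm_rewards: List[float], epsilon: float, steps: int, seed: int = 0) -> List[int]:
    """Play a deterministic bandit with epsilon-greedy selection; return pull counts per arm."""
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_rewards)  # sample-average value estimate per arm
    counts = [0] * len(arm_rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_rewards))   # explore
        else:
            arm = estimates.index(max(estimates))   # exploit (first best arm on ties)
        reward = arm_rewards[arm]
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts
```

`run_bandit([1.0, 2.0], 0.0, 1000)` returns `[1000, 0]`, while with `epsilon=0.1` most pulls go to arm 1.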