Presented on 2018.04.12
by Brian Zier (@bzier)
-
Reinforcement Learning (adapted from lecture series; see resources section below)
6:23
Reinforcement Learning Venn diagram (slide 6)
- Reward hypothesis (slide 13): All goals can be described by the maximisation of expected cumulative reward
- Goal: Select actions to maximise total future reward
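The "total future reward" above is usually the discounted return G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + …, where the discount factor γ weights near-term rewards more heavily. A minimal sketch (gamma value and reward list are made up for illustration):

```python
# Discounted return: sum rewards, weighting later ones by increasing
# powers of gamma. Iterating in reverse lets us accumulate in one pass.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```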
29:35
Agent and Environment (slide 16)
47:54
Rat Example (slide 22)
57:10
Major Components of an RL Agent (slide 25)
- Policy: Map from state -> action
- Value: Prediction (expectation) of future reward
- Model: Predicts what the environment will do next
  - Transitions: next state
  - Reward: next reward
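The three components can be illustrated on a toy one-dimensional world (all states, actions, and numbers below are invented for the example, not from the talk):

```python
# States 0..3, actions -1/+1; state 3 is the goal.
policy = {0: +1, 1: +1, 2: +1, 3: 0}          # policy: state -> action
value = {0: 0.7, 1: 0.8, 2: 0.9, 3: 1.0}      # value: predicted future reward per state

def model(state, action):
    """Model: predict the environment's next state and next reward."""
    next_state = max(0, min(3, state + action))   # transition prediction
    reward = 1.0 if next_state == 3 else 0.0      # reward prediction
    return next_state, reward

s, r = model(2, policy[2])
print(s, r)  # 3 1.0
```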
-
Gym environments & building gym-mupen64plus
- Defining the reward function
- Progress detection
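A hypothetical sketch of what a progress-based reward can look like (this is not the actual gym-mupen64plus implementation; the function name, bonus value, and progress representation are assumptions for illustration):

```python
# Reward the agent for the distance it advances along the track
# between frames, with a bonus when it finishes the race.
def progress_reward(prev_progress, progress, finished=False):
    reward = progress - prev_progress   # positive when moving forward
    if finished:
        reward += 100.0                 # hypothetical finish-line bonus
    return reward

print(progress_reward(0.10, 0.12))  # small positive reward while advancing
```

Shaping the reward this way penalises driving backwards automatically, since the progress delta goes negative.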
-
Dependency / setup challenges
- Screenshot offsets / emulator position
- XVFB
- wxPython
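One common way to deal with the XVFB piece is to launch the whole process under a virtual framebuffer so the emulator has a display to render to. A sketch (the script name and resolution are placeholders, not from the talk):

```shell
# -a picks a free display number automatically; -s passes server args.
xvfb-run -a -s "-screen 0 640x480x24" python train.py
```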
-
Docker solution
- Dependencies (including versioning) explicitly defined in the Dockerfile
- Dockerfile committed with repo
- docker-compose for container run-time configuration (individual dependent processes, networking, commands, volumes, etc.)
-
Future Work
- More/all courses (including random)
- Transfer learning from one course to another
- Multiplayer (challenging with the current progress-detection approach)
- Battle mode (completely different reward function)
-
Resources
- Lecture video series starts here with lecture 1
- Lecture slides here
- gym-mupen64plus environment repo here
- Forked A3C agent here
-
Clone the two repos in the resources section and check out the appropriate branches:
- gym-mupen64plus -> dockerize
- universe-starter-agent -> mario-kart-agent
Follow the setup instructions in the README files. If you bump into any issues getting up and running, please reach out to me by filing an issue on the GitHub repository. I usually check every day-ish for new notifications and respond fairly quickly. There may be deficiencies or mistakes in the instructions and I'd like to know so I can address them.