Hyperparameter tuning and training optimisation using RLlib, Tune and Neptune.
This script implements a basic Logger for Tune that sends training data to neptune.ai.
You must have a .neptune-key file containing a valid API key in the same directory as the Python script.
You must replace the project_qualified_name variable at line 43 with your own value.
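A minimal sketch of how the .neptune-key file might be loaded; the helper name read_neptune_key is illustrative and not part of the script itself:

```python
from pathlib import Path


def read_neptune_key(path: str = ".neptune-key") -> str:
    """Read the neptune.ai API token from a local file.

    The file is expected to contain only the token, possibly
    followed by trailing whitespace or a newline.
    """
    key = Path(path).read_text().strip()
    if not key:
        raise ValueError(f"{path} exists but is empty")
    return key
```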
Dependencies:
pip install ray "ray[rllib]" "ray[tune]" neptune-client
Command line usage:
python3.8 main.py [--training-iterations={unsigned int}]
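The --training-iterations flag could be parsed with argparse along these lines; the default of 10 iterations is an assumption for illustration, not a value taken from the script:

```python
import argparse


def parse_args(argv=None):
    """Parse the script's command line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--training-iterations",
        type=int,
        default=10,  # assumed default, not taken from the script
        help="number of training iterations per trial",
    )
    return parser.parse_args(argv)
```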
Basic RLlib CartPole-v0 experiment, distributed across four trials with different learning rates ([1., 0.1, 0.01, 0.001]) using tune.grid_search().
TUNE_CONFIG = {
"env": "CartPole-v0",
"lr": tune.grid_search([1., 0.1, 0.01, 0.001]),
"log_level": "ERROR",
}
The optimal learning rate was found to be 0.001.
Below are graphs depicting the evolution of the reward.
We can see that trial 18db8_00000 was evicted at iteration 4 by the ASHAScheduler due to poor results.
In a similar experiment, the results were as follows:
+-----------------------------+------------+-------+-------+--------+------------------+-------+----------+
| Trial name | status | loc | lr | iter | total time (s) | ts | reward |
|-----------------------------+------------+-------+-------+--------+------------------+-------+----------|
| PPO_CartPole-v0_708c7_00000 | TERMINATED | | 1 | 1 | 10.398 | 4000 | 22.3296 |
| PPO_CartPole-v0_708c7_00001 | TERMINATED | | 0.1 | 10 | 62.7582 | 40000 | 14.9813 |
| PPO_CartPole-v0_708c7_00002 | TERMINATED | | 0.01 | 1 | 8.89217 | 4000 | 22.0552 |
| PPO_CartPole-v0_708c7_00003 | TERMINATED | | 0.001 | 10 | 47.4788 | 40000 | 185.32 |
+-----------------------------+------------+-------+-------+--------+------------------+-------+----------+