Softmax Strategy

1. epsilon-greedy strategy

11111

2. UCB strategy

222

3. Softmax strategy

333

4. Gradient strategy

444

References

[1] ScienceNet (科学网): [RL Series] Multi-Armed Bandit Notes: Softmax Selection Strategy | Guan Jinyu's blog

[2] The Epsilon-Greedy Algorithm | James D. McCaffrey