Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
This is the paper that proposes MuZero. MuZero is quite famous as I write this note (Jan 2021), and many people have tried to reproduce the paper's incredible performance. Some well-known implementations, such as muzero-general, offer a clear and modular implementation of MuZero; if you are interested in MuZero, you can play with one of them. Well, let's dive into the paper....
Part II - From AlphaGo to MuZero
[draft]
Mastering the game of Go without human knowledge
The paper proposes AlphaGo Zero, which is known for learning through self-play without human knowledge.
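Reinforcement learning in AlphaGo Zero
$$ (p, v) = f_{\theta}(s) $$
$$ l = (z - v)^2 - \pi^T \log p + c \lVert \theta \rVert^2 $$
Here $s$ is the board position, $p$ and $v$ are the move probabilities and value predicted by the network $f_{\theta}$, $\pi$ is the search probabilities from MCTS, $z$ is the game outcome, and $c$ weights the L2 regularization on the parameters $\theta$.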
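To make the loss concrete, here is a minimal sketch in plain Python for a single training position. It is only an illustration of the formula, not the paper's training code: the function name `alphago_zero_loss`, the flat list `theta` standing in for the network weights, and the default `c` are assumptions of mine, and terms with a zero search probability are skipped (treating 0 * log 0 as 0).

```python
import math


def alphago_zero_loss(z, v, pi, p, theta, c=1e-4):
    """Compute l = (z - v)^2 - pi^T log(p) + c * ||theta||^2 for one position.

    z     : game outcome from the current player's view (e.g. -1 or +1)
    v     : value predicted by the network
    pi    : search probabilities over moves (list of floats)
    p     : move probabilities predicted by the network (list of floats)
    theta : network weights, flattened into a list (hypothetical stand-in)
    c     : L2 regularization weight (illustrative default)
    """
    # Squared error between the game outcome and the predicted value.
    value_loss = (z - v) ** 2
    # Cross-entropy between search probabilities and predicted probabilities.
    # Moves with pi_a == 0 contribute nothing, so they are skipped.
    policy_loss = -sum(
        pi_a * math.log(p_a) for pi_a, p_a in zip(pi, p) if pi_a > 0
    )
    # L2 penalty on the weights.
    l2_penalty = c * sum(w * w for w in theta)
    return value_loss + policy_loss + l2_penalty


if __name__ == "__main__":
    loss = alphago_zero_loss(
        z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5], theta=[1.0, 2.0], c=0.1
    )
    print(loss)
```

The three terms map one-to-one onto the equation: the value head is pulled toward the game outcome, the policy head is pulled toward the search probabilities, and the weights are kept small.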
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
The paper proposes AlphaZero, which is known for using self-play to compete in any kind of board game....
Part I - From AlphaGo to MuZero
[draft]
AlphaGo was quite famous when I was a college freshman. It is, in a way, the reason I became addicted to Reinforcement Learning, so our journey into model-based RL will start here. Although it is not the first work to propose model-based RL, I still believe it gives a good big picture of the field.
Mastering the game of Go with deep neural networks and tree search
Introduction
AlphaGo combines two kinds of model: a policy network and a value network....