model-based RL

Part III - From AlphaGo to MuZero
^[draft]

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model: this is the paper that proposes MuZero. MuZero was already quite famous when I wrote this note (Jan 2021), and many people have tried to reproduce the paper's impressive results. Some well-known implementations, such as muzero-general, give a clear and modular implementation of MuZero. If you are interested in MuZero, you can play with it. Well, let's dive into the paper....

Part II - From AlphaGo to MuZero
^[draft]

Mastering the game of Go without human knowledge: this paper proposes AlphaGo Zero, which is known for learning through self-play without human knowledge. Reinforcement learning in AlphaGo Zero: $$ (p, v) = f_{\theta}(s) $$ $$ l = (z - v)^2 - \pi^T \log p + c\|\theta\|^2 $$ Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm: this paper proposes AlphaZero, which is known for using self-play to compete in any kind of board game....
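The AlphaGo Zero loss quoted in the excerpt above can be sketched in a few lines of plain Python. This is only a minimal illustration for a single position, not the paper's training code: the function name `alphago_zero_loss`, its argument names, and the default `c=1e-4` are assumptions chosen for this example. Here `z` is the game outcome, `v` the predicted value, `pi` the search policy target, `p` the predicted move probabilities, and `theta` a flat list of network weights.

```python
import math


def alphago_zero_loss(z, v, pi, p, theta, c=1e-4):
    """Sketch of l = (z - v)^2 - pi^T log(p) + c * ||theta||^2 for one position.

    All names and the default c are illustrative assumptions.
    """
    # Squared error between the game outcome z and the predicted value v.
    value_loss = (z - v) ** 2
    # Cross-entropy between the search policy pi and the predicted policy p.
    # Moves with zero target probability are skipped to avoid 0 * log(0).
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    # L2 regularization over the network weights.
    l2_penalty = c * sum(w * w for w in theta)
    return value_loss + policy_loss + l2_penalty


if __name__ == "__main__":
    loss = alphago_zero_loss(
        z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5], theta=[1.0, 2.0], c=0.1
    )
    print(loss)
```

In practice the three terms would be averaged over a mini-batch and computed with an autodiff framework; the scalar version here only shows how the terms of the formula fit together.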

Part I - From AlphaGo to MuZero
^[draft]

AlphaGo was quite famous when I was a college freshman. It is more or less the reason I became addicted to reinforcement learning, so our journey through model-based RL starts here. Although it is not the first work to propose model-based RL, I still believe it gives a big picture of the field. Mastering the game of Go with deep neural networks and tree search. Introduction: AlphaGo combines two kinds of models, a policy network and a value network....