Mastering the game of Go without human knowledge
The paper proposes AlphaGo Zero, which learns to play Go purely through self-play, without human knowledge.
Reinforcement learning in AlphaGo Zero
$$ (p, v) = f_{\theta}(s) $$
$$ l = (z - v)^2 - \pi^{T} \log p + c \lVert \theta \rVert^2 $$
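As a rough illustration, the loss above can be computed for a single position as in the NumPy sketch below. It assumes that z is the game outcome, v the predicted value, \pi the search probabilities, p the predicted move probabilities, and \theta the network weights flattened into one vector. The function name `alphago_zero_loss`, the default `c=1e-4`, and the clipping constant are illustrative choices, not taken from these notes.

```python
import numpy as np


def alphago_zero_loss(z, v, pi, p, theta, c=1e-4):
    """Sketch of l = (z - v)^2 - pi^T log(p) + c * ||theta||^2 for one position.

    All argument names and the default c are assumptions for illustration.
    """
    pi = np.asarray(pi, dtype=float)
    p = np.asarray(p, dtype=float)
    theta = np.asarray(theta, dtype=float)

    # Value term: squared error between game outcome z and predicted value v.
    value_loss = (z - v) ** 2

    # Policy term: cross-entropy between search probabilities pi and predicted p.
    # Clipping avoids log(0); the constant 1e-12 is an arbitrary small number.
    policy_loss = -np.dot(pi, np.log(np.clip(p, 1e-12, 1.0)))

    # Regularization term: c times the squared L2 norm of the weights.
    l2_penalty = c * np.sum(theta ** 2)

    return value_loss + policy_loss + l2_penalty
```

For example, `alphago_zero_loss(1.0, 0.5, [0.5, 0.5], [0.5, 0.5], [1.0, 2.0], c=0.1)` adds a value term of 0.25, a policy term of log 2, and a regularization term of 0.5.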
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
The paper proposes AlphaZero, which generalizes the self-play approach to other board games such as chess and shogi.