PyTorch Cheat Sheet

Set Up Visible Devices For PyTorch It’s well-known that we can use os.environ["CUDA_VISIBLE_DEVICES"]="1,2" to control which GPUs a program can use, but for PyTorch, most answers say that there is no way to set visible devices from Python code. However, I found os.environ.setdefault can do this. import os import torch gpus = [1, 2] os.environ.setdefault("CUDA_VISIBLE_DEVICES", ','.join(map(str, gpus))) print(f"PyTorch detected number of available devices: {torch....
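A minimal runnable sketch of the trick described above, assuming the excerpt’s gpus list; the key detail is that the environment variable must be set before torch initializes CUDA, and torch.cuda.device_count() (my choice of check here) then reports only the masked devices:

```python
import os

# Expose only GPUs 1 and 2 to this process; must happen before CUDA is initialized.
gpus = [1, 2]
os.environ.setdefault("CUDA_VISIBLE_DEVICES", ",".join(map(str, gpus)))

import torch

# The visible devices are re-indexed, so GPU 1 becomes cuda:0 and GPU 2 becomes cuda:1.
print(f"PyTorch detected number of available devices: {torch.cuda.device_count()}")
```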

October 16, 2022 · 1 min · SY Chou

Part II - Toward NNGP and NTK

Neural Tangent Kernel (NTK) “In short, the NTK represents the change of the weights before and after a gradient descent update.” Let’s start the journey of revealing the black box of neural networks. Set Up a Neural Network First of all, we need to define a simple neural network with 2 hidden layers $$ y(x, w)$$ where $y$ is the neural network with weights $w \in \mathbb{R}^m$, and $\{ x, \bar{y} \}_N$ is the dataset, a set of input and output data with $N$ data points....
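For reference, the empirical NTK of a scalar-output network $y(x, w)$ is the inner product of its parameter gradients at two inputs; this is the standard definition from Jacot et al., stated here as a reminder rather than taken from the post:

$$\Theta(x, x') = \nabla_w y(x, w)^{\top}\, \nabla_w y(x', w) = \sum_{p=1}^{m} \frac{\partial y(x, w)}{\partial w_p}\, \frac{\partial y(x', w)}{\partial w_p}$$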

February 19, 2021 · 10 min · SY Chou

A Paper Review: Learning to Adapt
  [draft]

Introduction The paper proposes an efficient method for online adaptation. The algorithm efficiently trains a global model that can use its recent experiences to quickly adapt, achieving fast online adaptation in dynamic environments. They evaluate two versions of the approach on stochastic continuous control tasks: (1) the Recurrence-Based Adaptive Learner (ReBAL) and (2) the Gradient-Based Adaptive Learner (GrBAL). Objective Setup To adapt to a dynamic environment, we require a learned model $p_{\theta}^{*}$ to adapt, using an update rule $u_{\psi}^{*}$ after seeing $M$ data points from some new “task”....
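A schematic of what that objective typically looks like in the gradient-based (GrBAL-style) case, written in MAML-like notation as a sketch rather than the paper’s exact formulation; $\mathcal{D}$ and $K$ are symbols I introduce here for the recent and future data windows:

$$\min_{\theta,\,\psi}\; \mathbb{E}\Big[\mathcal{L}\big(\mathcal{D}_{t:t+K},\, \theta'\big)\Big], \qquad \theta' = u_{\psi}\big(\mathcal{D}_{t-M:t},\, \theta\big)$$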

March 15, 2021 · 2 min · SY Chou

Part I - Toward NNGP and NTK
  [draft]

Neural Network Gaussian Process (NNGP) Model the neural network as a GP, a.k.a. a neural network Gaussian process (NNGP). Intuitively, the NNGP kernel computes the similarity between the output vectors of 2 input data points. We define the following functions as a neural network with fully-connected layers: $$z_{i}^{1}(x) = b_i^{1} + \sum_{j=1}^{N_1} \ W_{ij}^{1}x_j^{1}(x), \ \ x_{j}^{1}(x) = \phi\Big(b_j^{0} + \sum_{k=1}^{d_{in}} \ W_{jk}^{0}x_k(x)\Big)$$ where $b_i^{1}$ is the $i$th bias of the second layer (the one after the first hidden layer), $W_{ij}^{1}$ are the weights of the second layer (acting on the first hidden layer), function $\phi$ is the activation function, and $x$ is the input data of the neural network....
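For context, if the weights and biases are taken as independent zero-mean Gaussians (variances $\sigma_w^2 / N_1$ and $\sigma_b^2$; these symbols are my addition, following the standard NNGP treatment of Lee et al.), the covariance of the outputs above becomes:

$$K^{1}(x, x') = \mathbb{E}\big[z_{i}^{1}(x)\, z_{i}^{1}(x')\big] = \sigma_b^{2} + \sigma_w^{2}\, \mathbb{E}\big[x_{i}^{1}(x)\, x_{i}^{1}(x')\big]$$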

March 15, 2021 · 1 min · SY Chou

Part III - From AlphaGo to MuZero
  [draft]

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model This is the paper that proposes MuZero. MuZero is quite famous as I write this note (Jan 2021). Lots of people have tried to reproduce the incredible performance reported in the paper. Some well-known implementations, like muzero-general, give a clear and modular implementation of MuZero. If you are interested in MuZero, you can play with it. Well, let’s dive into the paper....

March 4, 2021 · 4 min · SY Chou

Part II - From AlphaGo to MuZero
  [draft]

Mastering the game of Go without human knowledge The paper proposes AlphaGo Zero, which is known for self-play without human knowledge. Reinforcement learning in AlphaGo Zero $$ (p, v) = f_{\theta}(s) $$ $$ l = (z - v)^2 - \pi^{T} \log(p) + c\|\theta\|^2 $$ Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm The paper proposes AlphaZero, which is known for mastering several kinds of board games through self-play....
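A minimal PyTorch-style sketch of that loss; the function name, tensor names, and the explicit L2 term are illustrative assumptions (in practice the $c\|\theta\|^2$ term is often handled via the optimizer’s weight decay):

```python
import torch
import torch.nn.functional as F

def alphago_zero_loss(p_logits, v, pi, z, params, c=1e-4):
    """Hypothetical helper: l = (z - v)^2 - pi^T log p + c * ||theta||^2."""
    value_loss = F.mse_loss(v.squeeze(-1), z)                                  # (z - v)^2
    policy_loss = -(pi * F.log_softmax(p_logits, dim=-1)).sum(dim=-1).mean()   # -pi^T log p
    l2_reg = sum((w ** 2).sum() for w in params)                               # ||theta||^2
    return value_loss + policy_loss + c * l2_reg
```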

March 4, 2021 · 1 min · SY Chou

Simple Guide Of VDN And QMIX
  [draft]

Value-Decomposition Network (VDN) QMIX Problem Setup And Assumption Constraint QMIX improves on the VDN algorithm by allowing a more general form of the constraint. It defines the constraint as $$\frac{\partial Q_{tot}}{\partial Q_{a}} \geq 0, \forall a$$ where $Q_{tot}$ is the joint value function and $Q_{a}$ is the value function of each agent. An intuitive explanation is that we want the weight on each individual value function $Q_{a}$ to be non-negative. If the weight on an individual value function $Q_{a}$ were negative, it would discourage that agent from cooperating, since a higher $Q_{a}$ would mean a lower joint value $Q_{tot}$....
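A minimal sketch of how this monotonicity constraint is typically enforced in a QMIX-style mixing network: hypernetworks produce the mixing weights from the global state, and an absolute value keeps them non-negative. Layer sizes, names, and the single hidden mixing layer are illustrative assumptions here, not the paper’s exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q values into Q_tot with non-negative mixing weights."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        # Hypernetworks: mixing weights/biases are conditioned on the global state.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)  # >= 0
        b1 = self.b1(state).view(-1, 1, self.embed_dim)
        hidden = F.elu(agent_qs.unsqueeze(1) @ w1 + b1)
        w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)              # >= 0
        b2 = self.b2(state).view(-1, 1, 1)
        # Non-negative mixing weights guarantee dQ_tot/dQ_a >= 0 by construction.
        return (hidden @ w2 + b2).view(-1)
```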

February 26, 2021 · 2 min · SY Chou

Part I - From AlphaGo to MuZero
  [draft]

AlphaGo was quite famous when I was a freshman in college. It is, in a way, the reason I became addicted to Reinforcement Learning. Thus our journey of model-based RL will start here. Although it is not the first work to propose model-based RL, I still believe it gives a big picture of model-based RL. Mastering the game of Go with deep neural networks and tree search Introduction AlphaGo combines two kinds of models: a policy network and a value network....

February 19, 2021 · 5 min · SY Chou