PyTorch Cheat Sheet

Set Up Visible Devices for PyTorch. It's well known that we can use os.environ["CUDA_VISIBLE_DEVICES"]="1,2" to choose which GPUs the program may use, but for PyTorch most answers say there is no way to set the visible devices from inside the Python code. However, I found that os.environ.setdefault can do this (note that setdefault leaves an already exported CUDA_VISIBLE_DEVICES untouched, and the variable must be set before PyTorch initializes CUDA).
import os
import torch

gpus = [1, 2]
os.environ.setdefault("CUDA_VISIBLE_DEVICES", ','.join(map(str, gpus)))
print(f"PyTorch detected number of available devices: {torch....

October 16, 2022 · 1 min · SY Chou
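What matters is timing: CUDA reads CUDA_VISIBLE_DEVICES when it is first initialized in the process, so the variable has to be in place before PyTorch touches the GPU, and os.environ.setdefault will not override a value exported by the shell. A minimal sketch of a helper that makes the "respect the shell" vs. "force my list" choice explicit; set_visible_devices is a hypothetical name, not a PyTorch API:

```python
import os

def set_visible_devices(gpus, override=False):
    """Hypothetical helper: restrict which GPUs CUDA (and so PyTorch) can see.

    Call it before PyTorch initializes CUDA (before the first torch.cuda call;
    doing it before `import torch` is the safest choice).
    With override=False it behaves like os.environ.setdefault and keeps any
    value already exported in the shell.
    """
    value = ",".join(str(g) for g in gpus)
    if override:
        os.environ["CUDA_VISIBLE_DEVICES"] = value
    else:
        os.environ.setdefault("CUDA_VISIBLE_DEVICES", value)
    # Return the value CUDA will actually see
    return os.environ["CUDA_VISIBLE_DEVICES"]
```

After calling set_visible_devices([1, 2]) and then importing torch, physical GPUs 1 and 2 are renumbered and appear to PyTorch as cuda:0 and cuda:1.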

An SSH Guide

Set Up SSH Login With a Private Key on Windows. Requirement: install OpenSSH (please follow the instructions on this page). Change to your home directory (e.g. C:\Users\{USERNAME}).
Step 1: Generate an SSH key pair
ssh-keygen -t rsa -f ".\.ssh\{FILE NAME OF KEY}"
Step 2: Create the .ssh folder on the remote host
ssh {LOGIN USERNAME ON THE TARGET HOST}@{TARGET HOST NAME/IP} mkdir -p .ssh
Step 3: Copy the SSH public key to the remote host
cat ".\.ssh\{FILE NAME OF KEY}....

October 7, 2022 · 4 min · SY Chou
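On the remote side, Steps 2 and 3 boil down to: make sure ~/.ssh exists and append the public key as one line of ~/.ssh/authorized_keys. A minimal Python sketch of that remote-side logic, assuming the default authorized_keys location; append_authorized_key is a hypothetical helper, and it also tightens the file permissions because sshd (with its default StrictModes setting) refuses keys in files that other users can write:

```python
import os
from pathlib import Path

def append_authorized_key(ssh_dir, pubkey_line):
    """Hypothetical helper: add a public key to <ssh_dir>/authorized_keys once.

    Mirrors Step 2 (mkdir -p .ssh) and Step 3 (append the .pub file) on the
    remote host. Returns True if the key was added, False if already present.
    """
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth = ssh_dir / "authorized_keys"
    key = pubkey_line.strip()
    text = auth.read_text() if auth.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with auth.open("a", newline="\n") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # never glue the new key onto the previous line
        fh.write(key + "\n")
    if os.name == "posix":
        # sshd's StrictModes rejects authorized_keys files writable by others
        os.chmod(auth, 0o600)
    return True
```

Checking for an existing line keeps the operation idempotent, so re-running Step 3 does not pile up duplicate keys.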

From EM To VBEM

1. Introduction When we use K-Means or a GMM for clustering, the most important hyperparameter is the number of clusters. It is hard to choose, and the choice strongly affects performance. Meanwhile, K-Means also does not handle unbalanced datasets well. The variational Bayesian Gaussian mixture model (VB-GMM) can address both problems. VB-GMM is a Bayesian model that places priors over the parameters of a GMM, so it can be optimized by variational Bayesian expectation maximization (VBEM), which effectively infers the number of clusters by shrinking the weights of unneeded components toward zero....

July 9, 2021 · 6 min · SY Chou
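The automatic choice of the cluster number comes from the Dirichlet prior on the mixing weights. In the standard VBEM updates for a VB-GMM (e.g. Bishop's PRML, Section 10.2), component $k$ gets the posterior concentration $\alpha_k = \alpha_0 + N_k$, where $N_k$ is its soft count, and the E-step uses $E[\ln \pi_k] = \psi(\alpha_k) - \psi(\sum_j \alpha_j)$. A minimal pure-Python sketch, using made-up soft counts (not results from the post), compares $\exp(E[\ln \pi_k])$ with the maximum-likelihood weight $N_k / N$ used by plain EM:

```python
import math

def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132)))
    )
    return result

def vb_mixing_weights(soft_counts, alpha0):
    """exp(E[ln pi_k]) under the Dirichlet posterior Dir(alpha0 + N_1, ..., alpha0 + N_K)."""
    alphas = [alpha0 + n for n in soft_counts]
    psi_total = digamma(sum(alphas))
    return [math.exp(digamma(a) - psi_total) for a in alphas]

def ml_mixing_weights(soft_counts):
    """Plain EM (maximum-likelihood) M-step: pi_k = N_k / N."""
    total = sum(soft_counts)
    return [n / total for n in soft_counts]

counts = [500.0, 300.0, 199.5, 0.5]  # hypothetical soft counts N_k, K = 4
vb = vb_mixing_weights(counts, alpha0=1e-3)
ml = ml_mixing_weights(counts)
for k, (w_vb, w_ml) in enumerate(zip(vb, ml)):
    print(f"k={k}: EM weight {w_ml:.5f}, VB weight {w_vb:.5f}")
```

The well-populated components keep essentially their EM weight, while the nearly empty one is pushed down by an extra factor of a few; repeated over VBEM iterations, this starves unneeded components. These weights are intentionally unnormalized: they are the geometric-mean weights that enter the E-step responsibilities.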

A Review of SVM and SMO

Note: the full code is on my GitHub. 1. Abstract In this article, I will derive the SMO algorithm and the Fourier kernel approximation, two well-known algorithms for kernel machines. SMO solves the SVM dual optimization problem efficiently, and the Fourier kernel approximation is a kernel approximation that speeds up the computation of the kernel matrix. In the last section, I will evaluate my own SVM implementation on a simulated dataset and the “Women's Clothing E-Commerce Review Dataset”....

July 8, 2021 · 17 min · SY Chou
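The core SMO idea is to repeatedly pick a pair of Lagrange multipliers $(\alpha_i, \alpha_j)$, solve the two-variable dual subproblem analytically, clip the result to the box $[0, C]$, and update the bias. The sketch below is the well-known simplified SMO (second multiplier chosen at random, as in Stanford CS229's notes), not Platt's full heuristic and not necessarily the variant derived in the post; the toy data is made up:

```python
import random

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, kernel=linear_kernel, seed=0):
    """Simplified SMO for the SVM dual. Labels y must be +1/-1. Returns (alphas, b)."""
    rng = random.Random(seed)
    m = len(X)
    K = [[kernel(X[i], X[j]) for j in range(m)] for i in range(m)]
    alphas, b = [0.0] * m, 0.0

    def f(i):  # decision value on training point i (reads the current alphas and b)
        return sum(alphas[k] * y[k] * K[k][i] for k in range(m)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            E_i = f(i) - y[i]
            # Only optimize alpha_i if it violates the KKT conditions (within tol)
            if (y[i] * E_i < -tol and alphas[i] < C) or (y[i] * E_i > tol and alphas[i] > 0):
                j = rng.choice([k for k in range(m) if k != i])
                E_j = f(j) - y[j]
                a_i_old, a_j_old = alphas[i], alphas[j]
                # Box constraints for alpha_j along the line y_i a_i + y_j a_j = const
                if y[i] != y[j]:
                    L, H = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                else:
                    L, H = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                if L == H:
                    continue
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if eta >= 0:
                    continue
                # Unconstrained optimum for alpha_j, then clip to [L, H]
                a_j = min(H, max(L, a_j_old - y[j] * (E_i - E_j) / eta))
                if abs(a_j - a_j_old) < 1e-5:
                    continue
                a_i = a_i_old + y[i] * y[j] * (a_j_old - a_j)
                alphas[i], alphas[j] = a_i, a_j
                # Bias update so that the KKT conditions hold for the changed pair
                b1 = b - E_i - y[i] * (a_i - a_i_old) * K[i][i] - y[j] * (a_j - a_j_old) * K[i][j]
                b2 = b - E_j - y[i] * (a_i - a_i_old) * K[i][j] - y[j] * (a_j - a_j_old) * K[j][j]
                if 0 < a_i < C:
                    b = b1
                elif 0 < a_j < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alphas, b

def predict(X_train, y_train, alphas, b, x, kernel=linear_kernel):
    s = sum(a * yk * kernel(xk, x) for a, yk, xk in zip(alphas, y_train, X_train))
    return 1 if s + b >= 0 else -1

X = [[-2, -1], [-1, -2], [-3, -2], [2, 1], [1, 2], [3, 2]]  # toy, linearly separable
y = [-1, -1, -1, 1, 1, 1]
alphas, b = simplified_smo(X, y)
```

Each pair update keeps $\sum_i \alpha_i y_i = 0$ exactly, which is why SMO works on two multipliers at a time rather than one.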

Part II - Toward NNGP and NTK

Neural Tangent Kernel (NTK) “In short, the NTK describes how the network's outputs change when the weights take a gradient descent step.” Let's start the journey of opening up black-box neural networks. Set Up a Neural Network First of all, we define a simple neural network with 2 hidden layers $$y(x, w)$$ where $y$ is the neural network with weights $w \in \mathbb{R}^m$, and $\{x, \bar{y}\}_N$ is the dataset, a set of $N$ input-output pairs....

February 19, 2021 · 10 min · SY Chou
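The NTK is $\Theta(x, x') = \nabla_w y(x, w)^\top \nabla_w y(x', w)$, and one gradient descent step on a training point $x_i$ changes the output at any $x$ by approximately $-\eta \, \Theta(x, x_i) \, \partial L / \partial y(x_i)$. A small pure-Python sketch checks this first-order relation numerically. For brevity it assumes a scalar input, a single tanh hidden layer of width $m$ with $1/\sqrt{m}$ output scaling, and a squared loss (the post itself uses 2 hidden layers); all helper names are hypothetical:

```python
import math
import random

def init_params(m, seed=0):
    rng = random.Random(seed)
    return {k: [rng.gauss(0, 1) for _ in range(m)] for k in ("u", "c", "v")}

def forward(p, x):
    """y(x, w) = (1/sqrt(m)) * sum_j v_j * tanh(u_j * x + c_j)."""
    m = len(p["v"])
    return sum(v * math.tanh(u * x + c) for u, c, v in zip(p["u"], p["c"], p["v"])) / math.sqrt(m)

def grad(p, x):
    """Flattened gradient dy/dw, ordered as [u..., c..., v...]."""
    s = 1.0 / math.sqrt(len(p["v"]))
    gu, gc, gv = [], [], []
    for u, c, v in zip(p["u"], p["c"], p["v"]):
        h = math.tanh(u * x + c)
        d = v * (1.0 - h * h)  # derivative of v * tanh(.) w.r.t. its pre-activation
        gu.append(s * d * x)
        gc.append(s * d)
        gv.append(s * h)
    return gu + gc + gv

def ntk(p, x1, x2):
    """Empirical NTK: inner product of the two parameter gradients."""
    return sum(a * b for a, b in zip(grad(p, x1), grad(p, x2)))

def gd_step(p, x, target, lr):
    """One gradient descent step on the squared loss 0.5 * (y(x) - target)**2."""
    m = len(p["v"])
    err = forward(p, x) - target
    g = grad(p, x)
    return {
        "u": [w - lr * err * gi for w, gi in zip(p["u"], g[:m])],
        "c": [w - lr * err * gi for w, gi in zip(p["c"], g[m:2 * m])],
        "v": [w - lr * err * gi for w, gi in zip(p["v"], g[2 * m:])],
    }

p = init_params(256)
x_train, target, x_test, lr = 0.5, 1.0, -0.3, 1e-3
actual = forward(gd_step(p, x_train, target, lr), x_test) - forward(p, x_test)
predicted = -lr * (forward(p, x_train) - target) * ntk(p, x_test, x_train)
print(actual, predicted)
```

The two printed numbers should agree up to second-order terms in the step size, which is exactly the sense in which the NTK governs how the function (not just the weights) moves during training.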

A Very Brief Introduction to Gaussian Process and Bayesian Optimization

Gaussian Process Big Picture and Background Intuitively, a Gaussian distribution is a distribution over a state space, while a Gaussian process is a distribution over a function space. Before we introduce the Gaussian process, we should first understand the Gaussian distribution. A random variable (RV) $X$ that follows the standard Gaussian distribution $\mathcal{N}(0, 1)$ has the familiar bell-shaped density. The PDF of $X \sim \mathcal{N}(\mu, \sigma^2)$ is $$p(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2}$$ As for the multivariate Gaussian distribution, given 2 RVs $x$ and $y$ that both follow $\mathcal{N}(0, 1)$, we can illustrate it as...

February 16, 2021 · 12 min · SY Chou
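The post builds toward Gaussian process regression. As a hedged illustration, assuming 1-D inputs, an RBF kernel with unit length scale and variance, and Gaussian observation noise of variance `noise` (helper names are hypothetical), the standard GP posterior mean $k_*^\top (K + \sigma_n^2 I)^{-1} y$ and variance $k(x_*, x_*) - k_*^\top (K + \sigma_n^2 I)^{-1} k_*$ can be computed with a Cholesky factorization in pure Python:

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b, using only the lower factor L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_star, noise=1e-2):
    """Posterior mean and variance of the latent f(x_star) under a zero-mean GP prior."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))  # (K + noise * I)^{-1} y
    k_star = [rbf(x, x_star) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [math.sin(x) for x in xs]
print(gp_posterior(xs, ys, 0.5))
```

Near the data the posterior variance collapses toward the noise level; far away (e.g. x_star = 10) the mean returns to the prior mean 0 and the variance returns to the prior variance 1, which is the "distribution over functions" picture in action.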

A Set of Shannon Entropy

Shannon Entropy For a discrete random variable $X$ with outcomes $\{x_1, \dots, x_n\}$ and probability mass function $P(X)$, we define the Shannon entropy $H(X)$ as $$H(X) = E[-\log_b P(X)] = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)$$ where $b$ is the base of the logarithm. The unit of Shannon entropy is the bit for $b = 2$ and the nat for $b = e$. The Perspective of Venn Diagram We can illustrate the relations among joint entropy, conditional entropy, and mutual information as in the following figure...

February 23, 2021 · 3 min · SY Chou
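The definition and the Venn-diagram relations translate directly into code. A minimal sketch, assuming the joint distribution is given as a dictionary of probabilities (helper names are hypothetical); the conditional entropies and the mutual information follow from the chain rule $H(X, Y) = H(Y) + H(X \mid Y)$ and from $I(X; Y) = H(X) + H(Y) - H(X, Y)$:

```python
import math
from collections import defaultdict

def entropy(probs, base=2):
    """H = -sum p log_b p, skipping zero-probability outcomes (0 log 0 := 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def info_quantities(joint, base=2):
    """joint: dict {(x, y): P(x, y)}. Returns H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), I(X;Y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # marginalize out y
        py[y] += p  # marginalize out x
    h_x = entropy(px.values(), base)
    h_y = entropy(py.values(), base)
    h_xy = entropy(joint.values(), base)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X,Y)": h_xy,
        "H(X|Y)": h_xy - h_y,  # chain rule: H(X,Y) = H(Y) + H(X|Y)
        "H(Y|X)": h_xy - h_x,
        "I(X;Y)": h_x + h_y - h_xy,  # overlap of the two circles in the Venn diagram
    }

independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # two fair, independent bits
identical = {(0, 0): 0.5, (1, 1): 0.5}                        # Y is a copy of X
print(info_quantities(independent))
print(info_quantities(identical))
```

For two independent fair bits the circles do not overlap, so $I(X;Y) = 0$; when $Y$ is a copy of $X$ the circles coincide, so $H(X \mid Y) = 0$ and $I(X;Y) = H(X) = 1$ bit.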

Moving My Blog

Because my DL notes use a lot of math notation, I simply moved my DL_DB_Quick_Notes over from GitHub; writing notes with LaTeX is much smoother here, and the original repo will probably only be used for collecting papers from now on. There have also been some turns in my life recently, so I may post some essays and miscellaneous notes too, depending on my mood. The theme I'm using is PaperMod, and I'm fairly satisfied with its overall design, but I ran into quite a few problems deploying Hugo, so here is a brief record. Setting Up the GitHub Pages Action Based on the gh-pages.yml in the PaperMod ExampleSite, with some modifications of my own, it looks roughly like this:
name: Build GH-Pages

on:
  push:
    paths-ignore:
      - 'images/**'
      - 'LICENSE'
      - 'README.md'
    branches:
      - master
  workflow_dispatch: # manual run

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Git checkout
        uses: actions/checkout@v2
        with:
          ref: master

      - name: Get Theme
        run: git submodule update --init --recursive

      - name: Update theme to Latest commit
        run: git submodule update --remote --merge

      - name: Setup hugo
        uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: 'latest'

      - name: Build
        run: hugo --buildDrafts --gc --verbose --minify

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets....

February 16, 2021 · 2 min · SY Chou

Toward VB-GMM
  [draft]

Note: the R code is on my GitHub. 3. Variational Bayesian Gaussian Mixture Model (VB-GMM) 3.1 Graphical Model Gaussian Mixture Model & Clustering The variational Bayesian Gaussian mixture model (VB-GMM) can be represented by the graphical model above. We model the data as a Gaussian mixture with $K$ components and denote the number of data points by $N$. The mixing weights $\pi = (\pi_1, \dots, \pi_K)$ are shared by all data points, and each data point $x_n$ has a latent indicator $z_n$ that selects the Gaussian component that generated it....

July 9, 2021 · 11 min · SY Chou
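Reading the graphical model generatively: draw the shared mixing weights $\pi$ from a Dirichlet prior, draw each component's mean and precision from their prior, then for each data point draw $z_n \sim \mathrm{Categorical}(\pi)$ and $x_n$ from the selected Gaussian. A pure-Python sketch of this process, assuming 1-D data and a Normal-Gamma prior (the 1-D analogue of the Gaussian-Wishart prior in Bishop's PRML); the function name, hyperparameter names, and defaults are illustrative, not taken from the post's R code:

```python
import math
import random

def sample_vbgmm_prior(n, k, alpha0=1.0, m0=0.0, beta0=1.0, a0=2.0, b0=1.0, seed=0):
    """Draw (pi, mu, lam, z, x) from a 1-D Bayesian GMM with a Normal-Gamma prior."""
    rng = random.Random(seed)
    # pi ~ Dir(alpha0, ..., alpha0), via normalized Gamma(alpha0, 1) draws
    g = [rng.gammavariate(alpha0, 1.0) for _ in range(k)]
    total = sum(g)
    pi = [gi / total for gi in g]
    # lambda_k ~ Gamma(shape a0, rate b0); gammavariate takes a *scale*, hence 1/b0
    lam = [rng.gammavariate(a0, 1.0 / b0) for _ in range(k)]
    # mu_k | lambda_k ~ N(m0, 1 / (beta0 * lambda_k))
    mu = [rng.gauss(m0, 1.0 / math.sqrt(beta0 * l)) for l in lam]
    # z_n ~ Categorical(pi): one latent indicator per data point
    z = rng.choices(range(k), weights=pi, k=n)
    # x_n | z_n ~ N(mu_{z_n}, 1 / lambda_{z_n})
    x = [rng.gauss(mu[j], 1.0 / math.sqrt(lam[j])) for j in z]
    return pi, mu, lam, z, x

pi, mu, lam, z, x = sample_vbgmm_prior(n=100, k=3)
```

VBEM runs this story in reverse: given only x, it approximates the posterior over z, pi, mu, and lam with a factorized distribution.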

A Paper Review: Learning to Adapt
  [draft]

Introduction The paper proposes an efficient method for online adaptation: it trains a global model that uses its recent experience to adapt quickly, achieving fast online adaptation in dynamic environments. The authors evaluate two versions of the approach on stochastic continuous control tasks: (1) Recurrence-Based Adaptive Learner (ReBAL) and (2) Gradient-Based Adaptive Learner (GrBAL). Objective Setting-Up To adapt to a dynamic environment, we require a learned model $p_{\theta}^{*}$ that adapts, using an update rule $u_{\psi}^{*}$, after seeing $M$ data points from some new “task”....

March 15, 2021 · 2 min · SY Chou
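For the gradient-based variant, the update rule is a gradient step on the model's loss over the $M$ most recent transitions, with $\psi$ playing the role of the step size. A toy sketch of one such adaptation step, assuming a scalar linear dynamics model $s' \approx \theta_0 s + \theta_1 a$, a squared-error loss in place of the negative log-likelihood, and a fixed step size instead of a meta-learned one; all names and numbers are hypothetical:

```python
import random

def mse_and_grad(theta, transitions):
    """Mean squared one-step prediction error of s' ~ theta[0]*s + theta[1]*a, and its gradient."""
    m = len(transitions)
    loss, g0, g1 = 0.0, 0.0, 0.0
    for s, a, s_next in transitions:
        err = theta[0] * s + theta[1] * a - s_next
        loss += err * err / m
        g0 += 2 * err * s / m
        g1 += 2 * err * a / m
    return loss, (g0, g1)

def adapt(theta, recent, step_size):
    """Gradient-based update rule: theta' = theta - step_size * grad L(theta; recent M transitions)."""
    _, (g0, g1) = mse_and_grad(theta, recent)
    return (theta[0] - step_size * g0, theta[1] - step_size * g1)

rng = random.Random(0)
# Hypothetical "new task": the action gain changed from 1.0 to 0.5
recent = []
for _ in range(10):  # M = 10 most recent transitions
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    recent.append((s, a, 0.9 * s + 0.5 * a))

theta_prior = (0.9, 1.0)  # model fitted on the old dynamics
theta_adapted = adapt(theta_prior, recent, step_size=0.1)
print(mse_and_grad(theta_prior, recent)[0], mse_and_grad(theta_adapted, recent)[0])
```

In the meta-learning setting, the prior parameters are trained so that this single cheap step already yields a good model for the new task; here they are simply hand-set for illustration.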