statistics

From EM To VBEM

1. Introduction When we use K-Means or GMM to solve clustering problem, the most important hyperparameter is the number of the cluster. It is quite hard to decide and cause the good/bad performance significantly. In the mean time, K-Means also cannot handle unbalanced dataset well. However, the variational Bayesian Gaussian mixture model(VB-GMM) can solve these. VB-GMM is a Bayesian model that contains priors over the parameters of GMM. Thus, VB-GMM can be optimized by variational Bayesian expectation maximization(VBEM) and find the optimal cluster number automatically....

Toward VB-GMM
^[draft]

Note: the code in R is on my Github 3. Variational Bayesian Gaussian Mixture Model(VB-GMM) 3.1 Graphical Model Gaussian Mixture Model & Clustering The variational Bayesian Gaussian mixture model(VB-GMM) can be represented as the above graphical model. We see each data point as a Gaussian mixture distribution with $K$ components. We also denote the number of data points as $N$. Each $x_n$ is a Gaussian mixture distribution with a weight $\pi_n$ corresponds to a data point....

A Guide Of Variational Lower Bound
^[draft]

Problem Setup The Variational Lower Bound is also knowd as Evidence Lower Bound(ELBO) or VLB. It is quite useful that we can derive a lower bound of a model containing a hidden variable. Futhermore, we can even maximize the bound to maximize the log probability. We can assume that $X$ are observations (data) and $Z$ are hidden/latent variables which is unobservable. In general, we can also imagine $Z$ as a parameter and the relationship between $Z$ and $X$ are represented as the following...

Some Intuition Of MLE, MAP, and Bayesian Estimation
^[draft]

The main different between 3 kinds of estimation is What do we assume for the prior? The Maximum Likelihood Estimation(MLE) doesn’t use any prior but only maiximize the probability according to the samples. On the other hand, MAP and Bayesian both use priors to estimate the probability. The Maximum A Posteriori(MAP) only use the probability of single event while Bayesian Estimation see a distribution as the prior. To be continue…...