Markov Chain Monte Carlo (Metropolis-Hastings algorithm)

Data science

Markov Chain Monte Carlo (Metropolis-Hastings algorithm)

2018. 5. 14. 17:28

Markov Chain Monte Carlo

Basic Idea

어떤 분포(pdf)로부터 데이터를 sampling 하거나, 분포의 기댓값을 구하기 위해 사용하는 방법이다. high-dimensional pdf일 경우 이 pdf에서 데이터를 샘플링하기가 힘든데, 이를 하기 위한 방법이 MCMC이다. 또는 MCMC는 uncertain parameter model에서 parameter을 추정할 때 쓰기도 한다.

기본 아이디어는 다음과 같다.

1. 시작점을 정하고 그 지점에서의 확률(pdf(x))을 구한다.

2. 그 지점에서 이동할 수 있는 임의의 지점을 정한다. (이를 주로 "random walk" 라고 부른다.)

3. 만약 그 지점의 확률이 이전 지점보다 높다면 이동하고, 낮으면 이동하지 않거나, 어떤 확률로 이동한다. (이는 강화학습에서의 exploit & explore의 원리와 비슷하다.)

이러한 절차를 계속하다보면 실제 pdf에서 데이터가 나온 것처럼 샘플링할 수 있다. 왜 이것이 Markkov Chain인가하면, 2번 단계에서 임의의 지점을 정할 때, 이전 단계에만 의존적이기 때문이다.

MCMC를 구현하는 알고리즘은 크게 두 가지 방법이 있는데 우선 Metropolis-Hastings algorithm을 살펴보자.

Metropolis-Hastings algorithm

여기서 Q는 proposal distribution P*(x)는 high-dimensional pdf 이다. 이 때, Q는 random walk를 샘플링할 분포로 대칭 분포여야하며, 잘 설정되어야한다. (즉, Q(x|y) = Q(y|x) 여야 한다.) 이 알고리즘을 먼저 간단히 기술해보면, x(1)을 임의로 고른 후, Q(x;x(1)) 에서 x(2)를 샘플링한다. 이 때, P*(x(2))가 P*(x(1))보다 크면, alpha의 확률로 accpet해서 이동하고, 1-alpha의 확률로 reject한다. 이 때, alpha는 acceptance ratio인데 alpha = P*(x(2))/P*(x(1)) 이다. 즉, 이동한 점의 pdf값이 이전 값보다 크면 무조건 이동하고, 이전값보다 작으면 일정 확률로 이동하는데 이 일정확률은 이후값과 이전값의 ratio이다.

위키피디아의 좀 더 자세한 알고리즘도를 보면 아래와 같다.

Initialization: Choose an arbitrary point x₀ to be the first sample, and choose an arbitrary probability density $g(x|y)$ (sometimes written $Q(x|y)$ ) that suggests a candidate for the next sample value x, given the previous sample value y. For the Metropolis algorithm, g must be symmetric; in other words, it must satisfy $g(x|y)=g(y|x)$ . A usual choice is to let $g(x|y)$ be a Gaussian distribution centered at y, so that points closer to y are more likely to be visited next—making the sequence of samples into a random walk. The function g is referred to as the proposal density or jumping distribution.
For each iteration t:
Generate : Generate a candidate x for the next sample by picking from the distribution $g(x'|x_{t})$ .
Calculate : Calculate the acceptance ratio ${\textstyle \alpha =f(x')/f(x_{t})}$ , which will be used to decide whether to accept or reject the candidate. Because f is proportional to the density of P, we have that $\alpha =f(x')/f(x_{t})=P(x')/P(x_{t})$ .
Accept or Reject :
Generate a uniform random number u on [0,1].
If $u\leq \alpha$ accept the candidate by setting $x_{t+1}=x'$ ,
If $u>\alpha$ reject the candidate and set $x_{t+1}=x_{t}$ , instead.

MCMC(Metropolis-Hastings algorithm)를 통해 파라미터를 추정(파라미터를 샘플링)하는 것도 이와 비슷한 절차를 따른다.

1. 모델의 초기 파라미터 값을 정한다. (이는 랜덤으로 정해도 되고, 기존 문헌의 정보가 있다면 이를 활용한다.)

2. 파라미터를 랜덤하게 선택해서 "이후 파라미터"를 구한다.

3. 초기 파라미터에서의 cost와 이후 파라미터의 cost를 비교해, 이후 파라미터의 cost가 작으면 파라미터를 업데이트한다. (ex. cost = Mean absolute error)

4. 만약 이후 파라미터의 cost가 크면 alpha의 확률로 파라미터를 업데이트 한다.

5. alpha는 예를 들어 exp(-(cost(이후)-cost(이전)) 으로 정할 수 있다. (이후 cost가 크면 이 값은 0에서 1사이의 값을 같기 때문에 확률로 쓰는것)

최종적으로 MCMC를 통해 나오는 것은 파라미터의 분포이다. 이 분포에서 어떤 값을 파라미터의 추정값으로 쓸 것인가도 하나의 문제이다. 예를 들어, MLE, 평균 등을 쓸 수 있다.

참고

https://www.youtube.com/watch?v=h1NOS_wxgGg

'Data science' 카테고리의 다른 글

딥러닝 모델의 K-겹 교차검증 (K-fold Cross Validation) (2)	2018.08.15
R로 구현하는 Gibbs Sampling (1)	2018.05.17
데이터베이스 정규화 1NF, 2NF, 3NF, BCNF (38)	2017.10.19
Gradient Boosting 의 기초 (0)	2017.09.09
XGBoost 모델에서의 변수선택 (0)	2017.09.03

Deepplay interested in data analytics and ML modeling

admin write link

notice

블로그 운영 정보

my link

statistics

total :
today :
yesterday :

Data science