Hidden Markov Model

Notes taken by Colin Bauer

12/03/01



Hidden Markov Models consist of the following:

States

Initialization

State Transition Matrix

Confusion Matrix



Example: Confusion Matrix for Weather

          Sunny    Cloudy   Rainy
Sunny     0.5      0.25     0.25
Cloudy    0.125    0.125    0.325
Rainy     0.125    0.625    0.375



Columns add to one, since each column is an exhaustive enumeration of cases and exactly one of them has to be true.

Rows do not have to add to 1. Example:




Here the row adds to one, but each column does not.

Confusion Matrix for Seaweed:

          Dry     Dryish   Damp    Soggy
Sunny     0.6     0.2      0.15    0.05
Cloudy    0.25    0.25     0.25    0.25
Rainy     0.05    0.1      0.35    0.5
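The seaweed matrix can be checked mechanically. A minimal sketch, assuming each row is read as P[observation | weather]; the names STATES, OBSERVATIONS, CONFUSION and emission_probability are made up for illustration:

```python
# Hidden weather states and observable seaweed conditions from the table.
STATES = ["Sunny", "Cloudy", "Rainy"]
OBSERVATIONS = ["Dry", "Dryish", "Damp", "Soggy"]

# Assumed reading: CONFUSION[i][j] = P[observation j | hidden state i].
CONFUSION = [
    [0.60, 0.20, 0.15, 0.05],  # Sunny
    [0.25, 0.25, 0.25, 0.25],  # Cloudy
    [0.05, 0.10, 0.35, 0.50],  # Rainy
]


def emission_probability(state, observation):
    """Look up P[observation | state] in the confusion matrix."""
    return CONFUSION[STATES.index(state)][OBSERVATIONS.index(observation)]


# Each row enumerates every possible observation for one state.
row_sums = [sum(row) for row in CONFUSION]

# Columns mix different conditioning states, so they need not sum to one.
column_sums = [
    sum(row[j] for row in CONFUSION) for j in range(len(OBSERVATIONS))
]

print("row sums:", row_sums)
print("column sums:", column_sums)
print("P[Soggy | Rainy] =", emission_probability("Rainy", "Soggy"))
```

With these values every row sums to one (up to floating-point rounding), while the column sums differ from one.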



Transition Matrix:






Markov assumption (n-th order model): predict the next state based on the n previous states.

Example: n=2

Given two sunny days, we want to compute the probability of the next day being cloudy:

P[cloudy | sunny, sunny]

sunny,sunny,cloudy,rainy,sunny

This is computed the following way: 1*p1*p8*p5*p3
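The product has one factor for the first state and one transition probability per step. A minimal sketch of that computation, assuming a first-order chain; the transition values are made-up placeholders (they are not the p1 ... p9 referenced above), and the names TRANSITION and sequence_probability are hypothetical:

```python
# Hypothetical transition probabilities: TRANSITION[current][next].
# Placeholder values for illustration only; each inner dict sums to one.
TRANSITION = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}


def sequence_probability(sequence, first_state_probability=1.0):
    """Probability of a state sequence under a first-order Markov chain.

    The first state contributes first_state_probability (1 if it is given),
    and every later state contributes one transition probability.
    """
    probability = first_state_probability
    for current, following in zip(sequence, sequence[1:]):
        probability *= TRANSITION[current][following]
    return probability


p = sequence_probability(["sunny", "sunny", "cloudy", "rainy", "sunny"])
print(p)
```

A sequence of five states yields four transition factors, matching the shape of the product above.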





Decoding






Why use phonemes?

All of the following get Markov Models:

P[words | signals] = P[words]*P[signals | words] / P[signals]

P[phoneme | signals] = P[phoneme]*P[signals | phoneme] / P[signals]

P[words | phoneme] = P[words]*P[phoneme | words] / P[phoneme]
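All three formulas are instances of Bayes' rule. A minimal sketch; the function name posterior, the candidate labels and every number below are made up for illustration. Since P[signals] is the same for every candidate, ranking candidates only needs the numerator:

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P[H | E] = P[H] * P[E | H] / P[E]."""
    if evidence <= 0:
        raise ValueError("P[E] must be positive")
    return prior * likelihood / evidence


# Hypothetical numbers for one word and one signal.
p_word = 0.01
p_signal_given_word = 0.2
p_signal = 0.004
p_word_given_signal = posterior(p_word, p_signal_given_word, p_signal)
print(p_word_given_signal)

# Hypothetical candidates: label -> (P[word], P[signal | word]).
candidates = {
    "word_a": (0.01, 0.20),
    "word_b": (0.03, 0.05),
}

# P[signal] is a shared denominator, so it does not change the ranking.
best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
print(best)
```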



Hidden Markov Model for phoneme [m]:






Simple case:

P[c1,c4,c6 | [m]] = ?

What paths could have created c1, c4, c6?

Only one: Onset -> Mid -> End, so P[c1,c4,c6 | [m]] = (0.7*0.1*0.6)*(0.5*0.7*0.5)



Complex case:

P[c1,c1,c4,c4,c6,c6 | [m]] = ?

Many paths, for example:

P1 = 0.3*0.7*0.9*0.1*0.4*0.6*(0.5*0.5*0.7*0.7*0.5*0.5)

P2 = ...

P = P1 + P2
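The path probability P1 can be evaluated numerically. A minimal sketch that uses only the factors written out for P1; reading the first group as transition probabilities and the parenthesized group as output probabilities is an assumption, and the function names are hypothetical:

```python
from math import prod

# Factors of P1 as written above. Assumed reading: the first six are
# transition probabilities along the path, the parenthesized six are the
# probabilities of emitting c1, c1, c4, c4, c6, c6 in the visited states.
transition_factors = [0.3, 0.7, 0.9, 0.1, 0.4, 0.6]
output_factors = [0.5, 0.5, 0.7, 0.7, 0.5, 0.5]


def path_probability(transitions, outputs):
    """Probability of one path: product of its transition and output factors."""
    return prod(transitions) * prod(outputs)


def total_probability(path_probabilities):
    """Sum over every path that could have produced the observations."""
    return sum(path_probabilities)


p1 = path_probability(transition_factors, output_factors)
print(p1)
```

P2 is left out here because its factors are not given; once they are known, total_probability([p1, p2]) gives P.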



P[words] is easier to estimate, since it is more informed: it is possible to give probabilities for a sequence of words.

Example:

of the (bigram: the next word depends only on the last word)

of the students (trigram: the next word depends on the last two words)
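Bigram and trigram probabilities can be estimated by counting. A minimal sketch using relative frequencies; the toy sentence is made up for illustration and the function names are hypothetical:

```python
from collections import Counter

# Toy corpus, made up for illustration.
tokens = (
    "most of the students and some of the teachers read all of the notes"
).split()

# Count every adjacent pair and triple of words.
bigram_counts = Counter(zip(tokens, tokens[1:]))
trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))


def bigram_probability(previous, word):
    """Estimate P[word | previous] by relative frequency."""
    context = sum(
        count for (first, _), count in bigram_counts.items()
        if first == previous
    )
    return bigram_counts[(previous, word)] / context if context else 0.0


def trigram_probability(first, second, word):
    """Estimate P[word | first, second] by relative frequency."""
    context = sum(
        count for (a, b, _), count in trigram_counts.items()
        if (a, b) == (first, second)
    )
    return trigram_counts[(first, second, word)] / context if context else 0.0


print(bigram_probability("of", "the"))
print(trigram_probability("of", "the", "students"))
```

A context that never occurs in the corpus returns 0.0 here; a real language model would smooth these estimates.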






Different pronunciations of tomato: