Basics
Here we show how to use the essential ingredients of the package.
```julia
using Distributions
using HiddenMarkovModels
using LinearAlgebra
using Random
using StableRNGs

rng = StableRNG(63);
```

Model
The package provides a versatile `HMM` type with three main attributes:
- a vector `init` of state initialization probabilities
- a matrix `trans` of state transition probabilities
- a vector `dists` of observation distributions, one for each state
Any scalar- or vector-valued distribution from Distributions.jl can be used for the last part, as well as Custom distributions.
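For instance, an HMM with scalar observations only needs scalar distributions. Here is a minimal sketch, built from the three attributes listed above (the probabilities and means are made up for illustration; `Normal` comes from Distributions.jl):

```julia
# Sketch: an HMM with scalar Gaussian observations (illustrative values only).
scalar_hmm = HMM([0.6, 0.4], [0.7 0.3; 0.2 0.8], [Normal(-1.0), Normal(1.0)])
```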
```julia
init = [0.6, 0.4]
trans = [0.7 0.3; 0.2 0.8]
dists = [MvNormal([-0.5, -0.8], I), MvNormal([0.5, 0.8], I)]
hmm = HMM(init, trans, dists)
```
```
Hidden Markov Model with:
 - initialization: [0.6, 0.4]
 - transition matrix: [0.7 0.3; 0.2 0.8]
 - observation distributions: [IsoNormal(
dim: 2
μ: [-0.5, -0.8]
Σ: [1.0 0.0; 0.0 1.0]
)
, IsoNormal(
dim: 2
μ: [0.5, 0.8]
Σ: [1.0 0.0; 0.0 1.0]
)
]
```

Simulation
You can simulate a pair of state and observation sequences with `rand` by specifying how long you want them to be.
```julia
T = 20
state_seq, obs_seq = rand(rng, hmm, T);
```
The state sequence is a vector of integers.
```julia
state_seq[1:3]
```
```
3-element Vector{Int64}:
 2
 2
 1
```
The observation sequence is a vector whose elements have whatever type an observation distribution returns when sampled. Here we chose a multivariate normal distribution, so we get vectors at each time step.
In the case of multivariate observations, HMMBase.jl works with matrices, whereas HiddenMarkovModels.jl works with vectors of vectors. This allows us to accept more generic observations than just numbers or vectors inside the sequence.
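If your observations happen to be stored as a matrix (say one column per time step, a hypothetical `obs_matrix` below), a conversion along these lines yields the expected vector-of-vectors layout:

```julia
# Sketch: turn a 2×T matrix (one observation per column) into a vector of T vectors.
obs_matrix = randn(rng, 2, T)  # hypothetical matrix-shaped data, not used elsewhere
obs_seq_from_matrix = [collect(col) for col in eachcol(obs_matrix)]
```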
```julia
obs_seq[1:3]
```
```
3-element Vector{Vector{Float64}}:
 [0.46430432030033214, -0.8538832667367795]
 [0.07113025810845847, 0.35893904118945]
 [-0.8923441428307395, -0.5712530157810823]
```
In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.
Inference
The Viterbi algorithm (`viterbi`) returns:
- the most likely state sequence $\hat{X}_{1:T} = \underset{X_{1:T}}{\mathrm{argmax}}~\mathbb{P}(X_{1:T} \vert Y_{1:T})$,
- the joint loglikelihood $\log \mathbb{P}(\hat{X}_{1:T}, Y_{1:T})$ (in a vector of size 1).
```julia
best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);
only(best_joint_loglikelihood)
```
```
-57.69066363984475
```
As we can see, the most likely state sequence is very close to the true state sequence, but not necessarily equal.
```julia
(state_seq .== best_state_seq)'
```
```
1×20 adjoint(::BitVector) with eltype Bool:
 0  0  1  1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1
```
The forward algorithm (`forward`) returns:
- a matrix of filtered state marginals $\alpha[i, t] = \mathbb{P}(X_t = i | Y_{1:t})$,
- the loglikelihood $\log \mathbb{P}(Y_{1:T})$ of the observation sequence (in a vector of size 1).
```julia
filtered_state_marginals, obs_seq_loglikelihood_f = forward(hmm, obs_seq);
only(obs_seq_loglikelihood_f)
```
```
-54.214899298272634
```
At each time $t$, these filtered marginals take only the observations up to time $t$ into account. This is particularly useful to infer the marginal distribution of the last state.
```julia
filtered_state_marginals[:, T]
```
```
2-element Vector{Float64}:
 0.7717057517405977
 0.2282942482594023
```
The forward-backward algorithm (`forward_backward`) returns:
- a matrix of smoothed state marginals $\gamma[i, t] = \mathbb{P}(X_t = i | Y_{1:T})$,
- the loglikelihood $\log \mathbb{P}(Y_{1:T})$ of the observation sequence (in a vector of size 1).
```julia
smoothed_state_marginals, obs_seq_loglikelihood_fb = forward_backward(hmm, obs_seq);
only(obs_seq_loglikelihood_fb)
```
```
-54.214899298272634
```
At each time $t$, these smoothed marginals take all observations up to time $T$ into account. This is particularly useful during learning. Note that the filtered and smoothed marginals only coincide at the last time step.
```julia
filtered_state_marginals[:, T - 1] ≈ smoothed_state_marginals[:, T - 1]
```
```
false
```
```julia
filtered_state_marginals[:, T] ≈ smoothed_state_marginals[:, T]
```
```
true
```
Finally, we provide a thin wrapper (`logdensityof`) around the forward algorithm for observation sequence loglikelihoods $\log \mathbb{P}(Y_{1:T})$.
```julia
logdensityof(hmm, obs_seq)
```
```
-54.214899298272634
```
Another function (`joint_logdensityof`) can compute joint loglikelihoods $\log \mathbb{P}(X_{1:T}, Y_{1:T})$ which take the states into account.
```julia
joint_logdensityof(hmm, obs_seq, state_seq)
```
```
-60.170162049564425
```
For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.
```julia
joint_logdensityof(hmm, obs_seq, best_state_seq)
```
```
-57.69066363984476
```

Learning
The Baum-Welch algorithm (`baum_welch`) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.
```julia
init_guess = [0.5, 0.5]
trans_guess = [0.6 0.4; 0.3 0.7]
dists_guess = [MvNormal([-0.4, -0.7], I), MvNormal([0.4, 0.7], I)]
hmm_guess = HMM(init_guess, trans_guess, dists_guess);
```
Let's estimate parameters based on a slightly longer sequence.
```julia
_, long_obs_seq = rand(rng, hmm, 200)
hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);
```
An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.
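One quick way to verify this monotonicity on a given run is to check that the recorded loglikelihoods are sorted (a small sketch using `Base.issorted`):

```julia
# Sketch: the loglikelihood should be nondecreasing across Baum-Welch iterations.
issorted(loglikelihood_evolution)
```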
```julia
first(loglikelihood_evolution), last(loglikelihood_evolution)
```
```
(-613.0225125416881, -605.3027257918021)
```
We can check that the transition matrix estimate has improved.
```julia
cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)
```
```
2×2×2 Array{Float64, 3}:
[:, :, 1] =
 0.676764  0.323236
 0.197387  0.802613

[:, :, 2] =
 0.7  0.3
 0.2  0.8
```
And so have the estimates for the observation distributions.
```julia
map(mean, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))
```
```
2×2 Matrix{Vector{Float64}}:
 [-0.256414, -0.563875]  [-0.5, -0.8]
 [0.652218, 0.841332]    [0.5, 0.8]
```
On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.
```julia
hcat(initialization(hmm_est), initialization(hmm))
```
```
2×2 Matrix{Float64}:
 1.91765e-19  0.6
 1.0          0.4
```
Since HMMs are only identifiable up to a permutation of the states, there is no guarantee that state $i$ in the true model will correspond to state $i$ in the estimated model. This is important to keep in mind when testing new models.
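If you need to compare an estimate against a known reference model anyway, one ad hoc workaround (a sketch, not a package feature) is to match each true state to the estimated state with the closest observation mean:

```julia
# Sketch: align estimated states with true states by nearest observation mean.
# This only works here because we happen to know the true model.
est_means = mean.(obs_distributions(hmm_est))
true_means = mean.(obs_distributions(hmm))
state_match = [argmin([norm(m̂ - m) for m̂ in est_means]) for m in true_means]
```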
Multiple sequences
In many applications, we have access to various observation sequences of different lengths.
```julia
nb_seqs = 1000
long_obs_seqs = [last(rand(rng, hmm, rand(rng, 100:200))) for k in 1:nb_seqs];
typeof(long_obs_seqs)
```
```
Vector{Vector{Vector{Float64}}} (alias for Array{Array{Array{Float64, 1}, 1}, 1})
```
Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing `seq_ends` as a keyword argument. Otherwise, the input will be treated as a single observation sequence, which is mathematically incorrect.
```julia
long_obs_seq_concat = reduce(vcat, long_obs_seqs)
typeof(long_obs_seq_concat)
```
```
Vector{Vector{Float64}} (alias for Array{Array{Float64, 1}, 1})
```
```julia
seq_ends = cumsum(length.(long_obs_seqs))
seq_ends'
```
```
1×1000 adjoint(::Vector{Int64}) with eltype Int64:
 174  279  388  539  737  894  1007  1161  …  148712  148826  148945  149138
```
The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are split by sequence (in a vector of size `length(seq_ends)`).
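As a sanity check, these per-sequence loglikelihoods should match what separate calls of the forward algorithm produce on each individual sequence (a sketch reusing only functions introduced above):

```julia
# Sketch: loglikelihoods from a single concatenated call with seq_ends...
_, logls_concat = forward(hmm, long_obs_seq_concat; seq_ends)
# ...compared with separate calls on each individual sequence.
logls_separate = [only(last(forward(hmm, seq))) for seq in long_obs_seqs]
logls_concat ≈ logls_separate
```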
```julia
best_state_seq_concat, best_joint_loglikelihood_concat = viterbi(
    hmm, long_obs_seq_concat; seq_ends
);
length(best_joint_loglikelihood_concat) == length(seq_ends)
```
```
true
```
```julia
length(best_state_seq_concat) == last(seq_ends)
```
```
true
```
The function `seq_limits` returns the beginning and end of a given sequence in the concatenated vector. It can be used to untangle the results.
```julia
start2, stop2 = seq_limits(seq_ends, 2)
```
```
(175, 279)
```
```julia
best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seqs[2]))
```
```
true
```
While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.
```julia
hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);
```
Our estimate should be a little better.
```julia
cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)
```
```
2×2×2 Array{Float64, 3}:
[:, :, 1] =
 0.700421  0.299579
 0.203473  0.796527

[:, :, 2] =
 0.7  0.3
 0.2  0.8
```
```julia
map(mean, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))
```
```
2×2 Matrix{Vector{Float64}}:
 [-0.502073, -0.800874]  [-0.5, -0.8]
 [0.504019, 0.801399]    [0.5, 0.8]
```
```julia
hcat(initialization(hmm_est_concat), initialization(hmm))
```
```
2×2 Matrix{Float64}:
 0.605645  0.6
 0.394355  0.4
```

This page was generated using Literate.jl.