# Basics

Here we show how to use the essential ingredients of the package.

```
using Distributions
using HiddenMarkovModels
using LinearAlgebra
using Random
using StableRNGs
```

`rng = StableRNG(63);`

## Model

The package provides a versatile `HMM`

type with three main attributes:

- a vector
`init`

of state initialization probabilities - a matrix
`trans`

of state transition probabilities - a vector
`dists`

of observation distributions, one for each state

Any scalar- or vector-valued distribution from Distributions.jl can be used for the last part, as well as Custom distributions.

```
init = [0.6, 0.4]
trans = [0.7 0.3; 0.2 0.8]
dists = [MvNormal([-0.5, -0.8], I), MvNormal([0.5, 0.8], I)]
hmm = HMM(init, trans, dists)
```

```
Hidden Markov Model with:
- initialization: [0.6, 0.4]
- transition matrix: [0.7 0.3; 0.2 0.8]
- observation distributions: [IsoNormal(
dim: 2
μ: [-0.5, -0.8]
Σ: [1.0 0.0; 0.0 1.0]
)
, IsoNormal(
dim: 2
μ: [0.5, 0.8]
Σ: [1.0 0.0; 0.0 1.0]
)
]
```

## Simulation

You can simulate a pair of state and observation sequences with `rand`

by specifying how long you want them to be.

```
T = 20
state_seq, obs_seq = rand(rng, hmm, T);
```

The state sequence is a vector of integers.

`state_seq[1:3]`

```
3-element Vector{Int64}:
2
2
1
```

The observation sequence is a vector whose elements have whatever type an observation distribution returns when sampled. Here we chose a multivariate normal distribution, so we get vectors at each time step.

In the case of multivariate observations, HMMBase.jl works with matrices, whereas HiddenMarkovModels.jl works with vectors of vectors. This allows us to accept more generic observations than just numbers or vectors inside the sequence.

`obs_seq[1:3]`

```
3-element Vector{Vector{Float64}}:
[0.46430432030033214, -0.8538832667367795]
[0.07113025810845847, 0.35893904118945]
[-0.8923441428307395, -0.5712530157810823]
```

In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.

## Inference

The **Viterbi algorithm** (`viterbi`

) returns:

- the most likely state sequence $\hat{X}_{1:T} = \underset{X_{1:T}}{\mathrm{argmax}}~\mathbb{P}(X_{1:T} \vert Y_{1:T})$,
- the joint loglikelihood $\mathbb{P}(\hat{X}_{1:T}, Y_{1:T})$ (in a vector of size 1).

```
best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);
only(best_joint_loglikelihood)
```

`-57.69066363984475`

As we can see, the most likely state sequence is very close to the true state sequence, but not necessarily equal.

`(state_seq .== best_state_seq)'`

```
1×20 adjoint(::BitVector) with eltype Bool:
0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
```

The **forward algorithm** (`forward`

) returns:

- a matrix of filtered state marginals $\alpha[i, t] = \mathbb{P}(X_t = i | Y_{1:t})$,
- the loglikelihood $\mathbb{P}(Y_{1:T})$ of the observation sequence (in a vector of size 1).

```
filtered_state_marginals, obs_seq_loglikelihood_f = forward(hmm, obs_seq);
only(obs_seq_loglikelihood_f)
```

`-54.214899298272634`

At each time $t$, these filtered marginals take only the observations up to time $t$ into account. This is particularly useful to infer the marginal distribution of the last state.

`filtered_state_marginals[:, T]`

```
2-element Vector{Float64}:
0.7717057517405977
0.2282942482594023
```

The forward-backward algorithm (`forward_backward`

) returns:

- a matrix of smoothed state marginals $\gamma[i, t] = \mathbb{P}(X_t = i | Y_{1:T})$,
- the loglikelihood $\mathbb{P}(Y_{1:T})$ of the observation sequence (in a vector of size 1).

```
smoothed_state_marginals, obs_seq_loglikelihood_fb = forward_backward(hmm, obs_seq);
only(obs_seq_loglikelihood_fb)
```

`-54.214899298272634`

At each time $t$, it takes all observations up to time $T$ into account. This is particularly useful during learning. Note that forward and forward-backward only coincide at the last time step.

`filtered_state_marginals[:, T - 1] ≈ smoothed_state_marginals[:, T - 1]`

`false`

`filtered_state_marginals[:, T] ≈ smoothed_state_marginals[:, T]`

`true`

Finally, we provide a thin wrapper (`logdensityof`

) around the forward algorithm for observation sequence loglikelihoods $\mathbb{P}(Y_{1:T})$.

`logdensityof(hmm, obs_seq)`

`-54.214899298272634`

Another function (`joint_logdensityof`

) can compute joint loglikelihoods $\mathbb{P}(X_{1:T}, Y_{1:T})$ which take the states into account.

`joint_logdensityof(hmm, obs_seq, state_seq)`

`-60.170162049564425`

For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.

`joint_logdensityof(hmm, obs_seq, best_state_seq)`

`-57.69066363984476`

## Learning

The Baum-Welch algorithm (`baum_welch`

) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.

```
init_guess = [0.5, 0.5]
trans_guess = [0.6 0.4; 0.3 0.7]
dists_guess = [MvNormal([-0.4, -0.7], I), MvNormal([0.4, 0.7], I)]
hmm_guess = HMM(init_guess, trans_guess, dists_guess);
```

Let's estimate parameters based on a slightly longer sequence.

```
_, long_obs_seq = rand(rng, hmm, 200)
hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);
```

An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.

`first(loglikelihood_evolution), last(loglikelihood_evolution)`

`(-613.0225125416881, -605.3027257918021)`

We can check that the transition matrix estimate has improved.

`cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)`

```
2×2×2 Array{Float64, 3}:
[:, :, 1] =
0.676764 0.323236
0.197387 0.802613
[:, :, 2] =
0.7 0.3
0.2 0.8
```

And so have the estimates for the observation distributions.

`map(mean, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))`

```
2×2 Matrix{Vector{Float64}}:
[-0.256414, -0.563875] [-0.5, -0.8]
[0.652218, 0.841332] [0.5, 0.8]
```

On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.

`hcat(initialization(hmm_est), initialization(hmm))`

```
2×2 Matrix{Float64}:
1.91765e-19 0.6
1.0 0.4
```

Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state $i$ in the true model will correspond to state $i$ in the estimated model. This is important to keep in mind when testing new models.

## Multiple sequences

In many applications, we have access to various observation sequences of different lengths.

```
nb_seqs = 300
long_obs_seqs = [last(rand(rng, hmm, rand(rng, 100:200))) for k in 1:nb_seqs];
typeof(long_obs_seqs)
```

`Vector{Vector{Vector{Float64}}} (alias for Array{Array{Array{Float64, 1}, 1}, 1})`

Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing `seq_ends`

as a keyword argument. Otherwise, the input will be treated as a unique observation sequence, which is mathematically incorrect.

```
long_obs_seq_concat = reduce(vcat, long_obs_seqs)
typeof(long_obs_seq_concat)
```

`Vector{Vector{Float64}} (alias for Array{Array{Float64, 1}, 1})`

```
seq_ends = cumsum(length.(long_obs_seqs))
seq_ends'
```

```
1×300 adjoint(::Vector{Int64}) with eltype Int64:
174 279 388 539 737 894 1007 … 44408 44557 44737 44855 45032
```

The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are split by sequence (in a vector of size `length(seq_ends)`

).

```
best_state_seq_concat, best_joint_loglikelihood_concat = viterbi(
hmm, long_obs_seq_concat; seq_ends
);
```

`length(best_joint_loglikelihood_concat) == length(seq_ends)`

`true`

`length(best_state_seq_concat) == last(seq_ends)`

`true`

The function `seq_limits`

returns the begin and end of a given sequence in the concatenated vector. It can be used to untangle the results.

`start2, stop2 = seq_limits(seq_ends, 2)`

`(175, 279)`

`best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seqs[2]))`

`true`

While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.

`hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);`

Our estimate should be a little better.

`cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)`

```
2×2×2 Array{Float64, 3}:
[:, :, 1] =
0.69677 0.30323
0.207515 0.792485
[:, :, 2] =
0.7 0.3
0.2 0.8
```

`map(mean, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))`

```
2×2 Matrix{Vector{Float64}}:
[-0.503771, -0.796337] [-0.5, -0.8]
[0.509728, 0.805274] [0.5, 0.8]
```

`hcat(initialization(hmm_est_concat), initialization(hmm))`

```
2×2 Matrix{Float64}:
0.629082 0.6
0.370918 0.4
```

*This page was generated using Literate.jl.*