Competitors

Julia

We compare features among the following Julia packages: HiddenMarkovModels.jl, HMMBase.jl and HMMGradients.jl.

We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming that can perform MCMC or variational inference (e.g. Turing.jl), but we leave those aside.

|                            | HiddenMarkovModels.jl | HMMBase.jl       | HMMGradients.jl |
|----------------------------|-----------------------|------------------|-----------------|
| Algorithms [1]             | V, FB, BW             | V, FB, BW        | FB              |
| Number types               | anything              | Float64          | AbstractFloat   |
| Observation types          | anything              | number or vector | anything        |
| Observation distributions  | DensityInterface.jl   | Distributions.jl | manual          |
| Multiple sequences         | yes                   | no               | yes             |
| Priors / structures        | possible              | no               | possible        |
| Control dependency         | yes                   | no               | no              |
| Automatic differentiation  | yes                   | no               | yes             |
| Linear algebra speedup     | yes                   | yes              | no              |
| Numerical stability        | scaling+              | scaling+         | log             |
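
To make the comparison above concrete, here is a minimal usage sketch of HiddenMarkovModels.jl based on its documented interface (the `HMM(init, trans, dists)` constructor and the `logdensityof`, `viterbi`, `forward_backward` and `baum_welch` functions); exact return values may differ slightly depending on the installed version.

```julia
using Distributions, HiddenMarkovModels

# Two-state Gaussian HMM: initial distribution, transition matrix, observation distributions
init = [0.6, 0.4]
trans = [0.9 0.1; 0.2 0.8]
dists = [Normal(-1.0), Normal(1.0)]
hmm = HMM(init, trans, dists)

obs_seq = randn(100) .+ 1.0  # any sequence whose elements the distributions can score

logdensityof(hmm, obs_seq)                            # loglikelihood of the sequence
best_state_seq, _ = viterbi(hmm, obs_seq)             # V: most likely state sequence
state_marginals, _ = forward_backward(hmm, obs_seq)   # FB: posterior state marginals
hmm_est, loglik_evolution = baum_welch(hmm, obs_seq)  # BW: expectation-maximization fit
```
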
Very small probabilities

In all HMM algorithms, we work with probabilities that can become vanishingly small as time progresses. There are two main solutions to this problem: scaling and logarithmic computations. This package implements the Viterbi algorithm in log scale, while the other algorithms use scaling to exploit BLAS operations. As in HMMBase.jl, we enhance scaling with a normalization by the largest observation likelihood at each time step: instead of working with $b_{i,t} = \mathbb{P}(Y_t | X_t = i)$, we use $b_{i,t} / \max_i b_{i,t}$ (equivalently, we subtract $\max_i \log b_{i,t}$ in log space before exponentiating). See Formulas for details.
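
As an illustration, here is a standalone sketch (not the package's internal implementation) of a forward recursion with this enhanced scaling: observation loglikelihoods are shifted by their per-time-step maximum before exponentiating, and both the shift and the classical scaling factor are reintroduced when accumulating the loglikelihood.

```julia
# Forward algorithm with scaling enhanced by the max-loglikelihood shift (sketch only).
function forward_scaled(init, trans, logb)
    T = size(logb, 2)                   # logb[i, t] = log P(Y_t | X_t = i)
    α = similar(init)
    logL = 0.0
    for t in 1:T
        m = maximum(view(logb, :, t))   # largest observation loglikelihood at time t
        b = exp.(view(logb, :, t) .- m) # b_{i,t} / max_i b_{i,t}, safe to exponentiate
        α = t == 1 ? init .* b : (trans' * α) .* b
        c = sum(α)                      # classical scaling factor
        α ./= c
        logL += log(c) + m              # reintroduce both corrections
    end
    return α, logL
end

init  = [0.5, 0.5]
trans = [0.9 0.1; 0.2 0.8]
logb  = randn(2, 100) .- 800            # likelihoods that would underflow without the shift
α, logL = forward_scaled(init, trans, logb)
```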

Python

We compare features among the following Python packages:

|                            | hmmlearn           | pomegranate           | dynamax           |
|----------------------------|--------------------|-----------------------|-------------------|
| Algorithms [1]             | V, FB, BW, VI      | FB, BW                | FB, V, BW, GD     |
| Number types               | NumPy formats      | PyTorch formats       | JAX formats       |
| Observation types          | number or vector   | number or vector      | number or vector  |
| Observation distributions  | hmmlearn catalogue | pomegranate catalogue | dynamax catalogue |
| Multiple sequences         | yes                | yes                   | yes               |
| Priors / structures        | yes                | no                    | yes               |
| Control dependency         | no                 | no                    | yes               |
| Automatic differentiation  | no                 | yes                   | yes               |
| Linear algebra speedup     | yes                | yes                   | yes               |
| Numerical stability        | scaling / log      | log                   | log               |
[1]: V = Viterbi, FB = Forward-Backward, BW = Baum-Welch, VI = Variational Inference, GD = Gradient Descent