Understanding kaldi lattice-prune and beam

Introduction

For a long time, I have been thinking that Kaldi lattice-prune uses a beam which keeps the specified number of candidates alive. How dare me. I finally took a time to properly understand this. 

There are several excellent Kaldi notes (for example Josh Meyer's blog), but I could not find information specifically about lattice-prune. My intension of this blog post is to leave information about Kaldi lattice-prune and --beam (assuming that someone uses Kaldi or hybrid ASR systems in 2022).

TL;DR 

lattice-prune --beam=4 for example deletes all paths of a lattice that exceed <best_path_cost> + 4, as can be seen in this line of the source code.

Explanation

To understand how Kaldi beam works, I took one of the utterances in the LibriSpeech dev set: 1272-128104-0000.

The lattice of this utterance should be in exp/chain_cleaned/tdnn_1d_sp/decode_dev_clean_tgsmall/lat.1.gz once the script kaldi/egs/librispeech/s5/run.sh is finished.

The whole lattice

The whole lattice of this utterance looks like this. There are many hypotheses in it.

This is the commands to generate the above visualisation.
lattice-to-fst \
  --lm-scale=0.0 \
  --acoustic-scale=1.0 \
  "ark:gunzip -c exp/chain_cleaned/tdnn_1d_spdecode_dev_clean_tgsmall/lat.1.gz |" \
  "scp,p:echo 1272-128104-0000 example_lattice_prune/1272-128104-0000.fst|"

fstdraw \
  --portrait=true \
  --osymbols=graph_tgsmall/words.txt \
  1272-128104-0000.fst | dot -Tpdf > 1272-128104-0000.pdf


10-best hypotheses

There are too many hypotheses to consider in the above lattice. To understand the effects of lattice-prune nicer, I generated 10-best hypotheses and total cost (AM+LM cost) of each hypothesis.
 # 10-best hypotheses
1272-128104-0000-1 MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-2 MISTER QUILLED HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-3 MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-4 MISTER QUILTER AS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-5 MISTER QUILT HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-6 MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-7 MISTER QUILTER EZ THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-8 MISTER QUILLED HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-9 MISTER QUILTER AS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL
1272-128104-0000-10 MISTER QUILT HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL

The followings are total cost of 10-best hypotheses.
# 10-best cost (AM+LM) cost
1272-128104-0000-1 -8795.8406
1272-128104-0000-2 -8790.8297
1272-128104-0000-3 -8790.7957
1272-128104-0000-4 -8790.562600000001
1272-128104-0000-5 -8790.4581
1272-128104-0000-6 -8789.1235
1272-128104-0000-7 -8786.115300000001
1272-128104-0000-8 -8785.7848
1272-128104-0000-9 -8785.517600000001
1272-128104-0000-10 -8785.413199999999

10-best hypotheses and AM/LM costs are generated with the following commands.
lattice-to-nbest \
  --acoustic-scale=1.0 \
  --n=10 \
  "ark:gunzip -c exp/chain_cleaned/decode_dev_clean_tgsmall/lat.1.gz |" \
  ark:- | \
    nbest-to-linear \
      ark:- \
      "ark:/dev/null" \
      'ark,t:|utils/int2sym.pl -f 2- exp/chain_cleaned/graph_tgsmall/words.txt > nbest-10' \
      'ark,t:lmcost-10' \
      "ark,t:amcost-10"

Apply pruning 

Finally, I have all the information necessary to examine how lattice-prune is applied with Kaldi. As mentioned at the beginning of this post, lattice-prune deletes all of the paths in a lattice that exceed <best_path_cost> + <beam>

Looking at the costs of the above 10-best hypotheses:
  • beam=4 would preserve only the 1-best hypothesis
    -8795.8406 + 4 = -8791.8406 and this value is still lower than any of the other hypotheses.

  • beam=5.04 would preserve {1,2}-best hypotheses
    -8795.8406 + 5.04 = -8790.8006 and this value is lower than 2nd best hypothesis but not 3rd best hypothesis.

  • beam=5.05 would preserve {1,2,3}-best hypotheses
    -8795.8406 + 5.04 = -8790.7906 and this value is lower than 2nd and 3rd best hypotheses.

This is the lattice after pruning with beam=4. As expected, all of the hypotheses but the 1st ranked hypothesis are removed.
Now changing to beam=5.04 and we can see that there are two alternatives in the pruned lattice.
Changing beam to 5.05 preserves another hypothesis in the lattice. Interestingly, preserving the 3rd best hypothesis in the lattice generates another hypothesis whose rank I am not sure about.

Conclusion

This blog post examined how Kaldi applies lattice-pruning --beam. A person like me can think that lattice-pruning --beam specifies the number of candidates alive at one node, but this is not correct. lattice-prune deletes all of the paths in a lattice that exceed <best_path_cost> + <beam>.

Comments

Popular posts from this blog

How wav2vec2.0 takes input audio data

Setup git pre-push hook

SLT 2022 Notes