Understanding kaldi lattice-prune and beam
Introduction
For a long time, I have been thinking that Kaldi lattice-prune uses a beam which keeps the specified number of candidates alive. How dare me. I finally took a time to properly understand this.
There are several excellent Kaldi notes (for example Josh Meyer's blog), but I could not find information specifically about lattice-prune. My intension of this blog post is to leave information about Kaldi lattice-prune and --beam (assuming that someone uses Kaldi or hybrid ASR systems in 2022).
TL;DR
lattice-prune --beam=4 for example deletes all paths of a lattice that exceed
<best_path_cost> + 4, as can be seen in
this line of the source code.
Explanation
To understand how Kaldi beam works, I took one of the utterances in the
LibriSpeech dev set:
1272-128104-0000.
The lattice of this utterance should be in exp/chain_cleaned/tdnn_1d_sp/decode_dev_clean_tgsmall/lat.1.gz once the script kaldi/egs/librispeech/s5/run.sh is finished.
The whole lattice
The whole lattice of this utterance looks like this. There are many
hypotheses in it.
This is the commands to generate the above visualisation.
lattice-to-fst \ --lm-scale=0.0 \ --acoustic-scale=1.0 \ "ark:gunzip -c exp/chain_cleaned/tdnn_1d_spdecode_dev_clean_tgsmall/lat.1.gz |" \ "scp,p:echo 1272-128104-0000 example_lattice_prune/1272-128104-0000.fst|" fstdraw \ --portrait=true \ --osymbols=graph_tgsmall/words.txt \ 1272-128104-0000.fst | dot -Tpdf > 1272-128104-0000.pdf
10-best hypotheses
There are too many hypotheses to consider in the above lattice. To
understand the effects of lattice-prune nicer, I generated 10-best hypotheses and total cost (AM+LM cost) of
each hypothesis.
# 10-best hypotheses 1272-128104-0000-1 MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-2 MISTER QUILLED HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-3 MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-4 MISTER QUILTER AS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-5 MISTER QUILT HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-6 MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-7 MISTER QUILTER EZ THE APOSTLE OF THE MIDDLE CLASSES AND WE'RE GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-8 MISTER QUILLED HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-9 MISTER QUILTER AS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL 1272-128104-0000-10 MISTER QUILT HER IS THE APOSTLE OF THE MIDDLE CLASSES AND WEIR GLAD TO WELCOME HIS GOSPEL
The followings are total cost of 10-best hypotheses.
# 10-best cost (AM+LM) cost 1272-128104-0000-1 -8795.8406 1272-128104-0000-2 -8790.8297 1272-128104-0000-3 -8790.7957 1272-128104-0000-4 -8790.562600000001 1272-128104-0000-5 -8790.4581 1272-128104-0000-6 -8789.1235 1272-128104-0000-7 -8786.115300000001 1272-128104-0000-8 -8785.7848 1272-128104-0000-9 -8785.517600000001 1272-128104-0000-10 -8785.413199999999
10-best hypotheses and AM/LM costs are generated with the following commands.
lattice-to-nbest \ --acoustic-scale=1.0 \ --n=10 \ "ark:gunzip -c exp/chain_cleaned/decode_dev_clean_tgsmall/lat.1.gz |" \ ark:- | \ nbest-to-linear \ ark:- \ "ark:/dev/null" \ 'ark,t:|utils/int2sym.pl -f 2- exp/chain_cleaned/graph_tgsmall/words.txt > nbest-10' \ 'ark,t:lmcost-10' \ "ark,t:amcost-10"
Apply pruning
Finally, I have all the information necessary to examine how lattice-prune is applied with Kaldi. As mentioned at the beginning of this
post, lattice-prune deletes all of the paths in a lattice that exceed <best_path_cost> + <beam>.
Looking at the costs of the above 10-best hypotheses:
-
beam=4 would preserve only the 1-best hypothesis
-8795.8406 + 4 = -8791.8406 and this value is still lower than any of the other hypotheses.
-
beam=5.04 would preserve {1,2}-best hypotheses
-8795.8406 + 5.04 = -8790.8006 and this value is lower than 2nd best hypothesis but not 3rd best hypothesis.
-
beam=5.05 would preserve {1,2,3}-best hypotheses
-8795.8406 + 5.04 = -8790.7906 and this value is lower than 2nd and 3rd best hypotheses.
This is the lattice after pruning with beam=4. As expected, all of the hypotheses but the 1st ranked hypothesis are removed.
Now changing to beam=5.04 and we can see that there are two alternatives in the pruned lattice.
Changing beam to 5.05 preserves another hypothesis in the lattice. Interestingly, preserving the 3rd best hypothesis in the lattice generates another hypothesis whose rank I am not sure about.
Conclusion
This blog post examined how Kaldi applies lattice-pruning --beam. A person like me can think that lattice-pruning --beam specifies the number of candidates alive at one node, but this is not correct. lattice-prune deletes all of the paths in a lattice that exceed <best_path_cost> + <beam>.
Comments
Post a Comment