How wav2vec2.0 takes input audio data
Introduction

Building an automatic speech recognition (ASR) system has become easier than ever thanks to deep learning and the availability of ASR toolkits. These toolkits pre-process an audio file almost magically, and a neural model seamlessly consumes the processed audio features, so users need not know the details. This motivated me to write a summary of how audio files are processed for a neural model. In this post, I focus on wav2vec2.0, proposed by Baevski et al., which learns speech representations in a self-supervised manner. This post from Hugging Face is an excellent tutorial on fine-tuning wav2vec2.0 for the ASR task. Here I focus only on the input structure of wav2vec2.0 to complement that tutorial.

Fig. 1: Audio input processing of wav2vec2.0 discussed in this post is the part circled in red. This figure is taken from Baevski et al.
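Before diving in, it helps to see how little pre-processing wav2vec2.0 actually requires: the model takes the raw waveform itself (16 kHz, mono), normalized to floating-point values. Below is a minimal sketch using only the Python standard library; the file name `tone.wav` and the helper `load_waveform` are illustrative, not part of any toolkit.

```python
import math
import struct
import wave

def load_waveform(path):
    """Read a mono 16-bit PCM WAV file and return samples as floats in [-1, 1].

    wav2vec2.0 consumes the raw waveform directly (16 kHz mono), so this
    normalization is essentially all the pre-processing needed before the
    model's own convolutional feature encoder takes over.
    """
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1 and wf.getsampwidth() == 2
        n = wf.getnframes()
        raw = wf.readframes(n)
    samples = struct.unpack("<%dh" % n, raw)
    return [s / 32768.0 for s in samples]

# Create a 1-second 440 Hz sine tone at 16 kHz purely for demonstration.
sr = 16000
with wave.open("tone.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(sr)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * t / sr)))
        for t in range(sr)
    )
    wf.writeframes(frames)

waveform = load_waveform("tone.wav")
print(len(waveform))  # 16000 samples = 1 second of audio at 16 kHz
print(all(-1.0 <= s <= 1.0 for s in waveform))  # True
```

In a real pipeline a library such as torchaudio or soundfile would do the decoding, but the end result handed to the model is the same: a 1-D array of normalized samples.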