Posts

Two Random Variables

Introduction: In the previous post, Basic Probability, I discussed my chance of visiting Paris next year. I had a sample space like this: \( S = \{Meet, No\_more\_holiday, No\_money, Paris\_gone\_from\_Earth, ...\} \). The random variable \( X \) was all about me going to Paris next year. In this post I write about the case where we have two sample spaces, two outcomes and two random variables. Having two random variables means that we need to consider the following: both events happen together (joint probability), one event happens regardless of the other event (marginal probability), and one event happens given that the other event has happened (conditional probability). Let's define a simple sample space of me visiting Paris next year as \( S_X = \{me\_in\_paris, me\_not\_in\_paris\} \). Let's define another sample space so that we have two events at the same time. The second event is whether Emily goes to Paris next year. We now have anot...
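The three quantities named in this excerpt can be sketched with a small joint table. This is only an illustration: the probabilities are made up, and the outcome names `emily_in_paris` and `emily_not_in_paris` are my assumption for the second sample space, which the excerpt cuts off before defining.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(X = x, Y = y). The numbers are invented
# for illustration, and the Emily outcome names are assumed, not from the post.
joint = {
    ("me_in_paris", "emily_in_paris"): Fraction(3, 10),
    ("me_in_paris", "emily_not_in_paris"): Fraction(2, 10),
    ("me_not_in_paris", "emily_in_paris"): Fraction(1, 10),
    ("me_not_in_paris", "emily_not_in_paris"): Fraction(4, 10),
}


def marginal_x(table, x):
    """P(X = x): sum the joint probabilities over every outcome of Y."""
    return sum(p for (xi, _), p in table.items() if xi == x)


def marginal_y(table, y):
    """P(Y = y): sum the joint probabilities over every outcome of X."""
    return sum(p for (_, yi), p in table.items() if yi == y)


def conditional_x_given_y(table, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return table[(x, y)] / marginal_y(table, y)


p_joint = joint[("me_in_paris", "emily_in_paris")]
p_me = marginal_x(joint, "me_in_paris")
p_me_given_emily = conditional_x_given_y(joint, "me_in_paris", "emily_in_paris")
print(p_joint, p_me, p_me_given_emily)
```

With these invented numbers, learning that Emily goes to Paris changes the probability of me going, which is what separates the conditional probability from the marginal one.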

Basic Probability

Introduction: Probability theory is the foundation of machine learning. Knowledge of machine learning is a requirement for working on a speech and language processing project today. So, probability theory is essential for speech and language processing projects! The objective of this post is to refresh my knowledge of probability theory. I am keen to connect probability theory with real-world examples, and to avoid throwing around a bunch of theoretical definitions. Feel free to leave comments if my writing is incorrect. Probability: Probability is the chance that an event occurs. It is a value between 0 and 1. In contrast, human words are not mathematical. Even if I say "I'll go to Paris next year, 100%", I might not actually go to Paris next year. (Image caption: When I was in Paris last time.) Theoretical, mathematical probability has to be precise, unlike human words. A ...

Acoustic features for speech processing

Introduction: This post summarises acoustic features for various speech processing tasks. Automatic speech recognition (ASR) is one of the most studied speech processing tasks. Acoustic features for ASR include Mel-frequency cepstral coefficients (MFCCs) and spectrogram-based features such as Mel-spectrograms and Mel-filter banks. The choice of acoustic features depends on the choice of ASR model. Traditional machine learning (ML) models such as Gaussian mixture models (GMMs) have difficulty handling correlated features, so MFCCs are favoured because their coefficients are decorrelated. More recent deep-learning-based models, e.g., Conformer, use acoustic feature vectors whose neighbouring dimensions are correlated: filterbanks. A popular model from OpenAI, Whisper, directly takes as input a log-Mel spectrogram, which is technically the same representation as filterbanks (will be explained later). ...
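As a rough illustration of how the two feature families relate: MFCCs are conventionally obtained by applying a discrete cosine transform (DCT) to log Mel filterbank energies, and the DCT is the decorrelating step. The sketch below uses a plain, unnormalised DCT-II and made-up energies for a single frame. The function name `dct_ii` and the choice of keeping four coefficients are my assumptions, and real toolkits add scaling and other steps not shown here.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II of a sequence (no orthonormal scaling applied)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(values))
        for k in range(n)
    ]


# Made-up log Mel filterbank energies for one frame (illustrative only).
log_mel_frame = [2.1, 2.3, 2.2, 1.8, 1.5, 1.4, 1.0, 0.9]

# Cepstral coefficients: the DCT of the log filterbank energies.
cepstral = dct_ii(log_mel_frame)

# Keeping only the first few coefficients gives a compact MFCC-like vector.
mfcc_like = cepstral[:4]
print(mfcc_like)
```

Neighbouring values in `log_mel_frame` move together, which is the correlation between dimensions mentioned above. The DCT re-expresses the frame as an overall level plus progressively finer shape terms.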

Visualising a speech signal

Speech Visualisation: This post covers visualisation of a speech signal: plotting a waveform, annotating a waveform and showing speech spectra. I am using the first speech file (BASIC5000_0001) of the JSUT corpus, which consists of 10 hours of recordings of a Japanese female speaker. (Audio: JSUT ver 1.1, BASIC5000_0001.) My code is all written in this Python notebook: https://github.com/yasumori/blog/blob/main/2026/2026_01_visualisation.ipynb . You should be able to run it after installing the required libraries: librosa, matplotlib, and numpy. The first speech file is also uploaded to my GitHub, following the terms of use: "Re-distribution is not permitted, but you can upload a part of this corpus (e.g., ~100 audio files) in your website or blog".

    import librosa
    import subprocess

    # load audio at 16 kHz
    signal, sr = librosa.load("./data/BASIC5000_0001.wav", sr=16000)
    print(f"number of samples: {len(signal)}")
    print(f"duration {len(signal)/sr} ...

Decibel and Logarithms

Introduction: The decibel is a unit that expresses the loudness of sounds, and an important measurement in sound processing. The decibel is a logarithmic measure of sound intensity. The reason to use the logarithm is that human hearing is logarithmic rather than linear. The decibel is also not an absolute metric but a ratio of the intensity of one sound relative to another. This part is confusing because familiar units of length (cm, m, km) and weight (g, kg, ...) are absolute. There are many online resources that explain either logarithms or the decibel, but I don't see resources that explain both. This is my motivation for this post: keeping information about logarithms and the decibel on one page. Logarithms: The logarithm is the inverse operation of exponentiation (raising to a power). \[ 2^3 = 8 \] \[ \log_2 8 = 3 \] The log of 8 to the base 2 is 3, and 2 to...
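A minimal sketch of both ideas, using only Python's standard `math` module. It assumes the usual definition of the decibel for intensity ratios, \( 10 \log_{10}(I / I_0) \). The function name `intensity_ratio_to_db` and the example intensities are hypothetical.

```python
import math

# The logarithm undoes the exponent: 2 ** 3 == 8, so log base 2 of 8 is 3.
power = 2 ** 3
exponent = math.log2(power)


def intensity_ratio_to_db(intensity, reference):
    """Level in decibels of `intensity` relative to `reference` (both > 0)."""
    return 10.0 * math.log10(intensity / reference)


# The result depends only on the ratio, not on the absolute values.
print(exponent)
print(intensity_ratio_to_db(100.0, 1.0))  # a 100-fold increase in intensity
print(intensity_ratio_to_db(2.0, 1.0))    # a doubling of intensity
print(intensity_ratio_to_db(1.0, 1.0))    # the same intensity as the reference
```

Because only the ratio matters, scaling both arguments by the same factor leaves the decibel value unchanged, which is why the decibel is a relative measure and not an absolute one.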