Posts

Showing posts from April, 2026

Probability Distribution3: Beta and Dirichlet distribution

Introduction

The sixth post of my probability theory series focuses on the Beta distribution. Earlier posts in the series: Basic Probability, Two Random Variables, Chain Rule of Probability Theory, Probability Distribution1, and Probability Distribution2: Normal distribution.

The Beta distribution has a multivariate version called the Dirichlet distribution. The Dirichlet distribution used to be very popular in the Bayesian and natural language processing literature before the LLM era. A common one-line explanation of the Beta distribution is "a distribution over a probability". I expect most readers find this confusing, and so did I when I first heard it. Here is an example. I have visited Paris every year for the past 10 years. According to my travel history, that makes my chance of visiting Paris next year 100%. The Beta distribution asks this question: "how confident are we in this 100% chance of me visiting Paris next year?" It is a distribution ("our confidence") over probabilities ("...

Differentiation and optimisation

Introduction

Differential calculus is a core part of machine learning optimisation. Differentiation is often taught at school as finding a slope in geometry. While the theoretical concept of differentiation is important, I feel that the link between differentiation as taught at school and differentiation as used in machine learning optimisation is rarely made clear. This post aims to bridge the gap between school math and machine learning optimisation, focusing on how the mathematical concept (differentiation) underpins the practical algorithm (gradient descent) that machine learning relies on. The post begins with the very basic concept of finding a slope.

Finding a slope between two points

The general formula for the slope \(a\) of a line \( y = ax \) passing through two points \((x_1, y_1)\) and \((x_2, y_2)\) is: \[ a = \frac{y_2 - y_1}{x_2-x_1} \] The figure below illustrates the rise (vertical) and run (horizontal) changes between two points, Point 1 \((x = 0, y=0)\) and Point 2 \((x=2, y=4)\). The slope of this function is \( a = \frac{4...
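The rise-over-run formula can be checked with a few lines of Python. This is only a minimal sketch: the function name `slope` and the tuple representation of points are my own choices for illustration, and the two points are the ones given in the text.

```python
def slope(p1, p2):
    """Return rise over run between two points given as (x, y) tuples."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        # A zero run means a vertical line, where the slope is undefined.
        raise ValueError("run is zero; slope is undefined")
    return (y2 - y1) / (x2 - x1)


point_1 = (0, 0)  # Point 1 from the text
point_2 = (2, 4)  # Point 2 from the text

a = slope(point_1, point_2)
print(a)  # 2.0
```

Swapping the order of the two points gives the same value, because the signs of both the rise and the run flip together.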