Dirichlet Distribution/Process
Dirichlet distributions are commonly used as prior distributions in Bayesian statistics, and in fact the Dirichlet distribution is the conjugate prior of the categorical distribution and multinomial distribution.
The Dirichlet distribution of order K ≥ 2 with parameters α1, ..., αK > 0 has a probability density function with respect to Lebesgue measure on the Euclidean space R^(K−1) given by

f(x_1, \ldots, x_K; \alpha_1, \ldots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}

where x_1, ..., x_K belong to the standard (K − 1)-simplex, or in other words:

\sum_{i=1}^{K} x_i = 1 \quad \text{and} \quad x_i \ge 0 \ \text{for all} \ i \in \{1, \ldots, K\}
Here each αi can be interpreted as a "prior observation count" for category i.
The normalizing constant B(α) is the multivariate beta function, which can be expressed in terms of the gamma function:

B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}, \qquad \alpha = (\alpha_1, \ldots, \alpha_K)
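As a quick check of this formula, here is a minimal sketch in Python (standard library only) that computes B(α) in log space via `math.lgamma` for numerical stability; for K = 2 the Dirichlet reduces to the Beta distribution, so B(1, 1) should equal 1:

```python
import math

def multivariate_beta(alpha):
    """Multivariate beta function: B(alpha) = prod(Gamma(a_i)) / Gamma(sum(a_i)).

    Computed in log space with lgamma to avoid overflow for large alpha.
    """
    log_b = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    return math.exp(log_b)

# Sanity checks against the ordinary Beta function (K = 2 case):
print(multivariate_beta([1.0, 1.0]))  # -> 1.0 (uniform density on the simplex)
print(multivariate_beta([2.0, 2.0]))  # -> 0.1666..., i.e. 1/6
```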
The Dirichlet distribution is the conjugate prior of the categorical distribution (a generic discrete distribution with a given number of possible outcomes) and of the multinomial distribution (the distribution over observed counts of each possible category in a set of categorically distributed observations). This means that if a data point has either a categorical or multinomial distribution, and the prior distribution of the distribution's parameter (the vector of probabilities that generates the data point) is a Dirichlet, then the posterior distribution of the parameter is also a Dirichlet. Intuitively, in such a case, starting from what we know about the parameter prior to observing the data point, we can update our knowledge based on the data point and end up with a new distribution of the same form as the old one. This means that we can successively update our knowledge of a parameter by incorporating new observations one at a time, without running into mathematical difficulties.
Formally, this can be expressed as follows. Given a model

\alpha = (\alpha_1, \ldots, \alpha_K) \quad \text{(concentration hyperparameters)}
p \mid \alpha \sim \operatorname{Dir}(K, \alpha)
X \mid p \sim \operatorname{Cat}(K, p)

then the following holds:

c = (c_1, \ldots, c_K) \quad \text{(number of observations in each category)}
p \mid X, \alpha \sim \operatorname{Dir}(K, c + \alpha) = \operatorname{Dir}(K, c_1 + \alpha_1, \ldots, c_K + \alpha_K)
This relationship is used in Bayesian statistics to estimate the underlying parameter p of a categorical distribution given a collection of N samples. Intuitively, we can view the vector α as pseudocounts, i.e. as representing the number of observations in each category that we have already seen (or guessed). Then we simply add in the counts for all the new observations (the vector c) in order to derive the posterior distribution.
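This pseudocount update can be sketched in a few lines of Python; the prior values and counts below are made-up illustrations, not from the text:

```python
def dirichlet_posterior(alpha_prior, counts):
    """Conjugate update: Dir(alpha) prior + categorical counts c -> Dir(alpha + c)."""
    return [a + c for a, c in zip(alpha_prior, counts)]

def posterior_mean(alpha):
    """Mean of a Dirichlet: E[p_i] = alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Uniform prior over a 6-sided die, then 30 observed rolls (hypothetical counts):
alpha = [1, 1, 1, 1, 1, 1]
counts = [10, 4, 5, 4, 3, 4]
alpha_post = dirichlet_posterior(alpha, counts)
print(alpha_post)                    # -> [11, 5, 6, 5, 4, 5]
print(posterior_mean(alpha_post)[0]) # -> 11/36 ≈ 0.3055...
```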
In Bayesian mixture models and other hierarchical Bayesian models with mixture components, Dirichlet distributions are commonly used as the prior distributions for the categorical variables appearing in the models. See the discussion of the Dirichlet process below for more information.
Let's say you have a biased die, i.e. the probability of each face (class) is not equal. Rolling it once gives a categorical distribution, and rolling it multiple times gives a multinomial distribution, in both cases with unknown parameters.
Now you want to estimate the parameters of that categorical/multinomial distribution, i.e. what is the probability of each face of the die? The parameters of the multinomial distribution are given as

p = (p_1, \ldots, p_6), \qquad \sum_{i} p_i = 1

where p_i denotes the probability of the output belonging to class i.
So how would we estimate these parameters p_i, the probability of each class i?
Solution:
Roll the die many times, say N = 30, and record the frequency of each output (class). Suppose class i appears c_i times out of the N rolls; a natural estimate of p_i is then c_i / N. For example, if face 1 comes up 10 times out of 30, we would estimate p_1 ≈ 10/30 ≈ 0.33. So we are estimating the parameters, which are random variables here, from the simulated rolls. But note that this is only an estimate: 0.33 is not the only value possible for p_1. Hence we can associate a probability distribution to each p_i based on the observed counts c_i. This probability distribution over the p_i given the counts is nothing but the Dirichlet distribution.
And to be precise, 0.33 is the mean of the random variable p_1. Under a Dir(α) distribution, for every i we have

E[p_i] = \frac{\alpha_i}{\sum_{j} \alpha_j}
and the complete distribution over (p_1, \ldots, p_K) is given by the Dirichlet density

f(p_1, \ldots, p_K; \alpha_1, \ldots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} p_i^{\alpha_i - 1}
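The whole dice experiment can be simulated; in this small sketch the true probabilities are invented for illustration. It rolls a biased die N = 30 times and computes the frequency estimates c_i / N:

```python
import random

random.seed(0)  # reproducible illustration

true_p = [0.30, 0.10, 0.15, 0.15, 0.10, 0.20]  # hypothetical biased die
N = 30
rolls = random.choices(range(6), weights=true_p, k=N)
counts = [rolls.count(face) for face in range(6)]

# Frequency estimate of each parameter: p_i ≈ c_i / N
estimates = [c / N for c in counts]
print(counts)     # raw frequencies, summing to 30
print(estimates)  # point estimates, summing to 1
```

With only 30 rolls the estimates scatter noticeably around the true p_i, which is exactly the parameter uncertainty the Dirichlet distribution captures.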
Detailed PDF on Dirichlet Distribution
A Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables, that is, how likely it is that the random variables are distributed according to one or another particular distribution.
The Dirichlet process is specified by a base distribution H and a positive real number α called the concentration parameter (also known as the scaling parameter). The base distribution is the expected value of the process, i.e., the Dirichlet process draws distributions "around" the base distribution the way a normal distribution draws real numbers around its mean. However, even if the base distribution is continuous, the distributions drawn from the Dirichlet process are almost surely discrete. The scaling parameter specifies how strong this discretization is: in the limit α → 0, the realizations are all concentrated at a single value, while in the limit α → ∞ the realizations become continuous. Between the two extremes the realizations are discrete distributions with less and less concentration as α increases.
The Dirichlet process can also be seen as the infinite-dimensional generalization of the Dirichlet distribution. In the same way as the Dirichlet distribution is the conjugate prior for the categorical distribution, the Dirichlet process is the conjugate prior for infinite, nonparametric discrete distributions. A particularly important application of Dirichlet processes is as a prior probability distribution in infinite mixture models.
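One concrete way to sample from a Dirichlet process is the stick-breaking construction: repeatedly break off Beta(1, α)-distributed proportions of a unit-length stick and attach each piece's weight to an atom drawn from the base distribution H. A truncated sketch (standard library only; the Gaussian base distribution and α = 5 below are arbitrary choices for illustration):

```python
import random

def stick_breaking_dp(alpha, base_sampler, n_atoms=1000):
    """Truncated stick-breaking draw from DP(alpha, H).

    Returns (atoms, weights): atoms sampled from the base distribution H,
    weights obtained from Beta(1, alpha) stick-breaking proportions.
    """
    atoms, weights = [], []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(n_atoms):
        b = random.betavariate(1.0, alpha)  # proportion of the remaining stick
        weights.append(remaining * b)
        atoms.append(base_sampler())
        remaining *= 1.0 - b
    return atoms, weights

random.seed(1)
atoms, weights = stick_breaking_dp(alpha=5.0,
                                   base_sampler=lambda: random.gauss(0.0, 1.0))
# The draw is a discrete distribution even though H (a Gaussian) is continuous;
# with 1000 atoms the truncated weights sum to essentially 1.
print(sum(weights))
```

Small α puts most of the weight on the first few atoms (strong discretization); large α spreads the weight over many atoms, matching the limits described above.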