Probability

Notation

  • Uppercase $X$ denotes a random variable.

  • Uppercase $P(X)$ denotes the probability distribution over that variable.

  • Lowercase $x \sim P(X)$ denotes a value $x$ sampled ($\sim$) from the probability distribution $P(X)$ via some generative process.

  • Lowercase $p(X)$ is the density function of the distribution of $X$. It is a scalar function over the measure space of $X$.

  • $p(X=x)$ (shorthand $p(x)$) denotes the density function evaluated at a particular value $x$.

Probability Distribution

A probability distribution is a list of all of the possible outcomes of a random variable along with their corresponding probability values.

For example, for a fair coin flip: $P(\text{Heads}) = 0.5$ and $P(\text{Tails}) = 0.5$.

Probability Density Function

This is a function that represents the probability distribution of a random variable, denoted $p(x)$.

For discrete random variables:

$P(X=x) = p(x)$

For continuous random variables:

$P(a < X < b) = \int_a^b p(x)\,dx$

So basically the function $p$ is a way to associate a probability with each outcome of the random variable $X$.
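
As a quick check of the continuous case, here is a minimal sketch (assuming SciPy is available) showing that integrating the density over an interval gives the same probability as the CDF difference:

```python
# Sketch: for a continuous variable, P(a < X < b) is the integral
# of the density p(x) from a to b.
from scipy.integrate import quad
from scipy.stats import norm

a, b = -1.0, 1.0
area, _ = quad(norm.pdf, a, b)    # numerically integrate the standard normal pdf
print(area)                       # ~0.6827
print(norm.cdf(b) - norm.cdf(a))  # same probability via the CDF
```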

Note that density functions have parameters in them; for example, the normal distribution has mean and variance as its parameters. To show parameters we use the notation $p(x; \mu, \sigma)$ or $f(x \mid \mu, \sigma)$.
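
For instance, a minimal sketch (assuming SciPy's `scipy.stats.norm`) of evaluating $p(x; \mu, \sigma)$ at a point for given parameter values:

```python
# Evaluate the normal density p(x; mu, sigma) at x = 0.5.
from scipy.stats import norm

mu, sigma = 0.0, 1.0                          # example parameter values
density = norm.pdf(0.5, loc=mu, scale=sigma)  # p(x = 0.5; mu, sigma)
print(density)                                # ~0.3521
```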

Also, how do we distinguish two probability functions? Say you have three random variables $X, Y, Z$. Instead of using three different letters for their pdfs, like $f(x), g(y), h(z)$, you denote each pdf by $p$ and use the random variable's letter as a subscript: $p_X, p_Y, p_Z$. So when you write $p_X(a)$, it means the random variable $X$ taking the value $a$, not $Y$ or $Z$.

Summary

What Is Probability

Let’s suppose I want to bet on a soccer game between two teams of robots, Arduino Arsenal and C Milan. After thinking about it, I decide that there is an 80% probability of Arduino Arsenal winning. What do I mean by that? Here are three possibilities…

  • They’re robot teams, so I can make them play over and over again, and if I did that, Arduino Arsenal would win 8 out of every 10 games on average.

  • For any given game, I would agree that a bet on this game is “fair” only if a $1 bet on C Milan gives a $5 payoff (i.e. I get my $1 back plus a $4 reward for being correct), as would a $4 bet on Arduino Arsenal (i.e., my $4 bet plus a $1 reward).

  • My subjective “belief” or “confidence” in an Arduino Arsenal victory is four times as strong as my belief in a C Milan victory.

The Frequentist View

The frequentist view defines probability as a long-run frequency. Suppose we were to flip a fair coin over and over again. By definition, this is a coin that has $P(H) = 0.5$. What might we observe? As the number of flips grows, the proportion of flips that come up heads settles down around 0.5, and that long-run proportion is what the frequentist calls the probability of heads.
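
A small simulation sketch of this idea, using Python's standard library:

```python
# The running proportion of heads approaches P(H) = 0.5
# as the number of flips grows.
import random

random.seed(0)
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)  # proportion of heads, closer and closer to 0.5
```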

The frequentist definition of probability has some desirable characteristics. First, it is objective: the probability of an event is necessarily grounded in the world. The only way that probability statements can make sense is if they refer to (a sequence of) events that occur in the physical universe. Second, it is unambiguous: any two people watching the same sequence of events unfold, trying to calculate the probability of an event, must inevitably come up with the same answer.

The frequentist definition has a narrow scope. There are lots of things out there that human beings are happy to assign probability to in everyday language, but cannot (even in theory) be mapped onto a hypothetical sequence of events. For instance, if a meteorologist comes on TV and says, “the probability of rain in Adelaide on 2 November 2048 is 60%” we humans are happy to accept this. But it’s not clear how to define this in frequentist terms. There’s only one city of Adelaide, and only 2 November 2048. There’s no infinite sequence of events here, just a once-off thing. Frequentist probability genuinely forbids us from making probability statements about a single event.

The Bayesian View

The Bayesian view of probability is often called the subjectivist view. The most common way of thinking about subjective probability is to define the probability of an event as the degree of belief that an intelligent and rational agent assigns to the truth of that event.

However, in order for this approach to work, we need some way of operationalising “degree of belief”. One way to do this is to formalise it in terms of “rational gambling”, though there are many other ways. Suppose I believe that there’s a 60% probability of rain tomorrow, and someone offers me a bet: if it rains tomorrow, I win $5, but if it doesn’t rain, I lose $5. Clearly, from my perspective, this is a pretty good bet: on average I come out $1 ahead, since 0.6 × 5 − 0.4 × 5 = 1. On the other hand, if I think that the probability of rain is only 40%, then it’s a bad bet to take.
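
A tiny sketch of that calculation (the function name here is just for illustration):

```python
# Expected winnings of the rain bet under two degrees of belief.
def expected_winnings(p_rain, win=5.0, lose=5.0):
    return p_rain * win - (1 - p_rain) * lose

print(expected_winnings(0.6))  #  1.0 -> good bet if I believe P(rain) = 0.6
print(expected_winnings(0.4))  # -1.0 -> bad bet if I believe P(rain) = 0.4
```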

The main advantage is that it allows you to assign probabilities to any event you want to. You don’t need to be limited to those events that are repeatable. The main disadvantage (to many people) is that we can’t be purely objective – specifying a probability requires us to specify an entity that has the relevant degree of belief. This entity might be a human, an alien, a robot, or even a statistician, but there has to be an intelligent agent out there that believes in things. To many people this is uncomfortable: it seems to make probability arbitrary. While the Bayesian approach does require that the agent in question be rational (i.e., obey the rules of probability), it does allow everyone to have their own beliefs; I can believe the coin is fair and you don’t have to, even though we’re both rational.

Probability vs Likelihood

Probability is a number between 0 and 1 for some event happening, whereas likelihood is a value that tells you the relative chances of an outcome. To understand the difference, consider a continuous random variable with a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$. The probability of the random variable being exactly $\mu$ is 0, but the likelihood there is greatest. Probability is given by the area under the distribution curve, whereas likelihood is the value of the curve at a point.
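
A small sketch of the same point (assuming SciPy) for a normal distribution:

```python
# Likelihood is the height of the density curve at a point;
# probability is the area under the curve over an interval.
from scipy.stats import norm

mu, sigma = 0.0, 1.0
print(norm.pdf(mu, mu, sigma))          # ~0.3989, the peak height at mu
print(norm.cdf(mu + sigma, mu, sigma)
      - norm.cdf(mu - sigma, mu, sigma))  # ~0.6827, area within one sigma
```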

Stationarity

If a process is stationary, then its density (probability distribution) doesn't change with time.

Why is the likelihood function not a pdf?

The likelihood is the same expression as the density, but viewed as a function of the parameters with the data held fixed, e.g. $L(\mu, \sigma; x) = p(x; \mu, \sigma)$. Integrating it over the parameters does not, in general, give 1, so it is not a probability density over the parameters.

Inverse transform sampling

The problem that the inverse transform sampling method solves is as follows: given the ability to generate uniform random numbers on $[0,1]$, draw samples from some other distribution whose cumulative distribution function (CDF) is known and invertible.

The inverse transform sampling method works as follows:

  1. Generate a random number $u$ from the standard uniform distribution on the interval $[0,1]$, i.e. $U \sim \mathrm{Unif}[0,1]$.

  2. Find the inverse of the desired CDF, i.e. $F_X^{-1}(x)$.

  3. Compute $X = F_X^{-1}(u)$. The computed random variable $X$ has distribution $F_X(x)$.

Expressed differently, given a continuous uniform variable $U$ on $[0,1]$ and an invertible cumulative distribution function $F_X$, the random variable $X = F_X^{-1}(U)$ has distribution $F_X$ (or, $X$ is distributed as $F_X$).

So this can basically be used to draw samples from many different probability distributions.
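
A minimal sketch for the exponential distribution, where the inverse CDF has a closed form: $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\ln(1-u)/\lambda$.

```python
# Inverse transform sampling for Exp(lam).
import math
import random

random.seed(0)
lam = 2.0  # example rate parameter
samples = [-math.log(1.0 - random.random()) / lam for _ in range(5)]
print(samples)  # five draws distributed as Exp(lam)
```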

Sampling from Categorical Distribution
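
A sketch of one way to do this, following the inverse transform idea above: draw $u \sim \mathrm{Unif}[0,1]$ and pick the first category whose cumulative probability exceeds $u$ (the probabilities below are made-up example values).

```python
import random

def sample_categorical(probs):
    """Return the index of a category drawn with the given probabilities."""
    u = random.random()
    cumulative = 0.0
    for category, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return category
    return len(probs) - 1  # guard against floating-point round-off

print(sample_categorical([0.2, 0.5, 0.3]))  # returns 0, 1, or 2
```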

Support of a Function (the function can be a probability density)

In mathematics, the support of a real-valued function $f$ is the subset of the domain containing those elements which are not mapped to zero.

The set-theoretic support of $f$, written $\mathrm{supp}(f)$, is the set of points in $X$ where $f$ is non-zero, where $X$ is the domain of $f$:

$\mathrm{supp}(f) = \{x \in X \mid f(x) \neq 0\}$

The support of $f$ is the smallest subset of $X$ with the property that $f$ is zero on the subset's complement. If $f(x) = 0$ for all but a finite number of points $x$ in $X$, then $f$ is said to have finite support.

Support of Random Variable

In the case of a probability distribution, the support of the random variable is the same as the support of its probability density function; the two are often used interchangeably.

For discrete random variables, it is the set of all the realizations that have a strictly positive probability of being observed.

Example: if a discrete random variable $X$ has probability mass function $p_X$, its support, denoted by $R_X$, is $R_X = \{x : p_X(x) > 0\}$. For instance, for a fair die, $R_X = \{1, 2, 3, 4, 5, 6\}$.
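
A tiny sketch of that definition in code, using a made-up example pmf:

```python
# The support of a discrete random variable is the set of outcomes
# with strictly positive probability.
pmf = {0: 0.0, 1: 0.25, 2: 0.75}  # example pmf as outcome -> probability
support = {x for x, p in pmf.items() if p > 0}
print(support)  # {1, 2}
```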
