🧠
AI
  • Artificial Intelligence
  • Intuitive Maths behind AI
    • Probability
    • Information Theory
    • Linear Algebra
    • Calculus
  • Overview
  • Research Ideas and Philosophy
  • Basic Principles
  • Information Theory
    • Entropy
    • Log Probability
  • Probability & Statistics
    • Random Variables
    • Probability
      • Probablistic Equations
      • Bayes Theorem
      • Probability Distributions & Processes
    • Statistics
      • Measures
      • Z-Scores
      • Covariance and Correlation
      • Correlation vs Dependance
    • Mahalanobis vs Chi-Squared
    • Uncertainty
    • Statistical Inference
      • Graphical Models
      • Estimator vs Parameter
      • Estimation
      • Bayesian/Probabilistic Inference
        • Probabilistic Modelling
        • Problems of Bayesian Inference
        • Conjugate Priors
        • Dirichlet Distribution/Process
        • Posterior Predictive Distribution
      • Sampling-Based Inference
    • Sampling
      • Rejection Sampling
      • Reservoir Sampling
      • Thompson Sampling
    • Bayesian Inference
    • Regression
    • Markov
    • Monte Carlo
      • Monte Carlo Estimators
      • Importance Sampling
    • Kernel Density Estimation
    • Gaussian Processes
    • Gaussian Soap Bubble
  • Linear Algebra
    • Vector Space and Matrices
    • Geometry of System of Linear Equations
    • Determinants
    • Transformations
    • Geometrical Representation
    • Positive (Semi)Definite Matrices
    • Matrix Interpretation
    • Dot Product as Linear Transformation and Duality of Vector-Linear Transformation
    • Norms
    • Linear Least Square
    • Matrix Decomposition
      • QR Decomposition
      • Cholesky Decomposition
      • Eigen Value Decomposition
      • SVD - Singular Value Decomposition
    • Matrix Inversion
    • Matrix Calculus
    • Matrix Cookbook
    • Distributed Matrix Algebra
    • High Dimensional Spaces
  • Optimization
    • Derivatives
      • Partial Derivative
      • Directional Derivative
      • Gradient
      • Jacobian
    • Regularization
    • Gradient Descent
    • Newton's Method
    • Gauss-Newton
    • Levenberg–Marquardt
    • Conjugate Gradient
    • Implicit Function Theorem for optimization
    • Lagrange Multiplier
    • Powell's dog leg
    • Laplace Approximation
    • Cross Entropy Method
    • Implicit Function Theorem
  • Statistical Learning Theory
    • Expectation Maximization
  • Machine Learning
    • Clustering
    • Bias Variance Trade-off
  • Deep Learning
    • PreProcessing
    • Convolution Arithmetic
    • Regularization
    • Optimizers
    • Loss function
    • Activation Functions
    • Automatic Differentiation
    • Softmax Classifier and Cross Entropy
    • Normalization
    • Batch Normalization
    • Variational Inference
    • VAE: Variational Auto-Encoders
    • Generative vs Discriminative
      • Generative Modelling
    • Making GANs train
    • Dimensionality of Layer Vs Number of Layers
    • Deep learning techniques
    • Dilated Convolutions
    • Non-Maximum Suppression
    • Hard Negative Mining
    • Mean Average Precision
    • Fine Tuning or Transfer Learning
    • Hyper-parameter Tuning
  • Bayesian Deep Learning
    • Probabilistic View
    • Uncertainty
    • Variational Inference for Bayesian Neural Network
  • Reinforcement Learning
    • General
    • Multi-armed Bandit
    • Imitation Learning
    • MDP Equations
    • Solving MDP with known Model
    • Value Iteration
    • Model Free Prediction and Control
    • Off Policy vs On Policy
    • Control & Planning from RL perspective
    • Deep Reinforcement Learning
      • Value Function Approximation
      • Policy Gradient
        • Algorithms
    • Multi Agent Reinforcement Learning
    • Reinforcement Learning - Sutton and Barto
      • Chapter 3: Finite Markov Decision Processes
      • Chapter 4: Dynamic Programming
    • MBRL
  • Transformers
    • Tokenziation
    • Embedding
      • Word Embedding
      • Positional Encoding
    • Encoder
    • Decoder
    • Multi-head Attention Block
    • Time Complexities of Self-Attention
    • KV Cache
    • Multi-head Latent Attention
    • Speculative Decoding
    • Flash Attention
    • Metrics
  • LLMs
    • LLM Techniques
    • LLM Post-training
    • Inference/Test Time Scaling
    • Reasoning Models
    • Reward Hacking
  • Diffusion Models
    • ImageGen
  • Distributed Training
  • State Space Models
  • RLHF
  • Robotics
    • Kalman Filter
    • Unscented Kalman Filter
  • Game Theory and ML
    • 1st Lecture - 19/01
    • Lecture 2 - 22/01
    • Lecture 4: Optimization
  • Continual Learning
    • Lecture - 21/01
    • iCaRL: Incremental Classifier and Representation Learning
    • Variational Continual Learning
  • Computer Vision
    • Hough Transform
    • Projective Geometry
      • Extrinsic and Intrinsic Parameters
      • Image Rectification
    • Tracking
    • Optical Flow
    • Harris Corner
    • Others
  • Papers
    • To Be Read
    • Probabilistic Object Detection and Uncertainty Estimation
      • BayesOD
      • Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection
      • Gaussian YOLOv3
      • Dropout Sampling for Robust Object Detection in Open-Set Condition
      • *Sampling Free Epistemic Uncertainty Estimation using Approximated Variance Propagation
      • Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
      • Can We Trust You? On Calibration of Probabilistic Object Detector for Autonomous Driving
    • Object Detection
    • Temporal Fusion in Object Detection/ Video Object Detection
    • An intriguing failing of convolutional neural networks and the CoordConv solution
    • A Neural Algorithm of Artistic Style - A.Gatys
  • Deep Learning Book
    • Chapter 4: Optimization
    • Chapter 5: Machine Learning Basics
    • Chapter 6: Deep FeedForward Networks
  • Python
    • Decorators
    • Packages
      • Pip
    • Gotchas
    • Async functions
  • Computer Science
  • TensorFlow
  • Pytorch
    • RNN/LSTM in Pytorch
    • Dataset/ Data loader
    • Resuming/Loading Saved model
  • Programming
    • Unit Testing
    • How to write code
  • General Software Engineering
    • SSH tunneling and Ngrok
  • How To Do Research
  • Resources
  • ROS for python3
  • Kitti
Powered by GitBook
On this page
  • Resources
  • Summary
  • First Principal
  • Problems with Monte Carlo
  1. Probability & Statistics
  2. Monte Carlo

Monte Carlo Estimators

PreviousMonte CarloNextImportance Sampling

Last updated 2 years ago

Resources

Summary

Let's say we have to compute some quantity, which is difficult to compute, let's say some integral of a difficult function fff. Then what we do in this method, sample some points from some distribution and then evaluate that function on those points and then do some computation of those function evaluation at those points to estimate the quantity in interest.

It's basically using sampling from some simple known distribution to estimate something difficult. By evaluating the quantity on those samples points and taking average of all those to estimate the quantity.

First Principal

I think the first principal which monte carlo methods use is the relation between expectation vs integration. This relation relates the problems of numerical computation (computing some difficult quantity) to problem of expectation computation which can be easily computed as numerical average.

Let's say the difficult numerical computation is finding a integral of very difficult function fff, now the fff​ is very difficult such that analytic integration of fff​ which is ∫01f(x)dx\int_0^1 f(x)dx∫01​f(x)dx​ not is not feasible. So how do we estimate this integral

so now we can write ∫01f(x)dx\int_0^1 f(x)dx∫01​f(x)dx​as

∫01f(x)dx=∫01f(x)β‹…1dx=Ex∼U[0,1][f(x)]\int_0^1 f(x)dx = \int_0^1 f(x) \cdot 1dx = E_{x\sim U[0,1]}[f(x)]∫01​f(x)dx=∫01​f(x)β‹…1dx=Ex∼U[0,1]​[f(x)]

​So we converted the problem of integration to finding the expected value. Now to estimate Ex∼U[0,1][f(x)]E_{x\sim U[0,1]}[f(x)]Ex∼U[0,1]​[f(x)]we can use average. So we can calculate E[f(x)]=1Nβˆ‘i=1Nf(xi)E[f(x)] = \frac{1}{N} \sum_{i=1}^N f(x_i)E[f(x)]=N1β€‹βˆ‘i=1N​f(xi​)where xi∼U[0,1]x_i \sim U[0,1]xiβ€‹βˆΌU[0,1]. Now since the the interval of integration was [0,1] and we used Uniform distribution to sample the points, we could directly take the average of function evaluated at sample points. But if the can choose which distribution p(x)p(x)p(x)​ to use to sample xxx​ to evaluate fff​at. So for any arbitrary p(x)p(x)p(x)​we can write the above integral as.

∫f(x)dx=∫f(x)p(x)p(x)dx=Ex∼p(x)[f(x)p(x)]\int f(x) dx = \int \frac{f(x)}{p(x)} p(x) dx = E_{x\sim p(x)}[\frac{f(x)}{p(x)}]∫f(x)dx=∫p(x)f(x)​p(x)dx=Ex∼p(x)​[p(x)f(x)​]

​So basically this will give unbiased estimator to estimate the integral. The choice of p(x)p(x)p(x)​ dictates the convergence of monte carlo estimate to true value of integral.

The factor of 1p(x)\frac{1}{p(x)}p(x)1​​relates to the concept of Importance Sampling because when we sample xxx according to p(x)p(x)p(x)we need to counter that effect, hence downweighting by that factor.

Problems with Monte Carlo

Since there is sampling from a probability distribution, monte carlo method hardly sample in the very low probability regions, so it doesn't evaluate on those samples.

https://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/The_Monte_Carlo_Estimator
http://faculty.washington.edu/yenchic/17Sp_403/Lec2_MonteCarlo.pdf
https://ib.berkeley.edu/labs/slatkin/eriq/classes/guest_lect/mc_lecture_notes.pdf