Generative vs Discriminative

Formally, given a set of data instances X and a set of labels Y:

  • Generative models capture the joint probability p(X, Y), or just p(X) if there are no labels.

  • Discriminative models capture the conditional probability p(Y | X).
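
One compact way to relate the two definitions (a summary of the bullets above, not taken from the linked sources): both families describe the same data, but they learn different factors of it.

$$
p(X, Y) = p(X \mid Y)\,p(Y) \qquad \text{(generative: learn } p(X \mid Y) \text{ and } p(Y)\text{)}
$$

$$
p(Y \mid X) = \frac{p(X \mid Y)\,p(Y)}{\sum_{y'} p(X \mid y')\,p(y')} \qquad \text{(Bayes' theorem recovers a classifier from a generative model)}
$$

A discriminative model instead parametrizes $p(Y \mid X)$ directly and never models $p(X)$.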

A generative model includes the distribution of the data itself, and tells you how likely a given example is. For example, models that predict the next word in a sequence are typically generative models (usually much simpler than GANs) because they can assign a probability to a sequence of words.
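
To see why a next-word model counts as generative, note that by the chain rule it assigns a probability to the whole sequence (there are no labels here, so it is modelling $p(X)$ itself):

$$
p(w_1, \dots, w_n) = \prod_{i=1}^{n} p(w_i \mid w_1, \dots, w_{i-1})
$$

Each factor on the right is exactly what the model outputs at step $i$, so multiplying the per-step predictions gives the probability of the full sequence.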

Modeling Probabilities

Neither kind of model has to return a number representing a probability. You can model the distribution of data by imitating that distribution.

For example, a generative model can model a distribution by producing convincing "fake" data that looks like it's drawn from that distribution.

Both generative and discriminative models can estimate probabilities (but they don't have to).

Advantages of Generative Models

  • They are less prone to overfitting, since they learn the whole data distribution rather than just the decision boundary.

Disadvantages of Generative Models

  • Sensitive to outliers.

  • Need a large amount of data to learn the distribution.

Generative Modelling vs Discriminative Modelling

When we say "modelling the distribution", we mean learning a model that represents the distribution of the random variable.

Example

Let's say you have the following system:

  • The computer receives a telephone call.

  • It measures the pitch of the caller's voice.

  • It tells whether the speaker is male or female.

$x = \text{pitch}$, $y \in \{\text{male}, \text{female}\}$

Generative Modelling

The computer will learn the distribution of pitches for each gender, i.e. it will learn $p(x \mid y=\text{male})$ and $p(x \mid y=\text{female})$. In other words, we are modelling how the pitch varies with gender.

Note that this is basically learning the joint distribution $p(x, y)$: you have learned $p(x \mid y)$, and you can get $p(y)$ simply by counting how many samples of each gender you see during training. Hence $p(x, y) = p(x \mid y)\,p(y)$.

After modelling, when the computer is given a new pitch it compares $p(x \mid y=\text{male})$ and $p(x \mid y=\text{female})$ (each weighted by its prior $p(y)$, by Bayes' theorem). The gender with the higher value wins.
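
Below is a minimal NumPy sketch of this generative classifier (my own illustration with made-up pitch values, not code from any cited source): fit one Gaussian per gender for $p(x \mid y)$, estimate $p(y)$ by counting, then classify a new pitch by the larger $p(x \mid y)\,p(y)$. Because $p(x \mid y)$ is modelled explicitly, the same model can also sample "fake" pitches, as mentioned earlier.

```python
import numpy as np

# Toy training data: pitch in Hz (made-up numbers, for illustration only)
male_pitch   = np.array([110.0, 120.0, 95.0, 130.0, 105.0])
female_pitch = np.array([210.0, 195.0, 225.0, 200.0, 240.0])

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generative step: learn p(x | y) for each class (here, one Gaussian per gender)
params = {
    "male":   (male_pitch.mean(),   male_pitch.std(ddof=1)),
    "female": (female_pitch.mean(), female_pitch.std(ddof=1)),
}

# Prior p(y): just count how many samples of each gender we saw during training
n_total = len(male_pitch) + len(female_pitch)
prior = {"male": len(male_pitch) / n_total, "female": len(female_pitch) / n_total}

def classify(pitch):
    """Pick the class with the larger p(x | y) * p(y), i.e. the larger joint p(x, y)."""
    scores = {label: gaussian_pdf(pitch, mu, sigma) * prior[label]
              for label, (mu, sigma) in params.items()}
    return max(scores, key=scores.get)

def sample_fake_pitch(label, rng=np.random.default_rng(0)):
    """Because we modelled p(x | y), we can also generate 'fake' pitches for a class."""
    mu, sigma = params[label]
    return rng.normal(mu, sigma)

print(classify(115.0))              # 'male'   (low pitch)
print(classify(215.0))              # 'female' (high pitch)
print(sample_fake_pitch("female"))  # a plausible synthetic female pitch
```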

Discriminative Modelling

Here you directly model $p(y=\text{male} \mid x)$ and $p(y=\text{female} \mid x)$, i.e. the boundary between the two classes. Given a pitch, the model tells you the probability of it belonging to a male or a female speaker, but it learns nothing about how pitch is distributed for either gender.
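
For contrast, here is a minimal discriminative sketch (again my own illustration, reusing the same made-up pitches): a logistic regression fits $p(y=\text{male} \mid x)$ directly by gradient descent on the conditional log-likelihood, and never models how the pitches themselves are distributed.

```python
import numpy as np

# Same toy pitches as above; label 1 = male, 0 = female (made-up data)
x = np.array([110.0, 120.0, 95.0, 130.0, 105.0, 210.0, 195.0, 225.0, 200.0, 240.0])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)

# Standardise the single feature so plain gradient descent behaves well
x_std = (x - x.mean()) / x.std()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters of p(y=male | x) = sigmoid(w * x + b); p(x) is never modelled
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    p = sigmoid(w * x_std + b)          # predicted p(y=male | x)
    grad_w = np.mean((p - y) * x_std)   # gradient of the negative log-likelihood
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

def p_male(pitch):
    z = (pitch - x.mean()) / x.std()
    return sigmoid(w * z + b)

print(p_male(115.0))   # close to 1: very likely male
print(p_male(215.0))   # close to 0: very likely female
```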

For example, a discriminative classifier like a decision tree can label an instance without assigning a probability to that label. Such a classifier would still be a model, because the distribution of all predicted labels would model the real distribution of labels in the data.

Further reading: What is the difference between a generative and a discriminative algorithm? (Stack Overflow)