Imitation Learning


Behaviour Cloning

It is simply supervised learning on state-action pairs: the state is the input and the action is the output.
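
To make the "simple supervised learning" point concrete, here is a minimal behaviour-cloning sketch (a hedged illustration, assuming a continuous action space trained with an MSE loss; the dimensions and the random tensors standing in for expert data are hypothetical):

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2                       # hypothetical dimensions
states = torch.randn(1024, state_dim)              # stand-in for expert observations
expert_actions = torch.randn(1024, action_dim)     # stand-in for expert actions

# The policy is just a regressor: state in, action out.
policy = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, action_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    pred_actions = policy(states)
    loss = nn.functional.mse_loss(pred_actions, expert_actions)  # plain supervised loss on (state, action) pairs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```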

Problems with Behaviour Cloning

  • We have a demonstration data distribution $D = P(\tau)$. Now let's say we train an algorithm on $D$. When we run the learnt policy, it makes errors that lead to observations which are not in the training data, and from there the error keeps increasing: the error explodes.

    Let us say that our learnt policy is $\pi_{\theta}(a_t \mid o_t)$. What the above means is that $p_{data}(o_t) \neq p_\pi(o_t)$. Hence, to solve this problem, we have to find a way to make $p_{data}(o_t) = p_\pi(o_t)$ (a rough bound on how the error compounds is sketched after this list).

  • For the same state, there might be different actions in the training data.

  • If a new state comes up, it might not be in the training data.

  • The training data may not have the correct action for a given state; the trained policy will then also be wrong, since this is simple behaviour cloning. Hence, performance is bounded by the ability of the person collecting the data.

  • Problems associated with supervised learning, e.g. saddle points, convergence issues, etc.

  • Humans are not good at demonstrating everything. For example, you can demonstrate how to drive a car, but may not be able to demonstrate the movement of every joint of a humanoid. Hence, collecting human training data for every kind of problem is not possible.
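
To make the error-explosion point slightly more quantitative: as I recall from the analysis in the DAgger paper (Ross et al., 2011), if the learnt policy makes a mistake with probability at most $\epsilon$ on states drawn from the training distribution, then over a horizon of $T$ steps the expected number of mistakes compounds quadratically for plain behaviour cloning, whereas an interactive method such as DAgger keeps it linear:

$$
\mathbb{E}[\text{mistakes}_{\text{BC}}] = O(\epsilon T^2), \qquad \mathbb{E}[\text{mistakes}_{\text{DAgger}}] = O(\epsilon T)
$$

Intuitively, a single early mistake puts the policy in states where $p_{data}(o_t) \neq p_\pi(o_t)$, so further mistakes become more likely, which is exactly the compounding described above.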

Solutions for the error-explosion problem

  • In the NVIDIA end-to-end driving setup (paper linked below), there are left and right cameras in addition to the centre one, so the training data also contains left/right image pairs labelled with corrected actions. Hence, when the vehicle drifts off course, the training data has samples similar to those situations and the policy is able to correct the trajectory of the vehicle. In general terms, we can add noise to our collected distribution to create more training data, along with the corrected actions for that noise. In this way we obtain an algorithm trained on data such that it can correct itself when it goes wrong.

    https://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf

  • DAgger (Dataset Aggregation): run the learnt policy to collect the states it actually visits, ask the expert to label them with the correct actions, aggregate them into the dataset, and retrain. If run for long enough, and if some assumptions hold, this is guaranteed to work. It solves the problem of distributional drift (a sketch of the loop follows this list).

  • If your model is so perfect that it doesn't make mistakes, then it is safe to assume that there will be no distributional shift, and behaviour cloning will work directly.
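
A runnable toy sketch of the DAgger loop mentioned above (the 1-D environment, the expert, and the "training" routine here are all made-up stand-ins; only the structure of the loop is the point):

```python
import random

def expert_label(state):
    # Hypothetical expert: always steers back toward zero.
    return -state

def run_policy(policy, horizon=20):
    # Hypothetical environment rollout: collect the states the *learnt policy* visits.
    state, visited = random.uniform(-1.0, 1.0), []
    for _ in range(horizon):
        visited.append(state)
        state = state + policy(state)   # toy dynamics
    return visited

def train_supervised(dataset):
    # Hypothetical supervised step: least-squares fit of a linear policy a = gain * s.
    num = sum(s * a for s, a in dataset)
    den = sum(s * s for s, _ in dataset) or 1.0
    return lambda s: (num / den) * s

# Start from behaviour cloning on initial expert demonstrations.
dataset = [(s, expert_label(s)) for s in (random.uniform(-1.0, 1.0) for _ in range(50))]
policy = train_supervised(dataset)

for iteration in range(5):
    visited_states = run_policy(policy)                          # 1. run the learnt policy
    dataset += [(s, expert_label(s)) for s in visited_states]    # 2. expert labels the visited states
    policy = train_supervised(dataset)                           # 3. aggregate and retrain
```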

Problems with data collection in Imitation Learning

  • Non-Markovian Behaviour: while collecting data, the driver takes actions based not only on the current state but also on previous states. Hence, for two identical states there may be different actions due to different histories. Solution: instead of acting on the current state alone, take previous states into consideration as well, i.e. include them in the input. You can use an RNN to capture the temporal dimension.

  • Multimodal Behaviour: different actions for the same state in the training data. Solution: for discrete actions this is not really a problem, because the softmax output can place probability mass on two equally probable actions. For continuous actions we generally output a single Gaussian (since we use an MSE loss most of the time), so it is difficult to represent two equally probable actions. For this we can use one of the following (see the mixture-of-Gaussians sketch after this list):

    • Output a mixture of Gaussians: output N means, N variances, and N weights adding up to one.

    • Latent Variable Model: inject some noise (a random number) along with the input state.

    • Autoregressive Discretization: discretize the continuous n-dimensional action one dimension at a time, conditioning each dimension on the previously sampled ones, instead of discretizing the full n-dimensional space at once.
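
A minimal sketch of the "mixture of Gaussians" output head (a mixture density network; the dimensions, network sizes, and stand-in expert data below are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class MixtureOfGaussiansHead(nn.Module):
    """Outputs N means, N standard deviations, and N mixture weights summing to one."""
    def __init__(self, state_dim=4, action_dim=2, n_components=5, hidden=128):
        super().__init__()
        self.n, self.action_dim = n_components, action_dim
        self.backbone = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.means = nn.Linear(hidden, n_components * action_dim)
        self.log_stds = nn.Linear(hidden, n_components * action_dim)
        self.logits = nn.Linear(hidden, n_components)   # softmax of these -> weights summing to one

    def forward(self, state):
        h = self.backbone(state)
        means = self.means(h).view(-1, self.n, self.action_dim)
        stds = self.log_stds(h).view(-1, self.n, self.action_dim).exp()
        weights = torch.softmax(self.logits(h), dim=-1)
        return means, stds, weights

def mog_nll(means, stds, weights, action):
    # Negative log-likelihood of the expert action under the mixture (replaces MSE).
    log_probs = torch.distributions.Normal(means, stds).log_prob(action.unsqueeze(1)).sum(-1)  # (batch, N)
    return -torch.logsumexp(log_probs + weights.clamp_min(1e-8).log(), dim=-1).mean()

# Usage sketch with stand-in expert data.
head = MixtureOfGaussiansHead()
states, actions = torch.randn(32, 4), torch.randn(32, 2)
loss = mog_nll(*head(states), actions)
loss.backward()
```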

Inverse RL

Find the reward function given the expert trajectories.
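
One common way to make this concrete is the maximum-entropy IRL formulation (a hedged sketch from memory, not necessarily the exact variant in the linked Medium article): trajectories are modelled as exponentially more likely the higher their reward, and the reward parameters $\psi$ are fit by maximising the likelihood of the expert trajectories $\mathcal{D}_{\text{expert}}$:

$$
p(\tau) \propto \exp\big(R_\psi(\tau)\big), \qquad R_\psi(\tau) = \sum_t r_\psi(s_t, a_t)
$$

$$
\max_\psi \; \sum_{\tau \in \mathcal{D}_{\text{expert}}} R_\psi(\tau) \;-\; |\mathcal{D}_{\text{expert}}| \,\log Z(\psi), \qquad Z(\psi) = \int \exp\big(R_\psi(\tau)\big)\, d\tau
$$

The hard part in practice is the partition function $Z(\psi)$, which is why sample-based or adversarial approximations are typically used.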

Diffusion Policy

Further reading:
  • RL — Inverse Reinforcement Learning (Medium)
  • Ch. 21 - Imitation Learning