
Lecture 2 - 22/01


Last updated 4 years ago

Prisoner's Dilemma

If the agents do not cooperate, they miss the best possible global outcome: each agent's individually rational choice leads to a result that is worse for everyone.

COP 21 Game

  • N governments

  • 2 actions per government

    • Do not pollute (cost = 3)

    • Pollute (cost = 1, plus +1 for everyone)

What is the equilibrium?
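One way to explore the question is to enumerate best responses. The sketch below is only an illustration under a stated assumption: "+1 for everyone" is read as every polluter adding 1 to the cost of all N governments, the polluter included. The names `cost` and `pollute_is_best_response` are hypothetical helpers, not part of the lecture.

```python
def cost(pollutes: bool, n_other_polluters: int) -> int:
    """Cost paid by one government.

    Assumption: each polluter adds +1 to the cost of every government,
    including itself.
    """
    own_cost = 1 if pollutes else 3
    n_polluters = n_other_polluters + (1 if pollutes else 0)
    return own_cost + n_polluters


def pollute_is_best_response(n_other_polluters: int) -> bool:
    """True if polluting is strictly cheaper, given what the others do."""
    return cost(True, n_other_polluters) < cost(False, n_other_polluters)


N = 5  # hypothetical number of governments

# Polluting is cheaper no matter how many of the other N - 1 governments pollute.
polluting_is_dominant = all(pollute_is_best_response(k) for k in range(N))

cost_all_pollute = cost(True, N - 1)   # every government pollutes
cost_none_pollute = cost(False, 0)     # no government pollutes

print(polluting_is_dominant, cost_all_pollute, cost_none_pollute)
```

Under this reading, polluting is a dominant action, so "everyone pollutes" is the equilibrium even though, for N > 2, each government would pay less if nobody polluted. This is the same structure as the Prisoner's Dilemma.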

Multi-Player Game

  • Simultaneous-move games

  • n players; each player picks a strategy and incurs a loss.

$$l_k(s_1, \dots, s_n) = l_k(s_k, s_{-k})$$

Here $s_k$ is the strategy of player $k$ and $s_{-k}$ denotes the strategies of all the other players.

Goal of each player: minimize their own loss.

Zero-Sum Two-player Games

Zero-sum: $\sum_{k=1}^n l_k = 0$

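Two players: $n = 2$, so $l_2 = -l_1$ and a single loss matrix $L = (l_{ij})$ describes the game: the first player minimizes $l_{ij}$ and the second maximizes it.

Actions for each player: $i \in [n] = \{1, \dots, n\}$ and $j \in [m]$, where from here on $n$ and $m$ denote the numbers of actions of the first and second player.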

Game

$$\min_{i \in [n]} \max_{j \in [m]} l_{ij}$$

Mixed strategies

For example, in the game of rock-paper-scissors any fixed action can be exploited by the opponent, so players randomize over their actions.

Each player has a probability distribution over their actions: $p = [p_1, p_2, \dots, p_n] \in \Delta_n$ and $q = [q_1, q_2, \dots, q_m] \in \Delta_m$

$$\Delta_n := \{p \in \mathbb{R}^n : p_1 + \dots + p_n = 1,\ p_i \geq 0\}$$

Payoff: $l(p, q) := \mathbb{E}_{i \sim p,\, j \sim q}[l_{ij}] = p^T L q$

Game: $\min_{p \in \Delta_n} \max_{q \in \Delta_m} p^T L q$
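The expected loss and the inner maximization can be checked numerically on rock-paper-scissors. The following is a minimal sketch, not a solver: the loss matrix convention (+1 when the row player loses, -1 when they win) and the helper names `expected_loss` and `worst_case_loss` are assumptions made for illustration. It relies on the fact that $p^T L q$ is linear in $q$, so the maximum over the simplex is attained at a pure action $j$.

```python
# Loss matrix for the row player (the minimizer).
# Rows and columns are ordered rock, paper, scissors.
# Assumed convention: +1 = row player loses, -1 = row player wins, 0 = draw.
L = [
    [0, 1, -1],   # rock     vs rock, paper, scissors
    [-1, 0, 1],   # paper    vs rock, paper, scissors
    [1, -1, 0],   # scissors vs rock, paper, scissors
]


def expected_loss(p, L, q):
    """Expected loss p^T L q when i ~ p and j ~ q are drawn independently."""
    return sum(p[i] * L[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))


def worst_case_loss(p, L):
    """max over q in the simplex of p^T L q.

    The objective is linear in q, so it is enough to check each pure action j.
    """
    n_cols = len(L[0])
    return max(sum(p[i] * L[i][j] for i in range(len(p)))
               for j in range(n_cols))


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(worst_case_loss(always_rock, L))  # a pure strategy can be exploited
print(worst_case_loss(uniform, L))      # the uniform mix cannot
```

Playing rock every time has a worst-case loss of 1 (the opponent answers with paper), while the uniform mixed strategy has a worst-case loss of 0, which is the value of this game.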

Nash Equilibrium of a Game

No player can reduce their loss by deviating unilaterally. In two-player zero-sum games this coincides with the best worst-case (minimax) move.

$$s^* \in \text{NASH} \implies l_k(s^*_k, s^*_{-k}) \leq l_k(s_k, s^*_{-k}) \quad \forall k,\ \forall s_k$$
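The inequality above can be checked by brute force in a small game by trying every unilateral deviation. The sketch below does this for a Prisoner's Dilemma; the loss values (years in prison) are hypothetical numbers chosen for illustration, and `is_nash` is a helper name introduced here, not something from the lecture.

```python
from itertools import product

COOPERATE, DEFECT = 0, 1

# Hypothetical losses (years in prison): LOSS[(s_1, s_2)] = (l_1, l_2).
LOSS = {
    (COOPERATE, COOPERATE): (1, 1),
    (COOPERATE, DEFECT): (3, 0),
    (DEFECT, COOPERATE): (0, 3),
    (DEFECT, DEFECT): (2, 2),
}


def is_nash(profile, loss, n_actions):
    """True if no player k can lower l_k by changing only s_k."""
    for k in range(len(profile)):
        for action in range(n_actions):
            deviation = list(profile)
            deviation[k] = action
            if loss[tuple(deviation)][k] < loss[profile][k]:
                return False
    return True


# Enumerate all pure strategy profiles and keep the equilibria.
pure_nash = [s for s in product(range(2), repeat=2) if is_nash(s, LOSS, 2)]
print(pure_nash)
```

With these numbers the only pure equilibrium is mutual defection, where each player loses 2, although mutual cooperation would cost each only 1.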

Theorem

Any game with a finite number of players, each with a finite set of pure strategies, has at least one Nash equilibrium in mixed strategies.
