TensorFlow

Efficient TensorFlow usage

  • Avoid feed dicts for feeding input; the tf.data API is a better choice for building the input pipeline (a minimal sketch follows).
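
A minimal TF 1.x sketch of a tf.data input pipeline using toy in-memory data (the array shapes and batch size here are arbitrary):

    import numpy as np
    import tensorflow as tf

    # Toy data standing in for a real dataset.
    images = np.random.rand(1000, 28, 28).astype(np.float32)
    labels = np.random.randint(0, 10, size=1000).astype(np.int64)

    # Build the pipeline: slice -> shuffle -> batch -> repeat.
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    dataset = dataset.shuffle(buffer_size=1000)   # see the shuffle caveat below
    dataset = dataset.batch(32)
    dataset = dataset.repeat()

    iterator = dataset.make_one_shot_iterator()
    image_batch, label_batch = iterator.get_next()

    with tf.Session() as sess:
        imgs, lbls = sess.run([image_batch, label_batch])
        print(imgs.shape, lbls.shape)  # (32, 28, 28) (32,)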

  • Fused batch norm is comparatively faster than the non-fused batch-norm implementation.

  • You can compile TensorFlow with Intel MKL support to get speed-ups.

The one piece of functionality that needs care is .shuffle(buffer_size). Like map() and batch(), it can be applied to a dataset to create a new dataset; it loads the first buffer_size elements into memory and shuffles only those. The problem: if your initial list of samples (image path + label) is ordered by class, so that all samples of the first class come first, followed by all samples of the second class, and so on, and buffer_size is smaller than the number of samples per class, you will get batches containing samples of only one class and your model will not train. Therefore, do an initial shuffle of the sample list yourself rather than relying on TensorFlow's shuffling alone (see the sketch below).
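
A sketch of the workaround: shuffle the full list of (path, label) pairs in plain Python before building the dataset, so class order is already broken no matter how small buffer_size is (the paths and labels below are placeholders):

    import random
    import tensorflow as tf

    # Placeholder (path, label) pairs; in practice these come from disk.
    samples = [("img_%05d.png" % i, i % 10) for i in range(10000)]

    # Global shuffle up front, independent of the tf.data buffer size.
    random.shuffle(samples)
    paths, labels = zip(*samples)

    dataset = tf.data.Dataset.from_tensor_slices((list(paths), list(labels)))
    dataset = dataset.shuffle(buffer_size=1000)  # now only a local re-shuffle
    dataset = dataset.batch(32)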

TensorFlow in Practice

  • Debugging Steps:

    • Step 1: Check the architecture

    • Step 2: Check the hyper-parameters of the neural network

    • Step 3: Check the complexity of the network

    • Step 4: Check the structure of the input data

    • Step 5: Check the distribution of the data

  • Static and dynamic shapes:

    • x.get_shape() gives the static shape of the tensor, known at graph-construction time.

    • tf.shape(x) returns an op that gives the shape of x at run time (the dynamic shape).

    • Note: the static shape is very useful for debugging your code with print, so you can check that your tensors have the right shapes.
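
A small sketch of the difference (the placeholder shape is arbitrary):

    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 128])

    print(x.get_shape())         # static shape: (?, 128), known at graph time
    dynamic_shape = tf.shape(x)  # op returning the actual shape at run time

    with tf.Session() as sess:
        print(sess.run(dynamic_shape, feed_dict={x: np.zeros((7, 128))}))  # [  7 128]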

  • Getting the validation and training loss using the same loss op (e.g. a loss_op built from tf.losses):

    • To show the validation loss and training loss on the same plot: create two summary writers, add the training-loss summary to one writer and the validation-loss summary to the other (see the sketch below).

    • To get two plots, one with the training error and the other with the validation error: create two loss summary ops from the same loss op.
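
A minimal sketch of the two-writer setup; the placeholder stands in for the real loss computation, and the log directories are arbitrary:

    import numpy as np
    import tensorflow as tf

    # One loss tensor and one summary op, shared by both phases.
    loss_op = tf.placeholder(tf.float32, shape=[], name="loss")
    loss_summ = tf.summary.scalar("loss", loss_op)

    train_writer = tf.summary.FileWriter("logs/train")
    val_writer = tf.summary.FileWriter("logs/val")

    with tf.Session() as sess:
        for step in range(100):
            # Same summary op, written to different writers.
            s = sess.run(loss_summ, feed_dict={loss_op: np.random.rand()})
            train_writer.add_summary(s, step)
            if step % 10 == 0:
                s = sess.run(loss_summ, feed_dict={loss_op: np.random.rand()})
                val_writer.add_summary(s, step)

TensorBoard then overlays the two runs ("train" and "val") on the same plot.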

  • tf.AUTO_REUSE:

    • Passing reuse=tf.AUTO_REUSE tells TensorFlow to create a new variable if a variable with that name doesn't exist, and to reuse it otherwise:

    • with tf.variable_scope("scope", reuse=tf.AUTO_REUSE):
          features1 = tf.layers.conv2d(image1, filters=32, kernel_size=3)
          features2 = tf.layers.conv2d(image2, filters=32, kernel_size=3)

  • Python ops allow you to convert a regular Python function into a TensorFlow operation:

        def py_input_fn():
            actual_data = np.random.normal(size=[100])
            return actual_data

        data = tf.py_func(py_input_fn, [], (tf.float32))

  • The Dataset API is also useful.

  • Overloaded operators:

    • z = -x # z = tf.negative(x)

    • z = x + y # z = tf.add(x, y)

    • z = x - y # z = tf.subtract(x, y)

    • z = x * y # z = tf.multiply(x, y)

    • z = x / y # z = tf.div(x, y)

    • z = x // y # z = tf.floordiv(x, y)

    • z = x % y # z = tf.mod(x, y)

    • z = x ** y # z = tf.pow(x, y)

    • z = x @ y # z = tf.matmul(x, y)

    • z = x > y # z = tf.greater(x, y)

    • z = x >= y # z = tf.greater_equal(x, y)

    • z = x < y # z = tf.less(x, y)

    • z = x <= y # z = tf.less_equal(x, y)

    • z = abs(x) # z = tf.abs(x)

    • z = x & y # z = tf.logical_and(x, y)

    • z = x | y # z = tf.logical_or(x, y)

    • z = x ^ y # z = tf.logical_xor(x, y)

    • z = ~x # z = tf.logical_not(x)

  • Order of execution and control dependencies: see the Effective TensorFlow guide linked at the bottom of this page. Control dependencies let you specify the order in which ops execute, which may not be obvious from how the graph is constructed (see the sketch below).
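
A minimal sketch of tf.control_dependencies forcing one op to run before another (the variables and values are arbitrary):

    import tensorflow as tf

    a = tf.Variable(1.0)
    b = tf.Variable(2.0)

    assign_a = tf.assign(a, 10.0)

    # Without the control dependency there is no guarantee assign_a runs
    # before a is read; with it, the read below always sees the new value.
    with tf.control_dependencies([assign_a]):
        c = tf.identity(a) + b

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(c))  # 12.0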

  • Some important functions in TF (tf.cond and tf.where are sketched below):

    • tf.cond - works like an if statement in Python, except that both branches live in the graph and the predicate is evaluated at run time.

    • tf.where - with three arguments it selects elements from two tensors based on a condition; with a single argument it returns the indices of the true elements.

    • tf.TensorArray - a growable array of tensors whose size does not have to be known in advance.
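
A small sketch of tf.cond and tf.where (the values are arbitrary):

    import tensorflow as tf

    x = tf.constant(3.0)
    y = tf.constant(5.0)

    # tf.cond: both branches exist in the graph; the predicate picks
    # which one is executed at run time.
    z = tf.cond(x < y, lambda: x + y, lambda: x - y)

    # tf.where with three arguments: element-wise selection.
    a = tf.constant([1.0, -2.0, 3.0])
    b = tf.where(a > 0, a, tf.zeros_like(a))

    with tf.Session() as sess:
        print(sess.run(z))  # 8.0
        print(sess.run(b))  # [1. 0. 3.]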

  • When using softmax you have to compute e^x. A float32 can only store values up to about 3.40282e+38, so the largest x before exp(x) overflows is ln(3.40282e+38) ≈ 88.7. Make sure the activations (logits) of the final layer stay below that value, or use the standard trick of subtracting the maximum logit before exponentiating, which leaves the softmax unchanged (sketch below).
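
A NumPy sketch of the max-subtraction trick (softmax is invariant to shifting the logits by a constant):

    import numpy as np

    def stable_softmax(x):
        # Subtracting the max keeps every exponent <= 0, so exp() cannot
        # overflow even for very large logits.
        shifted = x - np.max(x)
        exps = np.exp(shifted)
        return exps / np.sum(exps)

    print(stable_softmax(np.array([1000.0, 1001.0, 1002.0])))
    # A naive exp(1000) would overflow; this prints a valid distribution.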

  • See the batch-norm notes near the end of the website linked at the bottom of this page.

  • Batch norm issues:

    • See the bookmarks in the DL folder.

    • The moving-average update ops are collected in tf.GraphKeys.UPDATE_OPS and are not run automatically: add a control dependency on them before the train_op, or pass updates_collections=None to tf.contrib.layers.batch_norm so the updates run in place (see the sketch below).

    • Also make correct use of the reuse parameter of the function.
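
A minimal sketch of the UPDATE_OPS pattern using tf.layers.batch_normalization (the network and optimizer here are arbitrary); tf.contrib.layers.batch_norm behaves the same way unless updates_collections=None is passed:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 64])
    is_training = tf.placeholder(tf.bool, [])

    h = tf.layers.dense(x, 32)
    h = tf.layers.batch_normalization(h, training=is_training)
    loss = tf.reduce_mean(tf.square(h))

    # The moving mean/variance updates live in UPDATE_OPS and are NOT run
    # automatically; tie them to the train op with a control dependency.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)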

  • A small batch size tends to give a smaller generalization gap than a large batch size.

Reference: Effective TensorFlow (https://github.com/vahidk/EffectiveTensorflow)