# Chapter 5: Machine Learning Basics

### The i.i.d. Assumptions of the Data-Generating Process

Let's say we have a training dataset and a test dataset. The i.i.d. assumptions say two things about them. First, the examples in each dataset are independent of each other, i.e. sampling one example doesn't affect the sampling of the others. For example, in object detection we often collect data in the form of videos and then annotate the frames. Here it is very important to shuffle the frames; otherwise consecutive examples in the dataset are neighbouring frames of the same video, which are strongly correlated rather than independent.

Second, the training and test sets should be identically distributed: neither dataset should be biased towards some particular kind of example. In other words, the examples in both datasets should follow the same probability distribution.
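A minimal sketch of shuffling before splitting, assuming the annotated frames are held in a NumPy array in recording order. The function `shuffle_and_split`, the 80/20 split and the toy data are hypothetical choices made for illustration, not a prescribed pipeline:

```python
import numpy as np

def shuffle_and_split(frames, labels, test_fraction=0.2, seed=0):
    """Shuffle examples, then split them into training and test sets.

    `frames` and `labels` are arrays whose first axis indexes examples
    (e.g. annotated video frames stored in recording order).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(frames))   # random reordering of example indices
    frames, labels = frames[perm], labels[perm]
    n_test = int(len(frames) * test_fraction)
    # First n_test shuffled examples form the test set, the rest the training set.
    return (frames[n_test:], labels[n_test:]), (frames[:n_test], labels[:n_test])

# Hypothetical data: 100 frames of 8x8 pixels, stored in recording order.
frames = np.arange(100 * 8 * 8, dtype=np.float32).reshape(100, 8, 8)
labels = np.arange(100) // 25   # the label changes only every 25 consecutive frames

(train_x, train_y), (test_x, test_y) = shuffle_and_split(frames, labels)
print(train_x.shape, test_x.shape)  # (80, 8, 8) (20, 8, 8)
```

Without the permutation, the test set would be the first 20 frames, which all carry label 0, so the test set would not follow the same distribution as the training set.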

We call this shared distribution the data-generating distribution, denoted $$p\_{data}$$.
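Written out for a dataset of $$m$$ examples $$\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(m)}$$, the two assumptions together say that the joint distribution factorizes into identical per-example terms. Independence gives the product; identical distribution means every factor is the same $$p\_{data}$$, for training and test examples alike:

$$
p\big(\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(m)}\big) = \prod\_{i=1}^{m} p\_{data}\big(\mathbf{x}^{(i)}\big)
$$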
