# Reward Hacking

The agent (or model) learns an unintended, undesirable behaviour that still achieves high reward.

This happens because:

* We chose the reward expecting it to induce a particular behaviour that we want the agent to use to complete the task.
  * But the agent completes the task (achieves high reward) using a behaviour we did not want.
* Reward hacking can happen because a reward does not have a 1-to-1 mapping with a policy: multiple policies can achieve the same reward.
* It is related to the problem of reward misspecification.
* For example, for a bipedal agent that receives a reward of 1 for covering some distance forward (corresponding to the task of moving forward), multiple policies are possible:
  * Walk
  * Crawl
  * Roll
  * Out of all of these, we may have intended only the walking policy; the others are undesirable (see the sketch below).
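To make this concrete, here is a minimal Python sketch (not from the source; the reward function, threshold, and displacement numbers are illustrative assumptions) showing how a distance-only reward assigns the same score to all three gaits, so the reward alone cannot single out the intended policy:

```python
def distance_reward(start_x: float, end_x: float, threshold: float = 1.0) -> float:
    """Return 1.0 if the agent moved forward by at least `threshold`, else 0.0."""
    return 1.0 if (end_x - start_x) >= threshold else 0.0

# Hypothetical rollouts: (policy name, forward displacement achieved)
rollouts = [
    ("walk",  1.2),   # intended behaviour
    ("crawl", 1.1),   # unintended, but still rewarded
    ("roll",  1.5),   # unintended, but still rewarded
]

for name, displacement in rollouts:
    r = distance_reward(start_x=0.0, end_x=displacement)
    print(f"{name:<6} -> reward = {r}")

# All three policies receive reward 1.0, so this reward does not map
# 1-to-1 to the behaviour we actually wanted (walking).
```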
