An AI-driven virtual assistant is being developed to compose emails on behalf of users using Reinforcement Learning from Human Feedback (RLHF). How is KL divergence used to address reward hacking in this scenario?

a. By changing the user feedback to increase the agent's rewards.

b. By making the virtual assistant generate random responses to avoid hacking.

c. By ignoring user feedback and focusing on predefined responses.

d. By quantifying the difference between the agent's predicted actions and its actual actions.

Verified Answer
Correct Option - d
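
Explanation: in common RLHF setups, the KL divergence term measures how far the fine-tuned assistant's output distribution has drifted from a frozen reference model, and that drift is subtracted from the reward so the assistant cannot score highly just by producing unusual text that exploits the reward model. The sketch below illustrates the idea on a toy three-token vocabulary. The function names, the probabilities, and the coefficient `beta` are hypothetical choices for illustration, and the code assumes the reference distribution is nonzero wherever the policy distribution is nonzero.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_reward(reward, policy_probs, reference_probs, beta=0.1):
    """Reward-model score minus a KL penalty (beta is a hypothetical weight)."""
    return reward - beta * kl_divergence(policy_probs, reference_probs)


# Toy next-token distributions (illustrative values only).
reference = [0.5, 0.3, 0.2]      # frozen reference model
close = [0.45, 0.35, 0.2]        # policy that stays near the reference
drifted = [0.98, 0.01, 0.01]     # policy that has collapsed onto one token

print("KL, close policy:  ", kl_divergence(close, reference))
print("KL, drifted policy:", kl_divergence(drifted, reference))
print("Penalized reward, drifted:", penalized_reward(1.0, drifted, reference))
```

The drifted policy incurs a larger KL penalty than the close one, so for the same raw reward it ends up with a lower penalized reward. A larger `beta` keeps the assistant closer to the reference model, while a smaller one gives the reward signal more influence.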
