Which of the following options is commonly used to address reward hacking in reinforcement learning by measuring the difference between two probability distributions?

a. Regularization

b. Gradient Descent

c. Policy Iteration

d. KL Divergence

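Verified Answer
Correct Option: d (KL Divergence)

KL divergence measures how much one probability distribution differs from another. In reinforcement learning from human feedback, a KL penalty is commonly used to keep the trained policy close to a reference policy, which limits how far the policy can drift in order to exploit flaws in the reward signal.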
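
To illustrate the answer, here is a minimal sketch of KL divergence for discrete distributions and of a KL-penalized reward. The function names, the example probabilities, and the coefficient `beta` are assumptions made for illustration; they are not part of the exam question, and real training setups compute the penalty over model token distributions rather than hand-written lists.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions given as lists.

    Assumes both lists have the same length, each sums to 1, and
    q[i] > 0 wherever p[i] > 0 (otherwise the divergence is infinite).
    """
    # Terms with p[i] == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_reward(reward, policy_probs, reference_probs, beta=0.1):
    """Subtract a KL penalty from the raw reward.

    beta is a hypothetical penalty coefficient: larger values keep the
    policy closer to the reference distribution.
    """
    return reward - beta * kl_divergence(policy_probs, reference_probs)


# Illustrative distributions over three actions (made-up numbers).
reference = [0.5, 0.3, 0.2]
policy = [0.7, 0.2, 0.1]

kl = kl_divergence(policy, reference)
shaped = penalized_reward(1.0, policy, reference, beta=0.1)
print(f"KL(policy || reference) = {kl:.4f}")
print(f"Reward after KL penalty = {shaped:.4f}")
```

The further the policy moves from the reference, the larger the KL term and the smaller the shaped reward, so a policy cannot gain much by drifting into regions where the reward signal is unreliable.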

To get all Infosys Certified Generative AI Professional - Expert Exam questions, join the Telegram group: https://rebrand.ly/lex-telegram-236dee