Business Name: Sourabh Sharma
Qtr No. 213, New Town Yehlanka
Indore 454775, India
Which of the options below is commonly used to address reward hacking in reinforcement learning by measuring the difference between two probability distributions?
Regularization
Gradient Descent
Policy Iteration
KL Divergence
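For context on the last option: KL divergence is the quantity among these that measures how far one probability distribution is from another, and a KL penalty against a reference policy is a common way to discourage reward hacking. The sketch below is a minimal illustration, not a definitive implementation. It assumes discrete distributions given as plain lists, and the names `kl_divergence`, `penalized_reward`, and the coefficient `beta` are hypothetical, chosen only for this example.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions (equal-length lists)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing to the sum
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


def penalized_reward(reward, policy_probs, reference_probs, beta=0.1):
    """Subtract a KL penalty from the raw reward (beta is an assumed coefficient)."""
    return reward - beta * kl_divergence(policy_probs, reference_probs)


if __name__ == "__main__":
    reference = [0.5, 0.5]  # hypothetical reference policy over two actions
    drifted = [0.9, 0.1]    # hypothetical policy that has moved away from it
    print(kl_divergence(reference, reference))
    print(kl_divergence(drifted, reference))
    print(penalized_reward(1.0, drifted, reference, beta=0.1))
```

The further the policy drifts from the reference distribution, the larger the penalty, so a policy cannot raise its reward indefinitely by moving far from the reference behaviour.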