In Reinforcement Learning from Human Feedback (RLHF), which algorithm is most commonly used to fine-tune the policy against the reward model learned from human feedback?

a) Q-learning
b) Deep Deterministic Policy Gradient (DDPG)
c) Proximal Policy Optimization (PPO)
d) Monte Carlo Tree Search (MCTS)
Verified Answer
Correct Option - c: Proximal Policy Optimization (PPO)
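
For readers who want to see what PPO actually optimizes in this setting, here is a minimal sketch of its clipped surrogate loss, together with the KL-shaped reward that RLHF setups commonly pair it with. The function names, the `clip_eps` and `beta` defaults, and the plain-list inputs are illustrative assumptions, not taken from the exam material or from any particular library.

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and the policy that generated the samples.
    clip_eps=0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)

def kl_shaped_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    """Reward-model score minus a per-sample KL-style penalty that keeps the
    fine-tuned policy close to the reference model (beta is an assumed value)."""
    return rm_score - beta * (logp_policy - logp_ref)
```

The clipping keeps each update close to the policy that produced the samples, which is a large part of why PPO is usually preferred here over value-based methods such as Q-learning.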

Updated on Tue, 16 Sept 2025