In Reinforcement Learning from Human Feedback (RLHF), which algorithmic approach is most frequently employed to fine-tune the policy using the reward model learned from human feedback?

a. Q-learning
b. Deep Deterministic Policy Gradient (DDPG)
c. Proximal Policy Optimization (PPO)
d. Monte Carlo Tree Search (MCTS)
Verified Answer
Correct Option - c (Proximal Policy Optimization). PPO's clipped surrogate objective keeps each policy update close to the previous policy, which makes optimizing against a learned, imperfect reward model stable; it is the algorithm used in InstructGPT-style RLHF pipelines.
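To make the answer concrete, here is a minimal sketch of PPO's clipped surrogate objective, the piece that distinguishes it from the other options. This is an illustrative NumPy toy, not a full RLHF pipeline: the function name and inputs are assumptions, and in a real RLHF setup the advantages would come from the learned reward model's scores (typically combined with a KL penalty against a reference policy).

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (illustrative toy).

    logp_new / logp_old: per-sample log-probabilities of the taken
    actions under the current and previous policies.
    advantages: in RLHF these would be derived from the reward model;
    here they are plain numbers for illustration.
    """
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] limits how far a single
    # update can move the policy away from the old one.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (element-wise minimum) bound, then average.
    return float(np.minimum(unclipped, clipped).mean())

# Example: the new policy doubles an action's probability (ratio = 2.0),
# but with clip_eps = 0.2 the objective credits at most a ratio of 1.2.
obj = ppo_clipped_objective(np.log(np.array([1.0])),
                            np.log(np.array([0.5])),
                            np.array([1.0]))
print(obj)  # 1.2, not 2.0 — the update incentive is capped
```

This capping is why PPO is favored in RLHF: the policy cannot jump far from its previous version in a single step, which limits reward-model exploitation.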

To get all Infosys Certified Generative AI Professional Advanced Exam questions Join Telegram Group https://rebrand.ly/lex-telegram-236dee
