YoVDO

Reinforcement Learning from Human Feedback

Offered By: DeepLearning.AI via Coursera

Tags

  • Prompt Engineering Courses
  • Fine-Tuning Courses

Course Description

Overview

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences. Reinforcement Learning from Human Feedback (RLHF) is currently the main method for doing so. RLHF is also used to further tune a base LLM toward values and preferences that are specific to your use case.

In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:

1. Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets.
2. Use the open-source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
3. Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
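As a rough illustration of the two datasets named in the description, the Python sketch below shows what one record of each might look like. The field names (`prompt`, `candidate_0`, `candidate_1`, `choice`) and the example texts are assumptions made for illustration, not the course's actual schema. The loss function is the pairwise (Bradley-Terry style) objective commonly used to train a reward model on preference data; the listing does not say which objective the course uses.

```python
import math

# Hypothetical preference record: one prompt, two candidate completions,
# and a human label marking which candidate was preferred.
preference_example = {
    "prompt": "Summarize: The city council voted to extend library hours.",
    "candidate_0": "The council extended library hours.",
    "candidate_1": "Libraries exist.",
    "choice": 0,  # index of the human-preferred candidate
}

# Hypothetical prompt record: a prompt only, with no completion and no label.
# Prompts like this are what the model responds to during the RL phase.
prompt_example = {
    "prompt": "Summarize: The museum opens a new wing in May.",
}


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    completion above the rejected one, and large when it does the opposite.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch, a reward model that scores both candidates equally gets a loss of log(2), and the loss falls as the preferred candidate's score pulls ahead of the rejected one.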

Syllabus

  • Reinforcement Learning from Human Feedback

Taught by

Nikita Namjoshi

Related Courses

Discover, Validate & Launch New Business Ideas with ChatGPT
Udemy
150 Digital Marketing Growth Hacks for Businesses
Udemy
AI: Executive Briefing
Pluralsight
The Complete Digital Marketing Guide - 25 Courses in 1
Udemy
Learn to build a voice assistant with Alexa
Udemy