YoVDO

AI Math Compiler for Synthetic Dataset Generation - Emitting Reasoning Steps

Offered By: Chris Hay via YouTube

Tags

Compiler Design Courses, Parsing Courses, Abstract Syntax Tree Courses

Course Description

Overview

Explore an approach to generating synthetic math datasets for AI training in this 39-minute video presentation by Chris Hay. A custom-built AI math compiler produces accurate questions, answers, and step-by-step explanations, addressing a significant challenge in AI development: obtaining reliable math training data. Learn about the compiler's structure, which consists of a tokenizer, a parser, an abstract syntax tree, and an instruction emitter that generates natural language prompts instead of traditional assembly instructions. Gain insight into creating reliable synthetic data for training large language models such as GPT, Llama 3.1, and Mistral, potentially similar to techniques used by Google DeepMind's AlphaProof and OpenAI's Q* or Project Strawberry. Understand how the compiler keeps the step-by-step explanations accurate and uses LLMs such as Mistral-nemo to refine the output into human-readable form. Ideal for anyone interested in synthetic data generation for AI models or in how compilers work, with the added benefit of access to the open-source code on GitHub.
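The pipeline described above (tokenizer, parser, abstract syntax tree, instruction emitter) can be illustrated with a minimal sketch. This is not the code from the video or its GitHub repository; every name here (`tokenize`, `Parser`, `emit`, `compile_example`) is hypothetical, and the sketch assumes a tiny grammar of non-negative integers with `+`, `-`, `*`, and parentheses. The emitter walks the tree and writes one natural-language step per operation instead of an assembly instruction, so each step is correct by construction.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

# Assumed token set: integers, + - *, and parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(src: str) -> List[str]:
    """Split the source expression into a flat list of tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser; '*' binds tighter than '+' and '-'."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self) -> Node:
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            node = BinOp(self.take(), node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() == "*":
            node = BinOp(self.take(), node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return Num(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")


def emit(node: Node, steps: List[str]) -> int:
    """Evaluate the tree bottom-up, appending one sentence per operation."""
    if isinstance(node, Num):
        return node.value
    a, b = emit(node.left, steps), emit(node.right, steps)
    if node.op == "+":
        result, text = a + b, f"Add {a} and {b}"
    elif node.op == "-":
        result, text = a - b, f"Subtract {b} from {a}"
    else:
        result, text = a * b, f"Multiply {a} by {b}"
    steps.append(f"{text} to get {result}.")
    return result


def compile_example(src: str) -> dict:
    """Produce one synthetic record: question, reasoning steps, answer."""
    steps: List[str] = []
    answer = emit(Parser(tokenize(src)).parse(), steps)
    return {"question": f"What is {src}?", "steps": steps, "answer": answer}


record = compile_example("3 + 4 * (2 - 1)")
for line in record["steps"]:
    print(line)
print("Answer:", record["answer"])
```

In a fuller pipeline, records like this could be passed to an LLM to rephrase the steps into more natural prose, as the description mentions for Mistral-nemo; that refinement stage is left out of this sketch.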

Syllabus

I built an AI Math Compiler that emits synthetic datasets rather than code


Taught by

Chris Hay

Related Courses

Programming Languages
University of Virginia via Udacity
Compilers
Stanford University via Coursera
Compilers
Stanford University via edX
Introduction to Natural Language Processing
University of Michigan via Coursera
Advanced Software Construction in Java
Massachusetts Institute of Technology via edX