Running Llama 2 with Extended Context Length - Up to 32k Tokens
Offered By: Trelis Research via YouTube
Syllabus
How to run Llama 2 with longer context length (see the RoPE-scaling sketch after this syllabus)
Run Llama 2 with 16k context in Google Colab
How to run a GPTQ model in Colab (see the GPTQ loading sketch after this syllabus)
Run Llama 2 7B with 32k context length using RunPod
Run Llama 2 13B with 16k context length for better performance
Streaming Llama 2 13B on 16k context length
Adjusting max token output and temperature (see the streaming sketch after this syllabus)
Streaming Llama 2 13B on 16k context length at temperature 0
Streaming Llama 2 13B on 32k context length
Pro Notebook: save chats and files, and easily adjust context length
Theory bonus: how to get longer context length?
How does GPTQ work? (see the toy quantization sketch after this syllabus)
How does Flash Attention work? (see the FlashAttention sketch after this syllabus)
What is the best model for long context length?
Which is better: Llama 2, Code Llama, or YaRN?
Tips for long context lengths
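
The recurring theme of the syllabus is stretching Llama 2's 4,096-token training context out to 16k and 32k tokens. One common way to do this in code is RoPE position interpolation, exposed in Hugging Face transformers as the rope_scaling argument. The sketch below is a minimal, hedged example rather than the exact notebook from the videos: the checkpoint name and scaling factor are assumptions, and it presumes transformers >= 4.31 plus a GPU with enough memory for fp16 7B weights.

```python
# Minimal sketch: extending Llama 2's context with linear RoPE scaling.
# Assumptions: transformers >= 4.31, access to the meta-llama repo, and a GPU
# large enough for fp16 7B weights. Not the exact notebook from the course.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    # Llama 2 was trained on 4,096-token contexts; linear interpolation with
    # factor 8 compresses positions so 8 * 4096 = 32,768 tokens fit in range.
    rope_scaling={"type": "linear", "factor": 8.0},
)
```

Linear scaling divides every position index by the factor before the rotary embedding is applied, so a factor of 8 maps position 32,768 back into the 0..4,096 range the model saw in training; results generally improve further if the model is also fine-tuned at the longer length.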
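The Colab chapters depend on quantized checkpoints, since a free-tier GPU cannot hold fp16 13B weights. Below is a hedged sketch of loading a 4-bit GPTQ checkpoint through the transformers/auto-gptq integration; the TheBloke repo name is one public example, not necessarily the one used in the videos.

```python
# Hedged sketch: loading a 4-bit GPTQ checkpoint in Colab.
# Requires: pip install transformers optimum auto-gptq accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # a public GPTQ checkpoint; an assumption

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers reads the quantization config stored in the repo, so no extra
# arguments are needed; the 4-bit weights are dequantized on the fly.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```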
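Several chapters stream tokens as they are generated and vary the max token output and temperature. In transformers this is typically done with TextIteratorStreamer plus keyword arguments to generate(); the sketch below assumes the model and tokenizer from the earlier sketches and an illustrative prompt.

```python
# Hedged sketch: token streaming with adjustable output length and temperature.
from threading import Thread
from transformers import TextIteratorStreamer

prompt = "Summarize the plot of Hamlet in three sentences."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
generation_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,  # "max token output": hard cap on generated tokens
    do_sample=True,      # set do_sample=False for the temperature-0 (greedy) runs
    temperature=0.7,     # higher values give more varied output
)

# generate() blocks, so it runs in a thread while we consume the stream.
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()
for text_chunk in streamer:
    print(text_chunk, end="", flush=True)
thread.join()
```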
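For the "How does GPTQ work?" chapter, the core idea is storing weights as low-bit integers plus a per-group scale and zero point. The toy example below shows only that round trip; real GPTQ additionally chooses the rounding that minimizes each layer's output error using second-order (Hessian) information.

```python
# Toy sketch of the 4-bit quantize/dequantize round trip behind GPTQ storage.
# Real GPTQ rounds to minimize layer output error; this shows only the format.
import numpy as np

w = np.random.randn(8).astype(np.float32)       # one small group of weights

scale = (w.max() - w.min()) / 15.0              # 4 bits give 16 levels: 0..15
zero = np.round(-w.min() / scale)               # integer level mapping back to ~0.0

q = np.clip(np.round(w / scale + zero), 0, 15)  # the 4-bit codes actually stored
w_hat = (q - zero) * scale                      # dequantized values used at inference

print("worst-case error:", np.abs(w - w_hat).max())  # roughly bounded by scale / 2
```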
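For the Flash Attention chapter: standard attention materializes an n-by-n score matrix, which is what makes 32k contexts expensive, while FlashAttention computes the same result in tiles kept in on-chip SRAM so that memory grows linearly with sequence length. In transformers it is enabled at load time; the flag below is the current API (older releases used use_flash_attention_2=True), and it requires the flash-attn package and an Ampere-or-newer GPU.

```python
# Hedged sketch: loading Llama 2 13B with FlashAttention-2 enabled.
# Requires: pip install flash-attn (and a recent transformers release).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",   # assumed checkpoint
    torch_dtype=torch.float16,          # flash-attn kernels need fp16 or bf16
    device_map="auto",
    attn_implementation="flash_attention_2",
)
```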
Taught by
Trelis Research