Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up!
In Build a Large Language Model (from Scratch), you’ll discover how LLMs work from the inside out. In this insightful book, bestselling author Sebastian Raschka guides you step by step through creating your own LLM, explaining each stage with clear text, diagrams, and examples. You’ll go from the initial design and creation to pretraining on a general corpus, all the way to finetuning for specific tasks.
Build a Large Language Model (from Scratch) teaches you how to:
- Plan and code all the parts of an LLM
- Prepare a dataset suitable for LLM training
- Finetune LLMs for text classification and with your own data
- Use human feedback to ensure your LLM follows instructions
- Load pretrained weights into an LLM
The large language models (LLMs) that power cutting-edge AI tools like ChatGPT, Bard, and Copilot seem like a miracle, but they’re not magic. This book demystifies LLMs by helping you build your own from scratch. You’ll get a unique and valuable insight into how LLMs work, learn how to evaluate their quality, and pick up concrete techniques to finetune and improve them.
The process you use to train and develop your own small-but-functional model in this book follows the same steps used to deliver huge-scale foundation models like GPT-4. Your small-scale LLM can be developed on an ordinary laptop, and you’ll be able to use it as your own personal assistant.
Build a Large Language Model (from Scratch) is a one-of-a-kind guide to building your own working LLM. In it, machine learning expert and author Sebastian Raschka reveals how LLMs work under the hood, tearing the lid off the Generative AI black box. The book is filled with practical insights into constructing LLMs, including building a data loading pipeline, assembling their internal building blocks, and finetuning techniques. As you go, you’ll gradually turn your base model into a text classifier tool, and a chatbot that follows your conversational instructions.
Some of my greatest passions are data science and machine learning. I enjoy everything that involves working with data: discovering interesting patterns and drawing insightful conclusions using techniques from data mining and machine learning for predictive modeling.
I am a big advocate of working in teams and of open source. In my opinion, it is a positive feedback loop: sharing ideas and tools that are useful to others, and getting constructive feedback that helps us learn!
A little bit more about myself: I am currently sharpening my analytical skills as a PhD candidate at Michigan State University, where I am working on highly efficient virtual screening software for computer-aided drug discovery and a novel approach to protein-ligand docking (among other projects). In essence, this means screening a database of millions of three-dimensional structures of chemical compounds to identify the ones that could potentially bind to specific protein receptors and trigger a biological response.
In my free time I am also really fond of sports: playing soccer or tennis in the open air, or building models for predictions. I always enjoy creative discussions, and I am happy to connect with people. Please feel free to contact me by email or on any of the many other networks!
This book is fantastic, and a great resource for learning and finding blind spots in your knowledge. The appendix is gold, with tips and tricks on working with LLMs (like an implementation of LoRA from scratch).
If you are thinking about working with LLMs, this book is an excellent starting point. It is extremely code heavy, but don’t be dissuaded because the code is necessary to understand how LLMs work. The focus on making sure you can follow along means you can do everything in the book on your own laptop.
Truth is, only a fool or a very rich person with a lot of spare time would want to really build an LLM from scratch. But that is not the point: this book is an exceptional resource for anyone interested in understanding the intricacies of LLMs. The author effectively breaks down complex concepts into manageable steps, making them accessible to readers with varying levels of expertise. He leads the audience step by step through all the minutiae behind the scenes of large language models, using programming examples and clear explanations. At the end of the journey, the reader will not only be aware of what's inside the most modern AI models, but will truly appreciate the effort that makes them possible.
I think that some exposure to the mathematics behind LLMs, although not strictly necessary, is useful for getting the most out of "Build a Large Language Model", and I recommend the 3Blue1Brown deep learning series on YouTube to supplement the information in this book.
Sebastian Raschka's "Build a Large Language Model (from Scratch)" is an exceptional resource for anyone interested in understanding and implementing LLMs from the ground up. The book is thorough, well-structured, and packed with practical insights into the world of transformer-based models, specifically tailored to demystify LLMs for readers with a foundation in machine learning.
The book is divided into seven chapters, each building upon the previous one to gradually guide readers from the basics of transformers to the fine-tuning of models for specific tasks. Here's a brief look at what each chapter covers:
- Chapter 1 provides a solid introduction to LLM fundamentals, covering the transformer architecture in a way that’s both informative and approachable.
- Chapter 2 dives into the mechanics of preparing text for training, including tokenization, vectorization, and other foundational preprocessing steps.
- Chapter 3 focuses on attention mechanisms, explaining both the basics and advanced aspects of self-attention, and lays the groundwork for understanding multihead attention in transformers.
- Chapter 4 brings it all together with a hands-on guide to implementing a GPT-like model, covering essential coding techniques, model layers, and parameter considerations.
- Chapter 5 walks through the LLM pretraining process, detailing the calculation of training losses and providing the structure for pretraining models with high-quality performance.
- Chapter 6 explores fine-tuning strategies, allowing readers to transform a pretrained model into one tailored for tasks like spam detection.
- Chapter 7 concludes with a focus on instruction fine-tuning, explaining how to prepare datasets and adapt models to follow human instructions effectively.
Each chapter not only explains the "why" behind the techniques but also provides code snippets and clear step-by-step guidance to implement everything from attention mechanisms to pretraining and fine-tuning a functional model.
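To give a flavor of those code snippets, here is a minimal sketch of a causal (masked) self-attention step in PyTorch; the sequence length, dimensions, and variable names are illustrative placeholders rather than the book's exact implementation.

```python
import torch

torch.manual_seed(123)

# Illustrative sizes, not the book's exact configuration
seq_len, d_in, d_out = 6, 8, 8

x = torch.randn(seq_len, d_in)   # token embeddings for one sequence
W_q = torch.randn(d_in, d_out)   # query projection
W_k = torch.randn(d_in, d_out)   # key projection
W_v = torch.randn(d_in, d_out)   # value projection

queries, keys, values = x @ W_q, x @ W_k, x @ W_v

# Scaled dot-product attention scores
scores = queries @ keys.T / d_out ** 0.5

# Causal mask: each position may only attend to itself and earlier positions
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

weights = torch.softmax(scores, dim=-1)  # attention weights, rows sum to 1
context = weights @ values               # context vectors, shape (seq_len, d_out)
print(context.shape)                     # torch.Size([6, 8])
```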
Overall, "Build a Large Language Model (from Scratch)" is an invaluable asset for anyone looking to gain a comprehensive understanding of LLMs, whether for professional development or academic curiosity. I highly recommend this book to intermediate and advanced readers who are eager to dive into the technicalities of LLMs and develop a model from scratch with practical guidance from one of the field's top educators.
All concepts are explained both in theory and through code. I am familiar with TensorFlow, and even then it is easy to follow the code, which uses PyTorch. Among the things I picked up:
- what kind of tokenizer is used in LLMs;
- a comparison of the embedding layer to one-hot encoding, for ease of understanding;
- positional embeddings;
- a detailed, step-by-step explanation of self-attention, from a simple implementation up to a multi-head attention module.
Next we create a small GPT-2 model for generating text. Here we also get to know the difference between ReLU and GELU, see the importance of shortcut connections, and learn why the book bases our learning on the GPT-2 model and not GPT-3. We see how, step by step, a GPT model generates text given an input.
The book then explores training the model. During training, the weights are updated using backpropagation so that the required target tokens can be generated by the model. For the training and validation datasets, a short story from the public domain has been selected; using a small dataset makes code execution faster and easier to learn from. We also learn how to save and load model weights, and how to load pretrained weights from OpenAI.
The book moves on to fine-tuning for classification. The difference between a classification-fine-tuned model and an instruction-fine-tuned model is explained, and we learn to classify text as spam or not spam by fine-tuning the model on supervised data. Lastly, fine-tuning to follow instructions is done by training the model on a dataset of input-output pairs, with the input being the instruction and the output being the desired response. Those with hardware limitations can choose a smaller model. To score the model's responses, we learn to use a Llama model. Further references are shared for every chapter for those who want to dive deeper.
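As a small, hedged illustration of a few of the ideas mentioned above (a GELU activation, a shortcut connection, and saving/loading weights), here is a PyTorch sketch; the layer sizes and file name are made up for the example and this is not the book's code.

```python
import torch
import torch.nn as nn

class FeedForwardBlock(nn.Module):
    """A transformer-style feed-forward sublayer with a GELU activation
    and a shortcut (residual) connection around it."""
    def __init__(self, emb_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),                       # smoother than ReLU around zero
            nn.Linear(4 * emb_dim, emb_dim),
        )

    def forward(self, x):
        return x + self.layers(x)            # shortcut connection

block = FeedForwardBlock(emb_dim=64)
out = block(torch.randn(2, 10, 64))          # (batch, seq_len, emb_dim)
print(out.shape)                             # torch.Size([2, 10, 64])

# Saving and reloading the weights (file name is just an example)
torch.save(block.state_dict(), "ffn_block.pth")
block.load_state_dict(torch.load("ffn_block.pth"))
```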
I have been waiting for this book since Sebastian announced it.
The book's content is a perfect match for the title. It guides you through data prep, training, and evaluation, and includes exercises with solutions.
It has a balanced composition of illustrative diagrams (34%, I particularly like these visuals), code (33%), and text (33%, or even less). The content includes practical tips such as training times per epoch on different hardware setups.
The book is surprisingly concise: the author avoided going off into historical context, numerous model architectures, and so on, which would have blown up the book's volume tenfold.
For those of us interacting with LLMs through APIs, the book is still valuable: it provides a solid introduction to fundamentals like tokens and the process of LLM training. This helps with prompt engineering and with choosing an appropriate use case for generative AI applications.
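For readers coming from the API side, it can help to see what tokens actually look like. The snippet below uses the GPT-2 byte-pair encoding from the tiktoken package with an arbitrary example sentence; it illustrates the general idea rather than reproducing anything from the book.

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

text = "Large language models predict one token at a time."
token_ids = tokenizer.encode(text)

print(token_ids)                    # a list of integer token IDs
print(tokenizer.decode(token_ids))  # round-trips back to the original text
print(len(token_ids), "tokens")
```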
This book is a good, well-organized guide to building large language models (LLMs). It walks you through everything, from transformer basics to a working GPT-like model. The explanations are clear, and the code examples are helpful. You'll learn the steps, from pretraining to fine-tuning for instruction and classification tasks.
The book covers important concepts like tokenization, embeddings, and the self-attention mechanism. It’s good at explaining each part's purpose and how it fits into the whole. The step-by-step implementation of each component is a plus.
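To make the embeddings part concrete, here is a hedged sketch of how token IDs are typically mapped to input vectors with a token embedding plus a positional embedding; the GPT-2-style sizes and the example IDs are just illustrative, not necessarily the book's settings.

```python
import torch
import torch.nn as nn

vocab_size, context_len, emb_dim = 50257, 1024, 768   # GPT-2-style sizes (illustrative)

tok_emb = nn.Embedding(vocab_size, emb_dim)   # one learnable vector per token ID
pos_emb = nn.Embedding(context_len, emb_dim)  # one learnable vector per position

token_ids = torch.tensor([[3, 14, 159]])      # a (batch, seq_len) batch of token IDs
positions = torch.arange(token_ids.shape[1])

x = tok_emb(token_ids) + pos_emb(positions)   # input embeddings fed to the model
print(x.shape)                                # torch.Size([1, 3, 768])
```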
However, the book could go deeper. It doesn't always explain why things are the way they are, or the mathematical intuition. It focuses on the how, but I'd have liked to see more on the mathematical underpinnings and the reasoning behind some choices. I do get why this could be considered beyond the scope of the book.
The book contains insightful material about model architectures for a few standard LLM families, including GPT and Llama. The supporting Python code and interactive Jupyter notebooks are very helpful when working through the book. The only reason I don’t give the book five stars is that I was hoping for at least a small section on why the model architectures are built with the layers that they are. For example, why do GPT-family models repeat the same transformer block with multi-head attention a prescribed number of times? What is the rationale behind this? Is there a mathematical basis, or is it all just empirical?
Had a blast reading through this book and implementing the LLM described. It's well written and clear. The author has recently been making video walkthroughs on YouTube for each chapter, too.
(My background is as a programmer, but not an ML focused engineer. I've trained deep neural networks and other ML models before, but mostly for self-study or hobby projects.)
Beyond the basic structure of the book, it provides lots of suggestions for variations, improvements, and continued learning (appendices, additional notebooks in the GitHub repo).
I cannot praise this book enough. This one sets a new standard for in-depth, explanatory, technical books on complex topics.
The author managed to explain the attention mechanism, Transformers, decoder-based LLMs, and the most important concepts behind contemporary LLMs (self-supervised learning with next-token prediction, supervised and instruction fine-tuning) in a single few-hundred-page book, in approachable language and code.
Can't wait to go through it again (I bought the MEAP version) once a physical copy arrives.
This book was helpful for refreshing and reinforcing my understanding of transformer models. I like that it doesn’t just use a generic transformer architecture, but rather recreates GPT-2 in PyTorch so that you can ultimately load pretrained weights from OpenAI. I also appreciated that there was an appendix on LoRA.
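Since the LoRA appendix comes up in several of these reviews, here is a minimal sketch of the underlying idea: a frozen linear layer augmented with a trainable low-rank update. The rank and scaling values are arbitrary, and this is not the book's exact implementation.

```python
import torch
import torch.nn as nn

class LinearWithLoRA(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update x @ A @ B."""
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False                       # freeze original weights
        in_dim, out_dim = linear.in_features, linear.out_features
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init: starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.linear(x) + self.scaling * (x @ self.A @ self.B)

layer = LinearWithLoRA(nn.Linear(768, 768), rank=8)
print(layer(torch.randn(2, 768)).shape)                   # torch.Size([2, 768])
```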
The best book for learning the internal workings of large language models. It has long been a big black box how large language models generate accurate answers (definitely not lame ones like 2+2=4), and it turns out they are just next-token predictors. Sebastian has broken down and simplified the math behind token prediction with illustrative diagrams. Additionally, you'll learn how to build a classifier and a generator locally on your own CPU/GPU.
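The "just a next-token predictor" point is easy to see in code. Below is a hedged sketch of a greedy decoding loop, where `model` stands for any GPT-style module that maps token IDs to next-token logits; the names and the random-logit stand-in are purely illustrative.

```python
import torch

@torch.no_grad()
def generate_greedy(model, token_ids, max_new_tokens):
    """Repeatedly append the single most probable next token.
    `model` is assumed to return logits of shape (batch, seq_len, vocab_size)."""
    for _ in range(max_new_tokens):
        logits = model(token_ids)                             # (batch, seq_len, vocab_size)
        next_logits = logits[:, -1, :]                        # logits for the last position
        probs = torch.softmax(next_logits, dim=-1)
        next_id = torch.argmax(probs, dim=-1, keepdim=True)   # greedy choice
        token_ids = torch.cat([token_ids, next_id], dim=1)    # append and repeat
    return token_ids

# Stand-in "model" that returns random logits, just to show the loop runs
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 100)
print(generate_greedy(dummy, torch.tensor([[1, 2, 3]]), max_new_tokens=5))
```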
Raschka's illustrations, alongside his publisher's spectacular design of this book, are what will remain memorable when it's all said and done. He has done a fantastic job of boiling down these technical feats in a rapidly evolving field, so kudos to him! Just a few months ago SFT was the standard; now it is being replaced with large-scale RL. Such are the times. All in all, a good read.
I liked the interactive approach of building up an LLM completely from scratch and explaining everything step by step. The step up to the attention mechanism was quite steep, to be honest. And in the end I felt some closure was lacking: it never quite brings things to the point where everything could rely on pre-existing blocks like torch.nn.Transformer.
A lot of information about the internal design of LLMs. As a minus, I don’t see a discussion of how to actually use this information in practice when working with LLMs.