Why does ChatGPT work so well? Is it “just scaling up GPT-3” under the hood? In this 🧵, let’s discuss the “Instruct” paradigm, its deep technical insights, and a big implication: “prompt engineering” as we know it will likely disappear soon:👇
The original GPT-3 was trained with a minimalist objective: predict the next word on a massive text corpus. Many abilities magically emerge, such as reasoning, coding, and translation. You can even do “few-shot learning”: define new tasks by providing I/O examples in context. 1/ https://t.co/1nDFsFLu33
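Here is a minimal sketch of that objective, assuming PyTorch + Hugging Face transformers and using the small public gpt2 checkpoint as a stand-in (GPT-3 itself is not openly available). Passing labels=input_ids gives the standard shifted next-token cross-entropy loss, and plain generation from a prompt is all that “few-shot learning” relies on:

```python
# A stand-in for GPT-3's pretraining objective: maximize the likelihood of the
# next token at every position of a text corpus. gpt2 is used here only because
# it is a small, public checkpoint; GPT-3 itself is not openly available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The detective stared at the evidence and concluded that the murderer is"
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the standard shifted
# next-token cross-entropy loss -- the entire pretraining signal.
out = model(**inputs, labels=inputs["input_ids"])
print(out.loss)  # scalar language-modeling loss

# Greedy continuation: "few-shot" prompting is just this, with examples in-context.
gen = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(gen[0], skip_special_tokens=True))
```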
It’s not at all obvious why simply predicting the next word can give us such abilities. One intuitive explanation is to imagine a detective story. Suppose the model needs to fill in the last blank: “the murderer is ___”. To answer correctly, it has to do deep reasoning. 2/ https://t.co/Jozv219EEP
But this is not enough. In practice, we have to coax GPT-3 to autocomplete what we desire by carefully curating the examples, wording, and structure. This is exactly “prompt engineering”, where users have to practice the awkward and sometimes nonsensical vernacular of LLMs. 3/
Prompt engineering is a BUG🐞, not a feature! It’s caused by the fundamental misalignment between the next-word objective and the actual user intent in real applications. Example: you want GPT-3 to “Explain the moon landing to a 6yo”. It replies like a drunk parrot🦜: 4/
Prompt engineering is even worse in DALL·E 2 and Stable Diffusion. Just go to https://t.co/FIcVovqhdb and see how insane some prompts are. My favorite is the “parentheses trick” - adding (((…))) sometimes gives you better images 😅. It’s both hilarious and embarrassing. 5/
ChatGPT and its base model InstructGPT address this plague in an elegant way. The key observation is that alignment is very hard to capture from in-the-wild data. Humans must be in the loop to help tutor GPT, and GPT will be able to ask better questions as it improves. 6/
There are 3 steps. The first is very straightforward: just collect a dataset of human-written answers to prompts that users submit, and finetune GPT with supervised learning. It’s the easiest step but also the most costly: it can be slow and painful for humans to write long responses. 7/
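A hedged sketch of this supervised finetuning step, continuing the gpt2/PyTorch stand-in from above. The (prompt, answer) pair is a made-up placeholder for the real labeler-written data; the key detail is masking the prompt tokens with -100 so the loss is only computed on the human-written answer:

```python
# Step 1 sketch: supervised finetuning on (prompt, human answer) pairs.
# The pairs below are hypothetical placeholders for the real labeler-written data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

pairs = [
    ("Explain the moon landing to a 6 year old.",
     "People flew a rocket to the moon, walked around, and came home safely."),
]

for prompt, answer in pairs:
    prompt_ids = tokenizer(prompt + "\n", return_tensors="pt").input_ids
    answer_ids = tokenizer(answer + tokenizer.eos_token, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

    # Only the answer tokens contribute to the loss; prompt positions are
    # masked out with -100, which the cross-entropy loss ignores.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    loss = model(input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```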
Step 2 is much more interesting. GPT is asked to propose a few different answers, and all a human annotator needs to do is rank the responses from most desirable to least. Using these labels, we can train a reward model that captures human preferences. 8/
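Concretely, one common way to build such a reward model (a sketch under assumptions, not OpenAI’s actual code) is to bolt a scalar head onto a language model and train it with a pairwise ranking loss that pushes the score of the human-preferred answer above the rejected one. The example texts below are invented:

```python
# Step 2 sketch: a reward model trained from human preference rankings.
# For each pair, "chosen" was ranked above "rejected" by a human annotator.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

class RewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("gpt2")
        self.score = nn.Linear(self.backbone.config.hidden_size, 1)  # scalar reward head

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        # Score the last non-padding token of each sequence.
        last = attention_mask.sum(dim=1) - 1
        return self.score(hidden[torch.arange(hidden.size(0)), last]).squeeze(-1)

rm = RewardModel()
optimizer = torch.optim.AdamW(rm.parameters(), lr=1e-5)

prompt = "Explain the moon landing to a 6 year old."
chosen = prompt + " People went to the moon in a rocket and came back."
rejected = prompt + " Explain gravity to a 6 year old."  # the drunk-parrot style completion

batch = tokenizer([chosen, rejected], return_tensors="pt", padding=True)
rewards = rm(batch.input_ids, batch.attention_mask)

# Pairwise (Bradley-Terry style) loss: push r(chosen) above r(rejected).
loss = -torch.nn.functional.logsigmoid(rewards[0] - rewards[1])
loss.backward()
optimizer.step()
```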
In reinforcement learning (RL), the reward function is typically hardcoded, such as the game score in Atari. ChatGPT’s data-driven reward model is a powerful idea. Another example is our recent MineDojo work that learns a reward function from tons of Minecraft YouTube videos: 9/ https://t.co/uaInr0qzAY
Step 3: treat GPT as a policy and optimize it with RL against the learned reward. PPO is chosen as a simple and effective training algorithm. Now that GPT is better aligned, we can rinse and repeat steps 2-3 to improve it continuously. It’s like CI for LLMs! 10/
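A deliberately simplified sketch of this RL step, again on the gpt2 stand-in. The real recipe uses PPO’s clipped surrogate objective plus a value baseline; here a plain policy-gradient update with a KL penalty against a frozen reference model illustrates the core loop. The step-2 reward model is treated as a black box, and beta is an assumed hyperparameter:

```python
# Step 3 sketch (deliberately simplified): optimize the policy against the
# learned reward. The real recipe uses PPO's clipped surrogate objective plus a
# value baseline; a plain policy-gradient update with a KL penalty is shown here.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")  # in practice: the step-1 SFT checkpoint
ref = copy.deepcopy(policy).eval()                     # frozen reference model (anti-drift)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)
beta = 0.02                                            # KL penalty coefficient (assumed value)

def logprobs_of(model, ids, prompt_len):
    """Sum of log-probs the model assigns to the response tokens."""
    logits = model(ids).logits[:, :-1]
    logp = torch.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logp[:, prompt_len - 1:].sum(dim=-1)

prompt = "Explain the moon landing to a 6 year old.\n"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids

# 1) Roll out: sample a response from the current policy.
ids = policy.generate(prompt_ids, max_new_tokens=30, do_sample=True, top_k=50)

# 2) Score it. reward_model(...) would be the step-2 model; treated as a black box here.
# rm_score = reward_model(ids)
rm_score = torch.tensor(1.0)  # placeholder so the sketch runs standalone

# 3) Shaped reward = RM score minus a KL penalty that keeps the policy close to the reference.
logp_policy = logprobs_of(policy, ids, prompt_ids.shape[1])
with torch.no_grad():
    logp_ref = logprobs_of(ref, ids, prompt_ids.shape[1])
reward = rm_score - beta * (logp_policy.detach() - logp_ref)

# 4) Policy-gradient step (PPO would clip the ratio and reuse each rollout several times).
loss = -(reward * logp_policy).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```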
This is the “Instruct” paradigm - a super effective way to do alignment, as evident in ChatGPT’s mind-blowing demos. The RL part also reminds me of the famous P=NP (or ≠) problem: it tends to be much easier to verify a solution than to solve the problem from scratch. 11/
Similarly, humans can quickly assess the quality of GPT’s output, but it’s much harder and more cognitively taxing to write out a full solution. InstructGPT exploits this fact to lower the manual labeling cost significantly, making it practical to scale up the model CI pipeline. 12/
Another interesting connection is that Instruct training looks a lot like a GAN. Here ChatGPT is the generator and the reward model (RM) is the discriminator. ChatGPT tries to fool the RM, while the RM learns to detect alien-sounding responses with human help. The game converges when the RM can no longer tell the difference. 13/
Model alignment with user intent is also making its way to image generation! There is some preliminary work, such as https://t.co/zplEcplkng. Given the explosive AI progress, how long will it take to have an Instruct- or Chat-DALLE that feels like talking to a real artist? 14/
So folks, enjoy prompt engineering while it lasts! It’s an unfortunate historical artifact - a bit like alchemy🧪, neither art nor science. Soon it will just be “prompt writing” - my grandma can get it right on her first try. No more magic incantations to coerce the model. 15/
Of course, ChatGPT is not yet good enough to eliminate prompt engineering completely, but the trend is an unstoppable force. Meanwhile, the model has other serious syndromes: hallucination & habitual BS. I covered this in another thread: 16/ https://t.co/CwKtn50Qwq
There are ongoing open-source efforts for the Instruct paradigm! To name a few:
👉 trlx by @carperai: https://t.co/DFLVMQTP4M (Carper AI is an org from @StabilityAI)
👉 RL4LM by @rajammanabrolu @allen_ai: https://t.co/1T4KInsUiH
I’m so glad to have met the above authors at NeurIPS! 17/
Further reading: the reward model also has scaling laws: https://t.co/TyNH4sfRLq! And the RM is only an imperfect proxy (unlike Atari scores), so it’s a bad idea to over-optimize against it. This paper is from @johnschulman2, inventor of PPO. Super interesting work that flew under the radar. 18/
There are also other artifacts caused by the misalignment problem, such as prompt hacking or “injection”. I actually like this one because it allows us to bypass OpenAI’s prompt prefix and fully unleash the model 😆. See @goodside’s cool findings: 19/ https://t.co/gROKXPlssl
Thanks for reading! Follow me for more deep dives into the latest AI tech 🙌. References:
👉 https://t.co/dgmtKnKmc2
👉 InstructGPT paper: https://t.co/ql8NRRoqds
👉 https://t.co/hBssFw6nyP
👉 Beautiful illustrations: https://t.co/A887eClqxY
END/🧵
