- Episode AI notes
- PhD programs didn’t focus on practical skills like working with vision, data tables, recommendation systems, and text. After 30 years, there is finally a chance to make it happen.
- The focus in machine learning shifted from zero-shot and few-shot learning back to transfer learning: the ULMFiT approach, but fine-tuning on an instruction corpus and then on RLHF-based classification.
- Join active and engaging Discord communities like the Eleuther Discord, CarperAI, Alignment Lab, and OS Skunkworks to build and learn from others.
- Fast AI’s goals could be expanded to make coding more accessible by dropping the programming-experience prerequisite; meanwhile, there is still deep uncertainty about how to train and fine-tune models well.
- There is limited knowledge about the capabilities and limitations of GPT-4, but with a carefully engineered prompting strategy it has shown promising performance at chess.
Recognize the diverse capabilities of people and embrace technology as an opportunity to improve humanity. Time 0:00:00
1min Snip Summary: PhD programs were the only available courses, and they didn’t focus on practical skills. We wanted to teach useful skills like working with vision, data tables, recommendation systems, and text. NLP was more academic than practical. I had a crazy idea inspired by cognitive science about symbolic manipulations. It’s been 30 years, and finally, we have a chance to make it happen.
Speaker 1
I mean, there wasn’t much in the way of courses. Really, the courses out there were PhD programs that had happened to have recorded their lessons, and they would rarely mention it at all. We wanted to show how to do four things that seemed really useful, you know: work with vision, work with tables of data, work with recommendation systems and collaborative filtering, and work with text, because we felt like those four modalities covered a lot of the stuff that’s useful in real life. And no one was doing anything much useful with text. Everybody was talking about word2vec, you know, like king minus man plus woman equals queen and blah, blah, blah. It was like cool experiments, but nobody was doing anything useful with it. NLP was all, like, lemmatization and stop words and topic models and bigrams and SVMs. And it was really academic and not practical. Yeah, I mean, to be honest, I’d been thinking about this crazy idea for nearly 30 years, since I had done cognitive science at university, where we talked a lot about Searle’s Chinese Room experiment, this idea of, like, what if there was somebody who could do all of the symbolic manipulations required to answer questions in Chinese, but they didn’t speak Chinese. They were kind of inside a room with no other way to talk to the outside world, other than taking in slips of paper with Chinese written…
1min Snip Time 0:11:24
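The word2vec analogy mentioned here (king minus man plus woman landing near queen) is just vector arithmetic plus a nearest-neighbor lookup. A minimal sketch, using made-up 4-dimensional toy vectors rather than real learned embeddings:

```python
import numpy as np

# Hand-made 4-dimensional toy vectors, for illustration only.
# Real word2vec embeddings are learned and typically 100-300 dimensional.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "man":   np.array([0.1, 0.9, 0.0, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
    "queen": np.array([0.9, 0.0, 1.0, 0.2]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The famous analogy: king - man + woman should land nearest to queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(vectors[w], target),
)
print(best)  # queen
```

With a real trained model the same query works over a vocabulary of hundreds of thousands of words, which is what made the demo so striking at the time.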
1min Snip Summary: In the world of machine learning, everyone was obsessed with zero shot and few shot learning after GPT’s success. Transfer learning was ignored, until a key idea emerged: using the ULM Fit approach, but fine tuning on an instruction corpus and then on RLHF fast classification. This breakthrough shifted the focus and proved to be effective.
Speaker 1
Actually quite a few months after the first ULMFiT example, I think. But yeah, there was a bit of this stuff going on. And the problem was everybody was doing, particularly after GPT came out, everybody wanted to focus on zero-shot and few-shot learning. You know, everybody hated fine-tuning, everybody hated transfer learning. And, like, I literally did tours trying to get people to start doing transfer learning. And nobody was interested, particularly after GPT showed such good results with zero-shot and few-shot learning. And so I actually feel like we kind of went backwards for years. And, to be honest, I’m a bit sad about this now, but I kind of got so disappointed and dissuaded. It felt like these much bigger labs (you know, fast AI had only ever been just me and Rachel) were getting all of this attention for an approach I thought was the wrong way to do it. You know, I was convinced it was the wrong way to do it. And so yeah, for years, people were really focused on getting better zero-shot and few-shot. And it wasn’t until this key idea of, like, well, let’s take the ULMFiT approach, but for step two, rather than fine-tuning on a kind of a domain corpus, let’s fine-tune on an instruction corpus. And then in step three, rather than fine-tuning on a reasonably specific task classification, let’s fine-tune on an RLHF-based classification. And so that was really key.
1min Snip Time 0:16:26
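The shift described here can be summarized as two three-step recipes with the same shape, where only the second and third steps change. A schematic sketch (the stage descriptions are paraphrased from the discussion, not a runnable training pipeline):

```python
# ULMFiT (2018) vs. the instruction-tuned LLM recipe described above.
# Step 1 is shared; steps 2 and 3 are what changed.

ULMFIT = {
    1: "pretrain a language model on a large general corpus",
    2: "fine-tune the language model on a domain corpus",
    3: "fine-tune on a specific task, e.g. classification",
}

INSTRUCTION_TUNED = {
    1: "pretrain a language model on a large general corpus",
    2: "fine-tune on an instruction corpus",
    3: "fine-tune with RLHF",
}

# The first step is identical in both recipes; the later steps swap
# domain/task data for instruction data and human feedback.
assert ULMFIT[1] == INSTRUCTION_TUNED[1]
```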
Discord Communities for Building and Learning Summary: Join the active and engaging Discord communities like the Eleuther Discord, CarperAI, Alignment Lab, and OS Skunkworks. These communities are open, accessible, and perfect for people who want to build stuff. Just ask me for admin access if you need it. We’re looking for people who are eager to learn and contribute, not those who just want to boss others around. So, if you’re willing to take on small helpful tasks and learn along the way, we’d love to have you!
Speaker 1
So you can do the Eleuther Discord, still. You know, one problem with the Eleuther Discord is it’s been going on for so long that it’s, like, very inside baseball. It’s hard to get started. Yeah. CarperAI, I think it’s all open. That’s more accessible. Yeah. There’s also, just recently, Nous Research, that does like the Hermes models and datasets, just opened. They’ve got some private channels, but it’s pretty open, I think. Uh, you mentioned Alignment Lab; that one, all the interesting stuff is on private channels. So just ask, if you know me, ask me, because I’ve got admin on that one. There’s also, yeah, uh, OS Skunkworks. OS Skunkworks AI is a good Discord, which I think is open. So yeah, they’re all pretty good.
Speaker 3
I don’t want you to leak any Discords that don’t want any publicity. But no, I mean, we all want people.
Speaker 1
We all want people. We just want people who like want to build stuff.
Speaker 3
Yeah.
Speaker 1
And, like, it’s fine to not know anything as well. But if you don’t know anything and you want to tell everybody else what to do and how to do it, that’s annoying. If you don’t know anything and want to be told, like, “here’s a really small task that, as somebody who doesn’t know anything, is going to take you a really long time to do, but it would still be helpful,” and then you go and do it, that would be great. The truth is, maybe 5% of people who come in with great enthusiasm, saying that they want to learn and they’ll do anything…
Expanding Fast AI’s Goals: Making Coding More Accessible Summary: Fast AI’s goals can be expanded to make coding more accessible, eliminating the need for prerequisite experience. OpenAI seems to have some knowledge but still lacks understanding in training models, fine-tuning, and other areas. There is much uncertainty regarding the capabilities and limitations of AI models.
Speaker 1
So, you know, yeah, what does it look like to, like, really grab this opportunity? Maybe fast AI’s goals can be dramatically expanded now to being like, let’s make coding more accessible, you know, or kind of AI-oriented coding more accessible. If so, our courses should probably look very different, you know, and we’d have to throw away that, like, “oh, you have to have at least a year of full-time programming as a prerequisite.” Yeah, what would happen if we got rid of that? So that’s kind of one thought that’s in my head, you know, as to what should other people do. Honestly, I don’t think anybody has any idea, the more I look at what’s going on. I know I don’t, you know. Like, we don’t really know how to do anything very well. Clearly OpenAI does; like, they seem to be quite good at some things. But talking to folks at, or who have recently left, OpenAI, even there it’s clear there’s a lot of stuff they haven’t really figured out, and they’re just kind of using recipes that they’ve noticed have been okay. So yeah, we don’t really know how to train these models well, we don’t know how to fine-tune them well, we don’t know how to do RAG well, we don’t know what they can do, we don’t know what they can’t do, we don’t know how big a model you need to solve different kinds of problems, we don’t know what kind of problems they can’t do, we don’t know what good prompting strategies are for particular problems, you know…
The Unknown Capabilities of GPT-4: A Closer Look at Its Potential Summary: We have limited knowledge on how to properly train and fine-tune models, utilize them in diverse tasks, determine their limitations, or optimize prompting strategies. However, someone shared 6,000 lines of Python code implementing a GPT-4 prompting strategy that achieved an Elo of 3400 playing chess against top chess engines, challenging the belief that GPT-4 was incapable of playing chess. This highlights the uncertainty surrounding the capabilities of these models.
It feels like the early days of computer vision in 2013, when techniques like AlexNet and VGGNet had only just appeared and their full potential was not yet understood.
Speaker 1
So yeah, we don’t really know how to train these models well, we don’t know how to fine-tune them well, we don’t know how to do RAG well, we don’t know what they can do, we don’t know what they can’t do, we don’t know how big a model you need to solve different kinds of problems, we don’t know what kind of problems they can’t do, we don’t know what good prompting strategies are for particular problems, you know. Somebody sent me a message the other day saying they’ve written something that is a prompting strategy for GPT-4. They’ve written like 6,000 lines of Python code, and it’s to help it play chess. And they said they’ve had it play against other chess engines, including the best Stockfish engines, and it’s got an Elo of 3400. Oh my god. Which would make it close to the best chess engine in existence. And I think this is a good example of, like, people were saying GPT-4 can’t play chess. I was sure that was wrong. I mean, obviously it can play chess. But the difference is: with no prompting strategy, it can’t even make legal moves; with good prompting strategies, it might be just about the best chess engine in the world, far better than any human player. So yeah, I mean, we don’t really know what the capabilities are yet. So I feel like it’s all blue sky at this point. It feels like computer vision in 2013 to me. In 2013 computer vision, we’d just had AlexNet, we’d had VGGNet, around the time of Zeiler and Fergus… no, it’s probably before that. So we hadn’t had the Zeiler and Fergus, like, “oh, this is actually what’s going on inside the layers.”
The Importance of Recognizing the Diverse Capabilities of People and Embracing Technology Summary: There is an important message to remember: the world is full of diverse people with valuable experiences and capabilities. We now have powerful technology that can be seen as either scary or as an opportunity for people to improve humanity.
It has always been a battle between those who want to control the power and those who believe in the potential of humanity as a whole.
Speaker 2
Awesome. And yeah, before wrapping: what’s one message, one idea you want everyone to remember and think about?
Speaker 1
You know, I guess the main thing I want everybody to remember is that, you know, there are a lot of people in the world, and they have a lot of, you know, diverse experiences and capabilities. They all matter. And now that we have a newly powerful technology in our lives, we could think of that in one of two ways. One would be: gee, that’s really scary. What would happen if all of these people in the world had access to this technology? Some of them might be bad people. Let’s make sure they can’t have it. Or one might be: wow, with all those people in the world, I bet a lot of them could really improve the lives of a lot of humanity if they had this tool. This has always been the case, you know, from the invention of writing to the invention of the printing press to the development of education. And it’s been a constant battle between people who think that distributed power is unsafe and should be held on to by an elite few, and people who think that humanity on net is a marvelous species, particularly when part of a society and a civilization.
