• How to Do Transfer Learning With Text At the time, the only available courses were PhD programs that had recorded their lessons, and they didn’t focus on practical skills. We wanted to teach useful skills across four modalities: vision, tables of data, recommendation systems, and text. NLP was more academic than practical. I had a crazy idea, inspired by the Chinese Room thought experiment from cognitive science, about symbolic manipulation. It’s been 30 years, and finally we have a chance to make it happen.

    Jeremy Howard
    I mean, there wasn’t much in the way of courses. Really, the courses out there were PhD programs that happened to have recorded their lessons. They would rarely mention it at all. We wanted to show how to do four things that seemed really useful, you know: work with vision, work with tables of data, work with recommendation systems and collaborative filtering, and work with text. Because we felt like those four modalities covered a lot of the stuff that is useful in real life. And no one was doing anything much useful with text. Everybody was talking about Word2Vec, you know, like king minus man plus woman equals queen, and blah, blah, blah. It was like cool experiments, but nobody was doing anything useful with it. NLP was all lemmatization and stop words and topic models and bigrams and SVMs. And it was really academic and not practical. Yeah, I mean, to be honest, I’ve been thinking about this crazy idea for nearly 30 years, since I had done cognitive science at university, where we talked a lot about Searle’s Chinese Room experiment: this idea of, what if there was somebody who knew all of the symbolic manipulations required to answer questions in Chinese, but they didn’t speak Chinese? They were kind of inside a room with no other way to talk to the outside world other than taking in slips of paper with Chinese written…
  • 1min Snip Time 0:11:24
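The Word2Vec analogy arithmetic mentioned above can be sketched with toy vectors. This is purely illustrative: the 3-dimensional embeddings below are made-up numbers, not learned vectors (real Word2Vec embeddings are trained and typically have 100–300 dimensions), but the arithmetic is the same: subtract, add, and look for the nearest neighbor by cosine similarity.

```python
# Toy sketch of the famous analogy: vec("king") - vec("man") + vec("woman")
# lands nearest vec("queen"). Vectors below are invented for illustration.
import math

embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# king - man + woman, component-wise.
target = [k - m + w for k, m, w in
          zip(embeddings["king"], embeddings["man"], embeddings["woman"])]

# Nearest word to the target vector (excluding the query word itself).
nearest = max((w for w in embeddings if w != "king"),
              key=lambda w: cosine(embeddings[w], target))
# nearest == "queen"
```

With real embeddings, gensim's `KeyedVectors.most_similar(positive=["king", "woman"], negative=["man"])` performs the same computation over a trained vocabulary.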

  • How to Get Better Zero Shot and Few Shot Learning In the world of machine learning, everyone was obsessed with zero-shot and few-shot learning after GPT’s success. Transfer learning was ignored, until a key idea emerged: use the ULMFiT approach, but fine-tune on an instruction corpus and then on an RLHF task. This breakthrough shifted the focus and proved to be effective.

    Jeremy Howard
    You know, actually quite a few months after the first ULMFiT example, I think. But yeah, there was a bit of this stuff going on. And the problem was what everybody was doing, and particularly after GPT came out, everybody wanted to focus on zero-shot and few-shot learning. You know, everybody hated fine-tuning. Everybody hated transfer learning. And like, I literally did tours trying to get people to start doing transfer learning. And nobody was interested, particularly after GPT showed such good results with zero-shot and few-shot learning. And so I actually feel like we kind of went backwards for years. And, to be honest, I mean, I’m a bit sad about this now, but I kind of got so disappointed and dissuaded. It felt like these much bigger labs (you know, fast.ai had only ever been just me and Rachel) were getting all of this attention for an approach I thought was the wrong way to do it. You know, I was convinced it was the wrong way to do it. And so, yeah, for years, people were really focused on getting better zero-shot and few-shot. And it wasn’t until this key idea of, well, let’s take the ULMFiT approach, but for step two, rather than fine-tuning on a kind of domain corpus, let’s fine-tune on an instruction corpus. And then in step three, rather than fine-tuning on a reasonably specific task classification, let’s fine-tune on an RLHF task. And so that was…
  • 1min Snip Time 0:16:26
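The recipe described above keeps ULMFiT’s three-stage shape and swaps the data used in stages two and three. A schematic sketch of that idea (all function names and corpus names here are made up for illustration; this is not a real training API):

```python
# Schematic of the three-stage recipe: same pipeline skeleton, different
# corpora at each stage. Hypothetical names, not a real library.

def pretrain_language_model(corpus):
    """Stage 1: self-supervised next-token pretraining on a broad corpus."""
    return {"stage": "pretrained", "trained_on": [corpus]}

def finetune(model, corpus, stage):
    """Stages 2-3: continue training the same weights on a narrower corpus."""
    return {"stage": stage, "trained_on": model["trained_on"] + [corpus]}

# Original ULMFiT (2018): pretrain -> domain corpus -> task classifier.
ulmfit = finetune(
    finetune(pretrain_language_model("wikitext"),
             "imdb_reviews", "domain-tuned"),
    "imdb_sentiment_labels", "task-classifier")

# The shift Howard describes: pretrain -> instruction corpus -> RLHF.
modern = finetune(
    finetune(pretrain_language_model("web_text"),
             "instruction_pairs", "instruction-tuned"),
    "human_preference_data", "rlhf-tuned")
```

The point the sketch makes is structural: instruction tuning and RLHF did not replace transfer learning; they are transfer learning, with better-chosen corpora for the later stages.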

  • Discord Communities for Building and Learning Join active and engaging Discord communities like the EleutherAI Discord, CarperAI, Alignment Lab, and OS Skunkworks. These communities are open and accessible, and perfect for people who want to build stuff. Just ask me for access to the private channels if you need it. We’re looking for people who are eager to learn and contribute, not those who just want to boss others around. So, if you’re willing to take on small, helpful tasks and learn along the way, we’d love to have you!

    Jeremy Howard
    Open, so you can do the EleutherAI Discord still, you know. One problem with the EleutherAI Discord is it’s been going on for so long that it’s very inside baseball; it’s quite hard to get started. Yeah, CarperAI, I think it’s all open; that’s more accessible. Yeah, there’s also, just recently, Nous Research, that does the Hermes models and datasets, just opened. They’ve got some private channels, but it’s pretty open, I think. You mentioned Alignment Lab. That one, all the interesting stuff is on private channels. So just ask. If you know me, ask me, because I’ve got admin on that one. There’s also OS Skunkworks, OS Skunkworks AI. There’s a good Discord, which I think is open. So yeah, they’re all pretty good. I don’t want you to leak any Discords that don’t want any publicity. But no, I mean, we all want people; we just want people who want to build stuff, exactly. Yeah. And like, it’s fine to not know anything as well. But if you don’t know anything, and you want to tell everybody else what to do and how to do it, that’s annoying. If you don’t know anything and want to be told, like, here’s a really small task that, as somebody who doesn’t know anything, is going to take you a really long time to do, but it would still be helpful, and then you go and do it, that would be great. The truth is, maybe 5% of people who come in with great enthusiasm, saying that they want to learn and they’ll do anything…
  • Expanding Fast.ai’s Goals: Making Coding More Accessible Fast.ai’s goals could be expanded to make coding more accessible, eliminating the need for prerequisite programming experience. Even OpenAI, which seems good at some things, still lacks understanding in training models, fine-tuning, and other areas. There is much uncertainty regarding the capabilities and limitations of AI models.

    Jeremy Howard
    And whatever. So, you know, yeah, what does it look like to really grab this opportunity? Maybe fast.ai’s goals can be dramatically expanded now, to being like, let’s make coding more accessible, you know, or kind of AI-oriented coding more accessible. If so, our course should probably look very different, you know, and we’d have to throw away that, oh, you have to have at least a year of full-time programming as a prerequisite. Yeah. What would happen if we got rid of that? So that’s kind of one thought that’s in my head, you know. As to what should other people do? Honestly, I don’t think anybody has any idea. Like, the more I look at what’s going on, I know I don’t, you know. Like, we don’t really know how to do anything very well. Clearly OpenAI does; like, they seem to be quite good at some things. But talking to folks at, or who have recently left, OpenAI, even there it’s clear there’s a lot of stuff they haven’t really figured out. And they’re just kind of using recipes that they’ve noticed have been okay. So yeah, we don’t really know how to train these models well. We don’t know how to fine-tune them well. We don’t know how to do RAG well. We don’t know what they can do. We don’t know what they can’t do. We don’t know how big a model you need to solve different kinds of problems. We don’t know what kind of problems they can’t do. We don’t know what good prompting strategies are for particular problems. You know…
  • The Unknown Capabilities of GPT-4: A Closer Look at Its Potential We have limited knowledge of how to properly train and fine-tune models, utilize them in diverse tasks, determine their limitations, or optimize prompting strategies. However, someone shared roughly 6,000 lines of Python implementing a GPT-4 prompting strategy that achieved an Elo of 3,400 playing chess against top chess engines, challenging the belief that GPT-4 was incapable of playing chess. This highlights the uncertainty surrounding the capabilities of these models. It feels like the early days of computer vision around 2013, when we had AlexNet and VGGNet but had yet to discover what was actually going on inside the layers.

    Jeremy Howard
    Somebody sent me a message the other day saying they’ve written something that is a prompting strategy for GPT-4. They’ve written like 6,000 lines of Python code, and it’s to help it play chess. And then they’ve had it play against other chess engines, including the best Stockfish engines. And it’s got an Elo of 3,400. Oh my God. Which would make it close to the best chess engine in existence. And I think this is a good example of, like, people were saying GPT-4 can’t play chess. I was sure that was wrong. I mean, obviously it can play chess. But the difference is: with no prompting strategy, it can’t even make legal moves; with good prompting strategies, it might be just about the best chess engine in the world, far better than any human player. So yeah, I mean, we don’t really know what the capabilities are yet. So I feel like it’s all blue sky at this point. It feels like computer vision in 2013 to me. In 2013 computer vision, we’d just had AlexNet, we’d had VGGNet. It’s around the time of Zeiler and Fergus… like, no, it’s probably before that. So we hadn’t yet had the Zeiler and Fergus, oh, this is actually what’s going on inside the layers. So, you know…
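The point above, that a bare model may not even emit legal moves while a good wrapper can salvage strong play, can be sketched as a legality-checking re-prompt loop. Everything here is a stand-in: `fake_model` is a stub, and a real strategy (like the roughly 6,000-line one mentioned) would call an actual LLM and do far more than retry.

```python
# Minimal sketch of one ingredient of a prompting strategy: validate the
# model's move against the legal set and re-prompt on illegal output.

def fake_model(prompt, attempt):
    # Stub standing in for an LLM call: pretend the first answer is an
    # illegal move and the second is legal.
    return ["Qxh9", "e4"][min(attempt, 1)]

def prompted_move(legal_moves, board_prompt, max_retries=3):
    """Ask for a move; on illegal output, append feedback and retry."""
    for attempt in range(max_retries):
        move = fake_model(board_prompt, attempt)
        if move in legal_moves:
            return move
        board_prompt += f"\n{move} is illegal; choose from {sorted(legal_moves)}."
    return None  # give up after repeated illegal output

move = prompted_move({"e4", "d4", "Nf3"}, "White to move from the start position.")
# move == "e4"
```

In a real harness, the legal-move set would come from a chess library (e.g. python-chess's `Board.legal_moves`) rather than a hand-written set, and the strategy would also shape the prompt with board state, search, and self-consistency tricks.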
  • The Importance of Recognizing the Diverse Capabilities of People and Embracing Technology There is an important message to remember: the world is full of diverse people with valuable experiences and capabilities. We now have powerful technology that can be seen as either scary or as an opportunity for people to improve humanity. It has always been a battle between those who want to control the power and those who believe in the potential of humanity as a whole.

    Jeremy Howard
    Like that. Awesome.
    Alessio Fanelli
    And yeah, before wrapping: what’s one message, one idea you want everyone to remember and think about?
    Jeremy Howard
    You know, I guess the main thing I want everybody to remember is that there’s a lot of people in the world, and they have a lot of, you know, diverse experiences and capabilities. They all matter. And now that we have a newly powerful technology in our lives, we could think of that one of two ways. One would be: gee, that’s really scary. What would happen if all of these people in the world had access to this technology? Some of them might be bad people. Let’s make sure they can’t have it. Or one might be: wow, of all those people in the world, I bet a lot of them could really improve the lives of a lot of humanity if they had this tool. This has always been the case, you know, from the invention of writing to the invention of the printing press to the development of education. And it’s been a constant battle between people who think that distributed power is unsafe and should be held on to by an elite few, and people who think that humanity on net is a marvelous species, particularly when part of a society and a civilization, and…
  • Episode AI notes

  1. The only available courses were PhD programs, which didn’t focus on practical skills like working with vision, tables of data, recommendation systems, and text. After 30 years, Howard’s cognitive-science-inspired idea finally has a chance to happen.
  2. After GPT’s success, the field fixated on zero-shot and few-shot learning and neglected transfer learning, until the ULMFiT recipe re-emerged as fine-tuning on an instruction corpus followed by an RLHF task.
  3. Join active and engaging Discord communities like the EleutherAI Discord, CarperAI, Alignment Lab, and OS Skunkworks to build and learn with others.
  4. Fast.ai’s goals could be expanded to make coding more accessible, eliminating prerequisite programming experience, even though much about training and fine-tuning models remains poorly understood.
  5. There is limited knowledge about the capabilities and limitations of GPT-4, but with the right prompting strategy it has shown remarkable performance at chess.
  6. Recognize the diverse capabilities of people and embrace technology as an opportunity to improve humanity. Time 0:00:00