• Episode AI notes
  1. Training a model from scratch on 100 billion tokens of Python code yielded results similar to a fine-tuned GPT-3 model, i.e. no transfer benefit for the final Codex model.
  2. These days, however, benefits from learning natural language appear quite helpful for coding.
  3. Copilot today is powered by far more data, probably trillions of tokens of code and text.
  4. CAD has much less data than code (at most around 10 billion tokens from scraping everything available), which makes training a useful model challenging.
  5. Scaling and regularization techniques could not push models past a few billion parameters without overfitting on the limited CAD data.
  6. There is no transfer when testing these models on CAD-style prompts; even GPT-4 sometimes struggles. Time 0:00:00

  • The Challenges of Training a Useful Model for Code Generation Key takeaways:
  • The final Codex model showed no transfer benefits: training from scratch on 100 billion tokens of Python code matched the fine-tuned GPT-3 model.
  • Benefits from learning natural language now seem quite helpful for code.
  • CAD offers much less data than code, which makes training a useful model challenging.
  • Scaling and regularization techniques could not produce a useful model from the limited CAD data.
  • Models still show no transfer when tested on CAD tasks.

    Speaker 1
    But then for the final Codex model, it turned out that there were no transfer benefits, meaning you could just take a model and train it from scratch on those 100 billion tokens of Python code, and it would do just as well as the GPT-3 12-billion model that was fine-tuned. The caveat is that this was only true for GPT-3 and 100 billion tokens of Python code. These days, I mean, the jury's still out on this, but it seems pretty clear that the benefits from language, or learning language, are quite helpful with code. I guess that kind of goes into the issues with CAD, where, one, you're dealing with much less data than code. If you assume first off that 50 to 100 billion tokens is all you need, then maybe with 10x less you could get a pretty useful model. In reality, Copilot today is powered by probably trillions of tokens of code as well as text. And when you're dealing with, at most, from scraping every single bit of CAD data you can find, 10 billion tokens, it's just not enough to train a useful model. We tried scaling, and no matter what kinds of regularization techniques we used, we just couldn't get it past a few billion parameters without overfitting. That was a big thing. And then the other issue is that there's no transfer. If you try to test these models today, there's a prompt I like to use that's good for telling 3.5 versus 4 apart if you don't know which one is behind the scenes, and even GPT-4 sometimes struggles with it.
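    The data-size constraint the speaker describes can be sketched with a back-of-the-envelope calculation. This is a hypothetical illustration using the roughly 20-tokens-per-parameter rule of thumb from the compute-optimal scaling-law literature (Hoffmann et al., 2022); the exact ratio is an assumption, not a figure from the episode:

```python
# Rough sketch: how much model a given corpus can support before
# overfitting risk dominates, under a ~20 tokens/parameter heuristic.
TOKENS_PER_PARAM = 20  # assumed rule of thumb, not from the episode

def compute_optimal_params(num_tokens: float) -> float:
    """Approximate largest parameter count the data can support."""
    return num_tokens / TOKENS_PER_PARAM

# ~100B tokens of Python code, as discussed for Codex:
print(f"{compute_optimal_params(100e9):.1e}")  # 5.0e+09 -> a few billion params
# ~10B tokens, the ceiling for scraped CAD data:
print(f"{compute_optimal_params(10e9):.1e}")   # 5.0e+08 -> only ~500M params
```

    Under this heuristic, a 10-billion-token CAD corpus supports a model roughly an order of magnitude smaller than the Python corpus did, which is consistent with the speaker's report of overfitting past a few billion parameters.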