The Evolution of AI Data
• Data loaders were added to enhance retrieval querying.
• Modularity, and decoupling stored state from queries, were considered.
• Complex interactions like chain-of-thought reasoning, routing, and agent loops were explored.
• Inbound interest from VCs sparked the idea of starting the company.
• Video data was initially proposed but deemed a crowded market.
• The intersection of AI, data, and building practical applications was a focus.
• The opportunity evolved into a larger one around December.
• Official departure from the previous position was in early March.

    Jerry Liu
    We started adding some data loaders, saw an opportunity there, and started adding more on the retrieval and querying side. We still had the core data structures, but how do you actually make them more modular and decouple storing state from the types of queries you can run on top of it? And then we started getting into more
    Alessio Fanelli
    Complex interactions like chain-of-thought reasoning, routing, and, you know, agent loops. You and I spent a bunch of time earlier this year talking about LlamaHub and what that might become. You were still at Robust Intelligence. When did you decide it was time to start the company, and start thinking about what LlamaIndex is today?
    Jerry Liu
    Yeah, I mean, probably December. It was kind of interesting. I was getting some inbound from initial VCs, and I was talking about this project. In the beginning I was like, oh yeah, this is just a design project, but what about my other idea on video data? I was trying to get their thoughts on that, and everybody was just like, oh, whatever, that part's a crowded market. And then it became clear that this was actually a pretty big opportunity. And coincidentally, it related to my interests, which have always been at the intersection of AI, data, and building practical applications. It was clear that this was evolving into a much bigger opportunity than the previous idea. So around December. And then I think I gave a pretty long notice, but I left officially in early March.
    LlamaIndex started as open source, but inbound investor interest pointed to a bigger opportunity and early product-market fit.
Fine-Tuning Embedding Models
• Retrieval involves many factors, such as the chunking algorithm, metadata, and the embedding model itself.
• Exploration and optimization of embedding models is an active area of research.
• Fine-tuning the embedding model can improve performance, but may require re-indexing all documents.
• Free and accessible fine-tuning processes can benefit developers, but there are complexities to consider in a production-grade data pipeline.
• An alternative approach freezes document embeddings and trains a linear or other transform on the query embeddings.

    Jerry Liu
    I just think it's not the only parameter, because if you think about everything that goes into retrieval: there's the chunking algorithm and how you define metadata, which will bias your embedding representations; there's the actual embedding model itself, which is something you can try optimizing; and then there's the retrieval itself. Are you going to just do top-k? Are you going to do hybrid search? Are you going to do auto-retrieval? There are a bunch of parameters. So I do think it's something everybody should try. By default, we use OpenAI's embedding model. A lot of people these days use sentence transformers, because it's free, open source, and you can directly optimize it. This is an active area of exploration. I do think one of our goals is that it should ideally be relatively free for every developer to run some fine-tuning process over their data to squeeze out some more points in performance. And if it's that relatively free and there are no downsides, everybody should basically do it. There are just some complexities in terms of optimizing your embedding model, especially in a production-grade data pipeline. If you actually fine-tune the embedding model and the embedding space changes, you're going to have to re-index all your documents, and for a lot of people that's not feasible. So, as Joe from Vespa mentioned on one of our webinars, there's this idea that if you're just using document and query embeddings, you could keep the document embeddings frozen and train a linear transform, or any sort of transform, on the query. That way it's just a query-side transformation, instead of actually having to re-index all the document embeddings.
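A minimal sketch of what "directly optimizing" an open-source embedding model can look like, using the sentence-transformers fit API. The model name and training pairs below are illustrative stand-ins, not anything from the conversation; note the caveat Jerry raises, that once the embedding space changes, every stored document has to be re-embedded.

```python
# Hedged sketch: fine-tune an open-source embedding model on your own
# (question, relevant chunk) pairs. Pairs below are made-up placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each example pairs a query with the chunk that answers it; the other
# examples in a batch act as negatives under this loss.
train_examples = [
    InputExample(texts=["What was Q3 revenue?", "Revenue for Q3 was $12.4M ..."]),
    InputExample(texts=["Who is the CFO?", "Jane Doe has served as CFO since ..."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
model.save("finetuned-embedder")  # the embedding space moved: re-index all docs
```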
Optimizing Embedding Models for Performance and Efficiency
• It should be relatively free for every developer to run a fine-tuning process over their data for improved performance.
• Optimizing the embedding model in a production-grade data pipeline may require re-indexing documents.
• A possible solution is to keep document embeddings frozen and train a transform on the query side instead.
• Trying different parameters can help optimize the retrieval process by adding bias to the embeddings.
• The text lives in a latent space.

    swyx
    That’s pretty smart.
    Jerry Liu
    We weren't able to get huge performance gains there, but it does improve performance a little bit, and that's something basically everybody should be able to kick off. You can actually do that in LlamaIndex, too.
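The query-side trick Jerry describes can be sketched in a few lines of PyTorch: document embeddings stay frozen, and only a small transform on the query embedding is trained. Everything below (dimensions, data, margin) is an illustrative assumption, not LlamaIndex's actual implementation.

```python
# Hedged sketch: learn a linear transform W on query embeddings while the
# document index stays frozen, so no re-indexing is needed.
import torch

dim = 384
# Placeholder data; in practice these are embeddings of labeled
# (query, relevant chunk, irrelevant chunk) triples from a frozen model.
q = torch.randn(256, dim)      # query embeddings
d_pos = torch.randn(256, dim)  # embeddings of the matching chunks
d_neg = torch.randn(256, dim)  # embeddings of non-matching chunks

W = torch.eye(dim, requires_grad=True)  # identity init: starts as a no-op
opt = torch.optim.Adam([W], lr=1e-3)

for step in range(200):
    q_t = q @ W  # transform queries only; document embeddings never change
    pos = torch.cosine_similarity(q_t, d_pos, dim=-1)
    neg = torch.cosine_similarity(q_t, d_neg, dim=-1)
    loss = torch.relu(0.2 + neg - pos).mean()  # margin ranking loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# At query time: embed the query, apply W, then search the unchanged index.
```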
    swyx
    OpenAI has a cookbook on adding bias to the embeddings, too, right?
    Jerry Liu
    Yeah, there are just different parameters you can try adding to optimize the retrieval process. And the idea is just, okay, by default you have all this text, and it kind of lives in some latent space, right? You should take a drink every time. But it lives in some latent space.
Optimizing Embedding Models and Retrieval in a Latent Space
• Different parameters can be added to optimize the retrieval process.
• The latent space may not be optimized for the specific types of questions users want to ask.
• Shifting the embedding points and optimizing the embedding model can improve retrieval.
• There is room for improvement in ranking and in sunsetting stale data.
• The retrieval space presents practical problems that need to be addressed.

    Jerry Liu
    But depending on the specific types of questions the user might want to ask, the latent space might not be optimized to actually retrieve the relevant pieces of context. So can you shift the embedding points a little bit? And how do we do that? That's really the key question here. Optimizing the embedding model, and even changing the way you chunk things, all shift the embeddings. So the retrieval is interesting.
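To make "shifting the embedding points" concrete, here is a toy two-dimensional illustration with synthetic numbers: a transform applied to the query moves it through latent space so that a different chunk becomes its nearest neighbor. The matrix is hand-picked for the demo; in practice it would be learned from (query, relevant chunk) pairs as in the sketch above.

```python
# Toy demo (synthetic 2-D "embeddings"): shifting a query in latent space
# changes which chunk is retrieved as its nearest neighbor.
import numpy as np

docs = np.array([[1.0, 0.0],   # chunk A
                 [0.6, 0.8]])  # chunk B: the one the user actually wants
query = np.array([0.9, 0.1])   # raw query embedding sits closest to chunk A

# Hand-picked transform that de-emphasizes dimension 0 and boosts
# dimension 1; a real system would learn this from labeled pairs.
W = np.array([[0.2, 0.0],
              [0.0, 2.0]])

def nearest(q: np.ndarray) -> str:
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    return "A" if sims.argmax() == 0 else "B"

print(nearest(query))      # -> A
print(nearest(W @ query))  # -> B after shifting the query embedding
```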
    Alessio Fanelli
    I get a bunch of startup pitches that are like: RAG is cool, but there's a lot of stuff in terms of ranking that could be better, and a lot of stuff in terms of sunsetting data once it starts to become stale that could be better. Are you going to move into that part too? You have SEC Insights as one of your demos, and that's a great example of: hey, I don't want to embed all the historical documents, because a lot of them are outdated and I don't want them in the context. What's that problem space like? How much of it are you going to help with, versus how much you expect others to take care of?
    Jerry Liu
    Yeah, I'm happy to talk about SEC Insights in just a bit. Thinking more broadly about the overall retrieval space, we're very interested in it, because a lot of these are very practical problems that people have asked us about.
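One common answer to Alessio's stale-data question is to sunset documents with metadata filters rather than re-embedding or deleting anything: stamp each document with a filing date and filter at retrieval time. The sketch below uses llama_index's metadata-filter API as found in recent versions; class names may differ across releases, and support for range operators like GTE depends on the underlying vector store.

```python
# Hedged sketch: filter out stale filings at retrieval time instead of
# deleting or re-embedding them. Documents and dates are illustrative.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

docs = [
    Document(text="FY2021 10-K risk factors ...",
             metadata={"filed_on": "2021-03-01"}),
    Document(text="FY2023 10-K risk factors ...",
             metadata={"filed_on": "2023-03-01"}),
]
index = VectorStoreIndex.from_documents(docs)  # uses the default embed model

# Only chunks filed on or after 2023-01-01 can reach the context window;
# older filings are effectively sunset without touching their embeddings.
filters = MetadataFilters(filters=[
    MetadataFilter(key="filed_on", value="2023-01-01",
                   operator=FilterOperator.GTE),
])
retriever = index.as_retriever(similarity_top_k=2, filters=filters)
nodes = retriever.retrieve("What are the current risk factors?")
```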