The Evolution of AI Data

Key takeaways:
• Data loaders were added to enhance retrieval querying.
• Modularity and decoupling of storing state from queries were considered.
• Complex interactions like chain of thought reasoning, routing, and agent loops were explored.
• Inbound interest from VCs sparked the idea of starting the company.
• Video data was initially proposed but deemed a crowded market.
• The intersection of AI data and building practical applications was a focus.
• The opportunity evolved into a larger one around December.
• Official departure from the previous position was in early March.
Speaker 1
And so we started adding like some data loaders, saw an opportunity there, and started adding more stuff on the retrieval and querying side. Right, we still have like the core data structures, but how do you actually make them more modular and kind of decouple storing state from the types of queries that could run on top of it? And then we started getting into more complex interactions like chain-of-thought reasoning, routing, and, you know, agent loops. You and I spent a bunch of time earlier this year talking about LlamaHub, what...
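[Editor's note: to make that loader / index / query decoupling concrete, here is a minimal sketch of the pattern with llama_index. Import paths and reader names vary across versions, so treat this as illustrative rather than canonical.]

```python
# Minimal sketch: loading data, storing state (the index), and choosing a query
# strategy are three decoupled layers.
# (Import style follows llama_index ~0.9-era releases; adjust for your version.)
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# 1. Data loader: pull raw files into Document objects.
documents = SimpleDirectoryReader("./data").load_data()

# 2. Index: persist state (nodes + embeddings) independently of how it is queried.
index = VectorStoreIndex.from_documents(documents)

# 3. Query layer: different query engines can run over the same stored index.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What did the author work on before this project?"))
```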
Speaker 3
You were still at Robust. When did you decide it was time to start the company, and then start to think about what LlamaIndex is today?
Speaker 1
Yeah, I mean, probably December. It was kind of interesting. I was getting some initial inbound from VCs, and I was talking about this project. And in the beginning I was like, oh yeah, you know, this is just like a side project, but what about my other idea on video data, right?
Speaker 4
And I was trying to, like, get their thoughts on that.
Speaker 1
And then everybody was just like, ah, yeah, whatever, that part's like a crowded market. And then it became clear that, you know, this was actually a pretty big opportunity. And, coincidentally, this actually did relate to my interests, which have always been at the intersection of AI, data, and building practical applications. It was clear around December that this was evolving into a much bigger opportunity than the previous idea. And then I think I gave a pretty long notice, but I left officially like early March.

Fine Tuning Embedding Models

Key takeaways:
• It is important to consider various factors in retrieval, such as the chunking algorithm, metadata, and the embedding model itself.
• Exploration and optimization of embedding models is an active area of research.
• Fine-tuning the embedding model can improve performance, but may require re-indexing all documents.
• Free and accessible fine-tuning processes can benefit developers, but there are complexities to consider in a production-grade data pipeline.
• An alternative approach involves freezing document embeddings and training a linear or other transform on the query embeddings.
Speaker 1
I just think it's not the only parameter, because in the end, if you think about everything that goes into retrieval: there's the chunking algorithm, how you define metadata, which will bias your embedding representations, and then there's the actual embedding model itself, which is something you can try optimizing. And then there's the retrieval itself: are you going to just do top-k, are you going to do hybrid search, or are you going to do auto-retrieval? There are a bunch of parameters. And so I do think it's something everybody should try. I think by default we use OpenAI's embedding model. A lot of people these days use sentence transformers, because it's free, open source, and you can actually directly optimize it. This is an active area of exploration. I do think one of our goals is that it should ideally be relatively free for every developer to just run some fine-tuning process over their data to squeeze out some more points in performance. And if it's relatively free and there's no downside, everybody should basically do it. There are just some complexities, right? In terms of optimizing your embedding model, especially in a production-grade data pipeline: if you actually fine-tune the embedding model and the embedding space changes, you're going to have to re-index all your documents. And for a lot of people, that's not feasible. And so, like Jo from Vespa said on our webinar, there's this idea that if you're just using document and query embeddings, you could keep the document embeddings frozen and just train a linear transform, or any sort of transform, on the query.

Optimizing embedding models for performance and efficiency

Key takeaways:
• It should be relatively free for every developer to run some fine tuning process over their data for improved performance.
• Optimizing the embedding model in a production grade data pipeline may require re-indexing documents.
• A possible solution is to keep document embeddings frozen and train a transform on the query instead.
• Trying different parameters can help optimize the retrieval process by adding bias to the embeddings.
• The text exists in a latent space.
Speaker 1
Right, so it becomes just a query-side transformation, instead of actually having to re-index all the document embeddings. That's pretty smart. We weren't able to get huge performance gains there, but it does improve performance a little bit. And that's something that basically, you know, everybody should be able to kick off. You can actually do that in the LlamaIndex docs too.
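[Editor's note: as a rough sketch of that idea (not the exact recipe discussed on the webinar), one can freeze the pre-computed document embeddings and fit a small linear transform on the query embeddings using (query, relevant-passage) pairs and a contrastive loss. All names and data below are illustrative.]

```python
# Hypothetical sketch: learn a query-side linear transform while keeping
# document embeddings frozen, so no re-indexing of documents is required.
import torch
import torch.nn.functional as F

dim = 1536                                    # embedding dimension (e.g. ada-002)
doc_embs = torch.randn(1000, dim)             # frozen, pre-computed document embeddings
query_embs = torch.randn(256, dim)            # pre-computed training-query embeddings
pos_doc_ids = torch.randint(0, 1000, (256,))  # index of the relevant doc for each query

# Initialize near the identity so training starts from the original embedding space.
W = torch.nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))
opt = torch.optim.Adam([W], lr=1e-4)

for step in range(200):
    q = F.normalize(query_embs @ W, dim=-1)      # transformed queries
    d = F.normalize(doc_embs, dim=-1)            # documents stay frozen
    logits = q @ d.T / 0.05                      # scaled cosine similarities
    loss = F.cross_entropy(logits, pos_doc_ids)  # in-batch contrastive objective
    opt.zero_grad(); loss.backward(); opt.step()

# At query time: embed the query as usual, then apply W before vector search.
```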
Speaker 2
OpenAI has a cookbook on adding bias to the embeddings too, right?
Speaker 1
Yeah, there are just different parameters you can try adding to optimize the retrieval process. And the idea is just, okay, by default you have all this text, and it kind of lives in some latent space, right? Yeah, no, not "latent space" again, you should take a drink every time. But it lives in some latent space.

Optimizing embedding models and retrieval process in a latent space

Key takeaways:
• There are different parameters that can be added to optimize the retrieval process.
• The latent space may not be optimized to retrieve the specific types of questions that users want to ask.
• Shifting the embedding points and optimizing the embedding model can improve retrieval.
• There are areas for improvement in terms of ranking and sunsetting stale data.
• The retrieval space presents practical problems that need to be addressed.
Speaker 1
But depending on the specific types of questions that the user might want to ask, the latent space might not be optimized to actually retrieve the relevant piece of context that the user is asking about. So can you shift the embedding points a little bit, right? And how do we do that? That's really the key question here. Optimizing the embedding model, even changing the way you chunk things, these all shift the embeddings.
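[Editor's note: for intuition, the "shift the embedding points" idea can be sketched as learning a small matrix applied to cached embeddings before computing similarity, in the spirit of the OpenAI cookbook on customizing embeddings. The code below is an illustrative sketch, not the cookbook's exact implementation.]

```python
# Illustrative sketch: apply a learned "bias" matrix to cached embeddings so the
# shifted space better separates relevant from irrelevant context for your queries.
import numpy as np

def customized_similarity(query_emb: np.ndarray, doc_embs: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Cosine similarity computed in the shifted space defined by matrix M."""
    q = query_emb @ M
    d = doc_embs @ M
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return d @ q  # one similarity score per document

# M would be fit on labeled (query, relevant / irrelevant passage) pairs;
# with M = identity you recover plain cosine similarity over the original space.
dim = 1536
M = np.eye(dim)
scores = customized_similarity(np.random.randn(dim), np.random.randn(100, dim), M)
top_k = np.argsort(-scores)[:5]  # indices of the 5 most similar documents
```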
Speaker 3
So the retrieval side is interesting. I get a bunch of startup pitches that are like, RAG is cool, but there's a lot of stuff in terms of ranking that could be better, and there's a lot of stuff in terms of sunsetting data once it starts to become stale that could be better. Are you going to move into that part too? You have SEC Insights as one of your demos, and that's a great example of, hey, I don't want to embed all these historical documents because a lot of them are outdated and I don't want them to be in the context. What's that problem space like? How much of it are you going to also help with, versus how much do you expect others to take care of? Yeah.
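[Editor's note: as a generic illustration of the "sunsetting stale data" concern (not anything SEC Insights specifically does), one simple approach is to attach a date to each document and filter out anything past a staleness cutoff before it is embedded or retrieved. Everything below is hypothetical.]

```python
# Hypothetical sketch: drop stale documents by date before they reach the index
# (the same predicate could equally be applied as a metadata filter at query time).
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Doc:
    text: str
    published: date

def fresh_docs(docs: list[Doc], max_age_days: int = 365) -> list[Doc]:
    """Keep only documents newer than the staleness cutoff."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [d for d in docs if d.published >= cutoff]

docs = [
    Doc("10-K filing, fiscal year 2023", date(2023, 3, 1)),
    Doc("10-K filing, fiscal year 2015", date(2015, 3, 1)),
]
print(fresh_docs(docs, max_age_days=3 * 365))  # the 2015 filing is excluded
```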
Speaker 1
I'm happy to talk about SEC Insights in just a bit. Thinking more broadly about the overall retrieval space, we're very interested in it, because a lot of these are very practical problems that people face.
