Reflections on Competition in the AI Space
The speaker introduces the concept of the ‘four wars’ as a framework to analyze the significant events in the AI space. These wars are the data wars, the GPU rich-poor war, the multimodal war, and the RAG and Ops war. The discussion turns to Inflection, a well-known contender in the AI domain that recently saw most of its team move to Microsoft and a new CEO appointed. This shift raises questions about the harsh reality of competition in the AI field and the importance of resources like talent and compute.
swyx
Yeah, so maybe I’ll take this one. So The Four Wars is a framework that I came up with when trying to recap all of 2023. I tried to write sort of monthly recap pieces, and I was trying to figure out what makes one piece of news last longer or matter more than another. And I think it’s basically always around battlegrounds. Wars are fought around limited resources. And I think probably the most limited resource is talent, but the talent expresses itself in a number of areas. And so I kind of focused on those areas first. So the four wars that we cover are the data wars, the GPU rich-poor war, the multimodal war, and the RAG and Ops war. And I think you actually did a dedicated episode on that, so thanks for covering that.
Nathaniel Whittemore
Yeah. Not only did I do a dedicated episode, I actually used it. I can’t remember if I told you guys. I did give you big shout-outs, but I used it as a framework for a presentation at Intel’s big AI event that they hold each year, where they have all their folks who are working on AI internally. And it totally resonated. That’s amazing. Yeah. So what got me thinking about it again is specifically this Inflection news that we recently had. Basically, I can’t imagine that anyone who’s listening wouldn’t have thought about it. But Inflection is one of the big contenders, right? I think probably most folks would have put them, you know, just a half step behind the Anthropics and OpenAIs of the world in terms of labs. But it’s a company that raised $1.3 billion last year, less than a year ago. Reid Hoffman’s a co-founder. Mustafa Suleyman, who’s a co-founder of DeepMind, you know, so this is not a small startup, let’s say, at least in terms of perception. And then we get the news that basically most of the team, it appears, is heading over to Microsoft and they’re bringing in a new CEO. And, you know, I’m interested in your take on how much that reflects, setting aside, I guess, all the other things it might be about, this sort of stark, brutal reality of competing in the frontier model space right now and just the access to compute.
The Battle of Modality Models in AI Development
The battle in AI development revolves around the competition between large multi-modality companies and small dedicated-modality companies. The trend is shifting towards the large companies, as seen in instances like Sora’s success in video generation. Having multiple state-of-the-art models under one roof brings synergy and benefits, as in the case of Sora and DALL-E. This approach allows for cross-modality enhancements and synthetic data improvements. Startups focusing on a single modality face challenges in keeping up with the advancements. Despite this, each company carves out its niche, like Suno AI in the music domain, drawing in users well beyond the obvious target audience. The recommendation is to read the Sora and DALL-E blog posts to understand the key methodologies and the advantages of having multiple models collaborating in one ecosystem, which is a limitation for dedicated-modality companies.
Nathaniel Whittemore
We wandered very naturally into sort of another one of these wars, which is the multimodality kind of idea, which is basically a question of whether it’s going to be these sort of big everything models that end up winning, or whether you’re going to have really specific things, you know, like DALL-E 3 inside of sort of OpenAI’s larger models versus, you know, a Midjourney or something like that. I was kind of thinking, for most of the last, call it six months or whatever, it feels pretty definitively both-and in some ways, you know, in that you’re seeing just great innovation on sort of the everything models, but you’re also seeing lots and lots happen at the level of kind of individual use cases. But then Sora comes along and just obliterates what I think anyone thought, you know, about where we were when it comes to video generation. So how are you guys thinking about this particular battle or war at the moment?
swyx
Yeah, this was definitely a both-and story, and Sora tipped things one way for me in terms of scale being all you need. And the benefit, I think, of having multiple models being developed under one roof. I think a lot of people aren’t aware that Sora was developed in a similar fashion to DALL-E 3, and DALL-E 3 had a very interesting paper out where they talked about how they sort of bootstrapped their synthetic data based on GPT-4 Vision and GPT-4. And it was all just really interesting. Like, if you work on one modality, it enables you to work on other modalities. And all of that is more beneficial if it’s all in the same house. Whereas the individual startups who sort of carve out a single modality and work on that definitely, you know, won’t have the state-of-the-art stuff helping them out on synthetic data. So I do think the balance is tilted a little bit towards the God model companies, which is challenging for the dedicated modality companies. But everyone’s carving out different niches. We just interviewed Suno AI, the music one, and, you know, I don’t see OpenAI pursuing music anytime soon.
Nathaniel Whittemore
Yeah, Suno has been phenomenal to play with. Suno has done that rare thing, which I think a number of different AI product categories have done, where people who don’t consider themselves particularly interested in doing the thing that the AI enables find themselves doing a lot more of that thing, right? Like, it’d be one thing if just musicians were excited about Suno and using it, but what you’re seeing is tons of people who just like music all of a sudden playing around with it and finding themselves kind of down that rabbit hole, which I think is kind of the highest compliment that you can give one of these startups in its early days. Yeah.
swyx
I asked them directly in the interview about whether they consider themselves Midjourney for music, and he had a more sort of nuanced response there. But I think that probably the business model is going to be very similar, because he’s focused on the B2C element of that. On the bigger question of multi-modality companies versus small dedicated modality companies: yeah, I highly recommend people read the Sora blog post and then read through to the DALL-E 3 blog post, because they strongly correlated themselves with the same synthetic data bootstrapping methods as DALL-E 3. And I think once you make those connections, you’re like, oh, it is beneficial to have multiple state-of-the-art models in-house that all help each other. And that’s the one thing that a dedicated modality company cannot do.
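To make the caption-bootstrapping idea above concrete, here is a minimal, hypothetical sketch of the loop, not OpenAI’s actual pipeline: a vision-language model drafts a caption for each image, and a stronger language model rewrites it into a richer synthetic caption for training. The `caption_image` and `enrich_caption` functions are placeholder stubs standing in for real model calls.

```python
# Hypothetical sketch of caption bootstrapping for text-to-image training data.
# The two model calls are stubbed out; in practice they would hit a
# vision-language model (for drafting) and an LLM (for enriching).

from dataclasses import dataclass
from typing import List


@dataclass
class TrainingPair:
    image_path: str
    caption: str


def caption_image(image_path: str) -> str:
    """Stand-in for a vision-language model that drafts a short caption."""
    return f"a photo related to {image_path}"  # placeholder output


def enrich_caption(draft: str) -> str:
    """Stand-in for an LLM that rewrites a draft into a detailed caption."""
    return draft + ", with subject, style, and background described in detail"


def bootstrap_captions(image_paths: List[str]) -> List[TrainingPair]:
    """Build (image, synthetic caption) pairs by drafting and then upsampling captions."""
    pairs = []
    for path in image_paths:
        draft = caption_image(path)        # weak/cheap caption
        detailed = enrich_caption(draft)   # upsampled synthetic caption
        pairs.append(TrainingPair(image_path=path, caption=detailed))
    return pairs


if __name__ == "__main__":
    for pair in bootstrap_captions(["cat.jpg", "street.jpg"]):
        print(pair.image_path, "->", pair.caption)
```

The shape of the loop is the whole point: if the strong captioner and the strong LLM live in the same house, every other modality team can reuse them to upgrade its training data, which is exactly the advantage a single-modality startup lacks.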
Focus on Specificity for Success
Companies are increasingly favoring vertical agents tailored to specific domains over generic, bottom-up products. The challenge with overly broad applications is that they struggle to achieve reliability and true effectiveness. Entrepreneurs are now prioritizing specialized use cases, as this approach appeals more to investors, who want clear, actionable solutions rather than abstract generic functionality. As a result, successful ventures are emerging in niches such as financial research, security, compliance, and legal, where targeted applications resonate more with user needs and market demands.
Alessio Fanelli
And David Luan from Adept, you know, in our episode, he specifically said we don’t want to do a bottom-up product. You know, we don’t want something that everybody can just use and try, because it’s really hard to get it to be reliable. So we’re seeing a lot of companies doing vertical agents that are narrow for a specific domain, and they’re very good at something. Mike Conover, who was at Databricks before, is also a friend of Latent Space. He’s doing this new company called Brightwave, doing AI agents for financial research, and that’s it. And they’re doing very well. There are other companies doing it in security, doing it in compliance, doing it in legal, all of these things where, like, nobody just wakes up and says, oh, I cannot wait to go on AutoGPT and ask it to do a compliance review of my thing. You know, it’s just not what inspires people. So I think the gap has been that on the developer side, the more bottom-up, hacker mentality is trying to build these very generic agents that can do a lot of open-ended tasks. And then the more business side of things is like, hey, if I want to raise my next round, I cannot just sit around and mess around with super generic stuff. I need to find a use case that really works. And I think that’s true for a lot of folks. In parallel, you have a lot of companies doing evals.
Embrace the Evolution of Diffusion Technology
The ongoing advancements in diffusion technology are reshaping the landscape of AI-generated art and text. With pioneers like Bill Peebles leading the way, innovations such as Stable Diffusion 3, Hourglass Diffusion, and SDXL Turbo are improving efficiency, reducing costs, and simplifying the creation process. Anyone who believes that generating Stable Diffusion art is slow or expensive is not up to date with the latest models. Additionally, text diffusion presents a potentially groundbreaking approach, allowing entire chunks of text to be generated at once through diffusion models rather than traditional token-by-token methods.
swyx
The guy who wrote the diffusion transformer paper, Bill Peebles, is the lead tech guy on Sora. So you’ll just see a lot more diffusion transformer stuff going on. But there’s more sort of experimentation with diffusion. I’m holding a meetup actually here in San Francisco that’s going to be like the state of diffusion, which I’m pretty excited about. Stability is doing a lot of good work. And if you look at the architecture of how they’re creating Stable Diffusion 3, Hourglass Diffusion, latent consistency models, SDXL Turbo, all of these are very, very interesting innovations on the original idea of what Stable Diffusion was. So if you think that it is expensive or slow to create Stable Diffusion or AI-generated art, you are not up to date with the latest models. If you think it is hard to create text and images, you are not up to date with the latest models. And people are still kind of far behind. The last piece, which is the wildcard I always kind of hold out, is text diffusion. So instead of using autoregressive transformers, can you diffuse text? So you can use diffusion models to diffuse and create entire chunks of text all at once instead of token by token.
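As a rough, self-contained illustration of the contrast being drawn here (a toy sketch, not any real model’s API): an autoregressive decoder emits one token per forward pass, while a diffusion-style decoder starts from a fully masked chunk and progressively fills in positions in parallel over a fixed number of refinement steps. `predict_next_token` and `denoise_step` are hypothetical stand-ins that just pick random tokens.

```python
# Toy contrast between autoregressive decoding and diffusion-style text generation.
# Both "models" are random stand-ins; only the control flow is the point.

import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]
MASK = "<mask>"


def predict_next_token(prefix: list) -> str:
    """Stand-in for an autoregressive LM: picks the next token given the prefix."""
    return random.choice(VOCAB)


def denoise_step(tokens: list) -> list:
    """Stand-in for one denoising step: unmasks a batch of positions at once."""
    return [random.choice(VOCAB) if t == MASK and random.random() < 0.5 else t
            for t in tokens]


def autoregressive_generate(length: int) -> list:
    out = []
    for _ in range(length):          # one forward pass per token
        out.append(predict_next_token(out))
    return out


def diffusion_generate(length: int, steps: int = 4) -> list:
    tokens = [MASK] * length         # start from a fully masked "noisy" chunk
    for _ in range(steps):           # a small, fixed number of parallel refinement passes
        tokens = denoise_step(tokens)
    return [t if t != MASK else random.choice(VOCAB) for t in tokens]


if __name__ == "__main__":
    print("autoregressive:", " ".join(autoregressive_generate(8)))
    print("diffusion-style:", " ".join(diffusion_generate(8)))
```

The appeal is the cost profile: the diffusion loop runs a handful of passes regardless of sequence length, instead of one pass per token.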
Progress is Perpetually Five Years Away
Advancements in technology, particularly in AI and autonomy, often take longer than anticipated, resembling the delayed rollout of self-driving cars. The concept of ‘levels of autonomy’ serves as a useful framework for understanding this progression, indicating that initial expectations may not align with real-world implementation. This highlights the importance of recognizing that technological breakthroughs face substantial practical challenges, causing anticipated timelines to stretch. Observing the gradual evolution of self-driving capabilities reinforces the notion that achieving practical autonomy demands sustained effort and patience.
Milind Naphade
How long do you think it is till we get to early versions? This is my equivalent of AGI timelines. I know, I know.
swyx
You set yourself up for this. Lots of active, I mean, I have supported companies actively working on that. I think it’s more useful to think about levels of autonomy. And so my answer to that is perpetually five years away until it figures it out. No, but my actual anecdote, the closest comparison we have to that, is self-driving. We’re doing this in San Francisco. For those who are watching the live stream, if you haven’t come to San Francisco and taken a Waymo ride, just come, get a friend, take a Waymo ride. I remember in 2014, we covered a little bit of autos in my hedge fund, and I remember telling a friend, like, self-driving cars are around the corner, this is it, you know, parking will be a thing of the past. And it didn’t happen for the next 10 years. But now, you know, most of us in San Francisco can take it for granted. So I think you just have to be mindful that the rough edges take a long time. And yes, it’s going to work in demos. Then it’s going to work a little bit further out. And it’s just going to take a long time. The more useful mental model I have is sort of levels of autonomy. So in self-driving, you have level one, two, three, four, five, just the amount of human attention that’s required. At first, your hands are always on 10 and 2, and you have to pay attention to the driving every 30
