• Episode AI notes
  1. Like most services launched between the late 1990s and about 2010, Spotify started as a curation service, similar to Facebook and Amazon, but evolved into what the host calls the first true AI consumer product company.
  2. Spotify initially used user-created playlists as training data for their recommendation system and eventually transitioned to a recommendation-first strategy.
  3. AI is now the foreground product at Spotify, shaping the user experience and relying on explicit curation data.
  4. There is a balance between moving fast and reasoned debate in technology development, and it’s important to predict and intercept technology to stay ahead.
  5. AI can significantly reduce friction in content creation, and platforms like Spotify aim to increase productivity while maximizing value for users.
  6. Spotify aims to find a reimbursement model that benefits both artists and consumers, leveraging new technology to create a sustainable creator ecosystem.
  7. Future applications may incorporate AI to provide more dynamic and intelligent user interfaces, starting with text or voice inputs.
  8. Companies are likely to invest in enhancing the intelligence of their existing workforce rather than replacing them with technology.
  9. The transition from content generation to action taking is expected in the near future, with specialized agents and orchestrator LLMs playing a role.
  10. It’s important to predict both the positive and the negative impacts of AI and to prevent the negative ones, and empathy could be crucial for AI to better understand and predict human thoughts.

  • The Evolution of Spotify into an AI Consumer Product Company Summary: Like most services of the late 1990s through about 2010, Spotify started as a curation service: digitize a good such as music, then get users to organize and curate it. Facebook did the same with friends and Amazon with books. In music, the MP3 had separated the catalog from the CD, with piracy as one consequence. Spotify later evolved into a machine learning driven personalization service, which the host describes as the first true AI consumer product company.

    Speaker 2
    So normally we’d catch up, say hello, you know, small talk, but you and I when we get talking we could talk for hours so we’re going to dive right into it so we don’t waste any time. So I want to talk about Spotify a little bit before we get into some more general AI stuff. But you know, people don’t really think of Spotify as an AI company. But I kind of think of it as the first true AI consumer product company. Give us a brief story of the history of sort of machine learning driven personalization at Spotify. I like it. I like it.
    Speaker 1
    I think there’s some truth to this in the sense that Spotify started, like most services in sort of the late nineties to early 2000s or almost 2010s, as a curation service. So the name of the game back then was to take some good, like books or music or movies, digitize them, and then get users to sort of catalog them for you, or organize them, or curate them if you want to use a fancy word. So that was Facebook: you digitized their friends, and then you got people to curate the friends into groups and graphs. It was Amazon with books and so forth. Spotify was the same. The catalog was digitized by the MP3, and then the MP3 was sort of separated from the CD. It was piracy, obviously, and so forth.
  • From Curation to Recommendation: An Evolution in Spotify’s Data Strategy Summary: Billions of user-created playlists served as inadvertent training data for Spotify’s recommendation system. The company recognized this early and began investing in recommendation technology. The first recommendations supported curation, starting with similar-artist suggestions on the artist page. As its understanding of user preferences improved, the company shifted from a curation-first approach to a recommendation-first strategy, just as the underlying technology was maturing. This pivot began before the machine learning wave of around 2015, making Spotify an early adopter of data-driven recommendation, while users can still curate their own playlists.

    Speaker 1
    So really the training, the playlisting was the training data for Spotify. And already tens of years ago we had billions of these curations of tracks that go well together. So in that sense, we happened to have a lot of training data without really that being the goal. But to the credit of these people, they realized pretty early on that this was great training data. And so we started investing in recommendations. And at first these recommendations were a support to the curation. So the first thing you would see would be similar artists on the artist page. So it’s kind of a hidden feature. But we could see that people really loved these similar artists things. They’d just click through the graph of similar artists. And so then more and more we started saying like, well, if we understand which artists you might like, maybe we could just suggest tracks from these artists that you might like as well. And we sort of pivoted from curation into recommendation. And that was fortuitous for us because the technology kind of developed at the same time. But in that sense, we were actually pretty early, before the whole sort of maybe 2015 machine learning wave that happened. And we started sort of pivoting the company from curation first to recommendation first. Now, obviously you can still curate. You can still create your own playlists and so forth.
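    Illustration (not from the episode): one simple way playlists can act as training data for a "similar artists" feature is to count how often two artists appear in the same playlist. The sketch below assumes that approach; the function name, the toy playlists, and the artist names are all hypothetical, and nothing here describes Spotify's actual system.

```python
from collections import Counter, defaultdict
from itertools import combinations


def similar_artists(playlists, artist, top_n=3):
    """Rank artists by how many playlists they share with `artist`."""
    co_counts = defaultdict(Counter)
    for playlist in playlists:
        # Count each artist at most once per playlist.
        unique = sorted(set(playlist))
        for a, b in combinations(unique, 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return [name for name, _ in co_counts[artist].most_common(top_n)]


# Toy data: each playlist is a list of artist names (hypothetical).
playlists = [
    ["Artist A", "Artist B", "Artist C"],
    ["Artist A", "Artist B"],
    ["Artist B", "Artist D"],
    ["Artist A", "Artist C", "Artist B"],
]

print(similar_artists(playlists, "Artist A"))  # ['Artist B', 'Artist C']
```

    Here "Artist B" ranks first because it shares three playlists with "Artist A", while "Artist C" shares two. A production system would presumably normalize for popularity and use learned embeddings rather than raw counts.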
  • AI as the Foreground Product Summary: Explicit curation data is a more powerful signal than consumption data. At Spotify, AI is now moving from the background into the foreground as the product itself, building an approximation of the user across product areas. Philosophically, the relationship has flipped: the AI used to support the UI, and now the UI supports the AI.

    Speaker 1
    Yeah, I think that’s right. I mean, in most other spaces, movies and even podcasts for us, you have the consumption signal, like someone listened to this and then listened to that. But that’s less powerful than this explicit curation. Someone’s saying literally like these five tracks, they go together, you know, because it’s something and then we can try to understand what that something is. So it’s a very explicit data set.
    Speaker 2
    Yeah, it’s really cool. So AI has in many ways been sort of in the background of Spotify for a while now with the curation, as we just talked about, but it’s sort of making its way into the foreground now with things like AI DJ, some of the new playlists that you’ve come out with, or even more leaning into AI. What are some of the other areas of the product that you’re going to be applying AI to, sort of on the user-facing side?
    Speaker 1
    So I mean, I think you can look at it practically and look at all the places where we’re going to use it, and you can also think about it philosophically. I think if we start actually philosophically, I think the way to think about it is exactly what you said: when we started with AI, as I said, it was in support of the UI. Yeah. Like the AI was supposed to help the UI, but the UI was the product. At least that’s how we thought about it. And recently, this is switching. Now the UI is there to help the AI and the AI is actually the product. And you think more and more about it. What is it that we’re trying to do? We’re trying to build some sort of approximation of you, really, and your
  • Balancing Speed and Reasoning in Technology Development Summary: In technology development, there is a balance between the culture of moving fast and breaking things, and the need for reasoned debate and discussion with strong individuals. The challenge lies in predicting and intercepting technology rather than waiting for it, in order to stay ahead in the fast-moving landscape. It is essential to have reasoning around the products that will exist in the future and what needs to be built to be competitive in the market, instead of starting development once the technology is live.

    Speaker 1
    That’s my point. And I think there’s been this other culture of move fast, break things, code decides arguments, and that is also obviously valid. There are points to that. But if you go too hard in that direction, you can waste a lot of time going really fast, absolutely nowhere. So you need both. That’s why I want to push a little bit for like sort of Socratic debate and discussions with strong people. And this is the answer to your question. So one of the challenges now that things move this fast is that it used to be the case that some technology sort of came online. You could see it, you could test it, you could start saying like, oh, what products can we build from this? And you would build it for a year or something and then you shipped it. And that was okay. Now the tricky thing is like you have to predict where the technology is going to be and start building the things around it, hoping that it matures, if you want to be early. You have to intercept the technology rather than wait for it. And that makes it tricky. So if you need to intercept it, you need to have reasoning around what products are going to be able to exist, you know, six to 12 months from now, that don’t exist yet. And what do we need to build to be able to ship something then? Because if you start once it’s live, you know, you’re going to be years behind the others. So one of these examples is, for
  • Using AI to Lower Friction in Content Creation Summary: TikTok uses AI to significantly reduce friction in content creation, especially in music synchronization and dance videos. This innovation in AI is equally important for both creators and consumers. The key for new platforms to compete with major players like YouTube and TikTok is to leverage AI on the creation side to dramatically change the cost and friction of creating content. The goal for platforms like Spotify is not to replace creators, but to increase their productivity, ultimately maximizing the value for consumers.

    Speaker 1
    So because you create in TikTok, they actually use a ton of AI to drastically lower the friction on the creation side. And they needed to do that because they were not programming an existing format like we do. They were starting a new format, whatever you want to call that music-synced dance video that originally came from Musical.ly. So actually they did a lot of innovation on both sides. And I think the AI innovation on the creator side was probably as important as the innovation on the consumer side.
    Speaker 2
    Can you give some examples, like what are some of the things they did on the creation side?
    Speaker 1
    For example, when you create a TikTok, it can synchronize the music, help you synchronize moves, all of these things. It’s actually increasingly very AI driven. And so I think that if you’re building a new service now and you want to compete with YouTube and TikTok and Instagram and so forth, that is the vector. Use AI on the creation side somehow to create a new type of format or interaction that drastically changes the cost curve or the friction of creating content. So what I think is going to happen for us is something more traditional, namely that our goal is not to replace the creators, it is to get more creators, to make them more productive, right? That is what maximizes the value of Spotify for consumers, to have more music, more podcasts, more audiobooks.
  • Balancing Reimbursement Models in the Music Industry Summary: Legislation may dictate a reimbursement model in the music industry, similar to the regulation that addressed piracy in the past. Spotify aims to have as many artists as possible on its platform and wants them to create more music, not to be replaced. The goal is to find a model that leverages new technology and works for the creator ecosystem. However, the challenge lies in building a system that can successfully achieve this amidst disruptions in the industry.

    Speaker 1
    And I think it’s very possible that legislation will dictate a reimbursement model. You can compare to the Wild West of piracy that existed for a while. It doesn’t really exist anymore. So I don’t think the world has to be in that Wild West stage. I think some order will emerge. From Spotify’s point of view, as I said, our view is to have as many artists as possible on Spotify. And we actually want the artists to create more music, not to replace them. So from our point of view, we certainly respect their data. You know, regardless of if there are like loopholes that we could and so forth. And I think the reason Spotify started was that the music model worked for consumers, but not for creators. So I think it’d be a step back to create a model that again works for consumers, but not for creators. So certainly for us, the goal is to find a model where you can leverage the new technology and it works for the creator ecosystem. But always when disruptions happen, as with piracy, there is often a period of time first where it only works for one side of the marketplace.
    Speaker 2
    But that’s not long term sustainable. Yeah, it seems like there could be legislation, but it almost seems impossible to build a system that could actually pull this off.
  • Dynamic User Interfaces and AI in Future Applications Summary: Future applications may incorporate AI that can understand user intent when speaking and other inputs, expanding beyond text to provide more dynamic and intelligent user interfaces. This could involve dynamically rendering app interfaces in real time, adapting to user needs by incorporating elements such as images or text fields, and potentially blurring the lines between simulating and utilizing applications. The starting point for these future applications may revolve around utilizing either text or voice inputs.

    Speaker 1
    So if you don’t anthropomorphize and you think instead, what could a maximum product be, not what could a human have done if you talked to them over the phone. Right. But if you said like, which is a helpful model for some productivity tasks, but not for entertainment, I think, you know, the promise is an AI that can understand your intent when you speak and other inputs if you have them. But I certainly think voice is a, you know, text is a very strong input. But it can also render user interfaces that are much more dynamic. You know, today user interfaces have to be very specific and repetitive because they’re not generated on demand, right? They’re preprogrammed. So you have to think through, you know, which views you want in an app and so forth. You could imagine far into the future, because these things can generate code, if you squint a little bit, you could almost imagine that you’re simulating an app like Spotify and it’s literally like rendering the app or the code in real time.
    Speaker 2
    And I don’t think totally dynamic UI.
    Speaker 1
    Totally dynamic. Yeah. I don’t think it will go that far, but you could go some ways. You could have, like, the search view could start becoming more dynamic. And sometimes it renders images, if that’s helpful for the search result; sometimes it renders text fields. So you could imagine that the whole thing gets more dynamic and intelligent.
    Speaker 2
    Do you think the starting point for applications, though, becomes either text or voice?
  • Investing in Intelligence for Workforce Productivity Summary: In a competitive economy, companies are likely to invest in augmenting the intelligence of their existing staff rather than replacing them with technology. A developer with Copilot is already more productive than one without, which the speaker treats as a weak analogy for buying more intelligence for each employee. He expects companies to spend more and more on making their existing workforce productive, because a company that cuts its labor force risks being outcompeted.

    Speaker 1
    I think what’s going to happen, sort of, to sort of preempt the question maybe of labor and so forth. Because you could say like, oh, now you can buy intelligence. The second that’s one cent cheaper than hiring intelligence, you’re only going to use computers. I don’t think that’s going to happen. It’s the same as with musicians. That’s not what we see today. What we do see very clearly is like a developer with Copilot is more productive than a developer without. Yeah. So if you can buy more intelligence for that developer, you know, if you do a Ray Kurzweil and say like, hey, the nanobots are here, you can now buy neocortex in the cloud. Would you as a company buy more neocortex for your developers? Yes, you would. And I think buying Copilot is like a weak analogy of buying more neocortex for your developer. And so I think companies are going to start spending more and more on their existing staff to make them more and more productive. But I don’t think in this competitive economy, you’re actually going to reduce your labor force because then someone else is going to take their optics and outcompete you. So I think the pressure is to make your existing workforce more and more productive.
    Speaker 2
    Speaking of intelligence, GPT-5 is on the horizon. What do you think the impact of this is going to be?
  • The Imminent Transition from Content Generation to Action Taking Summary: The idea of achieving goals in a wide variety of environments raises the question of when AI will move from generating content to taking action, as in products like the Rabbit and the broader notion of action models. The speaker considers this shift imminent, while acknowledging the skeptical view that agents have been around as an idea for a while and are cool but not yet very useful.

    Speaker 2
    Achieving goals in a wide variety of environments is interesting. It makes me think of these products like, well, products like the Rabbit and more broadly, this notion of an action model, right? Everything right now, we’re talking about language models; the Rabbit, and, you know, people that are thinking about that type of innovation are talking about action models. When do you think we’re going to make the leap from generation of content, and writing, and images, and all this stuff to action taking? And what will be sort of the nearest-term implications of this?
    Speaker 1
    Well, I think, you know, to your point of the Rabbit and so forth, like it’s already happening, just very early. So, you know, one answer will be next week, maybe, who knows. But my point is, I think it’s quite imminent. I think someone more skeptical would say, like, well, we’ve had the idea of agents for a good while now, and they’re really cool, but they’re not that useful yet. I think that’s also fair.
  • The Rise of Specialized Agents and Orchestrator LLMs Summary: Semi-specialized agents are likely to help complete multi-step tasks for specific use cases, such as travel, and may quietly sit in front of several services so that users talk to one interface instead of three. Whether very general agents will emerge, or each service will have its own agent, is uncertain. The speaker mentions an Amazon paper on orchestrator LLMs and notes that production machine learning at Spotify was typically hundreds of individually tuned mini algorithms that together produce one result.

    Speaker 1
    But it seems pretty straightforward that at least, you know, semi-specialized agents that help you complete, like, something that would have required two tasks or three tasks or four tasks for specific use cases, like traveling, like, I don’t see how it couldn’t happen. So, I think these agents are sort of going to sneak up on us, and I’m not so sure you’re going to realize that you’re talking to an agent. It might just look like when you used to talk to three services, you’re now just talking to one and three behind the scenes. Right. And then, you know, do we get very general agents that do everything for you, or is it going to be that, you know, each sort of service has its own agent? I don’t know, but I think, like, action transformers and taking actions is going to happen very soon. I know that inside companies, I think Amazon published a paper this fall where they are talking about sort of these orchestrator LLMs. I know from having worked at Spotify that backend machine learning is usually, like, it’s not one algorithm. It’s actually like hundreds of different mini algorithms, right? They’re all like individually tuned and so forth. And then together, they sort of produce something. That is almost like alchemy.
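    Illustration (not from the episode): the "talking to one and three behind the scenes" idea can be sketched as a single entry point that routes a request to several specialized agents. This is a minimal sketch under loud assumptions: the agent functions, the `AGENTS` table, and the keyword-matching router are hypothetical stand-ins, and a real orchestrator would presumably use an LLM to decide which agents to call.

```python
from typing import Callable, Dict, List


# Hypothetical specialized agents; each handles one narrow kind of task.
def flight_agent(request: str) -> str:
    return f"flight agent handled: {request}"


def hotel_agent(request: str) -> str:
    return f"hotel agent handled: {request}"


def car_agent(request: str) -> str:
    return f"car agent handled: {request}"


# Assumed routing table: keyword -> agent. Keyword matching stands in
# for whatever model a real orchestrator would use to pick agents.
AGENTS: Dict[str, Callable[[str], str]] = {
    "flight": flight_agent,
    "hotel": hotel_agent,
    "car": car_agent,
}


def orchestrate(request: str) -> List[str]:
    """Send one user request to every agent whose keyword it mentions."""
    text = request.lower()
    return [agent(request) for keyword, agent in AGENTS.items() if keyword in text]


for line in orchestrate("I need a flight and a hotel in Lisbon"):
    print(line)
```

    From the user's side there is one call, `orchestrate`, even though two separate agents respond to the example request.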
  • Predicting and Preventing Negative Impacts of AI Summary: It is important not only to predict the positive impacts of advanced technologies, such as AlphaFold and progress on cancer, but also to anticipate and prevent potential negative consequences. The speaker is more worried about near-term, less capable AIs than about very smart future ones, and asks whether the world’s problem is too much intelligence or too little. In his view the greater risk tends to come from a lack of intelligence, so more intelligence in the world is broadly a good thing.

    Speaker 1
    Don’t just try to predict the good things that could come, which are many, like AlphaFold and solving cancer and so forth. I think that will happen, but also try to predict the bad things and try to prevent them. That seems very reasonable to me. But on the time scale, I’m more worried about the near-term, dumb AIs than the potential risk of the very smart future AIs. One question to ask yourself is, do you think the problem with the world today is that we have too much intelligence or too little? If you ask me that question, are you most worried about the most intelligent people around you or the least intelligent people around you? Which cause the most havoc? If you think about Ilya’s point about who cares about other species, it seems like the smartest among us are the ones who seem to care the most for other species, because they understand that actually, if that bee over there goes extinct, eventually that’s my environment and my climate. If you’re smart and really understand causality or at least correlation very deeply, you’re going to get more careful. It’s actually when you’re not that smart that you’re dangerous. So from a very high level, I would think that more intelligence is a good thing in the world. The problem isn’t that we have too much intelligence, I think.
  • The Importance of Empathy in AI Development Summary: In the foreseeable future, the big risk is an AI that understands too little and harms humans as a result. The speaker speculates that empathy, which presumably emerged through evolution because it confers some benefit, is something models trained to predict human thought may develop, or at least emulate, in order to achieve their goals.

    Speaker 1
    If you’re talking the very far future where somehow the AIs are fully self-sufficient, they run and somehow build all the factories with robots, then maybe. But I think that’s extrapolating too far. In the foreseeable future, it would be very dumb for the AI. It’s only if it doesn’t understand enough that it’s going to make humans extinct. So maybe the big risk is the stupid AIs. And I don’t know about this, it’s very speculative, but humans created empathy as sort of a feeling. And you can argue that that’s some sort of divine good. But if you’re not sort of creationist, it has to have emerged from evolution somehow. There must be some benefit to empathy. And I think that since these things are trained on our thoughts, and the task is to predict what we think, what comes next, and empathy seems to be a very important part of how we reason, if you read all the text on the internet and you try to predict the next token, if you want to predict how we think and how we come to our conclusions in these sentences, it doesn’t seem unlikely to me that these models would develop or at least emulate something like empathy in order to achieve their goals.