• The arena promised an environment where voters can choose favorite models by picking responses they prefer blind and unbiased. In practice, companies were testing dozens of private models in secret and cherry-picking their best results to dominate the rankings.

    There is a distinction between cherry-picking in the arena to identify more capable models and cherry-picking responses to shape a model’s personality. The first optimizes for capability, the second for character — and the question is whether one form of selection is more acceptable than the other.
  • But success is not objective—it is defined by humans. We shape AI by defining success, and in turn, AI shapes us by changing how it responds. If you want to break the cycle of echo chambers amplified by social media rather than reinforce them, define a clear goal for AI, test it transparently, and share your results—loudly.

    Thumbs-up/down feedback shapes how AI delivers answers — an ergonomic concern. But the deeper worry is not ergonomics but epistemic: echo chambers of facts, not just tone, exacerbating what social media already produces. The right scale for AI-curated information may be the scale of what a person already knows well — leveraging existing knowledge to curate future creativity rather than platformizing at mass scale. This implies leaving empty space that provides trust, not just memory capability.