A new AI coding competition has delivered sobering results about the state of AI programming skills. The nonprofit Laude Institute announced the first winner of the K Prize, a challenge created by Databricks and Perplexity co-founder Andy Konwinski. Brazilian prompt engineer Eduardo Rocha de Andrade took home the $50,000 prize despite answering only 7.5% of the test questions correctly, highlighting how far AI still has to go in tackling real-world software problems.
Konwinski, who launched the challenge to create a tougher benchmark for AI, noted that the K Prize is designed to be contamination-free, unlike existing tests such as SWE-Bench, whose known problem sets can end up in models’ training data. To prevent prior exposure, the K Prize uses a timed entry system and builds its test set from GitHub issues flagged only after the submission deadline, so no entrant’s model could have trained on them. “We’re glad we built a benchmark that is actually hard,” Konwinski said, emphasizing the goal of leveling the playing field for smaller, open models.
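The article doesn’t describe the K Prize’s actual evaluation harness, but the core idea is simple enough to sketch. The Python snippet below is a minimal, hypothetical illustration, assuming a list of issue records with `created_at` timestamps and a made-up deadline; it is not the competition’s real code. Only issues opened after the submission deadline are eligible, which is what guarantees models cannot have seen them during training.

```python
from datetime import datetime, timezone

# Hypothetical placeholder deadline; the real K Prize dates are not given here.
SUBMISSION_DEADLINE = datetime(2025, 3, 12, tzinfo=timezone.utc)

def contamination_free(issues):
    """Keep only issues opened strictly after the submission deadline,
    so no submitted model could have trained on them."""
    return [i for i in issues if i["created_at"] > SUBMISSION_DEADLINE]

# Illustrative data: one issue predates the deadline, one postdates it.
issues = [
    {"id": 101, "created_at": datetime(2025, 1, 5, tzinfo=timezone.utc)},
    {"id": 102, "created_at": datetime(2025, 4, 2, tzinfo=timezone.utc)},
]
print([i["id"] for i in contamination_free(issues)])  # -> [102]
```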
The current gap is striking: SWE-Bench’s “Verified” test has seen top scores of 75%, while its harder “Full” version peaks at 34%. The K Prize’s much lower 7.5% result suggests that real-world programming tasks may be far more challenging when prior knowledge is removed. Konwinski has pledged $1 million to the first open-source model capable of scoring over 90%.
Experts say this new challenge is critical to addressing AI benchmark contamination, where models gain an unfair advantage because test problems leak into their training data. Princeton researcher Sayash Kapoor called the effort important for evaluating AI fairly, noting that existing leaderboards may not accurately reflect model performance.
For Konwinski, the takeaway is clear: the hype around AI replacing skilled professionals like software engineers is premature. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me,” he said. He expects future rounds of the challenge to push developers to adapt and improve AI coding performance.