Epoch AI, a nonprofit focused on creating mathematical benchmarks for AI, has come under criticism for not disclosing, until December 20, that it received funding from OpenAI.
This revelation, made through a public post, detailed that OpenAI supported the development of FrontierMath, a benchmark designed to assess an AI’s mathematical problem-solving capabilities.
OpenAI used FrontierMath to demonstrate its upcoming flagship AI model, o3. Contributors to the project were reportedly unaware of the partnership, which has sparked allegations of non-transparency.
A contractor for Epoch AI, posting under the username “Meemi” on the LessWrong forum, expressed concerns that the lack of transparency left contributors uninformed about their work’s association with OpenAI.
Critics fear this could undermine the credibility of FrontierMath as an unbiased benchmark. Furthermore, Stanford mathematics PhD student Carina Hong alleged that OpenAI gained exclusive access to the benchmark, a point that unsettled several contributing mathematicians.
Epoch AI’s associate director, Tamay Besiroglu, acknowledged mistakes in communication, explaining that contractual restrictions delayed the disclosure.
He admitted that Epoch AI should have ensured transparency with contributors from the outset, emphasizing that mathematicians deserved clarity about how their work would be used.
Besiroglu reassured the community that OpenAI has verbally agreed not to train its AI on FrontierMath’s problems and that a separate holdout set ensures independent verification of results.
Despite these assurances, Epoch AI lead mathematician Elliot Glazer stated that the organization has yet to independently verify OpenAI's reported results for o3. While Glazer expressed confidence in OpenAI's adherence to the agreed terms, he noted that final verification is still pending.
This controversy highlights the ongoing challenges in developing AI benchmarks while ensuring transparency and avoiding conflicts of interest. It underscores the need for clearer communication and safeguards to maintain trust in the AI research community.