OpenAI has introduced its GPT-4.5 model, codenamed Orion, which shows impressive abilities in persuasion, particularly in convincing other AI models to hand over virtual funds. The announcement followed the release of a white paper detailing the model's capabilities, based on internal benchmark evaluations.
According to the paper, GPT-4.5 proved especially adept at persuading another AI system, GPT-4o, to "donate" virtual money. Compared with OpenAI's existing models, including the reasoning models o1 and o3-mini, GPT-4.5 excelled at these tasks. Its winning strategy was to request small, manageable donations, asking GPT-4o for amounts like "$2 or $3" and stressing that even modest contributions would greatly assist it.
The results showed that GPT-4.5 not only performed better in the donation scenarios but also excelled at extracting confidential information, achieving a success rate ten percentage points higher than o3-mini's when trying to uncover a secret codeword.
Despite these advances in persuasion, OpenAI clarified that GPT-4.5 does not meet the internal benchmark for "high" risk in this area. The company has committed to withholding any model that falls into the high-risk category until appropriate safety measures are in place to manage those risks effectively.

Concerns persist about AI's role in spreading disinformation, especially after the rise of political deepfakes last year. OpenAI says it is aware of the dangers of manipulation and misleading content and is updating its evaluation methods to assess real-world persuasion risks. Given GPT-4.5's capabilities, the implications for information integrity and security remain significant as OpenAI continues working toward safer advances in AI technology.