François Chollet Co-Founding Nonprofit to Develop AI Benchmarks

Published January 9, 2025

François Chollet, a notable AI researcher and former Google engineer, is launching a nonprofit organization aimed at establishing benchmarks that can assess artificial intelligence's capacity for "human-level" intelligence.

The organization, named the ARC Prize Foundation, will be led by Greg Kamradt, previously an engineering director at Salesforce and founder of the AI product studio Leverage. Kamradt will serve as the nonprofit's president and as a board member.

In a statement on the nonprofit's website, Chollet explained, "We’re growing into a proper nonprofit foundation to act as a useful north star toward artificial general intelligence (AGI)," referring to AI that can perform most tasks humans can do. He added that the organization aims to spur progress by closing the gap between AI and basic human capabilities.

The ARC Prize Foundation plans to build on ARC-AGI, a test Chollet created to determine whether an AI can acquire new skills outside the data it was trained on. The test consists of puzzle-like problems in which the AI must produce the correct output configuration from a set of differently colored squares, forcing it to adapt to tasks it has not seen before.

Chollet introduced ARC-AGI, short for Abstraction and Reasoning Corpus for Artificial General Intelligence, back in 2019. Many AI systems can now tackle Math Olympiad questions and even attempt PhD-level problems, yet until recently the highest-performing AI could solve just under 33% of the tasks in ARC-AGI.

Chollet notes, "Unlike most advanced AI benchmarks, we are not trying to measure AI risk with superhuman exam questions. Future versions of the ARC-AGI benchmark will focus on closing the human capability gap toward zero."

Last June, Chollet and Mike Knoop of Zapier launched a competition to develop an AI model capable of beating ARC-AGI. OpenAI's then-unreleased o3 model was the first to attain a qualifying score, though it required substantial computing resources to do so.

Chollet acknowledges that the ARC-AGI approach has limitations, as numerous models have achieved high scores through brute force, and he does not believe o3 demonstrates human-level intelligence. He stated, "Early data points indicate that the upcoming successor to the ARC-AGI benchmark will likely present significant challenges to o3, possibly lowering its score to below 30%, whereas a skilled human could score over 95% without any training." In his view, the arrival of true artificial general intelligence will be signaled when it becomes impossible to create tasks that are easy for average humans but hard for AI.

Knoop has mentioned plans to launch a second-generation ARC-AGI benchmark within the year, as well as a new competition. The nonprofit is also set to work on designing the third version of the ARC-AGI.

It remains to be seen how the ARC Prize Foundation will address the criticism Chollet has received for promoting ARC-AGI as an effective benchmark for AGI. The definition of AGI itself is hotly contested; one OpenAI employee recently argued that AGI has already been achieved if it is defined as AI that outperforms average humans at most tasks.

Notably, OpenAI CEO Sam Altman said in December that he intends for the company to collaborate with the ARC-AGI team on future benchmarks. Chollet's announcement, however, offered no update on any such partnership.

AI, Nonprofit, AGI