A new, challenging AGI test stumps most AI models | TechCrunch


A new AGI test, called ARC-AGI-2, has been developed by the Arc Prize Foundation to assess the general intelligence of AI models. Co-founded by AI researcher François Chollet, the foundation's test has proven challenging: leading models like OpenAI's o1-pro and DeepSeek's R1 scored between 1% and 1.3%, while human participants averaged 60%. The ARC-AGI-2 test features puzzle-like problems that require AI to identify visual patterns and adapt to novel challenges. This iteration aims to address flaws in its predecessor, emphasizing efficiency and the ability to interpret patterns rather than rely on brute-force computation. Chollet said the new test is a better measure of AI intelligence because it prevents models from simply memorizing solutions. The foundation has also launched a contest challenging developers to reach 85% accuracy on the new test while minimizing costs. Overall, ARC-AGI-2 seeks to provide a more nuanced benchmark for evaluating AI's capabilities.
