Searched for
AI BENCHMARKS IN MATHEMATICS
Humans outperform AI at this highly rigorous mathematics test. A new AI test called First Proof used unpublished math problems. AI systems were tested against human mathematicians. While AI showed some ...
Google unveils Gemma 4, expands lightweight open model lineup for developers. The Gemma 4 model offers capabilities such as advanced reasoning, agentic workflows, coding, and support for over 140 languages. The models...
US stock market crashes today: Dow Jones, S&P 500, and Nasdaq fall hard as bond yields hit 19-year high - Nvidia, Broadcom, AMD, and Amazon lead massive Wall Street selloff. US stock market crashes today as Dow Jones, S&P 500 and Nasdaq plunge. The Dow Jones Industrial Average dropped more than 237 points, while...
Elon Musk earned more last year than most countries - Tesla's $158 billion pay filing has everyone stunned. Elon Musk's $158 billion Tesla compensation stunned markets with a record-setting payout tied fully to performance. On paper, it is the large...
OpenAI launches GPT-5.5 with API pricing starting at $5 per 1 million tokens. OpenAI has launched GPT-5.5, its most advanced AI model yet, designed for complex, multi-part tasks and acting as an active collaborator. T...
Researchers unveil ‘Humanity’s Last Exam,’ it's so difficult that today’s AI systems consistently fail it. Humanity’s Last Exam (HLE) is a groundbreaking 2,500-question assessment created to reveal the limits of advanced AI systems. Developed by ...
Sarvam unveils two new large language models focused on real-time use, advanced reasoning. The company said the model is optimised for “efficient thinking”, delivering stronger responses while using fewer tokens, a key factor in ...
'Basically zero, garbage': Renowned mathematician Joel David Hamkins declares AI models useless for solving math. Here's why. One of the world's biggest mathematicians, Joel David Hamkins, has slammed AI models used for solving mathematics and called them basically z...
Anthropic launches new vibe coding model Claude Sonnet 4.5: All you need to know. Anthropic launched Claude Sonnet 4.5, its latest AI coding model, which it claims has improved coding, reasoning, and mathematical skills. ...
High school maths trips Olympiad gold medalist AI models: Google DeepMind CEO answers why. Demis Hassabis, Google DeepMind CEO, notes AI models like Gemini paradoxically struggle with basic math despite acing advanced Olympiad pro...
Award-winning variant of Gemini's AI model is live, confirms CEO Sundar Pichai. The Gemini 2.5 Deep Think model has been rolled out to a small group of mathematicians and academics to take early feedback and enhance the...
Musk says xAI will make kid-friendly app 'Baby Grok'. Elon Musk’s AI firm xAI is set to launch "Baby Grok," a separate, kid-friendly chatbot app. Announced via Musk’s X post, the move follows e...
Can AI quicken the pace of math discovery? DARPA’s new Exponentiating Mathematics programme aims to accelerate pure mathematics research by developing AI tools capable of high-level ...
Telegram, Elon Musk's xAI partner to distribute Grok to messaging app's users. Telegram CEO Pavel Durov announced a partnership with Elon Musk's AI company xAI to distribute its chatbot Grok to Telegram’s billion-plus ...
Sarvam AI unveils multilingual LLM; tepid response poses questions over India’s AI chops. Sarvam AI launched Sarvam M, a 24-billion-parameter model, built on top of French AI company Mistral's model Small. The startup said the mo...
Google unveils Gemini 2.5, claims enhanced reasoning and coding capabilities. Google describes Gemini 2.5 as a “thinking model”, capable of reasoning through its processes before responding, leading to enhanced accura...
Alibaba shares surge after it unveils reasoning model. Qwen, the ecommerce leader's artificial intelligence unit, said on X that its QwQ-32B, with 32 billion parameters, can achieve performance ...
AI war heats up: OpenAI’s GPT-4.5 tops charts, but Elon Musk says its reign won’t last - Is xAI ready to strike? Elon Musk has responded to a report by AI benchmarking platform LMArena, which ranked OpenAI’s GPT-4.5 as the top-performing AI model acros...
Grok to be a standalone app for macOS, Windows. The chatbot operated by Musk's AI company xAI has been in the news for releasing its latest version, Grok 3, which is touted as the world's...
Alibaba releases QwQ-32B-Preview, an AI rival to OpenAI's o1. This model is focused on advancing AI reasoning capabilities. In contrast to most AI, QwQ-32B-Preview and similar models can fact-check the...