AI Ecosystem Intelligence Explorer


Reasoning models don’t always say what they think

Research from Anthropic on the faithfulness of AI models’ Chain-of-Thought

AI Fundamentals
 
4/9/2025

On the Biology of a Large Language Model

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic’s lightweight production model — in a variety of contexts, using our circuit tracing methodology.

LLM
Research
AI Fundamentals
 
4/9/2025

The 2025 AI Index Report | Stanford HAI


Geopolitics
China
Business
AI Fundamentals
 
4/8/2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, o3-mini, achieving scores comparable to top human competitors. However, these benchmarks evaluate models solely based on final numerical answers, neglecting rigorous reasoning and proof generation which are essential for real-world mathematical tasks. To address this, we introduce the first comprehensive evaluation of full-solution reasoning for challenging mathematical problems. Using expert human annotators, we evaluated several state-of-the-art reasoning models on the six problems from the 2025 USAMO within hours of their release. Our results reveal that all tested models struggled significantly, achieving less than 5% on average. Through detailed analysis of reasoning traces, we identify the most common failure modes and find several unwanted artifacts arising from the optimization strategies employed during model training. Overall, our results suggest that current LLMs are inadequate for rigorous mathematical reasoning tasks, highlighting the need for substantial improvements in reasoning and proof generation capabilities.

LLM
AI Fundamentals
 
4/6/2025

Large Language Models Pass the Turing Test

We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time -- not significantly more or less often than the humans they were being compared to -- while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.

AI Fundamentals
 
4/2/2025
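As a rough illustration of the chance comparison described in the abstract above, the sketch below runs a two-sided binomial test of a 73% win rate against the 50% chance level. This is not the authors' analysis: the sample size n is a hypothetical placeholder, since the study's actual number of judgments is not given here.

# Hypothetical check of a 73% win rate against chance (50%).
# n is an assumed placeholder, not the study's real number of judgments.
from scipy.stats import binomtest

n = 300                      # assumed number of interrogator judgments
wins = round(0.73 * n)       # times the system was judged to be the human
result = binomtest(wins, n, p=0.5, alternative="two-sided")
print(f"win rate = {wins / n:.2f}, p-value = {result.pvalue:.3g}")

A p-value far below 0.05 would indicate that such a win rate is very unlikely under random guessing, which is the sense in which the abstract describes the baseline models' 23% and 21% as significantly below chance.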

GitHub - VAST-AI-Research/TripoSG: TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models


3D
 
4/1/2025

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning


3D
 
3/25/2025

tencent/Hunyuan3D-2mv · Hugging Face


3D
 
3/23/2025

Transformers from scratch

Let’s build a Transformer Neural Network from Scratch together!

AI Fundamentals
 
3/22/2025
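For readers who want a concrete starting point before following the article above, here is a minimal NumPy sketch of the scaled dot-product self-attention at the core of a Transformer block. It is an illustrative toy, not code from the linked post.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head self-attention over a sequence x of shape (seq_len, d_model).
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # scaled dot-product similarities
    weights = softmax(scores, axis=-1)         # attention distribution per token
    return weights @ v                         # weighted sum of value vectors

rng = np.random.default_rng(0)
d_model, seq_len = 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)   # (4, 16)

A full Transformer layer wraps this attention in multi-head projections, residual connections, layer normalization, and a position-wise feed-forward network.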