
AI Ecosystem Intelligence Explorer

AI Fundamentals

21 of 91 articles

Reasoning models don’t always say what they think

Research from Anthropic on the faithfulness of AI models’ Chain-of-Thought

AI Fundamentals
 
4/9/2025

On the Biology of a Large Language Model

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic’s lightweight production model — in a variety of contexts, using our circuit tracing methodology.

LLM
Research
AI Fundamentals
 
4/9/2025

The 2025 AI Index Report | Stanford HAI


Geopolitics
China
Business
AI Fundamentals
 
4/8/2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, o3-mini, achieving scores comparable to top human competitors. However, these benchmarks evaluate models solely based on final numerical answers, neglecting rigorous reasoning and proof generation which are essential for real-world mathematical tasks. To address this, we introduce the first comprehensive evaluation of full-solution reasoning for challenging mathematical problems. Using expert human annotators, we evaluated several state-of-the-art reasoning models on the six problems from the 2025 USAMO within hours of their release. Our results reveal that all tested models struggled significantly, achieving less than 5% on average. Through detailed analysis of reasoning traces, we identify the most common failure modes and find several unwanted artifacts arising from the optimization strategies employed during model training. Overall, our results suggest that current LLMs are inadequate for rigorous mathematical reasoning tasks, highlighting the need for substantial improvements in reasoning and proof generation capabilities.

LLM
AI Fundamentals
 
4/6/2025

Large Language Models Pass the Turing Test

We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5-minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time -- not significantly more or less often than the humans they were being compared to -- while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.

AI Fundamentals
 
4/2/2025

Transformers from scratch

Let’s build a Transformer Neural Network from scratch together! (A minimal encoder-block sketch follows this entry.)

AI Fundamentals
 
3/22/2025
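
For the “Transformers from scratch” entry above, here is a minimal, self-contained sketch (not the article’s own code) of the kind of encoder block such a walkthrough builds: multi-head self-attention plus a position-wise feed-forward network, each with a residual connection and layer normalization. PyTorch’s nn.MultiheadAttention stands in for a hand-written attention layer, and the dimensions (d_model=64, n_heads=4, d_ff=256) are illustrative.

```python
# Minimal Transformer encoder block sketch (PyTorch) -- illustrative only.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        # Multi-head self-attention and a feed-forward net, each wrapped in
        # a residual connection and layer normalization.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)   # self-attention: Q, K, V all come from x
        x = self.norm1(x + attn_out)       # residual + norm
        x = self.norm2(x + self.ff(x))     # feed-forward + residual + norm
        return x

# Usage: a batch of 2 sequences, 10 tokens each, embedded in 64 dimensions.
block = EncoderBlock()
out = block(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```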

A Practical Guide to Implementing DeepSearch/DeepResearch

QPS out, depth in. DeepSearch is the new norm. Find answers through read-search-reason loops. Learn what it is and how to build it. (A loop sketch follows this entry.)

Research
AI Fundamentals
 
2/25/2025
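
As a rough illustration of the read-search-reason loop the DeepSearch guide describes (not the article’s implementation), the sketch below alternates between reasoning about whether the question can already be answered, issuing a new search query, and reading results into accumulated notes. The functions llm(), search_web(), and read_page() are placeholder stubs to be wired to a real LLM, search API, and page reader.

```python
# Hypothetical read-search-reason loop in the spirit of DeepSearch -- not the article's code.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Placeholder: call a chat LLM and return its text reply."""
    raise NotImplementedError

def search_web(query: str) -> list[str]:
    """Placeholder: return candidate URLs from a search API."""
    raise NotImplementedError

def read_page(url: str) -> str:
    """Placeholder: fetch a page and extract its readable text."""
    raise NotImplementedError

@dataclass
class AgentState:
    question: str
    notes: list[str] = field(default_factory=list)  # evidence accumulated by reading
    answer: str | None = None

def deep_search(question: str, max_steps: int = 5) -> str:
    state = AgentState(question)
    for _ in range(max_steps):
        # Reason: decide whether the notes already support an answer.
        decision = llm(
            f"Question: {state.question}\nNotes so far: {state.notes}\n"
            "Reply 'ANSWER: <text>' if you can answer, otherwise 'SEARCH: <query>'."
        )
        if decision.startswith("ANSWER:"):
            state.answer = decision[len("ANSWER:"):].strip()
            break
        query = decision[len("SEARCH:"):].strip()
        # Search, then read: pull a few results and fold their content into the notes.
        for url in search_web(query)[:3]:
            state.notes.append(read_page(url))
    return state.answer or "No confident answer within the step budget."
```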

GitHub - smartaces/Anthropic_Claude_Sonnet_3_7_extended_thinking_colab_quickstart_notebook

A Colab quickstart notebook for experimenting with Claude 3.7 Sonnet’s extended thinking mode. (An API sketch follows this entry.)

LLM
AI Fundamentals
 
2/25/2025
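
The notebook above targets Claude 3.7 Sonnet’s extended thinking mode. Below is a minimal sketch of that kind of call with the Anthropic Python SDK, in the spirit of the quickstart rather than its exact code; the model snapshot name and token budgets are assumptions, and max_tokens must exceed the thinking budget.

```python
# Illustrative extended thinking request with the Anthropic Python SDK.
# Model snapshot name and token budgets are assumptions; adjust to your account.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=2048,                                      # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

# Extended thinking responses interleave 'thinking' blocks with the final 'text' answer.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```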

Aman’s AI Journal • Natural Language Processing • Attention

Aman’s AI Journal | Course notes and learning material for Stanford Artificial Intelligence and Deep Learning classes. (An attention sketch follows this entry.)

AI Fundamentals
 
2/23/2025
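
The attention notes above center on scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. Below is a small NumPy sketch of that formula (illustrative, not taken from the notes); the shapes in the usage example are arbitrary.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V -- illustrative sketch.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)        # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted sum of the values

# Usage: 4 query tokens attending over 6 key/value tokens, dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```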