
AI Ecosystem Intelligence Explorer

LLM · 21 of 235 articles

Limit of RLVR

Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity

LLM · AI Fundamentals · 5/28/2025
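
The "efficient sampler" claim is usually tested by comparing base and RL-tuned models across sample budgets. As a hedged illustration (pass@k as the metric is an assumption here; the excerpt names none), a minimal sketch of the standard unbiased pass@k estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: if RL merely sharpens sampling, the RL model
# wins at small k while the base model catches up at large k.
ks = (1, 8, 64, 256)
base = [round(pass_at_k(n=256, c=40, k=k), 3) for k in ks]
rl = [round(pass_at_k(n=256, c=90, k=k), 3) for k in ks]
print("base:", base)
print("rl:  ", rl)
```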

LLM Inference Economics from First Principles

The main product LLM companies offer today is API access to their models, and the key question that will determine their profitability is the inference cost structure.

LLM · Research · AI Fundamentals · 5/17/2025
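
To make "inference cost structure" concrete, a back-of-the-envelope sketch; every number below is an illustrative assumption, not a figure from the article:

```python
# Toy cost model: GPU rental price and achieved throughput determine
# the price floor per token. All inputs are placeholder assumptions.
gpu_hour_usd = 2.00        # assumed hourly rental price per GPU
gpus_per_replica = 8       # assumed tensor-parallel deployment size
tokens_per_sec = 2_400     # assumed aggregate decode throughput per replica
utilization = 0.6          # assumed fraction of capacity actually billed

tokens_per_hour = tokens_per_sec * 3600 * utilization
cost_per_hour = gpu_hour_usd * gpus_per_replica
usd_per_million = cost_per_hour / tokens_per_hour * 1_000_000
print(f"~${usd_per_million:.2f} per 1M output tokens")
```

Batch size, prefill/decode split, and utilization dominate this arithmetic, which is why the same model can be profitable for one provider and a loss for another.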

Say What You Mean: A Response to ‘Let Me Speak Freely’

A recent paper from the research team at Appier, Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, made some very serious accusations about the quality of LLM evaluation results when performing structured generation. The authors' (Tam et al.) ultimate conclusion was:

LLM · 5/5/2025
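
The dispute is over whether restricting output format (for example, forcing JSON) degrades reasoning. A minimal sketch of the two evaluation conditions using the OpenAI client's JSON mode; the model name and question are placeholders, not the paper's actual setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
question = "A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. How much is the ball?"

# Condition A: free-form answer, reasoning unconstrained.
free = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": question}],
)

# Condition B: format-restricted answer (JSON mode), the setting
# Tam et al. report as degrading performance and the response disputes.
strict = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[{"role": "user",
               "content": question + ' Reply as JSON: {"answer": <number>}'}],
)
print(free.choices[0].message.content)
print(strict.choices[0].message.content)
```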

Developing an AI-Powered Tool for Automatic Citation Validation Using NVIDIA NIM

Citation accuracy is crucial for maintaining the integrity of both academic and AI-generated content. When citations are inaccurate…

LLM · Research · 5/2/2025
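
NVIDIA NIM endpoints expose an OpenAI-compatible API, so a citation check can be sketched as a single entailment-style call. The base URL is NVIDIA's public hosted endpoint, but the model choice and prompt are assumptions, not the article's actual pipeline:

```python
from openai import OpenAI

# NIM is OpenAI-compatible; model id is a common public example.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="nvapi-...",  # your NVIDIA API key
)

claim = "The study reports a 12% accuracy gain over the baseline."
source = "...we observe a 12% improvement in accuracy over the baseline..."

resp = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=[{
        "role": "user",
        "content": (
            "Does the SOURCE support the CLAIM? Answer exactly one of: "
            "SUPPORTED, UNSUPPORTED, CONTRADICTED.\n"
            f"CLAIM: {claim}\nSOURCE: {source}"
        ),
    }],
)
print(resp.choices[0].message.content)
```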

On the Biology of a Large Language Model

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic’s lightweight production model — in a variety of contexts, using our circuit tracing methodology.

LLM · Research · AI Fundamentals · 4/9/2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, o3-mini, achieving scores comparable to top human competitors. However, these benchmarks evaluate models solely based on final numerical answers, neglecting rigorous reasoning and proof generation which are essential for real-world mathematical tasks. To address this, we introduce the first comprehensive evaluation of full-solution reasoning for challenging mathematical problems. Using expert human annotators, we evaluated several state-of-the-art reasoning models on the six problems from the 2025 USAMO within hours of their release. Our results reveal that all tested models struggled significantly, achieving less than 5% on average. Through detailed analysis of reasoning traces, we identify the most common failure modes and find several unwanted artifacts arising from the optimization strategies employed during model training. Overall, our results suggest that current LLMs are inadequate for rigorous mathematical reasoning tasks, highlighting the need for substantial improvements in reasoning and proof generation capabilities.

LLM · AI Fundamentals · 4/6/2025

GitHub - smartaces/Anthropic_Claude_Sonnet_3_7_extended_thinking_colab_quickstart_notebook

A Colab quickstart notebook for experimenting with Claude 3.7 Sonnet's extended thinking mode.

LLM · AI Fundamentals · 2/25/2025
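
Judging by the repo name, the notebook exercises Claude 3.7 Sonnet's extended thinking API; a minimal sketch of that call (token budgets are illustrative, not the notebook's values):

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# With extended thinking enabled, the model emits "thinking" blocks
# before its final answer; max_tokens must exceed the thinking budget.
message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```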

Dust Off That Old Hardware and Run DeepSeek R1 on It

No A100 GPU? No problem! You can use exo to combine old laptops, phones, and Raspberry Pis into an AI powerhouse that runs even DeepSeek R1.

LLM · Applied AI · 2/22/2025
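
Once exo nodes discover each other, the cluster serves a ChatGPT-compatible endpoint you can query like any other API. A hedged sketch; the port and model id below are assumptions to verify against your exo instance's README and model list:

```python
from openai import OpenAI

# exo exposes a ChatGPT-compatible API on each node; port and model id
# are assumptions -- check what your running exo instance reports.
client = OpenAI(base_url="http://localhost:52415/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="deepseek-r1",  # assumed model id
    messages=[{"role": "user", "content": "Explain the KV cache in one sentence."}],
)
print(resp.choices[0].message.content)
```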