Gabriel Lam

Hello 2026...

A bit of a random one...but here are my 2026 predictions:

World model demo for a playable FPS game (max ~10 minutes coherence, world generated on the fly, stylised by a multimodal prompt)

World model paradigm continues: world models for code gen become the next big thing (an action → observation loop over predicted state, so fewer syntax and logic errors than the standard “ReAct” agentic pattern)
World models for robotics allow robots to perform tasks they have never seen before
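
To make the code-gen idea above a bit more concrete, here is a toy sketch of "predict the observation before acting". Everything in it is an assumption for illustration: `predict_observation` and `choose_action` are hypothetical names, and the "world model" is just a syntax check standing in for a learned model that would predict far more (test results, runtime state, and so on).

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    ok: bool
    reason: str


def predict_observation(source: str) -> Prediction:
    # Hypothetical stand-in for a world model: predict what executing the
    # candidate would report. Here it only checks whether the code parses.
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return Prediction(False, f"syntax error: {exc.msg}")
    return Prediction(True, "parses cleanly")


def choose_action(candidates: List[str]) -> Optional[str]:
    # Commit to the first candidate whose predicted observation is good,
    # instead of executing each one and reacting to the real error.
    for candidate in candidates:
        if predict_observation(candidate).ok:
            return candidate
    return None


candidates = [
    "def add(a, b) return a + b",        # missing colon
    "def add(a, b):\n    return a + b",  # valid
]
chosen = choose_action(candidates)
print(chosen)
```

The contrast with a ReAct-style loop is that the bad candidate is rejected on the predicted outcome, so it never has to be run to find the error.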

12/1/26 - Neo X1
(world models for prediction - ??)
Cursor releases Composer 2 (SOTA SWE scores across benchmarks)

Diminishing “intelligence + problem solving” reasoning benchmark headroom / saturation (see AIME, GPQA Diamond, etc.)
ARC-AGI 4 released; frontier models achieve <30%
ARC-AGI 3 declared saturated by late 2026 (we need a new test...!)
Continuous learning models — breakthrough on a novel way to enable models to retain information in internal states (i.e., natively, without fine-tuning or a harness / “skills”)
Agentic harnesses/frameworks become the limiting factor for achieving ever more complex long-horizon tasks
- Therefore, shift to multi-agent orchestration takes centre stage
You say prompt engineering, I say context engineering: context engineering becomes the new thing, and a new paradigm of "models" (agentic frameworks) emerges that uses programmatic tool calling, sub-agents, and Python environments to manage context in long-horizon tasks
Single-agent system runtime for autonomous coding tasks exceeds 48 hours by mid-2026 (see GPT-5.1-Codex-Max benchmarks)
CUA models become smaller and more capable (running on phones and laptops); Google releases a CUA model that can run on Android phones
Google (DeepMind) becomes the leader in AI biology, releasing a framework to run fully autonomous simulated clinical trials
Web agents (not web scrapers) consistently account for more than 50% of internet traffic
Interactions with AI move to async, single-threaded conversations (e.g., directly in iMessage / Slack) rather than web interfaces
AI browsers don't take off: mass-market adoption is slow due to security / data privacy concerns, and Chrome remains on top
Multi-agent swarms start working very well due to agents being able to manage context of sub-agents better
OpenAI releases a new Operator model, a post-trained GPT-5.2 variant, which tops CUA benchmarks
OpenAI IPOs after releasing GPT-6 (late 2026)
OpenAI releases a new non-“GPT” series model (o4 and/or o4-Pro, or a continuous-learning model), and also releases GPT-2-OSS 120B and 20B and smaller / larger architectures (keeping MoE)
Meta releases Llama 5 (scoring below SOTA)
Meta’s acquisition of Manus is a catalyst for them to develop computer-use models themselves (they release their own CUA, integrated into Manus by mid-2026)
Diffusion LLMs go commercial, with a Chinese lab releasing the first SOTA dLLM by Q3 2026 (undercutting US labs)
Models saturate current coding benchmarks
A US lab claims to have achieved AGI by late 2026 (assuming the working definition of AGI is Samantha from Her). Personally, I'd dispute the claim, because it would require a real-time data pipeline, a persistent memory layer, test-time compute strategies for data aggregation, insight synthesis, and a framework to facilitate proactive actions (e.g. interrupts)
“Ambient intelligence” becomes the phrase thrown around (AI which comes to you), and proactivity becomes the next step to achieving AGI (see above)
Therefore (see above again!), in 2026, there will still be no agreed-upon definition of AGI / ASI.
Memory (and context engineering) becomes the limiting factor in agent performance — we see a wave of new startups that are the “memory layer for AI”
A startup develops a full generative UI “OS”, claiming it’s the next shift in human–machine interactions
Thinking Machines Lab gets acquired by big tech (Apple?)
Anthropic models continue to dominate SWE-bench for most of 2026
Claude 5 (early 2026) & Opus 5 (summer 2026) released
Gemini 3.5 Pro, then 3.5 Flash released — with 3.5 Pro having strong benchmarks in scientific (biological) applications (Google becomes the leader in AI for biology; some kind of PR backlash due to biology-related applications)

By Google I/O time, they're processing 3 quadrillion tokens per month
Apple can't get AI right - they partner with Google to integrate Gemini into Siri, with still no sign of the Apple Intelligence features promised 2 years ago (running on device without Private Cloud Compute)
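
For a sense of scale on the 3 quadrillion tokens per month figure above, here is a quick back-of-the-envelope conversion to tokens per second. The only assumption is a 30-day month; the token figure itself is just the prediction, not a measured number.

```python
# Assumption: a 30-day month. The 3 quadrillion figure is the prediction above.
TOKENS_PER_MONTH = 3 * 10**15
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds

tokens_per_second = TOKENS_PER_MONTH / SECONDS_PER_MONTH
print(f"{tokens_per_second:.3e} tokens per second")  # 1.157e+09 tokens per second
```

That is roughly 1.16 billion tokens every second, sustained for the whole month.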

Data breach / outage at a major SaaS company is blamed on vibe coding (e.g. an agent merged a faulty commit - this has already happened, but this time it will be at a large company and publicly acknowledged as agent error)

Observability into LLMs becomes the next trend — tools released to inspect and interpret hidden states (we find out transformer-based LLMs store facts as geometry)
A US company (Figure, etc.) deploys 1,000 systems for residential use across the US in a first pilot programme
AG-UI and A2A catch on, and frameworks (e.g., LangGraph, Vercel AI SDK, etc.) plus Anthropic and OpenAI implement first-class support (similar adoption pattern to MCP)
OpenAI app store flops; other labs don’t adopt or copy
(therefore) Startups race to build an AI app store: using standards (MCP, A2A, etc.) for communication between different agentic systems, keeping user memory stored in a central place for use across all their AI apps, handling access control, etc.
Microsoft never learns...and keeps releasing clearly 'fake' ads about Copilot's functionality
Apple hardware announcements: folding iPhone (book-style form factor, fall 2026), home hub device (an iPad stuck onto a HomePod), OLED MacBook Pros with M6 (Sep - Nov), first A-series chip on a 2nm node, low-cost MacBook with an A-series chip, M5 Ultra, Pro Display XDR + Studio Display (with mini-LED + 120Hz); Apple will tease smart glasses in late 2026