January 1, 2026
Predictions for 2026
Hello 2026...
A bit of a random one...but here are my 2026 predictions:
- World model demo for a playable FPS game (max ~10 minutes coherence, world generated on the fly, stylised by a multimodal prompt)
- World model paradigm continues — world models for code gen become the next big thing (action → observation loop for predicted state, so fewer syntax + logic errors vs the standard “react agentic” pattern)
- World models for robotics allow robots to perform tasks never before seen
12/1/26 - Neo X1
- (world models for prediction - ??)
- Cursor releases Composer 2 (SOTA SWE scores across benchmarks)
- Diminishing “intelligence + problem solving” reasoning benchmark headroom / saturation (see AIME, GPQA Diamond, etc.)
- ARC-AGI 4 released; frontier models achieve <30%
- ARC-AGI 3 declared saturated by late 2026 (we need a new test...!)
- Continuous learning models — breakthrough on a novel way to enable models to retain information in internal states (i.e., natively, without fine-tuning or a harness / “skills”)
- Agentic harnesses/frameworks become the limiting factor for achieving ever more complex long-horizon tasks
- Therefore, shift to multi-agent orchestration takes centre stage
- You say prompt engineering, I say context engineering - context engineering becomes the new thing: a new paradigm of "models" (agentic frameworks) emerges, using programmatic tool calling, sub-agents, and Python environments to manage context in long-horizon tasks
- Single-agent system runtime for autonomous coding tasks exceeds 48 hours by mid-2026 (see GPT-5.1-Codex-Max benchmarks)
- CUA models become smaller in size, increase in performance (running on phones, laptops); Google releases a CUA model that can run on Android phones
- Google (DeepMind) becomes the leader in AI biology, releasing a framework to run fully autonomous simulated clinical trials
- Consistently more than 50% of internet traffic is web agents (not web scrapers)
- Interactions with AI move to async single-threaded interactions (e.g., on iMessage / Slack directly) rather than web interfaces
- AI browsers don't take off, mass market adoption slow, Chrome remains top (due to security / data privacy concerns)
- Multi-agent swarms start working very well due to agents being able to manage context of sub-agents better
- OpenAI releases a new operator model, a post-trained GPT-5.2 variant, which tops CUA benchmarks
- OpenAI IPOs after releasing GPT-6 (late 2026)
- OpenAI releases a new non-“GPT” series model (o4 and/or o4-Pro, or a continuous-learning model), and also releases GPT-2-OSS 120B and 20B and smaller / larger architectures (keeping MoE)
- Meta releases Llama 5 (scoring below SOTA)
- Meta’s acquisition of Manus is a catalyst for them to develop computer-use models themselves (they release their own CUA, integrated into Manus by mid-2026)
- Diffusion LLMs go commercial, with a Chinese lab releasing the first SOTA dLLM by Q3 2026 (undercutting US labs)
- Models saturate current coding benchmarks
- US lab claims they have achieved AGI by late 2026 (assuming the running AGI definition is Samantha from Her). Though I'd personally dispute this, because it would require the development of a real-time data pipeline, persistent memory layer, test-time compute strategies for data aggregation, insights synthesis, and a framework to facilitate proactive actions (e.g. interrupts)
- “Ambient intelligence” becomes the phrase thrown around (AI which comes to you), and proactivity becomes the next step to achieving AGI (see above)
- Therefore (see above again!), in 2026, there will still be no agreed-upon definition of AGI / ASI.
- Memory (and context engineering) becomes the limiting factor in agent performance — we see a wave of new startups that are the “memory layer for AI”
- A startup develops a full generative UI “OS”, claiming it’s the next shift in human–machine interactions
- Thinking Machines Lab gets acquired by big tech (Apple?)
- Anthropic models continue to dominate SWE-bench for most of 2026
- Claude 5 (early 2026) & Opus 5 (summer 2026) released
- Gemini 3.5 Pro, then 3.5 Flash released — with 3.5 Pro having strong benchmarks in scientific (biological) applications (Google becomes the leader in AI for biology; some kind of PR backlash due to biology-related applications)
- By Google I/O time, they're processing 3 quadrillion tokens per month
- Apple can't get AI right - they partner with Google to integrate Gemini into Siri, and there is still no sign of the Apple Intelligence features promised 2 years ago (running on device without Private Cloud Compute)
- Data breach / outage at a major SaaS company blamed on vibe coding (e.g. an agent merges a faulty commit - this has happened, but this time it will be at a large company and publicly acknowledged as agent error)
- Observability into LLMs becomes the next trend — tools released to inspect and interpret hidden states (we find out transformer-based LLMs store facts as geometry)
- A US company (Figure, etc.) deploys 1,000 systems for residential use across the US in a first pilot programme
- AG-UI and A2A catch on, and frameworks (e.g., LangGraph, Vercel AI SDK, etc.) plus Anthropic and OpenAI implement first-class support (similar adoption pattern to MCP)
- OpenAI app store flops; other labs don’t adopt or copy
- (therefore) Startups race to build an AI app store (using standards (MCP, A2A, etc.) for communication between different agentic systems; keep user memory stored in a central place for use across all their AI apps; access control, etc.)
- Microsoft never learns...and keeps releasing clearly 'fake' ads about Copilot's functionalities
- Apple hardware announcements: folding iPhone (book-style form factor, fall 2026), home hub device (iPad stuck onto a HomePod), OLED MacBook Pros with M6 (Sep - Nov), first A-series chip on a 2nm node, low-cost MacBook with an A-series chip, M5 Ultra, Pro Display XDR + Studio Display (with miniLED + 120Hz); Apple will tease smart glasses in late 2026
29/1/26 - Google Genie 3
29/1/26 - Composer 1.5
19/2/26 - Not quite... Gemini 3.1 Pro instead...
Link here
12/1/26 - "Google Gemini to power Apple's AI features like Siri"
21/2/26 - AWS outage caused by faulty merged commit
Link here