OpenAI has released its first AI model designed to run on non-Nvidia hardware: the new GPT-5.3-Codex-Spark coding model, which runs on Cerebras chips. The move marks a significant moment in the industry's race to innovate.
Unprecedented Speed
The Codex-Spark model stands out for its speed: it generates code at over 1,000 tokens per second, approximately 15 times faster than its predecessor, a jump that could change how programmers interact with AI tools. For comparison, Anthropic's Claude Opus 4.6 model, in its paid fast mode, reaches about 2.5 times its standard speed of 68.2 tokens per second (roughly 170 tokens per second), although it is a larger and more capable model than Spark. "Cerebras has been a great engineering partner, and we are excited to add fast inference as a new capability to the platform," said Sachin Katti, Head of Compute at OpenAI.
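The arithmetic behind these comparisons can be sketched in a few lines of Python. This is only an illustration built from the figures quoted above: the predecessor's speed is not stated in the article and is merely implied by the "15 times faster" claim, and "over 1,000" is treated as exactly 1,000. The function and variable names are hypothetical.

```python
# Figures quoted in the article, in tokens per second.
SPARK_TPS = 1000.0              # "over 1,000"; treated here as a floor
SPEEDUP_VS_PREDECESSOR = 15.0   # "approximately 15 times faster"
OPUS_STANDARD_TPS = 68.2        # Claude Opus 4.6, standard mode
OPUS_FAST_MULTIPLIER = 2.5      # paid fast mode, "about 2.5 times"


def implied_predecessor_tps(spark_tps: float, speedup: float) -> float:
    """Speed the predecessor would need for the quoted speedup to hold."""
    return spark_tps / speedup


def fast_mode_tps(standard_tps: float, multiplier: float) -> float:
    """Throughput of a fast mode given as a multiple of standard speed."""
    return standard_tps * multiplier


predecessor = implied_predecessor_tps(SPARK_TPS, SPEEDUP_VS_PREDECESSOR)
opus_fast = fast_mode_tps(OPUS_STANDARD_TPS, OPUS_FAST_MULTIPLIER)

print(f"Implied predecessor speed: {predecessor:.1f} tokens/s")
print(f"Opus 4.6 fast mode:        {opus_fast:.1f} tokens/s")
print(f"Spark vs. Opus fast mode:  {SPARK_TPS / opus_fast:.1f}x")
```

Under these assumptions the predecessor works out to roughly 67 tokens per second, and Spark comes out a little under six times faster than Opus 4.6 in fast mode. Because every input is a rounded figure, the results are approximate.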
Availability and Functionality
Codex-Spark is currently available in preview for ChatGPT Pro subscribers ($200 per month) via the Codex app, the command-line interface, and the VS Code extension. OpenAI is gradually extending API access to selected partners. The model features a 128,000-token context window and, at launch, handles only text. It is based on the full GPT-5.3-Codex model, which OpenAI launched earlier this month. While the full model is designed for complex coding tasks, OpenAI has optimized Spark for speed at the expense of knowledge depth. Spark was built specifically for coding, not for the general-purpose tasks handled by the larger GPT-5.3 version.
Performance and Comparisons
According to OpenAI, on SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks for evaluating software engineering capabilities, Spark outperforms the previous GPT-5.1-Codex-mini while completing tasks in a fraction of the time. The company has not shared independent validation of these numbers. Speed has been a weak point for Codex in the past: when Ars Technica tested four AI coding agents building Minesweeper clones in December, Codex took about twice as long as Anthropic's Claude Code to produce a working game.
The Race for AI Coding Agents
At 1,000 tokens per second, GPT-5.3-Codex-Spark represents a significant advance over anything OpenAI has previously offered via its own infrastructure. According to independent benchmarks by Artificial Analysis, OpenAI's fastest models on Nvidia hardware max out well below that figure: GPT-4o delivers about 147 tokens per second, o3-mini reaches about 167, and GPT-4o mini manages about 52. However, 1,000 tokens per second is actually modest by Cerebras standards. Cerebras measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI's open-weight gpt-oss-120B model, suggesting that Codex-Spark's relatively lower speed reflects the overhead of a larger or more complex model.
AI coding agents have had a successful year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching a new level of utility for rapid prototyping, interfaces, and boilerplate code. OpenAI, Google, and Anthropic are competing to provide more capable coding agents, and latency has become what separates the winners: a model that codes faster allows a developer to iterate faster.
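To make the throughput figures in this section more concrete, the short Python sketch below converts them into the time needed to generate a single response. The 2,000-token response length is a hypothetical value chosen for illustration, and the calculation assumes constant throughput while ignoring time to first token, network latency, and any tool calls an agent might make, so it is a rough lower bound rather than a benchmark.

```python
# Throughput figures quoted in the article, in tokens per second.
SPEEDS_TPS = {
    "GPT-4o mini (Nvidia)": 52,
    "GPT-4o (Nvidia)": 147,
    "o3-mini (Nvidia)": 167,
    "GPT-5.3-Codex-Spark (Cerebras)": 1000,
    "Llama 3.1 70B (Cerebras)": 2100,
    "gpt-oss-120B (Cerebras)": 3000,
}

# Hypothetical response length; not a figure from the article.
RESPONSE_TOKENS = 2000


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` at a constant rate, ignoring all other latency."""
    return tokens / tokens_per_second


for name, tps in SPEEDS_TPS.items():
    seconds = generation_seconds(RESPONSE_TOKENS, tps)
    print(f"{name:<32} {seconds:6.1f} s")
```

Under these assumptions, a 2,000-token response takes about 2 seconds at Spark's quoted speed, compared with roughly 13.6 seconds at GPT-4o's 147 tokens per second.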
Diversification and Hardware Strategy
Spark's hardware story might be more significant than its benchmark scores. The model runs on Cerebras's Wafer Scale Engine 3, a dinner-plate-sized chip that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to result from it.
OpenAI has spent the last year systematically reducing its dependence on Nvidia. The company signed a multi-year deal with AMD in October 2025, struck a $38 billion cloud computing deal with Amazon in November, and is designing its own custom AI chip for eventual fabrication by TSMC. Meanwhile, a $100 billion infrastructure deal with Nvidia has so far failed to materialize, although Nvidia committed to a $20 billion investment. Reuters reported that OpenAI grew dissatisfied with the speed of some Nvidia chips for inference tasks, which is exactly the type of workload OpenAI designed Codex-Spark for.
Regardless of the chip used, speed is important, although it may come at the expense of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second might feel less like carefully piloting a jigsaw and more like running a rip saw. Just watch what you're cutting.