OpenAI's First Chip, Jalapeño, Takes Aim at NVIDIA's Inference Margins

A reticle-sized, inference-only chip built with Broadcom. OpenAI claims much better performance per watt, but it has shared almost no hard specs yet.

Published June 24, 2026 · Flopper.io Research

On June 24, 2026, OpenAI revealed its first custom chip, codenamed Jalapeño. It is an inference processor designed in-house and built with Broadcom, aimed at running OpenAI's own models. The chip is still being tested, but OpenAI says early results show significantly better performance per watt than today's best alternatives, and that it used its own AI models to help design the part. The catch for anyone planning infrastructure: these are self-reported results from a chip still in testing, and OpenAI has not published TFLOPS figures, memory capacity, power draw, or a confirmed process node. Here is what we can confirm, and what is still marketing.

What Jalapeño Actually Is

Jalapeño is OpenAI's first chip, and it is built only for inference. It is not a training chip, and it is not a reworked general-purpose part. OpenAI calls it the first step in a multi-generation compute platform, part of its October 2025 deal with Broadcom to deploy 10 GW of OpenAI-designed accelerators. Here is the confirmed shape.

Attribute | Jalapeño (confirmed)
Vendor | OpenAI (design) plus Broadcom (silicon, networking) and TSMC (fab)
Type | Custom chip, built only for LLM inference
Package | Large, reticle-sized; eight HBM sites around the central compute die
Memory | HBM. Generation, capacity, and bandwidth not disclosed
Process node | TSMC. Node not confirmed (the 10 GW program spans 3nm and 2nm)
Performance | No TFLOPS or precision figures (FP8, FP4, INT8) disclosed
Power / TDP | Not disclosed
Interconnect | Broadcom Ethernet, not NVLink
Systems partner | Celestica (boards, racks, integration)
Status | Still in testing
Deployment | Late 2026, at gigawatt scale

Table entries are confirmed across the launch coverage and OpenAI's own announcements. “Not disclosed” means OpenAI has not published the number.

The Performance Claim, and Why to Hold It Loosely

OpenAI's own claim is simple: early results show significantly better performance per watt than today's best alternatives. The announcement points to low running costs when the chip serves real-time coding models. Greg Brockman, OpenAI's president, framed it around fit: “We have a deep understanding of the workload. We've really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what's possible?”

Treat even that as a vendor claim, not a benchmark. The chip is still in testing, the numbers are self-reported, and OpenAI has published no per-precision performance figure. So there is nothing yet to line up in a TFLOPS-per-watt comparison against shipping parts like the H200 or GB300. Some reports put the gain at roughly 50% lower cost per token versus current NVIDIA GPUs, but that figure is not in OpenAI's own announcement. And cost per token mixes power, utilization, and hardware cost, so a cost win for in-house silicon is not the same as a raw speed win.
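To make the cost-versus-speed distinction concrete, here is a minimal sketch of how cost per token can be decomposed into amortized hardware, energy, and utilization. Every name and number in it is a hypothetical placeholder chosen for illustration; none of them is a Jalapeño or NVIDIA specification, and real cost models include many more terms (networking, facilities, staffing).

```python
from dataclasses import dataclass, replace

HOURS_PER_YEAR = 24 * 365


@dataclass
class Accelerator:
    """All fields are hypothetical placeholders, not vendor specifications."""
    name: str
    capex_usd: float          # purchase price per accelerator
    lifetime_years: float     # depreciation period
    power_kw: float           # average draw, including cooling overhead
    usd_per_kwh: float        # electricity price
    peak_tokens_per_s: float  # throughput at full load
    utilization: float        # fraction of peak actually sustained, 0..1


def cost_per_million_tokens(a: Accelerator) -> float:
    """Hourly cost (amortized hardware plus energy) divided by hourly tokens served."""
    capex_per_hour = a.capex_usd / (a.lifetime_years * HOURS_PER_YEAR)
    energy_per_hour = a.power_kw * a.usd_per_kwh
    tokens_per_hour = a.peak_tokens_per_s * a.utilization * 3600
    return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1e6


# Two made-up parts with IDENTICAL throughput and power; only the price differs.
gpu = Accelerator(
    name="merchant GPU (hypothetical)",
    capex_usd=40_000, lifetime_years=4, power_kw=1.0, usd_per_kwh=0.08,
    peak_tokens_per_s=10_000, utilization=0.5,
)
custom = replace(gpu, name="in-house chip (hypothetical)", capex_usd=20_000)

# The same GPU, but kept busier: utilization alone also moves cost per token.
busy_gpu = replace(gpu, name="merchant GPU, higher utilization", utilization=0.8)

if __name__ == "__main__":
    for part in (gpu, custom, busy_gpu):
        print(f"{part.name}: ${cost_per_million_tokens(part):.4f} per 1M tokens")
```

In this toy setup the in-house part is no faster at all, yet its cost per token comes out close to half the GPU's, purely because the hardware price was halved. That is why a cost-per-token headline says little about raw performance until the underlying figures are published.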

The Architecture Bet: Inference Is a Memory Problem

Jalapeño's design idea is that LLM inference is limited by moving data, not by raw math. OpenAI says it built the chip from scratch to cut data movement and push utilization close to its limit. That is why the package wraps eight HBM sites tightly around the compute die instead of chasing peak FLOPS. It fits where inference costs really bite: the decode step is limited by memory bandwidth, and the size of the KV cache, not raw matrix throughput, increasingly sets how many sessions a single accelerator can serve.
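A back-of-envelope sketch shows why memory, not math, tends to set the ceiling. It uses the standard KV-cache sizing formula and a simple bandwidth bound on decode. The model shape, HBM capacity, and bandwidth below are hypothetical and are not Jalapeño figures, which OpenAI has not disclosed; the sketch also assumes a dense model, batched decoding where weights are read once per step, and every session holding a full-length context.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int) -> int:
    """Bytes of KV cache added per generated token (keys and values, hence the 2)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_sessions(hbm_bytes: int, weight_bytes: int, context_tokens: int,
                 kv_bytes_per_token: int) -> int:
    """How many full-length sessions fit in the HBM left over after the weights."""
    free_bytes = hbm_bytes - weight_bytes
    if free_bytes <= 0:
        return 0
    return free_bytes // (context_tokens * kv_bytes_per_token)


def decode_steps_per_s_upper_bound(bandwidth_bytes_per_s: float, weight_bytes: int,
                                   resident_kv_bytes: int) -> float:
    """Bandwidth-only bound: each decode step streams the weights and the KV cache once."""
    return bandwidth_bytes_per_s / (weight_bytes + resident_kv_bytes)


# Hypothetical model: 80 layers, 8 KV heads of dimension 128, FP16 cache values.
KV_PER_TOKEN = kv_cache_bytes_per_token(layers=80, kv_heads=8, head_dim=128,
                                        bytes_per_value=2)

# Hypothetical accelerator: 192 GB of HBM at 4 TB/s, holding 70 GB of weights.
HBM_BYTES = 192 * 10**9
WEIGHT_BYTES = 70 * 10**9
BANDWIDTH = 4e12
CONTEXT_TOKENS = 32_768

SESSIONS = max_sessions(HBM_BYTES, WEIGHT_BYTES, CONTEXT_TOKENS, KV_PER_TOKEN)
RESIDENT_KV = SESSIONS * CONTEXT_TOKENS * KV_PER_TOKEN
STEPS_PER_S = decode_steps_per_s_upper_bound(BANDWIDTH, WEIGHT_BYTES, RESIDENT_KV)

if __name__ == "__main__":
    print(f"KV cache per token: {KV_PER_TOKEN / 1024:.0f} KiB")
    print(f"Concurrent 32k-token sessions: {SESSIONS}")
    print(f"Decode steps per second (bandwidth bound): {STEPS_PER_S:.1f}")
```

Note that no FLOPS figure appears anywhere in the calculation. Under these assumptions, both the session count and the decode rate are fixed entirely by HBM capacity and bandwidth, which is the reasoning behind spending package area on memory sites rather than peak compute.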

It is a narrower bet than a GPU makes. A general-purpose accelerator like the H200 has to handle training and inference across every kind of model. Jalapeño only has to run OpenAI's own models well. Knowing the exact workload is the whole advantage a custom chip has over a GPU.

Ethernet, Not NVLink: The Broadcom Stack

The most strategic detail is how the racks connect. NVIDIA ties its GPUs together with its own NVLink and NVSwitch. OpenAI's systems connect over standard Ethernet, using Broadcom's networking chips. Broadcom supplies the chip build and the networking, and Celestica builds the boards, racks, and systems. The goal is to build a full inference platform without paying NVIDIA's margin on either the accelerator or the network.

Jalapeño is the first part of the larger 10 GW program from October 2025: OpenAI-designed accelerators across both 3nm and 2nm generations, Ethernet-based racks, with Broadcom deployments planned from the second half of 2026 through the end of 2029. Reports also say Microsoft is expected to take about 40% of first-phase chips. If true, that matters as much as any spec, because it means Jalapeño is meant to ship in volume to a major cloud, not just power OpenAI's own service.

OpenAI's framing makes the full-stack play explicit: “OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience. Because OpenAI operates across the stack, each layer can be optimized around the same goal: making its models faster, more reliable, and more affordable for users.”

What It Means for NVIDIA, and for Buyers

Jalapeño does not threaten NVIDIA at the frontier. Training the next round of models still runs on GPUs, and OpenAI's own multi-billion-dollar commitments to NVIDIA, AMD, and Oracle are not going away. The target is narrower and more profitable: the inference half of the market, where the workload is steady, the volume is huge, and a custom chip tuned to one model family can win back margin that now goes to NVIDIA.

For infrastructure buyers, the takeaway is twofold. Jalapeño is not a chip you can rent. It is in-house silicon for OpenAI and, reportedly, Microsoft, so its direct effect is on the price of OpenAI inference, not on your buying options. Less directly, it is the clearest sign yet that custom chips are the serious play in inference, joining Google's TPUs, AWS Trainium, and Meta's MTIA. The GPUs you can actually buy and rent are still the right tool for training and for inference on models you do not control, and those are the parts we track spec by spec.

Source Confidence

We hold sourcing to the same standard as our spec data. With Jalapeño, the gap between confirmed and claimed is unusually wide.

High confidence: multiply corroborated

The chip's existence, its name, its inference-only purpose, the package with eight HBM sites, the Broadcom and TSMC partnership, the Celestica systems work, the Ethernet-based networking, the late-2026 gigawatt deployment, and the wider 10 GW program (3nm and 2nm) are all consistent across the launch coverage and OpenAI's own October 2025 announcement.

Claimed: self-reported, unverified

The “significantly better performance per watt” result is OpenAI's own, from a chip still in testing. The roughly 50%-lower-cost-per-token figure comes from secondary reports and is not in OpenAI's announcement. The reported Microsoft 40% first-phase share is also from reporting, not an OpenAI statement.

Not disclosed

OpenAI has not published peak TFLOPS at any precision; the supported precision formats; the HBM generation, capacity, or bandwidth; the die size; the power draw; or Jalapeño's exact process node. Until it publishes these, the chip cannot sit in a like-for-like spec table next to shipping GPUs.
