AMD has introduced the Instinct MI350P PCIe, a new enterprise GPU accelerator designed to fit into servers that companies already have, without requiring a full data center rebuild. Targeted at agentic AI and on‑premises inference, the card is a dual‑slot, air‑cooled PCIe add‑in that can be dropped into standard 2U (or larger) servers.
The MI350P is AMD’s first PCIe‑based Instinct accelerator in four years. Traditionally, AMD has shipped Instinct GPUs as OAM modules in groups of eight, but the MI350P lets enterprises start with a single card and scale up gradually. This makes it easier for companies to experiment with AI without a large upfront hardware commitment.
Designed for On‑Prem AI and RAG
The MI350P is built for on‑premises inference within existing data center power, cooling, and rack infrastructure. It works in air‑cooled systems with up to eight accelerator cards per node, supporting small, medium, and large AI models for inference and retrieval‑augmented generation (RAG) pipelines.
Key specs include:
- 144GB of HBM3E memory per card
- Memory bandwidth up to 4 TB/s
- Estimated 2,299 TFLOPS of performance
- Up to 4,600 peak TFLOPS at MXFP4, which AMD says is the highest performance currently available in an enterprise PCIe card
The card supports the lower‑precision MXFP6 and MXFP4 formats, which are optimized for high throughput, and it offers sparsity acceleration for most mainstream 8‑bit and 16‑bit precisions.
Sparsity and Efficiency
AMD highlights sparsity as a key efficiency feature. The MI350P can skip zero values in data and matrices, which reduces the work each inference pass requires. This helps the higher‑precision formats, such as INT8 and BF16, run more efficiently without sacrificing accuracy.
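To make the zero-skipping idea concrete, here is a minimal Python sketch. It illustrates the principle only and is not a description of how AMD's hardware implements sparsity; the function names and the sample vectors are hypothetical.

```python
def dense_dot(weights, activations):
    """Multiply every pair, including pairs where the weight is zero."""
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        total += w * a
        multiplies += 1
    return total, multiplies


def zero_skipping_dot(weights, activations):
    """Same result, but pairs with a zero weight are skipped entirely."""
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0:
            continue  # a zero weight contributes nothing, so no work is done
        total += w * a
        multiplies += 1
    return total, multiplies


# Hypothetical data: five of the eight weights are zero.
weights = [0.5, 0.0, 0.0, -1.0, 0.0, 2.0, 0.0, 0.0]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

dense_result, dense_ops = dense_dot(weights, activations)
sparse_result, sparse_ops = zero_skipping_dot(weights, activations)

print(f"dense:  result={dense_result}, multiplies={dense_ops}")
print(f"sparse: result={sparse_result}, multiplies={sparse_ops}")
```

Both functions return the same value, but the zero-skipping version performs three multiplications instead of eight. The numerical result is unchanged, which is why skipping zeros can reduce work without affecting accuracy.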
A single MI350P card can handle large language models of roughly 200–250 billion parameters on its own. With up to eight cards per node, a system can cover a wide range of workloads, including small, medium, and large language model (SLM / MLM / LLM) inference and RAG.
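A rough back-of-the-envelope check shows how a model of that size could fit in 144GB. The sketch below is an assumption-laden estimate, not AMD's methodology: the article does not say which precision the 200–250 billion figure assumes, and the calculation counts weight storage only, ignoring the KV cache, activations, runtime overhead, and any format metadata. The helper names are hypothetical.

```python
HBM_PER_CARD_GB = 144  # per-card memory from the spec list


def weight_footprint_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 weights * (bits / 8) bytes, divided by 1e9.
    """
    return params_billions * bits_per_weight / 8


def fits_on_one_card(params_billions, bits_per_weight):
    """True if the weights alone fit in a single card's memory."""
    return weight_footprint_gb(params_billions, bits_per_weight) <= HBM_PER_CARD_GB


for params in (200, 250):
    for bits in (4, 8, 16):
        size = weight_footprint_gb(params, bits)
        verdict = "fits" if fits_on_one_card(params, bits) else "does not fit"
        print(f"{params}B parameters at {bits}-bit: {size:.0f} GB, {verdict}")
```

Under these assumptions, 200 to 250 billion parameters at 4 bits per weight need about 100 to 125 GB, which fits within one card's 144GB. At 8 bits, the 200 billion parameter case already needs about 200 GB, so the per-card figure is only consistent with roughly 4-bit weights; larger precisions would need multiple cards.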
Software and Ecosystem
The Instinct MI350P is supported by AMD’s ROCm open‑source software stack, the same stack used across its Instinct and Radeon products. This means developers can use familiar tools and frameworks for AI training and inference, rather than learning a new ecosystem from scratch.
AMD has not yet announced a launch date or pricing for the MI350P. The card is aimed at companies that want to deploy AI inference on‑premises using existing infrastructure, while retaining the option to scale up as models and workloads grow.