AI Factory Performance Testing
How can a purpose-built AI stack keep GPUs fully utilized?
Summary: A purpose-built AI stack solves the GPU idle-time problem caused by data bottlenecks. With well-integrated storage, networking, and orchestration, researchers can focus on model development rather than infrastructure constraints.
We tested the AI Factory cluster – a high-performance computing environment designed specifically for AI development. As an AI development company with over three decades of experience, our goal was to determine whether this AI research setup could eliminate a common bottleneck: "GPUs waiting for data." Success would let our researchers iterate faster, experiment more deeply, and deliver real-world AI results more efficiently.
Why does this stack matter?
In advanced AI workflows – such as large language models, multi-modal vision, and real-time inference – the bottleneck is often not the GPU arithmetic but the data path. If storage, I/O, memory copies or network links slow down, the GPU sits idle, experiments take longer, and overall costs rise. A purpose-built stack addresses these issues at once.
Technical Lead, NLP Team
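To make the data-path argument concrete, here is a back-of-envelope calculation of the storage bandwidth needed to keep a single GPU fed. The batch size, sample size and step time below are illustrative assumptions, not measurements from the tested cluster.

```python
# Back-of-envelope check: what input throughput keeps one GPU busy?
# All numbers below are illustrative assumptions, not cluster measurements.

samples_per_step = 256       # batch size per GPU (assumed)
bytes_per_sample = 0.8e6     # e.g. a decoded 512x512 RGB image, ~0.8 MB (assumed)
step_time_s = 0.12           # forward + backward time per step on the GPU (assumed)

required_bw = samples_per_step * bytes_per_sample / step_time_s
print(f"Required input bandwidth per GPU: {required_bw / 1e9:.2f} GB/s")

# Eight GPUs on one node multiply the demand; if the storage/network path
# delivers less than this, the GPUs idle while waiting for the next batch.
print(f"Per 8-GPU node: {8 * required_bw / 1e9:.2f} GB/s")
```

Even with modest assumptions, a single multi-GPU node can demand more than 10 GB/s of sustained input bandwidth, which is why the storage and network layers decide whether the GPUs stay busy.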
What are the components of the AI stack?
The tested cluster is powered by a state-of-the-art storage and compute stack, featuring a Pure Storage FlashBlade//S array feeding data to an NVIDIA DGX-class GPU system. The entire environment is orchestrated by Run:AI, a software platform that acts as a traffic controller, managing and allocating GPU resources for maximum efficiency.
- The FlashBlade//S by Pure Storage is designed for throughput at AI development scale. Its officially certified access protocol for NVIDIA DGX SuperPOD is NFS over RoCE (RDMA over Converged Ethernet), which enables direct-to-GPU workflows.
- The NVIDIA GPUDirect Storage (GDS) pipeline allows storage to stream data directly into GPU memory via DMA – bypassing CPU and host-memory bottlenecks (a minimal read sketch follows this list).
- The Run:AI platform (built on Kubernetes) turns this high-end infrastructure into a productive, shared environment: researchers log in, launch notebooks, and run jobs – all in minutes rather than hours.
- The NVIDIA ecosystem is worth mentioning – even though NVIDIA was not part of the initial PoC conversations, the ecosystem of DGX, GDS, certified storage and orchestration is tightly aligned with NVIDIA's technology roadmap.
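To illustrate the GDS point from the list above, here is a minimal sketch of a direct-to-GPU read using RAPIDS kvikio, the Python bindings for NVIDIA cuFile. The mount path and buffer size are assumptions for illustration; this is not the exact code run on the cluster.

```python
# Minimal sketch of a GPUDirect Storage read via RAPIDS kvikio (Python bindings
# for NVIDIA cuFile). Assumes kvikio and cupy are installed; the file path below
# is hypothetical.
import cupy as cp
import kvikio

path = "/mnt/flashblade/dataset/shard-000.bin"   # NFS-over-RoCE mount (assumed)

# Allocate the destination buffer directly in GPU memory.
buf = cp.empty(256 * 1024 * 1024, dtype=cp.uint8)   # 256 MiB

f = kvikio.CuFile(path, "r")
# When GDS is available, this transfer DMAs from storage into GPU memory,
# skipping a bounce buffer in host RAM; otherwise kvikio falls back to a
# POSIX read followed by a host-to-device copy.
nbytes = f.read(buf)
f.close()

print(f"Read {nbytes / 1e6:.1f} MB into GPU memory")
```

The design point is that the GPU buffer is the read target: the data path goes storage-to-GPU, rather than storage-to-host-to-GPU.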
What did we do and what did we discover?
Once logged into the Jupyter/Run:AI interface, we initiated a series of real research-grade workloads: mixed-type I/O (large sequential loads, numerous small files, and overlapped compute and data streams), multi-GPU tasks, and parallel model runs. Several things stood out (a timing sketch after this list shows one way to quantify GPU data-wait):
- Data-feeding to GPUs was seamless – the storage to GPU path did not show visible stalls or downtime even under heavy load.
- During high concurrency workloads (multiple notebooks, competing jobs), GPU utilisation stayed high – signalling the underlying system was not the limiting factor.
- The user-experience was smooth: from login to a running experiment in minutes. For a research team that means more time designing models and less time babysitting infrastructure.
- From our perspective, the system felt like "local NVMe performance" but with the flexibility and scale of an external array shared among all worker nodes in the cluster.
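The sketch below shows a simple way to quantify the "GPUs waiting for data" effect we looked for: time how long each training step spends waiting for the next batch versus computing on the GPU. The dataset and model here are placeholders, not the workloads we actually ran; on the cluster, the samples would come from the shared NFS/RoCE mount.

```python
# Measure data-wait time vs. compute time per training step.
# Dataset and model are placeholders; the timing pattern is the point.
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder in-memory dataset; in a real test the samples would be read
# from the shared storage mount.
dataset = TensorDataset(torch.randn(4096, 3, 224, 224),
                        torch.randint(0, 10, (4096,)))
loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 224 * 224, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

fetch_time = compute_time = 0.0
t0 = time.perf_counter()
for x, y in loader:
    t1 = time.perf_counter()
    fetch_time += t1 - t0                 # time spent waiting for the batch

    x, y = x.to(device, non_blocking=True), y.to(device)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()          # make GPU time visible to the host clock
    t0 = time.perf_counter()
    compute_time += t0 - t1               # time spent in the training step

print(f"data wait: {fetch_time:.1f}s  compute: {compute_time:.1f}s")
# A healthy data path keeps the "data wait" share close to zero.
```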
Why is it important for the broader AI ecosystem?
The implications are clear: when the data pipeline is solved, researchers focus on model innovation rather than infrastructure wrangling. We were able to accelerate iteration, reduce idle time, and explore more model variants in the same calendar week. The Beyond.pl and Pure Storage infrastructure supports real-world AI work, not just synthetic benchmarks. For the broader ecosystem of European AI research and production, this kind of stack shows that you can deploy infrastructure that scales, shares, and performs – and that data feeding no longer has to be a drag on productivity.
Key takeaways
- GPUs are only as fast as the slowest link in the chain – and that link is increasingly the I/O and data path, not the compute.
- With technologies like NFS-over-RoCE storage paths, GDS-enabled pipelines and smart orchestration, remote/shared storage can behave like local NVMe for GPU-centric workflows.
- The orchestration layer (Run:AI) converts infrastructure from "hard to use" to "easy to use" – unlocking real productivity gains.
- For organisations striving to scale research, iterate faster, deploy inference at higher volume or support multiple teams – this stack is a serious tool, not just a toy.
Head of AI Infrastructure & Services
References
- Philip Ninan. "Pure Storage Reference Architecture for NVIDIA Enterprise AI Factory Accelerates Intelligence at Scale." Pure Storage Blog. June 11, 2025.
- NVIDIA. "GPUDirect Storage Release Notes." NVIDIA Documentation Hub. Accessed February 3, 2026.
- NVIDIA. "Run:ai DOCS Overview." NVIDIA Run:ai docs. Accessed February 3, 2026.
