🎨
Generative AI

Image & Video Generation APIs

Key signals: VRAM fragmentation · generation time p95 · OOM events · concurrent jobs

How Graphite achieves this
Diffusion pipeline metrics (generation time, OOM count, queue depth) ship via StatsD over UDP to Graphite: low-latency, fire-and-forget ingestion alongside DCGM hardware counters
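A minimal sketch of that instrumentation path, using only the standard library. The endpoint address is a placeholder for your Hosted Graphite StatsD agent, and the metric names follow the `diffusion.*` paths listed later on this page; the StatsD datagram format itself (`<metric>:<value>|<type>`) is standard.

```python
import socket

# Placeholder StatsD endpoint -- substitute your Hosted Graphite agent address.
STATSD_ADDR = ("localhost", 8125)

def statsd_line(metric: str, value, kind: str) -> bytes:
    """Format one StatsD datagram: <metric>:<value>|<type>."""
    return f"{metric}:{value}|{kind}".encode()

def ship_job_metrics(sock: socket.socket, gen_time_ms: float,
                     queue_depth: int, oom: bool) -> None:
    """Fire-and-forget UDP send of one diffusion job's metrics."""
    lines = [
        statsd_line("diffusion.gen_time_ms", gen_time_ms, "ms"),  # timer
        statsd_line("diffusion.queue_depth", queue_depth, "g"),   # gauge
    ]
    if oom:
        lines.append(statsd_line("diffusion.oom_events", 1, "c"))  # counter
    for line in lines:
        sock.sendto(line, STATSD_ADDR)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ship_job_metrics(sock, gen_time_ms=842.5, queue_depth=3, oom=False)
```

Because the transport is UDP, a dropped datagram never blocks or fails the generation worker, which is why it sits comfortably in the hot path.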
Graphite's percentile functions compute generation-time p95 and p99 across all workers at query time, with no pre-computation or recording rules
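A sketch of the corresponding render-API targets. The `workers.*` wildcard path is an assumption about how per-worker series are named; `percentileOfSeries()` aggregates a percentile across many series, while `nPercentile()` draws a per-series percentile line.

```
# p95 / p99 generation time across all workers, computed at query time
target=percentileOfSeries(workers.*.diffusion.gen_time_ms, 95)
target=percentileOfSeries(workers.*.diffusion.gen_time_ms, 99)

# Per-series p95 reference line for a single worker
target=nPercentile(workers.gpu0.diffusion.gen_time_ms, 95)
```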
OOM event counter with Graphite threshold alerts: immediate notification when VRAM is exhausted, enabling rapid changes to batching configuration
VRAM fragmentation (fb_used / fb_total) is calculated inline with Graphite's divideSeries(): one metric path, no custom exporter required
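A sketch of that inline ratio as a render target. Since the collected paths expose `fb_used_mib` and `fb_free_mib` rather than a total, the denominator is reconstructed with `sumSeries()`; the `gpu.0` path assumes a single GPU with id `0`.

```
# VRAM ratio per GPU, computed at query time with no exporter
target=divideSeries(gpu.0.fb_used_mib, sumSeries(gpu.0.fb_used_mib, gpu.0.fb_free_mib))
```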
MetricFire includes pre-built Grafana dashboards for generative AI pipelines: VRAM pressure, OOM event tracking, queue depth, and generation latency panels, ready on day one with no dashboard configuration needed
Graphite metrics collected
gpu.{id}.fb_used_mib
gpu.{id}.fb_free_mib
diffusion.gen_time_ms
diffusion.queue_depth
diffusion.concurrent_jobs
diffusion.oom_events
diffusion.throughput_img_per_s
Self-hosted pain solved
Custom exporter required to ship diffusion metrics alongside GPU counters → Graphite StatsD accepts arbitrary application metrics natively
OOM events go unalerted on self-hosted stacks → Graphite threshold alert on oom_events counter is a 2-minute setup
p95 generation latency requires recording rules → Graphite percentile functions compute at query time
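The "2-minute" OOM alert above is configured in the Hosted Graphite UI, but the underlying check is simple enough to sketch against the standard render API. The endpoint URL and the flat `diffusion.oom_events` target are assumptions; the `[[value, timestamp], ...]` datapoint shape is the render API's JSON format.

```python
import json
import urllib.request

def breaches_threshold(datapoints, threshold):
    """True if any non-null value in Graphite render-API datapoints
    ([[value, timestamp], ...]) meets or exceeds the threshold."""
    return any(v is not None and v >= threshold for v, _ in datapoints)

def check_oom(render_url, threshold=1):
    """Fetch the last 5 minutes of oom_events and flag any occurrence.
    render_url is your Hosted Graphite render endpoint (placeholder)."""
    url = (f"{render_url}?target=diffusion.oom_events"
           "&from=-5min&format=json")
    with urllib.request.urlopen(url) as resp:
        series = json.load(resp)
    return any(breaches_threshold(s["datapoints"], threshold) for s in series)

# The pure check can be exercised without a live Graphite endpoint:
sample = [[0, 1700000000], [None, 1700000060], [2, 1700000120]]
assert breaches_threshold(sample, 1)  # an OOM occurred in the window
```

Keeping the comparison in a pure function (`breaches_threshold`) makes the alert logic testable without a network round-trip.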
Graphite value: Generative AI API teams reduce costly OOM restarts and queue-timeout incidents by surfacing VRAM pressure before it causes failures, while keeping generation-latency SLOs visible to the whole business through MetricFire-hosted Grafana dashboards backed by a single Graphite data store.

GPU Monitoring Use Cases
Explore other use cases

MetricFire's Hosted Graphite covers every GPU workload. See how it fits your team's specific challenge.