🎨
Generative AI

Image & Video Generation APIs

Key signals: VRAM fragmentation · generation time p95 · OOM events · concurrent jobs

How Graphite achieves this
Diffusion pipeline metrics (generation time, OOM count, queue depth) ship via StatsD over UDP to Graphite: low-latency, fire-and-forget ingestion alongside DCGM hardware counters
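A minimal sketch of that instrumentation path, using only the standard library. The endpoint address is a placeholder for your Hosted Graphite StatsD agent, and the metric names follow the `diffusion.*` paths listed later on this page; the StatsD datagram format itself (`<metric>:<value>|<type>`) is standard.

```python
import socket

# Placeholder StatsD endpoint -- substitute your Hosted Graphite agent address.
STATSD_ADDR = ("localhost", 8125)

def statsd_line(metric: str, value, kind: str) -> bytes:
    """Format one StatsD datagram: <metric>:<value>|<type>."""
    return f"{metric}:{value}|{kind}".encode()

def ship_job_metrics(sock: socket.socket, gen_time_ms: float,
                     queue_depth: int, oom: bool) -> None:
    """Fire-and-forget UDP send of one diffusion job's metrics."""
    lines = [
        statsd_line("diffusion.gen_time_ms", gen_time_ms, "ms"),  # timer
        statsd_line("diffusion.queue_depth", queue_depth, "g"),   # gauge
    ]
    if oom:
        lines.append(statsd_line("diffusion.oom_events", 1, "c"))  # counter
    for line in lines:
        sock.sendto(line, STATSD_ADDR)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ship_job_metrics(sock, gen_time_ms=842.5, queue_depth=3, oom=False)
```

Because the transport is UDP, a dropped datagram never blocks or fails the generation worker, which is why it sits comfortably in the hot path.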
Graphite's percentile functions compute generation-time p95 and p99 across all workers at query time, with no pre-computation or recording rules
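A sketch of the corresponding render-API targets. The `workers.*` wildcard path is an assumption about how per-worker series are named; `percentileOfSeries()` aggregates a percentile across many series, while `nPercentile()` draws a per-series percentile line.

```
# p95 / p99 generation time across all workers, computed at query time
target=percentileOfSeries(workers.*.diffusion.gen_time_ms, 95)
target=percentileOfSeries(workers.*.diffusion.gen_time_ms, 99)

# Per-series p95 reference line for a single worker
target=nPercentile(workers.gpu0.diffusion.gen_time_ms, 95)
```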
OOM event counter with Graphite threshold alerts: immediate notification when VRAM is exhausted, enabling rapid changes to batching configuration
VRAM fragmentation (fb_used / fb_total) is calculated inline with Graphite's divideSeries(): one metric path, no custom exporter required
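A sketch of that inline ratio as a render target. Since the collected paths expose `fb_used_mib` and `fb_free_mib` rather than a total, the denominator is reconstructed with `sumSeries()`; the `gpu.0` path assumes a single GPU with id `0`.

```
# VRAM ratio per GPU, computed at query time with no exporter
target=divideSeries(gpu.0.fb_used_mib, sumSeries(gpu.0.fb_used_mib, gpu.0.fb_free_mib))
```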
MetricFire includes pre-built Grafana dashboards for generative AI pipelines: VRAM pressure, OOM event tracking, queue depth, and generation latency panels, ready on day one with no dashboard configuration needed
Graphite metrics collected
gpu.{id}.fb_used_mib
gpu.{id}.fb_free_mib
diffusion.gen_time_ms
diffusion.queue_depth
diffusion.concurrent_jobs
diffusion.oom_events
diffusion.throughput_img_per_s
Self-hosted pain solved
Custom exporter required to ship diffusion metrics alongside GPU counters → Graphite StatsD accepts arbitrary application metrics natively
OOM events go unalerted on self-hosted stacks → Graphite threshold alert on oom_events counter is a 2-minute setup
p95 generation latency requires recording rules → Graphite percentile functions compute at query time
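The "2-minute" OOM alert above is configured in the Hosted Graphite UI, but the underlying check is simple enough to sketch against the standard render API. The endpoint URL and the flat `diffusion.oom_events` target are assumptions; the `[[value, timestamp], ...]` datapoint shape is the render API's JSON format.

```python
import json
import urllib.request

def breaches_threshold(datapoints, threshold):
    """True if any non-null value in Graphite render-API datapoints
    ([[value, timestamp], ...]) meets or exceeds the threshold."""
    return any(v is not None and v >= threshold for v, _ in datapoints)

def check_oom(render_url, threshold=1):
    """Fetch the last 5 minutes of oom_events and flag any occurrence.
    render_url is your Hosted Graphite render endpoint (placeholder)."""
    url = (f"{render_url}?target=diffusion.oom_events"
           "&from=-5min&format=json")
    with urllib.request.urlopen(url) as resp:
        series = json.load(resp)
    return any(breaches_threshold(s["datapoints"], threshold) for s in series)

# The pure check can be exercised without a live Graphite endpoint:
sample = [[0, 1700000000], [None, 1700000060], [2, 1700000120]]
assert breaches_threshold(sample, 1)  # an OOM occurred in the window
```

Keeping the comparison in a pure function (`breaches_threshold`) makes the alert logic testable without a network round-trip.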
Graphite value: Generative AI API teams reduce costly OOM restarts and queue-timeout incidents by surfacing VRAM pressure before it causes failures, while keeping generation-latency SLOs visible to the whole business through MetricFire-hosted Grafana dashboards backed by a single Graphite data store.

GPU Monitoring Use Cases
Explore other use cases

MetricFire's Hosted Graphite covers every GPU workload. See how it fits your team's specific challenge.