How Graphite achieves this
Diffusion pipeline metrics (generation time, OOM count, queue depth) ship to Graphite via StatsD over UDP: sends are fire-and-forget, so instrumentation adds negligible latency to the pipeline, and the metrics land alongside DCGM hardware counters
Graphite's percentile functions compute generation time p95 and p99 at query time, without pre-computation or recording rules: nPercentile() for a single series, percentileOfSeries() across all workers
OOM event counter with Graphite threshold alerts: instant notification when VRAM is exhausted, enabling rapid batching configuration changes
VRAM pressure (fb_used / fb_total, with fb_total = fb_used + fb_free) calculated inline using Graphite's divideSeries(), one query expression, no custom exporter
MetricFire includes pre-built Grafana dashboards for generative AI pipelines: VRAM pressure, OOM event tracking, queue depth, and generation latency panels, ready on day one with no dashboard configuration needed
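As a rough illustration of the StatsD path described above, the sketch below formats pipeline metrics in the StatsD line protocol (name:value|type) and sends them as UDP datagrams. The endpoint address and the helper names (statsd_line, job_metrics, send_metrics) are assumptions made for this example, not part of any MetricFire client.

```python
import socket

# Hypothetical StatsD endpoint; substitute your own host and port.
STATSD_ADDR = ("127.0.0.1", 8125)


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD line protocol: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def job_metrics(gen_time_ms, queue_depth, oom):
    """Build the datagrams for one finished (or failed) generation job."""
    lines = [
        statsd_line("diffusion.gen_time_ms", gen_time_ms, "ms"),  # timer
        statsd_line("diffusion.queue_depth", queue_depth, "g"),   # gauge
    ]
    if oom:
        # Counter increment, sent only when the job hit an out-of-memory error.
        lines.append(statsd_line("diffusion.oom_events", 1, "c"))
    return lines


def send_metrics(lines, addr=STATSD_ADDR):
    """Send each metric as its own UDP datagram; no reply is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line, addr)
```

A worker could call send_metrics(job_metrics(4210, 7, False)) after each job. Because UDP is connectionless, the call does not wait on the metrics backend, at the cost of silently dropping datagrams if the endpoint is unreachable.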
Graphite metrics collected
gpu.{id}.fb_used_mib
gpu.{id}.fb_free_mib
diffusion.gen_time_ms
diffusion.queue_depth
diffusion.concurrent_jobs
diffusion.oom_events
diffusion.throughput_img_per_s
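To show how these paths might be queried, here is a minimal sketch that builds Graphite render API targets for a latency percentile and a VRAM usage ratio. The host name is a placeholder, and the metric paths are used exactly as listed above; a StatsD daemon often stores timers and counters under its own prefix, so the real paths in a given deployment may differ.

```python
from urllib.parse import urlencode

# Hypothetical Graphite host; replace with your own endpoint.
GRAPHITE_URL = "https://graphite.example.com"


def percentile_target(n):
    """Target for the n-th percentile of generation time over the query window."""
    return f"nPercentile(diffusion.gen_time_ms,{n})"


def vram_ratio_target(gpu_id):
    """Target for used / (used + free) framebuffer memory on one GPU."""
    used = f"gpu.{gpu_id}.fb_used_mib"
    free = f"gpu.{gpu_id}.fb_free_mib"
    # divideSeries takes a single divisor series, so the total is summed first.
    return f"divideSeries({used},sumSeries({used},{free}))"


def render_url(targets, window="-1h"):
    """Build a render API URL that returns JSON for the given targets."""
    params = [("target", t) for t in targets]
    params += [("from", window), ("format", "json")]
    return f"{GRAPHITE_URL}/render?{urlencode(params)}"
```

For example, render_url([percentile_target(95), vram_ratio_target(0)]) yields one URL that returns both series, with no recording rule or exporter change involved.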
Self-hosted pain solved
✕ Custom exporter required to ship diffusion metrics alongside GPU counters → Graphite StatsD accepts arbitrary application metrics natively
✕ OOM events go unalerted on self-hosted stacks → Graphite threshold alert on oom_events counter is a 2-minute setup
✕ p95 generation latency requires recording rules → Graphite percentile functions compute at query time
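For a sense of what a threshold alert on oom_events evaluates, the sketch below applies the same comparison to a Graphite render API JSON response. It is not MetricFire's alerting implementation; the function name, the threshold default, and the sample data are invented for illustration.

```python
def oom_breaches(render_json, threshold=0):
    """Return (timestamp, value) pairs where a datapoint exceeds the threshold.

    render_json is the parsed body of a render API call made with format=json:
    a list of series, each with a "datapoints" list of [value, timestamp] pairs.
    """
    breaches = []
    for series in render_json:
        for value, timestamp in series["datapoints"]:
            # Graphite reports intervals without data as null (None in Python).
            if value is not None and value > threshold:
                breaches.append((timestamp, value))
    return breaches


# Invented sample response for illustration only.
sample = [
    {
        "target": "diffusion.oom_events",
        "datapoints": [[0, 1700000000], [None, 1700000060], [2, 1700000120]],
    }
]
```

With this sample, oom_breaches(sample) flags only the last interval, where two OOM events were recorded.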