How Graphite achieves this
Graphite metric paths encode team and project directly as path segments (gpu.team.{name}.project.{name}.utilization), so no complex label taxonomy is required
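A minimal sketch of that convention follows. The helpers segment(), utilization_path() and glob_match() are hypothetical. The normalisation rule (lower-case, with anything other than letters, digits, _ and - replaced by _) is an assumption, not a Graphite requirement. It matters because a dot inside a team or project name would otherwise split it into extra path segments. glob_match() only approximates Graphite's wildcard matching, in which * stays within a single segment.

```python
import fnmatch
import re

# Characters outside this set are replaced so each name stays one path segment.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def segment(name: str) -> str:
    """Turn a free-form team/project name into a single path segment (assumed rule)."""
    return _UNSAFE.sub("_", name.strip().lower())


def utilization_path(team: str, project: str) -> str:
    """Build gpu.team.{name}.project.{name}.utilization for one team/project pair."""
    return f"gpu.team.{segment(team)}.project.{segment(project)}.utilization"


def glob_match(pattern: str, path: str) -> bool:
    """Match a dotted pattern one segment at a time, so '*' never crosses a dot.

    Only '*', '?' and '[...]' are handled (via fnmatch); brace lists such as
    {a,b}, which Graphite also accepts, are not.
    """
    p_parts, m_parts = pattern.split("."), path.split(".")
    return len(p_parts) == len(m_parts) and all(
        fnmatch.fnmatchcase(m, p) for p, m in zip(p_parts, m_parts)
    )
```

Under these assumptions, utilization_path("ML Research", "llm.pretrain") gives gpu.team.ml_research.project.llm_pretrain.utilization, and the pattern gpu.team.ml_research.project.*.utilization selects every project belonging to that team.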
Graphite's integral() and sumSeries() functions compute the total GPU-hours each team consumes per billing period, in a form that can be reported directly to finance
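The arithmetic behind that roll-up can be sketched as below. The sketch makes one explicit assumption: each gpu.team.{name}.project.{name}.utilization series holds 0 to 100% utilisation of one GPU, sampled every 60 seconds. sum_series() and integral() are local emulations written in the spirit of the Graphite functions, not Graphite's own code. integral() is a running sum of raw datapoints, so each point is first scaled by step/3600/100 to turn a percentage sample into GPU-hours. graphite_target() expresses the same idea as a render target using Graphite's scale() function.

```python
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]


def sum_series(*series: Series) -> List[Optional[float]]:
    """Point-wise sum across aligned series; gaps (None) are skipped."""
    out: List[Optional[float]] = []
    for points in zip(*series):
        present = [p for p in points if p is not None]
        out.append(sum(present) if present else None)
    return out


def integral(series: Series) -> List[Optional[float]]:
    """Running total of the datapoints; gaps stay None but keep the total."""
    total = 0.0
    out: List[Optional[float]] = []
    for p in series:
        if p is None:
            out.append(None)
        else:
            total += p
            out.append(total)
    return out


def gpu_hours(project_series: Sequence[Series], step_seconds: int = 60) -> float:
    """Utilisation-weighted GPU-hours for one team over the whole window.

    Assumes each series holds 0-100 % utilisation of one GPU, one point per
    step_seconds. Points are scaled to GPU-hours before integrating, because
    a running sum of raw datapoints knows nothing about the sample step.
    """
    factor = step_seconds / 3600 / 100
    combined = sum_series(*project_series)
    scaled = [None if p is None else p * factor for p in combined]
    running = integral(scaled)
    # The last non-gap value of the running total is the period total.
    return next((v for v in reversed(running) if v is not None), 0.0)


def graphite_target(team: str, step_seconds: int = 60) -> str:
    """The same calculation written as a Graphite render target."""
    factor = step_seconds / 3600 / 100
    return (f"integral(scale(sumSeries(gpu.team.{team}.project.*.utilization), "
            f"{factor:.10g}))")
```

For example, two projects sampled once a minute for an hour, one at 100% and one at 50%, come to 1.5 utilisation-weighted GPU-hours under these assumptions.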
Idle GPU alert: a Graphite threshold fires when utilisation stays below 10% for more than 15 minutes during business hours, so idle spend can be reclaimed automatically
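The alert rule is simple enough to state as code. This sketch assumes business hours are Monday to Friday, 09:00 to 18:00 in the timestamp's own time zone, and it treats a missing sample as resetting the idle run; both choices are assumptions. It only decides whether the rule is firing for one utilisation series. Where the check runs and what action reclaims the GPU are outside its scope.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

IDLE_THRESHOLD_PCT = 10.0             # "below 10%"
IDLE_WINDOW = timedelta(minutes=15)   # "for more than 15 minutes"


def in_business_hours(ts: datetime) -> bool:
    """Assumed business hours: Monday-Friday, 09:00-18:00 in the timestamp's zone."""
    return ts.weekday() < 5 and 9 <= ts.hour < 18


def idle_alert_firing(samples: Iterable[Tuple[datetime, Optional[float]]]) -> bool:
    """Return True if the latest run of idle samples has lasted more than IDLE_WINDOW.

    samples are (timestamp, utilisation_pct) pairs in time order, e.g. the
    datapoints of one gpu.{id}.utilization_pct series. A gap (None), a busy
    sample, or a sample outside business hours resets the run.
    """
    idle_since: Optional[datetime] = None
    last_ts: Optional[datetime] = None
    for ts, pct in samples:
        last_ts = ts
        idle = pct is not None and pct < IDLE_THRESHOLD_PCT and in_business_hours(ts)
        if idle:
            if idle_since is None:
                idle_since = ts  # start of the current idle run
        else:
            idle_since = None
    return idle_since is not None and last_ts - idle_since > IDLE_WINDOW
```

With one sample per minute at 4% utilisation starting at 10:00 on a Monday, the rule fires once the run reaches 10:16 (16 minutes idle) but not at 10:15, and never on a Saturday.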
Cost modelling: divideSeries(cost_per_hour, tokens_per_hour) produces a live cost-per-token metric visualised in the same Graphite dashboard
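For cost per token, divideSeries() only needs point-wise division of two aligned series. The sketch below reuses the text's placeholder names, cost_per_hour and tokens_per_hour, and its example values are made up. Treating a zero divisor (an interval with no tokens served) as a gap rather than an error is a design choice for this sketch.

```python
from typing import List, Optional, Sequence


def divide_series(dividend: Sequence[Optional[float]],
                  divisor: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Point-wise dividend / divisor over aligned datapoints.

    A missing value on either side, or a zero divisor (no tokens served in
    that interval), yields a gap (None) instead of raising ZeroDivisionError.
    """
    return [
        None if a is None or b is None or b == 0 else a / b
        for a, b in zip(dividend, divisor)
    ]


# Hypothetical example values: spend per hour and tokens served per hour.
cost_per_hour = [2.5, 2.5, 2.5, None]
tokens_per_hour = [1_000_000, 500_000, 0, 800_000]
cost_per_token = divide_series(cost_per_hour, tokens_per_hour)
```

Here cost_per_token is roughly [2.5e-06, 5e-06, None, None]. The hour with no traffic and the hour with no cost sample show up as gaps on the dashboard instead of as an infinite or misleading cost.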
MetricFire includes pre-built Grafana dashboards for GPU FinOps covering per-team cost attribution, idle GPU tracking, cluster efficiency, and spend trends, ready on day one with no dashboard configuration needed
Graphite metrics collected
gpu.{id}.utilization_pct
gpu.{id}.power_watts
cost.team.{name}.gpu_hours
cost.project.{name}.gpu_hours
cluster.allocated_gpus
cluster.idle_gpus
cost.per_token
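One way to collect the per-GPU metrics in that list is sketched below. It assumes the source is nvidia-smi's CSV query output. The cost.* and cluster.* metrics would be derived elsewhere and are not covered here. The output lines follow Carbon's plaintext protocol (path, value, Unix timestamp), which Carbon conventionally accepts on TCP port 2003. to_carbon_lines() is a hypothetical helper and does not open any connection itself.

```python
import time
from typing import List, Optional

# Assumed collection command (run separately, e.g. via subprocess):
#   nvidia-smi --query-gpu=index,utilization.gpu,power.draw --format=csv,noheader,nounits
# which prints one "index, utilization, power" row per GPU.


def _number(field: str) -> Optional[float]:
    """Parse a numeric field; unsupported readings appear as '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def to_carbon_lines(csv_text: str, timestamp: Optional[int] = None) -> List[str]:
    """Turn nvidia-smi CSV rows into Carbon plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else timestamp
    lines: List[str] = []
    for row in csv_text.strip().splitlines():
        gpu_id, util, power = (field.strip() for field in row.split(","))
        for suffix, raw in (("utilization_pct", util), ("power_watts", power)):
            value = _number(raw)
            if value is not None:  # skip readings this GPU cannot report
                lines.append(f"gpu.{gpu_id}.{suffix} {value:g} {ts}")
    return lines
```

For the input "0, 87, 251.40" followed by "1, 3, [N/A]", this yields gpu.0.utilization_pct, gpu.0.power_watts and gpu.1.utilization_pct lines. The unsupported power reading on GPU 1 is dropped rather than sent as a bogus value.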
Self-hosted pain solved
✕Legacy monitoring stacks lack a standard cost-attribution taxonomy → Graphite path conventions encode cost dimensions natively
✕Ops teams can't produce GPU spend reports for engineering managers → Graphite summaries can be exported directly, for example as CSV or JSON from the render API
✕Idle GPU detection requires complex query logic in legacy stacks → Graphite threshold alerts on utilisation paths are simple and reliable
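Exporting a spend summary can be sketched against Graphite's render API, which takes target, from and format parameters; csv and json are among the supported formats. Several details here are assumptions. The host name is hypothetical. The CSV layout is taken to be one series,timestamp,value row per datapoint. Summing per series only yields GPU-hours if cost.team.{name}.gpu_hours stores the hours used in each interval.

```python
import csv
import io
from collections import defaultdict
from typing import Dict
from urllib.parse import urlencode

GRAPHITE_URL = "https://graphite.example.com/render"  # hypothetical endpoint


def render_url(target: str, period: str = "-30d") -> str:
    """Build a render API request that returns the target's datapoints as CSV."""
    query = urlencode({"target": target, "from": period, "format": "csv"})
    return f"{GRAPHITE_URL}?{query}"


def totals_by_series(csv_text: str) -> Dict[str, float]:
    """Sum values per series from 'series,timestamp,value' rows; empty values are gaps."""
    totals: Dict[str, float] = defaultdict(float)
    for name, _timestamp, value in csv.reader(io.StringIO(csv_text)):
        if value:
            totals[name] += float(value)
    return dict(totals)
```

A report for engineering managers could then fetch render_url("cost.team.*.gpu_hours") with any HTTP client and pass the body to totals_by_series() to get one total per team for the period.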