
HashiCorp Nomad Integration

Monitor job allocations, resource utilization, scheduler health, and client node capacity across your Nomad clusters. Complete workload orchestration visibility from a single pane.

Setup

How It Works

01

Enable Nomad Telemetry

Add the telemetry stanza to your nomad.hcl configuration. Nomad exposes Prometheus-compatible metrics at the /v1/metrics endpoint. TigerOps scrapes this endpoint using a Nomad ACL token with the agent:read capability.
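Once telemetry is enabled, the /v1/metrics endpoint returns standard Prometheus exposition text. The sketch below parses a small inlined sample of that format; the payload and metric names are illustrative, and in practice the collector would fetch the endpoint over HTTP with an X-Nomad-Token header.

```python
# Minimal sketch: parse Nomad's Prometheus-format metrics output.
# The sample payload is illustrative; in practice you would GET
# http://<nomad>:4646/v1/metrics?format=prometheus with an X-Nomad-Token header.

SAMPLE = """\
# HELP nomad_client_allocations_running nomad_client_allocations_running
# TYPE nomad_client_allocations_running gauge
nomad_client_allocations_running{datacenter="dc1",node_id="abc123"} 14
nomad_client_allocations_pending{datacenter="dc1",node_id="abc123"} 2
"""

def parse_prometheus(text):
    """Return {metric_name: value} for non-comment lines (labels ignored)."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]   # strip the {label,...} block
        metrics[name] = float(value)
    return metrics

print(parse_prometheus(SAMPLE)["nomad_client_allocations_running"])  # 14.0
```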

02

Create an ACL Token

Create a Nomad ACL policy granting read access to metrics and job status. TigerOps uses this minimal token — never an admin or management token — to collect operational telemetry.

03

Deploy the TigerOps Scraper Job

Submit the TigerOps collector as a Nomad system job. It auto-discovers all Nomad servers and clients in the cluster and collects allocation-level metrics across all namespaces.

04

Set Job Health Alerts

Configure alerts for failed allocations, scheduler pending depth, and resource utilization per job namespace. TigerOps applies AI baselines based on your deployment patterns.
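The alert rules configured in this step amount to threshold checks evaluated per namespace. A hedged sketch of that evaluation logic follows; the field names and threshold values are illustrative, not the TigerOps API.

```python
# Illustrative per-namespace alert evaluation, the kind of check step 04
# configures. Thresholds and field names are assumptions, not TigerOps API.

def evaluate_alerts(samples, failed_threshold=1, pending_depth_threshold=50):
    """samples: list of dicts with namespace, failed_allocs, pending_evals."""
    alerts = []
    for s in samples:
        if s["failed_allocs"] >= failed_threshold:
            alerts.append((s["namespace"], "failed-allocations", s["failed_allocs"]))
        if s["pending_evals"] >= pending_depth_threshold:
            alerts.append((s["namespace"], "scheduler-pending-depth", s["pending_evals"]))
    return alerts

alerts = evaluate_alerts([
    {"namespace": "default", "failed_allocs": 0, "pending_evals": 3},
    {"namespace": "batch",   "failed_allocs": 4, "pending_evals": 120},
])
```

Here only the `batch` namespace trips its thresholds, producing two alerts.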

Capabilities

What You Get Out of the Box

Job Allocation Health

Running, pending, failed, and lost allocation counts per job, task group, and namespace. TigerOps fires alerts on allocation failures and tracks restart loops for individual tasks.
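Restart-loop tracking boils down to detecting restarts that cluster inside a short window. The sketch below shows one way to express that check; the window size and restart limit are assumptions, not TigerOps defaults.

```python
# Illustrative restart-loop detector: flag a task whose restart timestamps
# cluster within a short window. Window and threshold values are assumptions.

def is_restart_loop(restart_times, window_s=600, max_restarts=3):
    """restart_times: sorted unix timestamps of a task's restarts."""
    for start in restart_times:
        in_window = [t for t in restart_times if start <= t < start + window_s]
        if len(in_window) > max_restarts:
            return True
    return False
```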

Resource Utilization Per Job

CPU MHz, memory bytes, and disk utilization per allocation and task group. Compare reserved vs. actual usage and identify over-provisioned jobs to optimize resource efficiency.
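The reserved-vs-actual comparison can be sketched as a simple ratio check per task group. The 50% cutoff below is an assumption for illustration, not a TigerOps default.

```python
# Sketch of the reserved-vs-actual comparison: flag task groups whose peak
# usage sits well below their reservation. The 50% cutoff is an assumption.

def overprovisioned(task_groups, usage_ratio_cutoff=0.5):
    """task_groups: {name: (reserved_mb, peak_used_mb)} -> names to review."""
    return [name for name, (reserved, used) in task_groups.items()
            if reserved > 0 and used / reserved < usage_ratio_cutoff]

flagged = overprovisioned({
    "api":    (1024, 900),   # ~88% used: reservation is reasonable
    "worker": (4096, 512),   # ~12% used: candidate for right-sizing
})
```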

Scheduler Throughput

Scheduler queue depth, evaluation throughput, and placement latency. Detect when the scheduler is falling behind under high job churn or when placement is slowed by resource contention.
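One signal that the scheduler is falling behind is a pending-evaluation depth that grows across consecutive samples. A minimal sketch of that trend check, with an assumed sample count:

```python
# Illustrative check for a scheduler falling behind: pending evaluation depth
# that grows monotonically across recent samples. Sample count is assumed.

def scheduler_falling_behind(pending_depths, min_samples=4):
    """pending_depths: queue-depth samples, oldest first."""
    if len(pending_depths) < min_samples:
        return False
    recent = pending_depths[-min_samples:]
    return all(b > a for a, b in zip(recent, recent[1:]))
```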

Client Node Health

Per-client CPU, memory, and disk availability alongside Nomad fingerprint status. TigerOps alerts when client nodes become full or when drivers (Docker, exec, Java) report errors.

Deployment & Canary Metrics

Canary allocation health, deployment progress, and rollback events for Nomad rolling deployments. TigerOps tracks deployment success rate and correlates rollbacks with metric anomalies.

Consul Service Mesh Integration

When Nomad is used with Consul Connect, TigerOps correlates Nomad allocation health with Consul service mesh sidecar proxy metrics for full workload observability.

Configuration

nomad.hcl Telemetry Config

Add the telemetry stanza to your Nomad server and client configurations to enable Prometheus metric collection.

nomad.hcl
# nomad.hcl — server and client telemetry configuration
data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3
}

client {
  enabled = true
}

# Enable Prometheus-compatible metrics endpoint
telemetry {
  collection_interval        = "10s"
  disable_hostname           = false
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

# ACL must be enabled for production clusters
acl {
  enabled = true
}

---
# Nomad ACL policy for TigerOps (tigerops-policy.hcl)
namespace "*" {
  policy = "read"
}

node {
  policy = "read"
}

# The /v1/metrics endpoint requires agent:read when ACLs are enabled
agent {
  policy = "read"
}

# Apply the policy and create a token
nomad acl policy apply tigerops-monitoring tigerops-policy.hcl

nomad acl token create \
  -name="tigerops-scraper" \
  -policy=tigerops-monitoring \
  -type=client

# TigerOps collector as a Nomad system job (tigerops.nomad)
job "tigerops-collector" {
  type = "system"

  group "collector" {
    task "tigerops" {
      driver = "docker"
      config {
        image = "tigerops/nomad-collector:latest"
      }
      env {
        TIGEROPS_API_KEY  = "your_api_key"
        NOMAD_TOKEN       = "your_acl_token"
        NOMAD_ADDR        = "http://${attr.unique.network.ip-address}:4646"
      }
    }
  }
}

FAQ

Common Questions

Does TigerOps support Nomad ACL-enabled clusters?

Yes. TigerOps supports Nomad clusters with ACLs enabled. You create a minimal ACL policy granting only metrics and node read access, and provide the resulting token to TigerOps. No management or admin token is ever required.

Can TigerOps monitor Nomad multi-region federated deployments?

Yes. TigerOps supports Nomad multi-region federation. You configure one collector per region, and TigerOps aggregates metrics across all regions into a unified dashboard with region-level labeling.

How does TigerOps handle Nomad task driver failures?

TigerOps ingests Nomad client driver health events and correlates them with allocation failures. When a Docker or exec driver reports unhealthy status on a client node, TigerOps groups the affected allocations and fires a single correlated alert rather than flooding you with per-allocation alerts.
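The grouping described above can be sketched as collapsing allocation failures that share a client node and task driver into a single alert. Field names below are illustrative, not the TigerOps event schema.

```python
# Hedged sketch of driver-failure correlation: collapse allocation failures
# that share a node and driver into one grouped alert. Field names are
# illustrative, not the TigerOps event schema.
from collections import defaultdict

def group_driver_failures(failures):
    """failures: dicts with node_id, driver, alloc_id -> one alert per group."""
    groups = defaultdict(list)
    for f in failures:
        groups[(f["node_id"], f["driver"])].append(f["alloc_id"])
    return [{"node_id": n, "driver": d, "alloc_ids": sorted(ids)}
            for (n, d), ids in groups.items()]

alerts = group_driver_failures([
    {"node_id": "n1", "driver": "docker", "alloc_id": "a1"},
    {"node_id": "n1", "driver": "docker", "alloc_id": "a2"},
    {"node_id": "n2", "driver": "exec",   "alloc_id": "a3"},
])
```

Three failures collapse into two alerts: one for the unhealthy Docker driver on `n1`, one for `exec` on `n2`.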

Does TigerOps support Nomad namespaces?

Yes. TigerOps collects metrics across all Nomad namespaces the configured ACL token has access to. You can filter dashboards and alerts by namespace and set per-namespace alert thresholds.

Can I monitor Nomad batch jobs and their completion rates?

Yes. TigerOps tracks Nomad batch job completion rates, task failure rates, and overall batch job throughput. You can set alerts for batch jobs that exceed their expected execution time or fail beyond a configurable error rate threshold.
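The batch-job checks described above reduce to comparing an error rate and a runtime against configured limits. A minimal sketch, with assumed limit values:

```python
# Illustrative batch-job health check: flag the job if its error rate or
# runtime exceeds configured limits. The limit values are assumptions.

def batch_job_healthy(completed, failed, runtime_s,
                      max_error_rate=0.05, max_runtime_s=3600):
    total = completed + failed
    if total == 0:
        return True  # nothing has run yet; nothing to flag
    error_rate = failed / total
    return error_rate <= max_error_rate and runtime_s <= max_runtime_s
```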

Get Started

Full Visibility Into Your Nomad Cluster

No credit card required. Connect in minutes. See allocations, scheduler health, and resource utilization immediately.