Cloud
fly.toml + Machines API

Fly.io Integration

Monitor machine metrics, volume health, edge deployment latency, and global network performance across your Fly.io applications. Full multi-region visibility from a single dashboard.

Setup

How It Works

01

Configure fly.toml Metrics

Add the [metrics] stanza to your fly.toml to expose an application metrics endpoint. TigerOps scrapes the endpoint from within the Fly private network and forwards the samples using Prometheus remote write, so no public exposure is required.
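As a rough illustration of what the application side of this step might look like, the sketch below serves a Prometheus text-format /metrics endpoint on port 9091 using only the Python standard library. The counter names and the IPv6HTTPServer helper are hypothetical, and a real service would more likely use a Prometheus client library.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would update these while serving traffic.
COUNTERS = {"app_requests_total": 0, "app_errors_total": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the path declared in the [metrics] stanza is served.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class IPv6HTTPServer(HTTPServer):
    # Fly private networking (6PN) is IPv6, so listen on an IPv6 socket.
    address_family = socket.AF_INET6


def serve(port=9091):
    """Block forever, serving /metrics on the IPv6 wildcard address."""
    IPv6HTTPServer(("::", port), MetricsHandler).serve_forever()
```

Calling serve() from a background thread at application start-up would keep the endpoint separate from the main service port (8080 in the sample configuration).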

02

Connect the Fly API Token

Create a read-only Fly.io API token with the metrics:read scope. TigerOps uses the Fly Machines API and the Prometheus-compatible metrics endpoint to collect machine, volume, and network data.
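To make the Machines API part of this step concrete, here is a minimal sketch of listing an app's Machines with a bearer token and summarizing their state per region. It is not TigerOps code. The helper names are hypothetical, and it assumes each machine object in the response carries "region" and "state" fields.

```python
import json
import urllib.request
from collections import Counter

MACHINES_API = "https://api.machines.dev/v1"


def build_list_machines_request(app_name, token):
    """Build (but do not send) a GET request listing the Machines of one app."""
    return urllib.request.Request(
        f"{MACHINES_API}/apps/{app_name}/machines",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_machines(app_name, token):
    """Send the request and decode the JSON array of machine objects."""
    request = build_list_machines_request(app_name, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def count_states_by_region(machines):
    """Count machines per (region, state) pair."""
    counts = Counter()
    for machine in machines:
        counts[(machine["region"], machine["state"])] += 1
    return dict(counts)
```

For example, count_states_by_region(list_machines("my-app", token)) would return a dictionary keyed by pairs such as ("iad", "started").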

03

Deploy the TigerOps Machine

Optionally deploy a TigerOps collector Machine into your Fly organization. It scrapes every app in the organization from within the Fly private network, so no public endpoints are required.
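How the collector discovers its targets is not documented here. One plausible approach, sketched below as an assumption, is to turn a comma-separated app list into scrape URLs using Fly's internal DNS names (appname.internal), then resolve each name to the private IPv6 addresses of the running Machines. The function names are hypothetical, and resolve_machine_addresses only works from inside a Fly private network.

```python
import socket


def scrape_targets(apps_csv, port=9091, path="/metrics"):
    """Turn "my-app,my-api" into one internal scrape URL per app."""
    apps = [name.strip() for name in apps_csv.split(",") if name.strip()]
    return [f"http://{app}.internal:{port}{path}" for app in apps]


def resolve_machine_addresses(app, port=9091):
    """Resolve <app>.internal to the IPv6 addresses of its running Machines.

    Only meaningful inside the Fly private network, where .internal names resolve.
    """
    infos = socket.getaddrinfo(
        f"{app}.internal", port, family=socket.AF_INET6, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

With the list "my-app,my-api,my-worker", scrape_targets yields three URLs, one per app, all on port 9091.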

04

Set Region-Aware Alerts

TigerOps labels all metrics with the Fly.io region code. Set per-region latency thresholds and machine health alerts to catch regional issues before they impact global users.
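The idea of per-region thresholds can be sketched as a default limit plus region-specific overrides. The threshold values, region choices, and function name below are illustrative assumptions, not TigerOps defaults.

```python
# Hypothetical limits: one default plus overrides for regions with different expectations.
DEFAULT_P99_MS = 500.0
REGION_P99_MS = {"iad": 250.0, "syd": 800.0}


def breached_regions(p99_by_region, overrides=REGION_P99_MS, default=DEFAULT_P99_MS):
    """Return {region: (observed_p99_ms, limit_ms)} for regions above their limit."""
    breaches = {}
    for region, observed in p99_by_region.items():
        limit = overrides.get(region, default)
        if observed > limit:
            breaches[region] = (observed, limit)
    return breaches
```

Here a 300 ms P99 would alert in iad, where the limit is 250 ms, while 700 ms in syd stays under that region's 800 ms limit.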

Capabilities

What You Get Out of the Box

Machine CPU & Memory

Per-machine CPU utilization, memory usage, and OOM kill events across all your Fly Machines. Group by app, region, or process group for targeted capacity analysis.

Edge Deployment Latency

Request latency, error rates, and throughput per Fly region. TigerOps builds a real-time global latency heatmap that shows which regions are degraded.

Volume Health

Fly volume read/write IOPS, throughput, and fullness percentage. TigerOps alerts before volumes reach capacity and tracks I/O latency degradation for stateful workloads.
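Alerting before a volume fills up implies some form of projection. The sketch below uses a simple linear extrapolation between the first and last fullness samples. This is an assumed technique chosen for illustration, not necessarily how TigerOps forecasts capacity.

```python
def hours_until_full(samples):
    """Estimate hours until a volume reaches 100% by linear extrapolation.

    samples: chronological list of (hour, used_percent) pairs.
    Returns None when usage is flat or shrinking.
    """
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time range")
    rate = (used1 - used0) / (t1 - t0)  # percentage points per hour
    if rate <= 0:
        return None
    return max(0.0, (100 - used1) / rate)
```

A volume that grew from 50% to 60% over ten hours is projected to fill in about 40 hours, which leaves time to extend it.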

Machine Restart & Lifecycle

Machine start, stop, restart, and crash events with their reason codes. TigerOps correlates machine restarts with OOM kills, health check failures, and application errors.
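One simple way to picture this correlation is a time-window join: for each restart, look for an OOM event on the same machine shortly before it. The event shape (machine ID plus Unix timestamp) and the 60-second window are assumptions made for this sketch.

```python
def correlate_restarts(restarts, oom_events, window_s=60):
    """Pair each restart with the latest OOM event on the same machine.

    restarts, oom_events: lists of (machine_id, unix_ts).
    Returns (machine_id, restart_ts, oom_ts) tuples; oom_ts is None when no OOM
    event occurred within window_s seconds before the restart.
    """
    result = []
    for machine_id, restart_ts in restarts:
        candidates = [
            ts
            for oom_machine, ts in oom_events
            if oom_machine == machine_id and 0 <= restart_ts - ts <= window_s
        ]
        result.append((machine_id, restart_ts, max(candidates) if candidates else None))
    return result
```

Restarts that come back with None are the ones worth checking against health check failures or application errors instead.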

Network & Anycast Metrics

Bytes sent and received per machine, anycast IP traffic distribution, and WireGuard tunnel health for private networking. Detect network saturation before it impacts performance.

Autoscaling & Scale Events

Track Fly autoscaling decisions, machine provisioning latency, and scale-to-zero transitions. TigerOps correlates scaling events with traffic patterns to validate your autoscale policy.

Configuration

fly.toml Metrics Config

Add the metrics stanza to your fly.toml to expose your application metrics endpoint for TigerOps to scrape.

fly.toml
# fly.toml — expose application metrics for TigerOps
app = "my-app"
primary_region = "iad"

[build]
  image = "my-org/my-app:latest"

[env]
  # Set TIGEROPS_API_KEY with `fly secrets set`; do not commit it to fly.toml
  TIGEROPS_ENDPOINT = "https://ingest.atatus.net/api/v1/write"
  TIGEROPS_SERVICE  = "my-app"

# Expose a Prometheus-compatible /metrics endpoint on port 9091
# TigerOps scrapes this from within the Fly private network
[metrics]
  port = 9091
  path = "/metrics"

[[services]]
  protocol      = "tcp"
  internal_port = 8080

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]

  [services.concurrency]
    type       = "requests"
    hard_limit = 200
    soft_limit = 150

# TigerOps collector — deploy to the same org to scrape via 6PN
# fly launch --image atatus/fly-collector:latest --name tigerops-collector
# fly secrets set TIGEROPS_API_KEY=your_key --app tigerops-collector
# fly secrets set SCRAPE_APPS="my-app,my-api,my-worker" --app tigerops-collector
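
The TIGEROPS_* values in the configuration above reach the application as environment variables, whether they come from [env] or from fly secrets. A minimal sketch of reading them at start-up follows. The load_settings helper is hypothetical, and the fallback to FLY_APP_NAME (set by the Fly runtime) is an assumption about sensible defaults.

```python
import os

REQUIRED = ("TIGEROPS_API_KEY", "TIGEROPS_ENDPOINT")


def load_settings(env=None):
    """Read TigerOps settings from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {
        "api_key": env["TIGEROPS_API_KEY"],
        "endpoint": env["TIGEROPS_ENDPOINT"],
        # Fall back to the app name provided by the Fly runtime.
        "service": env.get("TIGEROPS_SERVICE") or env.get("FLY_APP_NAME", "unknown"),
    }
```

Failing at start-up surfaces a forgotten fly secrets set during deployment instead of as silently missing metrics.
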
FAQ

Common Questions

Does TigerOps support Fly.io private networking (6PN)?

Yes. The TigerOps collector Machine runs inside your Fly organization and uses the Fly 6PN private network to scrape application metrics endpoints without any public exposure. All metric collection happens over the encrypted private network.

Can TigerOps monitor Fly Machines that scale to zero?

Yes. TigerOps tracks machine lifecycle events including scale-to-zero transitions and cold-start latency. When a machine scales to zero, its last known metrics are preserved and the scale-to-zero event is recorded as a deployment marker.

Does TigerOps support Fly.io Postgres (managed)?

Yes. TigerOps monitors Fly Postgres clusters including replication lag, connection pool utilization, and query performance. Fly Postgres metrics are collected through the Fly Machines API and the PostgreSQL metrics endpoint exposed on the private network.

Can I monitor multiple Fly.io organizations in one TigerOps account?

Yes. You can connect multiple Fly API tokens from different organizations to a single TigerOps workspace. Resources from each organization are labeled with the organization slug for easy filtering and separation.

How does TigerOps handle Fly.io multi-region deployments?

TigerOps labels every metric with the fly_region tag and builds per-region dashboards automatically. You can compare P99 latency across regions, set per-region alert thresholds, and receive alerts that specify which region is impacted.
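For readers who want to see what a per-region P99 comparison involves, the sketch below groups raw latency samples by their fly_region label and takes the nearest-rank 99th percentile of each group. The sample shape is an assumption. Production systems usually derive percentiles from histograms, not from raw samples.

```python
from collections import defaultdict


def p99_by_region(samples):
    """Compute the nearest-rank P99 latency per region.

    samples: iterable of (fly_region, latency_ms) pairs.
    """
    grouped = defaultdict(list)
    for region, latency_ms in samples:
        grouped[region].append(latency_ms)

    result = {}
    for region, values in grouped.items():
        values.sort()
        # Nearest rank: ceil(0.99 * n), computed with integers to avoid float rounding.
        rank = (99 * len(values) + 99) // 100
        result[region] = values[rank - 1]
    return result
```

The resulting dictionary maps each region code to a single latency value, which can then be compared against that region's alert threshold.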

Get Started

Full Visibility Into Your Fly.io Applications

No credit card required. Connect in minutes and see machine metrics, edge latency, and volume health immediately.