Apache Kafka Integration
Monitor consumer group lag, broker health, and partition metrics across your Kafka clusters. Get predictive lag alerts and AI root cause analysis before incidents impact your consumers.
How It Works
Deploy kafka-exporter via Helm
Add the TigerOps Helm chart to your cluster and set your Kafka broker addresses. The exporter auto-discovers consumer groups, topics, and broker nodes.
Configure Remote Write
Point the exporter to your TigerOps remote-write endpoint. All consumer group lag, partition offsets, and broker JVM metrics flow in within minutes.
Set Lag Thresholds
Define consumer lag SLOs per topic or consumer group. TigerOps fires alerts when lag exceeds your threshold and predicts lag growth with AI forecasting.
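In Prometheus terms, a per-group lag threshold could look like the sketch below. This is illustrative only: the metric and label names (kafka_consumergroup_lag, consumergroup, topic) follow common kafka-exporter conventions and may differ in your deployment.

```yaml
# Illustrative alert rule; not the exact rule TigerOps generates.
groups:
  - name: kafka-lag-slo
    rules:
      - alert: ConsumerLagCritical
        expr: sum by (consumergroup, topic) (kafka_consumergroup_lag) > 100000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Group {{ $labels.consumergroup }} is over 100k messages behind on {{ $labels.topic }}"
```

The threshold here matches the consumerLagCritical value from the Helm example below.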
Correlate with Upstream Services
TigerOps automatically correlates Kafka lag spikes with producer throughput drops, consumer CPU exhaustion, or downstream database slow queries.
What You Get Out of the Box
Consumer Group Lag Tracking
Per-partition and aggregate consumer group lag with historical trend analysis. TigerOps alerts when lag growth rate predicts an SLO breach before it happens.
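The forecasting idea can be approximated with plain PromQL, separate from TigerOps' own AI model. A hedged sketch, again assuming the common kafka_consumergroup_lag metric name:

```yaml
# Illustrative: fire early when the 15-minute lag trend projects a breach
# of the 100k threshold within the next 30 minutes (1800s).
- alert: ConsumerLagBreachForecast
  expr: >
    predict_linear(kafka_consumergroup_lag[15m], 1800) > 100000
    and kafka_consumergroup_lag > 0
  for: 5m
  labels:
    severity: warning
```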
Broker Health Monitoring
Track under-replicated partitions, offline partitions, active controller count, and broker request handler idle percentage across your entire cluster.
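As a rough guide, the broker-health signals above map to expressions like these. The metric names follow the JMX-exporter naming commonly used for Kafka brokers and are assumptions, not TigerOps-specific names:

```yaml
# Illustrative broker-health checks; adjust metric names to your exporter.
- alert: UnderReplicatedPartitions
  expr: sum(kafka_server_replicamanager_underreplicatedpartitions) > 0
- alert: NoActiveController
  expr: sum(kafka_controller_kafkacontroller_activecontrollercount) != 1
- alert: OfflinePartitions
  expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
```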
Partition Metrics
Per-topic and per-partition byte rates, message rates, leader election rates, and log end offset tracking for complete partition-level visibility.
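Message rates can be derived from log end offset growth. A minimal sketch, assuming the kafka_topic_partition_current_offset metric name used by common kafka exporters:

```yaml
# Illustrative recording rule: per-topic message production rate,
# derived from how fast the log end offset advances.
- record: topic:message_rate:5m
  expr: sum by (topic) (rate(kafka_topic_partition_current_offset[5m]))
```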
Producer Throughput
Monitor producer request rates, batch sizes, compression ratios, and error rates. Correlate producer slowdowns with downstream consumer lag automatically.
JVM & OS Metrics
Kafka broker JVM heap usage, GC pause times, thread counts, and OS-level CPU and network I/O for complete broker resource visibility.
AI Lag Root Cause Analysis
When consumer lag spikes, TigerOps AI analyzes the correlated signals — producer slowdowns, consumer restarts, partition rebalances — and surfaces the root cause.
Helm Values for kafka-exporter
Deploy the TigerOps kafka-exporter to your Kubernetes cluster with these Helm values.
# TigerOps kafka-exporter Helm values
# helm repo add tigerops https://charts.atatus.net
# helm install kafka-exporter tigerops/kafka-exporter -f values.yaml
kafkaExporter:
  brokers:
    - kafka-broker-0.kafka.svc.cluster.local:9092
    - kafka-broker-1.kafka.svc.cluster.local:9092
    - kafka-broker-2.kafka.svc.cluster.local:9092
  # TLS configuration (optional)
  tls:
    enabled: true
    caFile: /etc/kafka-tls/ca.crt
    certFile: /etc/kafka-tls/client.crt
    keyFile: /etc/kafka-tls/client.key
  # SASL authentication (optional)
  sasl:
    enabled: true
    mechanism: SCRAM-SHA-512
    username: tigerops-exporter
    passwordSecret:
      name: kafka-sasl-secret
      key: password

remoteWrite:
  endpoint: https://ingest.atatus.net/api/v1/write
  bearerToken: "${TIGEROPS_API_KEY}"
  # Send metrics every 15 seconds
  scrapeInterval: 15s

# Consumer groups to monitor (empty = monitor all)
consumerGroups:
  - payment-processor
  - order-events-consumer
  - analytics-pipeline

# Topics to monitor (empty = monitor all)
topics:
  - orders
  - payments
  - user-events

# Alert thresholds (applied as recording rules)
alerts:
  consumerLagCritical: 100000  # messages
  consumerLagWarning: 10000
  underReplicatedPartitions: 1
  offlinePartitions: 0

Common Questions
Which Kafka versions does TigerOps support?
TigerOps supports Apache Kafka 2.x and 3.x via the kafka-exporter (JMX-based). AWS MSK, Confluent Cloud, and Redpanda are also supported with their respective metric endpoints. The Helm chart configures the exporter automatically.
How do I monitor consumer lag without JMX access?
TigerOps can use the Kafka Admin API (no JMX required) to poll consumer group offsets directly. Configure the tigerops-kafka-exporter with adminClientMode: true in your Helm values to use this approach.
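In the Helm values, that option sits under the exporter block, for example:

```yaml
# Admin-API mode: poll consumer group offsets directly, no JMX required.
kafkaExporter:
  adminClientMode: true
  brokers:
    - kafka-broker-0.kafka.svc.cluster.local:9092
```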
Can TigerOps alert me before consumer lag becomes critical?
Yes. TigerOps includes predictive lag alerting — it computes the lag growth rate and forecasts when the lag will breach your configured threshold. You receive an early warning with estimated time to breach.
Does TigerOps support Confluent Schema Registry metrics?
Yes. The TigerOps Confluent integration includes Schema Registry subject counts, serialization error rates, and compatibility check latency alongside your standard Kafka broker and consumer metrics.
How are Kafka alerts correlated with other services?
TigerOps uses its AI correlation engine to link Kafka consumer lag events with traces from the consuming services, database query latency from downstream stores, and producer error rates from upstream services — giving full context in one incident.
Stop Discovering Kafka Lag After the Fact
Predictive lag alerts, broker health monitoring, and AI root cause analysis. Deploy in 5 minutes.