What's New in TigerOps
We ship fast. Every release makes TigerOps smarter, faster, and more autonomous. Subscribe to get updates straight to your inbox.
AI SRE Agent: Multi-Step Runbook Execution
The AI SRE Agent can now execute complex, multi-step runbooks with conditional branching. Define runbooks in YAML with if/else logic, parallel steps, and rollback handlers — the agent will execute them autonomously with full audit logging.
- ✓YAML-based runbook DSL with conditional branching and loops
- ✓Parallel step execution with configurable concurrency limits
- ✓Automatic rollback on step failure with configurable retry policies
- ✓Dry-run mode for testing runbooks before enabling autonomous execution
- ✓Full audit trail: every action logged with reasoning and output
- ✓New runbook library with 50+ pre-built templates for common incidents
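To make the execution model above concrete, here is a minimal Python sketch of sequential steps with a condition, retries, dry-run, rollback in reverse order, and an audit log. It is an illustration only, not the TigerOps runbook DSL or agent: the names `Step` and `run_runbook` and the audit message format are hypothetical, and parallel steps are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    # Hypothetical step model; the real runbook DSL is YAML and may differ.
    name: str
    action: Callable[[dict], None]
    rollback: Optional[Callable[[dict], None]] = None
    condition: Callable[[dict], bool] = lambda ctx: True
    max_retries: int = 0


def run_runbook(steps, ctx, dry_run=False):
    """Run steps in order; if a step exhausts its retries, roll back
    the completed steps in reverse order. Returns (success, audit_log)."""
    audit, done = [], []
    for step in steps:
        if not step.condition(ctx):
            audit.append(f"skip {step.name}")
            continue
        if dry_run:
            audit.append(f"would run {step.name}")
            continue
        for attempt in range(step.max_retries + 1):
            try:
                step.action(ctx)
                audit.append(f"ok {step.name}")
                done.append(step)
                break
            except Exception as exc:
                audit.append(f"fail {step.name} attempt {attempt + 1}: {exc}")
        else:
            # Every attempt failed: undo what already ran, newest first.
            for prev in reversed(done):
                if prev.rollback:
                    prev.rollback(ctx)
                    audit.append(f"rollback {prev.name}")
            return False, audit
    return True, audit
```

Calling `run_runbook(steps, ctx, dry_run=True)` only records which steps would run, which mirrors the idea of testing a runbook before enabling autonomous execution.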
Causal Graph: Deployment Impact Analysis
The causal graph engine now tracks deployments as first-class events and automatically correlates post-deploy anomalies with specific code changes. Within seconds of a deployment, TigerOps surfaces the specific commit that caused a regression.
- ✓Native GitHub, GitLab, and Bitbucket deployment event ingestion
- ✓Automatic correlation of error rate changes with deployment windows
- ✓Per-commit error attribution with blame diff viewer
- ✓Deployment health score with configurable auto-rollback triggers
- ✓Slack and PagerDuty notifications enriched with deployment context
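The idea of correlating error rate changes with deployment windows can be sketched in a few lines of Python. This is a deliberately naive stand-in, not the causal graph engine: it compares the mean error rate in a fixed window before and after each deployment and returns the commit with the largest increase. The `Deployment` type, the `suspect_deployment` function, the 5-minute window, and the 2x ratio threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Deployment:
    commit: str
    timestamp: float  # seconds since epoch


def suspect_deployment(deployments, error_samples, window=300.0, min_ratio=2.0):
    """Return the commit whose post-deploy error rate rose the most relative
    to its pre-deploy baseline, or None if no rise reaches min_ratio.

    error_samples is a list of (timestamp, error_rate) pairs.
    """
    best_commit, best_ratio = None, min_ratio
    for dep in deployments:
        before = [r for t, r in error_samples
                  if dep.timestamp - window <= t < dep.timestamp]
        after = [r for t, r in error_samples
                 if dep.timestamp <= t < dep.timestamp + window]
        if not before or not after:
            continue  # not enough data around this deployment
        baseline = max(mean(before), 1e-9)  # avoid division by zero
        ratio = mean(after) / baseline
        if ratio >= best_ratio:
            best_commit, best_ratio = dep.commit, ratio
    return best_commit
```

A real system would also have to handle overlapping deployments and seasonal traffic patterns, which a fixed before/after window does not.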
Log Intelligence: Semantic Search & Pattern Clustering
Log search is now powered by a fine-tuned embedding model that understands log semantics, not just keywords. Search for "payment processing errors" and surface relevant logs even if they never contain those exact words. Automatic pattern clustering groups similar log lines to cut noise by 90%.
- ✓Vector-based semantic log search with sub-500ms query latency
- ✓Automatic log pattern detection and clustering using ML
- ✓Cluster diff: compare log patterns across deployments or time windows
- ✓Anomalous log volume detection with per-service baselines
- ✓Log-to-trace correlation: click any log line to jump to its parent trace
- ✓Structured log extraction for unstructured log formats using LLM parsing
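Pattern clustering is easier to picture with a small example. The sketch below uses a simple rule-based stand-in, masking variable tokens with regular expressions and counting the resulting templates. It is not the ML clustering or embedding model described above, and the helper names `template_of` and `cluster_logs` are hypothetical.

```python
import re
from collections import Counter

# Order matters: mask the most specific token shapes first.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<num>"),
]


def template_of(line):
    """Replace variable parts of a log line with placeholder tokens."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def cluster_logs(lines):
    """Group log lines by template and count how often each one occurs."""
    return Counter(template_of(line) for line in lines)
```

Lines such as "payment 1042 failed after 3 retries" and "payment 77 failed after 1 retries" collapse into one template, which is the noise-reduction effect clustering aims for.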
Performance: 40% Query Latency Reduction
After six weeks of profiling and re-architecting our query layer, we've cut p99 dashboard query latency by 40% and p50 by 55%. Large time-range queries on high-cardinality metrics that previously timed out now return in under 2 seconds.
- ✓Rewrote the columnar storage engine with vectorized query execution
- ✓Adaptive materialized view system for common dashboard queries
- ✓Smarter query planning with cardinality-aware filter pushdown
- ✓Parallel fanout queries across storage shards with result merging
- ✓Reduced memory allocation by 30% in the metrics aggregation pipeline
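As a rough illustration of parallel fanout with result merging, the Python sketch below sends one query to several shards concurrently and merges their already-sorted partial results. It assumes each shard is a callable returning rows sorted by timestamp. The function name `fanout_query` and the row shape are hypothetical and say nothing about how the TigerOps storage engine is actually implemented.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def fanout_query(shards, query, max_workers=4):
    """Run the query on every shard in parallel, then merge the results.

    Each shard is a callable taking the query and returning a list of
    (timestamp, value) rows sorted by timestamp.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda shard: shard(query), shards))
    # heapq.merge combines sorted inputs without re-sorting everything.
    return list(heapq.merge(*partials))
```

Because each partial result is already sorted, the merge step stays linear in the total number of rows rather than paying for a full sort.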
SLO Manager: Burn Rate Alerts & Error Budget Forecasting
The SLO Manager now supports multi-window burn rate alerting and AI-powered error budget forecasting. Get alerted when you're burning error budget too fast — hours before you breach your SLO — and see a forecast of when the budget will be exhausted.
- ✓Multi-window burn rate alerts (configurable 1h/6h windows)
- ✓Error budget forecast with 95% confidence intervals
- ✓Per-SLO alert routing to separate on-call schedules
- ✓SLO dashboard with rolling 30/90-day historical views
- ✓Automatic SLO suggestions based on historical service behavior
- ✓SLO compliance reports with PDF export for quarterly reviews
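The arithmetic behind burn rate alerting is short enough to sketch. Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. The Python below is a simplified sketch under stated assumptions: it alerts only when both a 1h and a 6h window exceed the same threshold, and it forecasts exhaustion by assuming the current burn rate stays constant. The function names and the 30-day period are illustrative, and the forecast does not model confidence intervals.

```python
def burn_rate(error_ratio, slo_target):
    """Observed error ratio relative to the allowed error budget."""
    budget = 1.0 - slo_target
    return error_ratio / budget


def should_alert(error_ratio_1h, error_ratio_6h, slo_target, threshold):
    """Alert only if both windows burn faster than the threshold, which
    filters out short spikes that the longer window does not confirm."""
    return (burn_rate(error_ratio_1h, slo_target) >= threshold
            and burn_rate(error_ratio_6h, slo_target) >= threshold)


def hours_until_exhausted(budget_remaining, current_burn_rate, period_hours=30 * 24):
    """Hours until the budget runs out at a constant burn rate.

    budget_remaining is the fraction of the period's budget still left (0..1).
    """
    if current_burn_rate <= 0:
        return float("inf")
    return budget_remaining * period_hours / current_burn_rate
```

For example, with a 99.9% target and a 1% error ratio the burn rate is about 10, so half of a 30-day budget would last roughly 36 hours at that pace.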
Bug Fixes & Reliability Improvements
This release addresses several issues reported by customers following the v4.5.0 rollout, including edge cases in trace sampling, Kubernetes pod label discovery, and dashboard rendering performance.
- ✗Fixed: Trace sampling rate not applying correctly when using head-based sampling with custom rules
- ✗Fixed: Kubernetes pod labels not being discovered when pods are in Terminating state
- ✗Fixed: Dashboard panels with >50 series causing a browser memory spike on initial load
- ✗Fixed: Webhook alerts failing silently when the target URL returned a 301 redirect
- ✗Fixed: SAML SSO login loop for organizations with custom domain mappings
- ✓Improved: Agent reconnect logic is now more resilient to transient network partitions
Want to Shape What We Build Next?
Our most impactful features come directly from customer conversations. Tell us what would make TigerOps a game-changer for your team.