Rollback Strategies: How to Undo a Bad Deployment in Seconds
The Deployment That Went Wrong
You've just deployed your latest feature to production. The build passed, the tests were green, and you were feeling good about that AI-generated code that Claude helped you write. Then your monitoring dashboard lights up like a Christmas tree. Error rates spike. Users start complaining. Your heart sinks.
Welcome to every developer's nightmare scenario. But here's the thing - it doesn't have to be a nightmare if you have solid rollback strategies in place.
Why Rollbacks Matter More Than Ever
In the age of AI-assisted development, we're shipping code faster than ever before. Tools like Cursor and Bolt help us iterate rapidly, but with great velocity comes great responsibility. When you're deploying multiple times a day (as you should be), having a bulletproof rollback strategy isn't just nice to have - it's absolutely critical.
The golden rule of production incidents: The fastest way to fix a bad deployment is to undo it, not debug it live.
Strategy 1: Blue-Green Deployments
Blue-green deployment is like having a stunt double for your application. You maintain two identical production environments - only one serves live traffic at any time.
# Current traffic goes to 'blue' environment
# Deploy new version to 'green' environment
# Test green environment
curl -H "Host: myapp.com" http://green.myapp.internal/health
# Switch traffic to green (new version)
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
# If something goes wrong, switch back to blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
Rollback time: 5-10 seconds
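The beauty of blue-green is that your rollback is nearly instantaneous - you're only changing where traffic is routed (here, the Service selector), not redeploying anything. The downside? You need to run two full environments, which roughly doubles your infrastructure cost while both are up.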
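The switch is safer if it is gated on the health check rather than run by hand. Here is a minimal sketch of that idea; the function names are hypothetical, and it assumes the same `version` label, Service name, and internal health URLs as the example above.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of any tool.

# Return the colour that is NOT the one passed in.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Build the patch body that points the Service at the given colour.
selector_patch() {
  printf '{"spec":{"selector":{"version":"%s"}}}' "$1"
}

# Switch live traffic to $1, but only if its health check passes.
switch_to() {
  local target="$1"
  curl -fsS -H "Host: myapp.com" "http://${target}.myapp.internal/health" >/dev/null || {
    echo "health check failed for ${target}; leaving traffic where it is" >&2
    return 1
  }
  kubectl patch service myapp -p "$(selector_patch "$target")"
}
```

With this in place, `switch_to green` promotes the new version and `switch_to "$(other_color green)"` is the rollback.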
Strategy 2: Rolling Deployments with Quick Rollback
Rolling deployments gradually replace old instances with new ones. Most container orchestrators like Kubernetes make this dead simple:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  # selector and pod template omitted for brevity
To roll back a rolling deployment:
# Check rollout history
kubectl rollout history deployment/myapp
# Roll back to the previous revision
kubectl rollout undo deployment/myapp
# Or roll back to a specific revision
kubectl rollout undo deployment/myapp --to-revision=2
Rollback time: 30 seconds to 2 minutes
Rolling deployments use your existing infrastructure efficiently, but rollbacks take longer because the undo is itself a rolling update: Kubernetes has to start pods from the previous revision and wait for them to become ready.
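You can also let the deploy script decide when to undo. The sketch below waits for the rollout to finish and rolls back if it doesn't become healthy in time. The function name and the 120-second default timeout are assumptions for illustration; `kubectl rollout status --timeout` and `kubectl rollout undo` are the real commands doing the work.

```shell
# Wait for a rollout to finish; undo it if it does not complete within the timeout.
# Returns non-zero whenever a rollback was needed, so CI can mark the deploy as failed.
rollout_or_undo() {
  local deployment="$1" timeout="${2:-120s}"
  if kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"; then
    echo "rollout of ${deployment} succeeded"
  else
    echo "rollout of ${deployment} failed; rolling back" >&2
    kubectl rollout undo "deployment/${deployment}"
    # Wait for the previous revision to come back up before returning.
    kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"
    return 1
  fi
}
```

A typical call right after `kubectl apply` would be `rollout_or_undo myapp 90s`.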
Strategy 3: Feature Flags for Instant Rollbacks
Sometimes the fastest rollback is just turning off a feature:
// In your app code
if (featureFlag('new-checkout-flow')) {
  return <NewCheckoutComponent />;
} else {
  return <OldCheckoutComponent />;
}
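# Instant "rollback" via feature flag (LaunchDarkly semantic patch)
# "my-project" and "production" are placeholders for your project and environment keys
curl -X PATCH https://app.launchdarkly.com/api/v2/flags/my-project/new-checkout-flow \
  -H "Authorization: $API_KEY" \
  -H "Content-Type: application/json; domain-model=launchdarkly.semanticpatch" \
  -d '{"environmentKey": "production", "instructions": [{"kind": "turnFlagOff"}]}'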
Rollback time: 1-5 seconds
Feature flags give you the fastest possible rollback, but require planning ahead and can add complexity to your codebase.
Strategy 4: Database Migration Rollbacks
Code rollbacks are one thing, but what about database changes? This is where things get tricky:
-- Always write reversible migrations
-- 000001_add_email_verified.up.sql
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
-- 000001_add_email_verified.down.sql
ALTER TABLE users DROP COLUMN email_verified;
# Roll back the most recent migration (golang-migrate CLI)
migrate -path ./migrations -database "$DATABASE_URL" down 1
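Pro tip: Never drop columns or tables in the same release that stops using them. Mark them as deprecated first, then remove them in a later release once you're confident you won't need to roll back to code that still reads them. Keep in mind that a down migration like the one above also discards whatever data was written to the column.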
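"Always write reversible migrations" is easy to say and easy to forget, so it helps to check it in CI. Here is a small sketch that flags any up migration without a matching down file. It assumes migrations are paired as `NAME.up.sql` and `NAME.down.sql` (the convention golang-migrate uses); the function name is made up for this example.

```shell
# Print every *.up.sql file in a directory that has no matching *.down.sql.
# Returns non-zero if any are missing, so it can gate a CI step.
missing_down_migrations() {
  local dir="$1" up down missing=0
  for up in "$dir"/*.up.sql; do
    [ -e "$up" ] || continue          # the glob matched nothing
    down="${up%.up.sql}.down.sql"
    if [ ! -f "$down" ]; then
      echo "$up"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `missing_down_migrations ./migrations` before a deploy catches the migration you can't undo while it is still cheap to fix.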
Strategy 5: Canary Deployments with Automatic Rollback
Canary deployments let you test new versions with a small percentage of users:
# Istio VirtualService example
# (subsets v1 and v2 must be defined in a matching DestinationRule)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: myapp
        subset: v2
  - route:
    - destination:
        host: myapp
        subset: v1
      weight: 95
    - destination:
        host: myapp
        subset: v2
      weight: 5
Combine this with automated monitoring:
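#!/bin/bash
# Simple canary rollback script
ERROR_RATE=$(curl -fsS "http://monitoring.internal/api/error_rate?service=myapp&version=v2")
# Roll back if the canary's error rate exceeds 5%
if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )); then
  echo "Error rate too high, rolling back canary"
  # Send all weighted traffic back to v1 (route 0) and none to v2 (route 1)
  kubectl patch virtualservice myapp --type='json' -p='[
    {"op": "replace", "path": "/spec/http/1/route/0/weight", "value": 100},
    {"op": "replace", "path": "/spec/http/1/route/1/weight", "value": 0}
  ]'
fi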
Rollback time: 10-30 seconds (can be automated)
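One thing to watch in an automated check: the `bc` comparison above errors out if the monitoring endpoint returns nothing, and an error there means no rollback. A more defensive sketch treats a missing or unreadable metric as a failure. The function name and the fail-closed behavior are design assumptions on my part, not something Istio or kubectl provides.

```shell
# Decide whether a canary should be rolled back.
# Exit status 0 means "roll back". An empty reading, or one containing anything
# other than digits and dots, also counts as unhealthy (fail closed).
should_rollback() {
  local rate="$1" threshold="${2:-0.05}"
  case "$rate" in
    ''|*[!0-9.]*) return 0 ;;
  esac
  # awk does the floating-point comparison; exit 0 when rate > threshold.
  awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 > t + 0) }'
}
```

You would then wrap the patch command as `if should_rollback "$ERROR_RATE"; then ... fi`, passing a second argument if 5% isn't the right threshold for your service.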
Building Your Rollback Playbook
Here's a practical checklist for setting up rollbacks:
Before You Deploy
- Tag your releases consistently
- Test your rollback procedure in staging
- Set up monitoring and alerting
- Define rollback criteria (error rates, response times)
- Document the rollback process
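For the tagging item, consistency matters more than the exact scheme. As one possible sketch, the helper below builds a sortable tag from a UTC timestamp and a commit hash; the `release-` prefix and the function name are arbitrary choices for illustration.

```shell
# Build a sortable release tag from a timestamp and a commit hash.
# The timestamp defaults to the current UTC time, to the minute.
release_tag() {
  local sha="$1" when="${2:-$(date -u +%Y%m%d-%H%M)}"
  printf 'release-%s-%s\n' "$when" "$sha"
}
```

In a deploy script this might be used as `git tag -a "$(release_tag "$(git rev-parse --short HEAD)")" -m "production release"`, which gives you an obvious target when you need to find the previous good version.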
During an Incident
- Don't panic (easier said than done)
- Check if it's a partial or total outage
- Execute rollback first, debug later
- Communicate status to users
- Monitor the rollback progress
After the Rollback
- Confirm systems are stable
- Analyze what went wrong
- Fix the issue in your branch
- Plan the next deployment
The Human Factor
Technical strategies are only half the battle. The other half is having the discipline to actually execute them under pressure. When your app is down and users are angry, the temptation is to "just push a quick fix" instead of doing a proper rollback.
Resist this temptation. Quick fixes under pressure almost always make things worse.
Modern Tools Make It Easier
If you're using a managed deployment service (like DeployMyVibe), many of these rollback strategies come built-in. You get:
- One-click rollbacks from your dashboard
- Automated health checks
- Database migration management
- Monitoring and alerting out of the box
The key is having these systems in place before you need them, not scrambling to set them up during an outage.
Practice Makes Perfect
Here's a controversial opinion: you should intentionally break your production environment regularly to practice your rollback procedures. Netflix calls this "chaos engineering," and it works.
Set up a staging environment that mirrors production and practice your rollback scenarios monthly. When the real incident happens, muscle memory kicks in.
The Bottom Line
Fast rollbacks are your safety net in the high-velocity world of AI-assisted development. Whether you choose blue-green deployments, rolling updates, feature flags, or a combination, the important thing is to have a strategy and practice it.
Remember: the best rollback strategy is the one you've tested and can execute confidently at 2 AM when everything is on fire. Plan for failure, because in production, failure isn't a possibility - it's an inevitability.
Your future self (and your users) will thank you for taking the time to get this right.
Alex Hackney
DeployMyVibe