Database Migrations in Production: The Guide Nobody Gives You
Let's be real - most tutorials about database migrations show you how to run rails db:migrate or npm run migrate on your laptop and call it a day. But when it comes to production? Suddenly everyone gets mysteriously quiet about the chaos that can ensue.
You've built your app with Claude's help, shipped it with Cursor, and now you need to add a column to your users table with 2 million rows. Your heart rate spikes. Your palms get sweaty. And rightfully so - because production database migrations can be career-defining moments (and not always in a good way).
The Horror Stories You Need to Hear
Before we dive into solutions, let's talk about what happens when migrations go wrong:
- Table locks that bring your entire app down - Your innocent ALTER TABLE statement locks the table for 45 minutes while your users rage-quit
- Rollback nightmares - You can't easily undo that column you just dropped (along with all its data)
- Memory explosions - Your migration tries to process millions of rows at once and crashes your database
- Deployment deadlocks - Your app expects the new column that doesn't exist yet, or vice versa
The worst part? Most of these disasters are completely preventable with the right approach.
The Safe Migration Playbook
Rule #1: Backwards Compatibility is Your Safety Net
Your migrations should work with both the old and new versions of your application code. This means:
-- Bad: This breaks the old app immediately
ALTER TABLE users DROP COLUMN legacy_field;
-- Good: Add the new column first, deploy, then remove old one later
ALTER TABLE users ADD COLUMN new_field VARCHAR(255);
Deploy in phases:
- Add new columns/tables (old app ignores them)
- Deploy app code that uses both old and new schemas
- Migrate data from old to new columns
- Deploy app code that only uses new schema
- Drop old columns/tables
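The five phases above can be sketched end to end. Here's a minimal illustration using an in-memory SQLite database as a stand-in for production (table and column names follow the example above; in a real rollout, each phase belongs to a separate deploy):

```python
import sqlite3

# In-memory stand-in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, legacy_field TEXT)")
conn.executemany("INSERT INTO users (legacy_field) VALUES (?)", [("a",), ("b",)])

# Phase 1: add the new column - old app code simply ignores it.
conn.execute("ALTER TABLE users ADD COLUMN new_field TEXT")

# Phase 2 (outside the database): deploy app code that writes both columns.

# Phase 3: backfill existing rows from the old column.
conn.execute("UPDATE users SET new_field = legacy_field WHERE new_field IS NULL")

# Phase 4 (outside the database): deploy app code that uses only new_field.

# Phase 5: drop the old column (DROP COLUMN needs SQLite >= 3.35;
# older versions would rebuild the table instead).
if sqlite3.sqlite_version_info >= (3, 35):
    conn.execute("ALTER TABLE users DROP COLUMN legacy_field")

print(conn.execute("SELECT new_field FROM users ORDER BY id").fetchall())
# → [('a',), ('b',)]
```

The point of the ordering is that at every phase, both the currently deployed app version and the previous one can run against the live schema.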
Rule #2: Know Your Lock Levels
Different database operations have different lock requirements:
PostgreSQL Lock Levels:
-- These are generally safe (minimal locking)
ADD COLUMN (with default NULL)
CREATE INDEX CONCURRENTLY
DROP INDEX CONCURRENTLY (plain DROP INDEX briefly takes an exclusive lock)
-- These can be dangerous (table-level locks)
ADD COLUMN with a volatile default (before PostgreSQL 11, any non-NULL default forced a full table rewrite)
ALTER COLUMN TYPE
DROP COLUMN
ADD CONSTRAINT (without NOT VALID)
MySQL Lock Behavior:
-- Safe operations (online DDL)
ADD COLUMN (most cases)
DROP INDEX
CREATE INDEX
-- Potentially blocking
MODIFY COLUMN (type changes)
ADD COLUMN (with specific defaults)
DROP COLUMN
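One practical guard worth knowing here: MySQL lets you assert in the statement itself that an ALTER must run online. With `ALGORITHM=INPLACE, LOCK=NONE`, the server refuses to execute the statement at all if it would need a table copy or a blocking lock - a far better failure mode than a surprise outage. A tiny hedged helper (the function name is mine, purely illustrative):

```python
def online_alter(alter_stmt: str) -> str:
    """Append MySQL's online-DDL assertions to an ALTER TABLE.

    If MySQL cannot satisfy ALGORITHM=INPLACE, LOCK=NONE, it errors
    out immediately instead of silently locking the table.
    """
    return f"{alter_stmt.rstrip().rstrip(';')}, ALGORITHM=INPLACE, LOCK=NONE;"

print(online_alter("ALTER TABLE users ADD COLUMN last_seen DATETIME;"))
# → ALTER TABLE users ADD COLUMN last_seen DATETIME, ALGORITHM=INPLACE, LOCK=NONE;
```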
Rule #3: Size Matters - A Lot
For large tables (>1M rows), even "safe" operations can cause problems:
-- Instead of this massive operation
UPDATE users SET status = 'active' WHERE status IS NULL;
-- Do this in batches instead - rerun until zero rows are affected
-- (MySQL syntax; PostgreSQL has no UPDATE ... LIMIT, so batch by
-- id ranges or a subquery there)
UPDATE users SET status = 'active'
WHERE status IS NULL
LIMIT 1000;
For really large updates, use a background job:
# Python/Django example
import time

def migrate_user_status():
    batch_size = 1000
    while True:
        # Re-query from the start each pass: rows we update drop out
        # of the filter, so a moving offset would skip records.
        ids = list(
            User.objects.filter(status__isnull=True)
            .values_list("id", flat=True)[:batch_size]
        )
        if not ids:
            break
        # One UPDATE per batch instead of one save() per row
        User.objects.filter(id__in=ids).update(status="active")
        time.sleep(0.1)  # Be nice to your database
The Production Migration Checklist
Before you run that migration in production, go through this checklist:
Pre-Migration
- Test on a production-sized dataset - Not your 10-row dev database
- Estimate execution time - How long will this take on your actual data?
- Check for blocking queries - What else is running that might conflict?
- Plan your rollback strategy - How will you undo this if things go sideways?
- Schedule during low traffic - 3 AM migrations are your friend
During Migration
- Monitor actively - Don't just run it and hope
- Watch for locks - Use SHOW PROCESSLIST (MySQL) or pg_stat_activity (PostgreSQL)
- Check application metrics - Are response times spiking?
- Have your rollback ready - One click away, not buried in documentation
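The lock-watching step can be wired into a simple script. In PostgreSQL, `pg_stat_activity` exposes `pid`, `wait_event_type`, `query`, and `query_start`; the helper below applies the "is anything stuck?" filter to rows shaped like that view's output (the rows here are hand-built stand-ins, since this sketch runs without a live database):

```python
# Illustrative query you might run against PostgreSQL:
#   SELECT pid, wait_event_type, query,
#          EXTRACT(EPOCH FROM now() - query_start) AS runtime_seconds
#   FROM pg_stat_activity;

def stuck_sessions(rows, max_seconds=30):
    """Return sessions that have been waiting on a lock too long."""
    return [
        r for r in rows
        if r["wait_event_type"] == "Lock" and r["runtime_seconds"] > max_seconds
    ]

rows = [
    {"pid": 101, "wait_event_type": "Lock", "runtime_seconds": 95,
     "query": "ALTER TABLE users ..."},
    {"pid": 102, "wait_event_type": None, "runtime_seconds": 2,
     "query": "SELECT 1"},
]
print([r["pid"] for r in stuck_sessions(rows)])  # → [101]
```

Run something like this on a loop during the migration and page yourself when it returns anything.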
Post-Migration
- Verify data integrity - Did everything migrate correctly?
- Update statistics - Help your database optimizer understand the new schema
- Monitor for a few hours - Some issues only surface under real load
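Here's a minimal sketch of the "verify data integrity" step, using SQLite as a stand-in for a production database. Comparing row counts plus a cheap column checksum catches gross copy errors; it is not a substitute for spot-checking real rows:

```python
import sqlite3

def tables_match(conn, old, new, column):
    """Cheap integrity check: compare row count and a column checksum.
    Table/column names are trusted identifiers here, not user input."""
    def summary(table):
        return conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM(LENGTH({column})), 0) FROM {table}"
        ).fetchone()
    return summary(old) == summary(new)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_old (email TEXT)")
conn.execute("CREATE TABLE users (email TEXT)")
for t in ("users_old", "users"):
    conn.executemany(f"INSERT INTO {t} VALUES (?)",
                     [("a@x.com",), ("b@x.com",)])

print(tables_match(conn, "users_old", "users", "email"))  # → True
```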
Advanced Techniques for Zero-Downtime Migrations
Ghost Tables for Major Schema Changes
For massive table restructures, create a new table and migrate data gradually:
-- Create new table with desired schema
CREATE TABLE users_new (
id BIGINT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Migrate data in batches
-- Use triggers to keep tables in sync during migration
-- Swap table names atomically when done
RENAME TABLE users TO users_old, users_new TO users;
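The ghost-table recipe above can be sketched end to end. This SQLite version (a stand-in for MySQL or PostgreSQL) does the batched copy and the name swap; the trigger- or binlog-based sync that tools like gh-ost handle for you is only noted in a comment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"u{i}@example.com",) for i in range(10)])

# New table with the desired schema.
conn.execute("""CREATE TABLE users_new (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

# Copy in small batches, keyed by primary key so each pass is cheap.
batch, last_id = 4, 0
while True:
    rows = conn.execute(
        "SELECT id, email FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch)).fetchall()
    if not rows:
        break
    conn.executemany("INSERT INTO users_new (id, email) VALUES (?, ?)", rows)
    last_id = rows[-1][0]
    # In production, a trigger (or gh-ost tailing the binlog) would also
    # capture writes that land while the copy is running.

# Swap names; wrapped in a transaction since SQLite lacks MySQL's
# atomic multi-table RENAME.
with conn:
    conn.execute("ALTER TABLE users RENAME TO users_old")
    conn.execute("ALTER TABLE users_new RENAME TO users")

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # → 10
```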
Blue-Green Database Deployments
For the truly paranoid (and wise):
- Create a replica of your production database
- Pause or break replication, then run migrations on the replica (you can't run DDL against a read-only replica)
- Switch your application over to the migrated copy
- Keep the original as backup
Using Migration Tools
Consider tools that make this easier:
- gh-ost (GitHub's online schema migration tool)
- pt-online-schema-change (Percona Toolkit)
- pgslice (PostgreSQL partitioning tool)
When Things Go Wrong: Emergency Procedures
Signs Your Migration is in Trouble
- Response times suddenly spike
- Database CPU maxes out
- Lock wait timeouts in your application logs
- Users complaining about errors
Emergency Response
- Don't panic (easier said than done)
- Check if you can kill the migration safely - Some operations can't be interrupted
- Communicate with users - Let them know you're working on it
- Execute your rollback plan - You did prepare one, right?
- Document what happened - Future you will thank present you
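On the "can you kill it safely" point: in PostgreSQL, `pg_cancel_backend(pid)` politely cancels a running query, and `pg_terminate_backend(pid)` kills the whole session. A small illustrative helper (the row shape mimics `pg_stat_activity` output; the function name is mine):

```python
def cancel_statements(activity_rows, needle="ALTER TABLE"):
    """Build pg_cancel_backend() calls for sessions running DDL.

    pg_cancel_backend asks the query to stop; escalate to
    pg_terminate_backend only if the cancel is ignored.
    """
    return [
        f"SELECT pg_cancel_backend({r['pid']});"
        for r in activity_rows
        if needle in r["query"]
    ]

rows = [
    {"pid": 4711, "query": "ALTER TABLE users ADD COLUMN bio TEXT"},
    {"pid": 4712, "query": "SELECT * FROM orders"},
]
print(cancel_statements(rows))  # → ['SELECT pg_cancel_backend(4711);']
```

Caution: some operations (e.g. a half-applied MySQL table copy) leave cleanup behind even after a cancel, which is exactly why step 2 says to check before killing anything.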
Making Migrations Part of Your Deployment Pipeline
At DeployMyVibe, we see too many developers treating migrations as afterthoughts. Integrate them into your CI/CD pipeline:
# GitHub Actions example
name: Deploy with Migrations
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run database migrations
        run: |
          # Run migrations first
          ./migrate.sh
      - name: Deploy application
        run: |
          # Deploy only after successful migration
          ./deploy.sh
      - name: Verify deployment
        run: |
          # Health checks to ensure everything works
          ./health_check.sh
The Bottom Line
Database migrations in production don't have to be terrifying. They require respect, planning, and the right techniques. Start with small, backwards-compatible changes. Test thoroughly. Have a rollback plan. Monitor actively.
And remember - if you're spending more time worrying about infrastructure than building your app, maybe it's time to let someone else handle the DevOps complexity. That's literally why we exist.
Because at the end of the day, you should be focused on shipping features that users love, not debugging why your migration locked your database for three hours at 2 PM on a Tuesday.
Ready to deploy without the database drama? Check out how DeployMyVibe handles migrations safely in our managed hosting environment.
Alex Hackney
DeployMyVibe