Node.js in Production: PM2, Memory Leaks, and Crash Recovery
Running Node.js Apps in Production Like a Pro
You've built an amazing app with AI assistance, pushed it to production, and then... it crashes at 3 AM. Sound familiar? Welcome to the wild world of Node.js production environments.
While Node.js is fantastic for building fast, scalable applications, it can be a bit temperamental in production. Memory leaks, unexpected crashes, and the dreaded "it works on my machine" syndrome are all too common. But don't worry - with the right tools and strategies, you can tame your Node.js apps and keep them running smoothly.
Why PM2 is Your Production Best Friend
PM2 (Process Manager 2) is like having a reliable DevOps engineer watching over your Node.js applications 24/7. It's a production process manager that restarts your app when it crashes and, in cluster mode, load-balances incoming connections across multiple CPU cores.
Installing and Basic Setup
npm install -g pm2
# Start your app
pm2 start app.js --name "my-awesome-app"
# Or use an ecosystem file
pm2 start ecosystem.config.js
Here's a sample ecosystem configuration that'll make your life easier:
module.exports = {
  apps: [{
    name: 'my-app',
    script: './app.js',
    instances: 'max', // Use all CPU cores
    exec_mode: 'cluster',
    env: {
      NODE_ENV: 'production',
      PORT: 3000
    },
    // Restart if memory usage exceeds 500MB
    max_memory_restart: '500M',
    // Auto-restart on file changes (keep disabled in production)
    watch: false,
    // Wait 4 seconds before restarting a crashed app
    restart_delay: 4000,
    // Give up after 3 consecutive unstable restarts
    max_restarts: 3,
    // A start counts as unstable if the app exits within 10 seconds
    min_uptime: '10s'
  }]
};
PM2 Monitoring and Management
PM2 comes with built-in monitoring that's actually useful:
# Real-time monitoring
pm2 monit
# Detailed app info
pm2 show my-app
# Logs (because you'll need them)
pm2 logs my-app --lines 100
# Restart strategies
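pm2 restart my-app # Stops and starts every process (brief downtime)
pm2 reload my-app # Zero-downtime restart in cluster mode
# Older guides also list `pm2 gracefulReload`; recent PM2 releases fold that behavior into `reload`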
The Memory Leak Monster
Memory leaks in Node.js are sneaky. Your app runs fine for hours or days, then suddenly crashes with an out-of-memory error. Here's how to hunt them down and prevent them.
Common Culprits
Global Variables That Keep Growing
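// DON'T do this
let globalCache = [];
app.get('/api/data', (req, res) => {
  // This array grows forever: nothing is ever removed
  globalCache.push(someData);
  res.json(result);
});
// DO this instead: bound the cache by size and by age
// (lru-cache v7-v9 shown; v10+ exports it as `const { LRUCache } = require('lru-cache')`)
const LRU = require('lru-cache');
const cache = new LRU({ max: 1000, ttl: 1000 * 60 * 5 }); // 5 min TTL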
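To see why a bounded cache can't leak, it helps to look at what "max entries plus TTL" means in code. This is a minimal stand-in built on a plain Map, not the lru-cache implementation; the BoundedCache class and its option names are made up for this example.

```javascript
// Minimal bounded cache: at most `max` entries, each expiring after `ttlMs`.
class BoundedCache {
  constructor({ max = 1000, ttlMs = 5 * 60 * 1000 } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // a Map iterates in insertion order
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it
      return undefined;
    }
    // Re-insert so this key becomes the most recently used
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.delete(key);
    if (this.entries.size >= this.max) {
      // The first key in iteration order is the least recently used
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  get size() {
    return this.entries.size;
  }
}

const cache = new BoundedCache({ max: 2, ttlMs: 1000 });
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3); // 'a' is evicted, so the cache never holds more than 2 entries
console.log(cache.size);
```

For real workloads a maintained library is the safer choice; the point here is only that memory use has a hard ceiling.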
Event Listener Leaks
// DON'T forget to clean up
const EventEmitter = require('events');
const emitter = new EventEmitter();
// This creates a new listener on every request - bad!
app.get('/api/stream', (req, res) => {
  emitter.on('data', handleData); // Leak!
});
// DO clean up properly
app.get('/api/stream', (req, res) => {
  const handleData = (data) => { /* handle data */ };
  emitter.on('data', handleData);
  res.on('close', () => {
    emitter.removeListener('data', handleData);
  });
});
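You can watch this leak happen without a web server by counting listeners. The sketch below simulates a handful of "requests" against a shared emitter; subscribe and simulateRequests are helper names invented for this example.

```javascript
const { EventEmitter } = require('node:events');

const emitter = new EventEmitter();

// Subscribe and hand back a function that undoes the subscription.
function subscribe(source, event, handler) {
  source.on(event, handler);
  return () => source.off(event, handler);
}

// Each simulated request subscribes once, then optionally cleans up.
function simulateRequests(count, { cleanUp }) {
  for (let i = 0; i < count; i++) {
    const unsubscribe = subscribe(emitter, 'data', () => {});
    if (cleanUp) unsubscribe();
  }
  return emitter.listenerCount('data');
}

const leaked = simulateRequests(5, { cleanUp: false });
emitter.removeAllListeners('data'); // reset between the two runs
const cleaned = simulateRequests(5, { cleanUp: true });

console.log({ leaked, cleaned }); // { leaked: 5, cleaned: 0 }
```

In a real app, Node prints a MaxListenersExceededWarning once more than 10 listeners pile up on one event, which is often the first visible symptom of this kind of leak.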
Memory Monitoring and Debugging
Add memory monitoring to your app:
// Simple memory usage logging
setInterval(() => {
  const used = process.memoryUsage();
  console.log('Memory usage:', {
    rss: Math.round(used.rss / 1024 / 1024) + 'MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + 'MB',
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + 'MB',
    external: Math.round(used.external / 1024 / 1024) + 'MB'
  });
}, 30000); // Log every 30 seconds
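Raw numbers in a log are easy to ignore. One way to turn them into a signal is to flag heap usage that rises across a whole window of samples. This is a rough heuristic, not a leak detector; the window size, the 50MB threshold, and the function names are assumptions chosen for this sketch.

```javascript
const MB = 1024 * 1024;
const WINDOW = 10; // number of samples to keep

// True when every sample is higher than the one before it and the
// total increase across the samples is at least minGrowthBytes.
function isHeapGrowing(samples, minGrowthBytes = 50 * MB) {
  if (samples.length < 2) return false;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] <= samples[i - 1]) return false;
  }
  return samples[samples.length - 1] - samples[0] >= minGrowthBytes;
}

const samples = [];

// Records one sample and reports whether a full window shows steady growth.
function recordSample(heapUsed = process.memoryUsage().heapUsed) {
  samples.push(heapUsed);
  if (samples.length > WINDOW) samples.shift();
  return samples.length === WINDOW && isHeapGrowing(samples);
}

// Sample every 30 seconds; unref() stops the timer from keeping the process alive
const timer = setInterval(() => {
  if (recordSample()) {
    console.warn(`heapUsed has risen for ${WINDOW} samples in a row`);
  }
}, 30000);
timer.unref();
```

A healthy app's heap goes up and down as the garbage collector runs, so a window with no dips at all is worth a closer look.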
For deeper debugging, use the built-in inspector:
# Start with inspector
node --inspect app.js
# Or start the app under PM2 with the inspector enabled
pm2 start app.js --node-args="--inspect"
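When attaching the inspector to a production box isn't practical, another option is to have the process write a heap snapshot on demand and open the file in Chrome DevTools later. A minimal sketch, assuming Node's built-in v8.writeHeapSnapshot(); the SIGUSR2 trigger, the temp-directory location, and the helper names are choices made for this example.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Builds a unique file name, e.g. heap-1234-2024-01-01T00-00-00-000Z.heapsnapshot
function snapshotPath(dir = os.tmpdir(), now = new Date()) {
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  return path.join(dir, `heap-${process.pid}-${stamp}.heapsnapshot`);
}

// Synchronous: blocks the event loop until the snapshot is on disk
function dumpHeap(dir) {
  const file = v8.writeHeapSnapshot(snapshotPath(dir));
  console.log(`Heap snapshot written to ${file}`);
  return file;
}

// Trigger with: kill -USR2 <pid> (the signal is an arbitrary choice; skipped on Windows)
if (process.platform !== 'win32') {
  process.on('SIGUSR2', () => dumpHeap());
}
```

Writing a snapshot pauses the app and needs extra memory while it runs, so treat it as a diagnostic you trigger deliberately rather than something to schedule.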
Crash Recovery That Actually Works
Crashes happen. The key is handling them gracefully and getting back up quickly.
Graceful Shutdown Handling
// `server` is the http.Server returned by app.listen(); `mongoose` is your app's Mongoose instance
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
function gracefulShutdown(signal, exitCode = 0) {
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  server.close(async () => {
    console.log('HTTP server closed.');
    try {
      // Close database connections (promise-based; Mongoose 7+ no longer accepts a callback here)
      await mongoose.connection.close();
      console.log('Database connection closed.');
      process.exit(exitCode);
    } catch (err) {
      console.error('Error while closing the database connection:', err);
      process.exit(1);
    }
  });
  // Force close after 10 seconds; unref() keeps this timer from holding the process open
  setTimeout(() => {
    console.error('Forceful shutdown');
    process.exit(1);
  }, 10000).unref();
}
Uncaught Exception Handling
// Last resort error handling (`logger` is assumed to be a pino-style logger)
process.on('uncaughtException', (err) => {
  console.error('Uncaught Exception:', err);
  // Log to external service (Sentry, LogRocket, etc.)
  logger.fatal(err, 'Uncaught exception');
  // The process state can no longer be trusted: shut down and exit non-zero
  gracefulShutdown('uncaughtException', 1);
});
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled Rejection at:', promise, 'reason:', reason);
  // Log and potentially restart
  logger.error({ reason, promise }, 'Unhandled promise rejection');
});
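Shutdown code is hard to trust if the only way to exercise it is to kill the process. One approach is to build the shutdown function from injectable pieces so it can run in a test. This is a sketch using only node:http; createShutdown and its cleanup, exit, and timeoutMs options are hypothetical names, not part of any library.

```javascript
const http = require('node:http');

// Builds a shutdown function for `server`. `exit` and `cleanup` are
// injectable so the logic can be exercised without ending the process.
function createShutdown(server, options = {}) {
  const {
    timeoutMs = 10000,
    cleanup = async () => {},
    exit = (code) => process.exit(code)
  } = options;
  let shuttingDown = false;

  return function shutdown(reason, exitCode = 0) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`Shutting down (${reason})...`);

    // Safety net: exit non-zero if closing takes too long
    const forceTimer = setTimeout(() => {
      console.error('Forceful shutdown');
      exit(1);
    }, timeoutMs);
    forceTimer.unref();

    // Stop accepting connections, then run cleanup (e.g. close the database)
    server.close(async () => {
      let code = exitCode;
      try {
        await cleanup();
      } catch (err) {
        console.error('Cleanup failed:', err);
        code = 1;
      }
      clearTimeout(forceTimer);
      exit(code);
    });

    // Idle keep-alive sockets can otherwise delay close() (method exists in Node 18.2+)
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  };
}

const server = http.createServer((req, res) => res.end('ok'));
const shutdown = createShutdown(server);
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
// In a real app, server.listen(...) would follow here
```

The shuttingDown flag matters in practice: process managers and impatient humans both tend to send the same signal twice.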
Health Checks and Monitoring
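Implement a health check endpoint that load balancers and uptime monitors can poll (PM2 itself only tracks whether the process is alive, so pair it with an external check):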
// `db` stands for whatever database client your app uses
app.get('/health', async (req, res) => {
  try {
    // Check database connection
    await db.ping();
    // Check external APIs
    // await externalService.ping();
    res.status(200).json({
      status: 'healthy',
      uptime: process.uptime(),
      memory: process.memoryUsage(),
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message
    });
  }
});
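A health endpoint that hangs because the database is hanging isn't much help to a load balancer. One way to guard against that is to give every dependency check a deadline. The sketch below is framework-free; runHealthChecks, withTimeout, and the check names are hypothetical.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `checks` maps a name to a function that resolves when the dependency is fine.
async function runHealthChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    // Promise.resolve().then(...) also turns synchronous throws into rejections
    names.map((name) => withTimeout(Promise.resolve().then(checks[name]), timeoutMs, name))
  );

  const details = {};
  results.forEach((result, i) => {
    details[names[i]] = result.status === 'fulfilled'
      ? 'ok'
      : String(result.reason?.message ?? result.reason);
  });

  const healthy = results.every((result) => result.status === 'fulfilled');
  return {
    status: healthy ? 'healthy' : 'unhealthy',
    httpStatus: healthy ? 200 : 503,
    details
  };
}
```

Inside a route handler this might be used as `const report = await runHealthChecks({ db: () => db.ping() }); res.status(report.httpStatus).json(report);`, so a slow dependency produces a quick 503 instead of a stalled request.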
Production-Ready PM2 Configuration
Here's a battle-tested PM2 ecosystem config:
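module.exports = {
  apps: [{
    name: 'production-app',
    script: './dist/server.js',
    instances: 'max',
    exec_mode: 'cluster',
    // Environment variables (applied with: pm2 start ecosystem.config.js --env production)
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000
    },
    // Restart policy
    max_memory_restart: '1G',
    restart_delay: 4000,
    max_restarts: 10,
    min_uptime: '10s',
    // Logging
    log_file: './logs/combined.log',
    out_file: './logs/out.log',
    error_file: './logs/error.log',
    log_date_format: 'YYYY-MM-DD HH:mm Z',
    // Advanced options
    // Time in ms PM2 waits after the shutdown signal before sending SIGKILL;
    // raise it if your graceful shutdown needs longer
    kill_timeout: 3000,
    // Time in ms PM2 waits for the app to start listening before forcing a reload
    listen_timeout: 8000
    // Health monitoring: PM2 has no HTTP health check option,
    // so probe the /health endpoint from a load balancer or monitor instead
  }]
};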
Monitoring and Alerts
Set up PM2's built-in monitoring or integrate with external services:
# PM2 Plus (web monitoring)
pm2 plus
# Custom monitoring script, rerun every 5 minutes
# (--no-autorestart keeps PM2 from relaunching it the moment it exits)
pm2 start monitor.js --no-autorestart --cron "*/5 * * * *"
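What goes in monitor.js is up to you. As one possibility, here is a sketch that probes a health endpoint once and exits with a non-zero code when something looks wrong. It assumes Node 18+ for the global fetch, a /health response shaped like the one shown earlier, and a HEALTH_URL environment variable; the 400MB heap threshold is illustrative.

```javascript
// monitor.js (sketch): probe a health endpoint once, report, and exit.
const HEAP_LIMIT_BYTES = 400 * 1024 * 1024; // illustrative threshold

// Turns a health response into a list of problems (empty means all good).
function findProblems(httpStatus, body, heapLimit = HEAP_LIMIT_BYTES) {
  const problems = [];
  if (httpStatus !== 200) problems.push(`health endpoint returned ${httpStatus}`);
  if (body?.status !== 'healthy') problems.push(`status is ${body?.status ?? 'missing'}`);
  const heapUsed = body?.memory?.heapUsed;
  if (typeof heapUsed === 'number' && heapUsed > heapLimit) {
    problems.push(`heapUsed ${heapUsed} exceeds ${heapLimit}`);
  }
  return problems;
}

// Fetches the endpoint with a 5 second deadline; network errors become problems too.
async function checkOnce(url, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(5000) });
    const body = await res.json().catch(() => null);
    return findProblems(res.status, body);
  } catch (err) {
    return [`request failed: ${err.message}`];
  }
}

// Only runs when a target is configured, e.g. HEALTH_URL=http://localhost:3000/health
if (process.env.HEALTH_URL) {
  checkOnce(process.env.HEALTH_URL).then((problems) => {
    problems.forEach((problem) => console.error(`[monitor] ${problem}`));
    process.exitCode = problems.length ? 1 : 0;
  });
}
```

From here, the console.error call is the natural place to hook in whatever alerting you already use.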
The Bottom Line
Running Node.js in production doesn't have to be a nightmare. With PM2 handling process management, proper memory leak prevention, and robust crash recovery, your apps can run reliably for months without intervention.
The key is being proactive: monitor memory usage, implement health checks, handle errors gracefully, and always have a restart strategy. Your 3 AM self will thank you.
Remember, the best production setup is one you don't have to think about. Set it up right once, and focus on building features instead of fighting fires.
Alex Hackney
DeployMyVibe