The Call You Don’t Want to Get at 2 AM

I’ve walked into this exact situation twice in my career, and it’s the same story both times. A promising startup, six engineers, moving fast. Terraform was introduced early, which was the right call. State was stored locally, or maybe tossed into a single S3…
I’ve walked into enough platform engineering engagements to recognise the smell. It hits you before you even open a single .tf file. Someone says something like: “We have a main.tf that’s getting a bit long” — and when you finally pull up the repo, you’re staring at 4,000 lines of raw Terraform with hardcoded AMI…
The 3 AM pager alert. Slack channels exploding. A single, dreaded message cascades through the organization: “We’re seeing issues with us-east-1.” It’s the outage that every seasoned engineer knows is not a matter of if, but when. I’ve walked into companies where their entire multi-million-dollar operation was pinned to a single AWS Availability Zone,…
I’ve walked into more than one “secure” startup only to find their crown jewels—the production database—exposed to the world with a publicly_accessible = true flag. The engineers usually give me the same line: “Don’t worry, the security group is locked down to our office IP.” That’s not security; it’s a landmine waiting for one compromised…