AWS production secrets they don't teach in cert courses (learned the expensive way)

Most AWS courses are optimized for passing exams. This guide is optimized for not getting paged at 3am because you gave a Lambda function admin access.

Root account: basically the One Ring

What cert courses say: "Enable MFA on your root account."

What actually happens: Devs use root for everything because it's easier, then someone commits the credentials to GitHub and suddenly you're explaining to legal why customer data is on Pastebin.

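Your root account bypasses almost everything. Permission boundaries? Laughs in IAM. SCPs? They only rein in root inside member accounts of an AWS Organization; root in a standalone or management account ignores them too. It's not a user account. It's a factory reset button for your entire AWS presence.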

The actual setup (copy-paste this):

  1. Enable MFA immediately (hardware key or authenticator app, never SMS)
  2. Create a separate admin identity for daily work (IAM Identity Center if you have it, otherwise an IAM admin user with MFA)
  3. Put root creds in 1Password/physical safe/wherever you keep nuclear codes
  4. Use a distribution list for root email (team-security@company.com), not your personal Gmail
  5. Turn on CloudTrail before you create anything else

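That last one? 60% of "who created this mystery RDS instance" investigations die because no trail was delivering logs from day one. Event history is on by default, but it only keeps 90 days of management events; anything older needs a trail writing to S3.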
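A minimal sketch of step 5, assuming boto3 is installed and credentials are configured. The trail and bucket names are placeholders, and the bucket has to exist already with a bucket policy that lets CloudTrail write to it. The client is passed in as an argument so the function can be exercised without touching a real account.

```python
def enable_multi_region_trail(cloudtrail, trail_name, bucket_name):
    """Create a trail that covers every region and start logging to S3.

    `cloudtrail` is expected to behave like boto3.client("cloudtrail").
    """
    cloudtrail.create_trail(
        Name=trail_name,
        S3BucketName=bucket_name,
        IsMultiRegionTrail=True,       # one trail, all regions
        EnableLogFileValidation=True,  # digest files make tampering detectable
    )
    # Creating a trail through the API does not start it; logging is a separate call.
    cloudtrail.start_logging(Name=trail_name)


# Real usage (hypothetical names):
#   import boto3
#   enable_multi_region_trail(
#       boto3.client("cloudtrail"), "org-audit-trail", "my-cloudtrail-logs"
#   )
```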

IAM: the service that will humble you

Classrooms: "Here's AdministratorAccess, go build stuff!"

Production: "Why does this Lambda need S3:* when it reads from one bucket?"

Roles vs Users (finally explained correctly)

IAM Users: Permanent credentials for humans. They get access keys that live forever and eventually leak.

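IAM Roles: Temporary credentials for services (and for humans who sign in through federation or SSO). The credentials expire and rotate automatically, so there are no long-term keys to leak.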

When your EC2 instance needs S3 access, you don't SSH in and configure credentials. You attach a role. The instance assumes it, gets temporary creds, job done. Same pattern for Lambda → DynamoDB, ECS → Secrets Manager, everything.
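What "attach a role" means under the hood: every role carries a trust policy naming who is allowed to assume it. A small sketch of that document, built as a plain Python dict. The helper name is made up for illustration; the service principals are the standard ones for EC2 and Lambda.

```python
import json


def trust_policy_for(service_principal):
    """Return a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


ec2_trust = trust_policy_for("ec2.amazonaws.com")
lambda_trust = trust_policy_for("lambda.amazonaws.com")
print(json.dumps(ec2_trust, indent=2))
```

The JSON string is what you'd hand over as the role's assume-role policy document when creating it. For EC2, the role also has to be wrapped in an instance profile before it can be attached to an instance.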

The footguns nobody mentions:

Over-permissioned service roles: That Lambda doesn't need s3:*. It needs s3:GetObject on arn:aws:s3:::your-bucket/*. Surgical precision or get wrecked.
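For reference, here is roughly what that surgical policy looks like, sketched as a Python dict. The bucket name is the placeholder from above, and the helper name and Sid are invented for the example.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Identity policy that allows reading objects from a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsFromOneBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Object-level actions apply to the objects, hence the /* suffix.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy = read_only_bucket_policy("your-bucket")
print(json.dumps(policy, indent=2))
```

If the function also has to list keys, that is a separate action (s3:ListBucket) on the bucket ARN itself, without the /*.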

Hardcoded credentials in code: The number of leaked AWS keys in public GitHub repos would make you weep. Use roles for AWS access, and pull any other secrets from Secrets Manager at runtime instead of baking them into code or config.

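Resource-based policies: S3 buckets and KMS keys have their own policies that are evaluated alongside IAM policies. An explicit deny on either side always wins. Within one account, an allow from either side is usually enough (KMS is the exception: the key policy has to permit IAM policies to grant access). Across accounts, both sides have to allow the action or it fails.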
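A toy model of how an identity policy and a resource policy combine for a single request. This is a deliberate simplification for building intuition, not AWS's full evaluation logic: it ignores SCPs, permission boundaries, session policies, and the KMS key-policy special case.

```python
def is_allowed(identity_effect, resource_effect, same_account):
    """Toy decision for one request.

    Each effect is "Allow", "Deny", or None (the policy says nothing
    about this action).
    """
    effects = (identity_effect, resource_effect)
    if "Deny" in effects:
        return False  # an explicit deny always wins
    if same_account:
        return "Allow" in effects  # either side granting access is enough
    return effects == ("Allow", "Allow")  # cross-account needs both sides


print(is_allowed("Allow", None, same_account=True))
print(is_allowed("Allow", None, same_account=False))
print(is_allowed("Allow", "Deny", same_account=True))
```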

Contractor access that never expires: Created an IAM user for that freelancer in March? It's November and they still have prod access. Organizations implementing auto-expiration reduce unauthorized access by 40%.
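One way to get auto-expiration is a date condition on the policy itself, using the aws:CurrentTime global condition key. A sketch, assuming a hypothetical helper name and a 30-day window picked purely for illustration:

```python
from datetime import datetime, timedelta, timezone


def expiring_statement(actions, resources, days_valid, now=None):
    """Build an Allow statement that stops matching after `days_valid` days."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=days_valid)
    return {
        "Effect": "Allow",
        "Action": actions,
        "Resource": resources,
        "Condition": {
            # IAM compares the request time against this ISO 8601 timestamp.
            "DateLessThan": {
                "aws:CurrentTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ")
            }
        },
    }


statement = expiring_statement(
    ["s3:GetObject"],
    ["arn:aws:s3:::your-bucket/*"],
    days_valid=30,
    now=datetime(2024, 3, 1, tzinfo=timezone.utc),
)
print(statement["Condition"])
```

Once the timestamp passes, the statement no longer grants anything. The IAM user still exists, though, so deleting it stays on the offboarding checklist.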

The 50% rule

Half of all AWS security incidents trace back to overly permissive IAM. Not breaches—just giving Lambdas admin access because the docs were confusing.

Fix: CloudTrail logging + Access Analyzer + quarterly IAM audits. Boring, effective, prevents incidents.
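Part of that quarterly audit can be scripted. A rough sketch that flags wildcard actions in a policy document; the function name and sample policy are made up, and it is nowhere near a replacement for Access Analyzer (it ignores NotAction, resources, and conditions entirely).

```python
def find_wildcard_actions(policy):
    """Return (statement label, action) pairs for Allow statements using '*'."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action can be a string or a list
            actions = [actions]
        for action in actions:
            if "*" in action:
                findings.append((stmt.get("Sid", f"statement-{index}"), action))
    return findings


sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Sid": "Fine", "Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::your-bucket/*"},
    ],
}
print(find_wildcard_actions(sample_policy))  # [('TooBroad', 's3:*')]
```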

Region switcher: the UI element that ruins your day

You create an EC2 instance in us-east-1. Console defaults to ap-south-1 (you're in Delhi). Instance is gone. You panic. CloudFormation failed? Billing bug?

Nah, you're just looking in the wrong region.

Why this matters:

  • Most resources are regional (EC2, RDS, Lambda, VPC)
  • Some are global (IAM, Route 53, CloudFront) but manage regional resources
  • CloudWatch billing metrics only show in us-east-1 regardless of where stuff runs
  • Data transfer between regions costs money

When troubleshooting, "check the region switcher" should be step one. It never is.
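If you'd rather script the "wrong region" check than click through the switcher, something like this works. It is a sketch: the client factory is injected so it can run against fakes, and the instance ID and region list in the usage comment are placeholders.

```python
def find_instance_region(instance_id, regions, make_ec2_client):
    """Return the first region that contains the instance, or None."""
    for region in regions:
        ec2 = make_ec2_client(region)
        # A filter returns an empty result when nothing matches; passing
        # InstanceIds directly raises an error for unknown IDs instead.
        response = ec2.describe_instances(
            Filters=[{"Name": "instance-id", "Values": [instance_id]}]
        )
        if any(r["Instances"] for r in response["Reservations"]):
            return region
    return None


# Real usage (assumes boto3 and configured credentials):
#   import boto3
#   make = lambda region: boto3.client("ec2", region_name=region)
#   find_instance_region("i-0123456789abcdef0", ["us-east-1", "ap-south-1"], make)
```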

Practical strategy:

  • Pick regions based on user latency, data residency laws, service availability
  • Tag everything with region identifiers
  • Set up CloudTrail across all regions
  • Build mental checklist: "Did I check the region?"

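For HA/DR designs, you'll replicate across regions. But nothing replicates by default. Route 53 can route to healthy regions, but the data only follows if you explicitly configure it: S3 Cross-Region Replication, RDS cross-region read replicas, DynamoDB global tables, or custom automation where no native option exists.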

VPC networking: where theory meets production

Default VPC is a convenience trap. Public subnets, internet gateway, permissive routing—all pre-configured. Great for learning. Terrible for security.

Production VPCs take weeks to design because mistakes are expensive and hard to fix.

Public vs private subnets (the real difference):

Public subnets:

  • Route table: 0.0.0.0/0 → igw-xxx (internet gateway)
  • Resources get public IPs, talk directly to internet
  • Use for: Load balancers, bastion hosts, NAT gateways

Private subnets:

  • No internet gateway route
  • Need NAT gateway in public subnet for outbound internet (updates, APIs)
  • Use for: App servers, databases, Lambda functions that need to reach VPC resources
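Since the whole public/private distinction comes down to the route table, it can be checked mechanically. A small sketch over route dicts shaped like the ones the EC2 API returns for a route table; the gateway IDs below are fake.

```python
def is_public_route_table(routes):
    """True if any route sends all IPv4 traffic to an internet gateway."""
    return any(
        route.get("DestinationCidrBlock") == "0.0.0.0/0"
        and route.get("GatewayId", "").startswith("igw-")
        for route in routes
    )


public_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"},
]
private_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc"},
]

print(is_public_route_table(public_routes))   # True
print(is_public_route_table(private_routes))  # False
```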

The mistake that costs money: RDS in public subnets "for testing". Even with security groups blocking external access, this violates compliance and creates attack surface.

CIDR blocks and capacity planning

Courses teach CIDR notation (/16, /24). They don't explain capacity:

  • /16 = 65,536 IPs (minus 5 AWS reserves per subnet)
  • /24 = 256 IPs (minus 5 = 251 usable)

AWS reserves the first 4 IPs and the last IP in every subnet (network address, VPC router, DNS, one held for future use, broadcast).

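Why it matters: A VPC's existing CIDR blocks and subnets can't be resized. Create a VPC with a /24 and your only ways out are bolting on secondary CIDR blocks (which come with their own restrictions) or rebuilding. Start with a /16 for the VPC unless you have a reason not to, and carve out /24 subnets.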
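The arithmetic is easy to sanity-check with Python's standard ipaddress module. The 10.0.0.0/16 range is just an example block, not a recommendation.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def usable_ips(cidr):
    """Addresses left in a subnet after AWS takes its five."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))

print(vpc.num_addresses)          # 65536
print(len(subnets))               # 256
print(subnets[0], subnets[1])     # 10.0.0.0/24 10.0.1.0/24
print(usable_ips("10.0.0.0/24"))  # 251
```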

Security Groups vs NACLs

Both control traffic. Different layers:

Security Groups:

  • Instance level
  • Stateful (return traffic auto-allowed)
  • Only allow rules
  • Default: deny all inbound, allow all outbound

Network ACLs:

  • Subnet level
  • Stateless (need explicit rules both ways)
  • Allow and deny rules
  • Default NACL: allow everything (custom NACLs deny everything until you add rules)

In practice, use security groups for most things. NACLs for subnet-wide blocks (ban IP ranges, etc.).
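One detail that makes NACL deny rules work: rules are evaluated in rule-number order, lowest first, and the first match wins. A toy model of that ordering, assuming a simplified rule shape of (number, CIDR, action) and ignoring protocols and ports. The IP ranges are documentation-only addresses.

```python
import ipaddress


def nacl_decision(rules, source_ip):
    """Return "allow" or "deny" for an inbound packet.

    `rules` is a list of (rule_number, cidr, action) tuples.
    """
    address = ipaddress.ip_address(source_ip)
    for _number, cidr, action in sorted(rules, key=lambda rule: rule[0]):
        if address in ipaddress.ip_network(cidr):
            return action  # first matching rule wins
    return "deny"  # the implicit catch-all at the end of every NACL


rules = [
    (200, "0.0.0.0/0", "allow"),
    (100, "203.0.113.0/24", "deny"),  # lower number, so checked first
]

print(nacl_decision(rules, "203.0.113.7"))   # deny
print(nacl_decision(rules, "198.51.100.9"))  # allow
```

A security group can't express the deny at all, since it only has allow rules. That gap is why the ban-this-range job falls to NACLs.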

The real certification strategy

Certs are fine. They prove you can memorize service limits. But production readiness comes from:

  1. Breaking things in safe environments
  2. Reading post-mortems from companies that broke things in production
  3. Building the muscle memory to check IAM policies before blaming the service
  4. Actually using CloudTrail logs to debug instead of guessing

AWS Golden Jacket? Cool flex. Being the dev who ships features without creating security incidents? Better flex.


Written by TheVibeish Editorial