About
I build reliable, repeatable infrastructure across AWS and on-prem: IaC, CI/CD, observability, and security-minded automation.
I validate patterns in my homelab before they hit production. If you want scalable builds, sane deployments, and fewer 3 a.m. incidents, you’re in the right place.
What we build doesn't break.
Desired Role
Infrastructure / Automation Engineering
Engineering Manager
Focus
Automation • IaC • Reliability • Infra at Scale
Primary Environments
AWS • On-prem • Azure
Tooling
Terraform • Docker • Kubernetes • Python • PowerShell
What I deliver
Repeatable Infrastructure
Portable Terraform patterns that query real environment data instead of relying on brittle tfvars.
Build & Release Systems
CI/CD that scales: ephemeral agents, caching, artifact pipelines, and clean promotion paths.
Operational Visibility
Datadog / Prometheus / SolarWinds dashboards and alerting that catch issues early and reduce false alarms.
Security Hygiene
Least-privilege IAM, hardened images, sane logging, and audit-friendly change tracking.
Homelab-to-Cloud Continuity
Dev environments that mirror prod patterns: reverse proxy, TLS, segmented networks, automation.
Pragmatic Automation
Scripts + workflows that remove toil: onboarding, self-healing, inventory, and status reporting.
Contact
Recent updates
Personal GitHub contributions
Selected work
Infrastructure Self-Repair Automation
Intercepts critical alerts and runs safe, staged remediation: restarts, reboots, redeploys, and proactive scaling (compute/storage) based on thresholds, then reports outcomes back to engineers.
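As a rough illustration of the staged approach, here is a minimal Python sketch, not the production implementation. It assumes each remediation step is a callable that returns True once the target is healthy again. The names `remediate`, `RemediationReport`, and the stage labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical action type: takes a target name and returns True
# if the target is healthy after the action has run.
Action = Callable[[str], bool]


@dataclass
class RemediationReport:
    """Record of which stages ran for a target and whether each one worked."""
    target: str
    attempted: List[Tuple[str, bool]] = field(default_factory=list)

    @property
    def resolved(self) -> bool:
        # Resolved only if at least one stage ran and the last one succeeded.
        return bool(self.attempted) and self.attempted[-1][1]


def remediate(target: str, stages: List[Tuple[str, Action]]) -> RemediationReport:
    """Run stages in order (least disruptive first) and stop at the first success."""
    report = RemediationReport(target)
    for name, action in stages:
        try:
            ok = bool(action(target))
        except Exception:
            ok = False  # an action that raises counts as an unsuccessful stage
        report.attempted.append((name, ok))
        if ok:
            break
    return report


# Example wiring with stand-in actions; real actions would call out to
# whatever orchestration or cloud APIs are in use.
example_stages = [
    ("restart", lambda target: False),
    ("reboot", lambda target: True),
    ("redeploy", lambda target: True),
]
example_report = remediate("web-01", example_stages)
```

The report object is what would be sent back to engineers: it lists every stage tried, in order, so a human can see how far the escalation went.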
Global FSxN Replication Fabric
Multi-AZ, multi-region SnapMirror architecture with replication to/from on-prem. Designed for predictable failover, controlled RPO/RTO, and repeatable operations at scale.
Golden Image Pipelines (CIS-hardened)
Automated “gold image” AMI pipelines with CIS hardening, scheduled rebuilds, validation, and global distribution across accounts/regions to keep fleets current and consistent.
LLM-Assisted Log Parsing & Ops Routing
High-volume log parsing that detects probable failure patterns and routes findings into ops tickets and/or automation actions, reducing noisy triage and improving time-to-signal.
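The routing half of that idea can be sketched in a few lines of Python. This is a simplified illustration only: plain regex rules stand in for the model-assisted classification step, and the rule names, patterns, and route labels are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    name: str     # label for the failure pattern
    pattern: str  # regex matched against each log line
    route: str    # destination, e.g. "ticket" or "automation"


@dataclass(frozen=True)
class Finding:
    rule: str
    route: str
    line: str


# Hypothetical rules; a real system would load these from config or
# replace the matching step with a model-backed classifier.
RULES: Sequence[Rule] = (
    Rule("disk_full", r"No space left on device", "automation"),
    Rule("oom_kill", r"Out of memory: Kill(ed)? process", "automation"),
    Rule("auth_failure", r"authentication failure", "ticket"),
)


def classify(line: str, rules: Sequence[Rule] = RULES) -> Optional[Finding]:
    """Return a Finding for the first rule that matches, or None."""
    for rule in rules:
        if re.search(rule.pattern, line, re.IGNORECASE):
            return Finding(rule.name, rule.route, line)
    return None


def route(lines: Iterable[str], rules: Sequence[Rule] = RULES) -> Dict[str, List[Finding]]:
    """Group findings by destination; lines that match no rule are dropped."""
    routed: Dict[str, List[Finding]] = {}
    for line in lines:
        finding = classify(line, rules)
        if finding is not None:
            routed.setdefault(finding.route, []).append(finding)
    return routed
```

Dropping unmatched lines is what keeps triage quiet in this sketch: only lines that map to a known pattern produce a ticket or trigger an action.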