As a Cloud Engineer, I see my job as pursuing the best performance and reliability standards while keeping costs in mind.
I enjoy managing, developing and improving the infrastructure of applications and services used every day by a large and diverse customer base — with the challenge of keeping it available 24/7.
I strive to automate every procedure, but I will never trade security for simplicity and speed.
Leading a 2-person SRE team, owning planning and retrospectives, and setting reliability standards to drive operational maturity across the cloud organisation.
Designed the organisation's first structured on-call rotation and drove the company-wide adoption of incident.io, defining alarm severities and privilege escalation procedures for active incidents, and establishing a formal incident management process from the ground up.
Architected and implemented cross-region, cross-account PostgreSQL native logical replication, reducing RPO from 1 day to near-zero with no middleware costs — cutting operational expenditure while maintaining full reliability and performance.
Part of a small team responsible for the maintenance and evolution of the company's entire cloud infrastructure hosted on AWS.
Improved reliability and scalability of microservices by evolving the infrastructure to handle traffic in a more distributed way.
Built an alerting system enabling the team to quickly identify and respond to incidents across the platform.
Deployed a VPN infrastructure with fine-grained permission groups so every department can access only its authorised resources.
Provided day-to-day infrastructure support to software developers, handling unexpected issues as they arose.
Worked as a consultant for a worldwide insurance company, managing the IT infrastructure behind its services.
Owned the maintenance and evolution of two Kubernetes clusters and their CI/CD tooling, while supporting development teams with infrastructure needs.
Drove effort estimation and prioritisation of activities, taking ownership through to production release.
Designed a ticketing system for application developers to minimise interruptions while maintaining a fast response time.
Served as on-call engineer two weeks per month, responding to production incidents and unpredictable events.