Lead Principal Software Engineer | DevOps & SRE Expert | System Reliability & Observability
With 20+ years of hands-on experience in software engineering, infrastructure, and system reliability, I build and maintain high-availability, scalable systems, with deep expertise in observability, CI/CD, and modern cloud architecture.
Since I was a kid, I have been curious about technology. I started by typing BASIC commands on a Commodore VIC-20 and later built my first PC with my older brother. These early experiences gave me a strong interest in both hardware and software, and even my mistakes (like formatting a hard drive or burning out a processor) taught me valuable lessons.
I have always been interested in numbers, metrics, and KPIs. This focus on measurement naturally led me to my first observability project at my current company more than 10 years ago, using early versions of Grafana, Prometheus, and Collectd with Ansible on bare metal. Later, I helped migrate our large monolithic system to Kubernetes, writing Dockerfiles and Helm charts, which led me to put DevOps and SRE practices at the core of my work.
Today, AI and ML are essential tools for software engineers to develop faster and more reliably. They are also valuable for sizing and forecasting, which I was able to implement to improve the resource usage model of my company’s applications.
Here are the areas I excel in and the tools I use regularly:
| Domain | Technologies & Tools |
|---|---|
| Cloud & Infrastructure | AWS, Kubernetes, Docker, Terraform, Ansible |
| Reliability, Observability & Monitoring | Prometheus, Grafana, ELK, Logging / Tracing / Metrics stacks |
| CI/CD & Automation | GitHub Actions, GitLab CI/CD, GitOps, Flux, Jenkins, automated testing |
| Backend & Systems Engineering | Go, Java, Python, React.js, bash, REST & gRPC services, microservices, event-driven systems |
| DevOps / SRE Practices | Incident response, on-call best practices, SLA/SLO/SLI design, resilience, performance tuning |
| FinOps | AWS Billing, dimensioning/sizing engineering, cost optimization |
| DevSecOps | Kyverno, Falco, SAST |
| Storage | SAN, NAS, Ceph, FSx, EFS, EBS |
Here are some of my repositories I’m proud to showcase:
- vpr-exporter – FinOps tool that optimizes infrastructure cost by aligning requests with real usage.
- resource-model-exporter – ML tool that generates a model of resource consumption.
- service-availability-exporter – A Prometheus exporter that aggregates and exposes service availability metrics.
- grafana-dashboards – Grafana dashboards covering many different areas (using vpr-exporter and node-exporter, based on Prometheus and AWS data sources).
Here are some of my contributions to the observability ecosystem:
- Grafana – which I have loved since version 2.0!
- Jenkins Exporter – to get statistics metrics from Jenkins CI.
- Logstash Exporter – to get statistics metrics from Logstash.
- Evolving reliability practices: service level objectives (SLOs), error budgets, resilience engineering
- Deepening involvement in observability stack integrations (OpenTelemetry, tracing, distributed logging)
- Architecture for large-scale systems: multi-region, microservices, serverless, fault tolerance, HPC
- Cloud security best practices: IAM, Infrastructure Security, Data Protection
- Appetite for AI / MLOps: infrastructure resource usage prediction
- Mentoring / sharing knowledge: writing about best practices, contributing to open source, speaking
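
As a small illustration of the SLO and error-budget item in the list above, here is a minimal Python sketch. The function names and example numbers are hypothetical and are not taken from any of the repositories listed here; it only shows the standard arithmetic, where the budget is the window length multiplied by (1 - SLO).

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed downtime within `window` for an availability target `slo` (e.g. 0.999)."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget still unspent (negative once it is overspent)."""
    budget = error_budget(slo, window)
    if budget == timedelta(0):
        raise ValueError("a 100% SLO leaves no error budget")
    return 1.0 - downtime / budget


if __name__ == "__main__":
    window = timedelta(days=30)
    print("budget:", error_budget(0.999, window))
    print("remaining:", budget_remaining(0.999, window, timedelta(minutes=10)))
```

For a 99.9% target over a 30-day window, this gives a budget of roughly 43 minutes of downtime; ten minutes of incidents would leave about three quarters of it unspent.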
My two mantras from Lord Kelvin:
“To measure is to know.”
“If you cannot measure it, you cannot improve it.”
Thanks for dropping by. Feel free to explore my repos, open issues / PRs, or reach out if you want to collaborate.


