stratomonitor arnaudlemaignen

👋 Hi, I’m Arnaud

Lead Principal Software Engineer | DevOps & SRE Expert | System Reliability & Observability

With 20+ years of hands-on experience in software engineering, infrastructure, and system reliability, I build and maintain high-availability, scalable systems, with deep expertise in observability, CI/CD, and modern cloud architecture.

Since I was a kid, I have been curious about technology. I started by typing BASIC commands on a Commodore VIC-20 and later built my first PC with my older brother. These early experiences gave me a strong interest in both hardware and software, even when mistakes (like formatting a hard drive or burning a processor) taught me valuable lessons.

I have always been interested in numbers, metrics, and KPIs. This focus on measurement naturally led me to my first observability project at my current company more than 10 years ago, using early versions of Grafana, Prometheus, and Collectd with Ansible on bare metal. Later, I helped migrate our large monolithic system to Kubernetes, writing Dockerfiles and Helm charts, which led me to put DevOps and SRE practices at the core of my work.

Today, AI and ML are essential tools that help software engineers develop faster and more reliably. They are also valuable for sizing and forecasting, which I have applied to improve the resource usage model of my company’s applications.

🛠 My Strengths & Tech Stack

Here are the areas I excel in and the tools I use regularly:

Domain	Technologies & Tools
Cloud & Infrastructure	AWS, Kubernetes, Docker, Terraform, Ansible
Reliability, Observability & Monitoring	Prometheus, Grafana, ELK, Logging / Tracing / Metrics stacks
CI/CD & Automation	GitHub Actions, GitLab CI/CD, GitOps, Flux, Jenkins, automated testing
Backend & Systems Engineering	Go, Java, Python, React.js, Bash, REST & gRPC services, microservices, event-driven systems
DevOps / SRE Practices	Incident response, on-call best practices, SLA/SLO/SLI design, resilience, performance tuning
FinOps	AWS Billing, dimensioning/sizing engineering, cost optimization
DevSecOps	Kyverno, Falco, SAST
Storage	SAN, NAS, Ceph, FSx, EFS, EBS

📌 Pinned Repositories

Here are some of my repositories I’m proud to showcase:

vpr-exporter – FinOps tool to optimize infra cost by aligning requests with real usage.
resource-model-exporter – ML tool to generate a model of resource consumption.
service-availability-exporter – A Prometheus exporter that aggregates and exposes service availability metrics.
grafana-dashboards – Grafana observability dashboards covering many different areas (using vpr-exporter/node-exporter, based on Prometheus and AWS datasources).

🌟 Minor Contributions & Projects

Here are some of my contributions to the observability ecosystem:

Grafana – which I have loved since version 2.0!
Jenkins Exporter – to collect statistics from Jenkins CI as metrics.
Logstash Exporter – to collect statistics from Logstash as metrics.

🔭 What I’m Working On / Interested In

Evolving reliability practices: service level objectives (SLOs), error budgets, resilience engineering
Deepening involvement in observability stack integrations (OpenTelemetry, tracing, distributed logging)
Architecture for large-scale systems: multi-region, microservices, serverless, fault tolerance, HPC
Cloud security best practices: IAM, Infrastructure Security, Data Protection
Appetite for AI / MLOps: infrastructure resource usage prediction
Mentoring / sharing knowledge: writing about best practices, contributing to open source, speaking

📫 Let’s Connect

My two mantras, attributed to Lord Kelvin:

“To measure is to know.”
“If you cannot measure it, you cannot improve it.”

Thanks for dropping by. Feel free to explore my repos, open issues / PRs, or reach out if you want to collaborate.
