SRE Engineer – Observability
August 2023 – Present
Criteo
Limassol, Cyprus
- As part of the observability team, led the migration and consolidation of infrastructure during the merger of two companies. Moved over 50 teams, and an alert volume of 600 alerts per hour, off Zabbix, unifying the codebase, monitoring tools, and alerting system.
- Designed and implemented a CLI application that consolidated over 30 cron jobs and standardized the codebase; migrated secrets to Vault and integrated with Kubernetes through vault-secrets-webhook for secure secret management.
- Developed an SLO framework using Sloth, integrated with a Kubernetes operator built on the CRD SDK in Go, to manage vmalert instances.
- Resolved an rsyslog throughput collapse on a bare-metal, multi-DC log pipeline: diagnosed uneven NIC IRQ distribution and memory pressure induced by queue parallelism (perf top showed page faults dominating), then retuned imptcp and queue dequeue parameters.