Senior DevOps Engineer (AWS, Kubernetes, Linux)
Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 250 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.
Responsibilities
- Own ultra-low-latency EC2 fleets - Design cluster placement groups with ENA / SR-IOV networking.
- Kernel-level performance tuning - Apply CPU pinning, NUMA alignment, IRQ affinity, hugepages, and TCP/UDP sysctl tweaks to flatten tail latency.
- Immutable infrastructure & automated rollouts - Build Packer AMIs and Terraform Auto Scaling Groups; run GitLab/Jenkins pipelines with blue-green or canary deploys and sub-2-minute automatic rollbacks.
- High-throughput messaging & gateways - Operate Kafka clusters (partition/ISR tuning, rack awareness) and Nginx WebSocket edges serving 100k+ clients with single-digit-ms fan-out.
- Network integrity - Run packet-loss analysis and MTU/ECN/queue-depth tuning; enforce least-privilege security-group micro-segmentation.
- Observability & SLO stewardship - Instrument Prometheus/Grafana dashboards for order-ack latency, queue depth, reject rate; write Alertmanager rules driven by p95/p99 error-budget burn.
- Reliability testing & incident response - Schedule chaos/load drills; take part in 24×7 on-call, use perf/eBPF/FlameGraphs/tcpdump for µs-level RCA, and publish post-mortems with remediation actions.
- Capacity planning around macro events - Pre-warm spot pools and leverage Savings Plans to balance headroom and cost.
- Automation & tooling - Write Go/Python scripts for bootstrap, health probes, latency regression tests, and one-click remediation.
- Cross-team collaboration - Pair with Java/Rust engineers and quants to profile hot-path code and eliminate bottlenecks without trading downtime.
Requirements
- Linux low-latency tuning – CPU pinning, NUMA awareness, IRQ affinity, TCP/UDP stack tweaks, hugepages
- AWS operations at scale – EKS, EC2, VPC, NLB/ALB, Auto Scaling, multi-AZ fail-over, cost & quota management
- Infrastructure as Code / GitOps – Terraform (modular state)
- CI/CD pipelines – GitLab CI or Jenkins; blue-green / canary deploys, sub-2-minute rollbacks, latency smoke-test gates
- Observability – Prometheus + Grafana, Alertmanager, high-cardinality metrics, centralized log aggregation, eBPF tracing for µs-level hotspots
- High-throughput messaging – Kafka cluster operations (partition strategy, ISR tuning, < 3 ms end-to-end), Nginx WebSocket termination
- Trading-grade networking – ENA/SR-IOV, packet-loss analysis, security-group hardening
- Performance & reliability engineering – perf, FlameGraph, chaos/load testing, p95/p99 latency SLO ownership
- Automation & scripting – Python or Go for tooling, incident remediation, environment bootstrap
- Bonus – Rust/Go code familiarity, CNCF/AWS certifications, XDP/DPDK experience for kernel-bypass networking
Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.
By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.
Originally posted on Himalayas