Site Reliability Engineer
The BitMEX Infrastructure team sits at the core of the business and is responsible for the reliability and scalability of all the services that power the platform and its developers. In only a few years, BitMEX became the leading crypto-products trading platform worldwide, and it now handles tens of thousands of low-latency transactions per second, representing several billion dollars traded every day. We specialize in systems, whether it be networking, the Linux kernel, or some more specific interest in scaling, algorithms, or distributed systems.
Responsibilities:
- Be on a pager rotation to respond to BitMEX availability incidents and support service engineers with customer incidents.
- Run our infrastructure with Chef, Terraform and Kubernetes.
- Make monitoring and alerting trigger on symptoms rather than on outages.
- Document every action so findings turn into repeatable actions, and then into automation.
- Improve the deployment process to make it as boring as possible.
- Design, build and maintain core infrastructure pieces that allow BitMEX to scale to support hundreds of thousands of concurrent users.
- Debug production issues across services and levels of the stack.
- Plan the growth of BitMEX’s infrastructure.
About You:
- Think about systems: edge cases, failure modes, behaviors, specific implementations.
- Have experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies
- 6+ years of professional experience, with a proven track record of designing, implementing, managing, and testing infrastructure at scale on AWS for high-value environments
- 3+ years of professional experience, with a proven track record of designing, implementing, managing, and testing infrastructure at scale for high-value environments
- Strong engineering skill set with a firm grasp of fundamental Computer Science principles and a modular, maintainable, agile & test-driven approach to software development
- Capacity to multitask and give equal attention to a variety of functions while under pressure
- Strong technical troubleshooting, diagnostic and problem-solving skills
- Ability to adapt to changing priorities within a fast-moving industry and startup culture
Projects you could work on:
- Code infrastructure automation with Chef and Terraform
- Improve our Prometheus monitoring or build new metrics
- Plan, prepare for, and execute the migration of virtual machines running on AWS to cloud-native container-based deployments with Kubernetes
- Develop a relationship with a product group, define their SLAs, and improve their reliability

