QuickNode
Technical Operations Engineer
Canada, Full-time, Global
📊 Senior 🏠 Remote
Job Description
[AI-summarized by JobStash]
You will ensure the stability, reliability, and performance of production systems. You will lead deployment and optimization of blockchain networks, troubleshoot complex Web3 incidents with log and JSON-RPC analysis, and coordinate with ecosystem partners. You will build and maintain monitoring and alerting solutions, define and enforce SLOs and SLAs, and implement automation using tools like Ansible, Terraform, and Kubernetes. You will collaborate with support, infrastructure, and development teams, and participate in a rotating 24/7 on-call schedule to address critical incidents and perform post-incident analysis.
Requirements
- Minimum of 5 years in Technical Operations, Site Reliability Engineering, or related roles
- Proven Linux/Unix system administration and advanced troubleshooting capabilities
- Deep experience managing complex Web3 infrastructures, including RPC services, validator setups, and node operations
- Hands-on experience with Helm, Terraform, Ansible, and Consul
- Containerization experience with Docker and Kubernetes
- Competency in Python, Go, and JavaScript
- Proficiency in monitoring and analytics platforms such as Grafana and DataDog
- Experience defining, measuring, and maintaining SLAs and SLOs, and using incident response tooling like PagerDuty
- Ability to perform benchmarking, capacity and cost modeling, and root cause analysis
- Strong interpersonal and communication skills
Responsibilities
- Lead blockchain network deployments and optimization
- Resolve complex Web3 incidents through troubleshooting and log analysis
- Develop and maintain monitoring and alerting solutions using Grafana and DataDog
- Define, implement, and enforce SLOs and SLAs
- Implement and maintain automation with Ansible, Terraform, and Kubernetes
- Collaborate with support, infrastructure, and development teams on system improvements
- Participate in a rotating 24/7 on-call schedule to address critical incidents
Benefits & Perks
- Quarterly bonus tied to company and individual goal achievement
Tech Stack
Linux, node, Kubernetes, Docker, incident response, benchmarking, validator, Python, monitoring