Polygon
Network Operations Engineer
LATAM · Full-time · Global
Remote
Active · Posted within the last 30 days
Job Description
[AI-summarized by JobStash]
You will serve as the front line of reliability for production infrastructure. You will detect and respond to incidents, triage alerts, coordinate incident response, and document decisions and outcomes in real time. You will also improve observability, refine alerting, build dashboards, and create runbooks to scale operational coverage. This is a shift-based role where you will validate system and user-facing functionality and support ecosystem participants during incidents.
Requirements
- Foundational experience with Linux systems, including filesystem navigation, log reading, and process awareness
- Understanding of core networking concepts such as DNS, HTTP, and TCP/IP, and the ability to troubleshoot connectivity issues
- Basic scripting ability (Python, Bash) to automate tasks and analyze system data
- Exposure to monitoring and observability tools such as Datadog, Grafana, or Prometheus
- Strong written communication skills for producing clear incident documentation and procedures under pressure
- Willingness to work shift-based, follow-the-sun schedules, and a structured approach to troubleshooting
- Familiarity with blockchain infrastructure, including node operation or EVM-based systems (preferred)
- Experience with Datadog or similar observability platforms in production (preferred)
- Exposure to infrastructure-as-code tools such as Terraform or configuration management tools such as Ansible (preferred)
- Previous experience in a network operations center, incident response team, or on-call rotation (preferred)
- Experience working in a remote, globally distributed team (preferred)
Responsibilities
- Monitor the health and performance of blockchain networks, bridges, RPC services, staking systems, and user-facing products
- Track third-party dependencies and identify degradation that may impact the ecosystem
- Validate and triage alerts by distinguishing signal from noise, assessing severity, and determining impact
- Escalate confirmed issues to the appropriate SRE or engineering teams with clear, structured context
- Coordinate incident response by engaging stakeholders, maintaining timelines, and ensuring consistent communication
- Document incidents in real time, including decisions, actions, and outcomes
- Build and improve dashboards, alerting systems, and monitoring coverage to enhance visibility
- Create and maintain runbooks for common failure modes and triage workflows
- Support validators and infrastructure providers when issues intersect with their systems
- Validate user-facing product functionality during incidents
Benefits & Perks
- Remote-first global workforce
- Medical insurance
- Dental insurance
- Vision insurance
- 401(k) with a 3% company match (United States employees only)
- $1,500 home office setup allowance (lifetime maximum)
- $200 annual AI allowance
- $75 monthly internet or phone reimbursement
- Flexible time off
- Company-issued laptop
- Egg freezing benefits
- Mental health and employee wellness benefits
Tech Stack
HTTP, RPC, Grafana, Prometheus, Bash, Datadog, on-call, DNS, Terraform, Linux