Polygon
Site Reliability Engineer
LATAM | Full-time | Global
Junior | Remote
Job Description
[AI-summarized by JobStash]
You will operate and support production infrastructure that powers large-scale blockchain networks. You will monitor systems, respond to incidents, follow and improve runbooks, and perform routine operational tasks such as restarts, upgrades, and configuration changes. You will help maintain and improve monitoring, logging, and alerting systems, participate in post-incident reviews, and continuously build knowledge of distributed systems and networking.
Requirements
- Foundational understanding of Linux systems, processes, and basic networking concepts
- Familiarity with at least one scripting or programming language such as Python, Bash, or Go
- Interest in site reliability, monitoring, and operating production infrastructure
- Clear written and verbal communication skills and willingness to learn
- Ability to remain calm, methodical, and responsive during incidents or operational events
- Exposure to cloud platforms such as AWS or GCP
- Familiarity with containerization or orchestration technologies including Docker or Kubernetes
- Basic understanding of blockchain or Web3 concepts such as nodes, RPC services, or validators
- Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, or ELK-based stacks
Responsibilities
- Monitor production systems, alerts, dashboards, and logs across networks including PoS and the Agglayer
- Assist with incident detection, triage, escalation, and resolution under guidance
- Support on-call and operational coverage through structured rotations
- Follow, maintain, and improve runbooks and standard operating procedures
- Perform routine operational tasks such as service restarts, upgrades, and configuration changes
- Maintain and improve monitoring, logging, and alerting systems including dashboards for network health, RPC performance, and node metrics
- Improve alert signal quality and reduce operational noise
- Support cloud-based and containerized infrastructure, including nodes, RPC endpoints, and supporting services
- Collaborate with protocol, product, and cross-functional teams to understand production issues and user impact
- Participate in post-incident reviews and contribute to root-cause analysis documentation
- Continuously build knowledge of blockchain fundamentals, distributed systems, and networking
Benefits & Perks
- Remote-first global workforce
- Medical, dental, and vision health insurance
- 401(k) with 3% company match
- $1,500 home office setup allowance (lifetime max)
- $75 monthly internet or phone reimbursement
- Flexible time off
- Company-issued laptop
- Egg freezing, mental health, and employee wellness benefits
Tech Stack
RPC, containerization, Grafana, Prometheus, Datadog, ELK, Python, GCP, on-call, Bash