TRM Labs
Senior Data Engineer, Data Lakehouse Infrastructure
North America | Full-time | Global
USD 190,000 - 220,000/yr
Mid | Remote
Job Description
[AI-summarized by JobStash]
You will design, implement, and scale a modern data lakehouse to support complex analytical and real-time workloads. You will own data modeling, ingestion, metadata management, and query performance optimization. You will build and orchestrate ETL and streaming pipelines, implement open table formats and governance, and create automation and observability for operational reliability.
Requirements
- 5+ years of experience in data or software engineering focused on distributed data systems
- Proven experience building and scaling data platforms on GCP
- Strong command of query engines such as Trino, Presto, Spark, or Snowflake
- Experience with table formats like Apache Hudi, Iceberg, or Delta Lake
- Proficient programming skills in Python and strong SQL or SparkSQL abilities
- Hands-on experience with Airflow and GCP-native orchestration and streaming services
Responsibilities
- Architect and scale a high-performance data lakehouse on GCP
- Design, build, and optimize distributed query engines such as Trino, Spark, or Snowflake
- Implement metadata management using open table formats like Iceberg or Hudi
- Develop and orchestrate ETL and ELT pipelines with Airflow, Spark, and GCP-native tools
- Build streaming and batch data pipelines using Dataflow and Kafka
- Optimize query performance and data modeling for analytical workloads
- Automate operational tasks including cluster scaling and self-serve infrastructure
- Implement observability and data discovery frameworks for governance
Benefits & Perks
- Equity plan participation
Tech Stack
Snowflake, Airflow, BigQuery, query optimization, ELT, StarRocks, Trino, Hudi, Iceberg, GCS