Remote, IL 60606 US
As a member of our Data Infrastructure team, you’ll help us build and maintain tools and infrastructure to write, test, and schedule batch data pipelines. Your work will enable other developers, data scientists, and analysts to write the high-performing pipelines that power data science, machine learning, and product development.
We primarily write in Scala, Java, and Python and use technologies like Hadoop, Spark, Airflow, Terraform, and Kubernetes, as well as GCP services like Dataproc, Dataflow, and BigQuery. Our team is headquartered in Brooklyn, but we have a remote-first culture and encourage remote applicants; we have no time zone preference.
What’s this team like?
- We build highly performant systems that are maintainable and easy to understand by selecting and integrating the best current technologies.
- We develop robust, highly available, well-monitored data infrastructure.
- We stay in close communication with our internal customers and make strategic improvements so that those who depend on us have a great experience using data.
What does the day-to-day look like?
- You should have experience building data processing platforms, supporting them at scale, and collaborating with other teams that depend on them.
- Experience building applications and managing infrastructure using one of the major cloud providers is preferred but not required.
- We value curiosity, passion, responsibility, and generosity of spirit.
Qualities that will help you thrive in this role are:
- You understand that being an effective software engineer is about communicating with people as much as it is about writing code.
- You are willing to work with and improve code you did not originally write.
- You are generous with your time and experience, and can mentor and learn from other engineers.
- You can tackle unconstrained problems and know when to seek help.
- You have strong domain expertise and backend skills (working in Scala, Java, and Python).
- You have experience with the following:
- Advantages and limitations of distributed systems
- Writing ETL pipelines
- Building and monitoring cloud services (GCP services preferred) and infrastructure
- Using or maintaining data processing environments like Hadoop, Spark, PySpark, and Dataflow
- Preference given to candidates with experience working with Airflow
- You will be considered a staff engineer / tech lead, with strong leadership and communication skills.
- You have experience collaborating with other engineers and product teams.