Head of Data Engineering


About Us:

SelfDecode is a well-funded biotech startup in the personalized health space. We build software to help interpret people's genetics, lab tests, and symptoms in order to give personalized health recommendations. Our primary goal is to give people the tools they need to live healthier and better lives.

  • We are a flat organization and prioritize efficiency.
  • We work as a team and every input and suggestion is taken into account, no matter who it comes from.
  • We thrive on open communication and dedication. We are a meritocracy: people who demonstrate strong skills can move up in the organization quickly and earn raises.
  • We expect people to work full-time without side gigs.
  • We expect applicants to commit to a long-term relationship with our company.
  • We expect employees to be proactive and autonomous.
  • We do not micromanage.
  • Dishonesty is not tolerated at all, and we thrive on trust.
  • When you're working, we expect you to work.
  • We emphasize skills & abilities rather than formal education.

Job Description:

As Head of Data Engineering, you will lead a team of software engineers responsible for the design, development, and operation of all back-end services. This includes data integration and ingestion, processing, and the application of machine learning models to large, complex biological data sets. Our product engineering teams use these back-end services to build and deliver cutting-edge genomic analysis to our customers. We have decided to modernize and re-architect our core data platform, moving from a batch-based processing system to a continuous, event-based streaming system. The core technologies of the platform will include AWS, Airflow, HDFS/Hadoop, Spark, Kafka, NoSQL (HBase/Cassandra), and ClickHouse. We primarily use Python.

The role reports to the CTO, and you will have the opportunity to make a significant impact on the company's success at a critical stage in our growth. This is an incredible opportunity to work on real-time data processing and the use of artificial intelligence at scale. You will play an active role in leading the design, development, and deployment of our next-generation platform, which is critical to our success.

We are looking for a strong engineering leader who is a strategic and innovative problem-solver, someone with a passion for applying technology to solve real-world customer problems at scale. The ideal candidate is passionate about building high-performance teams focused on quality and innovation, and demonstrates excellent organizational and communication skills when working with other engineers and leaders throughout the company.

The Role Is: 

  • Full-time
  • Fully remote
  • No agencies
  • European working hours
  • Salary: $140k – $200k USD per year. Equity is also available.

Responsibilities:

  • Lead and develop a team of talented data/software engineers to design, plan, develop, and deploy improvements to back-end platform services related to data ingestion, data processing, and analytics.
  • Create a culture of working with big and sensitive data.
  • Design the architecture and then lead the implementation of scalable data processing systems.
  • Plan the development of a data platform as a SaaS product.
  • Collaborate broadly across the organization and with senior leadership to drive team and individual performance focused on clear outcomes and team OKRs.
  • Evaluate resource costs, determine the composition of the required team and the top-level roadmap, and perform project risk assessments.
  • Foster the adoption of best engineering practices across all aspects of software development to build, deploy, test, and release large scale services with quality and agility, while maintaining our current platform to continue to meet customer commitments.
  • Facilitate the overall technology strategy and the quarterly and yearly goals, drive engineering best practices, and take ownership of delivering on core outcomes.

Required Skills & Experience:

  • 7+ years of extensive experience in Data technologies across streaming and batch-oriented realms, cutting across data acquisition, storage, processing, and consumption patterns in operational and analytical domains, as well as expertise in cloud-related data services (AWS / Azure / GCP).
  • 5+ years leading highly technical, high-performing engineering teams, with experience in people management (hiring and layoffs) and performance management (coaching and mentoring). Proven track record of leading the technical architecture, design, and delivery of complex Big Data and Cloud Data solutions (AWS, Azure, GCP) across multiple projects to solve problems at scale, especially distributed data platforms (Hadoop/Kafka).
  • Expert in distributed data processing frameworks like Spark, Storm, and Flink, and in columnar storage formats like Parquet, across batch and streaming realms; expert in programming languages, preferably Scala, with Python secondary; and expert in distributed messaging/streaming frameworks like Kafka, Pulsar, Google Pub/Sub, Azure Event Hubs, and AWS Kinesis.
  • Experience with NoSQL databases (Cassandra/HBase/MongoDB/Elasticsearch/Neo4j) and scalable analytical data stores like Snowflake, BigQuery, Redshift, and Teradata.
  • Professional experience with workflow management (Nextflow, Snakemake, Airflow, etc.).
  • Deep knowledge of scalable data models, queries, and operations that address various consumption patterns, including random-access and sequential-access, and necessary optimisations like bucketing, aggregating, and sharding.
  • Experience in performance-tuning, optimization, and scaling solutions from a storage/processing standpoint.
  • Experience setting up data engineering practices across architecture, design, coding, quality assurance, and deployment, using industry-standard DevOps practices for CI/CD and leveraging tools like Jenkins/Bamboo, Maven, JUnit, SonarQube, Terraform (one-click infrastructure setup), Kubernetes, and containerisation.
  • Solid understanding of Data Governance, Data Security, Data Cataloguing, and Data Lineage concepts (experience with tools like Collibra in these areas is preferred).
  • Passion for recruiting, developing, mentoring, and retaining a world-class engineering team.
  • Lean-thinking mindset, comfortable with Agile planning and estimation rituals, flexible, and able to thrive in a fast-paced, innovative young company.
  • Excellent written and verbal English-language communication skills, with the ability to adapt the level of detail to various audiences, and able to concisely explain technical concepts to business stakeholders.

Plusses:

  • Knowledge of statistical techniques
  • Bioinformatics knowledge
  • Strong math ability

Your Time Zone:

  • Any

Important: Share your LinkedIn profile. Having an up-to-date LinkedIn profile will make you a more competitive applicant! If you're up for the challenge, then we invite you to apply!

Hiring Process:

Our typical hiring process looks like this:

  • Round 1: One short logic puzzle which should take less than 30 minutes, in place of an interview
  • Round 2: If you pass the round 1 test and are selected to proceed, there will be a data engineering test (60 minutes) and an AWS test (12 minutes)
  • Round 3: If you pass the round 2 tests, there are typically one to two interviews

Our hiring process is heavily focused on screening tests instead of interviews. Instead of the three to five interviews that are typical at many companies, we typically have just the screening tests and one to two interviews.

You will be prompted to take the round 1 logic puzzle as soon as you submit your application. Again, it should take less than 30 minutes to complete.

Note: You must pass the initial screening test in order to be considered for employment, so please be sure to fully complete it within 7 days of starting.


Questions? You can email us at recruiting@selfdecode.com.