AI/ML Data Engineer

  • On-site
    • Austin, Texas, United States
  • SNH.AI

Job description

SNH.AI is an early-stage startup at the forefront of agentic AI for operations functions. We are building a new category called the Autonomous Workforce Platform.

An Autonomous Workforce Platform delivers fully operational digital employees that perform entire job roles, not just tasks, with no hiring, HR burden, or human limitations.

We are building a world where every business can instantly scale its workforce with digital employees who work faster, scale on demand, and operate 24/7.

We’re a highly collaborative, on-site team based in the heart of downtown Austin. Our founding team has a proven track record of taking SaaS businesses from 0 to 1, and we’re committed to solving hard problems, building straightforward products, and fostering a culture of trust, ownership, and optimism.

Why Join Us?

  • Be part of a foundational team building a product from the ground up

  • Solve meaningful problems that impact real industries and real people

  • Work closely with experienced operators and engineers who’ve done this before

  • Enjoy a collaborative, in-person culture in a vibrant Austin office

Key Responsibilities:

  • Design, build, and maintain robust data pipelines for ingestion, transformation, and storage of structured and unstructured data.

  • Collaborate with AI/ML teams to understand data requirements for model training, fine-tuning, and inference, and to deploy AI/LLM-powered solutions that automate complex workflows.

  • Implement scalable ETL processes to handle large volumes of data across multiple modalities (e.g., text, images, audio).

  • Work with data labeling teams and platforms to create high-quality annotated datasets.

  • Build and maintain feature stores, data lakes, and data warehouses optimized for AI use cases.

  • Collaborate cross-functionally to align engineering solutions with business goals and to ensure data lineage, governance, and quality across all stages of the data lifecycle.

  • Optimize data flows for performance, cost, and latency in both batch and real-time environments.

  • Navigate and integrate with legacy systems while ensuring compliance with regulatory standards; partner with DevOps/MLOps teams to support model deployment pipelines and data monitoring infrastructure.

Job requirements

What We’re Looking For:

  • Startup Mindset & Initiative: Self-starter who thrives in fast-paced, ambiguous environments. Comfortable taking ownership and driving results without needing step-by-step instructions.

  • Technical Excellence: 3+ years of experience in data engineering, data infrastructure, or ML data pipelines. Experienced in handling structured, semi-structured, and unstructured data. Strong CS fundamentals and sound engineering judgment. Proficient in Python and SQL, with solid experience in distributed data processing tools such as Apache Spark, Kafka, Airflow, and dbt.

  • Strong Background in Data Science: Hands-on ETL and MLOps implementation experience, including building ETL pipelines and feature engineering. Experience working with cloud platforms (AWS, GCP, or Azure) and data services such as S3 or BigQuery.

  • Full-Stack Development Skills: Experience with REST APIs, web frameworks, and relational databases. Familiarity with ML workflows and AI frameworks (e.g., TensorFlow, PyTorch, Hugging Face, MLflow). Familiarity with our core tech stack is a plus: NodeJS / NestJS, Prisma, PostgreSQL, OpenAI / Gemini, Google Cloud Platform (GCP).

  • Collaborative Spirit: Strong communication skills and a team-oriented mindset. Able to brainstorm, problem-solve, and iterate with colleagues across functions.