
AI/ML Data Engineer
- On-site
- Austin, Texas, United States
- SNH.AI
Job description
SNH.AI is an early-stage startup at the forefront of agentic AI for operations functions. We are building a new category called the Autonomous Workforce Platform.
An Autonomous Workforce Platform delivers fully operational digital employees that perform entire job roles, not just tasks. We are building a world where every business can instantly scale its workforce with digital employees who work faster, scale instantly, and operate 24/7, without hiring, overhead, or human limitations.
We’re a highly collaborative, on-site team based in the heart of downtown Austin. Our founding team has a proven track record of taking SaaS businesses from 0 to 1, and we’re committed to solving hard problems, building straightforward products, and fostering a culture of trust, ownership, and optimism.
Why Join Us?
Be part of a foundational team building a product from the ground up
Solve meaningful problems that impact real industries and real people
Work closely with experienced operators and engineers who’ve done this before
Enjoy a collaborative, in-person culture in a vibrant Austin office
Key Responsibilities:
Design, build, and maintain robust data pipelines for ingestion, transformation, and storage of structured and unstructured data.
Collaborate with AI/ML teams to understand data requirements for model training, fine-tuning, and inference, enabling deployment of AI/LLM-powered solutions that automate complex workflows.
Implement scalable ETL processes to handle large volumes of data across multiple modalities (e.g., text, images, audio).
Work with data labeling teams and platforms to create high-quality annotated datasets.
Build and maintain feature stores, data lakes, and data warehouses optimized for AI use cases.
Collaborate cross-functionally to align engineering solutions with business goals, and ensure data lineage, governance, and quality across all stages of the data lifecycle.
Optimize data flows for performance, cost, and latency in both batch and real-time environments.
Navigate and integrate with legacy systems while ensuring compliance with regulatory standards. Partner with DevOps/MLOps teams to support model deployment pipelines and data monitoring infrastructure.
Job requirements
What We’re Looking For:
Startup Mindset & Initiative: Self-starter who thrives in fast-paced, ambiguous environments. Comfortable taking ownership and driving results without needing step-by-step instructions.
Technical Excellence: 3+ years of experience in data engineering, data infrastructure, or ML data pipelines. Experienced with structured, semi-structured, and unstructured data. Strong CS fundamentals and sound engineering judgment. Proficient in Python and SQL, with solid experience in distributed data processing tools such as Apache Spark, Kafka, Airflow, and dbt.
Strong Background in Data Science: Hands-on ETL and MLOps implementation. Experience building ETL pipelines and performing feature engineering. Experience working with cloud platforms (AWS, GCP, or Azure) and data services such as S3 or BigQuery.
Full-Stack Development Skills: Experience with REST APIs, web frameworks, and relational databases. Familiarity with ML workflows and AI frameworks (e.g., TensorFlow, PyTorch, Hugging Face, MLflow). Familiarity with our core tech stack is a plus: NodeJS / NestJS, Prisma, PostgreSQL, OpenAI / Gemini, Google Cloud Platform (GCP).
Collaborative Spirit: Strong communication skills and a team-oriented mindset. Able to brainstorm, problem-solve, and iterate with colleagues across functions.