LanceDB is a developer-friendly, open-source data lake for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application, and powers some of the most groundbreaking applications and challenging requirements today.
We're seeking a seasoned Cloud Infrastructure Engineer with deep expertise in automation, infrastructure-as-code (IaC), and cloud platform management. You'll design, deploy, and maintain robust cloud environments while collaborating with cross-functional teams to streamline CI/CD pipelines, enhance system reliability, and drive operational excellence.
As a Cloud Infrastructure Engineer at LanceDB, your responsibilities will include:
Design & Build Cloud Infrastructure: Architect and manage secure, scalable cloud environments (AWS, Azure, GCP) using IaC tools like Terraform and CloudFormation.
Automate Everything: Develop and maintain automation scripts to streamline deployments, monitoring, and system operations.
Systems Reliability: Implement monitoring/alerting solutions (Prometheus, Grafana, Datadog) to proactively address performance bottlenecks and ensure 99.9% uptime.
Security & Compliance: Enforce security policies, manage secrets (Vault, AWS KMS), and ensure compliance with industry standards (GDPR, SOC2).
Troubleshoot & Optimize: Resolve complex infrastructure issues and lead cost-optimization initiatives for cloud resources.
Collaborate & Mentor: Partner with software engineering teams to integrate DevOps practices into the SDLC and mentor junior engineers on IaC and cloud best practices.
5+ years in DevOps, Cloud Infrastructure, or SRE roles, with hands-on experience in public cloud platforms (AWS, Azure, GCP, Heroku).
Expertise in IaC tools (Puppet, Terraform, Ansible, CloudFormation) and configuration management.
Experience designing and managing complex production environments using Kubernetes and Helm.
Deep understanding of networking, security, and cloud architecture best practices.
Experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk).
Strong knowledge of CI/CD tools (GitHub Actions) and containerization (Docker, Kubernetes).
You like working with a small, high-caliber team with a lot of autonomy and drive, and you can iterate fast.
You've made substantial contributions to open-source projects (e.g., Puppet modules, Terraform providers).
You design and automate single-command deployments for complex, globally distributed systems to ensure consistency, reliability, and scalability across multi-cloud or hybrid environments.
You fearlessly challenge the status quo and dismiss mediocre engineering as unacceptable.
You have worked on large-scale distributed systems and have a good understanding of how to use tracing tools to identify bottlenecks.
LanceDB was created by experts with decades of experience building tools for data science and machine learning. From co-authors of pandas to Apache PMC members of HDFS, Arrow, Iceberg, and HBase, the LanceDB team has created open-source tools used by millions worldwide.