Backend Software Engineer, Cloud Infrastructure

LanceDB
Full-time
Remote
United States
$160,000 - $250,000 USD yearly
Software / Technology / IT

About LanceDB

LanceDB is a developer-friendly, open-source data lake for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, and from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application. It powers some of today's most groundbreaking applications and meets some of their most challenging requirements.

About the role

We’re seeking a seasoned Cloud Infrastructure Engineer with deep expertise in automation, infrastructure-as-code (IaC), and cloud platform management. You’ll design, deploy, and maintain robust cloud environments while collaborating with cross-functional teams to streamline CI/CD pipelines, enhance system reliability, and drive operational excellence.

As a Cloud Infrastructure Engineer at LanceDB, your responsibilities will include:

  • Design & Build Cloud Infrastructure: Architect and manage secure, scalable cloud environments (AWS, Azure, GCP) using IaC tools like Terraform and CloudFormation.

  • Automate Everything: Develop and maintain automation scripts to streamline deployments, monitoring, and system operations.

  • Systems Reliability: Implement monitoring/alerting solutions (Prometheus, Grafana, Datadog) to proactively address performance bottlenecks and ensure 99.9% uptime.

  • Security & Compliance: Enforce security policies, manage secrets (Vault, AWS KMS), and ensure compliance with industry standards (GDPR, SOC 2).

  • Troubleshoot & Optimize: Resolve complex infrastructure issues and lead cost-optimization initiatives for cloud resources.

  • Collaborate & Mentor: Partner with software engineering teams to integrate DevOps practices into the SDLC, and mentor junior engineers on IaC and cloud best practices.

Requirements:

  • 5+ years in DevOps, Cloud Infrastructure, or SRE roles, with hands-on experience in public cloud platforms (AWS, Azure, GCP, Heroku).

  • Expertise in IaC tools (Puppet, Terraform, Ansible, CloudFormation) and configuration management.

  • Experience designing and managing complex production environments using Kubernetes and Helm.

  • Deep understanding of networking, security, and cloud architecture best practices.

  • Experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk).

  • Strong knowledge of CI/CD tools (GitHub Actions) and containerization (Docker, Kubernetes).

  • You like working with a small, high-caliber team with a lot of autonomy and drive, and you can iterate fast.

Nice to have:

  • You’ve made substantial contributions to open-source projects (e.g., Puppet modules, Terraform providers).

  • You design and automate single-command deployments for complex, globally distributed systems to ensure consistency, reliability, and scalability across multi-cloud or hybrid environments.

  • You fearlessly challenge the status quo and dismiss mediocre engineering as unacceptable.

  • You have worked on large-scale distributed systems and have a good understanding of how to use tracing tools to identify bottlenecks.

About the LanceDB team:

LanceDB was created by experts with decades of experience building tools for data science and machine learning. From co-authors of pandas to Apache PMC members of HDFS, Arrow, Iceberg, and HBase, the LanceDB team has created open-source tools used by millions worldwide.