Senior DevOps Engineer - Applied AI Engineering Group
Dream Security
- AI
- Tel Aviv
- Senior
- Full-time
Description
At Dream, we redefine cyber defense by combining AI and human expertise to create products that protect nations and critical infrastructure. This is more than a job; it’s a Dream job. Dream is where we tackle real-world challenges, redefine AI and security, and make the digital world safer. Let’s build something extraordinary together.
Dream's AI cybersecurity platform applies a new, out-of-the-ordinary, multi-layered approach, covering endless and evolving security challenges across the entire infrastructure of the most critical and sensitive networks. Central to Dream's proprietary Cyber Language Models are innovative technologies that provide contextual intelligence for the future of cybersecurity.
At Dream, our talented team, driven by passion, expertise, and innovative minds, inspires us daily. We are not just dreamers, we are dream-makers.
The Dream Job
It starts with you - an engineer driven to build resilient, automated infrastructure that enables teams to move fast with confidence. You care about operational excellence, developer experience, and reliability at scale. You’ll architect and operate the compute and networking infrastructure that powers our AI platform - from CI/CD pipelines to Kubernetes clusters to observability systems - across cloud and on-prem environments.
If you want to build infrastructure that powers mission-critical AI systems at national scale, join Dream’s mission - this role is for you.
The Dream-Maker Responsibilities
- Architect and operate Kubernetes-based infrastructure across AWS and on-prem environments, ensuring high availability, security, and performance.
- Design and maintain CI/CD pipelines for application and service deployments with automated testing, security scanning, and rollback capabilities.
- Drive infrastructure-as-code practices for compute and networking - building reproducible, auditable, and version-controlled infrastructure.
- Own reliability and incident response - establish SLOs, build alerting systems, lead incident resolution, and drive post-incident improvements.
- Enable AI-native operations - support agentic deployment pipelines, self-healing infrastructure, and secure sandboxing for model experimentation.
- Build and maintain observability systems - metrics, logging, tracing, and dashboards that provide visibility into system health.
- Optimize infrastructure cost and performance - right-size resources, implement auto-scaling, and identify efficiency opportunities.
- Collaborate with Engineering, Data Platform, Data Engineering, and Security teams to align infrastructure with platform needs.
- Shape infrastructure characteristics that support data freshness, correctness, and low-latency pathways for AI training/inference, retrieval, and agentic workflows.
- Contribute paved-road tooling - reusable CI/CD patterns for services, IaC modules for compute and networking, and runbooks - that streamline delivery across teams.
- Collaborate with Engineering, Data Platform, Data Engineering, Security, Product, AI/ML, Data Science, and Analytics to anticipate and meet cross-functional needs.
The Dream Skill Set
- 6+ years in DevOps, SRE, or infrastructure engineering, with hands-on experience building and operating infrastructure at scale.
- Container orchestration - Kubernetes (EKS, on-prem), Helm, service mesh technologies like Istio or Linkerd
- Cloud & infrastructure - AWS services (EC2, EKS, S3, IAM, VPC, Lambda), hybrid cloud architectures, on-prem infrastructure
- Infrastructure-as-Code - Terraform, Pulumi, or CloudFormation; GitOps practices with ArgoCD or Flux
- CI/CD - GitHub Actions, GitLab CI, Jenkins, or similar; artifact management, deployment strategies (blue-green, canary)
- Observability - Prometheus, Grafana, ELK/OpenSearch, Datadog, or similar; distributed tracing, log aggregation, alerting
- Security & compliance - Secrets management (Vault, AWS Secrets Manager), network security, compliance automation
- Scripting & automation - Python, Bash, Go; configuration management with Ansible or similar
Never Stop Dreaming...
If you think this role doesn't fully match your skills but you are eager to grow and break glass ceilings, we’d love to hear from you!