Atlas

Project Atlas is a production-grade infrastructure blueprint for deploying a secure Kubernetes cluster and a standalone observability stack on AWS. It is built on a 'Portless, Serverless-Storage, and Automated-Discovery' philosophy: no public ports, S3-backed persistence, and node discovery through SSM Parameter Store.

Technical Stack

Terraform, AWS EC2, AWS VPC, AWS IAM, AWS S3, AWS SSM, Kubernetes, Ansible, Docker, Flannel CNI, Prometheus, Grafana, Loki, Tempo, Linux, WSL

Executive Summary

Atlas automates the provisioning of a 3-node Kubernetes cluster (1 Controller, 2 Workers) and a dedicated Monitoring node on AWS. Architectural decisions prioritize a zero-trust perimeter, infrastructure reproducibility via IaC, and observability that stays available even when the Kubernetes cluster itself is down.

Security Architecture

The design follows a Zero-Public-Port model: SSH (port 22) is never opened on any node.

  • All instance access managed exclusively via AWS SSM Session Manager
  • Identity-Based Access using IAM roles — no long-term AWS keys stored on hosts
  • Private subnets only — no EC2 instances carry public endpoints
  • IAM roles scoped per workload (controller, worker, monitoring node)
  • All administrative access is fully auditable via AWS CloudTrail
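
The Zero-Public-Port rule above can be checked mechanically. The following is a minimal sketch, not the project's own tooling: it assumes the input has the shape of the `SecurityGroups` list returned by the EC2 DescribeSecurityGroups API, and the function name `find_public_ingress` is hypothetical.

```python
from typing import Any

# CIDR blocks that mean "reachable from anywhere on the internet".
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def find_public_ingress(security_groups: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (group_id, "protocol from-to") for every ingress rule open to the world.

    Assumption: each dict follows the EC2 DescribeSecurityGroups response shape
    (GroupId, IpPermissions, IpRanges, Ipv6Ranges).
    """
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
            if WORLD_CIDRS.intersection(cidrs):
                proto = rule.get("IpProtocol", "?")
                ports = f"{rule.get('FromPort', 'all')}-{rule.get('ToPort', 'all')}"
                findings.append((group["GroupId"], f"{proto} {ports}"))
    return findings
```

An empty result is what a Zero-Public-Port deployment should produce; any entry, such as a rule exposing port 22 to `0.0.0.0/0`, would indicate drift from the model.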

Infrastructure as Code

All cloud resources are provisioned and managed via Terraform, ensuring reproducibility and controlled change management across environments.

  • AWS VPC with private subnets, route tables, NACLs, and security groups
  • IAM roles and instance profiles scoped per node type
  • S3 buckets for serverless log and trace persistence
  • SSM Parameter Store auto-populated by Terraform and read by Ansible at run time, so no IP addresses are hardcoded
  • EC2 instances: 1 Controller, 2 Workers, 1 Monitoring node

Kubernetes Orchestration

Ansible playbooks automate the full Kubernetes cluster lifecycle, from OS hardening to cluster bootstrapping, in five sequential phases run by a single Master Orchestrator playbook.

  • Automated Kubernetes v1.29 installation across all nodes
  • Production-grade OS hardening: swap management and kernel network settings (net.bridge.bridge-nf-call-iptables)
  • Flannel CNI for internal cluster pod networking
  • site.yaml Master Orchestrator executes all 5 phases in sequence
  • Dynamic inventory powered by SSM Parameter Store — no hardcoded configuration
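
To illustrate how an SSM-backed dynamic inventory could work, here is a minimal sketch that turns a list of parameters into the JSON structure Ansible expects from an inventory script. Everything project-specific is an assumption: the `/atlas/nodes/<group>/<host>` naming scheme, the idea that each value holds a private address, and the function name `build_inventory` are hypothetical, not taken from the Atlas repository.

```python
import json
from typing import Any


def build_inventory(parameters: list[dict[str, str]],
                    prefix: str = "/atlas/nodes/") -> dict[str, Any]:
    """Group SSM parameters into an Ansible dynamic-inventory dictionary.

    Assumption: each parameter is a dict with "Name" and "Value" keys, as in
    the SSM GetParametersByPath response, and names look like
    <prefix><group>/<hostname>.
    """
    inventory: dict[str, Any] = {"_meta": {"hostvars": {}}}
    for param in parameters:
        name = param["Name"]
        if not name.startswith(prefix):
            continue  # ignore parameters outside the assumed namespace
        group, _, host = name[len(prefix):].partition("/")
        if not group or not host:
            continue  # skip names that do not match <group>/<hostname>
        inventory.setdefault(group, {"hosts": []})["hosts"].append(host)
        inventory["_meta"]["hostvars"][host] = {"ansible_host": param["Value"]}
    return inventory


def to_json(inventory: dict[str, Any]) -> str:
    """Serialize the inventory the way an inventory script would print it."""
    return json.dumps(inventory, indent=2, sort_keys=True)
```

In a real setup the `parameters` list would come from the SSM API (for example through boto3) rather than being passed in by hand; the point of the sketch is only that group membership and host addresses are derived at run time instead of being written into an inventory file.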

Observability Stack

A standalone monitoring node runs the LGTM+P stack, engineered to survive independently of the Kubernetes cluster. Logs and traces are persisted to S3, so retention is not limited by the node's local disk.

  • Loki: log aggregation and querying
  • Grafana: dashboarding and unified visualization
  • Tempo: distributed trace collection and storage
  • Prometheus: metrics scraping and alerting
  • S3-backed persistence for logs and traces (S3 is designed for 99.999999999% object durability)
  • Mock log/trace generators deployed via Ansible to validate the full end-to-end telemetry pipeline
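
As an illustration of what a mock log generator might send, the sketch below builds a request body in the JSON format accepted by Loki's push endpoint (`POST /loki/api/v1/push`). It is a sketch under assumptions: the source does not say how the Atlas generators deliver logs, and the function names and label values here are hypothetical.

```python
import json
import time
from typing import Optional


def build_loki_payload(lines: list[str], labels: dict[str, str],
                       now_ns: Optional[int] = None) -> dict:
    """Build a Loki push payload with one stream.

    Each entry is [timestamp, line], where the timestamp is a string holding
    Unix time in nanoseconds. Entries are spaced 1 ns apart to keep ordering.
    """
    start = time.time_ns() if now_ns is None else now_ns
    values = [[str(start + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def encode_payload(payload: dict) -> bytes:
    """Encode the payload as the UTF-8 JSON body of the HTTP request."""
    return json.dumps(payload).encode("utf-8")
```

A generator would POST the encoded body with `Content-Type: application/json` to the Loki instance on the monitoring node, and the lines should then be queryable in Grafana by the chosen labels.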

Access & Operations

Private, encrypted access to the monitoring dashboard is provided through a custom SSM tunnel utility — no bastion host required.

  • atlas-console.sh: SSM Port Forwarding tunnel exposing Grafana at localhost:3000
  • No jump server or bastion host — SSM fully replaces traditional SSH access patterns
  • Operational access logs captured in AWS CloudTrail for full auditability
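
The tunnel described above can be sketched as follows. This is not the contents of atlas-console.sh, which is not shown here; it only assembles the AWS CLI invocation for SSM port forwarding using the `AWS-StartPortForwardingSession` document. The function name and the example instance ID are hypothetical.

```python
import json


def port_forward_command(instance_id: str, remote_port: int = 3000,
                         local_port: int = 3000) -> list[str]:
    """Return the argv for an SSM port-forwarding session to one instance.

    The session document takes its ports as lists of strings.
    """
    params = {
        "portNumber": [str(remote_port)],       # port on the instance (Grafana)
        "localPortNumber": [str(local_port)],   # port opened on the workstation
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        "--parameters", json.dumps(params),
    ]
```

Passing the result to `subprocess.run(cmd, check=True)` would open the tunnel, provided the AWS CLI and the Session Manager plugin are installed and the caller's IAM identity is allowed to start sessions on that instance. Grafana would then be reachable at localhost:3000 without any inbound port on the node.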