Deploying a Scalable Monitoring Stack Lab on AWS using Terraform and Ansible

Introduction

Effective monitoring is a cornerstone of cloud infrastructure management, ensuring high availability and performance. This guide provides a professional walkthrough on deploying Prometheus, Grafana, and Node Exporter on AWS using Terraform for infrastructure provisioning and Ansible for configuration management.

This lab creates a Prometheus server and a Grafana server, and installs Node Exporter on both. You should then be able to see the metrics in Grafana; a Node Exporter dashboard is already installed for you.

The diagram below gives an idea of what the architecture will look like:



If you want to replicate this lab, you can find the complete code repository here: GitHub - MireCloud Terraform Infra


Infrastructure Setup with Terraform

1. Creating a Dedicated VPC

To ensure isolation, we define a VPC named Monitoring with a CIDR block of 10.0.0.0/16.

resource "aws_vpc" "Monitoring" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "Monitoring"
  }
}
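The snippets in this guide assume an AWS provider targeting us-east-1 (matching the availability zone used for the subnet below); a minimal provider block sketch:

```hcl
provider "aws" {
  region = "us-east-1"
}
```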

2. Defining the Subnet

A subnet is created within this VPC to host monitoring components.

resource "aws_subnet" "Monitoring-subnet" {
  vpc_id            = aws_vpc.Monitoring.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
  map_public_ip_on_launch = true
  tags = {
    Name = "Monitoring-subnet"
  }
}
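Because the subnet maps public IPs and Ansible will connect over SSH from outside the VPC, the VPC also needs an internet gateway and a public route. A minimal sketch (the resource names Monitoring-igw and Monitoring-rt are illustrative, not taken from the repository):

```hcl
resource "aws_internet_gateway" "Monitoring-igw" {
  vpc_id = aws_vpc.Monitoring.id
  tags = {
    Name = "Monitoring-igw"
  }
}

resource "aws_route_table" "Monitoring-rt" {
  vpc_id = aws_vpc.Monitoring.id
  # Default route sends all non-local traffic out through the internet gateway
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.Monitoring-igw.id
  }
}

resource "aws_route_table_association" "Monitoring-rta" {
  subnet_id      = aws_subnet.Monitoring-subnet.id
  route_table_id = aws_route_table.Monitoring-rt.id
}
```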

3. Deploying EC2 Instances for Prometheus and Grafana

Each instance is assigned a dedicated network interface within the monitoring subnet.

Network Interfaces

resource "aws_network_interface" "prometheus" {
  subnet_id   = aws_subnet.Monitoring-subnet.id
  private_ips = ["10.0.1.100"]
  security_groups = [aws_security_group.prometheus-security-group.id] 
  tags = {
    Name = "prometheus_network_interface"
  }
}

resource "aws_network_interface" "grafana" {
  subnet_id   = aws_subnet.Monitoring-subnet.id
  private_ips = ["10.0.1.101"]
  security_groups = [aws_security_group.grafana-security-group.id] 
  tags = {
    Name = "grafana_network_interface"
  }
}
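The network interfaces reference two security groups that must also be defined. A hedged sketch of what the Prometheus one might allow (the exact rules depend on the repository; the ports shown are the standard SSH, Prometheus, and Node Exporter ports):

```hcl
resource "aws_security_group" "prometheus-security-group" {
  vpc_id = aws_vpc.Monitoring.id

  # 22 = SSH (Ansible), 9090 = Prometheus UI, 9100 = Node Exporter scrapes
  dynamic "ingress" {
    for_each = [22, 9090, 9100]
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }

  # Allow all outbound traffic (package downloads, scrapes)
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

The grafana-security-group would mirror this, with Grafana's port 3000 in place of 9090.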

Provisioning EC2 Instances

resource "aws_instance" "prometheus" {
  ami           = "ami-04b4f1a9cf54c11d0"
  instance_type = "t2.micro"
  network_interface {
    network_interface_id = aws_network_interface.prometheus.id
    device_index         = 0
  }
  key_name = aws_key_pair.Monitoring_key.key_name  
  tags = {
    Name = "Prometheus"
  }
}

resource "aws_instance" "grafana" {
  ami           = "ami-04b4f1a9cf54c11d0"
  instance_type = "t2.micro"
  network_interface {
    network_interface_id = aws_network_interface.grafana.id
    device_index         = 0
  }
  key_name = aws_key_pair.Monitoring_key.key_name  
  tags = {
    Name = "Grafana"
  }
}
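Both instances reference aws_key_pair.Monitoring_key for SSH access. It can be declared from an existing public key on the machine running Terraform (the file path here is illustrative):

```hcl
resource "aws_key_pair" "Monitoring_key" {
  key_name   = "Monitoring_key"
  public_key = file("~/.ssh/id_rsa.pub") # path is illustrative
}
```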

Dynamic Inventory for Ansible

A dynamic inventory script is used to fetch real-time information about running EC2 instances and generate an inventory file for Ansible.

Dynamic Inventory Script

import boto3
import json

# Query all running EC2 instances in the region
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.describe_instances(Filters=[{"Name": "instance-state-name", "Values": ["running"]}])

inventory = {"all": {"hosts": {}}}

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        # Tags come back in no guaranteed order, so look up the Name tag by key
        # instead of assuming it is the first element of the list
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        instance_name = tags.get("Name", instance["InstanceId"])
        public_ip = instance.get("PublicIpAddress", "")
        if public_ip:
            inventory["all"]["hosts"][instance_name] = {
                "ansible_host": public_ip,
                "ansible_user": "ubuntu",
                "ansible_ssh_private_key_file": "/home/asd/.ssh/github"
            }

with open("inventory.json", "w") as f:
    json.dump(inventory, f, indent=4)

This script ensures that Ansible dynamically retrieves IP addresses of all running EC2 instances, making the deployment process seamless and automated.
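One subtlety worth calling out: AWS returns instance tags in no guaranteed order, so relying on list position to find the Name tag is fragile. An order-independent lookup can be sketched as follows (the helper name name_tag is illustrative, not part of the repository):

```python
def name_tag(instance, default=None):
    """Return the value of the Name tag, regardless of tag order."""
    # Scan the Tags list for the entry whose Key is "Name"
    for tag in instance.get("Tags", []):
        if tag.get("Key") == "Name":
            return tag["Value"]
    return default

# Tags deliberately listed with Name second, to show order does not matter
instance = {"Tags": [{"Key": "env", "Value": "monitoring"},
                     {"Key": "Name", "Value": "Prometheus"}]}
print(name_tag(instance))  # Prometheus
```

Falling back to the instance ID (as the inventory script does) keeps the inventory usable even for untagged instances.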


Configuration with Ansible

Once the infrastructure is provisioned, we configure Prometheus, Grafana, and Node Exporter using Ansible playbooks.

1. Prometheus Playbook

- hosts: Prometheus
  name: Install Prometheus
  roles:
    - prometheus.prometheus.prometheus
  vars:
    prometheus_targets:
      node:
        - targets:
            - "{{ hostvars['Prometheus']['ansible_host'] }}:9100"
            - "{{ hostvars['Grafana']['ansible_host'] }}:9100"
          labels:
            env: monitoring

2. Node Exporter Installation

- hosts: all
  name: Install Node Exporter
  roles:
    - prometheus.prometheus.node_exporter

3. Grafana Playbook

- hosts: Grafana
  name: Install Grafana
  roles:
    - grafana.grafana.grafana
  vars:
    grafana_security:
      admin_user: "admin"
      admin_password: "Password123#"
    grafana_datasources:
      - name: prometheus
        type: prometheus
        url: "http://{{ hostvars['Prometheus']['ansible_host'] }}:9090"
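The Node Exporter dashboard mentioned in the introduction is provisioned alongside the data source. The grafana.grafana.grafana role (migrated from the cloudalchemy role) supports importing dashboards from grafana.com; a hedged sketch of the variable, with the revision number illustrative (check the role's documentation for the exact shape):

```yaml
    grafana_dashboards:
      - dashboard_id: 1860      # "Node Exporter Full" on grafana.com
        revision_id: 37         # revision number is illustrative
        datasource: prometheus
```

For anything beyond a lab, the hardcoded admin password should also be moved into Ansible Vault rather than committed in plain text.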

Conclusion

By leveraging Terraform for infrastructure provisioning and Ansible for configuration management, we achieve a scalable, automated monitoring setup on AWS. The inclusion of a dynamic inventory enhances flexibility by allowing real-time updates of server information without manual intervention.

🚀 Next Steps: Expand this setup with Loki for log management.

🔗 Find the complete code here: GitHub - MireCloud Terraform Infra
