Deploying a Scalable Monitoring Stack Lab on AWS using Terraform and Ansible

Introduction

Effective monitoring is a cornerstone of cloud infrastructure management, helping to keep services available and performant. This guide walks through deploying Prometheus, Grafana, and Node Exporter on AWS, using Terraform to provision the infrastructure and Ansible to manage the configuration.

This lab creates a Prometheus server and a Grafana server and installs Node Exporter on both. Once it is running, you can view the metrics in Grafana, where a Node Exporter dashboard is already installed for you.

The diagram below gives an overview of the architecture.



If you want to replicate this lab, you can find the complete code repository here: GitHub - MireCloud Terraform Infra


Infrastructure Setup with Terraform

1. Creating a Dedicated VPC

To ensure isolation, we define a VPC named Monitoring with a CIDR block of 10.0.0.0/16.

resource "aws_vpc" "Monitoring" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "Monitoring"
  }
}

2. Defining the Subnet

A subnet is created within this VPC to host monitoring components.

resource "aws_subnet" "Monitoring-subnet" {
  vpc_id            = aws_vpc.Monitoring.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
  map_public_ip_on_launch = true
  tags = {
    Name = "Monitoring-subnet"
  }
}
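Before running terraform apply, it can be worth sanity-checking the addressing plan offline. The sketch below uses Python's standard ipaddress module to confirm that the subnet sits inside the VPC and to count how many addresses are left for instances. It assumes AWS's documented rule that the first four addresses and the last address of every subnet are reserved; the constant names are just for illustration.

```python
import ipaddress

# CIDR blocks taken from the Terraform resources above
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNET_CIDR = ipaddress.ip_network("10.0.1.0/24")

def subnet_in_vpc(subnet, vpc):
    """True if every address of the subnet falls inside the VPC CIDR."""
    return subnet.subnet_of(vpc)

def usable_addresses(subnet):
    """Addresses left for ENIs once AWS reserves the first four and the last one."""
    return subnet.num_addresses - 5

if __name__ == "__main__":
    print(subnet_in_vpc(SUBNET_CIDR, VPC_CIDR))  # True
    print(usable_addresses(SUBNET_CIDR))          # 251
```

A /24 therefore leaves 251 usable addresses, which is plenty of room if you later add more monitored hosts to this subnet.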

3. Deploying EC2 Instances for Prometheus and Grafana

Each instance gets a dedicated network interface in the monitoring subnet, with a fixed private IP address and its own security group.

Network Interfaces

resource "aws_network_interface" "prometheus" {
  subnet_id   = aws_subnet.Monitoring-subnet.id
  private_ips = ["10.0.1.100"]
  security_groups = [aws_security_group.prometheus-security-group.id] 
  tags = {
    Name = "prometheus_network_interface"
  }
}

resource "aws_network_interface" "grafana" {
  subnet_id   = aws_subnet.Monitoring-subnet.id
  private_ips = ["10.0.1.101"]
  security_groups = [aws_security_group.grafana-security-group.id] 
  tags = {
    Name = "grafana_network_interface"
  }
}
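Because the interfaces pin static private IPs, each address must lie inside the subnet, must not be one of the addresses AWS reserves, and must not be used twice. Here is a small sketch of that check, again assuming AWS's rule that the first four addresses and the last address of a subnet are reserved; the STATIC_IPS mapping simply mirrors the two interfaces above.

```python
import ipaddress

SUBNET = ipaddress.ip_network("10.0.1.0/24")
# Static private IPs pinned on the ENIs in the Terraform above
STATIC_IPS = {"prometheus": "10.0.1.100", "grafana": "10.0.1.101"}

def is_assignable(ip, subnet):
    """True if ip is inside subnet and is not one of the five addresses AWS reserves."""
    addr = ipaddress.ip_address(ip)
    if addr not in subnet:
        return False
    reserved = {subnet.network_address + i for i in range(4)}  # .0 to .3
    reserved.add(subnet.broadcast_address)                      # last address
    return addr not in reserved

def check_static_ips(ips, subnet):
    """Return the names whose IP is unusable, plus a marker if any IP is assigned twice."""
    problems = [name for name, ip in ips.items() if not is_assignable(ip, subnet)]
    if len(set(ips.values())) != len(ips):
        problems.append("duplicate IP")
    return problems

if __name__ == "__main__":
    print(check_static_ips(STATIC_IPS, SUBNET))  # []
```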

Provisioning EC2 Instances

resource "aws_instance" "prometheus" {
  ami           = "ami-04b4f1a9cf54c11d0"
  instance_type = "t2.micro"
  network_interface {
    network_interface_id = aws_network_interface.prometheus.id
    device_index         = 0
  }
  key_name = aws_key_pair.Monitoring_key.key_name  
  tags = {
    Name = "Prometheus"
  }
}

resource "aws_instance" "grafana" {
  ami           = "ami-04b4f1a9cf54c11d0"
  instance_type = "t2.micro"
  network_interface {
    network_interface_id = aws_network_interface.grafana.id
    device_index         = 0
  }
  key_name = aws_key_pair.Monitoring_key.key_name  
  tags = {
    Name = "Grafana"
  }
}

Dynamic Inventory for Ansible

A Python script (using boto3) queries the EC2 API for running instances and generates an inventory file for Ansible from the results.

Dynamic Inventory Script

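import boto3
import json

# Query EC2 for all running instances in the lab's region
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.describe_instances(Filters=[{"Name": "instance-state-name", "Values": ["running"]}])

inventory = {"all": {"hosts": {}}}

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        # Use the value of the "Name" tag (e.g. "Prometheus", "Grafana") as the host name,
        # falling back to the instance ID when the tag is missing
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        instance_name = tags.get("Name", instance["InstanceId"])
        public_ip = instance.get("PublicIpAddress", "")
        if public_ip:  # Ansible connects over the public IP, so skip instances without one
            inventory["all"]["hosts"][instance_name] = {
                "ansible_host": public_ip,
                "ansible_user": "ubuntu",
                "ansible_ssh_private_key_file": "/home/asd/.ssh/github"
            }

# Write the inventory in Ansible's YAML inventory layout (all -> hosts -> host variables)
with open("inventory.json", "w") as f:
    json.dump(inventory, f, indent=4)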

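Running this script before each Ansible run regenerates inventory.json with the public IP addresses of all running EC2 instances, so IP addresses never have to be copied into Ansible by hand. Ansible's YAML inventory plugin accepts a .json file with this structure, so the file can be passed to Ansible with the -i inventory.json option. Note that the script picks up every running instance with a public IP in us-east-1, not only the two lab servers.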
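If you would rather have Ansible call the inventory directly instead of reading a generated file, the same logic can be packaged as an inventory script: an executable that prints JSON when Ansible invokes it with --list. The sketch below assumes that approach. The group name monitoring is a hypothetical choice, and all host variables are returned under _meta.hostvars so Ansible does not need to call the script once per host.

```python
#!/usr/bin/env python3
"""Sketch of an Ansible inventory script: prints JSON on --list instead of writing a file."""
import argparse
import json

SSH_KEY = "/home/asd/.ssh/github"  # key path from the script above; adjust to your own

def instance_name(instance):
    """Value of the Name tag, falling back to the instance ID."""
    for tag in instance.get("Tags", []):
        if tag["Key"] == "Name":
            return tag["Value"]
    return instance["InstanceId"]

def build_inventory(reservations, group="monitoring"):
    """Convert describe_instances reservations into Ansible's script-inventory JSON."""
    inventory = {group: {"hosts": []}, "_meta": {"hostvars": {}}}
    for reservation in reservations:
        for instance in reservation["Instances"]:
            public_ip = instance.get("PublicIpAddress")
            if not public_ip:
                continue  # not reachable over SSH from outside the VPC
            name = instance_name(instance)
            inventory[group]["hosts"].append(name)
            inventory["_meta"]["hostvars"][name] = {
                "ansible_host": public_ip,
                "ansible_user": "ubuntu",
                "ansible_ssh_private_key_file": SSH_KEY,
            }
    return inventory

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args()
    if args.host or not args.list:
        # Host variables are already returned under _meta, so per-host calls get {}
        print(json.dumps({}))
        return
    import boto3  # imported here so build_inventory can be used without AWS access
    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    print(json.dumps(build_inventory(response["Reservations"]), indent=2))

if __name__ == "__main__":
    main()
```

To use it, make the file executable (chmod +x) and pass its path to ansible-playbook with -i. The trade-off is that every Ansible run now calls the EC2 API, which keeps the inventory current at the cost of needing AWS credentials wherever Ansible runs.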


Configuration with Ansible

Once the infrastructure is provisioned, we configure Prometheus, Grafana, and Node Exporter using Ansible playbooks.

1. Prometheus Playbook

- hosts: Prometheus
  name: Install Prometheus
  roles:
    - prometheus.prometheus.prometheus
  vars:
    prometheus_targets:
      node:
        - targets:
            - "{{ hostvars['Prometheus']['ansible_host'] }}:9100"
            - "{{ hostvars['Grafana']['ansible_host'] }}:9100"
          labels:
            env: monitoring
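Once the play has run, one way to confirm that both Node Exporters are being scraped is Prometheus's HTTP API endpoint /api/v1/targets. The sketch below assumes the role's default scrape configuration, which defines a job named node fed by the prometheus_targets entries, and that port 9090 on the Prometheus server is reachable from your machine (the security group rules are not shown in this post).

```python
import json
import urllib.request

def fetch_targets(prometheus_host):
    """GET /api/v1/targets from the Prometheus HTTP API on port 9090."""
    url = f"http://{prometheus_host}:9090/api/v1/targets"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def target_health(payload, job="node"):
    """Map each scrape URL of the given job to its health ("up", "down" or "unknown")."""
    return {
        target["scrapeUrl"]: target["health"]
        for target in payload["data"]["activeTargets"]
        if target["labels"].get("job") == job
    }
```

For example, target_health(fetch_targets("<Prometheus public IP>")) should list two :9100 URLs, both reporting "up".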

2. Node Exporter Installation

- hosts: all
  name: Install Node Exporter
  roles:
    - prometheus.prometheus.node_exporter
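To check a single host before involving Prometheus at all, you can read Node Exporter's /metrics page on port 9100 directly. The parser below is a deliberately minimal sketch of the Prometheus text format: it skips # HELP and # TYPE lines and assumes each sample line ends with its value and carries no timestamp, which holds for Node Exporter's output. It also assumes port 9100 is reachable from where you run it.

```python
import urllib.request

def parse_metrics(text):
    """Parse Prometheus text exposition into {series: value}, skipping comment lines.
    Assumes no trailing timestamps on sample lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces stay intact
        series, value = line.rsplit(" ", 1)
        metrics[series] = float(value)
    return metrics

def scrape_node_exporter(host, port=9100):
    """Fetch and parse the /metrics page of a Node Exporter instance."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For instance, scrape_node_exporter("<host public IP>")["node_load1"] returns the host's one-minute load average.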

3. Grafana Playbook

- hosts: Grafana
  name: Install Grafana
  roles:
    - grafana.grafana.grafana
  vars:
    grafana_security:
      admin_user: "admin"
      admin_password: "Password123#"
    grafana_datasources:
      - name: prometheus
        type: prometheus
        url: "http://{{ hostvars['Prometheus']['ansible_host'] }}:9090"
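To confirm that the play actually provisioned the data source, you can query Grafana's HTTP API endpoint /api/datasources/name/<name> with the admin credentials. This sketch assumes Grafana listens on its default port 3000 and that the port is reachable; the helper names are hypothetical. In a real setup, read the password from an environment variable or Ansible Vault rather than hard-coding it.

```python
import base64
import json
import urllib.request

def datasource_request(grafana_host, name, user, password, port=3000):
    """Build a GET request for /api/datasources/name/<name> using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{grafana_host}:{port}/api/datasources/name/{name}",
        headers={"Authorization": f"Basic {token}"},
    )

def matches_prometheus(datasource, prometheus_host):
    """True if the data source is of type prometheus and points at port 9090 on prometheus_host."""
    return (datasource.get("type") == "prometheus"
            and datasource.get("url") == f"http://{prometheus_host}:9090")

def check_datasource(grafana_host, prometheus_host, user, password):
    """Fetch the "prometheus" data source and check where it points."""
    req = datasource_request(grafana_host, "prometheus", user, password)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return matches_prometheus(json.load(resp), prometheus_host)
```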

Conclusion

By combining Terraform for infrastructure provisioning with Ansible for configuration management, we get a scalable, automated monitoring setup on AWS. Because the inventory is generated from the EC2 API, server IP addresses never have to be updated in Ansible by hand when instances are recreated.

🚀 Next Steps: Expand this setup with Loki for log management.

🔗 Find the complete code here: GitHub - MireCloud Terraform Infra
