Deploying a Scalable Monitoring Stack Lab on AWS using Terraform and Ansible
Introduction
Effective monitoring is a cornerstone of cloud infrastructure management, ensuring high availability and performance. This guide provides a professional walkthrough on deploying Prometheus, Grafana, and Node Exporter on AWS using Terraform for infrastructure provisioning and Ansible for configuration management.
This lab creates a Prometheus server and a Grafana server and installs Node Exporter on both. Once everything is deployed, you should see the metrics in Grafana, where a Node Exporter dashboard is already installed for you.
The diagram below gives an idea of what the architecture looks like.
If you want to replicate this lab, you can find the complete code repository here: GitHub - MireCloud Terraform Infra.
Infrastructure Setup with Terraform
1. Creating a Dedicated VPC
To ensure isolation, we define a VPC named Monitoring with a CIDR block of 10.0.0.0/16.
resource "aws_vpc" "Monitoring" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "Monitoring"
  }
}
2. Defining the Subnet
A subnet is created within this VPC to host monitoring components.
resource "aws_subnet" "Monitoring-subnet" {
  vpc_id                  = aws_vpc.Monitoring.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "Monitoring-subnet"
  }
}
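As a quick sanity check on the addressing above, the following Python sketch uses the standard ipaddress module to confirm that the subnet sits inside the VPC range and to count the addresses left for instances. It assumes the AWS convention of reserving the first four addresses and the last address of every subnet; the variable names are illustrative and not part of the lab's code.

```python
import ipaddress

# CIDR blocks copied from the Terraform resources above.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

# Assumption: AWS reserves the first four addresses and the last one per subnet.
AWS_RESERVED_PER_SUBNET = 5

subnet_in_vpc = subnet.subnet_of(vpc)
usable_addresses = subnet.num_addresses - AWS_RESERVED_PER_SUBNET

print(f"subnet inside VPC: {subnet_in_vpc}")
print(f"usable addresses: {usable_addresses}")
```

A /24 is far more than two servers need, which leaves room to add further monitored hosts to the same subnet later.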
3. Deploying EC2 Instances for Prometheus and Grafana
Each instance is assigned a dedicated network interface within the monitoring subnet.
Network Interfaces
resource "aws_network_interface" "prometheus" {
  subnet_id       = aws_subnet.Monitoring-subnet.id
  private_ips     = ["10.0.1.100"]
  security_groups = [aws_security_group.prometheus-security-group.id]

  tags = {
    Name = "prometheus_network_interface"
  }
}

resource "aws_network_interface" "grafana" {
  subnet_id       = aws_subnet.Monitoring-subnet.id
  private_ips     = ["10.0.1.101"]
  security_groups = [aws_security_group.grafana-security-group.id]

  tags = {
    Name = "grafana_network_interface"
  }
}
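Because both interfaces use fixed private addresses, it can help to check that each address is inside the subnet and outside the range AWS keeps for itself. The sketch below is a hypothetical helper (is_assignable is not part of the lab's code) and assumes the usual AWS rule that the first four addresses and the last address of a subnet are reserved.

```python
import ipaddress


def is_assignable(ip: str, subnet_cidr: str) -> bool:
    """Return True if ip is inside the subnet and not in the reserved range."""
    subnet = ipaddress.ip_network(subnet_cidr)
    address = ipaddress.ip_address(ip)
    if address not in subnet:
        return False
    # Assumed reserved: offsets 0-3 and the final address of the subnet.
    offset = int(address) - int(subnet.network_address)
    return 4 <= offset < subnet.num_addresses - 1


# Addresses taken from the two network interface resources above.
for name, ip in {"prometheus": "10.0.1.100", "grafana": "10.0.1.101"}.items():
    print(name, ip, is_assignable(ip, "10.0.1.0/24"))
```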
Provisioning EC2 Instances
resource "aws_instance" "prometheus" {
  ami           = "ami-04b4f1a9cf54c11d0"
  instance_type = "t2.micro"
  key_name      = aws_key_pair.Monitoring_key.key_name

  network_interface {
    network_interface_id = aws_network_interface.prometheus.id
    device_index         = 0
  }

  tags = {
    Name = "Prometheus"
  }
}

resource "aws_instance" "grafana" {
  ami           = "ami-04b4f1a9cf54c11d0"
  instance_type = "t2.micro"
  key_name      = aws_key_pair.Monitoring_key.key_name

  network_interface {
    network_interface_id = aws_network_interface.grafana.id
    device_index         = 0
  }

  tags = {
    Name = "Grafana"
  }
}
Dynamic Inventory for Ansible
A dynamic inventory script is used to fetch real-time information about running EC2 instances and generate an inventory file for Ansible.
Dynamic Inventory Script
import json

import boto3

# Query every running instance in the region.
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

inventory = {"all": {"hosts": {}}}
for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        # Use the Name tag as the host name; fall back to the instance ID.
        tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
        instance_name = tags.get("Name", instance["InstanceId"])
        public_ip = instance.get("PublicIpAddress", "")
        # Skip instances that have no public IP for Ansible to reach.
        if public_ip:
            inventory["all"]["hosts"][instance_name] = {
                "ansible_host": public_ip,
                "ansible_user": "ubuntu",
                "ansible_ssh_private_key_file": "/home/asd/.ssh/github",
            }

# Write the inventory file that is passed to Ansible.
with open("inventory.json", "w") as f:
    json.dump(inventory, f, indent=4)
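This script collects the public IP address of every running EC2 instance in us-east-1 and writes the result to inventory.json, so Ansible targets the current hosts without a hand-edited inventory. Rerun it whenever the instances are recreated, because their public IP addresses can change.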
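The inventory logic can also be separated from the AWS call so that it can be exercised without credentials. The sketch below is one way to do that, not the repository's code: build_inventory, the sample response, and the key path are hypothetical, and the Name tag is looked up by key rather than by its position in the tag list.

```python
import json


def build_inventory(response: dict, user: str, key_file: str) -> dict:
    """Build an Ansible inventory dict from a describe_instances-style response."""
    hosts = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            name = tags.get("Name", instance["InstanceId"])
            public_ip = instance.get("PublicIpAddress")
            if public_ip:  # instances without a public IP are left out
                hosts[name] = {
                    "ansible_host": public_ip,
                    "ansible_user": user,
                    "ansible_ssh_private_key_file": key_file,
                }
    return {"all": {"hosts": hosts}}


# Hand-written sample shaped like a boto3 response (documentation-range IP).
sample = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0001",
                    "PublicIpAddress": "203.0.113.10",
                    "Tags": [
                        {"Key": "env", "Value": "lab"},
                        {"Key": "Name", "Value": "Prometheus"},
                    ],
                },
                {"InstanceId": "i-0002"},  # no public IP, so it is skipped
            ]
        }
    ]
}

inventory = build_inventory(sample, "ubuntu", "~/.ssh/example_key")
print(json.dumps(inventory, indent=4))
```

Keeping the function free of network calls makes it easy to test that the host names Ansible sees are the Name tags set in Terraform, regardless of the order in which AWS returns the tags.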
Configuration with Ansible
Once the infrastructure is provisioned, we configure Prometheus, Grafana, and Node Exporter using Ansible playbooks.
1. Prometheus Playbook
- hosts: Prometheus
  name: Install Prometheus
  roles:
    - prometheus.prometheus.prometheus
  vars:
    prometheus_targets:
      node:
        - targets:
            # Host names are case-sensitive and match the Name tags set in Terraform.
            - "{{ hostvars['Prometheus']['ansible_host'] }}:9100"
            - "{{ hostvars['Grafana']['ansible_host'] }}:9100"
          labels:
            env: monitoring
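To make the templating in this playbook concrete, the Python sketch below mimics how each hostvars lookup turns an inventory entry into a host:port scrape target. It is an illustration only: node_targets and the sample addresses are hypothetical, and the host names are assumed to equal the Name tags from the Terraform configuration.

```python
def node_targets(inventory: dict, host_names: list, port: int = 9100) -> list:
    """Return 'address:port' strings for the named inventory hosts."""
    hosts = inventory["all"]["hosts"]
    return [f"{hosts[name]['ansible_host']}:{port}" for name in host_names]


# Sample inventory in the shape of inventory.json (documentation-range IPs).
inventory = {
    "all": {
        "hosts": {
            "Prometheus": {"ansible_host": "203.0.113.10"},
            "Grafana": {"ansible_host": "203.0.113.11"},
        }
    }
}

targets = node_targets(inventory, ["Prometheus", "Grafana"])
print(targets)
```

The lookup is case-sensitive in this sketch, so a name that does not match the inventory key exactly raises a KeyError; an undefined hostvars entry fails in a comparable way in Ansible.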
2. Node Exporter Installation
- hosts: all
  name: Install Node Exporter
  roles:
    - prometheus.prometheus.node_exporter
3. Grafana Playbook
- hosts: Grafana
  name: Install Grafana
  roles:
    - grafana.grafana.grafana
  vars:
    grafana_security:
      admin_user: "admin"
      # Lab-only password; keep real credentials in Ansible Vault.
      admin_password: "Password123#"
    grafana_datasources:
      - name: prometheus
        type: prometheus
        url: "http://{{ hostvars['Prometheus']['ansible_host'] }}:9090"
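After the playbooks have run, one way to confirm that both Node Exporters are being scraped is to read the Prometheus /api/v1/targets endpoint. The sketch below only parses such a response: down_targets is a hypothetical helper, and the payload is a hand-written, trimmed example rather than output captured from this lab.

```python
import json


def down_targets(payload: dict) -> list:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [t["scrapeUrl"] for t in active if t.get("health") != "up"]


# Trimmed example of a /api/v1/targets response body (documentation-range IPs).
sample = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"scrapeUrl": "http://203.0.113.10:9100/metrics", "health": "up"},
      {"scrapeUrl": "http://203.0.113.11:9100/metrics", "health": "down"}
    ]
  }
}
""")

print(down_targets(sample))
```

In practice the payload would be fetched from port 9090 on the Prometheus host, for example with urllib.request.urlopen, and an empty list would indicate that every configured target is healthy.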
Conclusion
By using Terraform for infrastructure provisioning and Ansible for configuration management, we achieve a scalable, automated monitoring setup on AWS. The generated inventory adds flexibility: rerunning the script picks up new or replaced instances without manual edits.
🚀 Next Steps: Expand this setup with Loki for log management.
🔗 Find the complete code here: GitHub - MireCloud Terraform Infra