Commit 9aa03f2

Merge pull request #4 from prajwalakhuj/monitoring
Added CW CPU and memory alerts. Added SNS topic resource with Lambda code. Added support for integrating the CW alerts with Slack.
2 parents 5595741 + 161e3c1 commit 9aa03f2

15 files changed

+481
-6
lines changed

README.md

+27-1
@@ -17,6 +17,7 @@ This Terraform module provisions an Amazon RDS PostgreSQL database on AWS. Amazo
1717
6. Supports encryption at rest using AWS Key Management Service (KMS) for enhanced security.
1818
7. Enables fine-grained control over network access through security groups and VPC settings.
1919
8. Offers customizable tags for resource categorization and management.
20+
9. CloudWatch Alerts: Set up CloudWatch alarms to monitor the health and performance of your RDS PostgreSQL instance. Integrate these alarms with Amazon Simple Notification Service (SNS) to receive real-time alerts. Use AWS Lambda functions to customize your alerting logic, and send notifications to Slack channels for immediate visibility into your RDS PostgreSQL status.
2021

2122
## Usage Examples
2223
```hcl
@@ -41,9 +42,15 @@ module "rds-pg" {
4142
deletion_protection = false
4243
allowed_security_groups = ["sg-013cbf880"]
4344
final_snapshot_identifier_prefix = "final"
45+
cloudwatch_metric_alarms_enabled = true
46+
alarm_cpu_threshold_percent = 70
47+
disk_free_storage_space = "10000000" # in bytes
48+
slack_username = "John"
49+
slack_channel = "skaf-dev"
50+
slack_webhook_url = "https://hooks/xxxxxxxx"
4451
}
4552
```
46-
Refer [examples](https://github.com/squareops/terraform-aws-rds-postgresql/tree/main/example/complete) for more details.
53+
Refer [examples](https://github.com/squareops/terraform-aws-rds-postgresql/tree/main/examples/complete) for more details.
4754

4855
## IAM Permissions
4956
The required IAM permissions to create resources from this module can be found [here](https://github.com/squareops/terraform-aws-rds-postgresql/blob/main/IAM.md)
@@ -60,21 +67,31 @@ The required IAM permissions to create resources from this module can be found [
6067

6168
| Name | Version |
6269
|------|---------|
70+
| <a name="provider_archive"></a> [archive](#provider\_archive) | 2.4.0 |
6371
| <a name="provider_aws"></a> [aws](#provider\_aws) | 3.43.0 |
6472

6573
## Modules
6674

6775
| Name | Source | Version |
6876
|------|--------|---------|
77+
| <a name="module_cw_sns_slack"></a> [cw\_sns\_slack](#module\_cw\_sns\_slack) | ./lambda | n/a |
6978
| <a name="module_db"></a> [db](#module\_db) | terraform-aws-modules/rds/aws | ~> 3.0 |
7079
| <a name="module_security_group_rds"></a> [security\_group\_rds](#module\_security\_group\_rds) | terraform-aws-modules/security-group/aws | ~> 4 |
7180

7281
## Resources
7382

7483
| Name | Type |
7584
|------|------|
85+
| [aws_cloudwatch_metric_alarm.cache_cpu](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/cloudwatch_metric_alarm) | resource |
86+
| [aws_cloudwatch_metric_alarm.disk_free_storage_space_too_low](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/cloudwatch_metric_alarm) | resource |
87+
| [aws_kms_ciphertext.slack_url](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/kms_ciphertext) | resource |
88+
| [aws_kms_key.this](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/kms_key) | resource |
89+
| [aws_lambda_permission.sns_lambda_slack_invoke](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/lambda_permission) | resource |
7690
| [aws_security_group_rule.cidr_ingress](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/security_group_rule) | resource |
7791
| [aws_security_group_rule.default_ingress](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/security_group_rule) | resource |
92+
| [aws_sns_topic.slack_topic](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/sns_topic) | resource |
93+
| [aws_sns_topic_subscription.slack-endpoint](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/resources/sns_topic_subscription) | resource |
94+
| [archive_file.lambdazip](https://registry.terraform.io/providers/hashicorp/archive/latest/docs/data-sources/file) | data source |
7895
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/data-sources/availability_zones) | data source |
7996
| [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/3.43.0/docs/data-sources/region) | data source |
8097

@@ -83,16 +100,21 @@ The required IAM permissions to create resources from this module can be found [
83100
| Name | Description | Type | Default | Required |
84101
|------|-------------|------|---------|:--------:|
85102
| <a name="input_additional_tags"></a> [additional\_tags](#input\_additional\_tags) | A map of additional tags to apply to the AWS resources | `map(string)` | <pre>{<br> "automation": "true"<br>}</pre> | no |
103+
| <a name="input_alarm_actions"></a> [alarm\_actions](#input\_alarm\_actions) | Alarm action list | `list(string)` | `[]` | no |
104+
| <a name="input_alarm_cpu_threshold_percent"></a> [alarm\_cpu\_threshold\_percent](#input\_alarm\_cpu\_threshold\_percent) | CPU threshold alarm level | `number` | `75` | no |
86105
| <a name="input_allocated_storage"></a> [allocated\_storage](#input\_allocated\_storage) | The allocated storage capacity for the database in gibibytes (GiB) | `number` | `20` | no |
87106
| <a name="input_allowed_cidr_blocks"></a> [allowed\_cidr\_blocks](#input\_allowed\_cidr\_blocks) | A list of CIDR blocks that are allowed to access the database | `list(any)` | `[]` | no |
88107
| <a name="input_allowed_security_groups"></a> [allowed\_security\_groups](#input\_allowed\_security\_groups) | A list of Security Group IDs to allow access to the database | `list(any)` | `[]` | no |
89108
| <a name="input_apply_immediately"></a> [apply\_immediately](#input\_apply\_immediately) | Specifies whether any cluster modifications are applied immediately or during the next maintenance window | `bool` | `false` | no |
90109
| <a name="input_backup_retention_period"></a> [backup\_retention\_period](#input\_backup\_retention\_period) | The number of days to retain backups for | `number` | `5` | no |
91110
| <a name="input_backup_window"></a> [backup\_window](#input\_backup\_window) | The preferred window for taking automated backups of the database | `string` | `""` | no |
111+
| <a name="input_cloudwatch_metric_alarms_enabled"></a> [cloudwatch\_metric\_alarms\_enabled](#input\_cloudwatch\_metric\_alarms\_enabled) | Boolean flag to enable/disable CloudWatch metrics alarms | `bool` | `false` | no |
92112
| <a name="input_create_random_password"></a> [create\_random\_password](#input\_create\_random\_password) | Whether to create a random password for the RDS primary cluster | `bool` | `true` | no |
93113
| <a name="input_create_security_group"></a> [create\_security\_group](#input\_create\_security\_group) | Whether to create a security group for the database | `bool` | `true` | no |
114+
| <a name="input_cw_sns_topic_arn"></a> [cw\_sns\_topic\_arn](#input\_cw\_sns\_topic\_arn) | The ARN of the SNS topic used for CloudWatch alarm notifications. | `string` | `""` | no |
94115
| <a name="input_db_name"></a> [db\_name](#input\_db\_name) | The name of the automatically created database on cluster creation | `string` | `""` | no |
95116
| <a name="input_deletion_protection"></a> [deletion\_protection](#input\_deletion\_protection) | Specifies whether accidental deletion protection is enabled | `bool` | `true` | no |
117+
| <a name="input_disk_free_storage_space"></a> [disk\_free\_storage\_space](#input\_disk\_free\_storage\_space) | Alarm threshold for the 'lowFreeStorageSpace' alarm | `string` | `"10000000000"` | no |
96118
| <a name="input_enable_ssl_connection"></a> [enable\_ssl\_connection](#input\_enable\_ssl\_connection) | Whether to enable SSL connection to the database | `bool` | `false` | no |
97119
| <a name="input_engine"></a> [engine](#input\_engine) | The name of the database engine to be used for this DB cluster | `string` | `"postgres"` | no |
98120
| <a name="input_engine_version"></a> [engine\_version](#input\_engine\_version) | The database engine version. Updating this argument results in an outage | `string` | `""` | no |
@@ -106,11 +128,15 @@ The required IAM permissions to create resources from this module can be found [
106128
| <a name="input_master_username"></a> [master\_username](#input\_master\_username) | The username for the RDS primary cluster | `string` | `""` | no |
107129
| <a name="input_multi_az"></a> [multi\_az](#input\_multi\_az) | Enable multi-AZ for disaster recovery | `bool` | `false` | no |
108130
| <a name="input_name"></a> [name](#input\_name) | The name of the RDS instance | `string` | `""` | no |
131+
| <a name="input_ok_actions"></a> [ok\_actions](#input\_ok\_actions) | The list of actions to execute when this alarm transitions into an OK state from any other state. Each action is specified as an Amazon Resource Name (ARN) | `list(string)` | `[]` | no |
109132
| <a name="input_port"></a> [port](#input\_port) | The port number for the database | `number` | `5432` | no |
110133
| <a name="input_publicly_accessible"></a> [publicly\_accessible](#input\_publicly\_accessible) | Specifies whether the RDS instance is publicly accessible over the internet | `bool` | `false` | no |
111134
| <a name="input_random_password_length"></a> [random\_password\_length](#input\_random\_password\_length) | The length of the randomly generated password for the RDS primary cluster (default: 10) | `number` | `10` | no |
112135
| <a name="input_replicate_source_db"></a> [replicate\_source\_db](#input\_replicate\_source\_db) | Specifies that this resource is a replicate database, and uses the specified value as the source database identifier | `string` | `null` | no |
113136
| <a name="input_skip_final_snapshot"></a> [skip\_final\_snapshot](#input\_skip\_final\_snapshot) | Determines whether a final DB snapshot is created before the DB instance is deleted. If set to true, no DB snapshot is created. If set to false, a DB snapshot is created before the DB instance is deleted, using the value from final\_snapshot\_identifier | `bool` | `true` | no |
137+
| <a name="input_slack_channel"></a> [slack\_channel](#input\_slack\_channel) | The Slack channel where notifications will be posted. | `string` | `""` | no |
138+
| <a name="input_slack_username"></a> [slack\_username](#input\_slack\_username) | The username to use when sending notifications to Slack. | `string` | `""` | no |
139+
| <a name="input_slack_webhook_url"></a> [slack\_webhook\_url](#input\_slack\_webhook\_url) | The Slack Webhook URL where notifications will be sent. | `string` | `""` | no |
114140
| <a name="input_snapshot_identifier"></a> [snapshot\_identifier](#input\_snapshot\_identifier) | Specifies whether to create the database from a snapshot. Use the snapshot ID found in the RDS console, e.g., rds:production-2015-06-26-06-05 | `string` | `null` | no |
115141
| <a name="input_storage_encrypted"></a> [storage\_encrypted](#input\_storage\_encrypted) | Specifies whether to enable database encryption | `bool` | `true` | no |
116142
| <a name="input_subnet_ids"></a> [subnet\_ids](#input\_subnet\_ids) | A list of subnet IDs used by the database subnet group | `list(any)` | `[]` | no |
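
The `disk_free_storage_space` input above is expressed in bytes, so the usage example's `"10000000"` is about 10 MB while the documented default `"10000000000"` is about 10 GB. As a rough sketch of the arithmetic, the Python helper below converts a size in decimal gigabytes into the string form the input expects. `gb_to_threshold_bytes` is a hypothetical name used only for illustration and is not part of the module.

```python
def gb_to_threshold_bytes(gigabytes: float) -> str:
    """Return a decimal-gigabyte size as a byte-count string.

    Hypothetical helper: the module input is typed `string`, so the
    result is returned as a string that can be pasted into Terraform.
    """
    if gigabytes < 0:
        raise ValueError("size must be non-negative")
    # 1 GB is taken as 10**9 bytes (decimal), matching the README values.
    return str(int(round(gigabytes * 10**9)))


if __name__ == "__main__":
    print(gb_to_threshold_bytes(10))    # size of the documented default
    print(gb_to_threshold_bytes(0.01))  # size used in the usage example
```

If thresholds are preferred in GiB, the same idea applies with `2**30` as the multiplier.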

example/complete/README.md renamed to examples/complete/README.md

+1-1
@@ -24,7 +24,7 @@ No providers.
2424

2525
| Name | Source | Version |
2626
|------|--------|---------|
27-
| <a name="module_rds-pg"></a> [rds-pg](#module\_rds-pg) | squareops/postgresql-rds/aws | n/a |
27+
| <a name="module_rds-pg"></a> [rds-pg](#module\_rds-pg) | squareops/rds-postgresql/aws | n/a |
2828

2929
## Resources
3030

example/complete/main.tf renamed to examples/complete/main.tf

+10-4
@@ -1,14 +1,14 @@
11
locals {
22
region = "us-east-2"
33
name = "postgresql"
4-
vpc_id = "vpc-00ae5571c1"
4+
vpc_id = "vpc-06861ba817a8cda10"
55
family = "postgres15"
6-
subnet_ids = ["subnet-0d9a8193d2a6e","subnet-0fd263dc9e73d"]
6+
subnet_ids = ["subnet-09e8f6ea27b7e36d0","subnet-0b070110454617a90"]
77
environment = "prod"
8-
kms_key_arn = "arn:aws:kms:us-east-2:22222222:key/73ff9e84-83e1-fe29623338a9"
8+
kms_key_arn = ""
99
engine_version = "15.2"
1010
instance_class = "db.m5d.large"
11-
allowed_security_groups = ["sg-0a680afd35"]
11+
allowed_security_groups = ["sg-0ef14212995d67a2d"]
1212
additional_tags = {
1313
Owner = "Organization_Name"
1414
Expires = "Never"
@@ -38,4 +38,10 @@ module "rds-pg" {
3838
allowed_security_groups = local.allowed_security_groups
3939
major_engine_version = local.engine_version
4040
deletion_protection = false
41+
cloudwatch_metric_alarms_enabled = true
42+
alarm_cpu_threshold_percent = 70
43+
disk_free_storage_space = "10000000" # in bytes
44+
slack_username = ""
45+
slack_channel = ""
46+
slack_webhook_url = ""
4147
}
File renamed without changes.
File renamed without changes.
File renamed without changes.

lambda/README.md

+59
@@ -0,0 +1,59 @@
1+
## Lambda for SNS
2+
![squareops_avatar]
3+
4+
[squareops_avatar]: https://squareops.com/wp-content/uploads/2022/12/squareops-logo.png
5+
6+
### [SquareOps Technologies](https://squareops.com/) Your DevOps Partner for Accelerating cloud journey.
7+
<br>
8+
9+
This Lambda function calls the Slack webhook and passes the alarm message as the payload.
10+
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
11+
## Requirements
12+
13+
No requirements.
14+
15+
## Providers
16+
17+
| Name | Version |
18+
|------|---------|
19+
| <a name="provider_aws"></a> [aws](#provider\_aws) | 5.17.0 |
20+
21+
## Modules
22+
23+
No modules.
24+
25+
## Resources
26+
27+
| Name | Type |
28+
|------|------|
29+
| [aws_cloudwatch_log_group.lambda](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group) | resource |
30+
| [aws_iam_role.lambda_exec_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
31+
| [aws_iam_role_policy.lambda_cwl_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
32+
| [aws_lambda_function.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource |
33+
| [aws_iam_policy_document.lambda_cwl_access](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
34+
| [aws_iam_policy_document.lambda_exec_role_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
35+
36+
## Inputs
37+
38+
| Name | Description | Type | Default | Required |
39+
|------|-------------|------|---------|:--------:|
40+
| <a name="input_artifact_file"></a> [artifact\_file](#input\_artifact\_file) | The path to the function's deployment package within the local filesystem | `string` | `null` | no |
41+
| <a name="input_cwl_retention_days"></a> [cwl\_retention\_days](#input\_cwl\_retention\_days) | The retention time in days for the CloudWatch Logs Stream. | `number` | `30` | no |
42+
| <a name="input_description"></a> [description](#input\_description) | Description of what the Lambda Function does. | `string` | `null` | no |
43+
| <a name="input_environment"></a> [environment](#input\_environment) | The Lambda environment's configuration settings. | `map(string)` | `{}` | no |
44+
| <a name="input_handler"></a> [handler](#input\_handler) | The function entrypoint in the code. | `string` | `"index.handler"` | no |
45+
| <a name="input_memory_size"></a> [memory\_size](#input\_memory\_size) | Amount of memory in MB your Lambda Function can use at runtime. | `number` | `128` | no |
46+
| <a name="input_name"></a> [name](#input\_name) | A unique name for the Lambda Function. | `string` | n/a | yes |
47+
| <a name="input_runtime"></a> [runtime](#input\_runtime) | The Runtime used in the Lambda Function. | `string` | n/a | yes |
48+
| <a name="input_tags"></a> [tags](#input\_tags) | A mapping of tags to assign to the module resources. | `map(string)` | `{}` | no |
49+
| <a name="input_timeout"></a> [timeout](#input\_timeout) | The amount of time your Lambda Function has to run in seconds. | `number` | `6` | no |
50+
51+
## Outputs
52+
53+
| Name | Description |
54+
|------|-------------|
55+
| <a name="output_arn"></a> [arn](#output\_arn) | The ARN identifying the Lambda Function. |
56+
| <a name="output_exec_role_id"></a> [exec\_role\_id](#output\_exec\_role\_id) | The ID of the Function's IAM Role. |
57+
| <a name="output_invoke_arn"></a> [invoke\_arn](#output\_invoke\_arn) | The ARN to be used for invoking Lambda Function from API Gateway. |
58+
| <a name="output_name"></a> [name](#output\_name) | The name of the Lambda Function. |
59+
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
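
The README above says the function passes the alarm message to the Slack webhook as the payload. A minimal Python sketch of that step follows. `build_slack_payload` is a hypothetical name, and the field set simply mirrors the `channel`, `username`, `text`, and `icon_emoji` keys used in `lambda/sns_slack.py`. It builds the request body only and does not send anything.

```python
import json


def build_slack_payload(text: str, channel: str, username: str) -> bytes:
    """Encode a Slack webhook message as UTF-8 JSON bytes."""
    payload = {
        "channel": channel,
        "username": username,
        "text": text,
        "icon_emoji": ":cloudwatch:",
    }
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    # Illustrative values taken from the README usage example.
    body = build_slack_payload("CPU above 70%", "skaf-dev", "John")
    print(body.decode("utf-8"))
```

The resulting bytes can be passed as the `body` of a `POST` request to the webhook URL, which is what the handler in this commit does with `urllib3`.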

lambda/data.tf

+32
@@ -0,0 +1,32 @@
+# Lambda Assume Role policy
+data "aws_iam_policy_document" "lambda_exec_role_policy" {
+  statement {
+    sid    = "LambdaExecRolePolicy"
+    effect = "Allow"
+    principals {
+      identifiers = [
+        "lambda.amazonaws.com",
+      ]
+      type = "Service"
+    }
+    actions = [
+      "sts:AssumeRole",
+    ]
+  }
+}
+
+# Lambda CloudWatch Logs access
+data "aws_iam_policy_document" "lambda_cwl_access" {
+  statement {
+    sid    = "LambdaCreateCloudWatchLogGroup"
+    effect = "Allow"
+    actions = [
+      "logs:PutLogEvents",
+      "logs:CreateLogStream",
+      "logs:CreateLogGroup"
+    ]
+    resources = [
+      "arn:aws:logs:*:*:log-group:/aws/lambda/*:*:*"
+    ]
+  }
+}

lambda/iam.tf

+10
@@ -0,0 +1,10 @@
+resource "aws_iam_role" "lambda_exec_role" {
+  name               = "${replace(title(var.name), "-", "")}LambdaExecRole"
+  assume_role_policy = data.aws_iam_policy_document.lambda_exec_role_policy.json
+}
+
+resource "aws_iam_role_policy" "lambda_cwl_policy" {
+  name   = "${replace(title(var.name), "-", "")}LambdaCWLogsPolicy"
+  role   = aws_iam_role.lambda_exec_role.id
+  policy = data.aws_iam_policy_document.lambda_cwl_access.json
+}
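
The role and policy names above are derived from `var.name` with `replace(title(var.name), "-", "")`. To preview what a given name turns into, the Python sketch below approximates that expression. It assumes `title` uppercases the first letter of each word and treats any character other than a letter, digit, or underscore as a word boundary. That is an approximation of Terraform's behavior rather than a guaranteed match, and both function names are hypothetical.

```python
import re


def terraform_title(value: str) -> str:
    """Approximate Terraform's title(): uppercase each word's first letter.

    Assumption: a word starts at the beginning of the string or after any
    character that is not a letter, digit, or underscore.
    """
    return re.sub(
        r"(^|[^A-Za-z0-9_])([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        value,
    )


def exec_role_name(name: str) -> str:
    """Mirror "${replace(title(var.name), "-", "")}LambdaExecRole"."""
    return terraform_title(name).replace("-", "") + "LambdaExecRole"


if __name__ == "__main__":
    # "rds-cw-alerts" is an illustrative value for var.name.
    print(exec_role_name("rds-cw-alerts"))
```

IAM role names are limited to 64 characters and must be unique within an account, so a long or reused `name` can make the apply fail.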

lambda/main.tf

+26
@@ -0,0 +1,26 @@
+resource "aws_cloudwatch_log_group" "lambda" {
+  name              = "/aws/lambda/${var.name}"
+  retention_in_days = var.cwl_retention_days
+  tags              = var.tags
+}
+
+resource "aws_lambda_function" "this" {
+  function_name    = var.name
+  description      = var.description
+  filename         = var.artifact_file
+  source_code_hash = var.artifact_file != null ? filebase64sha256(var.artifact_file) : null
+  role             = aws_iam_role.lambda_exec_role.arn
+  handler          = var.handler
+  runtime          = var.runtime
+  memory_size      = var.memory_size
+  timeout          = var.timeout
+
+  dynamic "environment" {
+    for_each = (length(var.environment) > 0 ? [1] : [])
+    content {
+      variables = var.environment
+    }
+  }
+
+  tags = var.tags
+}
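
`source_code_hash` above is computed with `filebase64sha256`, which yields the base64 encoding of the file's raw SHA-256 digest (not of its hex string). When checking why Terraform wants to redeploy the function, it can help to reproduce that value outside Terraform. The following Python sketch does so; `file_base64_sha256` is a hypothetical name, and the demo hashes a throwaway file rather than a real Lambda artifact.

```python
import base64
import hashlib
import os
import tempfile


def file_base64_sha256(path: str) -> str:
    """Return the base64-encoded SHA-256 digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


if __name__ == "__main__":
    # Hash a throwaway file standing in for the Lambda zip artifact.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".zip") as tmp:
        tmp.write(b"placeholder bytes, not a real zip")
    print(file_base64_sha256(tmp.name))
    os.remove(tmp.name)
```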

lambda/outputs.tf

+19
@@ -0,0 +1,19 @@
+output "name" {
+  description = "The name of the Lambda Function."
+  value       = aws_lambda_function.this.function_name
+}
+
+output "arn" {
+  description = "The ARN identifying the Lambda Function."
+  value       = aws_lambda_function.this.arn
+}
+
+output "invoke_arn" {
+  description = "The ARN to be used for invoking Lambda Function from API Gateway."
+  value       = aws_lambda_function.this.invoke_arn
+}
+
+output "exec_role_id" {
+  description = "The ID of the Function's IAM Role."
+  value       = aws_iam_role.lambda_exec_role.id
+}

lambda/sns_slack.py

+51
@@ -0,0 +1,51 @@
+import json
+import re
+import os
+import boto3
+import urllib3
+
+# Lambda global variables
+region = os.environ["AWS_REGION"] # from Lambda default envs
+slack_url = os.environ["SLACK_URL"]
+slack_channel = os.environ["SLACK_CHANNEL"]
+slack_user = os.environ["SLACK_USER"]
+
+
+http = urllib3.PoolManager()
+def format_cloudwatch_alarm_message(event):
+    alarm_data = json.loads(event['Records'][0]['Sns']['Message'])
+
+    alarm_name = alarm_data["AlarmName"]
+    alarm_description = alarm_data["AlarmDescription"]
+    new_state = alarm_data["NewStateValue"]
+    reason = alarm_data["NewStateReason"]
+    metric_name = alarm_data["Trigger"]["MetricName"]
+    threshold = alarm_data["Trigger"]["Threshold"]
+
+    message = f"*:exclamation: CloudWatch Alarm Alert :exclamation:*\n\n"
+    message += f" *Alarm Name:* {alarm_name}\n"
+    message += f" *Description:* _{alarm_description}_\n"
+    message += f" *New State:* {new_state}\n"
+    message += f" *Reason:* _{reason}_\n"
+    message += f" *Metric Name:* {metric_name}\n"
+    message += f" *Threshold:* {threshold}\n"
+
+    return message
+
+def lambda_handler(event, context):
+    url = slack_url
+    msg = {
+        "channel": slack_channel,
+        "username": slack_user,
+        "text": format_cloudwatch_alarm_message(event),
+        "icon_emoji": ":cloudwatch:"
+    }
+
+    encoded_msg = json.dumps(msg).encode('utf-8')
+    resp = http.request('POST', url, body=encoded_msg)
+
+    print({
+        "message": msg,
+        "status_code": resp.status,
+        "response": resp.data
+    })
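
The formatter above reads the alarm JSON out of `event['Records'][0]['Sns']['Message']` and indexes keys such as `Trigger` directly, so a message without one of those keys would raise `KeyError`. As a sketch for local experimentation, the standalone Python snippet below parses the same event shape with `.get` fallbacks. `summarize_alarm` and `SAMPLE_EVENT` are hypothetical names, every value in the sample event is made up, and the output format is simplified rather than identical to the committed handler.

```python
import json


def summarize_alarm(event: dict) -> str:
    """Build a short text summary from an SNS-delivered CloudWatch alarm."""
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    # Fall back to an empty dict when the message carries no Trigger block.
    trigger = alarm.get("Trigger") or {}
    lines = [
        f"Alarm Name: {alarm.get('AlarmName', 'unknown')}",
        f"New State: {alarm.get('NewStateValue', 'unknown')}",
        f"Reason: {alarm.get('NewStateReason', 'n/a')}",
        f"Metric Name: {trigger.get('MetricName', 'n/a')}",
        f"Threshold: {trigger.get('Threshold', 'n/a')}",
    ]
    return "\n".join(lines)


# Illustrative event; every value below is made up for the example.
SAMPLE_EVENT = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "AlarmName": "example-cpu-high",
                "AlarmDescription": None,
                "NewStateValue": "ALARM",
                "NewStateReason": "Threshold Crossed",
                "Trigger": {"MetricName": "CPUUtilization", "Threshold": 70.0},
            })
        }
    }]
}

if __name__ == "__main__":
    print(summarize_alarm(SAMPLE_EVENT))
```

Because the parsing is kept separate from the HTTP call and from the environment-variable lookups, it can be run without AWS credentials or a Slack webhook.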
