Amazon CloudWatch is a monitoring and observability service that enables the collection and tracking of metrics, monitoring of log files, setting of alarms, and automatic reactions to changes in AWS resources.
This service was built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers.
Amazon CloudWatch is also a key service for monitoring in real time.
CloudWatch provides users with data and actionable insights to monitor their applications, respond to system-wide performance changes, and optimize resource utilization.
Why Amazon CloudWatch?
It monitors applications for:
Performance
Health
Resource Use
How Amazon CloudWatch Works
You configure CloudWatch for the resources you want to monitor.
Agents collect metrics and logs from resources on-premises or on AWS.
CloudWatch provides an overall view of the resources and helps troubleshoot issues via a dashboard.
Alarms and events can trigger operational changes, such as auto scaling of AWS resources, in response to changes in the monitored resources.
CloudWatch performs real-time analysis based on the logs it receives.
Components of CloudWatch
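EC2: CloudWatch does not require an EC2 instance, but EC2 instances are among the most commonly monitored resources and publish basic metrics, such as CPU utilization, to CloudWatch automatically.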
Resource Groups: Organize all resources, such as EC2 instances, RDS databases, and S3 buckets, into groups using tags.
CloudWatch Logs: View logs as a single stream of time-ordered events, visualized using graphs.
CloudWatch Alarms: Set alarms that notify you when a metric crosses a threshold you define.
Amazon CloudWatch Features
Metrics: A metric represents a time-ordered set of data points published to Amazon CloudWatch.
Each data point is marked with a timestamp.
A metric is a variable being monitored; data points are the values of that variable over time.
Uniquely defined by a name, namespace, and zero or more dimensions.
Metric math queries multiple CloudWatch metrics and uses mathematical expressions to create new time series based on them.
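The idea behind metric math can be sketched in plain Python. This is not the CloudWatch API; it only shows how two time-aligned series can be combined by an expression into a new series. The names `requests`, `errors`, and `error_rate` are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

def error_rate(errors, requests):
    """Combine two time-aligned series into a new one: 100 * errors / requests."""
    result = {}
    for ts, req in requests.items():
        # Skip timestamps that are missing from either series or have no traffic.
        if ts in errors and req > 0:
            result[ts] = 100.0 * errors[ts] / req
    return result

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Hypothetical per-minute data points for two metrics.
requests = {start + timedelta(minutes=i): v for i, v in enumerate([200, 400, 0])}
errors = {start + timedelta(minutes=i): v for i, v in enumerate([2, 10, 0])}

rate = error_rate(errors, requests)
for ts, value in rate.items():
    print(ts.isoformat(), value)
```

In CloudWatch itself, a comparable result would be expressed as a metric math expression over metric IDs (for example, something like `100 * m2 / m1`) rather than computed client-side.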
Dimensions: Name/value pairs that are part of a metric's identity.
Each unique name/value pair added to a metric creates a new variation of that metric.
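A minimal sketch of how namespace, name, and dimensions together identify a metric, assuming a simple in-memory store. The `metric_key` and `record` helpers and the `MyApp` namespace are hypothetical and not part of any AWS SDK.

```python
def metric_key(namespace, name, dimensions=None):
    """Return a hashable identity: namespace, name, and sorted dimension pairs."""
    dims = tuple(sorted((dimensions or {}).items()))
    return (namespace, name, dims)

store = {}

def record(namespace, name, value, dimensions=None):
    """Append a value to the series identified by the metric key."""
    store.setdefault(metric_key(namespace, name, dimensions), []).append(value)

record("MyApp", "Latency", 120, {"Server": "prod", "Domain": "frankfurt"})
# Same dimensions in a different order: still the same metric.
record("MyApp", "Latency", 95, {"Domain": "frankfurt", "Server": "prod"})
# A different dimension value creates a new variation of the metric.
record("MyApp", "Latency", 300, {"Server": "beta", "Domain": "frankfurt"})
# No dimensions at all: yet another distinct metric.
record("MyApp", "Latency", 80)

print(len(store))
```

Because every distinct combination of dimension values is stored as its own series, adding high-cardinality dimensions multiplies the number of metrics.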
Statistics: Metric data aggregations over specified periods of time.
Available statistics include minimum, maximum, sum, average, and sample count.
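To make aggregation over periods concrete, here is a small sketch that groups raw data points into fixed-length periods and computes the statistics for each. The `aggregate` function and the sample points are illustrative assumptions; CloudWatch performs this aggregation on the service side.

```python
from collections import defaultdict

def aggregate(points, period_seconds):
    """Group (epoch_seconds, value) points into fixed periods and compute statistics."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its period.
        buckets[ts - ts % period_seconds].append(value)
    return {
        start: {
            "Minimum": min(vals),
            "Maximum": max(vals),
            "Sum": sum(vals),
            "Average": sum(vals) / len(vals),
            "SampleCount": len(vals),
        }
        for start, vals in sorted(buckets.items())
    }

# Hypothetical data points: (seconds since start, value).
points = [(0, 10), (20, 30), (59, 20), (60, 50), (90, 70)]
stats = aggregate(points, 60)
for start, s in stats.items():
    print(start, s)
```

With a 60-second period, the first three points fall into one period and the last two into the next, so the same raw data yields different statistics depending on the period chosen.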
Alarm: Automatically initiates actions.
Watches a metric over time and performs actions based on the value of that metric.
Can also be used to monitor estimated AWS charges through billing alarms.
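The way an alarm watches a metric over time can be sketched as a simple state evaluation. This is a simplified model, assuming the alarm fires only when every one of the most recent evaluation periods breaches the threshold; the real service offers more options (for example, how missing data is treated). The state names OK, ALARM, and INSUFFICIENT_DATA are the ones CloudWatch uses; the `alarm_state` function is hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Evaluate a greater-than-threshold alarm over the most recent periods.

    Returns 'INSUFFICIENT_DATA' if there are too few values, 'ALARM' if all of
    the last `evaluation_periods` values exceed `threshold`, otherwise 'OK'.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"

# Hypothetical CPU utilization values, one per period.
print(alarm_state([50, 85, 90, 95], threshold=80, evaluation_periods=3))
print(alarm_state([50, 85, 70, 95], threshold=80, evaluation_periods=3))
print(alarm_state([90], threshold=80, evaluation_periods=3))
```

Requiring several consecutive breaching periods is a common way to avoid alarms on short spikes.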
Percentiles: Indicate the relative standing of a value in a dataset. For example, the 95th percentile is the value below which 95 percent of the data falls.
Used to understand the distribution of metric data.
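A short sketch of why percentiles help with understanding a distribution, using the nearest-rank method. The method and the sample latencies are assumptions for illustration; CloudWatch's own percentile computation may differ in detail.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("values must be non-empty and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds, with one outlier.
latencies = [12, 15, 11, 250, 14, 13, 16, 12, 15, 14]

print("average:", sum(latencies) / len(latencies))
print("p50:", percentile(latencies, 50))
print("p90:", percentile(latencies, 90))
print("p99:", percentile(latencies, 99))
```

The single slow request pulls the average well above the typical latency, while p50 and p90 stay close to what most requests experienced and p99 exposes the outlier.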
CloudWatch Dashboard: A console for monitoring resources in a single view.
No limit to the number of dashboards you can create.
Dashboards are global and not region-specific.
CloudWatch Agent: Collects logs and system-level metrics from EC2 instances and on-premises servers; must be installed.
CloudWatch Events (now part of Amazon EventBridge): A set of rules that match events, such as the stopping of an EC2 instance.
Events can be routed to AWS Lambda functions, Amazon SNS topics, Amazon SQS queues, and other targets.
CloudWatch Events continuously monitors for state changes in AWS resources. When a change occurs, it can send notifications, invoke AWS Lambda functions, or trigger other targets.
An event indicates a change in the AWS environment. Events are generated whenever AWS resources change state.
Rules match events and route them to targets.
Targets process events and receive them in JSON format.
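The relationship between events, rules, and targets can be sketched with a simplified pattern matcher. It assumes patterns in which each field lists its allowed values and nested objects are matched recursively; real event patterns support more operators than this. The `matches` function and the instance ID are hypothetical.

```python
import json

def matches(pattern, event):
    """Return True if every field in `pattern` is present in `event` with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Rule pattern: EC2 instances entering the stopped or terminated state.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Targets receive the event as JSON; this one is a trimmed, illustrative example.
event = json.loads("""
{
  "source": "aws.ec2",
  "detail-type": "EC2 Instance State-change Notification",
  "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"}
}
""")

if matches(pattern, event):
    print("route to target:", event["detail"]["instance-id"])
```

Fields that the pattern does not mention, such as `instance-id` here, are ignored during matching, so a rule only needs to name the fields it cares about.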
CloudWatch Logs: Used to store, monitor, and access log files from various AWS resources.
Helps troubleshoot system errors and retains logs in durable storage.
Use Cases for CloudWatch
Monitor the performance of AWS resources, applications, and infrastructure in real time.
Set up alarms to trigger notifications or actions in response to resource state changes.
Store, search, and analyze log data from AWS services, applications, and infrastructure components.
Trigger automatic scaling events by monitoring the performance of EC2 instances, RDS databases, and other resources.
Benefits of Amazon CloudWatch
Organizes the large amount of data produced by web applications into a dashboard.
Reduces the total cost of ownership by using alarms and taking automated actions to correct errors.
Optimizes applications and resources by examining logs and metric data.
Provides detailed application insights, like CPU utilization, capacity utilization, and memory utilization.
Provides a platform for comparing data produced by different AWS services.
Drawbacks of Amazon CloudWatch
Can be expensive, especially for large-scale needs.
API rate limits and service quotas can cause throttling when very large volumes of data are published, especially during usage spikes.
Monitoring and logging processes can consume significant system resources, potentially impacting application performance.
Integration with other AWS services and third-party tools can be difficult.
Can be complex to set up and manage.
Challenges of CloudWatch
Complexity in Setup: Configuring monitoring and alarms can be difficult for new users, and interpreting metrics requires familiarity with AWS services and best practices.
Limited Visibility and Granularity: Provides metrics and logs at a high level that may not be granular enough for detailed analysis and troubleshooting.
Cost Management: Costs can accumulate quickly, especially when monitoring many resources or enabling detailed logging and retention settings.
Amazon CloudWatch Pricing
Free Tier: Includes 10 custom metrics, 10 alarms, 3 dashboards (with up to 50 metrics each), and 5 GB of log data per month.
Pay-as-you-go: Charges are based on the number of metrics, alarms, and dashboards, and on the volume of log data ingested and stored.
CloudWatch vs. CloudTrail
CloudWatch: A service for monitoring and observing AWS resources, collecting and tracking metrics, and managing alarms.
Used for monitoring performance metrics, logs, and events to troubleshoot issues, optimize resource utilization, and maintain application health.
Offers features like metric collection, dashboards, alarms, logs, and events for real-time monitoring and automated responses.
CloudTrail: An auditing and logging service that captures API activity and provides an AWS API call history for governance, compliance, and security analysis.
Used to track API activity and resource changes, audit user activity, and generate insights for security analysis and compliance auditing.
Logs API calls, including caller identity, time of call, source IP address, and request parameters. Used for analysis, compliance reporting, and troubleshooting.