Amazon CloudWatch Summary

What is Amazon CloudWatch?

  • Amazon CloudWatch is a monitoring and observability service that enables the collection and tracking of metrics, monitoring of log files, setting of alarms, and automatic reactions to changes in AWS resources.
  • This service was built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers.
  • Amazon CloudWatch is also a key service for monitoring in real time.
  • CloudWatch provides users with data and actionable insights to monitor their applications, respond to system-wide performance changes, and optimize resource utilization.

Why Amazon CloudWatch?

  • It monitors applications for:
    • Performance
    • Health
    • Resource Use

How Amazon CloudWatch Works

  1. CloudWatch is configured for the resources you want to monitor.
  2. Agents collect logs and metrics from resources on-premises or on AWS.
  3. CloudWatch provides an overall view of the resources and helps troubleshoot issues via a dashboard.
  4. The service can trigger operational changes, such as auto-scaling AWS resources, in response to changes in those resources.
  5. CloudWatch performs real-time analysis based on the logs it receives.
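
Step 4 above can be sketched as a small scaling decision. This is a minimal local illustration, not the AWS Auto Scaling algorithm: the function name desired_capacity, the proportional rule, and the default target and fleet bounds are all assumptions made for the example.

```python
import math
from statistics import mean


def desired_capacity(current_instances, cpu_samples, target_cpu=50.0,
                     min_instances=1, max_instances=10):
    """Return a new instance count from recent CPU utilization samples.

    Simplified proportional rule (an assumption for this sketch):
    resize the fleet so that average CPU moves toward target_cpu.
    """
    if not cpu_samples:
        # No data collected yet: leave the fleet unchanged.
        return current_instances
    avg_cpu = mean(cpu_samples)
    proposed = math.ceil(current_instances * avg_cpu / target_cpu)
    # Clamp the result to the configured fleet bounds.
    return max(min_instances, min(max_instances, proposed))


# Four instances averaging 85% CPU against a 50% target: scale out.
print(desired_capacity(4, [80.0, 90.0, 85.0]))
# Four instances averaging 25% CPU: scale in.
print(desired_capacity(4, [20.0, 30.0]))
```

In practice the samples would come from collected metrics, and the scaling action would be carried out by the scaling service rather than by application code.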

Components of CloudWatch

  • EC2: A common monitoring target. EC2 instances run in AWS and publish basic metrics to CloudWatch automatically; CloudWatch itself does not require an EC2 instance.
  • Resource Groups: Organize all resources, such as EC2 instances, RDS databases, and S3 buckets, into groups using tags.
  • CloudWatch Logs: View logs as a single stream of time-ordered events, visualized using graphs.
  • CloudWatch Alarms: Set alarms that notify you when a metric crosses a threshold you define.
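
The alarm component can be illustrated with a small local evaluator. This is a sketch under stated assumptions: the function alarm_state is hypothetical, it handles only a "greater than threshold" comparison over consecutive periods, and it treats any missing period as insufficient data, whereas the real service offers more comparison operators and configurable missing-data handling. The state names mirror the three CloudWatch alarm states.

```python
from typing import Optional, Sequence


def alarm_state(datapoints: Sequence[Optional[float]], threshold: float,
                evaluation_periods: int) -> str:
    """Evaluate the most recent periods of a metric against a threshold.

    datapoints holds one aggregated value per period, oldest first;
    None marks a period with no data.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    recent = list(datapoints)[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        # Every evaluated period breached the threshold.
        return "ALARM"
    return "OK"


# CPU above 80% for the last three periods puts the alarm in ALARM.
print(alarm_state([40.0, 85.0, 90.0, 95.0], threshold=80.0, evaluation_periods=3))
```

Requiring several consecutive breaching periods is a common way to avoid notifications for short spikes.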

Amazon CloudWatch Features

  • Metrics: Each metric is a time-ordered set of data points published to Amazon CloudWatch.
    • Each data point is marked with a timestamp.
    • A metric is a variable being monitored; data points are the values of that variable over time.
    • Uniquely defined by a name, namespace, and zero or more dimensions.
    • Metric math queries multiple CloudWatch metrics and uses mathematical expressions to create new time series based on them.
  • Dimensions: Name/value pairs that form part of a metric's identity.
    • Each unique name/value pair added to a metric creates a new variation of that metric.
  • Statistics: Metric data aggregations over specified periods of time.
    • Available statistics include minimum, maximum, sum, average, and sample count.
  • Alarm: Automatically initiates actions.
    • Watches a metric over time and performs actions based on the value of that metric.
    • Can also be used to monitor estimated AWS charges.
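  • Percentiles: Indicate the relative standing of a value in a dataset; for example, the 95th percentile is the value that 95 percent of the data points are at or below.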
    • Used to understand the distribution of metric data.
  • CloudWatch Dashboard: A console for monitoring resources in a single view.
    • No limit to the number of dashboards you can create.
    • Dashboards are global and not region-specific.
  • CloudWatch Agent: Collects logs and system-level metrics from EC2 instances and on-premises servers; it must be installed on each host.
  • CloudWatch Events: A set of rules that match events, such as the stopping of an EC2 instance.
    • Events can be routed to AWS Lambda functions, Amazon SNS topics, Amazon SQS queues, and other targets.
    • CloudWatch Events continuously monitors for changes in the state of AWS resources. When a change occurs, notifications are sent, AWS Lambda functions are invoked, and so on.
    • An event indicates a change in the AWS environment. Events are generated whenever AWS resources change state.
    • Rules match events and route them to targets.
    • Targets receive events in JSON format and process them.
  • CloudWatch Logs: Used to store, monitor, and access log files from various AWS resources.
    • Helps troubleshoot system errors and maintain logs in durable storage.
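
The metric, dimension, statistic, and percentile concepts above can be tied together in a short local sketch. Nothing here calls AWS: metric_key, aggregate, and percentile are hypothetical helpers, the namespace "MyApp" and dimension "Host" are made-up names, and the nearest-rank percentile is one common definition that may differ from how CloudWatch computes percentiles internally.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def metric_key(namespace, name, dimensions=None):
    """Identity of a metric: namespace, name, and its set of dimensions."""
    return (namespace, name, frozenset((dimensions or {}).items()))


def aggregate(points, period_seconds):
    """Group (timestamp, value) points into periods and compute statistics.

    Timestamps should be timezone-aware datetime objects.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of its period.
        start = int(ts.timestamp()) // period_seconds * period_seconds
        buckets[start].append(value)
    return {
        start: {
            "Minimum": min(values),
            "Maximum": max(values),
            "Sum": sum(values),
            "Average": sum(values) / len(values),
            "SampleCount": len(values),
        }
        for start, values in sorted(buckets.items())
    }


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty dataset is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Same namespace and name, different dimension value: two distinct metrics.
web1 = metric_key("MyApp", "Latency", {"Host": "web-1"})
web2 = metric_key("MyApp", "Latency", {"Host": "web-2"})
print(web1 != web2)

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
samples = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)]
points = [(t0 + timedelta(seconds=s), v) for s, v in samples]
for start, stats in aggregate(points, 60).items():
    print(start, stats)
print(percentile([v for _, v in samples], 95))
```

Using a frozenset of dimension pairs makes the key independent of the order in which dimensions are listed, which matches the idea that each distinct combination is its own metric.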

Use Cases for CloudWatch

  • Monitor the performance of AWS resources, applications, and infrastructure in real time.
  • Set up alarms to trigger notifications or actions in response to resource state changes.
  • Store, search, and analyze log data from AWS services, applications, and infrastructure components.
  • Trigger automatic scaling events by monitoring the performance of EC2 instances, RDS databases, and other resources.
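
Reacting to resource state changes, as in the use cases above, comes down to matching incoming events against rule patterns and routing them to targets. The sketch below is a simplified local matcher, not the service's implementation: it supports only exact-value matching, the helpers matches and route are hypothetical, and the target name is made up. The event fields follow the general shape of an EC2 instance state-change event.

```python
import json


def matches(pattern, event):
    """Return True if event satisfies pattern (exact-value matching only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


def route(event, rules):
    """Return the targets of every rule whose pattern matches the event."""
    targets = []
    for rule in rules:
        if matches(rule["pattern"], event):
            targets.extend(rule["targets"])
    return targets


stopped_rule = {
    "pattern": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    },
    "targets": ["notify-ops-topic"],  # hypothetical target name
}

# Events arrive as JSON; this sample is illustrative and abbreviated.
event = json.loads("""
{
  "source": "aws.ec2",
  "detail-type": "EC2 Instance State-change Notification",
  "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"}
}
""")

print(route(event, [stopped_rule]))
```

Fields that the pattern does not mention, such as the instance ID here, are ignored, so one rule can cover every instance.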

Benefits of Amazon CloudWatch

  • Organizes the large amount of data produced by web applications into a dashboard.
  • Reduces the total cost of ownership by using alarms and taking automated actions to correct errors.
  • Optimizes applications and resources by examining logs and metric data.
  • Provides detailed application insights, like CPU utilization, capacity utilization, and memory utilization.
  • Provides a platform for comparing data produced by different AWS services.

Drawbacks of Amazon CloudWatch

  • Can be expensive, especially for large-scale needs.
  • May not be able to handle large amounts of data, especially during usage spikes.
  • Monitoring and logging processes can consume significant system resources, potentially impacting application performance.
  • Integration with other AWS services and third-party tools can be difficult.
  • Can be complex to set up and manage.

Challenges of CloudWatch

  • Complexity in Setup: Configuring monitoring and alarms can be difficult for new users, and interpreting metrics requires familiarity with AWS services and best practices.
  • Limited Visibility and Granularity: Provides metrics and logs at a high level that may not be granular enough for detailed analysis and troubleshooting.
  • Cost Management: Costs can accumulate quickly, especially when monitoring many resources or enabling detailed logging and retention settings.

Amazon CloudWatch Pricing

  • Free Tier: Includes 10 custom metrics, 10 alarms, 3 dashboards, and 5 GB of log data per month (allowances change over time, so confirm them on the current pricing page).
  • Pay-as-you-go: Charges are based on a per-metric fee, a price per GB of log data ingested and stored, and a price per dashboard.
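
The pay-as-you-go model can be sketched as "charge only the usage above the free allowance". Every number below is a placeholder chosen for easy arithmetic, not an actual AWS price or allowance; substitute the values from the current pricing page. The function estimate_monthly_cost and the dictionary keys are assumptions made for this example.

```python
def estimate_monthly_cost(usage, rates, free_tier):
    """Return (per-item cost breakdown, total) for usage above the free tier."""
    breakdown = {}
    for item, quantity in usage.items():
        # Only the portion above the free allowance is billable.
        billable = max(0, quantity - free_tier.get(item, 0))
        breakdown[item] = round(billable * rates[item], 2)
    return breakdown, round(sum(breakdown.values()), 2)


# Placeholder allowances and unit prices (assumptions, not real pricing).
FREE_TIER = {"metrics": 10, "alarms": 10, "dashboards": 3, "log_gb": 5}
RATES = {"metrics": 0.25, "alarms": 0.50, "dashboards": 2.00, "log_gb": 0.50}

example_usage = {"metrics": 25, "alarms": 12, "dashboards": 4, "log_gb": 20}
breakdown, total = estimate_monthly_cost(example_usage, RATES, FREE_TIER)
print(breakdown)
print(total)
```

A breakdown per item makes it easier to see which dimension of usage, often log volume, drives the bill.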

CloudWatch vs. CloudTrail

  • CloudWatch: A service for monitoring and observing AWS resources, collecting and tracking metrics, and managing alarms.
    • Used for monitoring performance metrics, logs, and events to troubleshoot issues, optimize resource utilization, and maintain application health.
    • Offers features like metric collection, dashboards, alarms, logs, and events for real-time monitoring and automated responses.
  • CloudTrail: An auditing and logging service that captures API activity and provides an AWS API call history for governance, compliance, and security analysis.
    • Used to track API activity and resource changes, audit user activity, and generate insights for security analysis and compliance auditing.
    • Logs API calls, including caller identity, time of call, source IP address, and request parameters. Used for analysis, compliance reporting, and troubleshooting.
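
The difference between the two services shows up in the questions their data can answer. The sketch below uses made-up sample data: the CloudTrail-style records use field names from the CloudTrail record format (eventTime, eventName, sourceIPAddress, userIdentity, requestParameters) but are heavily abbreviated, the request parameters are simplified, and the helper functions who_called and periods_over are hypothetical.

```python
# Audit view (CloudTrail-style): who made which API call, when, and from where.
# Illustrative sample data only; real records carry many more fields.
cloudtrail_records = [
    {
        "eventTime": "2024-01-01T10:00:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "StopInstances",
        "sourceIPAddress": "203.0.113.10",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        "requestParameters": {"instanceId": "i-0123456789abcdef0"},  # simplified
    },
    {
        "eventTime": "2024-01-01T10:02:00Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "DescribeInstances",
        "sourceIPAddress": "203.0.113.20",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
        "requestParameters": {},
    },
]

# Performance view (CloudWatch-style): (timestamp, average CPU utilization in percent).
cpu_datapoints = [
    ("2024-01-01T09:55:00Z", 35.0),
    ("2024-01-01T10:00:00Z", 92.0),
    ("2024-01-01T10:05:00Z", 4.0),
]


def who_called(records, event_name):
    """Audit question: which identity invoked this API, when, and from which IP?"""
    return [
        (r["eventTime"], r["userIdentity"]["arn"], r["sourceIPAddress"])
        for r in records
        if r["eventName"] == event_name
    ]


def periods_over(datapoints, threshold):
    """Monitoring question: in which periods did the metric exceed the threshold?"""
    return [ts for ts, value in datapoints if value > threshold]


print(who_called(cloudtrail_records, "StopInstances"))
print(periods_over(cpu_datapoints, 80.0))
```

The metric data shows that something changed on the instance around 10:00, while the audit records identify the API call and caller behind the change.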