SOC 2 Infrastructure Monitoring Process

Learn how to implement Infrastructure Monitoring for SOC 2 compliance under CC7.1, including alerts, dashboards, and auditor-ready evidence.

Controls: CC7.1 | Frequency: Ongoing | Owner: Engineering Lead | Complexity: High

Overview

Infrastructure Monitoring is the continuous observation and alerting of system performance, availability, and resource utilization to detect and respond to operational issues. This process supports SOC 2 CC7.1 by ensuring potential system failures or security-impacting events are identified and addressed in a timely manner.

Step-by-Step Process

Define monitoring scope
The Engineering Lead identifies all in-scope infrastructure components, including servers, databases, containers, and cloud services, that support systems within the SOC 2 scope. The output is a documented monitoring scope list aligned to the system inventory.
Role: Engineering Lead
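A minimal Python sketch of deriving the scope list from the inventory, assuming the inventory can be exported as records that carry an in-scope flag. The `Asset` fields, the `MONITORED_TYPES` set, and the sample hosts are hypothetical; substitute whatever your inventory actually records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One row of a (hypothetical) system inventory export."""
    asset_id: str
    asset_type: str       # e.g. "server", "database", "container", "cloud_service"
    soc2_in_scope: bool   # flag maintained in the approved inventory


# Component types this process requires monitoring for.
MONITORED_TYPES = {"server", "database", "container", "cloud_service"}


def build_monitoring_scope(inventory: list[Asset]) -> list[str]:
    """Return the sorted IDs of in-scope assets whose type must be monitored."""
    return sorted(
        asset.asset_id
        for asset in inventory
        if asset.soc2_in_scope and asset.asset_type in MONITORED_TYPES
    )


inventory = [
    Asset("web-01", "server", True),
    Asset("orders-db", "database", True),
    Asset("dev-sandbox", "server", False),   # out of SOC 2 scope
    Asset("laptop-17", "endpoint", True),    # not an infrastructure component type
]
print(build_monitoring_scope(inventory))  # ['orders-db', 'web-01']
```

Generating the list from the inventory, rather than maintaining it by hand, keeps the scope document and the inventory from drifting apart.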
Configure core metrics
Engineering configures baseline metrics such as CPU utilization, memory usage, disk space, network throughput, and service uptime for each in-scope component. The output is an active set of metrics visible in the monitoring tool.
Role: Engineering Lead
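One way to confirm the baseline metrics are actually visible is to compare what each in-scope component reports against the required set. This sketch assumes you can export a per-host list of reporting metrics from your tool; the short names in `CORE_METRICS` are placeholders, not vendor metric identifiers.

```python
# Placeholder names for the baseline metrics; map them to your tool's
# actual metric names (these are not vendor identifiers).
CORE_METRICS = {"cpu", "memory", "disk", "network", "uptime"}


def missing_core_metrics(
    reported: dict[str, set[str]], scope: list[str]
) -> dict[str, set[str]]:
    """Map each in-scope component to the core metrics it is not reporting."""
    gaps: dict[str, set[str]] = {}
    for component in scope:
        missing = CORE_METRICS - reported.get(component, set())
        if missing:
            gaps[component] = missing
    return gaps


reported = {
    "web-01": {"cpu", "memory", "disk", "network", "uptime"},
    "orders-db": {"cpu", "memory"},
}
gaps = missing_core_metrics(reported, ["web-01", "orders-db", "cache-01"])
for component, missing in sorted(gaps.items()):
    print(component, sorted(missing))
# cache-01 ['cpu', 'disk', 'memory', 'network', 'uptime']
# orders-db ['disk', 'network', 'uptime']
```

A component that reports nothing at all (here `cache-01`) shows up with every metric missing, so the same check also catches hosts that were never onboarded.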
Set alert thresholds
Engineering defines alert thresholds and severity levels based on operational risk (e.g., CPU > 85% for 5 minutes). The output is a documented and enabled alert configuration tied to escalation rules.
Role: Engineering Lead
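To make "CPU > 85% for 5 minutes" concrete: the condition should fire only when the breach covers the whole five-minute window, so a single spike does not page anyone. The sketch below treats "sustained" as every sample in the window being above the threshold; some tools evaluate an average over the window instead, so check which semantics yours uses. The 95% critical tier is a hypothetical second severity level, and in practice the monitoring tool performs this evaluation itself.

```python
from datetime import datetime, timedelta
from typing import Optional


def breaches_for_window(
    samples: list[tuple[datetime, float]],
    threshold: float,
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """True if every sample in the trailing window is above the threshold.

    `samples` must be sorted by timestamp. If the data does not cover the
    whole window yet, the condition is not considered met.
    """
    if not samples:
        return False
    window_start = samples[-1][0] - window
    if samples[0][0] > window_start:
        return False  # not enough history to judge a sustained breach
    recent = [value for ts, value in samples if ts >= window_start]
    return all(value > threshold for value in recent)


def severity(samples: list[tuple[datetime, float]]) -> Optional[str]:
    """Map CPU samples to a severity; the 95% tier is a hypothetical example."""
    if breaches_for_window(samples, threshold=95.0):
        return "critical"
    if breaches_for_window(samples, threshold=85.0):
        return "warning"
    return None


start = datetime(2024, 1, 1, 12, 0)
sustained = [(start + timedelta(minutes=i), v)
             for i, v in enumerate([88, 90, 91, 87, 89, 92])]
spike = [(start + timedelta(minutes=i), v)
         for i, v in enumerate([40, 42, 41, 99, 43, 40])]
print(severity(sustained), severity(spike))  # warning None
```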
Enable alert notifications
Engineering configures alert notifications to route to approved channels such as email, Slack, or PagerDuty. The output is verified alert delivery to on-call personnel.
Role: Engineering Lead
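A severity-to-channel routing table makes escalation rules explicit and easy to review. The channel identifiers below are hypothetical placeholders, and in practice this mapping usually lives in the monitoring tool or paging service rather than in custom code; the sketch only shows the shape of the rule set.

```python
# Hypothetical escalation rules. Replace the placeholder targets with your
# approved email list, Slack channel, and PagerDuty service.
ROUTES: dict[str, list[str]] = {
    "critical": ["pagerduty:infra-oncall", "slack:#infra-alerts"],
    "warning": ["slack:#infra-alerts"],
    "info": ["email:infra-team@example.com"],
}


def route_alert(severity: str) -> list[str]:
    """Return notification targets for a severity.

    Unknown severities fall back to the critical route so a misconfigured
    alert still reaches on-call staff instead of being dropped.
    """
    return ROUTES.get(severity, ROUTES["critical"])


# Sanity check worth running whenever the table changes.
assert all(targets for targets in ROUTES.values()), "every severity needs a target"
print(route_alert("warning"))   # ['slack:#infra-alerts']
print(route_alert("sev-x"))     # ['pagerduty:infra-oncall', 'slack:#infra-alerts']
```

Falling back to the paging route is a deliberate choice: a noisy misrouted alert is easier to notice and fix than one that silently goes nowhere.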
Review monitoring dashboards
Engineering reviews infrastructure dashboards on an ongoing basis to identify anomalies or degradation trends. The output is early identification of issues, supported by retained review evidence such as timestamped dashboard screenshots.
Role: Engineering Lead
Respond to alerts
On-call engineers investigate triggered alerts, remediate underlying issues, and document actions taken. The output is resolved alerts with timestamps and resolution notes.
Role: Engineering Lead
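The resolution record that auditors sample needs, at minimum, trigger and resolution timestamps plus notes. A minimal sketch of such a record, with hypothetical field names and an illustrative alert ID; most teams keep this in their incident or paging tool rather than in code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertRecord:
    """Minimal alert-response record (field names are illustrative)."""
    alert_id: str
    host: str
    triggered_at: datetime
    resolved_at: Optional[datetime] = None
    resolution_notes: str = ""

    def resolve(self, when: datetime, notes: str) -> None:
        """Close the alert; refuse records that would not hold up as evidence."""
        if when < self.triggered_at:
            raise ValueError("resolution time precedes trigger time")
        if not notes.strip():
            raise ValueError("resolution notes are required")
        self.resolved_at = when
        self.resolution_notes = notes

    @property
    def time_to_resolve(self) -> Optional[timedelta]:
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.triggered_at


alert = AlertRecord("ALR-1042", "web-01", datetime(2024, 3, 4, 9, 15))
alert.resolve(datetime(2024, 3, 4, 9, 52),
              "Cleared old logs on /var; disk back under threshold.")
print(alert.time_to_resolve)  # 0:37:00
```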
Periodically validate monitoring
Engineering performs periodic checks to confirm all in-scope systems are still monitored and alerts are functioning as intended. The output is an updated monitoring validation record.
Role: Engineering Lead
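The periodic validation can be expressed as a comparison between the monitoring scope and the time each host last reported data. This sketch assumes a `last_seen` mapping can be exported from your tool; the 15-minute silence tolerance is a hypothetical value to tune to your collection interval.

```python
from datetime import datetime, timedelta, timezone


def validate_monitoring(
    scope: list[str],
    last_seen: dict[str, datetime],
    now: datetime,
    max_silence: timedelta = timedelta(minutes=15),
) -> dict:
    """Build a validation record comparing scope to what is actually reporting."""
    unmonitored = sorted(h for h in scope if h not in last_seen)
    stale = sorted(
        h for h in scope if h in last_seen and now - last_seen[h] > max_silence
    )
    return {
        "validated_at": now.isoformat(),
        "in_scope_count": len(scope),
        "unmonitored": unmonitored,  # in scope but never seen by the tool
        "stale": stale,              # seen before, but silent too long
        "passed": not unmonitored and not stale,
    }


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
record = validate_monitoring(
    scope=["web-01", "orders-db", "cache-01"],
    last_seen={
        "web-01": now - timedelta(minutes=2),
        "orders-db": now - timedelta(hours=3),
    },
    now=now,
)
print(record)
```

Saving each returned record with its `validated_at` timestamp gives you the dated validation history the step calls for.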

What You Need Before Starting

Approved system inventory identifying SOC 2 in-scope infrastructure
Administrative access to monitoring tools (Datadog, New Relic, or CloudWatch)
On-call rotation and escalation contact list
Documented alerting and incident response expectations

Evidence Your Auditor Expects

Screenshot of the active infrastructure dashboard showing in-scope systems, with the capture date and time visible
Alert configuration export or screenshot showing thresholds, notification channels, and the last-modified date
Sample alert log showing an alert that was triggered and resolved within the audit period, with both timestamps
Monitoring scope document mapped to system inventory with last review date
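To pull the sample alert for the evidence package, filter the alert history to alerts that were both triggered and resolved inside the audit period. The dict keys, alert IDs, and period dates below are hypothetical sample data.

```python
from datetime import datetime


def alerts_in_period(alerts: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Return alerts triggered and resolved within [start, end], oldest first."""
    in_period = [
        a for a in alerts
        if a["resolved_at"] is not None
        and start <= a["triggered_at"] <= a["resolved_at"] <= end
    ]
    return sorted(in_period, key=lambda a: a["triggered_at"])


history = [
    {"id": "A-1", "triggered_at": datetime(2023, 12, 20, 8, 0),
     "resolved_at": datetime(2023, 12, 20, 8, 30)},   # before the period
    {"id": "A-2", "triggered_at": datetime(2024, 2, 11, 14, 5),
     "resolved_at": datetime(2024, 2, 11, 14, 40)},
    {"id": "A-3", "triggered_at": datetime(2024, 5, 2, 3, 10),
     "resolved_at": None},                            # still open
]
period_start, period_end = datetime(2024, 1, 1), datetime(2024, 12, 31, 23, 59)
print([a["id"] for a in alerts_in_period(history, period_start, period_end)])  # ['A-2']
```

Alerts that are triggered in the period but still unresolved (like `A-3`) are excluded here, but they are worth following up before fieldwork, since an auditor may sample them too.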

How This Looks In Your Tools

Datadog

Log in to Datadog and navigate to Infrastructure > Host Map to verify all production hosts are reporting metrics. Go to Metrics > Summary to confirm CPU, memory, disk, and network metrics are being collected.

Navigate to Monitors > New Monitor and select the monitor type (e.g., Metric or Host). Configure thresholds (such as system.cpu.user > 85%), set alert conditions, and assign notification channels under Notify your team.
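Monitors can also be defined as code, which gives you a versioned record of thresholds and notification targets. The sketch below builds a request body in the shape Datadog's monitor API (`POST /api/v1/monitor`) accepts for a `metric alert`; it does not send anything. The `env:prod` tag, the `soc2:cc7.1` tag, and the `@slack-infra-alerts` handle are hypothetical, and field names should be confirmed against Datadog's API reference before use.

```python
import json


def datadog_cpu_monitor(
    threshold: float = 85.0,
    scope: str = "env:prod",              # hypothetical tag for production hosts
    notify: str = "@slack-infra-alerts",  # hypothetical notification handle
) -> dict:
    """Build a metric-monitor body for Datadog's POST /api/v1/monitor endpoint."""
    # Average user CPU per host over the last 5 minutes, compared to the threshold.
    query = f"avg(last_5m):avg:system.cpu.user{{{scope}}} by {{host}} > {threshold:g}"
    # {{host.name}} is a Datadog template variable, so it stays in a plain string.
    message = (
        f"CPU above {threshold:g}% for 5 minutes on " + "{{host.name}}. "
        "Investigate, remediate, and record resolution notes. " + notify
    )
    return {
        "name": "[SOC 2] High CPU on production hosts",
        "type": "metric alert",
        "query": query,
        "message": message,
        "tags": ["soc2:cc7.1"],
        "options": {
            "thresholds": {"critical": threshold},  # must match the query
            "notify_no_data": True,   # also alert if a host stops reporting
            "no_data_timeframe": 10,  # minutes without data before alerting
        },
    }


print(json.dumps(datadog_cpu_monitor(), indent=2))
```

Deriving both the query and `options.thresholds.critical` from one argument keeps them from drifting apart. Sending the body requires the `DD-API-KEY` and `DD-APPLICATION-KEY` headers, and the JSON itself doubles as the alert configuration export auditors ask for.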

Access Dashboards > Dashboard List to review or create an infrastructure dashboard. Confirm dashboards are updated in real time and save any changes with a clear name indicating production scope.

New Relic

Log in to New Relic and go to Infrastructure > Hosts or Infrastructure > Kubernetes to confirm all in-scope entities are reporting data. Review the entity list to ensure no critical systems are missing.

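Navigate to Alerts & AI > Alert conditions and create or review alert conditions for key metrics such as CPU, memory, and disk utilization. Add each condition to an alert policy and verify notification routing under Alerts & AI > Notification channels (accounts on New Relic's newer alerting model configure routing under Workflows and Destinations instead).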
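Writing threshold standards down as data gives you an exportable artifact and directly addresses the "thresholds not defined" finding. This sketch pairs NRQL queries against New Relic's `SystemSample` infrastructure event with thresholds; the dict layout is a simplified, hypothetical structure (not the NerdGraph schema), and the threshold values are examples.

```python
import json


def nrql_condition(name: str, attribute: str, threshold: float,
                   duration_s: int = 300) -> dict:
    """Describe one NRQL alert condition (simplified, hypothetical layout)."""
    return {
        "name": name,
        "nrql": f"SELECT average({attribute}) FROM SystemSample FACET hostname",
        "critical": {
            "operator": "ABOVE",
            "threshold": threshold,
            "threshold_duration_s": duration_s,  # breach must last this long
        },
    }


# Example threshold standard; values are illustrative, not recommendations.
THRESHOLD_STANDARD = [
    nrql_condition("High CPU", "cpuPercent", 85),
    nrql_condition("Disk nearly full", "diskUsedPercent", 90),
]

print(json.dumps(THRESHOLD_STANDARD, indent=2))
```

Keeping the standard in version control means each threshold change has an author and a date, which helps when an auditor asks when and why a threshold was last reviewed.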

Go to Dashboards and open the Infrastructure dashboard to review real-time performance trends. Save any updates and confirm dashboards reflect current production infrastructure.

CloudWatch

Log in to the AWS Console and navigate to CloudWatch > Metrics. Review namespaces such as EC2, RDS, and ECS to confirm metrics are being collected for all in-scope resources.

Go to CloudWatch > Alarms and create or review alarms for critical metrics (e.g., CPUUtilization, FreeStorageSpace). Configure alarm thresholds and set notifications using an SNS topic tied to on-call contacts.
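The same alarm can be created programmatically with boto3's `put_metric_alarm`. This sketch only builds the keyword arguments, so it runs without AWS credentials; the instance ID and SNS topic ARN are placeholders. It uses a single 300-second period because EC2 basic monitoring publishes CPUUtilization at five-minute granularity, so the alarm fires when the five-minute average exceeds the threshold.

```python
def cpu_alarm_params(instance_id: str, sns_topic_arn: str,
                     threshold: float = 85.0) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm (boto3 put_metric_alarm)."""
    return {
        "AlarmName": f"soc2-cpu-high-{instance_id}",
        "AlarmDescription": f"Average CPU above {threshold:g}% for 5 minutes",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # one 5-minute datapoint (EC2 basic monitoring)
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # silence from an instance raises the alarm
        "AlarmActions": [sns_topic_arn],  # SNS topic subscribed by on-call contacts
        "OKActions": [sns_topic_arn],     # also notify on recovery, for the timeline
    }


params = cpu_alarm_params(
    "i-0123456789abcdef0",                               # placeholder instance ID
    "arn:aws:sns:us-east-1:123456789012:oncall-alerts",  # placeholder topic ARN
)
print(params["AlarmName"])  # soc2-cpu-high-i-0123456789abcdef0
# With credentials configured: boto3.client("cloudwatch").put_metric_alarm(**params)
```

Treating missing data as breaching is a design choice: it surfaces instances that stop reporting, at the cost of alarms when an instance is intentionally stopped. Use `notBreaching` if that trade-off does not fit your environment.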

Navigate to CloudWatch > Dashboards to review or create dashboards that display infrastructure health. Save dashboards and ensure widgets display current data with correct time ranges.

Common Audit Findings

Incomplete monitoring coverage: This occurs when new infrastructure is deployed without being added to monitoring. Prevent this by linking monitoring configuration checks to deployment or provisioning workflows.
Alert thresholds not defined: Auditors often find metrics collected without actionable alert thresholds. Establish documented standards for thresholds and review them periodically.
Alerts not routed to on-call staff: Alerts may exist but notify inactive or incorrect channels. Regularly test alert notifications and validate the on-call contact list.
Lack of evidence for ongoing review: Teams may monitor systems but fail to retain proof of review. Preserve dashboard screenshots and alert logs with timestamps to demonstrate ongoing monitoring.
