Skip to content

Instantly share code, notes, and snippets.

@alivarzeshi
Last active June 21, 2024 21:55
Show Gist options
  • Save alivarzeshi/6cb96acddd0fd07227cfef5e687c5e38 to your computer and use it in GitHub Desktop.
Save alivarzeshi/6cb96acddd0fd07227cfef5e687c5e38 to your computer and use it in GitHub Desktop.
Date/Time: 2023-06-22 10:23:54.32
Log Level: Information
Source: Always On
Message: The state of the availability group 'AG1' has changed from 'RESOLVING' to 'PRIMARY'.
Date/Time: 2023-06-22 10:25:03.78
Log Level: Warning
Source: Always On
Message: An automatic failover occurred in availability group 'AG1'. The primary replica is now 'Replica
Date/Time: 2023-06-22 10:26:54.89
Log Level: Error
Source: Always On
Message: The availability replica 'Replica3' in availability group 'AG1' encountered a network connectivity issue.
Date/Time: 2023-06-22 10:28:37.54
Log Level: Warning
Source: Always On
Message: High synchronization latency detected for the availability replica 'Replica2' in availability group 'AG1'.
Date/Time: 2023-06-22 10:30:59.78
Log Level: Error
Source: Always On
Message: Lease renewal failed for availability group 'AG1'. The lease is expired, and the availability group is in a resolving state.
@alivarzeshi
Copy link
Author

Tip

In-Depth Analysis of Always On Availability Groups Events in SQL Server Error Logs

Overview

Always On Availability Groups (AGs) in SQL Server provide high availability and disaster recovery solutions. Events related to AGs are crucial for monitoring the health, performance, and failover activities of the AGs. These events include replica state changes, failover operations, connectivity issues, and other critical status updates.

Structure of Always On Availability Groups Event Log Entries

Each log entry for Always On Availability Groups events typically includes the following components:

  1. Date/Time: The exact timestamp when the event occurred.
  2. Log Level: The severity of the event (e.g., Information, Warning, Error).
  3. Source: The component or module that generated the log entry (e.g., Always On).
  4. Message: A detailed description of the event.
  5. Error Number: A unique identifier for any errors encountered during the event.
  6. Additional Data: Information such as the availability group name, replica name, and the state transition details.

Common Always On Availability Groups Events

1. Replica State Changes

  • Description: Indicates changes in the state of an availability replica (e.g., PRIMARY, SECONDARY, RESOLVING).
  • Internal Process:
    • SQL Server monitors the health and connectivity of replicas.
    • State changes are triggered by manual actions, failover events, or network issues.
    • The event is logged to provide insights into the availability and role of each replica.
  • Example Log Entry:
    Date/Time: 2023-06-22 10:23:54.32
    Log Level: Information
    Source: Always On
    Message: The state of the availability group 'AG1' has changed from 'RESOLVING' to 'PRIMARY'.
    

2. Failover Events

  • Description: Logs details of manual or automatic failover operations, including the reason and the involved replicas.
  • Internal Process:
    • Failover can be initiated manually by an administrator or automatically due to a failure condition.
    • The failover process involves promoting a secondary replica to primary and redirecting client connections.
    • The event is logged to track the failover occurrence and its cause.
  • Example Log Entry:
    Date/Time: 2023-06-22 10:25:03.78
    Log Level: Warning
    Source: Always On
    Message: An automatic failover occurred in availability group 'AG1'. The primary replica is now 'Replica2'.
    

3. Connectivity Issues

  • Description: Logs events related to network connectivity problems between primary and secondary replicas.
  • Internal Process:
    • SQL Server continuously monitors the network connectivity of availability replicas.
    • Connectivity issues can trigger alerts and automatic failovers if configured.
    • The event is logged to provide details about the connectivity problem and its impact.
  • Example Log Entry:
    Date/Time: 2023-06-22 10:26:54.89
    Log Level: Error
    Source: Always On
    Message: The availability replica 'Replica3' in availability group 'AG1' encountered a network connectivity issue.
    

4. Synchronization State Changes

  • Description: Logs changes in the synchronization state of replicas (e.g., SYNCHRONIZING, SYNCHRONIZED, NOT SYNCHRONIZING).
  • Internal Process:
    • SQL Server tracks the data synchronization status between primary and secondary replicas.
    • Synchronization state changes are logged to ensure data consistency and to diagnose issues.
  • Example Log Entry:
    Date/Time: 2023-06-22 10:28:37.54
    Log Level: Warning
    Source: Always On
    Message: High synchronization latency detected for the availability replica 'Replica2' in availability group 'AG1'.
    

5. Lease Status

  • Description: Logs the status of the lease mechanism between the SQL Server resource DLL and the SQL Server instance to prevent split-brain scenarios.
  • Internal Process:
    • The lease mechanism ensures that only one replica is primary at any time.
    • Lease status changes and issues are logged to prevent data corruption and ensure high availability.
  • Example Log Entry:
    Date/Time: 2023-06-22 10:30:59.78
    Log Level: Error
    Source: Always On
    Message: Lease renewal failed for availability group 'AG1'. The lease is expired, and the availability group is in a resolving state.
    

Additional Relevant Information

Error Number Details

  • Error: 35216 (Failover)

    • Severity: 16
    • State: Provides additional context about the failover condition.
    • Description: Indicates an automatic or manual failover event.
  • Error: 35217 (Connectivity Issues)

    • Severity: 16
    • State: Details about the network connectivity problem.
    • Description: Indicates a connectivity issue between replicas, which can affect synchronization and availability.

Importance of Monitoring Always On Availability Groups Events

  • High Availability: Ensures that the SQL Server environment is highly available and can recover from failures promptly.
  • Data Consistency: Monitors synchronization states to ensure data consistency across replicas.
  • Proactive Issue Resolution: Detects and addresses connectivity issues, failover events, and other critical problems before they impact the end users.

Background Processes and Mechanisms

State Monitoring

  • Health Checks: SQL Server performs regular health checks on each replica to determine its status.
  • State Transitions: Based on the health checks, replicas can transition between states like PRIMARY, SECONDARY, and RESOLVING.

Failover Mechanism

  • Automatic Failover: Triggered by failure conditions such as network issues or hardware failures, promoting a secondary replica to primary.
  • Manual Failover: Initiated by an administrator to switch the primary role to a different replica, often for maintenance or load balancing.

Synchronization Process

  • Data Movement: SQL Server continuously replicates transactions from the primary to secondary replicas.
  • Synchronization States: Tracks the state of data replication, ensuring that secondary replicas are up-to-date with the primary.

Efficient Reading of Always On Availability Groups Logs

  1. Using SQL Server Management Studio (SSMS):

    • Log File Viewer: Access and filter logs using the built-in Log File Viewer.
    • Filtering: Filter logs by date, log level, source, or specific keywords (e.g., "failover", "synchronization").
  2. Using T-SQL Commands:

    • xp_readerrorlog: Use this system stored procedure to query Always On events directly.
      EXEC xp_readerrorlog 0, 1, 'Always On';
  3. Using Extended Events:

    • Setup: Create an Extended Events session to capture Always On Availability Groups events.
      CREATE EVENT SESSION [AlwaysOnEvents] ON SERVER
      ADD EVENT sqlserver.availability_replica_state_change(
          ACTION(sqlserver.sql_text, sqlserver.database_id, sqlserver.username))
      ADD TARGET package0.event_file(SET filename=N'AlwaysOnEvents.xel');
      GO
      ALTER EVENT SESSION [AlwaysOnEvents] ON SERVER STATE = START;
      GO
  4. Automating Log Monitoring:

    • SQL Server Alerts: Configure SQL Server Agent alerts to notify you of critical Always On Availability Groups events.
      USE msdb;
      GO
      EXEC sp_add_alert
          @name = N'Always On Failover',
          @message_id = 35216,
          @severity = 16,
          @enabled = 1,
          @delay_between_responses = 0,
          @include_event_description_in = 1,
          @notification_message = N'Automatic failover occurred in Always On Availability Group.',
          @job_name = N'NotifyDBA';
      GO
      EXEC sp_add_notification
          @alert_name = N'Always On Failover',
          @operator_name = N'DBA',
          @notification_method = 1;

Conclusion

Always On Availability Groups events in SQL Server logs provide critical insights into the health, performance, and failover activities of the AGs. Understanding the structure of these logs, common errors, and the internal processes behind them allows administrators to effectively monitor, troubleshoot, and manage SQL Server availability groups. Regularly reviewing these logs ensures high availability, data consistency, and proactive issue resolution, maintaining optimal performance and reliability of SQL Server instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment