[Post Mortem] Solstice Incident - December 3, 2025 (Fusaka Upgrade)

Incident Post Mortem

Date and Time of Incident: December 3, 2025, 23:15 CET
Duration of Incident: Approximately 6 hours and 35 minutes

Incident Summary

On December 3, 2025, at 23:15 CET, one validator client (VC#18), scheduled to run Lighthouse v8.0.1, began failing attestations with “invalid signature” errors across all three connected node pairs (Nethermind, Nimbus, MEV-Boost). The issue was caused by a failed Docker Compose restart during a prior automated Ansible update on December 1, which left the validator client running an outdated software version and created a version mismatch with the rest of the updated stack. The incident was resolved on December 4 at 05:50 CET by restarting the affected Docker Compose stack. The automated update process has since been hardened with additional restart validation and version checks so that failed restarts no longer go undetected.

Incident Timeline

  • December 3, 2025, 22:49 CET: Fusaka network upgrade activated on mainnet

  • December 3, 2025, 23:15 CET: First error messages appeared in the central monitoring system; incident reported with 100-500 validators offline and requiring urgent action

  • December 4, 2025, 00:47 CET: Solstice team responded, making phone calls to reach the Zurich team

  • December 4, 2025, 05:54 CET: Docker Compose stack restarted; service restored

  • December 4, 2025, 06:00 CET: Recovery confirmed

Root Cause Analysis

The incident was caused by a failure in the automated restart sequence of validator client VC#18 following a scheduled update cycle. The infrastructure uses an Ansible-based automation process that performs rolling updates, restarting one validator client every 30 minutes to avoid fleet-wide impact. During the update run on December 1, the Docker Compose stack for VC#18 failed to restart correctly, resulting in:

  • The intended updated container image not being loaded

  • An older version of the validator client continuing to run

  • A version mismatch between the validator client and the rest of the updated stack (Nethermind v1.35.3, Nimbus v25.11.1, MEV-Boost v1.10.1)

This state was not detected by existing health checks, allowing the outdated process to remain in production until it began causing protocol-level errors (“invalid signature” and HTTP 400 errors). This was the first observed failure of this automation process in the 24 months since its introduction.
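The missing health check described above amounts to confirming that a restarted client actually reports the version the rollout intended to deploy. A minimal Python sketch follows, assuming the client exposes a version string of the form `Client/vX.Y.Z-commit/platform` (the shape Lighthouse is assumed to use in its version strings). The names `check_version` and `VersionCheck` are hypothetical, and how Solstice's tooling actually queries VC#18 is not described in this report.

```python
import re
from dataclasses import dataclass

# Assumed version-string shape, e.g. "Lighthouse/v8.0.1-1a2b3c4/x86_64-linux".
VERSION_RE = re.compile(r"^(?P<client>[^/]+)/v(?P<semver>\d+\.\d+\.\d+)")


@dataclass(frozen=True)
class VersionCheck:
    client: str
    running: str
    expected: str

    @property
    def ok(self) -> bool:
        # A mismatch here is exactly the state VC#18 was left in.
        return self.running == self.expected


def check_version(version_string: str, expected: str) -> VersionCheck:
    """Compare the running client version with the version the rollout intended."""
    match = VERSION_RE.match(version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return VersionCheck(
        client=match.group("client"),
        running=match.group("semver"),
        expected=expected.lstrip("v"),  # accept both "v8.0.1" and "8.0.1"
    )
```

For example, `check_version("Lighthouse/v8.0.1-1a2b3c4/x86_64-linux", "v8.0.1").ok` is `True`, while a hypothetical stale `v7.0.0` string yields `False`, which an alerting rule could act on immediately after each restart.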

Actions Taken

  • Restarted the affected validator client’s Docker Compose stack to restore the correct software version and normal attestation processing

  • Hardened the automated update process with additional restart validation and version checks

  • Increased DevOps team coverage to ensure 24×7 availability despite travel, illness, and other unplanned absences
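The restart validation listed above can be sketched as a rolling update that restarts one client at a time, polls until that client reports the expected version, and halts the rollout instead of moving on when verification fails. This is a hypothetical Python outline, not Solstice's Ansible playbook: `restart_fn` and `version_fn` are placeholders for whatever Docker Compose restart and version query the real automation uses, and the 1800-second interval mirrors the 30-minute spacing described in the root cause analysis.

```python
import functools
import time
from typing import Callable, List, Sequence


def restart_and_verify(
    restart: Callable[[], None],
    get_running_version: Callable[[], str],
    expected: str,
    attempts: int = 30,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart one client, then poll until it reports `expected` or attempts run out."""
    restart()
    for _ in range(attempts):
        try:
            if get_running_version() == expected:
                return True
        except ConnectionError:
            pass  # container may still be starting; keep polling
        sleep(poll_s)
    return False


def rolling_update(
    clients: Sequence[str],
    restart_fn: Callable[[str], None],
    version_fn: Callable[[str], str],
    expected: str,
    interval_s: float = 1800.0,  # one client every 30 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Update clients one at a time and halt on the first failed verification."""
    updated: List[str] = []
    for i, name in enumerate(clients):
        ok = restart_and_verify(
            functools.partial(restart_fn, name),
            functools.partial(version_fn, name),
            expected,
            sleep=sleep,
        )
        if not ok:
            # Fail loudly instead of silently continuing with the next client.
            raise RuntimeError(f"{name}: not running {expected} after restart; rollout halted")
        updated.append(name)
        if i < len(clients) - 1:
            sleep(interval_s)
    return updated
```

The key design choice is that a failed verification raises rather than logging and continuing, so a restart like the one on December 1 would stop the rollout and surface an error instead of leaving an outdated client in production.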

Impact

Operational Impact: One validator client (VC#18) experienced attestation failures, resulting in missed or rejected attestations for approximately 100-500 validators during the incident window of approximately 6 hours and 35 minutes.

Financial Impact: The estimated impact to the protocol was calculated to be 1.0772 ETH.

This amount will be reimbursed by Solstice.

Follow-up Actions

  • Implement stricter health checks, version verification, and alerting for failed container restarts

  • Add immediate post-deployment restart and version checks to all validator update workflows

  • Strengthen automation resilience by introducing staged rollouts with automatic rollback and enhanced observability
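The staged rollout with automatic rollback from the follow-up list could take roughly the following shape. In this hypothetical Python sketch, `stages` gives the cumulative fraction of the fleet updated at each step, `deploy` and `healthy` are placeholders for the real deployment and health-check hooks, and any unhealthy batch triggers a rollback of every client updated so far in the run.

```python
import math
from typing import Callable, List, Sequence


def staged_rollout(
    clients: Sequence[str],
    stages: Sequence[float],
    deploy: Callable[[str, str], None],
    healthy: Callable[[str, str], bool],
    new_version: str,
    old_version: str,
) -> bool:
    """Deploy in cumulative stages; roll back everything on the first unhealthy batch.

    `stages` holds increasing cumulative fractions of the fleet, e.g. (0.05, 0.25, 1.0).
    Returns True if every stage passed, False if a rollback was performed.
    """
    updated: List[str] = []
    for fraction in stages:
        target = min(len(clients), math.ceil(len(clients) * fraction))
        batch = list(clients[len(updated):target])
        for client in batch:
            deploy(client, new_version)
        updated.extend(batch)
        if not all(healthy(client, new_version) for client in batch):
            # Automatic rollback, most recently updated client first.
            for client in reversed(updated):
                deploy(client, old_version)
            return False
    return True
```

One caveat on the design: rolling back is only safe to a version that still supports the currently active network fork. After a hard fork such as Fusaka, a pre-fork client cannot serve as a rollback target, so `old_version` should be the last known-good fork-compatible release.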

Report Prepared By: Solstice Staking AG
Date: January 2026


Appreciate the transparency here, thank you Marcus!
