Incident Post Mortem
Date and Time of Incident: December 3, 2024, 23:15 CET
Duration of Incident: Approximately 6 hours and 35 minutes
Incident Summary
On December 3, 2024, at 23:15 CET, one validator client (VC#18) running Lighthouse v8.0.1 began rejecting attestations with “invalid signature” errors across all three connected node pairs (Nethermind, Nimbus, MEV-Boost). The cause was a failed Docker Compose restart during an automated Ansible update on December 1, which left the validator client running an outdated software version and therefore mismatched with the rest of the updated stack. The incident was resolved on December 4 at 05:50 CET by restarting the affected Docker stack. The automated update process has since been hardened with additional restart validation and version checks so that a failed restart no longer goes undetected.
Incident Timeline
- December 3, 2024, 22:49 CET: Fusaka update deployed
- December 3, 2024, 23:15 CET: First error messages appeared in the central monitoring solution; the incident was reported as urgent, with 100-500 validators offline
- December 4, 2024, 00:47 CET: Solstice team responded and began phoning the Zurich team
- December 4, 2024, 05:54 CET: Docker container restarted, service restored
- December 4, 2024, 06:00 CET: Recovery confirmed
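The stated duration can be checked against the timestamps in this report with a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative, and all times are taken as CET, so no timezone conversion is applied. The summary gives 05:50 CET as the resolution time and the timeline gives 05:54 CET for the restart, so both are computed.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps copied from this report (all CET).
first_error = datetime.strptime("2024-12-03 23:15", FMT)
resolved_summary = datetime.strptime("2024-12-04 05:50", FMT)  # time given in the summary
restart_timeline = datetime.strptime("2024-12-04 05:54", FMT)  # time given in the timeline


def fmt_duration(delta):
    """Format a timedelta as 'Hh MMm'."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


duration_summary = resolved_summary - first_error
duration_timeline = restart_timeline - first_error

print(fmt_duration(duration_summary))   # 6h 35m
print(fmt_duration(duration_timeline))  # 6h 39m
```

Both results are consistent with the reported duration of approximately 6 hours and 35 minutes.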
Root Cause Analysis
The incident was caused by a failure in the automated restart sequence of validator VC#18 following a scheduled update cycle. The infrastructure uses an Ansible-based automation process that performs rolling updates, restarting one validator client every 30 minutes to avoid fleet-wide impact. During the update run on December 1, the Docker Compose stack for VC#18 failed to restart correctly, resulting in:
- The intended updated container image not being loaded
- An older version of the validator client continuing to run
- A version mismatch between the validator client and the rest of the updated stack (Nethermind v1.35.3, Nimbus v25.11.1, MEV-Boost v1.10.1)
This state was not detected by existing health checks, allowing the outdated process to remain in production until it began causing protocol-level errors (“invalid signature” and HTTP 400 errors). This was the first observed failure of this automation process in the 24 months since its introduction.
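The kind of check that would have surfaced this state can be sketched as a comparison between the versions an update run was supposed to deploy and the versions actually running afterwards. The sketch below is an illustration under assumptions, not the check Solstice implemented: the function name, the service keys, and the validator client tags `new-tag` and `old-tag` are hypothetical placeholders, and how the running versions are collected (for example from container image tags) is left out.

```python
def find_version_mismatches(expected, running):
    """Return {service: (expected_version, running_version)} for each mismatch.

    A service that is expected but not running at all is reported with a
    running version of None.
    """
    mismatches = {}
    for service, want in expected.items():
        have = running.get(service)
        if have != want:
            mismatches[service] = (want, have)
    return mismatches


# Versions the update run intended to deploy. The stack versions are those
# named in this report; "new-tag" is a hypothetical placeholder.
expected = {
    "validator-client": "new-tag",
    "nethermind": "v1.35.3",
    "nimbus": "v25.11.1",
    "mev-boost": "v1.10.1",
}

# Versions observed after the run. "old-tag" is a hypothetical placeholder
# for the outdated validator client that kept running.
running = {
    "validator-client": "old-tag",
    "nethermind": "v1.35.3",
    "nimbus": "v25.11.1",
    "mev-boost": "v1.10.1",
}

mismatches = find_version_mismatches(expected, running)
for service, (want, have) in mismatches.items():
    print(f"MISMATCH {service}: expected {want}, running {have}")
```

A check of this shape, run immediately after each restart, reports a failure as soon as the versions diverge rather than when protocol-level errors appear.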
Actions Taken
- Restarted the affected validator’s Docker Compose stack to restore the correct software version and normal attestation processing
- Hardened the automated update process with additional restart validation and version checks
- Increased DevOps team coverage to ensure 24×7 availability during travel, illness, and other unplanned team outages
Impact
Impact: One validator client (VC#18) experienced attestation failures, resulting in missed or rejected attestations for approximately 100-500 validators over the incident window of approximately 6 hours and 35 minutes.
Financial Impact: The estimated impact to the protocol is 1.0772 ETH.
Solstice will reimburse this amount.
Follow-up Actions
- Implement stricter health checks, version verification, and alerting for failed container restarts
- Add immediate post-deployment restart and version checks to all validator update workflows
- Strengthen automation resilience by introducing staged rollouts with automatic rollback and enhanced observability
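A staged rollout with automatic rollback can be sketched as a loop that updates one validator client at a time, verifies it, and halts the rollout if verification fails. The following is a minimal sketch under assumptions, not the actual Ansible workflow: `rolling_update` and the host names are hypothetical, and the `update`, `verify`, and `rollback` steps are passed in as callables so the control flow can be shown without real infrastructure. The 30-minute pause between hosts mirrors the interval described in the root cause analysis.

```python
import time


def rolling_update(hosts, update, verify, rollback,
                   pause_seconds=30 * 60, sleep=time.sleep):
    """Update hosts one at a time, halting at the first failed verification.

    Returns (updated_hosts, failed_host). failed_host is None when every
    host was updated and verified successfully.
    """
    updated = []
    for index, host in enumerate(hosts):
        update(host)
        if not verify(host):
            # Restore the previous state and stop, so one failure
            # cannot spread to the rest of the fleet.
            rollback(host)
            return updated, host
        updated.append(host)
        if index < len(hosts) - 1:
            sleep(pause_seconds)  # wait before touching the next host
    return updated, None


# Usage with stand-in steps: verification fails on the second host.
events = []
updated, failed = rolling_update(
    ["vc-17", "vc-18", "vc-19"],           # hypothetical host names
    update=lambda h: events.append(("update", h)),
    verify=lambda h: h != "vc-18",         # simulate a failed restart
    rollback=lambda h: events.append(("rollback", h)),
    sleep=lambda seconds: None,            # skip the real pause in this demo
)
print(updated, failed)
print(events)
```

In this run the rollout stops at the simulated failure: the first host stays updated, the second is rolled back, and the third is never touched.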
Report Prepared By: Solstice Staking AG
Date: January 2026