On April 28, 2026, one of our Ethereum validator clusters servicing Lido experienced an execution-layer failure caused by corruption within the Geth database. As a result, the node entered an optimistic state and validators temporarily stopped attesting until recovery was completed.
The issue was isolated to a single validator instance. No slashing occurred, and no validator keys or withdrawal credentials were affected.
We reimbursed the protocol for the estimated missed rewards associated with this incident, totaling 0.9778 ETH.
Impact
- One validator cluster stopped attesting temporarily.
- Validators entered optimistic mode due to execution-layer inconsistency.
- No key material was compromised.
Timeline (UTC)
- 10:49 β Geth exited unexpectedly with a failure status.
- 12:12 β Geth and beacon services were restarted, but Geth entered a persistent backfill failure loop caused by corrupted execution-layer data.
- ~12:30 β Investigation confirmed the node had entered optimistic mode, preventing validator attestations.
- ~13:00 β Recovery procedures began using a PBSS-compatible Geth snapshot matching the nodeβs
pathstate configuration. - ~17:30 β The execution and consensus layers were successfully restored and synchronized. Validators resumed normal attestation duties.
- ~17:57 β Full health checks confirmed the validator cluster was fully recovered and operating normally.
Investigation
- Logs showed repeated:
retrieved hash chain is invalid: missing parentBeacon backfilling failed
- The beacon node marked the execution layer as optimistic, preventing validator attestations.
Recovery
- We initiated a restore using a PBSS-compatible Geth snapshot matching the nodeβs
pathstate configuration. - Recovery was extended by several operational issues, including:
- disk-space exhaustion during extraction,
- automatic service restarts during restoration,
- and snapshot consistency issues affecting ancient receipt indexes.
Final Recovery
- The execution layer was successfully rebuilt and synchronized.
- Beacon synchronization completed shortly afterward.
- Validators resumed normal attestation duties.
Root Cause
The incident was caused by corruption within the Geth execution-layer database, which resulted in invalid chain reconstruction and backfill failures. We also identified a historical out-of-memory event on the instance, which may have contributed to database instability.
Preventative Actions
We are implementing the following improvements:
- Improving recovery SOPs, including disabling automatic service restarts during restoration. Migration should be considered if the proposed fix takes longer than 2-3 hours.
- Adding additional monitoring for optimistic execution states and abnormal Geth behavior.
- Expanding snapshot validation and disk-space prechecks before recovery operations.
Incedent on 05/02
On May 2, 2026, one of our Ethereum validator clusters servicing Lido experienced an outage after the underlying bare metal instance was inadvertently terminated during an infrastructure operation.
The incident was caused by human error and was isolated to a single validator instance. Validator duties resumed after a replacement server was provisioned and validator services were migrated to the new instance.
Impact
- One validator cluster temporarily stopped attesting.
- Validator duties resumed after migration to a replacement server.
Timeline (UTC)
- 05:28 β The underlying bare metal server was inadvertently terminated during an infrastructure operation.
- 06:03 β Investigation determined that the validator outage was caused by accidental server termination. The team began provisioning a replacement server and preparing for validator migration.
- 12:54 β The authorized key management system became available, allowing the migration process to proceed.
- 13:59 β Validator credentials were successfully accessed and prepared for migration.
- 14:50 β Migration was completed and all validators returned to normal operation.
Root Cause
The incident was caused by an operational error that led to the accidental termination of the underlying bare-metal server hosting the validator cluster. The duration of the outage was extended by the temporary unavailability of the authorized key management system required to complete the migration process.
Preventative Actions
We are implementing the following improvements:
- Introducing additional safeguards and approval requirements for bare metal decommissioning operations.
- Updating operational runbooks for validator migration and infrastructure recovery scenarios.
- Expanding communication channels and emergency on-call coverage to improve response coordination during infrastructure incidents.
Reimbursement
- The reimbursement transaction for the 04/28 incident, 0.9778 ETH has been completed.
Transaction: Ethereum Transaction Hash: 0xcff5d56204... | Etherscan - The reimbursement transaction for the 05/02 incident, 0.825 ETH has been completed. Transaction: Ethereum Transaction Hash: 0x20d92b3a10... | Etherscan