Post-Mortem Stakely June 16, 2025 Incident

Date: June 16th, 2025
Affected Epochs: 372962 to 372996
Duration: 3h 37min (04:37 to 08:14 UTC+2)
Validators Affected: 834
Validator Client: Teku 25.5.0
Consensus Layer: Teku 25.5.0
Execution Layer: Erigon 3.0.4
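
The epoch range and the reported duration above can be cross-checked with a short calculation. This is only a sketch: it uses the public Ethereum mainnet parameters (genesis timestamp, 12-second slots, 32 slots per epoch) and assumes the window is measured from the start of the first affected epoch to the start of the last one, which is an interpretation rather than something the report states.

```python
from datetime import datetime, timedelta, timezone

# Ethereum mainnet consensus constants (public protocol parameters).
GENESIS_TIME = 1606824023      # Beacon Chain genesis, Unix seconds
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
SECONDS_PER_EPOCH = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384 s = 6.4 min


def epoch_start(epoch: int, tz: timezone = timezone.utc) -> datetime:
    """Return the wall-clock time at which `epoch` begins."""
    return datetime.fromtimestamp(GENESIS_TIME + epoch * SECONDS_PER_EPOCH, tz)


utc_plus_2 = timezone(timedelta(hours=2))
first = epoch_start(372962, utc_plus_2)
last = epoch_start(372996, utc_plus_2)

# prints: 04:37 08:14 3:37:36
print(first.strftime("%H:%M"), last.strftime("%H:%M"), last - first)
```

Under that assumption, the 34 epochs between the two boundaries account for the stated 3h 37min.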

TL;DR:

A misconfiguration in Stakely’s alerting system led to a delayed response to a dual failure: an unhealthy Beacon Node and its cascading effect on two Validator Clients. As a result, 834 validators remained inactive for over 3 hours until manual intervention.

Summary

On June 16th, 2025, 834 Ethereum validators operated by Stakely experienced degraded performance and ultimately missed almost all attestations for over 3 hours. The incident spanned from epoch 372962 to 372996. The validators were running across two Teku Validator Clients (VCs), both located on the same server.

Root Cause

Although no common internal error was detected initially, the root cause appears to be related to the connection to an unhealthy Beacon Node (BN) running Erigon + Teku, which both affected VCs had listed as a secondary (failover) endpoint. This BN was experiencing Java memory issues and behaving inconsistently during the incident.

According to the Teku team, the unhealthy BN likely saturated the executor thread pool on the affected VCs, leading to failures in processing attestations. VCs that did not include this BN in their fallback list were not impacted.
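
The saturation mechanism described by the Teku team can be illustrated with a toy model. The sketch below is not Teku code and does not reflect its actual pool sizes or internals; `POOL_SIZE`, `STALL_SECONDS`, and both functions are hypothetical names chosen for illustration. It only shows the general effect: when every worker in a bounded pool is blocked on a stalled endpoint, even an instant task has to wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2          # hypothetical; real pools are larger
STALL_SECONDS = 0.2    # stand-in for a request hanging on an unhealthy BN


def call_unhealthy_bn() -> None:
    """Simulate a request that blocks on an unresponsive failover endpoint."""
    time.sleep(STALL_SECONDS)


def attest(submitted_at: float) -> float:
    """Simulate a quick attestation duty; return how long it waited to start."""
    return time.monotonic() - submitted_at


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # Occupy every worker, and queue as many calls again behind them.
    for _ in range(POOL_SIZE * 2):
        pool.submit(call_unhealthy_bn)
    # The duty itself is instant, but it must wait for a free worker.
    queue_delay = pool.submit(attest, time.monotonic()).result()

print(f"attestation waited {queue_delay:.2f}s for a worker")
```

In this model the delay grows with the number of stalled calls queued ahead of the duty, which is consistent with time-sensitive attestations being missed while the pool stays saturated.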

Relevant GitHub issue opened by the Teku team: GitHub link

Key Contributing Factor

The incident lasted significantly longer than it otherwise would have because of a misconfiguration in Stakely's on-call alerting system. As a result, the incident went unnoticed for over 3 hours and was only resolved after the affected VCs were manually restarted.

Remediation Steps

  • On-call procedures and configuration have been reviewed and updated to prevent alerting failures in the future.
  • The faulty Beacon Node was removed from all Validator Client fallback configurations, and the Validator Clients were updated to a newer software version.
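
As a complement to removing the faulty node by hand, a fallback list could in principle be screened against the standard Beacon API health route before it is deployed. The following is a minimal sketch under stated assumptions, not Stakely's tooling: `is_healthy`, `usable_endpoints`, and the endpoint URLs are hypothetical, and only the `/eth/v1/node/health` route (200 when ready, 206 while syncing, 503 when not initialized) comes from the public Beacon API.

```python
from typing import Callable, Iterable
from urllib.error import URLError
from urllib.request import urlopen


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if the node answers 200 on the standard health route.

    Timeouts, connection errors, and non-200 replies all count as unhealthy.
    """
    try:
        with urlopen(f"{base_url}/eth/v1/node/health", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError, ValueError):
        return False


def usable_endpoints(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = is_healthy,
) -> list[str]:
    """Keep the original priority order, dropping endpoints that fail the probe."""
    return [url for url in endpoints if probe(url)]


# Hypothetical endpoints; the probe is faked so the example runs offline.
configured = ["http://bn-primary:5051", "http://bn-faulty:5051"]
print(usable_endpoints(configured, probe=lambda url: "faulty" not in url))
```

A node with memory problems may still answer this route intermittently, so a check like this would reduce the risk rather than eliminate it.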

Compensation

To compensate for the missed rewards, Stakely has sent 0.3265 ETH to the Lido Execution Rewards Vault:

Transaction on Etherscan

Thanks guys. Great professional standards as always. Thanks for the update, the reimbursement and most of all great job on finding the client issue as well as improving your monitoring and procedures.