Summary
On June 24, 2026, our Lido-2 cluster experienced an outage caused by an infrastructure issue affecting an OVH Cloud instance. The incident impacted our Vouch node and our OVH MikroTik router node.
While the primary root cause was an external infrastructure failure, a combination of operational circumstances prevented faster detection and automatic recovery.
Root Cause
The outage was triggered by an OVH Cloud instance failure that affected the availability of both:
· Vouch node
· MikroTik node
Because both components were unavailable simultaneously, our current failover strategy could not operate as designed.
Detection
Unfortunately, one day before the incident, we had muted infrastructure alerts while performing maintenance on an unrelated infrastructure component. Due to an operational oversight, this alert suppression unintentionally also covered the monitoring for the Lido-2 environment.
As a result, the outage was not detected through the usual alerting channels during the night.
Instead, our secondary monitoring system later triggered an emergency call. At the beginning of the regular working day, one of our engineers identified the issue and immediately started the recovery process.
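To illustrate how a single, overly broad suppression can reach an unrelated environment, the following is a minimal Python sketch that models alert silences as label matchers. The `Silence` type, the `is_muted` helper, and the label names (`team`, `environment`, `component`) are hypothetical and are not taken from our actual monitoring stack; the sketch only shows the scoping principle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Silence:
    # Hypothetical model: a silence mutes any alert whose labels contain
    # every key/value pair listed in `matchers`, until `ends_at`.
    matchers: dict[str, str]
    ends_at: datetime

    def mutes(self, alert_labels: dict[str, str], now: datetime) -> bool:
        if now >= self.ends_at:
            return False  # an expired silence never mutes anything
        return all(alert_labels.get(k) == v for k, v in self.matchers.items())


def is_muted(alert_labels: dict[str, str], silences: list[Silence], now: datetime) -> bool:
    """An alert is suppressed if at least one active silence matches it."""
    return any(s.mutes(alert_labels, now) for s in silences)


now = datetime(2026, 6, 24, 2, 0, tzinfo=timezone.utc)
lido2_alert = {"team": "infra", "environment": "lido-2", "component": "vouch"}

# Too broad: matches every infra alert, including the Lido-2 ones.
broad = Silence({"team": "infra"}, ends_at=now + timedelta(days=2))
# Scoped: matches only the (hypothetical) component under maintenance.
scoped = Silence({"team": "infra", "component": "maintenance-target"},
                 ends_at=now + timedelta(hours=4))

print(is_muted(lido2_alert, [broad], now))   # True
print(is_muted(lido2_alert, [scoped], now))  # False
```

Scoping a silence to the exact component under maintenance, and giving it a short expiry, limits how far an oversight of this kind can spread.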
Resolution
During the investigation, we determined that the failure of the MikroTik node had also prevented the existing Vouch failover mechanism from functioning correctly. This exposed a weakness in our current architecture.
To restore service, we deployed fresh instances of both:
· Vouch
· MikroTik
The cluster returned to normal operation immediately after the new deployments became available.
Lessons Learned
This incident highlighted that our current Vouch failover strategy requires further improvement. As an immediate follow-up action, we will begin testing Vouch’s multi-instance capability in our Hoodi test environment. The goal is to operate a secondary active Vouch instance in a different datacenter and on separate provider infrastructure, allowing automatic takeover if the primary instance becomes unavailable.
Once validated, this architecture will be rolled out to production to eliminate this single point of failure.
Testing of this improved setup will begin immediately.
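The core requirement of the planned setup is that the two Vouch instances must not share a failure domain, as the Vouch node and the MikroTik node did in this incident. The following Python sketch shows one way to check a placement plan for shared provider, datacenter, or router dependencies. It is a hypothetical illustration: the `Placement` type, the instance and router names, and `provider-b` are assumptions, and the sketch does not model how Vouch's multi-instance mode coordinates between instances.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    # Hypothetical description of where one Vouch instance runs.
    name: str
    provider: str
    datacenter: str
    router: str  # network path the instance depends on


def shared_domains(a: Placement, b: Placement) -> list[str]:
    """Return the failure domains that two instances have in common."""
    return [attr for attr in ("provider", "datacenter", "router")
            if getattr(a, attr) == getattr(b, attr)]


def validate(placements: list[Placement]) -> list[tuple[str, str, list[str]]]:
    """List every pair of instances that shares at least one failure domain."""
    problems = []
    for a, b in combinations(placements, 2):
        common = shared_domains(a, b)
        if common:
            problems.append((a.name, b.name, common))
    return problems


# Both instances depend on the same router: one failure takes out both.
shared_router = [
    Placement("vouch-a", "ovh", "dc-1", "mikrotik-1"),
    Placement("vouch-b", "ovh", "dc-2", "mikrotik-1"),
]
# Separate provider, datacenter, and network path.
planned = [
    Placement("vouch-a", "ovh", "dc-1", "mikrotik-1"),
    Placement("vouch-b", "provider-b", "dc-2", "router-b"),
]

print(validate(shared_router))  # [('vouch-a', 'vouch-b', ['provider', 'router'])]
print(validate(planned))        # []
```

A check like this only catches placement mistakes; whether takeover actually works still has to be confirmed by the testing described above.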
Compensation
Blockscape has already compensated Lido stakers for the rewards lost during the outage (see on Etherscan).