[Post-Mortem] Blockscape - OVH Infrastructure Incident

Summary

On June 24th, 2026, our Lido-2 cluster experienced an outage caused by an infrastructure issue affecting an OVH Cloud instance. The incident impacted our Vouch node and our OVH MikroTik router node.

While the primary root cause was an external infrastructure failure, a combination of operational circumstances prevented faster detection and automatic recovery.

Root Cause

The outage was triggered by an OVH Cloud instance failure that affected the availability of both:
· Vouch node
· MikroTik node

Because both components were unavailable simultaneously, our current failover strategy could not operate as designed.
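
To illustrate the weakness described above, here is a minimal sketch of a check that flags components that would fail together. It assumes each component can be mapped to a single failure domain. The placement map, the domain label "ovh-instance-a", and the function name are hypothetical and not part of our tooling.

```python
from collections import defaultdict

# Hypothetical placement map: component name -> failure domain it depends on.
# The domain label "ovh-instance-a" is illustrative, not taken from the incident.
PLACEMENT = {
    "vouch": "ovh-instance-a",
    "mikrotik-router": "ovh-instance-a",
}


def shared_failure_domains(placement):
    """Return the failure domains that more than one component depends on."""
    by_domain = defaultdict(list)
    for component, domain in placement.items():
        by_domain[domain].append(component)
    # Keep only domains whose failure would take out several components at once.
    return {d: sorted(names) for d, names in by_domain.items() if len(names) > 1}


if __name__ == "__main__":
    for domain, names in shared_failure_domains(PLACEMENT).items():
        print(f"{domain}: {', '.join(names)} fail together")
```

A check like this could run against an inventory file before each deployment, so that a failover path is never placed in the same failure domain as the component it is meant to protect.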

Detection

Unfortunately, one day before the incident we had muted infrastructure alerts while performing maintenance on an unrelated infrastructure component. Due to an operational oversight, the alert suppression also affected the monitoring for the Lido-2 environment, although this was not intended.
As a result, the outage was not detected through the usual alerting channels during the night.
Instead, an emergency call was triggered later by our secondary monitoring system. At the beginning of the regular working day, one of our engineers identified the issue immediately and started the recovery process.
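
As a rough sketch of how alert suppression can be scoped so that maintenance on one component does not mute another environment, the example below models a mute as a set of label matchers with an expiry time. The `Silence` class, the label names (`team`, `env`), and the alert name are assumptions made for illustration. They do not describe our actual monitoring stack.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Silence:
    matchers: dict        # label -> required value; ALL must match to mute
    expires_at: datetime  # the silence stops applying at this time


def is_muted(alert_labels, silences, now):
    """Return True if any unexpired silence matches every one of its labels."""
    for silence in silences:
        if now >= silence.expires_at:
            continue  # expired silences never mute anything
        if all(alert_labels.get(k) == v for k, v in silence.matchers.items()):
            return True
    return False


if __name__ == "__main__":
    now = datetime(2026, 6, 23, 20, 0, tzinfo=timezone.utc)
    # Hypothetical alert from the Lido-2 environment.
    alert = {"team": "infra", "env": "lido-2", "name": "VouchDown"}

    # Too broad: mutes everything owned by the infra team.
    broad = Silence({"team": "infra"}, now + timedelta(hours=12))
    # Scoped: only mutes the environment under maintenance.
    scoped = Silence({"team": "infra", "env": "maintenance-target"},
                     now + timedelta(hours=12))

    print("broad silence mutes Lido-2 alert:", is_muted(alert, [broad], now))
    print("scoped silence mutes Lido-2 alert:", is_muted(alert, [scoped], now))
```

Requiring an environment matcher and an expiry on every mute keeps a maintenance window from silently covering unrelated environments or outliving the maintenance itself.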

Resolution

During the investigation we determined that the failure of the MikroTik node also prevented the existing Vouch failover mechanism from functioning correctly. This exposed a weakness in our current architecture.

To restore service, we deployed fresh instances of both:
· Vouch
· MikroTik

The cluster returned to normal operation immediately after the new deployments became available.

Lessons Learned

This incident highlighted that our current Vouch failover strategy requires further improvement. As an immediate follow-up action, we will begin testing Vouch’s multi-instance capability in our Hoodi test environment. The goal is to operate a secondary active Vouch instance in a different datacenter and on separate provider infrastructure, allowing automatic takeover if the primary instance becomes unavailable.

Once validated, this architecture will be rolled out to production to eliminate this single point of failure.

Testing of this improved setup will begin immediately.

Compensation

Blockscape has already compensated Lido stakers for the rewards lost during the outage: see on Etherscan.

Thank you for the explanation regarding this incident and also for compensating the missed rewards.

Thank you for the transparency and the swift compensation to the stakers, Blockscape. While infrastructure incidents are part of operating in this space, your commitment to clear reporting and immediate remediation is appreciated. Moving toward a multi-instance Vouch architecture in different datacenters is a solid improvement to eliminate this single point of failure. Keep up the good work :rocket:
