Slashing Incident involving Launchnodes Validators - Oct 11, 2023

On October 11, 2023, 20 validator slashings were observed, affecting validators operated by Launchnodes as part of the Lido protocol. The node operator has shut down the impacted node(s) and is investigating the root cause.

The Node Operator is currently assessing the full impact and working on a plan to bring back the remaining validators together with the assistance of Lido DAO NOM contributors.

More information and a detailed incident report will follow.

https://twitter.com/LidoFinance/status/1712142945783013393

9 Likes

Update:

At this time, the situation is stable as the Launchnodes team worked through the night to slowly bring the rest of the validator infrastructure back up. The root cause of the slashing incident is understood, with small details still being investigated.

Slashed validator penalties currently amount to 20.04 ETH and are estimated to increase to 23.06 ETH by the time of withdrawal. Infra downtime penalties and overall missed rewards (excl. EL rewards) amount to 5.663 ETH (including slashed validators).

Launchnodes immediately expressed their desire to compensate stakers at the soonest possible time, and so disbursed 25.663 ETH prior to today’s rebase, which means that stakers will see no reduced rewards for the day resulting from the outage and slashing.

Once future penalties and missed rewards for the slashed validators are finalised (i.e. the validators are withdrawn), a final calculation will be made. Launchnodes have indicated that they will also cover this amount, resulting in no lost rewards for stETH holders.

The compensation transaction for the day’s reduced rewards can be found here.

A full post-mortem will be posted here in the coming days.

7 Likes

The blog post with the post mortem, which includes the full incident report by Launchnodes, has been published.

Note: In the previous slashing incident, there was a DAO vote to lower the operator’s key limit as a precautionary measure to prevent additional stake being allocated.

In this case, while exploring options and considering functionality introduced in LoP V2, Launchnodes elected to remove an unused key (below their limit at the time, which was 4000 keys) which reset their vetted keys (i.e. validator limit) to their current number of used keys (2582) (details and relevant transaction are in the post mortem).

As a result, there is no current need to set a validator limit for the NO by explicit DAO vote.

EDIT:
Also including the relevant tweets here for posterity https://twitter.com/LidoFinance/status/1712898983998214645

6 Likes

An update on the slashing incident: as of Nov 16, 2023, the 20 validators in question are withdrawable (and have thus stopped accumulating penalties).

The breakdown of the total penalties and missed rewards can be found in the image attached below. Actual total penalties have been calculated as the change in balance of each relevant validator from its slashing epoch up until the epoch at which it became withdrawable. Missed rewards have been calculated in the same manner as in the post mortem. The actual amount of penalties was less than the amount initially projected in the post mortem.
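To illustrate the balance-change method described above, here is a minimal sketch. The validator indices and balances are hypothetical, invented for illustration only, and are not the figures from this incident. The function name `total_penalty_eth` is also an assumption, and the sketch assumes balances are recorded in Gwei at the slashing epoch and at the withdrawable epoch.

```python
from decimal import Decimal

GWEI_PER_ETH = Decimal(10) ** 9


def total_penalty_eth(balances):
    """Sum the balance drop of each validator between two epochs.

    `balances` maps a validator index to a pair
    (balance_gwei_at_slashing_epoch, balance_gwei_at_withdrawable_epoch).
    """
    total_gwei = sum(start - end for start, end in balances.values())
    return Decimal(total_gwei) / GWEI_PER_ETH


# Hypothetical balances (NOT real data from the incident).
example_balances = {
    1001: (32_000_000_000, 30_990_000_000),
    1002: (32_010_000_000, 31_005_000_000),
}

print(total_penalty_eth(example_balances))
```

Working in integer Gwei and converting to ETH only at the end avoids floating-point rounding in the totals.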

The total amount of

| Total Penalties & Missed Rewards (initial estimate) | Total Penalties & Missed Rewards (actual) | Initially Compensated on day of incident | Remainder |
|---|---|---|---|
| 28.677 ETH | 28.463 ETH | 25.663 ETH | 28.463 - 25.663 = 2.8 ETH |
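For anyone wanting to re-check the remainder arithmetic, a short sketch using the figures from the table follows. The variable names are illustrative; only the three ETH amounts come from this thread.

```python
from decimal import Decimal

# Figures taken from the table above (in ETH).
initial_estimate = Decimal("28.677")
actual_total = Decimal("28.463")
initially_compensated = Decimal("25.663")

# Amount still owed to stakers after the day-of-incident compensation.
remainder = actual_total - initially_compensated

# How far the initial estimate exceeded the actual total.
estimate_overshoot = initial_estimate - actual_total

print(remainder, estimate_overshoot)
```

`Decimal` is used rather than `float` so that the subtraction is exact at three decimal places.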

Those who wish to check or re-perform the calculations can do so here:

Note that penalties for specific duties (attestations, proposals, etc.) are based on calculation and cannot be queried directly from chain data.

As indicated by Launchnodes, they will be compensating stakers for the remainder of this amount (i.e. less the compensation already sent). The compensation will occur by sending the ETH amount to the Lido Execution Layer Rewards Vault.

3 Likes

Many thanks to the whole Lido DAO community and the broader ecosystem of Operators and Stakers.

We really appreciate your support during this event, and your diligence in helping to ensure that all Lido stakers have been made whole.

The Launchnodes team sent the outstanding balance to the Lido Execution Layer Rewards Vault today:

We’re looking forward to supporting and participating in the projects already underway to help ensure the robustness and security of Lido.

:pray:

4 Likes

Hi All

We wanted to provide an update on progress and plans. We have been running our nodes in a backup environment while restoring and hardening our bare metal infrastructure in Africa.

The new infrastructure removes key dependencies and risks, and is being assessed for vulnerabilities by a leading web3 security body this week. After taking on board any of their findings and recommendations, we aim to switch back to our original data center as soon as possible, ideally from next week onwards.

We have been running test nodes on the new infrastructure, and are looking forward to being back with production nodes soon too.

In terms of a migration plan, we have very well rehearsed processes for safely moving between environments and avoiding double attestations or other issues. Key steps are included below, with feedback or questions very welcome.

We’ll keep you posted, thank you as always for your support. :pray:

1 Like

I wanted to update you on plans for Launchnodes to migrate back to its bare metal infrastructure in Africa, following the outage and issues experienced last year.

We have carefully rebuilt our infrastructure with an improved architecture designed to further reduce points of failure, and provide additional robustness and security.

Over the past 2 months, in addition to the feedback and support we have had from Lido operators and the broader community, we have had support from 3rd parties to examine our setup prior to go-live, and highlight areas for improvement. To date, the following have been flagged:

  • 10 ‘High Severity’ risk issues
  • 11 ‘Medium Severity’ risk issues
  • 11 ‘Low Severity’ risk issues
  • 2 ‘Informational’ issues

As examples:

One ‘High Severity’ issue flagged was traffic from our internal Grafana servers not being SSL encrypted.

A ‘Medium Severity’ issue was not using SSO across all of our infrastructure.

A ‘Low Severity’ issue was not logging all traffic and connections, which would be useful for diagnostics and forensics.

(These and the majority of other recommendations have been implemented).

Resolutions have been implemented for all ‘High’ severity issues. Resolutions for 8 of the ‘Medium’ severity issues have been implemented, with the remaining 3 acknowledged by Launchnodes for further investigation and future planned updates. Resolutions have been applied for 10 of the 11 ‘Low’ Severity issues, with 1 issue acknowledged.

Issues have typically been ‘acknowledged’ where they are non-critical and the recommendation is understood, but further investigation or planning/testing is required before it can be fully implemented. For example, we are cautious about automating some processes/server re-starts, and about setting hard limits for some parameters, until we have monitored production systems for at least some weeks. Where a resolution might have an impact on node performance or uptime, we have been particularly careful.

We are conscious that our environments and software will continue to change and evolve, and we regard this as a journey of continuous improvement. We are planning further reviews and optimizations with our 3rd party specialists in the coming days and weeks. We have also been successfully running multiple nodes on Holesky in our updated environments.

The team is confident in moving back to our bare metal data centres, and is aiming to start this process from 20th March 2024. We plan to move across a small number of nodes and monitor these for 7 days before migrating our remaining nodes to the new environment. The old environment and web3signer will be destroyed before nodes are migrated and re-started.
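As background on why it matters that the old signer is gone before the same keys are restarted elsewhere: two signers holding the same key can produce conflicting attestations. The sketch below mirrors the two slashable conditions in the Ethereum consensus specification (double vote and surround vote). It is not Launchnodes' migration tooling; the `AttestationData` class here is a simplified, hypothetical stand-in that omits fields such as slot and committee index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationData:
    # Simplified stand-in: the real structure carries more fields.
    source_epoch: int
    target_epoch: int
    beacon_block_root: str


def is_slashable_pair(a: AttestationData, b: AttestationData) -> bool:
    """Return True if signing both attestations would be slashable."""
    # Double vote: two different attestations for the same target epoch.
    double_vote = a != b and a.target_epoch == b.target_epoch
    # Surround vote: one attestation's source/target span encloses the other's.
    a_surrounds_b = a.source_epoch < b.source_epoch and b.target_epoch < a.target_epoch
    b_surrounds_a = b.source_epoch < a.source_epoch and a.target_epoch < b.target_epoch
    return double_vote or a_surrounds_b or b_surrounds_a


old_env = AttestationData(source_epoch=10, target_epoch=11, beacon_block_root="0xaa")
new_env = AttestationData(source_epoch=10, target_epoch=11, beacon_block_root="0xbb")
print(is_slashable_pair(old_env, new_env))
```

In this example the two environments attest to the same target epoch with different block roots, which is the double-vote case.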

This timing has been chosen to provide a good period of stability after Dencun and a decent period of monitoring unaffected by major events or public holidays.

Our monitoring and alerting will continue to operate 24x7, with our global team on standby throughout.

Please reach out if you have any feedback or would like more information, happy to discuss in more detail.

Many thanks as always for your patience and support. We’ll update you with progress as this migration takes place and on an ongoing basis. :pray:

2 Likes

Hey Rajesh. Thanks for sharing detailed updates on the infrastructure review you’ve performed and the plans for moving back to bare metal infra.

Look forward to the updates on progress of the migration efforts.

1 Like

Hi all

Sharing an update on our Africa Infrastructure for Lido nodes here:

Please reach out with any questions or feedback. Many thanks.

1 Like