Lido on Ethereum Node Operator (Numic) Security Incident Disclosure - May 21, 2024

On May 14th, Lido DAO contributors were made aware of a security breach that affected an active Node Operator using the Lido on Ethereum protocol (Numic). The security breach had occurred a few days prior and affected a developer machine that had access to an encrypted key material backup for mainnet validators. It is unclear whether the encrypted key material was accessed, copied, or otherwise manipulated, whether the decryption material for that data was found, or whether the encryption was otherwise broken.

Because the encrypted backups may have been accessed, and for safety reasons, the Node Operator decided to:

  1. set their depositable keys relating to the Lido Protocol to zero to avoid receiving any further deposits, and
  2. perform voluntary exits of all possibly affected keys in a staggered manner over the following days.

As of a few hours ago, all of the operator’s validators have been exited (and fully withdrawn). Validator operations were not affected as a result of the incident, and no user funds have been affected.

Some Lido DAO contributors have been involved in helping support the Node Operator in investigating the incident to understand its full scope and potential impact.

The disclosure was not made public immediately so as not to call unneeded attention to the breach until the validators had been fully exited.
While the Node Operator is performing a more extensive review of their security & backup processes, the DAO may deliberate as to whether the operator should continue in the active set or not.

12 Likes

Public Incident Disclosure by Numic

This post addresses a potential vulnerability identified and resolved in May 2024. The vulnerability arose when a developer computer at Numic was infected with malware.
Node operation was not affected.

Timeline

  • 11th of May 2024: Malware from a compromised freeware download appears to have indexed most files on the affected computer on this day. There was no targeted attack.
  • 12th of May 2024: The infection was discovered when a suspicious login attempt to an online account was prevented by 2FA. The machine using this account was immediately disconnected and its drive removed. All online passwords were then changed and an investigation started.
  • 14th of May 2024: Upon further investigation of the infected drive, in particular the “NTFS Last Access Time Stamp”, we found indications that the malware had indexed or scanned most of the text, image and archive files.
    All these files were last accessed within a 2-minute time window on 11th of May. In the encrypted backup, which was mounted at the time of the incident, files showed the same pattern. This indicates they had been indexed by the malware.
    As this encrypted backup contains cryptographic material related to Numic's usage of the Lido protocol, we informed Lido DAO contributors in the NOM workstream after the discovery.
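
To illustrate the kind of check described in the timeline above, here is a minimal Python sketch that looks for a burst of last-access times inside a short window. It is not the tooling Numic used; the function names `collect_atimes` and `densest_window` and the 120-second default are assumptions made for this example. In a real investigation the drive would be examined from a read-only image, and last-access updates can be disabled on NTFS, so a result like this is only an indication.

```python
import os
from typing import Dict, List, Tuple


def collect_atimes(root: str) -> Dict[str, float]:
    """Map each file under `root` to its last-access time (seconds since epoch)."""
    atimes: Dict[str, float] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                atimes[path] = os.stat(path).st_atime
            except OSError:
                continue  # unreadable entry; skip it
    return atimes


def densest_window(
    atimes: Dict[str, float], window_seconds: float = 120.0
) -> Tuple[float, List[str]]:
    """Return (window_start, paths) for the time window holding the most files."""
    ordered = sorted(atimes.items(), key=lambda kv: kv[1])
    best_start, best_lo, best_hi = 0.0, 0, 0
    lo = 0
    for hi in range(len(ordered)):
        # Shrink the window from the left until it spans at most window_seconds.
        while ordered[hi][1] - ordered[lo][1] > window_seconds:
            lo += 1
        if hi - lo + 1 > best_hi - best_lo:
            best_start, best_lo, best_hi = ordered[lo][1], lo, hi + 1
    return best_start, [path for path, _ in ordered[best_lo:best_hi]]
```

If most files on a volume fall into one such window, that is consistent with a single automated scan rather than normal use, which is the pattern described above.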

Impact Assessment

  • We couldn’t be sure what exactly the indexing meant, whether the attacker could have downloaded any files, or whether they might even be able to break the encryption. Consequently, acting on advice from Lido DAO contributors, we decided to start rotating all validators.
  • Meanwhile, node operations continued normally as the validator nodes are separate systems. Because all Lido-related withdrawals are sent to the withdrawal vault, none of the ETH staked with Lido could have been withdrawn by potential attackers.
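
The point about withdrawals can be made concrete with a small Python sketch. A validator's withdrawal destination is fixed by its 32-byte withdrawal credentials, not by its signing key; for `0x01`-type credentials the layout is the prefix byte `0x01`, eleven zero bytes, then a 20-byte execution-layer address. The helper names and the address used in the usage note are placeholders for illustration, not the real vault address.

```python
ETH1_PREFIX = b"\x01"  # prefix byte for execution-address withdrawal credentials


def withdrawal_address(credentials: bytes) -> str:
    """Extract the execution-layer address from 0x01-type withdrawal credentials."""
    if len(credentials) != 32:
        raise ValueError("withdrawal credentials must be exactly 32 bytes")
    if credentials[:1] != ETH1_PREFIX or credentials[1:12] != bytes(11):
        raise ValueError("not 0x01-type withdrawal credentials")
    return "0x" + credentials[12:].hex()


def pays_to(credentials: bytes, expected_address: str) -> bool:
    """True if withdrawals for these credentials go to `expected_address`."""
    return withdrawal_address(credentials) == expected_address.lower()
```

For example, with a placeholder address `"0x" + "ab" * 20`, credentials built as `b"\x01" + bytes(11) + bytes.fromhex("ab" * 20)` satisfy `pays_to`. Someone holding only the validator signing key can sign exits, but the withdrawn ETH still goes to the address encoded in the credentials.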

Remediation

  • A disclosure of this potential vulnerability was not made immediately for security reasons. Instead, a process of validator rotation was started by preventing new deposits to Numic and by broadcasting exit messages beginning on 14th of May over the course of 3 days.
  • A reassessment of our security and backup processes is ongoing. We are in contact with a company specialising in information security according to ISO 27001 to assist us in this process.
11 Likes

Thanks for the public incident disclosure and also for quickly disclosing the incident to contributors as soon as you were aware that sensitive material may have been accessed, as well as taking prompt action to safeguard validators against potential malicious attack.

I’m guessing the community and other Node Operators may have some questions, so would appreciate your continued transparency in fielding those.

4 Likes

Am I correct in understanding that an encrypted backup of keys was kept locally on a machine that was used daily for development and normal internet access? And, the decryption material was stored on the same machine?

If so, what was the logic for why you did so - I’m assuming this is a conscious decision? I want to check I’m not missing some reason for why it would be stored in this manner.

2 Likes

Hi @ccitizen. The backup wasn’t stored on the machine but in a protected drive on our backup server. This drive can only be accessed by team members directly responsible for staking operations. Unfortunately, it was mounted on this machine at the time of the incident.

The reason for this construction is that we use one active Dirk instance per cluster, which means that in the event of a failure, the keys need to be reasonably accessible so that they can be moved to a failover key server. To avoid the risk of slashing, they were not kept on the failover key server. A possible alternative would be threshold signing with multiple Dirk instances. Since one instance is allowed to fail, the keys wouldn’t need to be readily accessible and could be kept permanently offline.
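
As a rough illustration of the threshold idea mentioned above, here is a minimal Python sketch of Shamir secret sharing, in which any 2 of 3 shares recover a secret and a single share on its own does not. This is not Dirk's implementation: real threshold signing setups combine partial signatures and never need to rebuild the full key in one place. The field size, the function names `split` and `combine`, and the 2-of-3 parameters are assumptions chosen for the example.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; illustrative field size only


def split(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points: List[Tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `shares = split(123456789, 2, 3)`, any pair such as `shares[:2]` or `shares[1:]` passed to `combine` returns the original value, so one holder can be offline without blocking recovery, which mirrors the "one instance is allowed to fail" property described above.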

Please note that the evidence in the access log does not indicate that the files were downloaded; they were probably just indexed. And even if they were downloaded, we’re confident the attacker wouldn’t be able to decrypt them in a reasonable amount of time. Nevertheless, we wanted to be proactive, and our response put the DAO’s and stakers’ interests first.

8 Likes

This incident is a good example of why the adoption of DVT is critical.

It should be noted that DVT by itself doesn’t really solve the problem here. It only helps if DKG was used for key creation and a full key was never reconstituted anywhere. That is why, for Simple DVT, a requirement for all integrations was that validator keys be created using DKG ceremonies in which all cluster participants took part.

A similar approach in terms of “sharding” the key can be employed using Attestant’s Dirk, which Numic also identifies above as a possible solution, and it doesn’t require DVT.

3 Likes

Hi everyone, this is a progress update. We met with the information security consultancy msdd to identify potential vulnerabilities in our architecture and processes as well as to implement changes.
In total, msdd made 18 recommendations and helped us to implement them, the most important ones being:

  • Distributed key manager:
    A problem in the past was that the signing keys needed to be accessible in the event of a key server failure. In our new architecture, we use a distributed setup. Each of the key servers only holds shards of the keys and an attacker would need to control several of them in order to reconstruct the original keys.

  • Fully offline signing key backups:
    This distributed setup also improves redundancy and means that some key servers are allowed to fail without affecting validation operations. As a result, the signing keys can be kept permanently offline, greatly reducing security risks.

  • Security information and event management system:
    Software that helps recognize and address potential security threats and vulnerabilities before they have a chance to disrupt operations. When new vulnerabilities are discovered or anomalies (such as attacks) are detected, this software notifies us.

  • Other improvements include, e.g., generally closer alignment with ISO27001, stronger passwords and encryption keys, use of biometrics where possible, further anti-malware protections.
    Looking ahead, we have taken steps to keep up to date with cybersecurity and plan a follow-up security review.

On the advice of msdd, we can only provide a general overview so as not to undermine our security measures. Overall, we are confident that the vulnerabilities that led to the incident have been addressed and that the robustness and security of our infrastructure have been strengthened.

3 Likes