Discussion - Treatment of Potentially Harmful Incidents

To revive this thread a little… I will work with the NOM workstream to try to put together a discussion document for interested DAO stakeholders & NOs that roughly outlines what a “guardrails” approach to the treatment of potentially harmful incidents (or things like sustained malperformance) could look like, in order to advance discussion on having a more concrete approach. I will aim to have this draft ready the week of July 3rd.

I would love to see more initiative/collaboration amongst NOs for something like this. I’m not sure what the best way to foster this is (grants? putting everyone in a (virtual) room to workshop it out?), but I’m open to suggestions.

Because the above might take some time and there is a question of the actual recent slashing, I think the DAO should have a more concrete conversation about how to treat these specific types of incidents and what we should do in this specific case. It’s been quite a while since the event, and all stakeholders would benefit from clarity.

I think slashings should be (and have been) rare enough that decisions can be made ad hoc, but those decisions would benefit from a general framework. With that in mind, my thinking is below (trying to establish a set of static constraints, but with enough allowance for context). IMO it’s incredibly important that we get input from as many parties as possible here (especially NOs), so the below is just my 0.02 to serve as a starting point.

Factors around the incident itself that might be considered:

  • was the act malicious or not
  • the proximate cause of the event
  • whether any best practices, infrastructure setups or configurations, common safety measures, or reasonable processes / mechanisms could have prevented the slashing from happening
  • how quickly was the issue identified and by whom
  • how quickly was the issue resolved

Consequences could depend on:

  • Which “module” did this happen in; or, more broadly, what are the trust assumptions associated with the affected validators (e.g. are the validators unbonded)
  • impact of the event (in the case of a slashing, the financial impact can be small but the damage to trust can be high; for example, how does the event affect the trust assumptions between stakers, the DAO, and the NO)
  • extenuating circumstances (what are the pros/cons of this decision, and what other substantial things may need to be taken into account – e.g. does the NO somehow bring key value to the protocol?)
  • what other options there are for the node operator to participate
  • what is the status of remediation of the issue
  • is there a way to gauge likelihood of something like this happening again, and if so what is the assessed likelihood
  • can remediation be assessed and, if it can, is the remediation deemed satisfactory

For example, in the current “curated operator set” on Lido on Ethereum, the options may be:

  • Do nothing
  • Warning (do nothing w/ the condition that the next time the consequence is one/any of the below)
  • Limit the Node Operator’s key count for a certain period of time
  • Decrease the Node Operator’s key count (by prioritizing those keys for exit)
  • Offboard the operator (with the ability to rejoin the permissioned set at a later time)
  • Offboard the operator (without the ability to rejoin the permissioned set at a later time)