Withdrawals. On validator exiting order

TL;DR

This post describes the background behind the design of the validator exiting order. We, the Lido dev team, want to prompt public discussion and gather feedback and stakeholder sentiment; we especially look forward to hearing from Node Operator representatives.

The dev team’s current preference is Approach 1: first eject a validator from the Node Operator with the largest number of active validators.

Intro

When withdrawals in Ethereum become feasible, Lido stakers will be able to withdraw ETH in exchange for their stETH via the Lido protocol. To satisfy withdrawal requests, Lido will have to eject validators to get the users’ funds back. From which Node Operators, and in what order, should Lido choose validators to eject? The purpose of this post is to initiate a public discussion on the topic.

The question of how an already chosen validator technically gets ejected is important and non-trivial, especially while withdrawal-credentials-initiated exits are not implemented in Ethereum and Node Operators are in charge of signing the exit messages. That question, however, is left outside the scope of this discussion.

There are multiple considerations regarding ejection ordering. The ordering should:

  • be aligned with Lido goals / scorecard
  • provide Node Operators with equal opportunities to earn with Lido (not favor or disadvantage any Node Operator in an unclear, “unfair” manner)
  • take into account the current validator distribution
  • be consistent with the choice of the next Node Operator whose validator to activate
  • be technically feasible/reasonable from the point of view of the dev team

Let’s consider them one by one, explore potential criteria for ejection ordering and consider a few approaches.

Lido goals

Lido has published a Decentralization Roadmap that specifies the preferred properties of the validator set (also see the scorecard).

The validator ejection ordering algorithm can be considered a means of helping to achieve these properties. Here are some of them:

  • no operators with >1% of the total stake
  • good performance
  • operators earn well enough to build a profitable, dependable business on staking
  • an emphasis on improving client diversity in Ethereum
  • operations are distributed geographically and jurisdictionally
  • distributed variation of on-premise infra and different cloud providers

Provide Node Operators with equal opportunities to earn with Lido

When a Node Operator joins Lido, it has to invest in infrastructure, human resources, etc. Only after that does it gain the capability to earn. The more active validators it has, the higher the return on that investment it can get over time.
From this point of view, it is not preferable to eject validators of a Node Operator that has had significantly less opportunity to earn with Lido.

Current validator distribution

As of Oct 7 2022, the distribution of active validators between the Node Operators looks like this:

index | name | active validators | network share | rewards earned (stETH)
0 | Staking Facilities | 7391 | 1.68% | 613.0
2 | P2P.ORG - P2P Validator | 7391 | 1.68% | 697.6
4 | stakefish | 7391 | 1.68% | 676.6
5 | Blockscape | 7391 | 1.68% | 625.5
6 | DSRV | 7391 | 1.68% | 490.7
8 | SkillZ | 7390 | 1.68% | 583.2
9 | RockX | 7390 | 1.68% | 383.5
11 | Allnodes | 7390 | 1.68% | 388.4
15 | ChainLayer | 7390 | 1.68% | 308.0
17 | BridgeTower | 7390 | 1.68% | 294.0
3 | Chorus One | 7000 | 1.59% | 643.9
10 | Figment | 7000 | 1.59% | 346.1
16 | Simply Staking | 6500 | 1.48% | 264.2
7 | Everstake | 6000 | 1.37% | 435.2
19 | InfStones | 6000 | 1.37% | 224.0
14 | Stakin | 5664 | 1.29% | 204.2
18 | Stakely | 5000 | 1.14% | 199.2
20 | HashQuark | 3768 | 0.86% | 132.6
13 | Blockdaemon* | 3346 | 0.76% | 193.2
12 | Anyblock Analytics* | 2300 | 0.52% | 164.0
21 | ConsenSys Codefi | 1447 | 0.33% | 50.7
1 | Certus One | 1000 | 0.23% | 153.3
23 | CryptoManufaktur | 674 | 0.15% | 4.2
24 | Kukis Global | 673 | 0.15% | 4.3
26 | ChainSafe | 673 | 0.15% | 3.5
28 | Sigma Prime | 673 | 0.15% | 3.7
22 | RockLogic GmbH | 633 | 0.14% | 4.0
25 | Nethermind | 633 | 0.14% | 3.9
27 | Prysmatic Labs | 100 | 0.02% | 0.1

* Blockdaemon and Anyblock Analytics are the same entity.

The index column denotes the internal Node Operator index in the Node Operators registry contract. It starts from 0 and is incremented each time a new Node Operator is onboarded.

Data sources: the Node Operators registry contract and a Dune query.

Relation with the activation algorithm

The algorithm that chooses new validators for activation is implemented in the assignNextSigningKeys function of the NodeOperatorsRegistry contract. It chooses the next validator from the Node Operator with the least stake (within its limit). Thus the activation algorithm flattens the distribution of validators between Node Operators.

The design of the ejection ordering algorithm should consider how the two algorithms, working together, influence the validator distribution. For example, it might not be desirable to eject a validator that has only just been activated.

Technical considerations

There are two basic ways the ejection ordering algorithm could be implemented: on-chain and off-chain. The capability to implement it on-chain depends significantly on its simplicity and on-chain availability of the data it needs.

Even an off-chain algorithm should preferably be simple and straightforward, for security and reliability. Simplicity would also help to make it trustless and to ossify a larger part of the Lido protocol where possible.

Potential criteria for ejection

Here we explore the potential metrics of a Node Operator for use in the ejection ordering algorithm.

Number of the active validators

This property of a Node Operator can be observed directly on-chain by a call to getNodeOperator and calculation of usedSigningKeys - stoppedValidators.
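The subtraction above can be sketched in a few lines. This is only an illustration: the `NodeOperatorInfo` container and the numbers are hypothetical, and only the two counter names come from the text.

```python
from dataclasses import dataclass


@dataclass
class NodeOperatorInfo:
    # Hypothetical container for the two counters named in the text,
    # as they might be read from a getNodeOperator call.
    used_signing_keys: int
    stopped_validators: int


def active_validators(info: NodeOperatorInfo) -> int:
    """Keys that were put to use minus validators that have since stopped."""
    return info.used_signing_keys - info.stopped_validators


# Illustrative numbers only, not real registry data.
example = NodeOperatorInfo(used_signing_keys=7400, stopped_validators=9)
print(active_validators(example))  # 7391
```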

Amount of the stETH rewards earned so far

This metric can be considered as a measure of the opportunity for a Node Operator to earn with Lido.

This property of a Node Operator can be obtained from a blockchain indexer service. Here is an example of a Dune query: steth distribution
There seems to be no way to obtain the metric from the current chain state without accessing archive data.

This metric does not reflect the instantaneous state of the validator distribution. For example, if the ejection ordering algorithm chooses a validator from the Node Operator with the largest amount of rewards today, that amount will not have changed much by tomorrow, and the same Node Operator will get chosen again. The metric could, however, be adjusted to count only the rewards earned by currently active validators, so that it reflects changes immediately.

Total active validators age

This metric can also be considered a measure of a Node Operator’s opportunity to earn with Lido. To calculate it for a Node Operator, sum the ages of all its active validators (age = time passed from activation until now). Once a validator stops being active, it no longer counts towards the sum, so the metric reflects exits immediately.
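A minimal sketch of this metric, assuming activation times and "now" are given in the same arbitrary unit (for example epochs). The function name and the numbers are made up for illustration.

```python
def total_active_age(activation_times, now):
    """Sum of (now - activation time) over an operator's active validators."""
    return sum(now - t for t in activation_times)


now = 1_000
active = [100, 250, 400]  # hypothetical activation times of three validators

print(total_active_age(active, now))      # 2250
# After the oldest validator exits, it drops out of the sum right away.
print(total_active_age(active[1:], now))  # 1350
```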

Node Operator performance

Monitor a Node Operator’s penalties and slashings on the beacon chain, and take them into account in the ejection ordering algorithm as a means to disadvantage those who performed worse.

“Soft” properties of validators

Here are some examples of such properties:

  • geographic location diversity
  • client diversity
  • diversity of on-premise infra and cloud providers

The ejection ordering algorithm might take these into account as a means of shaping the validator set.

However, these properties of a validator cannot be trustlessly identified, and thus do not make good ejection criteria.

Potential approaches

Approach 1: eject from the Node Operator with the largest number of active validators

This is a simple, easy-to-implement algorithm. To choose a validator for ejection, choose the Node Operator with the most active validators. If there are multiple such NOs, choose the one with the highest Node Operator index, to reduce the probability of ejecting a newly added validator, because the activation algorithm chooses the one with the lowest.

Within the chosen Node Operator, any validator may be ejected.

For example, ejecting seven validators would draw on Node Operators in this order:

DSRV, index 6: 7391 --> 7390
Blockscape, index 5: 7391 --> 7390
stakefish, index 4: 7391 --> 7390
P2P.ORG - P2P Validator, index 2: 7391 --> 7390
Staking Facilities, index 0: 7391 --> 7390
P2P.ORG - P2P Validator, index 2: 7390 --> 7389
Staking Facilities, index 0: 7390 --> 7389
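The selection rule of Approach 1 can be sketched as follows. This is a rough illustration, not the registry's actual code: the function names are hypothetical, the input maps Node Operator index to active validator count, and the counts are those of the ten largest operators in the table above.

```python
def pick_operator(active_by_index: dict) -> int:
    """Most active validators wins; ties go to the highest operator index."""
    return max(active_by_index, key=lambda i: (active_by_index[i], i))


def ejection_order(active_by_index: dict, n: int) -> list:
    """Operator indexes for n consecutive ejections, re-evaluated each time."""
    counts = dict(active_by_index)  # work on a copy
    order = []
    for _ in range(n):
        idx = pick_operator(counts)
        counts[idx] -= 1
        order.append(idx)
    return order


counts = {0: 7391, 2: 7391, 4: 7391, 5: 7391, 6: 7391,
          8: 7390, 9: 7390, 11: 7390, 15: 7390, 17: 7390}
print(ejection_order(counts, 5))  # [6, 5, 4, 2, 0]
```

Because the choice is recomputed after every ejection, consecutive exits naturally spread over all operators that share the maximum.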

Approach 2: eject from the Node Operator with the largest total active validators age

To choose a validator for ejection, select the Node Operator with the largest total validator age.
To calculate the total validator age of a Node Operator, iterate over all its active validators and sum the time elapsed from each validator’s activation until now.
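As a rough sketch of Approach 2, assuming each operator's active validators are given as a list of activation times in some common unit. The operator names, times, and function name are invented for the example.

```python
def operator_with_largest_total_age(activations_by_op, now):
    """Return the operator whose active validators have the largest summed age."""
    def total_age(times):
        return sum(now - t for t in times)

    return max(activations_by_op, key=lambda op: total_age(activations_by_op[op]))


activations = {
    "A": [100, 200],       # total age 900 + 800 = 1700
    "B": [50],             # total age 950
    "C": [300, 400, 500],  # total age 700 + 600 + 500 = 1800
}
print(operator_with_largest_total_age(activations, now=1_000))  # C
```

Note that operator C is chosen even though its validators are the youngest: the metric is a sum, so a larger set can outweigh a smaller, older one.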

Combined approach

Use different strategies depending on the current state of the validator set. Here is a potential solution, which accelerates movement towards the “less than 1% stake” goal while trying to keep opportunities to earn with Lido even among the largest Node Operators.

Conditions:

  1. if there are Node Operators with higher than 1% of the network stake, use strategy A
  2. otherwise, strategy B

Strategy A. Choose the validator for exit among Node Operators with a larger than 1% stake. Choose the Node Operator with the highest total age of the active validators.

Strategy B. Choose the Node Operator with the largest number of active validators (see Approach 1).
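The two strategies can be combined in one selection function. The sketch below is only one possible reading: the `Operator` fields, the network size of 440,000 validators, and the age figures are hypothetical, and network share is approximated as an operator's active validators divided by all validators in the network.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    index: int
    active: int     # number of active validators
    total_age: int  # summed age of active validators, any consistent unit


def pick_combined(ops, network_validators, max_share=0.01):
    """Strategy A while anyone is above max_share, otherwise Strategy B."""
    over = [op for op in ops if op.active / network_validators > max_share]
    if over:
        # Strategy A: among oversized operators, highest total age first.
        return max(over, key=lambda op: op.total_age)
    # Strategy B: largest active count, ties to the highest index.
    return max(ops, key=lambda op: (op.active, op.index))


big = [Operator(0, 7391, 900_000), Operator(17, 7390, 400_000), Operator(20, 3768, 150_000)]
print(pick_combined(big, network_validators=440_000).index)    # 0

small = [Operator(18, 4000, 500_000), Operator(20, 3768, 150_000), Operator(13, 4000, 100_000)]
print(pick_combined(small, network_validators=440_000).index)  # 18
```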

5 Likes

The fact that Blockdaemon and Anyblock Analytics are not treated as a single Node Operator should be addressed specifically. The best option is to update the Node Operators registry contract data. If that does not happen, the ejection ordering algorithm, whichever approach is taken, must handle it.
Thanks to @Izzy for pointing this out in DM.

5 Likes

Thanks for putting this together @arwer13 !

My personal view is that we should take the combined approach here (after the fixes needed to treat BD + Anyblock Analytics as one operator, regardless of which approach is taken), but that the approach we discuss and put forth should additionally:

  1. define the timeframe we are discussing and what potential evolutions of the algo there may be in the future,
  2. clearly identify what the goals for the exit algo are,
  3. acknowledge the “being early” advantage some Node Operators have benefitted from.*

* If you take Approach #1, you will end up exiting validators from NOs that have a similar number of currently active validators but whose validators carry different “historic rewards” weight (i.e. have been running for different lengths of time).

To that end, a summary of my proposal would be something like:

  1. The exit algo implemented as a result of this discussion/exercise should be planned to be in use until triggerable exits are available. At that point it should be revised, re-evaluating to what extent we can (re-)allocate stake as per the additional criteria mentioned in the OP (Withdrawals. On validator exiting order), which come from our Lido Operator Set Strategy.
  2. The exit algo should aim to pragmatically further the decentralization criteria we’ve outlined in our Operator Set Strategy. However, dynamic and verifiable multi-attribute stake re-allocation is technically complex, and it cannot happen before (a) the technical functionality of the protocol allows it and (b) the research on how to do it is completed and a mechanism agreed upon. So the only real thing we can do in the near future is improve the distribution of stake across the set (while continuing our efforts to increase the set).
  3. While multiple operators maintain a number of active validators above the ideal target as defined in Lido’s Ethereum Operator Set Strategy (i.e. no operator should have more than 1% of total (network-wide) ETH staked via Lido), validators which have a “heavier” rewards history should be prioritized in the exit process over “lighter” validators. This should allow newer operators with a large number of validators some time to catch up in terms of total rewards instead of being exited at the same pace as older operators. For operators below this threshold, Strategy B can be employed (i.e. just exit by total number of validators). Does this reduce the “fairness” a bit? Yes, but the primary objective is the distribution of stake, and the secondary objective is to do it as fairly as possible.
7 Likes

Thanks for writing this down.

I agree with Izzy and I like the combined approach, which will help a little bit with smoothing out the yield differences among the node operators.

disclaimer: We are clearly biased as one of the newly onboarded operators :slight_smile: .

3 Likes

As a member of the dev team, I really want to keep the algorithm for picking the next validator for ejection simple, but I see value in smoothing staking rewards between node operators.

We plan to run this algorithm off-chain (probably as part of the Lido oracles), so we can easily extract any data from the execution and consensus layers and feed it to the ejection algorithm. In the future, though, it would make sense to move this into a trustless ZK oracle, which would only use data from the CL state (which does not contain the validator’s age) and the on-chain node operator registry. For now that’s OK, but we will need to re-evaluate this later.

The only thing that confuses me is that the algorithm respecting validator lifetime would work differently depending on whether or not our validator set contains an “operator that has more than 1 percent of the total stake”. Can we reformulate this in terms of the average amount of stake under management?

Next ejected validator algorithm:

  1. Calculate the average stake
  2. Pick the set of node operators whose stake is bigger than average + some threshold
  3. If the set is not empty, go to the next step; otherwise, eject from the Node Operator with the largest number of active validators
  4. Build a list of all validators that belong to the set of NOs calculated in step 2
  5. Sort the list so that the oldest validators are at the beginning. The first validator in the list is the next to exit
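The steps above can be sketched as follows. Several things here are assumptions made for illustration: stake is measured as the number of active validators (equal stake per validator), the threshold is expressed in validators, each validator is represented by its activation time, and the fallback branch returns the chosen operator's oldest validator simply to keep the result deterministic.

```python
def next_to_exit(activations_by_op, threshold):
    """Return (operator, activation_time) of the next validator to exit."""
    counts = {op: len(times) for op, times in activations_by_op.items()}
    average = sum(counts.values()) / len(counts)                         # step 1
    above = [op for op, n in counts.items() if n > average + threshold]  # step 2
    if not above:                                                        # step 3
        op = max(counts, key=counts.get)  # largest number of active validators
        return op, min(activations_by_op[op])
    # Steps 4-5: pool the validators of the selected operators; taking the
    # minimum is equivalent to sorting oldest-first and reading the head.
    candidates = [(t, op) for op in above for t in activations_by_op[op]]
    t, op = min(candidates)
    return op, t


uneven = {"A": [10, 20, 30, 40, 50, 60], "B": [5, 70], "C": [15]}
print(next_to_exit(uneven, threshold=1))  # ('A', 10)

even = {"A": [10, 20, 30], "B": [5, 70, 80], "C": [15, 25]}
print(next_to_exit(even, threshold=1))    # ('A', 10)
```

In the first case operator B holds the oldest validator overall (activation time 5), but it is not considered because B is below the cutoff.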

At first glance, this algorithm would have the same property of respecting the validator lifetime, but it would apply even if we don’t have an NO that operates more than 1 percent of total ETH staked. What do you think?

4 Likes

Thanks! I think this is a great start, and I would like to give my perspective as a genesis validator in Lido.

First of all, I think rotating the oldest keys first makes sense. This way, hopefully, in the long term we can migrate all keys to newer technologies and higher security standards, for example by slowly moving the validator set to DVT.
Evening out the stake distribution between validators also makes sense to me, as it benefits Lido and works toward further decentralising Ethereum. On the commitment of having “no operators with >1% of the total stake” in particular, I think what node operators run outside of Lido should be factored in somehow. Some operators have their own offerings through which they also run a certain percentage of the Ethereum network. I think it is only fair, when talking about the network share of a validator, to also consider non-Lido stake. This is probably overkill in the near future, but long-term I would like to see this.

I agree with @Izzy’s points 1 & 2. Where I would disagree a bit is @Izzy’s take on 3: “acknowledges the “being early” advantage some Node Operators have benefitted from.”:

Yes, first validators benefitted from earning more rewards with their validators.
First of all, this is true because they had their validators also running for a longer time which means they had higher operating and server costs etc. This also means they have been committing to Lido earlier when risks were higher and withdrawals have been much further away. From my experience they also paid much more for injecting their keys when gas prices were insane and the DAO desperately needed more keys to keep up with demand.

When gas prices were around 100 Gwei and there was a limit of 20 keys per transaction, I remember paying roughly 0.013 ETH per validator key. For newer operators, the limit was 50 keys per transaction and, looking at some newer injections, it costs roughly 0.0007 ETH per validator key.

Just saying that early validators benefitted more so now others should be preferred doesn’t paint a correct picture to me. I believe it would be much fairer to not differentiate between “lighter” and “heavier” rewards history and just do the exit process same as the entry process, meaning:

Exit the oldest validator keys from the validator with the highest amount of total keys. If there are multiple validators with the highest amount, equally reduce them going round-robin from NOs index 0 to N. It’s only fair that Staking Facilities, as NO #0, gets the first validator exited, since we also benefitted from being the first index to receive one at all times (if there was no validator with a lower amount of total keys).

7 Likes

You bring up some really good points!

In general, I think the color you provide around costs “not factored” into the running of the validators is indeed something that is missing and could/should somehow be accounted for (i.e. avg rewards / validator could be offset by avg cost / validator). The idea of the 3rd point was to work around a large disparity that arises if we take the simple approach, and the question is whether the disparity is still there once costs are taken into account.

> First of all, this is true because they had their validators also running for a longer time which means they had higher operating and server costs etc.

This is generally true but the running assumption here is that costs/validator do not go up at the same rate as revenue/validator. If this is incorrect then we should definitely re-think, but I don’t think that’s the case. There are also operators who obviously have higher opex costs than others may (e.g. due to running on local machines vs cloud), which would ideally also be taken into account, but I’m not sure there’s a great way to proxy for that (currently).

> This also means they have been committing to Lido earlier when risks were higher and withdrawals have been much further away.

This is true and should be taken into account. We should also consider, though, that this is somewhat offset by stETH rewards being liquid and the generally strong stETH/ETH exchange rate (combined with the fact that ETH went from ~1K USD at launch of Lido to > 4K USD), as well as the potential benefit of being an early Lido Node Operator in terms of bringing in new business for staking orgs given the level of exposure and proof of their excellence at a large scale.

> Exit the oldest validator keys from the validator with the highest amount of total keys. If there are multiple validators with the highest amount, equally reduce them going round-robin from NOs index 0 to N.

What happens in this case is that e.g. Old Operator X (with ~7.3K keys) has an avg reward/validator of ~0.1 ETH, while one of the newer operators with the same number of keys currently has an avg reward/validator of ~0.04 ETH. The question is thus: is the ratio of avg cost/validator for these two operators also ~2.5x? If so, then I think this mechanism can work. Otherwise, I think the Combined Approach makes sense to try out.

Ultimately, though, what matters most is the rate of exits. If e.g. we end up exiting 40K validators for whatever reason within a relatively short amount of time (3 months), I think the ordering will not have a very large impact. So we also need to think about what the rate of redemptions will be, and over what timeframes, so as not to over-engineer something that may be moot.

4 Likes

> This is generally true but the running assumption here is that costs/validator do not go up at the same rate as revenue/validator. If this is incorrect then we should definitely re-think, but I don’t think that’s the case. There are also operators who obviously have higher opex costs than others may (e.g. due to running on local machines vs cloud), which would ideally also be taken into account, but I’m not sure there’s a great way to proxy for that (currently).

Costs/validator do not go up at the same rate as revenue/validator. But you are comparing total rewards over time to costs per month, not costs over time. Let’s assume we are looking at the same Node Operator, once joining earlier (NO_OLD) and once joining later (NO_NEW):

The only difference I can see between them is the initial spin-up cost. Assuming they run in the cloud (so no initial purchase of hardware), the only difference between NO_OLD and NO_NEW for spin-up is that NO_NEW paid 18 times less for their validator keys (0.013 ETH / 0.0007 ETH) at times where the price per ETH was much higher than recently. I know this is probably an extreme comparison, but it is at least true to some degree.
In terms of operating costs, if it’s the same NO with the same amount of keys, it should also be the same monthly costs.

> combined with the fact that ETH went from ~1K USD at launch of Lido to > 4K USD

The price of ETH also went back below 1k USD after that, and we don’t know how it will evolve in the future. If it goes to 10k USD next, those who joined later would benefit as well if we prioritized them over older validators. I don’t think an argument based on how an asset has performed or will perform in the future should be made in this case. :slight_smile:

Maybe I am missing why you would like newer operators to catch up in terms of total rewards. If monthly costs are the same, initial costs are now lower than before, and rewards per month right now are the same, why would they get a privilege for joining late? This is not meant to be hostile or anything; I just don’t understand what “disparity” this is trying to solve. :slight_smile:

3 Likes

> Maybe I am missing why you would like newer operators to catch up in terms of total rewards. If monthly costs are the same, initial costs are now lower than before, and rewards per month right now are the same, why would they get a privilege for joining late? This is not meant to be hostile or anything; I just don’t understand what “disparity” this is trying to solve. :slight_smile:

The disparity is that generally most older operators were running 4K+ validators for quite a long period of time whereas the newest operators who “caught up fast” during 2022/H1 have been running a large set for a comparatively shorter period, which generally would give the older operator a longer period of time to break even or profit based on the costs vs newer operators. For these operators who are in this group (i.e. who have validators in the high thousands) the question is if it’s fair to exit them at the same rate given that their lifetime cost:revenue for these validators is very different.

> The only difference I can see between them is the initial spin-up cost. Assuming they run in the cloud (so no initial purchase of hardware), the only difference between NO_OLD and NO_NEW for spin-up is that NO_NEW paid 18 times less for their validator keys (0.013 ETH / 0.0007 ETH) at times where the price per ETH was much higher than recently. I know this is probably an extreme comparison, but it is at least true to some degree.

I think all of the operators that we’re talking about here (operators who are running 4-5K+ keys) basically submitted keys during these high-gas periods. Only the really really new operators have submitted keys during the super low gas period, but they’re not really in the scope of this discussion.

I’m only suggesting that, while we’re exiting validators in this initial period, and only in cases where an operator’s total validators are above a certain threshold (whether that’s 1% of total stake or some threshold above the Lido set average as suggested by Eugene), we should prioritize exiting the “heaviest” validators; I am not suggesting that newer operators shouldn’t have their validators exited until they catch up in terms of rewards.

4 Likes

I share your concerns regarding strict pinning to 1%, and understand that it’s a somewhat tricky metric to maintain, since Lido’s market share is not proportional to the number of NOs in general.

I really like your algorithm. It’s worth noting that the sorting in step 5 does not require a full-fledged sort, only a merge step over the per-NO validator indexes, which are already sorted (i.e., older validators always have smaller validator indexes).
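Such a merge step could look like the sketch below, which uses Python's standard `heapq.merge` to lazily combine already sorted per-NO index lists. The operator names, index values, and the `oldest_first` helper are made up, and the sketch relies on the stated assumption that a smaller validator index means an older validator.

```python
import heapq
from itertools import islice


def oldest_first(indexes_by_op, n):
    """Return the n oldest (validator_index, operator) pairs.

    Each per-operator list must already be sorted ascending, so a k-way
    merge is enough and no full sort of the combined list is needed.
    """
    streams = [[(vi, op) for vi in idxs] for op, idxs in indexes_by_op.items()]
    return list(islice(heapq.merge(*streams), n))


per_operator = {"A": [3, 17, 42], "B": [5, 8, 90], "C": [1, 60]}
print(oldest_first(per_operator, 4))  # [(1, 'C'), (3, 'A'), (5, 'B'), (8, 'B')]
```

Since the merge is lazy, only as many elements are consumed as there are validators to exit.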

I also see some corner cases that need to be addressed. One of them:

Step 4 states that we need to build a list of all validators belonging to the chosen subset of NOs; if we needed to exit a huge number of validators, this would lower the stake of the selected NOs far below the average stake.

Straightforward fix: select not all validators from the NOs subset, but only those exceeding the average amount.

As for the threshold, something between 20% and 50% of the average stake amount would be suitable, I guess.
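A sketch of this fix, with the threshold expressed as a fraction of the average. As before, stake is approximated by validator count, validators are identified by index (smaller = older), and the function name and example data are hypothetical.

```python
def exit_candidates(indexes_by_op, threshold_ratio):
    """Oldest-first candidates, capped at each operator's excess over the average."""
    counts = {op: len(idxs) for op, idxs in indexes_by_op.items()}
    average = sum(counts.values()) / len(counts)
    cutoff = average * (1 + threshold_ratio)
    candidates = []
    for op, idxs in indexes_by_op.items():
        if len(idxs) > cutoff:
            # Only the validators above the average are eligible, so the
            # operator is never pushed below the average stake.
            excess = int(len(idxs) - average)
            candidates.extend((vi, op) for vi in sorted(idxs)[:excess])
    return sorted(candidates)


validators = {"A": [1, 4, 6, 9, 12, 14], "B": [2, 7], "C": [3]}
print(exit_candidates(validators, threshold_ratio=0.3))  # [(1, 'A'), (4, 'A'), (6, 'A')]
```

Here the average is 3 validators and the cutoff is 3.9, so only operator A qualifies, and only its 3 oldest validators are offered for exit.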

2 Likes

We’re in agreement with using the mixed approach to this situation. Taking the keys from the NOs with the largest number of keys / oldest keys might help have a better overall distributed NO set.

Thanks @arwer13 and @Izzy for the insights

2 Likes

Hello everyone, I’m Ann from Everstake. Thanks for the proposal and for opening this discussion.

To achieve the goals of the decentralization roadmap, especially the goal of no operators with >1% of the total stake, we need a transparent algorithm of ejection.

Taking reward history into account is somewhat questionable, because I think it is natural that NOs who joined earlier have earned larger rewards: their running time is longer. When you hire a new worker, you don’t decrease the salary of the older worker to equalize their total profit.

But of course, the difference between 0.1 ETH and 0.04 ETH per validator is quite significant, taking into account that initial costs/expenses might not be so different.

But as @Izzy has mentioned, the speed of validator ejection will be defined by the rate of withdrawal requests.

If the ejections happen over a short period, the effect of the ejection order will be quite small, so we can just implement the simple Approach 1.

If it takes a while, I think we need to consider some intermediate milestone goals.

For example, the first milestone is to achieve no operators with > 1.6% (or 1.X). At this milestone, we can reconsider our strategy depending on the current distribution of validators across the set of NOs. Knowing the rate of withdrawal requests, we can then predict the total rewards difference between old and new NOs more precisely and discuss how to implement the fairest algorithm.

5 Likes

One thing to bring up here is that there is a chance that some operators will not conform to Lido policies, will steal MEV, or will run operations badly enough that Lido will want to reduce exposure to them. These are priority candidates for withdrawals and should be processed first. Technically, I think this could be handled by the DAO setting the operator’s validator limit lower than its current number of validators, and prioritizing exits from operators where the current # of validators is > limit until it’s at the limit.
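The limit-based prioritization could be sketched like this. The data layout (operator name mapped to a pair of active validators and limit), the function name, and the numbers are assumptions for illustration only.

```python
def priority_exits(operators):
    """Map each over-limit operator to how many validators it must exit.

    operators: name -> (active validators, validator limit set by the DAO).
    """
    return {
        name: active - limit
        for name, (active, limit) in operators.items()
        if active > limit
    }


operators = {"A": (7391, 7391), "B": (5000, 3000), "C": (673, 1000)}
print(priority_exits(operators))  # {'B': 2000}
```

An exit algorithm would drain this map first and fall back to its regular ordering once it is empty.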

5 Likes

I like the idea of the limits as an on-chain param that can be taken into account by an off-chain algorithm to pick the next validator. The issue is that currently, limits are editable by Node Operators using Easy Track. We need to think about how to approach this. Maybe the current limits and the limits that you propose should be different parameters.

1 Like

Sticking to the single limit, the implementation could be:

  • Option 1. Update the IncreaseNodeOperatorStakingLimit factory within Easy Track to support pausing particular NOs on limit-lifting motions (as a bonus, this could lower the possibility of motion spam through Easy Track)
  • Option 2. Use Aragon ACL parameters interpretation to disable such motions for particular NOs (as it was implemented for payments before, see LIP-13 for the usage of grantPermissionP)

Otherwise, it would require different parameters, as you said, though having two parameters sounds a bit more complicated in terms of external visibility and clarity (i.e., more branching in explanations and algorithms).

Hi all, I’m Gandalf from RockX, and I think it’s important to stick by many of the goals set out by Lido above. As such, I believe Approach 3 (the combined approach) is the most suitable, with Blockdaemon and Anyblock Analytics of course treated by the algorithm as a single entity.

I don’t think it is necessary for all the decision-making and the algorithm to be on-chain, as the process will be clear and thus easy to verify given the data that is already publicly on-chain.
One thing I would like to confirm: does the algorithm calculate where to exit validators once per validator? I.e. if 320 ETH is withdrawn, then 10 validators will need to exit; does the algorithm run 10 times, thus potentially pulling from multiple operators, or do all 10 come from one node operator?

Thanks Lido for proactively bringing this discussion forward as it is not something any of us want to be deciding in haste reactively!

1 Like

It should calculate per validator, potentially pulling from multiple operators.

3 Likes

While I like the combined approach (and lean a little bit more towards Izzy than Flo), I think there are benefits to keeping this simple for a start (assuming iterations/upgrades are possible with few negative side effects). Two reasons: in my experience, KISS is a sensible approach in most cases in general, and specifically, we should keep in mind the developments with regard to “withdrawal credentials initiated exits”.

If this happens, I’d guess the Lido protocol could (and likely should) initiate exits directly, rather than relying on node operators?
Does anyone have insights with regard to an implementation timeline? My basic understanding (from before the Merge) was that this feature would be developed/released together with withdrawals, but I might be mistaken.

Separately, but related, I want to share that some institutional clients are demanding some sort of solution to manage the risk of NO bankruptcy (“not responding at all”) while still being able to retrieve their stake. The industry standard seems to be pre-signed exit transactions.
I do agree that mass storage of these for Lido has more risks than benefits, but the general thought might be worth considering. Depending on the above timeline for “withdrawal credentials initiated exits”, though, Lido could certainly opt to ignore it.

2 Likes

Yep!

I know there’s Lido contributor involvement in working on this as an EIP-to-be, but I do not think the chances of it landing together with withdrawals are high (though I wish it would). Probably/hopefully in the following hardfork.

1 Like

It’s likely we’ll keep node-operator-initiated exits anyway; these will probably be much cheaper in ops costs.

1 Like