Surplus Management Framework: Discussion and Draft Proposal

Many thanks to the author, @equanimiti, and special thanks to Lido DAO Analytics contributors for the awesome analysis on slashing and operational risks.

Motivation

  • While slashing and operational risk mitigation should be a core part of node operator management strategies for the DAO, there is additional benefit to codifying a surplus strategy that can effectively backstop the solvency of the protocol beyond any operational risk mitigation node operators can undertake
  • To execute on this goal, DAO token holders would need to determine an optimal surplus allocation strategy in line with its priorities
  • This document provides a framework for how Lido DAO could think about its surplus management and puts forward an initial set of proposals and discussion points:
    • Reserve ~0.13% of circulating stETH (~11k ETH) as a liquidity buffer to minimize withdrawal delays to approximately 0 for the average withdrawal size. As a reminder, stETH is always non-custodial - the buffer is about making withdrawals faster for users only, at a manageable opportunity cost to the DAO.
    • Reserve ~0.315% of circulating stETH (~25k stETH) as a last-line backstop against slashing and operational risk


Source: https://dune.com/steakhouse/lido-safu

Introduction

Lido DAO token holders could support a surplus strategy with sustainable, long-term impact in mind. A generalized allocation framework such as the below (source) may be a useful starting point for the discussion.

In analyzing the capital allocation of 1,400 large cap public companies across the US and Europe, the researchers found that reinvesting in organic growth was by far the highest ROI form of capital allocation.

DAOs are obviously distinct from corporations: not only are they organized differently, but they usually have goals unlike any corporate objective. For example, Lido DAO has ratified a vibe alignment to “Make staking simple, secure, and decentralized”. We can nonetheless draw on the historic allocation outcomes of companies for insights into how token holders could structure their priorities for the surplus.

Surplus priorities for the DAO could be generically described as follows, in decreasing order of impact:

  1. Reinvest in organic protocol development initiatives or grants (highly impactful)
  2. Right size surplus structure (moderately impactful through risk mitigation)
  3. Unwind residual surplus (generally neutral to potentially negative impact)

Grant requests and similar proposals are currently scoped out, budgeted and decided on by token holders in response to requests submitted by external third parties or token holder sub-committees (such as LEGO, or reWARDs/LOL). Therefore, this proposal focuses on items 2 and 3 (i.e. provided there is surplus left after reinvestment, optimizing Lido’s surplus structure and thinking about potential surplus unwinds).

Lido DAO’s illustrative balance sheet

Lido DAO’s balance sheet as of Aug 1, 2023 is shown below.


Source: https://dune.com/steakhouse/lido-safu

The above balance sheet is illustrative in nature: neither the protocol nor the DAO controls the assets or the liabilities in a traditional sense. However, the view is useful for demonstrating the integrity of the protocol to stakeholders such as stETH holders.

As a quick summary (at time of writing):

  • stETH in circulation is matched by staked ETH and the omnibuffer (essentially a “working capital” account consisting of execution layer rewards, the withdrawals vault and the deposit buffer).
  • In addition, Lido DAO has 39,717 ETH worth of stETH in surplus assets, including 5,581 stETH set aside as a static slashing provision and an additional surplus of 34,136 stETH (which is being incremented by protocol income and staking rewards).

Any protocol that routes user Ether to coordinate risk-bearing activities (such as staking) ought to consider liquidity, slashing and operational resilience as part of balance sheet management.

A) Liquidity

The purpose of this section is to discuss the merits and parameters of a potential liquidity buffer to increase withdrawal liquidity for stETH users. Should the DAO set aside capital to enhance the withdrawal experience? If so, by how much?

Lido stETH holders can redeem stETH for ETH in one of three ways:

  1. Via withdrawals through the omnibuffer, immediately provided there is enough ETH
  2. Via withdrawals through validator exits (unique among liquid staking protocols to offer this possibility)
  3. Via AMMs/CEXs (with some slippage + fees)

The fee drag of 3) and the time delay of 2) are primarily outside the protocol’s control. However, Lido DAO can set its ETH buffer size according to the level of user service token holders want to aim for.

The current state: The omnibuffer as mentioned above can be thought of simplistically as a “working capital” account. At the moment, there is no protocol parameter capturing a specific liquidity target for users wanting to withdraw.

The analytics team has previously done a study of the expected withdrawal time based on the likely size of the omnibuffer. The conclusion was that in most cases, withdrawals through the buffer should be quicker than those of a vanilla Ethereum staker. In fact, since Shapella, because user deposits have outpaced withdrawals (i.e. Ethereum’s staking ratio has been rising), the omnibuffer has been remarkably effective at expediting withdrawals. Only ca. 7% of withdrawal requests since May 2023 have been met via direct Beacon chain withdrawals, with the rest being routed through the omnibuffer. However, this is unlikely to remain the case once the Ethereum staking ratio reaches equilibrium and new deposits into the protocol slow.

As Lido DAO’s surplus grows, token holders are in a position to go a step further to ensure that stETH holders’ withdrawal experience is as seamless as possible. Near instant withdrawals would enhance convenience, particularly in relation to centralized staking provider options. To that end, we can model a target liquidity buffer based on the level of service (high/medium/low) Lido DAO token holders believe would be suitable for stETH holders.

Methodology: We calculate target liquidity buffers for high/medium/low scenarios by assuming a level of withdrawal requests and subtracting the expected liquidity in the omnibuffer. We rely on the historic distributions of these variables as input.

Historic Data (to July 31, 2023):

Statistic | Daily withdrawal requests (% of ETH staked with Lido) | Lido EL rewards APR
Min | 0.00% | 0.36%
Max | 7.03% | 7.30%
Median | 0.05% | 1.54%
Mean | 0.17% | 1.86%
68th percentile | 0.09% | 1.88%
95th percentile | 0.24% | 4.34%

(Note the distributions of daily withdrawals and EL Rewards both have a heavy right skew due to the large stakers for the former and MEV for the latter. The corollary is that their mean > median.)

Source: https://dune.com/queries/2475298, https://dune.com/LidoAnalytical/lido-execution-layer-rewards

Model assumptions:

Target liquidity | Daily withdrawal requests | (-) EL rewards | (-) CL rewards | (-) Deposits | (=) Target liquidity buffer
High | 95th percentile | Median | Deterministic | None | ?
Medium | Mean | Mean | Deterministic | None | ?
Low | Median | Mean | Deterministic | None | ?

Result: Though the numbers fluctuate, it would seem that at current stETH levels, aiming for “medium” liquidity for withdrawers would involve raising the ETH in the buffer to ~11k (0.13% of total stETH), or to ~18k (0.21% of total stETH) for “high” liquidity.


Source: https://dune.com/queries/2814917
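The methodology above can be sketched in a few lines of Python. This is a minimal illustration, not the linked Dune query: the staked-ETH base (taken from the “Lido deposited” figure later in this post), the 3% CL rewards APR and the simple APR-to-daily conversion are assumptions, and deposits are set to zero as in the model table.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365


@dataclass
class Scenario:
    name: str
    withdrawal_pct: float  # daily withdrawal requests, % of ETH staked with Lido
    el_apr_pct: float      # Lido EL rewards APR, %


def target_buffer(s: Scenario, staked_eth: float, cl_apr_pct: float,
                  deposits_eth: float = 0.0) -> float:
    """Withdrawal requests - EL rewards - CL rewards - deposits, floored at zero."""
    withdrawals = staked_eth * s.withdrawal_pct / 100
    el_rewards = staked_eth * s.el_apr_pct / 100 / DAYS_PER_YEAR
    cl_rewards = staked_eth * cl_apr_pct / 100 / DAYS_PER_YEAR
    return max(0.0, withdrawals - el_rewards - cl_rewards - deposits_eth)


# Statistics from the historic data table (to July 31, 2023)
SCENARIOS = [
    Scenario("High", withdrawal_pct=0.24, el_apr_pct=1.54),    # 95th pct, median EL
    Scenario("Medium", withdrawal_pct=0.17, el_apr_pct=1.86),  # mean, mean EL
    Scenario("Low", withdrawal_pct=0.05, el_apr_pct=1.86),     # median, mean EL
]

STAKED_ETH = 8_250_118  # assumption: "Lido deposited" figure from this post
CL_APR_PCT = 3.0        # assumption: illustrative consensus-layer APR

if __name__ == "__main__":
    for s in SCENARIOS:
        buf = target_buffer(s, STAKED_ETH, CL_APR_PCT)
        print(f"{s.name:<6} {buf:>10,.0f} ETH ({buf / STAKED_ETH:.3%} of staked ETH)")
```

Because the inputs here are partly assumed, the output will not exactly match the ~11k / ~18k figures above, which come from live data.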

Of course, there is an opportunity cost of enabling a better stETH user experience, specifically the drag on staking rewards from the incremental ETH retained in the buffer. Below is a sensitivity table showing the lost staking rewards on an annualized basis to the protocol at varying levels of the liquidity buffer and staking yields.
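A sensitivity table of this kind can be computed with a minimal sketch like the one below. It assumes a simple annualized cost (buffer size × staking APR, no compounding); the buffer sizes and APRs are illustrative, not the values used in the original table.

```python
BUFFERS_ETH = [5_000, 10_000, 15_000, 20_000]  # illustrative buffer sizes
APRS_PCT = [3.0, 4.0, 5.0]                     # illustrative staking APRs


def forgone_rewards(buffer_eth: float, staking_apr_pct: float) -> float:
    """Annual staking rewards forgone by keeping buffer_eth unstaked (no compounding)."""
    return buffer_eth * staking_apr_pct / 100


def sensitivity_table(buffers, aprs):
    """Rows = buffer sizes, columns = staking APRs."""
    return [[forgone_rewards(b, a) for a in aprs] for b in buffers]


if __name__ == "__main__":
    print("buffer (ETH) | " + " | ".join(f"{a:.1f}% APR" for a in APRS_PCT))
    for b, row in zip(BUFFERS_ETH, sensitivity_table(BUFFERS_ETH, APRS_PCT)):
        print(f"{b:>12,} | " + " | ".join(f"{x:>8,.0f}" for x in row))
```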

In time, the direction of travel ideally would be to programmatically implement a dynamic liquidity buffer as some percentage of the total amount of Ether routed through Lido. For the purpose of the initial conversation, we invite the DAO to discuss the following proposal.

Recommendation: Maintain a minimum liquidity threshold of 10k ETH, roughly in line with the “medium” target level (~0.13% of deposits). This would provide instant 1:1 stETH withdrawals on average at a manageable opportunity cost to the protocol. The appropriate rebalancing frequency could be refined further.
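One way to picture a minimum liquidity threshold is as a rule on how much buffered ETH may be sent to new validators. The sketch below is hypothetical (it is not Lido’s deposit contract or depositor bot logic): it assumes pending withdrawal requests are covered first, the threshold is then kept idle, and only whole 32 ETH validator deposits are made from the remainder.

```python
DEPOSIT_SIZE_ETH = 32  # ETH per validator deposit


def depositable_eth(buffer_eth: float, pending_withdrawals_eth: float,
                    min_liquidity_eth: float) -> int:
    """Hypothetical rule: ETH that may be deposited while keeping min_liquidity_eth idle."""
    free = buffer_eth - pending_withdrawals_eth - min_liquidity_eth
    if free <= 0:
        return 0
    # Only whole validators can be funded.
    return int(free // DEPOSIT_SIZE_ETH) * DEPOSIT_SIZE_ETH


if __name__ == "__main__":
    # 25k ETH buffered, 2k ETH of pending requests, 10k ETH threshold
    print(depositable_eth(25_000, 2_000, 10_000))  # 12992
```

Under this framing, the rebalancing frequency corresponds to how often such a rule is re-evaluated.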

B) Last-line Slashing and Operational Risk

Recap: Ether staked through the Lido protocol is subject to slashing and other operational risks, as with any other Ethereum validator. To guard the protocol against the impact of potential long-tail slashing events, Lido DAO token holders previously approved the purchase of expensive slashing insurance that cost ~25% of the DAO’s annual protocol fees. In July 2021, Lido DAO token holders elected to stop buying the cover and voted instead in favor of exploring self-cover. The analytics team subsequently conducted an examination of offline and slashing risks, which concluded that self-cover could be a reasonable alternative to mitigate solvency risk.

Quantifying slashing & offline risk for self-insurance: The analytics team’s model has since been updated for two states of the network: the current state, and the predicted state after the queue of ~60k validators is resolved.

General data | Current state | After queue
Total staked (ETH) | 23,782,515 | 23,782,515
Lido deposited (ETH) | 8,250,118 | 8,250,118
Total active validators | 743,232 | 803,938
Lido active validators | 240,513 | 257,816
Others’ active validators | 502,719 | 546,122
Avg Lido effective balance (ETH) | 32 | 32
1-year rewards (ETH, assuming 50 ETH daily) | 18,250 | 18,250

By this analysis, the scenarios below are the most probable outcomes, in increasing order of severity. In the most extreme case, where 100% of a single big operator’s validators are slashed, ~0.098% of the Ether routed through the Lido protocol could probabilistically be at risk. This damage could hypothetically be covered by Lido DAO’s 1-year income before operating expenses. Note that this calculation is based only on the existing Curated Node Operator Registry, excluding restaking and other possible solutions (which would increase both risks and APR) as well as DVT/permissionless staking. Progress on both fronts will bring new opportunities for decentralization but also new slashing and operational risks, which will need to be accounted for in time.

Slashing & Offline Risks
Current state
Scenario | total_loss (ETH) | loss_offline (ETH) | loss_slashed (ETH) | % of deposits | % of 1Y rewards
Single big operator, 100% validators offline for 7 days | 104 | 104 | 0 | 0.001% | 0.57%
Single big operator, 30% validators slashed, 100% validators offline for 7 days | 2506 | 233 | 2273 | 0.030% | 13.73%
Single big operator, 100% validators slashed | 8112 | 533 | 7579 | 0.098% | 44.45%

After queue
Scenario | total_loss (ETH) | loss_offline (ETH) | loss_slashed (ETH) | % of deposits | % of 1Y rewards
Single big operator, 100% validators offline for 7 days | 100 | 100 | 0 | 0.001% | 0.55%
Single big operator, 30% validators slashed, 100% validators offline for 7 days | 2497 | 224 | 2273 | 0.030% | 13.68%
Single big operator, 100% validators slashed | 8093 | 514 | 7579 | 0.098% | 44.35%
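The derived columns in these tables follow directly from the raw losses: total_loss = loss_offline + loss_slashed, divided by Lido deposited ETH (8,250,118) and by the assumed 1-year rewards (50 ETH × 365 = 18,250). The sketch below recomputes them for the current-state rows as a consistency check; it does not reproduce the analytics team’s underlying probabilistic loss model.

```python
LIDO_DEPOSITED_ETH = 8_250_118
ONE_YEAR_REWARDS_ETH = 50 * 365  # 18,250 ETH ("assume 50 ETH daily")

# (loss_offline, loss_slashed) in ETH, current state
CURRENT_STATE = {
    "100% offline for 7 days": (104, 0),
    "30% slashed, 100% offline for 7 days": (233, 2273),
    "100% slashed": (533, 7579),
}


def derived_columns(loss_offline: float, loss_slashed: float):
    """Return (total_loss, % of deposits, % of 1Y rewards)."""
    total = loss_offline + loss_slashed
    return (total,
            100 * total / LIDO_DEPOSITED_ETH,
            100 * total / ONE_YEAR_REWARDS_ETH)


if __name__ == "__main__":
    for name, (offline, slashed) in CURRENT_STATE.items():
        total, pct_dep, pct_rew = derived_columns(offline, slashed)
        print(f"{name:<38} {total:>6} {pct_dep:.3f}% {pct_rew:.2f}%")
```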

Quantifying other operational risks for self-insurance: Beyond offline penalties and slashing, Ether routed through the Lido protocol faces other operational risks. The analytics team has specified two additional scenarios which reflect risks emanating from the concentration of client and server type (see data here).

  • Consensus layer client risk: 37% of Lido validators use Prysm. Although this is greater client diversity than on the Ethereum network as a whole (where Prysm’s share is 46%), a hypothetical critical bug in Prysm, assuming it takes 2 days to fix, could result in a loss of 0.209% of total Ether routed through the protocol.
  • Infrastructure risk: 48% of all Lido validators use public cloud. If all cloud providers (such as AWS, GCP, etc.) refused to provide infrastructure to Lido node operators, and assuming it takes 3 days for node operators to move their validators to new infrastructure, 0.008% of total Ether routed through the protocol would be at risk.
Other Operational Risks
Current state
Scenario | total_loss (ETH) | loss_offline (ETH) | loss_slashed (ETH) | % of deposits | % of 1Y rewards
36% are offline for 2 days (Prysm critical bug) | 17219 | 17219 | 0 | 0.209% | 94.35%
48.3% offline for 3 days (Infra risk) | 682 | 682 | 0 | 0.008% | 3.74%

After queue
Scenario | total_loss (ETH) | loss_offline (ETH) | loss_slashed (ETH) | % of deposits | % of 1Y rewards
36% are offline for 2 days (Prysm critical bug) | 18476 | 18476 | 0 | 0.224% | 101.24%
48.3% offline for 3 days (Infra risk) | 731 | 731 | 0 | 0.009% | 4.00%

This list of potential operational risks is non-exhaustive and the evaluation of other material risks could be included down the road (e.g. risks related to geographic/jurisdictional concentration or poor execution layer client diversity).

Recommendation: The provision for slashing could be recalculated dynamically, once a day or once a week, based on the amount of Ether routed through the protocol. Its role is to capture a realistic probabilistic amount of Ether that could be at risk at any given time across a broad range of risks. With a simple additive model, the slashing and operational risk self-insurance limit would come out to 0.315% (= 0.098% + 0.209% + 0.008%) of total Ether routed through the protocol. At the time of writing, that would work out to 25,608 stETH, compared with the current slashing provision of 5,581 stETH (as of August 1).
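The additive rule can be written as a small function that is re-run daily or weekly. This is a minimal sketch: the scenario percentages are the current-state figures above and 5,581 stETH is the August 1 provision, while the 8,000,000 stETH total in the usage lines is a hypothetical round number, not the actual supply.

```python
# Loss per scenario as % of total Ether routed through the protocol (current state)
SCENARIO_LOSS_PCT = {
    "single big operator, 100% slashed": 0.098,
    "Prysm critical bug, 2 days offline": 0.209,
    "cloud providers deny service, 3 days offline": 0.008,
}
CURRENT_PROVISION_STETH = 5_581  # slashing provision on August 1


def provision_target(total_steth: float, loss_pct: dict) -> float:
    """Additive model: sum the scenario percentages and apply them to total stETH."""
    return total_steth * sum(loss_pct.values()) / 100


def provision_shortfall(total_steth: float,
                        current: float = CURRENT_PROVISION_STETH) -> float:
    """stETH that would need to move into the provision to reach the target."""
    return max(0.0, provision_target(total_steth, SCENARIO_LOSS_PCT) - current)


if __name__ == "__main__":
    total = 8_000_000  # hypothetical total stETH
    print(f"target {provision_target(total, SCENARIO_LOSS_PCT):,.0f} stETH, "
          f"shortfall {provision_shortfall(total):,.0f} stETH")
```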

Putting it all together

  • Stacking the above-mentioned priorities and comparing against the current protocol surplus may seem to suggest there could be some “unallocated” surplus (indicated in green) beyond providing expedited withdrawals and self-securing the protocol
    • However, Lido DAO’s operational risks are not yet well understood, especially as the protocol is set to undergo node operator diversification
    • As such, it seems prudent to retain as much “unallocated” surplus as possible for the time being, at least until there is a wider rollout of the new staking router modules
  • Eventually, the protocol’s reserve management could be automated. For instance, EasyTrack motions like selling or moving stETH might halt if they risk dropping the reserve below the programmed threshold
  • In any case, this should only serve as a starting point to think about liquidity and slashing / operational risk exposures and mitigants
  • This model would have to be refined once staking router modules with collateral requirements for validators are approved by governance, as they would reduce the DAO surplus’s risk exposure and could reduce the size of the operational and slashing risk reserve required
  • Other reserves that we have not considered but could be possible include reserves for ecosystem grants, education initiatives or other broadly positive proposals such as the Launchnodes Impact Staking proposal, passed recently by DAO token holders
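The automated-reserve idea above could look something like the guard below. It is purely hypothetical and not based on EasyTrack’s actual interfaces; combining the two reserves (a liquidity floor plus a percentage-based risk provision, both counted against the DAO surplus) and the 8,000,000 stETH total are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReservePolicy:
    liquidity_floor_eth: float  # e.g. the proposed 10k ETH liquidity threshold
    risk_provision_pct: float   # e.g. 0.315 (% of total stETH)

    def required_reserve(self, total_steth: float) -> float:
        return self.liquidity_floor_eth + total_steth * self.risk_provision_pct / 100


def motion_allowed(policy: ReservePolicy, surplus_steth: float,
                   total_steth: float, outflow_steth: float) -> bool:
    """Hypothetical guard: reject a motion whose outflow would leave the surplus below the reserve."""
    return surplus_steth - outflow_steth >= policy.required_reserve(total_steth)


if __name__ == "__main__":
    policy = ReservePolicy(liquidity_floor_eth=10_000, risk_provision_pct=0.315)
    # 39,717 stETH surplus (Aug 1); the 8,000,000 total stETH is hypothetical
    for outflow in (4_000, 5_000):
        print(outflow, motion_allowed(policy, 39_717, 8_000_000, outflow))
```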

DAO Discussion Points & Next Steps

  1. Should the DAO use this framework for reserving its surplus for various core constraints, notably liquidity and slashing?
  2. Should the DAO consider automating a liquidity reserve to expedite withdrawals?
    a. Should the threshold be defined as per the above, i.e. initially 10k unstaked ETH?
  3. Should the DAO consider a formulaic method for reserving a provision for slashing and other risks in the DAO’s provision for slashing wallet (prior to identifying other meaningful risks)?
    a. Should the threshold be initially defined as per the above, i.e. ~0.32% of total deposits or 25,608 stETH?
  4. What other risks should the protocol be conscious of and allocate surplus against?
  5. Should token holders include impact and ecosystem grants in a hypothetical reserve approach to the DAO surplus?

So ETH in the liquidity reserve would only be used when the ETH in the omnibuffer can’t cover withdrawal requests, and it would not contribute to APR dilution?

In general I think this is a really great piece of analytical work and important in trying to drive a substantive discussion on how the DAO can support the robustness of the protocol. However, I am in strong disagreement with a few of the points made. In the interest of brevity, I won’t go through and highlight all the sections I think are great (most of it basically) but will just focus on areas where I think there’s going to be contention.

This is a very strong statement to make. In the case of catastrophic (or even just substantial) slashing events, there’s really no way to ever be able to make all users “whole”, so as a headline this is at best misleading. The framing is also couched in traditional terms that make things very confusing (solvency => an expectation that the protocol is somehow obligated to make people whole to begin with, and implies a lot of things of an almost custodial nature). I really wish we’d stop using loaded traditional financial terms to describe new system paradigms. This applies to the “working capital” description used as well. If we want to show that staking protocols are a new form of common good / infrastructure or even utility, I think we’d do better to move away from using this traditional terminology. I acknowledge the explanatory utility of the phrasing, but ultimately I think we can have these discussions without relying on this as a crutch.

I’m really against reserving a buffer for a few reasons, but the main ones are these:

  • depending on how “quickly” you try to keep the buffer topped up at all times you may actually exacerbate cycling stake even more than it is currently (and with things like the proposed limits to churn rate basically being a done deal already, make it a lot worse)
  • having a “readily available” buffer ends up only benefitting people during fair weather, and even then it will always be utilized by arbitrageurs / large players / bots before anyone else (and obviously especially during non fair-weather conditions it gets insta-zapped by some bot)
  • a permanently set-aside buffer can cause meaningful and difficult-to-calculate rewards drag due to compounding effects
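To illustrate why the drag compounds, here is a minimal sketch comparing simple and compounded forgone rewards on a fixed buffer. It assumes a constant APR and annual compounding, which is a simplification (stETH rebases with each oracle report and the APR varies); the 10k ETH and 4% inputs are illustrative.

```python
def simple_drag(buffer_eth: float, apr_pct: float, years: int) -> float:
    """Forgone rewards if rewards were never restaked."""
    return buffer_eth * apr_pct / 100 * years


def compounding_drag(buffer_eth: float, apr_pct: float, years: int) -> float:
    """Forgone rewards if the buffer's rewards would have been restaked once a year."""
    return buffer_eth * ((1 + apr_pct / 100) ** years - 1)


if __name__ == "__main__":
    for years in (1, 3, 5):
        s = simple_drag(10_000, 4.0, years)
        c = compounding_drag(10_000, 4.0, years)
        print(f"{years}y: simple {s:,.1f} ETH, compounded {c:,.1f} ETH")
```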

Most importantly, though:

  • I don’t think these kinds of “economic mechanisms” belong at the protocol layer; they should go on top of it. The base layer will always be less nimble and less able to reason about the economic effects of things happening on top of it, and attempting to codify things into the core protocol adds a) complexity and b) potential exploitability. I don’t think the net benefit of doing it “in protocol” vs “atop protocol” is substantial. My opinion is that if there’s demand for “always available withdrawals”, it can be built atop the protocol and incentivized accordingly (if necessary), but not in-protocol. In fact, if you manage to do this in an abstract way, you can create a market out of different approaches, where different actors compete, as opposed to building an ultimately less efficient and agile mechanism at the root.
  • Making an explicit mechanism that calculates and then allocates capital about “how much should be staked and how much should not be” (or other things like reinvested or position as an LP, like Frax does) almost turns the protocol into a capital management mechanism versus a staking mechanism, which IMO is definitely the wrong direction. The simpler and purer the base mechanism, the better. ETH gets submitted, unless there’s actual real withdrawal need, then it gets staked. I.e. it should do what it says on the tin, and the tin says “stake”.

I agree with this, but I don’t think it’s smart to try to do it at a “whole-protocol” level rather than on a per-module basis. The risk profiles of the different modules will be too disparate, and modules will be independent enough that attempting to manage this in aggregate is going to cause a lot of inefficiencies. I think there should be a larger effort here to create a risk analysis framework on a per-module basis, to identify what (if any) additional risk mitigation measures can be taken (e.g. for the curated set, understanding which NOs explicitly insure the validators they run, including through the Lido protocol, and to what extent), and to identify whether there are useful mechanisms for using these risk profiles (per operator, per module) when driving staking allocation decisions (i.e. in line with what we’re researching with Nethermind).

In the short term I agree an increase in a “risk reserve” is prudent, but from a rough reading the numbers seem a bit off. If the protocol has a surplus of ~34K stETH as of Aug 1, and you want to shift to something like 25,608 stETH (i.e. ~+20K stETH), does that even leave enough for a decent runway over the next 1-2 years? The risk/reward seems off here.


We appreciate this thoughtful post and the valuable insights shared on the liquidity buffer to minimise withdrawal delays and the last-line backstop against slashing and operational risk in the context of Lido’s protocol. It’s crucial to have an open and constructive dialogue to improve the protocol’s functionality and safety.

1. Liquidity Buffer

We agree with the points raised by @Izzy and share the concerns about increased protocol complexity and buffer management. Here are some additional considerations:

  • Managing the Buffer: It is essential for the community to have a clear plan for how the liquidity buffer would be topped up. Such a plan should address the DAO funds being exposed to additional risks.

  • Risk of Penalties or Slashing: Addressing this risk is critical. Any additional ETH that is unstaked is exposed to slashing and penalty losses accounted for by the Lido oracle. If losses occur, the DAO should have a well-defined plan for addressing them.

  • Sharing the Risk and Aligning Incentives: In its current version, this proposal requires LDO holders to potentially absorb penalties or slashing while stETH holders benefit from front-running ETH. It’s essential to strike a fair balance in risk-sharing, ensuring all stakeholders have aligned incentives.

2. Operational Risk Backstop

Your suggestions about having a risk reserve are thoughtful and worth considering. Here are some thoughts:

  • Sharing the Risk: It would be interesting to explore ways to ensure the risk is shared across all stakeholders so that it is not 100% subsidized by the LDO holders. DAO funds in the Aragon agent are still available in case of large slashing events through normal governance discussion and decision-making.

  • Protocol Fees: Leveraging protocol fees at a staking module level enables tailored mitigation measures that align with broader sustainability objectives. It allows the protocol to be prepared for unforeseen events without requiring immediate locking of funds.

We encourage further discussion and collaboration among community members and the Treasury Management Committee to brainstorm alternative solutions that strike the right balance between risk management and protocol simplicity.


Thank you Izzy! Some really good points.

You could argue the liquidity of the stETH token is part of its utility and appeal - i.e. users choose stETH not just because of the ease with which one can stake but also the liquidity of the receipt token too. If arbitrageurs ‘benefit’, arguably so does the stability of the market rate of stETH to ETH.

But quite a good point you make is that although the playing field is ostensibly ‘fair’ between shrimps and whales, this would be predominantly a facility only really usable by whales. Non-fair-weather use would similarly likely get zapped by whales before it could benefit other users.

There’s a reasonable counter in that a higher threshold to the depositor bot would help stabilize the market rate for all participants in non-fair-weather conditions, but it’s a fair point.

I think this is the strongest argument against either of these whole-protocol proposals. A reasonable counterpoint is that you could make the above proposals ‘in-protocol’ too; it just depends on how you choose to evaluate it. When new node operator modules are added by token holders, for example, they may have specific risk considerations built ‘in-protocol’.

It’s true that catastrophic network-wide slashing events may well overwhelm the surplus’s ability to mitigate slashing. However, the fact that the validator sets participating in Lido are demonstrably quite diversified and decentralized suggests there may be non-correlated slashing events that could be appropriately covered. This is, of course, just an opinion, and such a reserve may never be satisfactorily sized.

In general, though, I completely agree, and in our view the “simple and pure protocol” framing should, more often than not, weigh on the considerations token holders decide to include in the protocol, as it more accurately describes the protocol’s current function and is a more appealing end-state goal.

In that light, the above analysis and considerations are all in a strictly narrow-protocol view, and are not intended to interact at all with the Treasury Management Committee.

By its foundational principles, this committee is a temporary (its ultimate objective is to automate itself and disband) community-driven initiative to bootstrap minimalist programmatic and autonomous governance policies over surplus that might come after, in some sense, the protocol.


Can you explain why the expected loss for 36% of validators being offline 2-days is more than with 48.3% offline for 3-days? Are you presuming that all Prysm validators are offline in that scenario, but only Lido validators are offline in the infrastructure case?

Hello, yes: in the first scenario (36% being offline), we assume that all validators using Prysm are offline (both Lido and non-Lido). 36% of Lido validators use Prysm (about 11% of the whole network), but since Prysm’s share of the whole network is 46%, such a bug would cause an inactivity leak, and total losses would be very large.
In the second scenario (48.3% being offline), we take into account only Lido validators (i.e. around 16% of total validators); such a scenario wouldn’t cause an inactivity leak, but would still bring a lot of penalties to Lido.
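The difference between the two scenarios comes down to Ethereum’s finality threshold: checkpoints finalize only when at least 2/3 of active stake attests, so if more than 1/3 is offline, finality stalls and the inactivity leak eventually kicks in. A minimal sketch, treating validator counts as a proxy for stake (all at 32 ETH effective balance) and using the current-state counts and client shares from this thread:

```python
LIDO_VALIDATORS = 240_513
NETWORK_VALIDATORS = 743_232


def network_share(share_of_lido_pct: float) -> float:
    """Convert a share of Lido validators into a share of all network validators."""
    return share_of_lido_pct / 100 * LIDO_VALIDATORS / NETWORK_VALIDATORS


def finality_lost(offline_share: float) -> bool:
    """Finality needs >= 2/3 of stake online, so it stalls once more than 1/3 is offline."""
    return offline_share > 1 / 3


if __name__ == "__main__":
    prysm_offline = 0.46                 # all Prysm validators, Lido and non-Lido
    infra_offline = network_share(48.3)  # only Lido validators on public cloud
    print(f"Lido Prysm validators: {network_share(36):.1%} of network")
    print(f"Prysm bug: {prysm_offline:.1%} offline, finality lost: {finality_lost(prysm_offline)}")
    print(f"Infra: {infra_offline:.1%} offline, finality lost: {finality_lost(infra_offline)}")
```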


My understanding is that the Lido DAO receives 5% of the rewards from Lido and there is currently a surplus that has built over years. Your proposal is to take a large majority of that existing surplus to increase the provision for slashing.

Going forward, you seem to suggest that the provision should always be 0.315% of the ETH in the protocol. To achieve that the DAO would need to continue to send more funds to the slashing provision as the amount of ETH in the protocol increases.

What happens when the ETH in Lido grows? The DAO gets 0.2% per year for each ETH in the protocol (presume stETH gets 4% rewards per year, DAO gets 5% of that 4%). It also earns the same 4% on its stETH.

Except for 3 months, the DAO has operated with negative income since inception. Provisioning 0.315% of the ETH in the protocol is equivalent to adding a percentage cost to the DAO’s revenue every month.

In August the DAO had 1,509 ETH in revenue it seems. The problem is that from July to August, Lido added ~600,000 ETH. Provisioning 0.315% of that would mean adding 1,890 ETH to the slashing provision in August, which is more than the revenue of the DAO for that month.

How can the DAO provision more funds for slashing than it receives in income? I don’t see how a fixed 0.315% provision can be achieved when Lido is growing. Ignoring the investment income to the DAO, which appears small compared to the 5% fee, the DAO gets 0.2% from each ETH added to Lido, how can you allocate 0.315% per ETH in that case?
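The arithmetic in this question can be written out as a quick check. The 4% staking APR and 5% DAO fee share are the post’s own simplifying assumptions and, as in the post, the DAO’s yield on its own stETH is ignored.

```python
def dao_fee_yield_pct(staking_apr_pct: float, dao_fee_share_pct: float) -> float:
    """Annual DAO fee income per ETH in the protocol, in %."""
    return staking_apr_pct * dao_fee_share_pct / 100


def years_to_fund_provision(provision_pct: float, staking_apr_pct: float,
                            dao_fee_share_pct: float) -> float:
    """Years of fee income on newly added ETH needed to fund that ETH's provision."""
    return provision_pct / dao_fee_yield_pct(staking_apr_pct, dao_fee_share_pct)


if __name__ == "__main__":
    added_eth = 600_000  # ETH added from July to August
    provision = added_eth * 0.315 / 100
    monthly_fees = added_eth * dao_fee_yield_pct(4.0, 5.0) / 100 / 12
    print(f"provision needed: {provision:,.0f} ETH")
    print(f"monthly fees from the new ETH: {monthly_fees:,.0f} ETH")
    print(f"years to fund: {years_to_fund_provision(0.315, 4.0, 5.0):.3f}")
```

Under these assumptions, each newly added ETH takes roughly 1.6 years of its own fees to cover its 0.315% provision.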


It’s a very good point, indeed there can be times when protocol ETH grows or contracts faster than its ability to reach a threshold level of surplus. The overarching principle here was to think of threshold levels as setpoints that the protocol converges towards over time.

Incidentally, the DAO may well have reduced its surplus by approving grants, for example, but the protocol has always operated at a positive rewards rate, other than when it was sending protocol treasury rewards to the slashing provision address.
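The setpoint idea can be illustrated with a toy simulation: each period, a share of income is allocated to the provision until the gap to the target is closed. The 50% allocation share is a hypothetical parameter; the starting provision (5,581 stETH), target (25,608 stETH) and monthly income (1,509 ETH, the August figure mentioned above) come from this thread, and the target is held fixed for simplicity.

```python
def step_toward_setpoint(provision: float, target: float, income: float,
                         allocation_share: float) -> float:
    """Move the provision toward its target using a share of this period's income, without overshooting."""
    gap = target - provision
    if gap <= 0:
        return provision
    return provision + min(gap, income * allocation_share)


def periods_to_target(provision: float, target: float, income: float,
                      allocation_share: float, max_periods: int = 1_000) -> int:
    """Count periods until the provision reaches the target (capped at max_periods)."""
    if provision >= target:
        return 0
    for period in range(1, max_periods + 1):
        provision = step_toward_setpoint(provision, target, income, allocation_share)
        if provision >= target:
            return period
    return max_periods


if __name__ == "__main__":
    print(periods_to_target(5_581, 25_608, 1_509, 0.5), "months")
```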


Hey there!
Firstly, thank you and all contributors for a brilliant analysis and structure.

Regarding the provision for slashing and other risks, I would suggest exploring the possibility of expanding the suggested approach a little more.
Formulating the provision as a function of total deposits is fantastically straightforward, but may go too far in reducing complexity.
Reframing it as a function of the Lido validator set and network state could make the actual connection between provisions and the risks they mitigate more transparent.
E.g. provisions sufficient to mitigate the following events:

  • Slashing of all validators of biggest Node Operator
  • Critical bug in the most used consensus client within the network, with no slashings involved and a 2-day period for switching
  • All cloud providers denying their service to Lido NOs, leading to a 3-day switching period

That would still come to the same 0.32% numerically (based on the current state), but would be:

  • Directly sensitive to initiatives within the validator set (e.g. increasing diversification)
  • More transparent to communicate within the DAO, e.g. separate events could be added, excluded or transformed (like covering only 50% of one of the effects, or expanding slashing coverage to 2, 3, … NOs).

Thank you, amazing discussion!
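The event-based framing suggested above can be sketched as a list of risk events, each with a modeled loss and a coverage factor chosen by the DAO. Using the current-state losses from the tables in the original post, full coverage sums to 26,013 ETH, about 0.315% of the 8,250,118 ETH deposited, i.e. the same figure as the percentage-based rule. The RiskEvent structure and the coverage parameter are hypothetical.

```python
from dataclasses import dataclass

LIDO_DEPOSITED_ETH = 8_250_118


@dataclass
class RiskEvent:
    name: str
    loss_eth: float        # modeled loss if the event happens (current state)
    coverage: float = 1.0  # fraction of that loss the DAO chooses to provision for


def provision_eth(events) -> float:
    """Provision = sum of covered losses across the selected events."""
    return sum(e.loss_eth * e.coverage for e in events)


EVENTS = [
    RiskEvent("biggest NO, 100% slashed", 8_112),
    RiskEvent("most used CL client bug, 2 days", 17_219),
    RiskEvent("cloud providers deny service, 3 days", 682),
]

if __name__ == "__main__":
    full = provision_eth(EVENTS)
    print(f"full coverage: {full:,.0f} ETH ({full / LIDO_DEPOSITED_ETH:.3%})")
    # Example adjustment: cover only 50% of the client-bug loss
    partial = [RiskEvent(e.name, e.loss_eth, 0.5 if "client" in e.name else 1.0)
               for e in EVENTS]
    print(f"50% client-bug coverage: {provision_eth(partial):,.0f} ETH")
```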
