Introducing NO Bonding and Increasing Stakeholder Incentive Alignment

TL;DR: The launch of Lido v2 will help diversify, decentralize, and scale Lido’s Node Operator Set.
This proposal aims to catalyze a discussion about increasing incentive alignment between the Lido DAO and new and existing Node Operators (NOs). I propose introducing NO bonding, which increases overall protocol safety as well as stakeholder incentive alignment. I also propose revisiting the current 50/50 fee split between the Lido DAO and NOs, with a view to shifting it in favor of the DAO.

Issues

  1. NOs don’t take enough protocol risk relative to their earnings (have no meaningful capital at stake)

    • Running a validator on behalf of the DAO is highly profitable. Current costs associated with operating an Ethereum validator cluster at scale are around $600k per year. The average Lido NO manages ~$360m worth of staked ETH and earns roughly $900k per year (assuming a 5% staking yield). Estimated profit margins are around 40%.
    • NOs who run validators subject to slashing events are currently allowed to continue operating on behalf of the DAO. Historically, the Lido DAO has covered all user losses. NOs also have no liquid capital at risk through the protocol.
    • As the recent RockLogic slashing incident demonstrated, resolving slashing events requires significant coordination by the DAO. I believe a more straightforward, economically aligned approach is warranted.
  2. NO/Lido DAO fee split

    • Fees are currently split evenly between NOs and Lido DAO (5% of staking yield each). This number was arbitrarily chosen and should be revisited and reevaluated using historical empirical data, especially since the DAO is losing ~$15m a year (h/t @hasu) and given that NO profit margins are currently >40% as outlined above.
  3. Lido’s Insurance Fund is too small to effectively support continued growth

    • Lido’s insurance fund currently holds 6.1k $stETH (0.09% of TVL). This number is arbitrary and potentially too low to accommodate Lido’s growing TVL. I suggest defining the insurance fund minimum as a function of the ratio of insurance fund value to TVL (perhaps 0.2% - 1%, as @monet suggested).
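
The figures quoted in the list above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the numbers from the post as given ($360m stake, 5% yield, 5% NO fee share, $600k annual cost, a 6.1k stETH fund at 0.09% of TVL), and the function names are hypothetical. With exactly those inputs the NO margin comes out at roughly one third, so the ~40% estimate presumably rests on somewhat different cost or revenue inputs.

```python
# Inputs as quoted in the post; rough estimates, not audited data.
STAKE_USD = 360_000_000      # average stake managed per NO
STAKING_YIELD = 0.05         # assumed annual staking yield
NO_FEE_SHARE = 0.05          # NO share of staking rewards
ANNUAL_COST_USD = 600_000    # estimated yearly cost of a validator cluster
FUND_STETH = 6_100           # current insurance fund size
FUND_SHARE_OF_TVL = 0.0009   # 0.09% of TVL


def no_annual_revenue(stake_usd, staking_yield, fee_share):
    """Yearly NO revenue in USD: stake * yield * fee share."""
    return stake_usd * staking_yield * fee_share


def profit_margin(revenue, cost):
    """Profit as a fraction of revenue."""
    return (revenue - cost) / revenue


def fund_target_steth(fund_steth, fund_share_of_tvl, target_ratio):
    """Fund size needed to reach target_ratio of the TVL implied by the current fund."""
    implied_tvl = fund_steth / fund_share_of_tvl
    return implied_tvl * target_ratio


revenue = no_annual_revenue(STAKE_USD, STAKING_YIELD, NO_FEE_SHARE)
print(f"NO revenue: ${revenue:,.0f}")
print(f"NO margin:  {profit_margin(revenue, ANNUAL_COST_USD):.1%}")
for ratio in (0.002, 0.01):
    target = fund_target_steth(FUND_STETH, FUND_SHARE_OF_TVL, ratio)
    print(f"Fund at {ratio:.1%} of TVL: {target:,.0f} stETH")
```

On these inputs, a 0.2% - 1% target corresponds to a fund of roughly 13.6k to 67.8k stETH, compared with 6.1k today.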

Proposed Solutions

  1. Introduce NO Bonding

    • On average, each NO is responsible for managing $360m worth of ETH. I propose introducing a bond of between 0.2% and 1% of committed stake. This bond will be posted by NOs and act as insurance in case they face a slashing event.
      • Bonding in this context is defined as NOs putting up collateral as a “Bond” which gets slashed if they don’t follow a set of rules defined by the Lido DAO.
    • Bonds can be posted in any of the following assets: $ETH, $LDO, $stETH, $ETH-stETH LP
    • Bonds can be posted progressively, but NOs have been operating profitably since the launch of the Lido DAO and should be willing to put up capital to continue to attract more stake.
  2. Revisit NO/Lido DAO fee split

    • The 50:50 split between NOs and DAO is arbitrary and not based on any empirical data.
    • Lido shouldn’t be overpaying NOs and more work should be done to come up with a more equitable split in fees. As the market leader, Lido has the power to price this split.
    • I propose contracting a research entity to assist the DAO with revising this parameter
  3. Introduce reputation scoring for NOs

    • A reputation score system should be implemented which analyzes NOs’ performance and assigns scores which can be used to manage allocation among NOs within each type of NO set.
    • Bonding, alongside parameters like historical performance, can alter NO reputation scores and user deposit allocations.
    • $LDO staking can possibly be used to boost an NO’s score and show skin in the game.
    • Newer NOs or DVTs (SSV, for example) won’t have any historical performance data. I propose having them stake $LDO to align protocol/DVT cluster incentives. This post states that limits will be set on new NO allocations, and thus those willing to stake more for reputation should get preference for incoming ETH over those who are not.
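
To give a sense of scale for the bond range proposed in item 1, here is a minimal sketch that converts the 0.2% - 1% range into dollar terms and compares it with the NO revenue estimate quoted earlier in the post. The inputs are the post's own rough figures and the function names are hypothetical.

```python
STAKE_USD = 360_000_000          # average stake per NO, as quoted in the post
NO_ANNUAL_REVENUE_USD = 900_000  # NO revenue estimate quoted in the post


def bond_usd(stake_usd, bond_ratio):
    """Bond size in USD for a bond set as a fraction of committed stake."""
    return stake_usd * bond_ratio


def years_of_revenue(bond, annual_revenue):
    """How many years of gross NO revenue the bond represents."""
    return bond / annual_revenue


for ratio in (0.002, 0.01):
    bond = bond_usd(STAKE_USD, ratio)
    years = years_of_revenue(bond, NO_ANNUAL_REVENUE_USD)
    print(f"{ratio:.1%} bond: ${bond:,.0f} ({years:.1f} years of revenue)")
```

Under these assumptions the low end of the range is a little under one year of gross revenue per NO and the high end is about four years.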

Next Steps

After collecting feedback from the community, I’d like to introduce concrete proposals for:

  1. Contracting a research firm to re-evaluate the Lido DAO/NO fee split
  2. Introducing NO Bonding (temp check)
  3. Reputation Scoring (temp check)

My quick 2 cents; I may do a longer one later. I agree with Hasu’s point (Proposal: Introducing $LDO Staking - #25 by Hasu) that Lido attracts a quality NO set partly thanks to the low friction. I tend to agree that some bonding would be positive, but I’m not sure it compensates for its negative externalities. Note that if a meaningful bond were required, I don’t think client teams could afford to stay in the set.

Posting a small bond in the range of fractions of a percent addresses the most recent incident, yes. But it may be woefully insufficient for future events. If the entire key set of an NO is slashed, the penalty ranges from a minimum of 3.125% (since Bellatrix) to 100% of the offending stake.
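
To put the proposed bond range next to that penalty range, the sketch below uses a simplified model of the consensus-layer slashing penalty, assuming the Bellatrix parameters: an initial penalty of 1/32 of effective balance plus a correlation penalty that scales with three times the share of total stake slashed in the same window. It ignores inactivity and missed-attestation penalties as well as the spec's integer rounding, and the function names are hypothetical.

```python
MIN_SLASHING_PENALTY_QUOTIENT = 32    # Bellatrix value: initial penalty is 1/32
PROPORTIONAL_SLASHING_MULTIPLIER = 3  # Bellatrix value for the correlation penalty


def slashing_penalty_fraction(slashed_share_of_network):
    """Approximate fraction of a slashed validator's stake that is lost.

    slashed_share_of_network is the share of total staked ETH slashed in the
    same correlation window (0.0 for an isolated incident).
    """
    initial = 1 / MIN_SLASHING_PENALTY_QUOTIENT
    correlation = min(PROPORTIONAL_SLASHING_MULTIPLIER * slashed_share_of_network, 1.0)
    return min(initial + correlation, 1.0)


def bond_coverage(bond_ratio, penalty_fraction):
    """Share of the penalty a bond covers if the NO's whole key set is slashed."""
    return min(bond_ratio / penalty_fraction, 1.0)


minimum_penalty = slashing_penalty_fraction(0.0)
for bond_ratio in (0.002, 0.01):
    covered = bond_coverage(bond_ratio, minimum_penalty)
    print(f"{bond_ratio:.1%} bond covers {covered:.1%} of the minimum penalty")
```

In this model even a 1% bond covers only about a third of the minimum penalty when an NO's entire key set is slashed, and less as the correlation penalty grows.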

If the concern is slashing, then this can be handled in ways other than bonding. Requiring a bond that is on the order of yearly revenue (0.2%) or 5x that (1%) will change the NO makeup by pushing smaller players out of the permissioned NO pool, centralizing stake further among fewer NOs. That is the opposite of what Lido aims to do with its NO selection.

Reputation scoring through on-chain performance sounds great, but it unfortunately also exerts a regionally centralizing force. Let me unpack that. The DAO have encouraged us to run in APAC, and do so cross-region, for reasons of regional diversity and resilience. We have CL:EL servers in Sydney and Singapore, and threshold key servers “all through” the APAC region.

This comes with a slightly lower effectiveness as measured by rated.network. If the DAO now signals that it values on-chain performance far higher than regional diversity, then everything moves into one data center in EMEA. All the components are one hop away from each other, and the relays are closer. Relay latency from APAC is something else :sweat_smile:

The fee split I have no detailed thoughts on, other than saying that right now I feel quite tied to the success of the protocol by the 50:50 split, and will happily do whatever it takes to act in the protocol’s best interest, such as region diversity, client diversity, and highly resilient operation with 0 slashing risk, as well as keeping keys to 1,000 per environment. We don’t run one highly resilient node cluster; we run one per 1,000 keys, for the benefit of the protocol.

I can see another centralizing force there, this time for “number of keys per environment”, if cost optimization is something NOs have to think harder about.

We have never relied on the insurance fund, and have no intention of ever needing to. We do this by “never moving keys” as a rule, and when we do need to move fragments in our 3/5 threshold signer - which happened yesterday to improve latencies in the first APAC environment we stood up for the DAO - we move them one fragment at a time, nuke the old server, wait 20 minutes, and move the next fragment. This keeps slashing risk to 0.

Obviously I cannot speak for all contributors or the entire NOM workstream, but I’m happy to share my personal thoughts on this specific proposal.

TL;DR: No; not right now, and not like this. Directionally this proposal has merits; I agree (in part) that the only way for the protocol to grow permissionlessly in a scalable manner is through bonding models, but IMO this proposal falls short on tractability and practicality given economic and network constraints, and it’s worded in a way that to me reads “flip the table and do this now” without realizing that a) most of this stuff is already chugging along, and b) it is of the “multiple years to design and implement correctly” flavor.

Anyone is invited to participate (this is a DAO after all) in the work. Join the work being done on creating a marketplace for validator sets via the staking router, figuring out economic models for viable permissionless staking, figuring out how to design staking allocation mechanisms that are fair and sybil resistant, and developing short-term and long-term NO scoring mechanisms.


Issues with the proposed approach as I see it:

  • it’s short-termist
  • its objective function is solving for the wrong thing
  • its objectives are not aligned with the long-term sustainability or health of the protocol
  • it ignores the importance of certain design and economic decisions made early on (which are still active) that are large components of why the protocol is so successful to begin with (and a large part of the reason the DAO is in a position to have this discussion now at all)

The tactics/ideas described in the proposal seem to be driven by the desire to maximize DAO profit, when instead protocols like Lido are all about minimizing risk (smaller margins are an acceptable tradeoff). Protocols like Lido should be more like common goods than for-profit businesses. The fact that they are on-chain and decentralized puts them at a disadvantage to centralized competitors, and they can never truly compete on marginal cost or revenue; they must win by (a) creating something that is useful, (b) creating something that is sustainable, (c) being as open and fair as possible, and (d) supporting the ecosystem to the largest extent possible. Short-term profit maximization creates negative externalities, and in the game of network economic security these negative externalities are grave and ultimately destructive.

NOs don’t take enough protocol risk relative to their earnings (have no meaningful capital at stake)

This is true in a narrow economic sense, but it’s really not true if you think about it from an overall risk or protocol health perspective. NOs basically put their entire livelihood at risk in case of a serious or egregious mishap. Brand, reputational, and economic damage (future revenues) in long-term repeated games are more important than capital at stake, for a multitude of reasons, the most important being:

  • In case of a mass-slashing, bonds won’t really do much unless they’re huge (e.g. 16 ETH), and bonds of that size aren’t viable economically. So, in the worst-case scenario they don’t address the risk (or its impact), and in the best-case scenario they just cause capital inefficiency and add friction.
  • There is no way to guarantee that bonds actually come from node operators themselves. When you force things like bonding requirements on operators that do not have the capital required to do so, you end up with a few scenarios:
    • NOs who get capital from elsewhere (e.g. capital allocators) - you end up with the same problem (the entity running the service isn’t the actual entity providing the bonding capital) and an additional layer of risk to the protocol where the capital allocator (who may be funding multiple different operations, behind the scenes) effectively has levers over both the staking protocol as well as the underlying economic security of the network (permissionless bonding with validators basically allows a malicious capital allocator to launch multiplicative-power attacks on the network).
    • NOs who will crowd-source capital from the market - you actually end up with what are essentially unbonded validators where NOs are giving up some % of their revenue share to retail. Why would you want this lever and economic mechanism to be extra-protocol vs intra-protocol?
    • Centralization of intra-protocol stake distribution (and potentially also inter-protocol) amongst the few NOs who can amass the large bonds required, thereby drastically increasing protocol risk. For example, look at Rocket Pool, which boasts 2500+ node operators (which in reality are probably sub-1700, since there are entities running multiple “nodes”), but the top 10 operators basically run ~35-40% of the stake.
  • The necessity of bonds limits the NO pool, and especially the possibility of new entrants, and NOs who cannot get the capital for bonds (who may otherwise be great at their job) are shut out and new NOs are not allowed to grow into being big NOs.
    • You increase barriers to entry for new Node Operators. We have had multiple cases of relatively smaller NOs joining the Lido operator set and not only performing well but also growing their operations and meaningfully contributing to the protocol. With bonding requirements at the outset, such smaller organizations would likely not be able to participate at all. A reputation and scoring system can help alleviate this, but this stuff is very difficult to get right while preventing sybil attackers from taking advantage of it.
    • There are operators such as client teams which participate in Lido (Sigma Prime, Prysmatic Labs, ChainSafe, etc.). Participating in the Lido protocol is essentially a revenue stream for them that they would otherwise not have, while they are focused on creating public goods (Ethereum clients), which is a net benefit and positive externality not only to the Lido node operator set but also to the network as a whole. Again, special cases can probably be made; I’m just trying to point out that the current model works for a lot of reasons which are not visible from the purely financial point of view that you’ve taken in the post.

This isn’t to say that bonds aren’t necessary for permissionless scaling. They are, but they aren’t necessary currently (we can see that the current model works, and works at scale, which no other protocol has achieved) and they have to be implemented carefully so as to:

  • not introduce additional risk vectors
  • not meaningfully increase the barriers to entry / friction of new (capable) NOs joining
  • not upset an actually really good (and steadily improving) validator and operator set makeup

What we want to solve for here is reducing the risk of something going very wrong with the unbonded validators (i.e. serious slashings). There are already multiple ways to do this without requiring bonding:

  • Moving the current curated operator set where validators are “solo-run” to a curated operator set where validators are “co-run” via DVT. Multiple DVT testnets have been done and more will be, and DVT-based staking router modules are in development by third-party contributors. Within 12-18 months, once DVT has (hopefully) been battle-tested on mainnet, it would make sense to start transitioning the set over to a model like this.
  • Having node operators use threshold-encrypted signing keys (e.g. via Dirk). A lot of NOs participating in Lido use Dirk, but it’s also not desirable for ALL of them to do so, since this would increase the risk of correlated failure etc.
  • Some form of TEE-enabled secure signers (e.g. as Puffer has done)

NOs who run validators subject to slashing events are currently allowed to continue operating on behalf of the DAO. Historically, the Lido DAO has covered all user losses. NOs also have no liquid capital at risk through the protocol.

There’s multiple ongoing[1] discussions[2] about how the DAO should treat instances like this. If for example you think operators should get kicked out if a slashing happens, that would be a great place to make an argument for it (personally I think in certain cases it is warranted, but let’s do that work in the right place vs via sweeping protocol changes).

Additionally, for losses not related to slashings, NOs have actually reimbursed stakers for various losses multiple different times (1 2 3 4, there are more). In a normal bonded and insurance model, these kinds of losses are actually not covered. Node Operators are highly incentivized to engage in this behavior exactly because of the incentive framework at play here. To my knowledge, this kind of compensation / reimbursement does not occur in any other staking protocol (either by process/mandate/insurance coverage, or indirectly via voluntary participation of the operators).

Lastly, NOs are rewarded in stETH (and whatever they don’t sell for opex they hold), so actually they do have capital at risk through the protocol.

Paying some random researcher a few hundred thousand dollars to provide us information of limited usefulness doesn’t sound like a good idea.

  • Crypto markets are immensely volatile and immature, and thus are not able to price things correctly, and definitely not over long periods of time (e.g. more than 1-3 months)
  • The staking market is actually extremely opaque and too heterogeneous a market to be able to do any really good research here. For example, it’s clear from historic stake allocation (in Ethereum) that fees aren’t actually the primary component driving user demand for staking products (e.g. Binance was (is?) offering 0% fees on ETH staking), but there’s no way to measure what is, for how long it will be, or which factors affect consumer demand for these things and how.
  • Lido is a protocol with a minimum horizon of something like 5-10 years. Making short-term decisions to pinch pennies is of little incremental value compared to the longer-term proposition.

What we do know is:

  • the current fee split largely works: it provides sufficient revenue for the DAO and sufficient revenue for NOs, some of whom take this revenue and put it back into doing good things for the ecosystem (e.g. client development, running validators in underserved regions where hosting is more expensive, running multi-client setups etc.), which you cannot really account for financially
  • different NOs have very different cost structures based on their business setups, infrastructure choices, etc. If we compress NO fees we will push NOs to compete on cost which means more homogeneous setups, which means less decentralization and diversification of the validator set, which means increased correlated risk and a less robust underlying Ethereum
  • higher revenue during good times means more buffer (with appropriate treasury management) during bad times (don’t forget NOs receive their rewards in stETH so market volatility affects them directly), which means that if you compress fees to a point where NOs are competing on cost during good times they’re going to be going bankrupt during bad times (without sufficient technical “escape hatches” yet in the network like EIP 7002 for those “break glass in case of emergency” scenarios)
  • the overall fee (10%) seems like it was a good read on where the market would be in ~2 years when the protocol started
  • it’s perhaps possible for NOs to provide the service of running thousands of validators well at somewhere below the 5% fee share, but it’s basically impossible to measure how much increased risk that would entail at each price point, especially given the volatility of the markets
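
The interaction between fee compression and market volatility described in the list above can be illustrated with a rough sensitivity sketch. It reuses the cost and stake figures from the original post and assumes, purely for illustration, that operating costs are fixed in USD while revenue scales with the ETH price because rewards are paid in stETH. The 0.6 price factor (a 40% drawdown) and the function names are hypothetical.

```python
STAKE_USD = 360_000_000    # average stake per NO at current prices (from the original post)
STAKING_YIELD = 0.05       # assumed annual staking yield
ANNUAL_COST_USD = 600_000  # assumed yearly operating cost, fixed in USD


def no_profit_usd(fee_share, price_factor=1.0):
    """Yearly NO profit; price_factor scales the USD value of stETH rewards."""
    revenue = STAKE_USD * price_factor * STAKING_YIELD * fee_share
    return revenue - ANNUAL_COST_USD


def break_even_fee_share(price_factor=1.0):
    """Fee share at which revenue exactly covers the fixed cost."""
    return ANNUAL_COST_USD / (STAKE_USD * price_factor * STAKING_YIELD)


for price_factor in (1.0, 0.6):
    print(f"price x{price_factor}: break-even fee share "
          f"{break_even_fee_share(price_factor):.2%}")
    for fee_share in (0.05, 0.04, 0.03):
        profit = no_profit_usd(fee_share, price_factor)
        print(f"  fee share {fee_share:.0%}: profit ${profit:,.0f}")
```

Under these assumptions the average NO breaks even at a fee share of about 3.3% at current prices, but already runs at a loss at the full 5% share after a 40% price drop. Real cost structures differ widely between NOs, so this shows the shape of the tradeoff and not an estimate for any specific operator.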

Staking router modules will afford us the ability to play with the fee split based on the objective function of each module. For example, if we want to bring solo stakers to Lido, it probably makes sense to look at things like the DAO share going down and the NO share going up in order to be competitive with other staking protocols, perhaps on some sort of sliding scale based on how well-bonded the submitted validators are. This will provide vital data for figuring out how a market for fee competition may develop and the secondary effects it might have on things like validator set makeup and quality.

This is an easy thing to say, and a very difficult thing to do right. There is a reason that the research on this is progressing slowly (although more or less at the pace expected when we began): it’s a problem with many facets and serious drawbacks if done incorrectly. As linked above, this is a core part of the research together with Nethermind, and there’s even a grant that @ccitizen is working on for a shorter-term pilot approach that could be trialed, e.g. via a staking module.

If you reward only certain things (e.g. performance or profitability) you will drive operator sets towards homogeneity or corner-cutting. Finding ways to reward all the different things that are important in a robust validator set, and doing it well, transparently, and in a sustainable manner (i.e. as autonomous as possible, as trustless as possible, with as little manual intervention as possible) is probably one of the hardest things in the proof of stake space. If you do this wrong, you will doom the protocol (and potentially the underlying network at the same time). It is probably the most important piece to get right.

The lynchpin here is being able to actually take stake away from operators who are either not performing well or not operating in a way that improves the qualities of the validator set. For something like that the protocol needs triggerable exits (EIP-7002) to be able to do so permissionlessly and at scale; bonds and fees don’t do anything here.

I think staking router modules are actually a really interesting sandbox to try out different types of models here (e.g. if operators self-insure validators vs providing bonds, group-insurance via something like stETH/ETH/LDO stake, revenue-share with on-chain cover providers in exchange for slashing insurance).

However, in general I think it’s also worth combatting the idea that insurance/cover should be provided by the protocol itself. stETH is a financial primitive; if users want their stETH insured, I think the optimal market solution for this would be to make it as easy (and cheap) as possible for users to do this, versus the protocol doing it for them.
