Open Validator Service Broker: Empowering Client Diversity

Open Validator Service Broker

This draft spec is based on (1) the Kubernetes Service Catalog and Operators, and (2) the Open Service Broker API.

Some basic knowledge of Kubernetes and Infrastructure as a Service is assumed.

Motivation

We are proposing that Lido Finance adopt a formal specification, to be implemented by its Node Operators (those maintaining and operating the underlying infrastructure that Lido uses for its stakers), for a Service Broker to help facilitate Client Diversity and Node Operator interoperability across their infrastructure. Adopting such requirements would enable Lido Finance users (i.e. stakers) to request that their stake be used to allocate/provision a particular client, a choice that Lido Finance could inform through incentives encouraging or discouraging certain options.

A service broker provides an interface between a service provider (e.g. GCP, Azure, or AWS) and an application platform (e.g. Kubernetes or Cloud Foundry). The service broker is managed by platform operators (Node Operators in Lido’s case).

These platform/node operators are responsible for configuring the broker to meet the needs of their network, their platform (w.r.t. staking requirements, e.g. uptime, availability, slashing mitigation, etc.), and their developers (including users). Developers could then use the broker to provision and bind new services to their applications.

An example of this would be a team wanting to provide RPC access to its managed nodes for its own users (this could end up being an ‘Infura’-like service, where a team uses its staked ETH and can access the equivalent nodes that its stake would provision).

A service broker is therefore responsible for federating access between an application provider and a developer while respecting the wishes of the platform and its operators. Each of these parties influences the broker, its services, and its structure.

Application: Client Diversity

Client diversity simply means ensuring that no single client implementation is relied upon so heavily that a bug in it could cause a cascading or catastrophic failure. Running a mix of clients is especially important post-Merge to reduce the probability of a single bug halting the chain, as the threshold for such a scenario is lower than it was pre-Merge.

With a Node Operator having this capability, Lido Finance can incentivize stakers to request that their ‘representative validator’ run a specific client chosen from the service catalog.

In line with the Open Service Broker API, the broker’s core responsibilities are:

  • Fetching the catalog of backing services that a service broker offers
    The catalog describes all of the services that can be provisioned through a service broker; each service is made up of a specific client implementation and version, execution-layer specifics, Engine API version, etc. A service can bundle multiple clients or just one, and it can even be middleware (e.g. MEV-Boost). A minimal sketch of such a catalog response is shown after this list.

  • Provisioning new service instances
    A service instance is a provisioned instance of a service and plan as described in the service broker’s catalog.

  • Connecting and disconnecting applications and containers from those service instances

  • Deprovisioning service instances
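
As a rough illustration only (not part of the draft spec), below is a minimal sketch in Go of what a GET /v2/catalog response along these lines could look like. The service and plan names (e.g. lighthouse-geth, mainnet-standard) and the metadata keys are hypothetical placeholders, not anything an operator would be required to use:

    // Hedged sketch: an OSB-API-style catalog endpoint whose services describe
    // validator client combinations. All names and metadata are illustrative.
    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    type Plan struct {
        ID          string            `json:"id"`
        Name        string            `json:"name"`
        Description string            `json:"description"`
        Metadata    map[string]string `json:"metadata,omitempty"`
    }

    type Service struct {
        ID          string `json:"id"`
        Name        string `json:"name"`
        Description string `json:"description"`
        Bindable    bool   `json:"bindable"`
        Plans       []Plan `json:"plans"`
    }

    type Catalog struct {
        Services []Service `json:"services"`
    }

    // catalogHandler serves the GET /v2/catalog response.
    func catalogHandler(w http.ResponseWriter, r *http.Request) {
        catalog := Catalog{Services: []Service{{
            ID:          "svc-lighthouse-geth", // hypothetical identifiers
            Name:        "lighthouse-geth",
            Description: "Lighthouse consensus client + Geth execution client stack",
            Bindable:    true,
            Plans: []Plan{{
                ID:          "plan-mainnet-standard",
                Name:        "mainnet-standard",
                Description: "Single validator set, Engine API, optional MEV-Boost sidecar",
                Metadata: map[string]string{
                    "consensus_client": "lighthouse",
                    "execution_client": "geth",
                },
            }},
        }}}
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(catalog)
    }

    func main() {
        http.HandleFunc("/v2/catalog", catalogHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

The same shape extends naturally to services that bundle multiple clients or expose middleware entries such as MEV-Boost.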

This API specification is based on what we use to model our infrastructure internally at Manifold Finance. You can see the draft specification on GitHub: manifoldfinance/open-validator-operator (Open Validator Operators: specification and design documents).

Feedback

Feedback is always welcome. This was written based only on the publicly available information I could find w.r.t. Lido’s Node Operator requirements.

Disclosure: I am a co-founder at Manifold Finance, and we have also submitted an application to become a Node Operator for Lido Finance on Ethereum Mainnet. I hold positions in Lido Finance, Manifold Finance, Sushiswap, Yearn Finance, Rocket Pool, and Rari Capital.

6 Likes

This is an interesting proposal and possible initiative, but I’m not sure adopting a formal spec for all NOs to implement (at least one as tailored as this) is desirable. However, a formal spec that is available for NOs to use (but not required) could be useful.

A couple of initial thoughts from my end.

We are proposing that Lido Finance adopt a formal specification, to be implemented by its Node Operators (those maintaining and operating the underlying infrastructure that Lido uses for its stakers), for a Service Broker to help facilitate Client Diversity and Node Operator interoperability across their infrastructure. Adopting such requirements would enable Lido Finance users (i.e. stakers) to request that their stake be used to allocate/provision a particular client, a choice that Lido Finance could inform through incentives encouraging or discouraging certain options.

We should bear in mind that formal specifications can have unintended/negative effects, such as creating common/singular points of failure and increasing correlated risk. The nature of this spec/proposal means it works better (or is at least much easier to execute on) for certain types of infra setups (i.e. cloud-native ones managed via Kubernetes) than for others (e.g. local/on-prem ones), and will likely lead to a centralization of infra setups and of infra providers (an over-reliance on cloud service providers). When we’re talking about centralization risk and diversity, infra setup variability is almost as important as things like client diversity (in the future it may even be more important). In the long term there will likely be a set of large providers provisioning such services across multiple chains, but the difficulty and real value lie in creating a support system for a fat and long tail of smaller operators who can contribute; this type of proposed solution pushes them to become rough copies of each other running similar setups on cloud providers, which doesn’t really do much for decentralization.

These platform/node operators are responsible for configuring the broker to meet the needs of their network, their platform (w.r.t. staking requirements, e.g. uptime, availability, slashing mitigation, etc.), and their developers (including users). Developers could then use the broker to provision and bind new services to their applications.

Interesting, but the question becomes: how can users (or a protocol like Lido) verify that the way a broker has set up its platform actually matches how it says it has been set up? In a permissioned/curated system this is somewhat possible (at least punishing operators may be), but in a permissionless system much less so.

An example of this would be a team wanting to provide RPC access to its managed nodes for its own users (this could end up being an ‘Infura’-like service, where a team uses its staked ETH and can access the equivalent nodes that its stake would provision).

For stETH to be fungible, it should explicitly not be linked to a specific validator or set of validators. This may make sense in protocols where users mint staking tokens that are directly attributable to a specific operator or set of validators, but it is not something Lido wishes to do. I guess you can pool “all Lido validators” together, but on the other hand you don’t want to do things like make the EL & CL nodes that are used for validating also serve as public RPCs, as that creates a DDoS attack vector. That said, operators could provision separate nodes to act as RPCs accessible by Lido stakers. The question then is how to ensure that these RPCs are managed according to Lido requirements (privacy, data retention, not abused for MEV, etc.), but it’s a very interesting idea!

With a Node Operator having this capability, Lido Finance can incentivize stakers to request that their ‘representative validator’ run a specific client chosen from the service catalog.

As discussed above, this is a bit counter-intuitive from a fungibility perspective. My opinion is that Lido could (and should) eventually enforce this at the Lido-level, but not at the staker → validator level. I think the best mechanism to do this is stake (re-)allocation and/or fee variability according to a set of parameters that optimizes for risk mitigation and network health – something that will be possible once withdrawals and some form of triggerable exits are possible. Stakers and LDO holders should most likely have a say in this, but it should be managed at the protocol-level (e.g. via a specific implementation of an Operator Set Strategy).


I think this is a really strong idea but there is a tension here with the kind of specific infrastructure it lends itself to. Taking a step back, if there’s a way to implement something similar but in a slightly more abstract way, and then manage the “brokering” via on-chain mechanisms, I think there could be something very useful for the long term.

Broadly, a well designed “service broker” like this should:

  • be implementable regardless of specific infrastructure configuration
  • allow allocation of stake to be done on-chain against the on-chain identity that each operator has within Lido (or via “pools” of operators who may constitute one identity and can then have a similar brokerage system between them)
  • allow the protocol to somehow verify that what the Operator is purporting through the Broker is what it’s actually delivering (not sure there’s a way to really do this permissionlessly right now, but it would be a very interesting problem to solve)
3 Likes

This is a great point; the specification allows Terraform as well as Kubernetes. Here is an example using Terraform, and here is a pure Go example for enabling Service Brokers for Redis.

The specification doesn’t formalize specific requirements for provisioning a service (e.g. you must use this flavor of Linux distro, with these packages installed, etc.). It’s focused primarily on the following (a minimal sketch of the corresponding endpoints is shown after this list):

  • Provisioning new service instances
    A service instance is a provisioned instance of a service and plan as described in the service broker’s catalog.
  • Connecting and disconnecting applications and containers from those service instances
  • Deprovisioning service instances
    This action simply deletes all the resources created upon the initial provisioning of the service instance.
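
For concreteness, here is a similarly hedged Go sketch of the provision and deprovision operations, following the Open Service Broker API convention of PUT and DELETE on /v2/service_instances/{instance_id}. The in-memory map is a stand-in for whatever actually creates and tears down the EL/CL nodes (Kubernetes, Terraform, bare-metal tooling), and the request shape is illustrative rather than normative:

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
        "strings"
        "sync"
    )

    // ProvisionRequest mirrors the shape of an OSB API provision body; the
    // parameters field is where a client preference could be passed through.
    type ProvisionRequest struct {
        ServiceID  string                 `json:"service_id"`
        PlanID     string                 `json:"plan_id"`
        Parameters map[string]interface{} `json:"parameters,omitempty"`
    }

    var (
        mu        sync.Mutex
        instances = map[string]ProvisionRequest{} // in-memory stand-in for real infrastructure state
    )

    // serviceInstanceHandler handles PUT (provision) and DELETE (deprovision)
    // on /v2/service_instances/{instance_id}.
    func serviceInstanceHandler(w http.ResponseWriter, r *http.Request) {
        id := strings.TrimPrefix(r.URL.Path, "/v2/service_instances/")
        switch r.Method {
        case http.MethodPut: // provision a new service instance
            var req ProvisionRequest
            if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            mu.Lock()
            instances[id] = req // a real broker would create the EL/CL nodes and validators here
            mu.Unlock()
            w.WriteHeader(http.StatusCreated)
        case http.MethodDelete: // deprovision: remove everything created at provisioning time
            mu.Lock()
            delete(instances, id)
            mu.Unlock()
            w.WriteHeader(http.StatusOK)
        default:
            w.WriteHeader(http.StatusMethodNotAllowed)
        }
    }

    func main() {
        http.HandleFunc("/v2/service_instances/", serviceInstanceHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }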

This question is more of an implementation detail. We assume that they are good-faith operators; otherwise they would face repercussions.

stETH is already linked de jure to all Lido validators. Node operators are, in effect, implicitly subsidizing its peg.

The RPC methods would not be accessible on nodes that are actually doing attestation/signing/etc.; a sidecar or failover node would be a possible candidate for such activity. Additionally, an internal message queue could be leveraged to facilitate access in a robust manner.

At Manifold Finance we use only bare metal for production/staging environments. This would also be compatible with other cloud service provider offerings. In fact, here is a matrix assessment of current cloud provider offerings in which we can tell you exactly which SKUs would be ordered to provide a reliable Eth2 instance.

At worst, we see this as a self-reporting mechanism to help coordinate inter-node-operator capacity and availability. Being able to report which specific versions of a client you are operating can be crucially important in circumstances related to block propagation and construction. Having this information so that we can enrich network topology construction will be important for establishing protections against correlated time-level attacks, reducing fault correlations, clock-sync issues, etc.
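
To make the self-reporting idea concrete, one possible shape for such a report is sketched below in Go; every field name is an assumption on my part and would need to be agreed between operators:

    package report

    // ClientVersionReport is one possible (hypothetical) shape for a broker's
    // self-report; the field names are assumptions, not part of any spec.
    type ClientVersionReport struct {
        Operator         string `json:"operator"`          // registry identity of the node operator
        ConsensusClient  string `json:"consensus_client"`  // e.g. "lighthouse"
        ConsensusVersion string `json:"consensus_version"` // e.g. "v3.1.0"
        ExecutionClient  string `json:"execution_client"`  // e.g. "geth"
        ExecutionVersion string `json:"execution_version"` // e.g. "v1.10.23"
        EngineAPIVersion string `json:"engine_api_version"`
        ValidatorCount   int    `json:"validator_count"`   // validators currently allocated
        SpareCapacity    int    `json:"spare_capacity"`    // validators that could still be provisioned
    }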

Lots of great questions; I will clarify some of the points of ambiguity and lay out some potential metrics and KPIs that this should be driven by and benchmarked against!

Much appreciated,

Sam

1 Like

Thanks for the reply!

This is a great point; the specification allows Terraform as well as Kubernetes. Here is an example using Terraform, and here is a pure Go example for enabling Service Brokers for Redis.

Ah I see. This is quite interesting!

stETH is already linked de jure to all Lido validators. Node operators are, in effect, implicitly subsidizing its peg.

I mean that “my stETH” should never be linked specifically to one or a (sub)set of (Lido) validators; the concept of “representative validators” for an stETH holder’s specific ETH doesn’t exist, since the entire set is their representative.

At Manifold Finance we use only bare metal for production/staging environments. This would also be compatible with other cloud service provider offerings. In fact, here is a matrix assessment of current cloud provider offerings in which we can tell you exactly which SKUs would be ordered to provide a reliable Eth2 instance.

This matrix assessment is great, thank you for sharing!

At worst, we see this as a self-reporting mechanism to help coordinate inter-node-operator capacity and availability. Being able to report which specific versions of a client you are operating can be crucially important in circumstances related to block propagation and construction. Having this information so that we can enrich network topology construction will be important for establishing protections against correlated time-level attacks, reducing fault correlations, clock-sync issues, etc.

Agree with this.

1 Like

Ahh, yes, 100% agree. An attestation proof can be used to decouple such possible linking.

I am also wondering what the community thinks about cooperating with other pools (Rocket Pool, for example) and how this could be applicable across chains (Solana, for example). Having said that, I think we could suddenly find ourselves creating a generalized messaging layer rather than an API for ‘internal’ use.

1 Like