Ethereum Consensus Layer Client Evaluation

Purpose

The purpose of this project is to perform an external, objective and unbiased evaluation of the different Ethereum Consensus Layer Clients in order to create targets for client decentralization. The project will create a number of outcomes in the form of technical reports to provide transparent and reliable data to the ecosystem in order to educate node operators and drive decisions towards a more robust, secure and decentralized network.

Summary

We will evaluate the hardware resource consumption of all Ethereum Consensus Layer Clients, along with other metrics such as network bandwidth consumption, number of peers, syncing speed and storage requirements. We will then produce a detailed technical report describing our research methodology, presenting our findings and drawing conclusions. The technical report will be publicly available on the Internet for everyone to read. Moreover, we will write a shorter, easy-to-digest post summarizing our main findings. We also plan to extend the technical report into a scientific paper and submit it to an international conference on blockchain technology. As is customary, the scientific paper will explicitly acknowledge the funds received and the supporting organizations. We will also use social media to disseminate the results of our research.

Details

In our previous work we demonstrated how consensus bugs can translate into catastrophic failures, because a lack of finality leads to extreme resource consumption. While finality bugs are not frequent, and the Beacon Mainnet has been running stably for over a year, studies of this kind show why it is important to characterize the resource consumption of the different clients and to be able to spot outlier behavior. This also helps the ecosystem understand the trade-offs between different clients.

Experiments

A number of performance counters can be analyzed in this study to provide valuable information to the ecosystem. In addition to the previously studied performance counters (i.e., CPU usage, memory usage, disk usage, network bandwidth, synchronization time and number of peers) of the five main consensus clients (i.e., Prysm, Lighthouse, Teku, Nimbus and Lodestar), we will also perform a number of new experiments:

  • We will study the performance of newly available clients such as Grandine
  • We will also study the performance of validator clients such as Vouch
  • We will study new beacon node and validator client combinations
  • We will compare multiple client modes (e.g., fast mode vs archival mode)
  • We will analyze the interoperability of different client pairings (e.g., common metrics)
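
As a rough illustration of how two of the counters above (number of peers and synchronization state) could be collected in the same way for every client, the sketch below parses responses shaped like those of the standard Beacon API endpoints `/eth/v1/node/peer_count` and `/eth/v1/node/syncing`. The `NodeSample` record and the `parse_sample` function are hypothetical names introduced for this example, and the payloads are made-up values, not measurements.

```python
import json
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One observation of a beacon node (hypothetical record layout)."""
    client: str
    connected_peers: int
    is_syncing: bool
    sync_distance: int


def parse_sample(client: str, peer_count_json: str, syncing_json: str) -> NodeSample:
    """Build a NodeSample from the two raw JSON response bodies.

    The Beacon API encodes integers as strings, so they are converted here.
    """
    peers = json.loads(peer_count_json)["data"]
    sync = json.loads(syncing_json)["data"]
    return NodeSample(
        client=client,
        connected_peers=int(peers["connected"]),
        is_syncing=bool(sync["is_syncing"]),
        sync_distance=int(sync["sync_distance"]),
    )


# Made-up payloads for illustration only.
PEERS = '{"data": {"disconnected": "3", "connecting": "1", "connected": "50", "disconnecting": "0"}}'
SYNCING = '{"data": {"head_slot": "100", "sync_distance": "2", "is_syncing": true}}'

sample = parse_sample("example-client", PEERS, SYNCING)
print(sample)
```

Relying on the common API rather than client-specific metric names is one possible way to keep the comparison uniform across clients; the actual study may collect these values differently.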

Methodology

It is important to define a clear and fair methodology for the study, as we do not want the results to be biased in either a positive or a negative way. To guarantee fairness in our evaluation we will follow these steps:

  1. We will contact each client team and discuss the ideal conditions (in terms of computer hardware requirements) in which their client should be executed.
  2. We will discuss the different configurations in which each client can be run (e.g., archival mode, fast mode) and test and report multiple configurations.
  3. When comparing clients we will make sure that they were tested under similar conditions, such as the same number of CPU cores, the same amount of DRAM, etc.
  4. When reporting results, all details regarding hardware configuration, software environment, time and duration of the experiments will be reported.
  5. We will verify by multiple means that the numbers obtained are correct, for instance measuring CPU consumption internally and with an external script.
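
Step 5 can be sketched as a simple cross-check between two independent readings of the same quantity. This is a minimal sketch under stated assumptions, not the tooling used in the study: the function names are hypothetical, the 5% tolerance is an arbitrary default, and the numbers in the usage lines are invented.

```python
def cpu_percent(cpu_start: float, cpu_end: float,
                wall_start: float, wall_end: float) -> float:
    """Average CPU usage over an interval, as a percentage of one core.

    cpu_* are cumulative CPU seconds of the process; wall_* are timestamps.
    """
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("wall-clock interval must be positive")
    return 100.0 * (cpu_end - cpu_start) / elapsed


def relative_gap(internal: float, external: float) -> float:
    """Relative disagreement between two readings of the same quantity."""
    reference = max(abs(internal), abs(external))
    return 0.0 if reference == 0 else abs(internal - external) / reference


def readings_agree(internal: float, external: float,
                   tolerance: float = 0.05) -> bool:
    """True when the readings differ by at most `tolerance` (assumed 5%)."""
    return relative_gap(internal, external) <= tolerance


# Invented numbers: the external script saw 6 CPU-seconds over 12 s,
# while the client's own metrics (hypothetically) reported 51%.
external = cpu_percent(10.0, 16.0, 0.0, 12.0)
internal = 51.0
print(external, readings_agree(internal, external))
```

A reading that fails the check would be flagged for manual inspection rather than silently averaged, so that a faulty probe does not bias the comparison.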

Thanks for proposing this, I think it’s a really good idea!

One caveat is that performance/storage numbers might be a “rounding error” for node operators because, all else equal, a higher attestation and block inclusion rate is what principally affects rewards. We’ve already worked on getting such numbers as part of a previous grant.

That said, given the anti-correlation penalties, and the broader worries about client diversity, I think framing this as “what are all the clients node operators should be comfortable using in production” can be very valuable, because then it can help switch towards minority clients.

On that note, one thing I’d suggest is to evaluate how easy it is to switch from/to various clients. That might be a valuable nudge to node operators.

Yes, absolutely. The idea is to move one step closer to client diversity.

Evaluating the switch from one client to another is a great idea. Thanks for the suggestion!

I like Tim’s idea about client switching research as well. Is it possible to incorporate that into the proposed scope, and if so, would it require any substantial change to the proposed budget?

Major points of consideration for node operators when choosing a client, aside from performance, are:

  • code quality, quality of maintenance, quality of incident response, quality of release management
  • key management capabilities (e.g. possibility and stability of separating private key handling logic from business logic; hardware key support; threshold key support)
  • richness and sanity of metrics
  • stability of the chosen toolchain
  • number and quality of dependencies
  • simplicity of incorporating into CI/CD pipelines

Combined, these matter much more than a minuscule difference in performance. They are all difficult to quantify, and inherently somewhat subjective, but an attempt to quantify them will probably be immensely valuable.

Is there any way to integrate Ethereum consensus into LDO token economics?

Thanks for the input @vsh.

I agree with you that many of those are hard to evaluate quantitatively, as most of those points are inherently subjective.

Also, for things like code quality and maintenance, CI/CD pipelines and stability of the toolchain, it would take substantial resources to have someone check each client’s code and CI pipelines. I think this would go somewhat beyond the scope of the project.

For things like richness and sanity of metrics, as well as key management capabilities, we can provide some qualitative analysis and feedback to client teams.

It’s not a blocker for getting the grant approved; it already is approved. Just my two cents on what I think would actually be valuable to move the needle on client diversity.
