Hello Lido community,
We are a Lido NO (CryptoManufaktur) and run a Vouch/Dirk validator client connected to three diverse consensus:execution layer client pairs (Lighthouse:Erigon, Teku:Besu, Lodestar:Nethermind). In our testing, we noticed that sync committee performance was sub-optimal (around 90%) when the third consensus client was Nimbus.
A fellow NO has also seen this, specifically when using Vouch as the validator client. It does not happen when using the Nimbus VC.
We replaced Nimbus with Teku, and performance recovered. Then we replaced Teku with Lodestar, and performance stayed good.
We’ve been talking to the Nimbus team. They are willing to help find the root cause of this interop issue. We cannot do that on Goerli, because Goerli is missing too many blocks for meaningful investigation - it only has about 80% participation to begin with.
We are proposing to find the issue on mainnet, so that the client teams involved - Nimbus and Attestant/Vouch - can work together to fix it for everyone down the road. To that end, we’d take one of our environments with 1,000 keys where sync committee duties are at the start of their cycle, swap Lodestar for Nimbus (leaving Lighthouse and Teku in place), and run with debug logs. We would then observe performance. If it is degraded like it was before, we would gather the logs, swap Nimbus back out for Lodestar, and work with the client teams. We would repeat this as necessary as fixes are proposed or more testing is required.
This will deliberately degrade sync performance on that environment. We usually see between one and two validators at any given time in a sync committee per environment, so that’s the blast radius: Reduced performance on two validators for one sync committee cycle, or a little less, depending on how quickly we see an issue, per testing run.
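For anyone who wants to sanity-check the blast radius, here is a rough back-of-the-envelope sketch. The sync committee size and period length are the mainnet spec constants; the total active validator count passed in the example is only an illustrative assumption, not a measured figure, and the calculation ignores effective-balance weighting in committee selection.

```python
# Rough estimate of how many of our validators sit in a sync committee.
# Mainnet consensus spec constants:
SYNC_COMMITTEE_SIZE = 512
EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def expected_sync_members(our_keys: int, total_active_validators: int) -> float:
    """Expected number of our keys in a sync committee, assuming uniform selection."""
    return SYNC_COMMITTEE_SIZE * our_keys / total_active_validators


def sync_period_hours() -> float:
    """Length of one sync committee period in hours."""
    seconds = EPOCHS_PER_SYNC_COMMITTEE_PERIOD * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
    return seconds / 3600


if __name__ == "__main__":
    # 400_000 active validators is a hypothetical value for illustration only.
    members = expected_sync_members(our_keys=1000, total_active_validators=400_000)
    print(f"expected validators in sync committee: {members:.2f}")
    print(f"sync committee period: {sync_period_hours():.1f} hours")
```

Under that assumed network size, an environment with 1,000 keys lands between one and two sync committee members on average, and each test run would affect them for at most one period of a bit over a day.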
Any objections?