DAO Review Request - Galaxy Infrastructure Update

Galaxy would like to move our Lido validators to our internal SOC 2 compliant staking platform during the CMv2 migration. This new environment uses Lighthouse as the validator client and Web3signer as the signing infrastructure. Our validators can be split between the APAC, EU, and US regions or placed in a single region, depending on the recommendation from the foundation.

We are posting this to get DAO approval to move forward with our planned migration during the CMv2 upgrade.

Validator Summary

  • Lighthouse as our validator client

  • Web3signer

  • MEV-Boost

RPC Endpoint Summary

We have redundant beacon endpoints available to our cluster. Consensus and execution client pairs are hosted on bare metal servers in the three regions listed above, as follows:

  • Teku/Besu

  • Lighthouse/Geth

  • Lighthouse/Nethermind

We are currently reviewing our use of the above clients and are looking to diversify away from some of the more widely used ones.

Managing Team

Infrastructure management will move from our existing Validator team to our Platform team; both sit under the same internal team umbrella.

The operator management, onchain configuration, and communication/data submission with Lido will remain with the Validator team.

Future initiatives

  • We want to help along the Lighthouse Circuit Breaker implementation (PR) in any way we can. We are volunteering to be early testers when it becomes available on both Hoodi and Mainnet.

  • Our platform team will work to diversify our consensus and execution clients available.

Replacing the above post, here is a revised and more detailed review request.

DAO Review Request — Galaxy Infrastructure Update

TL;DR

Galaxy is using the CMv2 migration window to consolidate its Lido validators onto its internal SOC 2 Type II compliant staking platform. This moves our validator and signing stack from Vouch/Dirk to Lighthouse + Web3signer, reduces operational overhead and total cost of ownership, and frees up engineering time at Galaxy to contribute actively to reliability and tooling improvements. This is not a maintenance migration; it is a deliberate step toward becoming a node operator that leads in performance, reliability, and open ecosystem contribution.

Background & Our Commitment to This Community

Galaxy’s participation in the Lido curated module operator set grew significantly through its July 2024 acquisition of CryptoManufaktur (CMF), which brought CMF’s engineering team and their established Lido infrastructure into Galaxy’s organization. At the time of that acquisition, Galaxy publicly committed to discussing any material infrastructure change with the DAO first. In the spirit of that agreement, this post outlines a change we are planning to make.

We want to be direct about something we know will be a concern: the CMF operator set was originally onboarded under a specific infrastructure setup built around Vouch and Dirk. We are proposing a change to that setup. We believe this is the right decision for performance, reliability, and long-term operational sustainability and we want to explain exactly why.

Galaxy’s goal within Lido is not simply to be a reliable operator running in the background. We want to be one of the operators that defines what professional node operation looks like in this ecosystem by contributing to protocol reliability tooling, publishing findings the broader operator set can learn from, and actively helping Lido’s validator infrastructure become more resilient.

Why are we migrating?

When CMF adopted Vouch and Dirk, it made sense. We’re not migrating away from them because they’re bad; we’re moving toward a stack the Galaxy ETH Staking Team already runs, already has configuration for, and already knows how to operate. Less to maintain, same capability.

Our new configuration uses Lighthouse as the validator client and Web3signer as the remote signer, which provides the same security separation as Vouch/Dirk: key isolation, slashing protection, and remote signing. We are actively monitoring Lighthouse changes and improvements, including but not limited to the circuit breaker feature (PR #8445), which we look forward to testing and potentially investing time in. Web3signer also currently covers our SOC 2 Type II audit logging and key management requirements.

Infrastructure Summary

Platform

All infrastructure runs on Galaxy’s internal SOC 2 Type II compliant staking platform, hosted on bare metal servers across three geographic regions: APAC, EU, and US. The platform is operated by our dedicated ETH Staking team under full Blockchain Engineering SRE coverage, with alerting, incident response, and on-call rotations spanning our global footprint. The operator management, on-chain configuration, and any on-chain actions will remain with the Validator team.

Validator & Signing Layer

The validator cluster is fully isolated. Lighthouse runs as the validator client; Web3signer runs as the remote signer. The cluster connects to full nodes over segmented network layers using HTTPS and firewall access controls. No other workloads share this environment or the nodes. This isolation and segregation exists to meet our reliability and performance requirements.

Consensus / Execution Client Pairs

Currently, all of our full nodes run MEV-Boost with OFAC compliant relays. For client diversity, we use Besu/Teku as our primary node combination. This yields strong attestation performance and operational familiarity, and is a deliberate contribution to client diversity that does not compromise reliability. In addition to the Besu/Teku node, we run two backup nodes per cluster for redundancy and DR.

Twice a year, we evaluate the ETH Client Diversity data and track each client’s share of the network so that we can add or remove clients, aiming for less concentration and better protection against client bugs. The December 2025 Prysm/Fusaka incident, where a single client bug nearly cost Ethereum finality, is a good reminder of why this matters.
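As an illustration of the review described above, here is a minimal sketch of flagging over-concentrated clients from share data. The function name, the 50% threshold, and the counts are hypothetical placeholders for illustration only; they are not Galaxy's actual tooling or real network data.

```python
def concentrated_clients(node_counts: dict[str, int], max_share: float) -> dict[str, float]:
    """Return the clients whose share of the total exceeds max_share.

    node_counts maps a client name to how many nodes run it.
    max_share is a fraction between 0 and 1 (an assumed policy value).
    """
    total = sum(node_counts.values())
    if total == 0:
        return {}
    shares = {client: count / total for client, count in node_counts.items()}
    return {client: share for client, share in shares.items() if share > max_share}


# Made-up counts, used only to show the call shape.
example_counts = {"client_a": 6, "client_b": 3, "client_c": 1}
flagged = concentrated_clients(example_counts, max_share=0.5)
```

Any client returned by the function would be a candidate for replacement in the next review cycle.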

Pod Architecture & Blast Radius Management

Our validators are organized into unique pods, sized to limit blast radius. A failure in any single pod, whether from a software bug, hardware issue, or upgrade complication, is contained within that pod and does not affect the rest of our validator set. Resource sizing is deliberately conservative relative to maximum capacity.
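To make the blast-radius idea concrete, here is a small sketch that splits a validator set into fixed-size pods and reports the worst-case fraction affected by a single pod failure. The pod size and function names are assumptions for illustration; the post does not state Galaxy's actual pod sizing.

```python
def assign_pods(validator_indices: list[int], pod_size: int) -> list[list[int]]:
    """Split validators into consecutive pods of at most pod_size each."""
    if pod_size <= 0:
        raise ValueError("pod_size must be positive")
    return [
        validator_indices[start:start + pod_size]
        for start in range(0, len(validator_indices), pod_size)
    ]


def blast_radius(pods: list[list[int]]) -> float:
    """Fraction of all validators lost if the largest single pod fails."""
    total = sum(len(pod) for pod in pods)
    if total == 0:
        return 0.0
    return max(len(pod) for pod in pods) / total


# Hypothetical example: 10 validators in pods of 4.
pods = assign_pods(list(range(10)), pod_size=4)
worst_case = blast_radius(pods)
```

Smaller pods lower the worst-case fraction at the cost of more pods to operate, which is the trade-off behind conservative sizing.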

CMv2, Key Generation, and What Is Changing

CMv2 requires all Curated Module Node Operators to provision new 0x02 validator keys as part of the protocol-mandated migration to the new module structure. This is not a discretionary key rotation; it is participation in the same migration process every node operator in the curated set will be undertaking.

We are using this required key generation window to bring our new keys onto our internal platform from inception, rather than re-instantiating the Vouch/Dirk stack for a key set that will exist on the new architecture for years to come. CMv2 is the right boundary to make this transition.

We understand the DAO’s prior concern about key rotation, specifically the precedent involving an operator that cycled all of its keys outside of a structured migration context. To be explicit: this situation is categorically different. There is no voluntary exit and re-entry here. Our existing validators will run through the standard CMv2 consolidation process, and new 0x02 keys will be generated as part of that same protocol-standard flow.

Circuit Breaker Roadmap

Galaxy is monitoring this area closely and wants to go beyond “we’ll test it when it’s ready.” We believe the circuit breaker is one of the most important reliability primitives available to the curated operator set, and we want to actively contribute to making it production-ready.

Currently, two approaches are in active development:

Lighthouse PR #8445 (Multi-node attestation consensus)

A change that adds circuit breaker logic natively to the Lighthouse validator client. It works by querying all connected beacon nodes at the start of each epoch and only enabling attestation if source and target checkpoints match across a configurable threshold of nodes. We will be evaluating it more closely and analyzing the code to see where we can collaborate with the developers.
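The threshold check described above can be sketched roughly as follows. This is not the code from PR #8445; the type and function names are hypothetical, and it assumes each beacon node's answer has already been fetched, with None standing in for a node that did not respond.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class Checkpoints(NamedTuple):
    """Source and target checkpoint roots as reported by one beacon node."""
    source_root: str
    target_root: str


def attestation_enabled(votes: Sequence[Optional[Checkpoints]], threshold: int) -> bool:
    """Enable attesting only if at least `threshold` nodes agree.

    Agreement means identical source AND target checkpoints.
    Unresponsive nodes (None) never count toward the threshold.
    """
    answered = [vote for vote in votes if vote is not None]
    if not answered:
        return False
    _, agreeing = Counter(answered).most_common(1)[0]
    return agreeing >= threshold


# Hypothetical epoch: two nodes agree, one reports a different target.
votes = [
    Checkpoints("0xaa", "0xbb"),
    Checkpoints("0xaa", "0xbb"),
    Checkpoints("0xaa", "0xcc"),
]
```

With these votes, a threshold of 2 would allow attesting while a threshold of 3 would pause it, which is the kind of threshold tuning we expect to explore on testnet.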

Vero

Vero is a production-ready multi-node validator client that cross-checks attestation data across multiple consensus and execution client implementations before submitting. Vero v1.0 shipped in April 2025 and is in active production use. It provides a cross-client circuit breaker available today, with slashing detection that halts all validator duties immediately if any managed validator is slashed. We have been closely monitoring the development of this client and will be performing some tests in our lower environments to review its functionality, reliability and performance.

These two approaches are complementary, not competing: Vero provides a cross-client safety layer available now; the Lighthouse PR provides native circuit breaking within Lighthouse’s own multi-BN setup once it matures. We want to deliver excellence as a Node Operator for our customers and clients, so we approach every investigation and analysis with rigor. Our teams need to be able to build monitoring, alerts, and configurations, and to track releases actively to confirm they are production-ready.

Future Contributions & Our Commitment to This Ecosystem

Galaxy’s aim within Lido is not merely to be a quiet, reliable operator. We want to be an operator that is known for performance, reliability tooling, and open ecosystem contribution. That means:

  • Being early on safety tooling. Not waiting for the circuit breaker to be production-ready and then adopting it. Getting into the PR, running it on testnet, publishing what we find, and helping accelerate its maturity.

  • Contributing back publicly. Any findings from our Hoodi testing, threshold tuning, or Vero evaluation will be shared openly on the research forum. Other operators should be able to benefit from our testing, not just Galaxy.

  • Treating client diversity as a first-order commitment. We chose our primary beacon combination based on network-level diversity goals, not just our own operational convenience. Galaxy running Besu/Teku primary is a small but real contribution to keeping Ethereum healthy.

Thanks Tony and team for the revised post - the additional detail is appreciated, especially around the pod architecture, circuit breaker roadmap, and the planned Vero evaluation.

A few points I want to flag for the community’s consideration:

Context on the change:

When the CryptoManufaktur acquisition was brought to the DAO, the original post stated that infrastructure would remain as-is and the same team would continue to maintain it. This proposal represents a change - both in the validator client stack (Vouch/Dirk to Lighthouse/Web3signer) and in the team operating the infrastructure, given the departure of most of the CryptoManufaktur team members. Galaxy has been very transparent about this, both in discussions with Lido DAO NOM contributors and via this update.

Client diversity:

While Lighthouse currently represents roughly half of the network’s consensus layer usage, it’s worth noting that the migration here is limited to the validator client layer - Galaxy will continue running multiple EL/CL pairs with Besu/Teku primary, and Lighthouse/Geth and Lighthouse/Nethermind as backup. The risk profile from a VC-only Lighthouse deployment is meaningfully different from running Lighthouse as both VC and BN.

That said, this is still a move toward the dominant client on one layer, and it’s encouraging to see Galaxy committing to both testing the Lighthouse circuit breaker PR and evaluating Vero as complementary mitigations.

Overall:

It is helpful to also understand that the proposed migration would take place during the CMv1 to CMv2 migration via the use of consolidations to the new infrastructure setup. This is generally a very secure way to conduct such a migration, and in general if a team were to make a significant infrastructure change, this is a very opportune time to do so.

The commitments around circuit breaker testing, Vero evaluation, and client diversity monitoring are meaningful. I think publishing the Hoodi test results and circuit breaker findings on the forum as outlined would be very helpful for the entire operator set, and would continue to showcase the technical leadership that we have seen historically from Galaxy/CryptoManufaktur as a member of the Lido Curated Set.

I’d welcome others to weigh in on the proposal, and will be sharing this with the Curated Module Committee for any other feedback.
