Replacing the above post, here is a revised and more detailed review request.
DAO Review Request — Galaxy Infrastructure Update
TL;DR
Galaxy is using the CMv2 migration window to consolidate its Lido validators onto its internal SOC 2 Type II compliant staking platform. This moves our validator and signing stack from Vouch/Dirk to Lighthouse + Web3signer, reduces operational overhead and total cost of ownership, and frees up Galaxy engineering time to contribute actively to reliability and tooling improvements. This is not a maintenance migration; it is a deliberate step toward becoming a node operator that leads in performance, reliability, and open ecosystem contribution.
Background & Our Commitment to This Community
Galaxy’s participation in the Lido curated module operator set grew significantly through its July 2024 acquisition of CryptoManufaktur (CMF), which brought CMF’s engineering team and their established Lido infrastructure into Galaxy’s organization. At the time of that acquisition, Galaxy publicly committed to discussing any material infrastructure change with the DAO first. In the spirit of that agreement, this post outlines a change we are planning to make.
We want to be direct about something we know will be a concern: the CMF operator set was originally onboarded under a specific infrastructure setup built around Vouch and Dirk. We are proposing a change to that setup. We believe this is the right decision for performance, reliability, and long-term operational sustainability, and we want to explain exactly why.
Galaxy’s goal within Lido is not simply to be a reliable operator running in the background. We want to be one of the operators that defines what professional node operation looks like in this ecosystem by contributing to protocol reliability tooling, publishing findings the broader operator set can learn from, and actively helping Lido’s validator infrastructure become more resilient.
Why are we migrating?
When CMF adopted Vouch and Dirk, it made sense. We’re not migrating away from them because they’re bad; we’re moving toward a stack the Galaxy ETH Staking Team already runs, already has configuration for, and already knows how to operate. Less to maintain, same capability.
Our new configuration uses Lighthouse as the validator client and Web3signer as the remote signer, which provides the same security separation as Vouch/Dirk: key isolation, slashing protection, and remote signing. We are actively monitoring Lighthouse changes and improvements, including but not limited to the circuit breaker feature (PR #8445), which we look forward to testing and potentially investing time in. Web3signer also currently covers our SOC 2 Type II audit logging and key management requirements.
Infrastructure Summary
Platform
All infrastructure runs on Galaxy’s internal SOC 2 Type II compliant staking platform, hosted on bare metal servers across three geographic regions: APAC, EU, and US. The platform is operated by our dedicated ETH Staking team under full Blockchain Engineering SRE coverage, with alerting, incident response, and on-call rotations spanning our global footprint. The operator management, on-chain configuration, and any on-chain actions will remain with the Validator team.
Validator & Signing Layer
The validator cluster is fully isolated. Lighthouse runs as the validator client; Web3signer runs as the remote signer. The cluster connects to full nodes over segmented network layers using HTTPS and firewall access controls. No other workloads share this environment or the nodes. This isolation and segregation is in place to meet our reliability and performance requirements.
Consensus / Execution Client Pairs
Currently, all of our full nodes run MEV-Boost with OFAC-compliant relays. For client diversity, we use Besu/Teku as our primary node combination. This yields strong attestation performance, operational familiarity, and a deliberate contribution to client diversity while maintaining reliability. In addition to the primary Besu/Teku node, we run two backup nodes per cluster for redundancy and disaster recovery.
Twice a year, we review the ETH Client Diversity data and track each client's network share so that we can add or remove clients to reduce concentration and limit our exposure to client bugs. The December 2025 Prysm/Fusaka incident, where a single client bug nearly cost Ethereum finality, is a good reminder of why this matters.
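To make the periodic diversity review concrete, here is a minimal sketch of the kind of concentration check it implies. The client names, share figures, and the one-third limit are all placeholders chosen for illustration; they are not Galaxy's actual data or policy.

```python
def concentrated_clients(shares: dict[str, float], limit: float) -> list[str]:
    """Return the clients whose share of the network exceeds `limit`.

    `shares` maps a client name to its observed share in any consistent
    unit (percent, validator count); values are normalised before the
    comparison. `limit` is a fraction between 0 and 1.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sorted(name for name, share in shares.items() if share / total > limit)


# Hypothetical snapshot, not real network data.
snapshot = {"client_a": 50.0, "client_b": 30.0, "client_c": 20.0}
flagged = concentrated_clients(snapshot, limit=1 / 3)
print(flagged)
```

A flagged client would be a candidate for reduced use in the next review cycle, while clients well below the limit would be candidates for adoption.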
Pod Architecture & Blast Radius Management
Our validators are organized into separate pods, sized to limit blast radius. A failure in any single pod, whether from a software bug, hardware issue, or upgrade complication, is contained within that pod and does not affect the rest of our validator set. Resource sizing is deliberately conservative relative to maximum capacity.
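As a rough illustration of how pod sizing bounds blast radius, the sketch below splits a validator set into fixed-size pods and reports the largest fraction of validators a single pod failure could affect. The function names, validator count, and pod size are hypothetical and do not reflect our production figures.

```python
def assign_pods(validator_indices: list[int], pod_size: int) -> list[list[int]]:
    """Split validators into consecutive pods of at most `pod_size` each."""
    if pod_size < 1:
        raise ValueError("pod_size must be at least 1")
    return [
        validator_indices[start:start + pod_size]
        for start in range(0, len(validator_indices), pod_size)
    ]


def blast_radius(pods: list[list[int]]) -> float:
    """Largest fraction of all validators that one pod failure can affect."""
    total = sum(len(pod) for pod in pods)
    if total == 0:
        return 0.0
    return max(len(pod) for pod in pods) / total


# Hypothetical numbers: 1000 validators in pods of 250.
pods = assign_pods(list(range(1000)), pod_size=250)
print(len(pods), blast_radius(pods))
```

Smaller pods lower the worst-case fraction at the cost of more pods to operate, which is the trade-off the sizing decision balances.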
CMv2, Key Generation, and What Is Changing
CMv2 requires all Curated Module Node Operators to provision new 0x02 validator keys as part of the protocol-mandated migration to the new module structure. This is not a discretionary key rotation — it is participation in the same migration process every node operator in the curated set will be undertaking.
We are using this required key generation window to bring our new keys onto our internal platform from inception, rather than re-instantiating the Vouch/Dirk stack for a key set that will exist on the new architecture for years to come. CMv2 is the right boundary to make this transition.
We understand the DAO’s prior concern about key rotation — specifically the precedent involving an operator that cycled all of its keys outside of a structured migration context. To be explicit: this situation is categorically different. There is no voluntary exit and re-entry here. Our existing validators will run through the standard CMv2 consolidation process, and new 0x02 keys will be generated as part of that same protocol-standard flow.
Circuit Breaker Roadmap
Galaxy is monitoring this area closely and wants to go beyond “we’ll test it when it’s ready.” We believe the circuit breaker is one of the most important reliability primitives available to the curated operator set, and we want to actively contribute to making it production-ready.
Currently, two approaches are in active development:
Lighthouse PR #8445 (Multi-node attestation consensus)
A change that adds circuit breaker logic natively to the Lighthouse validator client. It works by querying all connected beacon nodes at the start of each epoch and only enabling attestation if source and target checkpoints match across a configurable threshold of nodes. We will be evaluating it more closely and analyzing the code to see whether we can collaborate with the developers.
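The threshold check described above can be sketched as follows. This is our own simplified illustration of the idea, not code from the Lighthouse PR: the `Checkpoints` type, the `attestation_enabled` function, and the treatment of unresponsive nodes are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoints:
    """Source and target checkpoints as reported by one beacon node."""
    source_epoch: int
    source_root: str
    target_epoch: int
    target_root: str


def attestation_enabled(views: list[Optional[Checkpoints]], threshold: int) -> bool:
    """Return True if at least `threshold` nodes report identical checkpoints.

    A None entry stands for a node that did not respond; in this sketch
    it never counts toward agreement.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    responses = [view for view in views if view is not None]
    if not responses:
        return False
    # Size of the largest group of nodes reporting the same checkpoints.
    _, agreeing = Counter(responses).most_common(1)[0]
    return agreeing >= threshold


# Hypothetical views from three beacon nodes at the start of an epoch.
a = Checkpoints(100, "0xaa", 101, "0xbb")
b = Checkpoints(100, "0xaa", 101, "0xcc")
print(attestation_enabled([a, a, b], threshold=2))
print(attestation_enabled([a, b, None], threshold=2))
```

In the first call two of three nodes agree, so attesting stays enabled; in the second no two nodes agree, so the breaker would trip for that epoch. Tuning the threshold is the kind of question we expect testing to inform.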
Vero
Vero is a production-ready multi-node validator client that cross-checks attestation data across multiple consensus and execution client implementations before submitting. Vero v1.0 shipped in April 2025 and is in active production use. It provides a cross-client circuit breaker available today, with slashing detection that halts all validator duties immediately if any managed validator is slashed. We have been closely monitoring the development of this client and will be performing some tests in our lower environments to review its functionality, reliability and performance.
These two approaches are complementary, not competing: Vero provides a cross-client safety layer available now; the Lighthouse PR provides native circuit breaking within Lighthouse’s own multi-BN setup once it matures. We want to deliver excellence as a Node Operator for our customers and clients, so we approach every investigation and analysis with rigor. Our teams need to be able to build monitoring, alerts, and configurations, and to actively track releases to confirm they are production-ready.
Future Contributions & Our Commitment to This Ecosystem
Galaxy’s aim within Lido is not merely to be a quiet, reliable operator. We want to be an operator that is known for performance, reliability tooling, and open ecosystem contribution. That means:
- Being early on safety tooling. Not waiting for the circuit breaker to be production-ready and then adopting it. Getting into the PR, running it on testnet, publishing what we find, and helping accelerate its maturity.
- Contributing back publicly. Any findings from our Hoodi testing, threshold tuning, or Vero evaluation will be shared openly on the research forum. Other operators should be able to benefit from our testing, not just Galaxy.
- Treating client diversity as a first-order commitment. We chose our primary client combination based on network-level diversity goals, not just our own operational convenience. Galaxy running Besu/Teku as its primary pair is a small but real contribution to keeping Ethereum healthy.