I wanted to update you on plans for Launchnodes to migrate back to its bare metal infrastructure in Africa, following the outage and issues experienced last year.
We have carefully rebuilt our infrastructure with an improved architecture designed to further reduce points of failure, and provide additional robustness and security.
Over the past 2 months, in addition to the feedback and support we have had from Lido operators and the broader community, we have had support from 3rd parties to examine our setup prior to go-live, and highlight areas for improvement. To date, the following have been flagged:
- 10 ‘High Severity’ risk issues
- 11 ‘Medium Severity’ risk issues
- 11 ‘Low Severity’ risk issues
- 2 ‘Informational’ issues
As examples:
One ‘High Severity’ issue flagged was traffic from our internal Grafana servers not being SSL encrypted.
A ‘Medium Severity’ issue was not using SSO across all of our infrastructure.
A ‘Low Severity’ issue was not logging all traffic and connections, which would be useful for diagnostics and forensics.
(These and the majority of other recommendations have been implemented).
Resolutions have been implemented for all ‘High’ severity issues. Resolutions for 8 of the ‘Medium’ severity issues have been implemented, with the remaining 3 acknowledged by Launchnodes for further investigation and future planned updates. Resolutions have been applied for 10 of the 11 ‘Low’ Severity issues, with 1 issue acknowledged.
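As a quick cross-check of the figures above, the counts can be tallied in a few lines. This is only an illustrative sketch: the dictionary layout and the `summarize` helper are made up for this post, and the 'Informational' items are left out because no resolution status is given for them.

```python
# Findings as reported in this post: severity -> (flagged, resolved).
# 'Informational' items are omitted, since their status is not stated.
findings = {
    "High": (10, 10),
    "Medium": (11, 8),
    "Low": (11, 10),
}


def summarize(findings):
    """Return (flagged, resolved, acknowledged) totals across severities.

    Assumes every issue that is not yet resolved has been acknowledged,
    which matches the breakdown given in the post.
    """
    flagged = sum(f for f, _ in findings.values())
    resolved = sum(r for _, r in findings.values())
    return flagged, resolved, flagged - resolved


flagged, resolved, acknowledged = summarize(findings)
print(f"{resolved} of {flagged} resolved, {acknowledged} acknowledged")
```

On the numbers above this gives 28 of 32 severity-rated issues resolved, with 4 acknowledged.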
Issues have typically been ‘acknowledged’ where they are non-critical and the recommendation is understood, but further investigation or planning/testing is required before it can be fully implemented. For example, we are cautious about automating some processes and server re-starts, and about setting hard limits for some parameters, until we have monitored production systems for at least some weeks. Where a resolution might have an impact on node performance or uptime, we have been particularly careful.
We are conscious that our environments and software will continue to change and evolve, and we regard this as a journey of continuous improvement. We are planning further reviews and optimizations with our 3rd party specialists in the coming days and weeks. We have also been successfully running multiple nodes on Holesky in our updated environments.
The team is confident in moving back to our bare metal data centres, and is aiming to start this process from 20th March 2024. We plan to move across a small number of nodes and monitor these for 7 days, before migrating our remaining nodes to the new environment. The old environment and web3signer will be destroyed before nodes are migrated and re-started.
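For anyone who wants to reason about the ordering, the plan above can be sketched roughly as follows. This is a hypothetical model and not our actual tooling: `MigrationPlan`, its fields and its methods are invented for illustration. It only captures the two rules described here: the old signer is destroyed before any node is started in the new environment (so the same keys are never active in two places), and the remaining nodes move only after the first batch has been monitored for the full 7 days.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationPlan:
    """Hypothetical model of the phased migration (illustration only)."""

    first_batch: list
    remaining: list
    monitor_days: int = 7
    old_signer_destroyed: bool = False
    started: list = field(default_factory=list)

    def destroy_old_signer(self):
        # Stands in for tearing down the old environment and web3signer.
        self.old_signer_destroyed = True

    def start(self, nodes):
        # Guard: refuse to start nodes while the old signer still exists.
        if not self.old_signer_destroyed:
            raise RuntimeError("old signer must be destroyed first")
        self.started.extend(nodes)

    def run(self, healthy_days):
        """Run the plan once, given how many days the first batch was healthy."""
        self.destroy_old_signer()
        self.start(self.first_batch)
        if healthy_days < self.monitor_days:
            return "monitoring"
        self.start(self.remaining)
        return "complete"
```

For example, `MigrationPlan(["node-1"], ["node-2", "node-3"]).run(healthy_days=3)` stops at `"monitoring"` with only the first batch started, while `healthy_days=7` proceeds to `"complete"`.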
This timing has been chosen to provide a good period of stability after Dencun and a decent period of monitoring unaffected by major events or public holidays.
Our monitoring and alerting will continue to operate 24x7, with our global team on standby throughout.
Please reach out if you have any feedback or would like more information. We are happy to discuss in more detail.
Many thanks as always for your patience and support. We’ll keep you updated on progress as the migration takes place and on an ongoing basis.