Validator pruning missed checkpoints due to problems with several RPC URLs + Heimdall eventually froze

Hi,

We had a scheduled Bor pruning on the validator on Sunday morning. We normally use polygon-rpc.com as bor_rpc_url, but it returned intermittent errors and we missed checkpoints (CPs).

curl https://polygon-rpc.com/ -X POST --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["finalized", false],"id":1}' -H "Content-Type: application/json"

{"jsonrpc":"2.0","id":1,"error":{"code":-39001,"message":"Unknown block"}}
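A candidate endpoint can be vetted with the same request before it goes into bor_rpc_url. The following is only a minimal sketch: the function names rpc_response_ok and probe_rpc are hypothetical, and the check is plain string matching rather than a real JSON parser. It treats any error member or a null result as a failure.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical and the matching is plain
# string matching, not a real JSON parser.

# Return 0 if a JSON-RPC response body carries a non-null result, 1 otherwise.
rpc_response_ok() {
  body=$1
  case $body in
    *'"error"'*) return 1 ;;                             # endpoint reported an error
    *'"result":null'* | *'"result": null'*) return 1 ;;  # block tag was not resolved
    *'"result"'*) return 0 ;;                            # some result came back
  esac
  return 1                                               # empty or unexpected body
}

# Send the "finalized" block query shown above to URL and classify the reply.
probe_rpc() {
  url=$1
  body=$(curl -s --max-time 10 "$url" -X POST \
    -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["finalized", false],"id":1}') || return 1
  rpc_response_ok "$body"
}
```

For example, `probe_rpc https://polygon-rpc.com/ && echo usable || echo "do not use"`. Since the errors were intermittent, a single passing probe proves little; running it several times over a few minutes is more telling.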

We then tried paid third-party APIs such as Infura, Ankr, etc., and were getting:

{"jsonrpc":"2.0","error":{"code":-32002,"message":"rejected due to project ID settings"}}

We eventually switched to a sentry’s Bor RPC URL for the rest of the pruning, which worked smoothly without missing any CPs.

The Polygon team is aware of this behavior with third-party RPCs and will investigate.

In addition, after several Heimdall restarts caused by changing bor_rpc_url a few times, Heimdall eventually got stuck and would not catch up.

curl localhost:26657/status

catching_up was always true and the status would not update, even after restarting Heimdall.
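To tell a node that is slowly syncing from one that is truly stuck, two /status samples taken some time apart can be compared. This is a rough sketch under assumptions: the helper names are hypothetical, the parsing uses tr and sed instead of a JSON parser, and it relies on the standard Tendermint sync_info fields catching_up and latest_block_height being present in the response.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; parsing uses tr/sed, not a JSON parser.

# Print the catching_up flag (true/false) from a /status body on stdin.
catching_up_flag() {
  { tr -d ' \t\n'; echo; } | sed -n 's/.*"catching_up":\([a-z]*\).*/\1/p'
}

# Print latest_block_height from a /status body on stdin
# (accepts the value with or without surrounding quotes).
latest_height() {
  { tr -d ' \t\n'; echo; } |
    sed -n 's/.*"latest_block_height":"\{0,1\}\([0-9][0-9]*\).*/\1/p'
}

# Return 0 if two /status samples show a stuck node:
# still catching up and no height progress between them.
is_stalled() {
  first=$1 second=$2
  [ "$(printf '%s' "$second" | catching_up_flag)" = "true" ] || return 1
  h1=$(printf '%s' "$first" | latest_height)
  h2=$(printf '%s' "$second" | latest_height)
  [ -n "$h1" ] && [ "$h1" = "$h2" ]
}

# Possible usage (network calls, so left commented out):
#   a=$(curl -s localhost:26657/status); sleep 60
#   b=$(curl -s localhost:26657/status)
#   is_stalled "$a" "$b" && echo "Heimdall looks stuck"
```

The 60-second gap is an arbitrary choice; anything comfortably longer than a few block times should do.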

We fixed this by restarting Heimdall on all our sentries at the same time.

TL;DR: during validator pruning, we recommend using an internal sentry’s Bor RPC URL as bor_rpc_url.

Thanks

2 Likes

At ShardLabs, we settled on a process substantially the same as yours. We do not use any third-party provider; we use our own sentry nodes.

  • Prepare configs for validator heimdall (changing bor endpoint to sentry), validator bor (disable mining), and sentry bor (enable mining).
  • Stop validator bor
  • Restart sentry bor (which now has validator heimdall connected and is mining)
  • Prune validator bor and sync validator bor (now with mining disabled)
  • Prepare configs as in step 1.
  • Restart sentry bor
  • Restart validator bor

I am writing this without referencing our internal checklists, but that is pretty much it. All these restart steps are done immediately after a checkpoint, so the probability of missing the next one is minimal. This has worked pretty well for us.

4 Likes

We now just use one of our sentries’ Bor RPC :)

Without altering the mining. Are you saying you temporarily make your sentry act as a validator? Are you including the keystore and other files required to convert a sentry into a validator, at least for Bor?

2 Likes

Yes. That sentry node is essentially a backup validator and we treat it as such security-wise. Its peering configuration is also set up as a validator’s, so it is connected only to trusted nodes. That way it can take over validator duties at a moment’s notice.

3 Likes

Got it - we plan to do the same

The steps involved in converting a sentry to a validator (at least since 2023):

a) copy priv_validator_key.json

b) in the Bor config, set mine to true

c) set the trusted peers

And I believe mine is the only flag that needs to change, assuming Heimdall points to this Bor URL.
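Toggling the flag by hand during a maintenance window is easy to get wrong, so a small helper may be worth having. The sketch below makes assumptions beyond what is stated above: set_mine is a hypothetical helper, and it expects a TOML Bor config in which the flag sits on its own line as `mine = true` or `mine = false`. It does not copy keys or touch peer settings.

```shell
#!/bin/sh
# Sketch only: set_mine is a hypothetical helper. It assumes a TOML Bor config
# in which the flag appears on its own line as `mine = true|false`.

# Rewrite the `mine` line of FILE in place to VALUE (true or false).
set_mine() {
  file=$1 value=$2
  case $value in
    true | false) ;;
    *) echo "value must be true or false" >&2; return 2 ;;
  esac
  tmp=$(mktemp) || return 1
  # Keep the original indentation and spacing, replace only the value.
  if sed "s/^\([[:space:]]*mine[[:space:]]*=[[:space:]]*\).*/\1$value/" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # write back without changing the file's owner or mode
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Usage would be `set_mine /path/to/config.toml true` on the sentry taking over and `set_mine /path/to/config.toml false` on the validator being pruned (the path is a placeholder). If the file has no `mine` line, the helper changes nothing and still succeeds, so a `grep mine` on the file afterwards is a sensible check before restarting Bor.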

4 Likes