I found the performance tweaking part fairly enjoyable (as long as you're prepared to be patient) - the docs were fairly clear on the hardware requirements and also clearly state that disk I/O is important. I did of course make a few early mistakes, such as lowballing the `node_size` despite having the RAM for the large/huge configurations, and playing with RocksDB parameters netted nothing of value.
I think the one thing that sticks out to me is this: anyone with experience of running sensitive systems would recognize that the cluster configuration, where the validator uses a stock node in the cluster as its proxy (and with `peer_private = 1`), is the most desirable one. However, there's no advice for this scenario suggesting that the validating node in the cluster should also connect to peers outside the cluster via entries in the `[ips_fixed]` section.
I raise this point because, after taking metrics and monitoring, I spent a few weeks scratching my head trying to understand why the validator was falling behind consensus. I pinged a few seasoned operators for advice and came to understand that the validator needs more than just one or two peers from the cluster to propose a ledger in good time. I think it would be good to indicate that validators may also need additional peers if the cluster is smallish.
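For anyone hitting the same wall, the change on the validating node looked roughly like this (a sketch with illustrative addresses and ports, not my actual config; the section names are the standard rippled.cfg ones):

```ini
; rippled.cfg on the validating node (illustrative values)

[peer_private]
1

[ips_fixed]
; the cluster's stock/proxy node
10.0.0.2 51235
; a few peers *outside* the cluster so the validator still
; hears proposals in good time when the cluster is small
r.ripple.com 51235
```

With `peer_private = 1` the validator's IP isn't gossiped to the network, so the extra `[ips_fixed]` entries give it more ears without advertising it.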
One other thing I noticed - the popular peers/hubs are fairly busy with connections from lots of stock nodes. Because of that I struggled with lots of insane/unknown peers, and also peers running old, badly behaved versions, connecting to my stock/proxy node. These were sometimes taking 40% or more of the peer connections. Rabbit pointed me at his ban-hammer script for insane nodes that are out of consensus - I took some inspiration from that and wrote a daemon that temporarily firewall-bans "unstable" peers and also punts their connection.
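The daemon's classification step is roughly this shape (a simplified sketch, not the actual script - it assumes the shape of the `peers` admin command's response, and the firewall/disconnect side is left out entirely):

```python
# Sketch of the peer-health filter behind the ban daemon.
# Assumes each entry from rippled's `peers` admin command has
# "address", "version", and sometimes a "sanity" field.

# Current and previous patch release (example values, not real ones).
ALLOWED_VERSIONS = {"rippled-1.5.0", "rippled-1.4.0"}

def is_unstable(peer):
    """Return True if a peer should be temporarily banned."""
    # Peers the server marks insane or of unknown sanity are out of consensus.
    if peer.get("sanity") in ("insane", "unknown"):
        return True
    # Peers older than one release behind current get punted too.
    if peer.get("version") not in ALLOWED_VERSIONS:
        return True
    return False

def unstable_peers(peer_list):
    """Filter a `peers` response down to the addresses worth banning."""
    return [p["address"] for p in peer_list if is_unstable(p)]

# Example with a mocked `peers` response:
peers = [
    {"address": "203.0.113.5:51235", "version": "rippled-1.5.0"},
    {"address": "203.0.113.9:51235", "version": "rippled-1.2.4"},  # too old
    {"address": "203.0.113.7:51235", "version": "rippled-1.5.0",
     "sanity": "insane"},
]
print(unstable_peers(peers))  # → ['203.0.113.9:51235', '203.0.113.7:51235']
```

The real thing polls on a timer, adds a temporary firewall rule for each flagged address, and drops the live connection.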
This has helped my proxy/stock node maintain a very healthy list of peers that are in consensus and at most one patch version behind current.
Validator keys setup was fine - it's simpler than clustering nodes. It took me less than 5 minutes to regenerate new keys, generate a new token, and restart the validator.
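For anyone following along: the flow was `validator-keys create_keys` for the master key pair, then `validator-keys create_token` for the token, then paste the token into the config and restart. The resulting stanza is just (token value elided; placeholder shown):

```ini
; rippled.cfg — paste the output of `validator-keys create_token` here
[validator_token]
eyJ2YWxpZGF0aW9uX3NlY3JldF9rZXkiOi...
```

Keep the key file offline; only the token needs to live on the validator.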
I'll happily look at the capacity planning and setup instructions again and see where I can add some more detail or help. Is the new XRPL.org site published from this repo?: https://github.com/ripple/ripple-dev-portal
> P.S. if you're looking for your next project, may I suggest setting up an xrp-ledger.toml file?
I'll be all over that tomorrow.
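In case anyone else picks this up too - my understanding is the file is served at `/.well-known/xrp-ledger.toml` on the operator's domain, and a minimal version looks something like this (field names from my reading of the spec; all values are placeholders):

```toml
[[VALIDATORS]]
# placeholder validator public key, not a real one
public_key = "nHB0000000000000000000000000000000000000000000000000"

[METADATA]
modified = 2020-01-01T00:00:00.000Z
```

Serving it over HTTPS from the same domain the validator claims lets others verify the domain/validator link.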