
SkeinGraham

Member
  • Content Count

    9
  • Joined

  • Last visited

About SkeinGraham

  • Rank
    Newbie


  1. Thanks @mDuo13 - initial investigations: the data volume holding the rippled database on this node is backed by a pair of NVMe SSDs configured as software RAID1 (Linux MD). RAID status is good. Manual intervention - stopping the rippled daemon, removing the database, and restarting rippled - has been unsuccessful; the problem recurred quite quickly (within 30 minutes). The SSDs are Samsung TLC V-NAND with a Polaris-era controller, reasonable but definitely consumer-grade SKUs, with one of them reporting significantly higher lifetime wear-levelling usage via smartmontools. Because the problem recurred fairly quickly after manually removing the database, and given the higher usage stats of one of the RAID pair, I think it's reasonable to suspect an SSD hardware issue as the cause.
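For anyone wanting to do a similar check, here is a hedged sketch of comparing wear indicators across the two members of a RAID1 pair. The device names, attribute name, and sample values are assumptions for illustration (on real hardware you would run `smartctl -A /dev/sdX` from smartmontools); the script parses captured sample lines rather than querying live devices:

```shell
# Sample SMART attribute lines, as smartctl -A would print them
# (values here are invented for illustration, not from the post).
sample_a='177 Wear_Leveling_Count 0x0013 089 089 000 Pre-fail Always - 1203'
sample_b='177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 132'

# The raw value is the last field on the attribute line.
wear_a=$(echo "$sample_a" | awk '{print $NF}')
wear_b=$(echo "$sample_b" | awk '{print $NF}')

echo "drive A wear count: $wear_a"
echo "drive B wear count: $wear_b"

# Flag the drive whose raw wear count is disproportionately higher.
if [ "$wear_a" -gt $((wear_b * 2)) ]; then
  echo "drive A shows significantly higher wear: suspect it first"
fi
```

A large asymmetry between two drives that have seen identical writes (as RAID1 members have) is a reasonable hint that one of them is degrading.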
  2. Hi, On one of my stock nodes I'm seeing a fatal exception and daemon restart every 30-60 seconds or so - it looks like possible DB corruption, but I thought I'd post here to see if anyone has seen something similar before?
     2019-Sep-12 14:26:48.244803814 SHAMap:WRN Corrupt node received
     2019-Sep-12 14:26:49.957015801 Peer:WRN [067] onReadMessage from XXXX at x.x.x.x:40003: stream truncated
     2019-Sep-12 14:26:50.404850371 NetworkOPs:WRN We are not running on the consensus ledger
     2019-Sep-12 14:26:51.307654091 NodeObject:FTL Exception, lz4 decompress: LZ4_decompress_fast
     2019-Sep-12 14:26:58.493346589 Application:NFO process starting: rippled-1.3.1
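The stop/wipe/restart intervention mentioned in the post above can be sketched as below. The systemd unit name and database path are assumptions (check the [node_db] path in your own rippled.cfg), and DRY_RUN=1 only prints the commands so nothing destructive runs by accident:

```shell
# Hedged sketch of the manual recovery steps: stop rippled, remove the
# node database, restart. Set DRY_RUN=0 only once paths are verified.
DRY_RUN=1
DB_PATH="/var/lib/rippled/db"

run() {
  # In dry-run mode, print the command instead of executing it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run systemctl stop rippled
run rm -rf "$DB_PATH"
run systemctl start rippled
```

As the first post notes, if the corruption comes back within minutes of a wipe, the database contents were never the root cause and the storage underneath deserves suspicion.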
  3. SWIFT is a multi-stakeholder organisation - as we know, it's also much like the incumbent banks: slow, lumbering, conservative. I can't think why people even thought an acquisition, let alone by a comparative dinosaur, would be on the table.
  4. https://xrptools.alloy.ee/checktoml?domain=skein.systems
  5. The checker input field doesn't seem to like the tld in `skein.systems` https://xrpl.org/xrp-ledger-toml-checker.html
  6. Very fair and valid points - fortunately, operators in the current most-trusted UNL offered to accept a peer connection from the validator. I suppose if I didn't trust those node operators then by extension I would also not trust the UNL. There's a healthy scepticism in not trusting everything, that's a given - but there are operational costs as well. I could try to spin up more stock/proxy nodes in the cluster and have the validator only peer with them; that might be fine at some point in the future, but it's not ideal for me right now. I mentioned in the XRPL geek Slack that I didn't really want to be exclusionary with my firewall fiddling, and that's why each ban decision the daemon makes is temporary. You're right though, it's more of a concern when the max_peers value is either default or low - raising that helps. On the flip side, I wonder if gentle pressure on node operators to upgrade, or to endeavour to join consensus more often, would benefit the network as a whole?
  7. I found the performance-tweaking part fairly enjoyable (as long as you're prepared to be patient) - the docs were fairly clear on the hardware requirements and also clearly state that disk I/O is important. I did of course make a few early mistakes, such as lowballing the `node_size` despite having the RAM to use the large/huge configurations, and playing with RocksDB params netted nothing of value. The one thing that sticks out to me is this: anyone with experience of running sensitive systems would recognise that the cluster configuration where the validator uses a stock node in the cluster as its proxy (with peer_private = 1) is the most desirable configuration - however, there's no advice for this scenario suggesting that the validating node in the cluster should also connect to peers outside the cluster using entries in the [ips_fixed] section. I raise this point because, after taking metrics and monitoring, I was scratching my head for a few weeks trying to understand why the validator was falling behind consensus. I pinged a few seasoned operators for advice and came to understand that the validator needs more than just one or two peers from the cluster to propose a ledger in good time. I think it would be good to indicate that validators may also need additional peers if the cluster is smallish. One other thing I noticed - the popular peers/hubs are fairly busy with connections from lots of stock nodes. Because of that, I struggled with lots of insane/unknown peers, and also peers running old, badly behaved versions, connecting to my stock/proxy node. These were sometimes taking 40% or more of the peer connections. Rabbit pointed me at his ban-hammer script for insane nodes that are out of consensus - I took some inspiration from that and wrote a daemon that temporarily firewall-bans "unstable" peers and also punts their connection.
https://github.com/gnanderson/rbh This has helped my proxy/stock node maintain a very healthy list of peers which are in consensus and at most one patch version old. Validator keys setup was fine; it's simpler than clustering nodes - it took me less than 5 minutes to regenerate new keys, generate a new token and restart the validator. I'll happily look at the capacity planning and setup instructions again and see where I can add some more detail or help. Is the new XRPL.org site published from this repo?: https://github.com/ripple/ripple-dev-portal > P.S. if you're looking for your next project, may I suggest setting up an xrp-ledger.toml file? I'll be all over that tomorrow
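The clustered-validator setup described above might look like the following rippled.cfg fragment. This is a hypothetical sketch: the IP addresses and the peers_max value are placeholders, not taken from the post; the stanza names ([peer_private], [ips_fixed], [peers_max]) are standard rippled.cfg sections.

```ini
# Hypothetical fragment for the validating node in a small cluster.
[peer_private]
1

[ips_fixed]
# stock/proxy node(s) inside the cluster (placeholder address)
10.0.0.2 51235
# plus one or more trusted peers outside the cluster, so a small
# cluster doesn't leave the validator short of peers and falling
# behind consensus
peer.example.net 51235

[peers_max]
30
```

The point from the post is the [ips_fixed] comment: with peer_private = 1 and only one or two in-cluster peers, the validator may not hear about ledgers fast enough to propose in good time.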
  8. Hello zerpers, After some patient months of tweaking and understanding the behaviour and performance characteristics of rippled, and now being happy with its stability, I have switched my validator to a new public key and verified the operating domain. It's now listed as `skein.systems`: nHDpmRw3nYVWsbTrBaSScqHDQvNvnZJSAo7pxa3CQXbG571MVGHo I thought it would be good form to introduce myself and put a (hopefully) friendly face to the validator/domain. I'd also like to publicly thank Rabbit, Alloy and Nik Bougalis for the extremely helpful advice!
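The xrp-ledger.toml file mentioned in these posts, used for domain verification, might look like the minimal sketch below. Only the validator public key comes from the post above; the file location (`/.well-known/xrp-ledger.toml` on the operating domain) and the [[VALIDATORS]] table follow the xrpl.org specification.

```toml
# Minimal hypothetical xrp-ledger.toml, served over HTTPS from
# https://skein.systems/.well-known/xrp-ledger.toml

[[VALIDATORS]]
public_key = "nHDpmRw3nYVWsbTrBaSScqHDQvNvnZJSAo7pxa3CQXbG571MVGHo"
```

A checker such as the xrpl.org TOML checker linked in posts 4 and 5 can then fetch and validate the file against the domain.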