r3lik

How to speed up XRP ledger sync?


Hello everyone,

I think this belongs in the Technical section, but I can't seem to post there as a new user. 

I'm operating a rippled server and pulling the entire ledger history, as I need access to all ledgers and transactions for a data app that we are building.

So far I've only been able to sync 38 GB in 16.5 hours. Assuming the full history is about ~9 TB (where can I find the exact size?), that's only 0.42%, and it will take about 6 months to sync at the current rate. This is unacceptable.
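For what it's worth, the projection above is just a back-of-envelope calculation, assuming the sync rate stays constant (in practice it won't):

```python
# Rough sync projection from the numbers above, assuming a constant rate.
synced_gb = 38.0      # synced so far
elapsed_h = 16.5      # hours elapsed
total_gb = 9_000.0    # assumed full-history size (~9 TB)

rate = synced_gb / elapsed_h                   # GB per hour
pct_done = 100 * synced_gb / total_gb          # percent complete
days_left = (total_gb - synced_gb) / rate / 24 # days remaining at this rate

print(f"{rate:.2f} GB/h, {pct_done:.2f}% done, ~{days_left:.0f} days to go")
```

That works out to roughly 2.3 GB/h and a bit over five months remaining, consistent with the estimate above.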

Is there any way to significantly speed up the sync? My node is running in Amsterdam and has 16 cores, 128 GB RAM, a 24 TB HDD, and 20 Gb/s NICs.

Here is my config:

[server]
port_rpc_admin_local
port_ws_public

# port_peer
# port_ws_admin_local
# ssl_key = /etc/ssl/private/server.key
# ssl_cert = /etc/ssl/certs/server.crt

[port_rpc_admin_local]
port = 5005
# allow from everywhere and restrict on network side
ip = 0.0.0.0
admin = 127.0.0.1
protocol = http

[port_ws_public]
port = 80
ip = 0.0.0.0
protocol = ws

# [port_peer]
# port = 51235
# ip = 0.0.0.0
# protocol = peer

# [port_ws_admin_local]
# port = 6006
# ip = 127.0.0.1
# admin = 127.0.0.1
# protocol = ws

[node_size]
huge

# tiny
# small
# medium
# large
# huge

[node_db]
type=rocksdb
path=/data
advisory_delete=0
open_files=2000
filter_bits=12
cache_mb=256
file_size_mb=8
file_size_mult=2

# How many ledgers do we want to keep (history)?
# Integer value that defines the number of ledgers
# between online deletion events
#online_delete=

[ledger_history]
# How many ledgers do we want to keep (history)?
# Integer value (ledger count)
# or (if you have lots of TB SSD storage): 'full'
full

[database_path]
/data

[fetch_depth]
full

[sntp_servers]
time.windows.com
time.apple.com
time.nist.gov
pool.ntp.org

[ips]
r.ripple.com 51235

[validators_file]
validators.txt

[rpc_startup]
{ "command": "log_level", "severity": "info" }

# severity (order: lots of information .. only errors)
# debug
# info
# warn
# error
# fatal

Any input is much appreciated! 
Thanks,


I believe WietseWind did fetch the whole history, and he might have a public API for it too. You should probably ask him, but my guess is that you can't speed it up. The limiting factor is likely that only a few nodes with full history exist.


RocksDB won't work for full history, and yes, it'll take at least half a year to sync from the network. https://github.com/ripple/rippled/issues/2688 could help, but as you can see there's not much interaction. I don't want to generate/share historic shard files (which rippled can download via http(s) and import since a recent version) unless they are verifiable and deterministic, since a lot of things can go wrong.

An alternative would be to reach out to Ripple (good luck!) or to a community member like Wietse or me who operates a full-history server and get a copy of the nodestore file to import. The risk for you is that we might have bad/malicious data in there; the risk for us is that you would then know the salt value for our database files, which would let you specifically target our nodestore with algorithmic attacks. I used to offer to order a large HDD from Amazon, format it, put a full nodestore file on it and ship it worldwide for 1 BTC - so far nobody has taken the offer though (which might be related to the price of 1 BTC measured in EUR/USD, I guess).

Opening up the peer port would probably get you more peers, though most of them just leech and don't serve deep history. I used to look at the peer list from time to time and add IPs that served full history to my config file, to raise the chance of connecting to useful peers.
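Applied to the config in the first post, that would mean un-commenting the peer stanza and pinning known-good peers via [ips_fixed]. A sketch - the IP below is a placeholder, not a real full-history server; substitute addresses you've verified yourself:

```ini
[server]
port_peer

[port_peer]
port = 51235
ip = 0.0.0.0
protocol = peer

# hypothetical full-history peer - replace with verified addresses
[ips_fixed]
192.0.2.10 51235
```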

You do have ~10TB of (fast) SSDs in that server too, right? HDDs won't be able to handle this.


Last I checked all "decentralized blockchain storage" stuff could only store encrypted, private files and wasn't useful for hosting public data. IPFS can host public data and if you want to get fancy, you could host a gateway on Codius (but a local one or using the cloudflare one makes more sense...). It doesn't guarantee availability though and incentivizing seeds via their "filecoin" solution is still not available.

The bigger issue is which files to actually host there; that's what the bug report on GitHub tries to address, because at the moment rippled can only share shard files and these are currently not really verified to be correct. It is also hard to generate them, since everyone who created them would produce a different version that's incompatible with everyone else's and whose contents are ordered differently.


@r3lik You will absolutely have to switch to fast SSD storage for the entire ledger if you want your node to work for any purpose, even just fetching. While RocksDB may appear to work fine at the beginning, you will run into all kinds of spurious problems as your stored history grows. And yes, it will likely take at least 6 months to get the full store, no matter how you configure anything. Unfortunately, there is as yet no deterministic, canonical data storage format for ledger history. Until there is, we all share this problem.


Thanks for the responses so far! A few more questions:

Is there a way to leverage sharding (https://developers.ripple.com/history-sharding.html) to get ledger data faster? I'm unclear whether this would be any faster than just getting ledger data from peers. 

Has anyone been able to download a copy of the nodestore file and import it successfully?

@Sukrim I'm not using 10TB of SSD. The price goes up significantly. What is the primary concern here?

Thanks much


SSDs are necessary for fast random access. The node store is just a big key-value map, and its keys are randomly distributed. NuDB is more bare-bones and keeps the full index on disk; RocksDB tries to be more sophisticated, but can/will write your disk to death while compacting and will likely eat too much RAM for its cache. There's a thread by @CapnKelp, who wanted to fetch full history onto an HDD array with SSD caching - unfortunately he never followed up on whether it actually worked.

History sharding can of course help (you'd have more peers on the network serving history), but I don't think many people have switched it on. Importing a full nodestore works, but it has no progress indicator, requires both(!) databases to be on SSD, and will take a while (though still much faster than syncing from scratch over the network, of course). I'm not sure whether importing also updates your ledger.db and transaction.db files - I hope it does.
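For reference, a full-history setup typically uses NuDB rather than RocksDB, and importing a copied nodestore is done by pointing [import_db] at the copy and starting rippled once with --import. A sketch - the paths here are examples, not recommendations:

```ini
[node_db]
type=NuDB
path=/ssd/nudb
# no online_delete - keep everything for full history

# the copied nodestore to import from (example path)
[import_db]
type=NuDB
path=/ssd/copied_nodestore
```

Then run rippled once with the --import flag (e.g. `rippled --conf /etc/opt/ripple/rippled.cfg --import`) and remove the [import_db] stanza afterwards.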

11 hours ago, r3lik said:

@Sukrim I'm not using 10TB of SSD. The price goes up significantly. What is the primary concern here?

SSDs are simply (and unfortunately) the price of entry for running a full-history rippled server at the moment. If you get it working without them, please shout from the rooftops exactly what you did, but also... good luck, seriously.


Is it possible to sync a specific range of ledgers? I'm ingesting a lot of the data into a database, so I could do this in chunks rather than attempting to sync 8 TB.

2 hours ago, r3lik said:

Is it possible to sync a specific range of ledgers? I'm ingesting a lot of the data into a db, so I could do this in chunks rather than attempting to sync 8TB. 

Kind of. You can change ledger_history from "full" to an integer value n, telling rippled to fetch and hold the most recent n ledgers ("current ledger minus n"). After it has reached that limit, you can stop rippled, raise the value, and start it again. However, I believe this won't really help. For one, by restricting rippled to a limited range, acquisition may be slower, since servers that could offer ledgers outside that range won't be taken advantage of.

Further, restarting introduces two delays. First, rippled has to go through the store it has accumulated so far and figure out which ledgers it holds (in practice it doesn't fully know even after this process, during which it just sort of "glances" at the data). This scan can take anywhere from a few minutes to hours or longer, depending on your machine's storage format and how many ledgers it has. Second, the downtime from the restart, plus the time rippled spends tallying its ledgers, is all time during which it hasn't been collecting the new ledgers the network closes every 3.5 seconds, so after the tally it has to fetch everything it missed before it's ready to go again. So: a big delay introduced by multiple restarts, and probably slower acquisition as well.
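Concretely, that means swapping the "full" in the first post's config for a ledger count - e.g. to hold roughly the last million ledgers:

```ini
[ledger_history]
1000000
```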

There may well be another way to force a specific range, as there are various queries you can run, and depending on how you run them, something might cause rippled to go off and fetch the ledgers it's missing - I'm not sure. But again, I don't see an advantage, unless there's something I'm missing.


@Professor Hantzen I'm looking for something that will let me specify the range without having to sync the most recent ledgers again. I already have the current ledger data in a MySQL database (I'm using it to run some calculations and generate other data). I'd like to then delete the ledgers I already have, set a new range, sync those, import them into MySQL, rinse and repeat.

If I just increment ledger_history, it will still start from the most recent ledger and work backwards, but I don't want to keep re-syncing the current ones. Instead, I want the full history, in ranges that I specify.

This would allow me to consume all of the XRP ledger data one chunk at a time without needing to have 8TB of provisioned storage (expensive). 

Is this possible? 

Thanks for all your help!

Edited by r3lik


You could maybe try calling the ledger_request RPC (https://developers.ripple.com/ledger_request.html) in a loop for each of these ledgers until no error is reported for them?
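A minimal sketch of that loop against the admin RPC port from the config in the first post (port 5005). Assumptions: the standard JSON-RPC request format for ledger_request, and ledger 32570 as the earliest ledger available on the network; an "lgrNotFound" or similar error just means rippled is still acquiring that ledger, so you'd retry later:

```python
import json
import urllib.request

# Admin RPC endpoint from the [port_rpc_admin_local] stanza above.
RPC_URL = "http://127.0.0.1:5005/"

def make_ledger_request(ledger_index: int) -> dict:
    """Build the JSON-RPC payload for a ledger_request call."""
    return {"method": "ledger_request",
            "params": [{"ledger_index": ledger_index}]}

def fetch_range(start: int, end: int) -> None:
    """Ask rippled to acquire each ledger in [start, end], inclusive."""
    for seq in range(start, end + 1):
        body = json.dumps(make_ledger_request(seq)).encode()
        req = urllib.request.Request(
            RPC_URL, data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)["result"]
        # An error here usually means the ledger is still being acquired.
        if result.get("error"):
            print(seq, result["error"])
```

Usage would be something like `fetch_range(32570, 33000)`, then checking which ledgers arrived before moving on to the next chunk.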

Alternatively, history sharding still seems to be worked on and there might be a way to do what you want in the future or you could take up my 1 BTC offer and get an HDD shipped to your doorstep.

