Professor Hantzen

History Sharding & Backfill Speed

The History Sharding documentation states: "...acquiring shards begins after synchronizing with the network and backfilling ledger history to the configured number of recent ledgers."

Is it correct to interpret this as meaning backfill speed for a given server would not be improved by enabling history sharding on that server?  The statement suggests only one acquisition process may be active at a time.  I was thinking in the case of wanting to speed up backfill, someone could enable a shard store of size larger than the total history.  But would this fill any faster than ordinary acquisition?  Presumably those ledgers would be coming from the same source either way?

19 minutes ago, Professor Hantzen said:

Is it correct to interpret this as meaning backfill speed for a given server would not be improved by enabling history sharding on that server?

Improved with respect to what?

Because before sharding there was no backfilling at all. The servers were just saving the history from the moment they connected so they didn't have the past history at all.

39 minutes ago, tulo said:

Improved with respect to what?

Because before sharding there was no backfilling at all. The servers were just saving the history from the moment they connected so they didn't have the past history at all.

As I understand it, the shard store and the server's ledger history are kept in two separate databases. When I read the documentation, I can't find any reference saying these two databases interact, other than across different nodes (though it would of course make sense for them to interact locally too). The ordering in the quoted statement suggests to me that they never interact: if the server is configured to acquire all ledgers, it will finish that process before it launches the sharding process (and touches the shard database). I.e., it will never launch the sharding process until it already has all the ledgers.
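The ordering described in the documentation can be modelled as a simple gate. This is a minimal Python sketch under stated assumptions, not rippled's actual logic: `ledger_history` is either an integer or `"full"`, local history is a list of merged, inclusive (first, last) ranges, and `EARLIEST_SEQ` is a hypothetical stand-in for the oldest ledger the network can provide.

```python
from typing import List, Tuple, Union

Range = Tuple[int, int]  # inclusive (first, last) ledger sequence numbers

# Assumption: oldest ledger the network can serve. Illustrative only; a
# private lab network would start much lower.
EARLIEST_SEQ = 32570


def holds_range(ranges: List[Range], first: int, last: int) -> bool:
    """True if [first, last] lies inside one locally held range (ranges assumed merged)."""
    return any(lo <= first and last <= hi for lo, hi in ranges)


def backfill_target(current: int, ledger_history: Union[int, str],
                    earliest: int = EARLIEST_SEQ) -> int:
    """Oldest ledger the server tries to hold before backfilling counts as done."""
    if ledger_history == "full":
        return earliest
    return max(earliest, current - int(ledger_history) + 1)


def may_start_shard_acquisition(synced: bool, ranges: List[Range], current: int,
                                ledger_history: Union[int, str],
                                earliest: int = EARLIEST_SEQ) -> bool:
    """Hypothetical gate mirroring the documented order: sync with the network,
    backfill to the configured history, and only then acquire shards."""
    if not synced:
        return False
    target = backfill_target(current, ledger_history, earliest)
    return holds_range(ranges, target, current)
```

Under this model, `ledger_history = "full"` keeps the gate closed until every ledger back to `earliest` is held locally, which is the situation described above; with a numeric setting such as 256, it opens as soon as the most recent 256 ledgers are present.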

6 minutes ago, Professor Hantzen said:

As I understand it, the shard store and the server's ledger history are kept in two separate databases. When I read the documentation, I can't find any reference saying these two databases interact, other than across different nodes (though it would of course make sense for them to interact locally too). The ordering in the quoted statement suggests to me that they never interact: if the server is configured to acquire all ledgers, it will finish that process before it launches the sharding process (and touches the shard database). I.e., it will never launch the sharding process until it already has all the ledgers.

I think the two databases are separate, i.e. one doesn't check whether the other already has the ledgers, but when retrieving historical ledgers, rippled checks both the shard store and the normal ledger history.
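If that reading is right, a historical lookup consults both stores. A minimal sketch, assuming each store behaves like a mapping from ledger sequence to ledger data; `LedgerStore` and `fetch_ledger` are hypothetical names, not rippled's API, and the lookup order is an assumption.

```python
from typing import Dict, Optional

# Hypothetical stand-in for a store: ledger sequence -> serialized ledger data.
LedgerStore = Dict[int, bytes]


def fetch_ledger(seq: int, node_store: LedgerStore,
                 shard_store: LedgerStore) -> Optional[bytes]:
    """Look in the regular ledger history first, then in the shard store
    (order assumed). Returns None if neither has the ledger, in which case
    the caller would have to ask peers."""
    data = node_store.get(seq)
    if data is not None:
        return data
    return shard_store.get(seq)
```

Note that nothing here copies data between the two stores: each keeps its own ledgers, and only the read path looks at both.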

There is also this quote:

Quote

The ledger store history size should at minimum be twice the ledgers per shard, due to the fact that the current shard may be chosen to be stored and it would be wasteful to reacquire that data.

But it is not clear to me.
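One way to read the quoted rule: when a shard finishes, the server may decide to store it, so the regular ledger store should still hold every ledger of that shard. The sketch below checks this under assumptions that are not taken from the documentation: shards are numbered so that ledgers 1..N form shard 0, N+1..2N form shard 1, and so on, and `LEDGERS_PER_SHARD = 16384` is only an illustrative value.

```python
from typing import Tuple

# Assumption: every shard holds the same number of consecutive ledgers.
# 16384 is an illustrative value; check the documentation for your version.
LEDGERS_PER_SHARD = 16384


def shard_index(seq: int, lps: int = LEDGERS_PER_SHARD) -> int:
    """Assumed numbering: ledgers 1..lps are shard 0, lps+1..2*lps shard 1, etc."""
    return (seq - 1) // lps


def shard_range(index: int, lps: int = LEDGERS_PER_SHARD) -> Tuple[int, int]:
    """Inclusive first and last ledger sequence of a shard."""
    return index * lps + 1, (index + 1) * lps


def last_complete_shard(current: int, lps: int = LEDGERS_PER_SHARD) -> int:
    """Index of the newest shard whose last ledger is <= current (-1 if none)."""
    return current // lps - 1


def history_covers_last_shard(current: int, history: int,
                              lps: int = LEDGERS_PER_SHARD) -> bool:
    """True if a store keeping ledgers [current - history + 1, current] still
    holds every ledger of the newest complete shard, so storing that shard
    would not require reacquiring any of it."""
    idx = last_complete_shard(current, lps)
    if idx < 0:
        return False
    first, last = shard_range(idx, lps)
    return current - history + 1 <= first and last <= current
```

With a history of twice the ledgers per shard the check passes at every position: even just before the next shard closes, the newest complete shard starts less than two shards back. A history of exactly one shard passes only at the moment a shard closes.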


Also this:

Quote

The retrieval process begins with the server checking for the data locally. For data that is not available, the server requests data from its peer rippled servers. Those servers that have the data available for the requested period respond with their history. The requesting server combines those responses to create the shard. The shard is complete when it contains all the ledgers in a specific range.
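The quoted retrieval process can be sketched as follows. This is a simplified Python model, not rippled's implementation: peers are assumed to advertise inclusive ledger ranges, and choosing the first matching peer in name order is a hypothetical placeholder policy.

```python
from typing import Dict, List, Set, Tuple

Range = Tuple[int, int]  # inclusive (first, last) ledger sequences


def plan_shard_fetch(first: int, last: int, local: Set[int],
                     peers: Dict[str, List[Range]]) -> Tuple[Dict[int, str], List[int]]:
    """Use local ledgers first, then ask peers. Returns (assignment of each
    missing ledger to a peer that advertises it, ledgers no peer can provide)."""
    assignment: Dict[int, str] = {}
    unavailable: List[int] = []
    for seq in range(first, last + 1):
        if seq in local:
            continue  # already held locally; no network request needed
        # Hypothetical policy: first peer (in sorted name order) covering seq.
        source = next((name for name, ranges in sorted(peers.items())
                       if any(lo <= seq <= hi for lo, hi in ranges)), None)
        if source is None:
            unavailable.append(seq)
        else:
            assignment[seq] = source
    return assignment, unavailable


def shard_complete(first: int, last: int, held: Set[int]) -> bool:
    """A shard is complete when it contains every ledger in its range."""
    return all(seq in held for seq in range(first, last + 1))
```

Any ledger that ends up in the unavailable list blocks completion of the shard until some reachable server can serve it.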

On 4/13/2018 at 5:39 AM, tulo said:

Improved with respect to what?

Because before sharding there was no backfilling at all. The servers were just saving the history from the moment they connected so they didn't have the past history at all.

What about [ledger_history] = full? I thought that was the way to create a full-history server (with online_delete=0)? I have noticed it does backfill on the mainnet (I have not gone back very far), but when I experiment with the XRPL in my lab (my own XRP network), it does not backfill at all with the same settings. It only starts from the ledger it gets from the network and moves forward from there. Any ideas about the discrepancy?


Probably depends on the settings of the server(s) in your lab. Does at least one server even have all historic ledgers so it can serve them?

7 minutes ago, rjremien said:

What about [ledger_history] = full? I thought that was the way to create a full-history server (with online_delete=0)? I have noticed it does backfill on the mainnet (I have not gone back very far), but when I experiment with the XRPL in my lab (my own XRP network), it does not backfill at all with the same settings. It only starts from the ledger it gets from the network and moves forward from there. Any ideas about the discrepancy?

Did you set any sharding options? It's new, though, so I don't know much about it...

And as Sukrim said, you need at least some nodes with old ledgers, because your servers have to fetch them from somewhere.

On 4/13/2018 at 5:39 AM, tulo said:

Improved with respect to what?

Because before sharding there was no backfilling at all. The servers were just saving the history from the moment they connected so they didn't have the past history at all.

That is not accurate. Your server would grab history and backfill if you had configured it to do that.

Posted (edited)
45 minutes ago, Sukrim said:

Probably depends on the settings of the server(s) in your lab. Does at least one server even have all historic ledgers so it can serve them?

Yes. One of my servers has ledgers 5-XXXX (according to server_info). There is also always one ledger before that, so I can query ledger 4 successfully. I noticed that when I bootstrapped the blockchain, the ledger advanced from 0 to 3 (which seems normal). When I restarted my full-history server with the --load option (while adding more validators to the network), server_info only displayed ledgers from the last played ledger onward: e.g. after a reboot at ledger 1000, server_info displays 1000-xxxx. However, the database still contains the full history, because I can still query ledger 4 successfully.

The reason I bring up server_info (I believe it reports ledger history the same way the peers output does) is that I thought the server used this information to backfill history.

So, if that is the case (i.e. the server reads the complete_ledgers field), does that mean a server cannot backfill if it cannot "see" the history on other servers? @nikb
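The question can be made concrete by comparing the ledgers a server wants against what its peers advertise. This sketch assumes, without confirmation from this thread, that a server only requests historical ledgers from peers whose advertised range includes them, and that ranges use the server_info `complete_ledgers` style (comma-separated `a-b` ranges or single numbers, or `empty`). The function names are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first, last)


def parse_complete_ledgers(text: str) -> List[Range]:
    """Parse strings such as '5-1000' or '1000-1200,1250' (format assumed)."""
    text = text.strip()
    if not text or text == "empty":
        return []
    ranges: List[Range] = []
    for part in text.split(","):
        lo, _, hi = part.strip().partition("-")
        ranges.append((int(lo), int(hi) if hi else int(lo)))
    return ranges


def unserved_ledgers(wanted: Range, peer_fields: List[str]) -> List[int]:
    """Ledgers in `wanted` that no peer advertises; under the assumption
    above, these cannot be backfilled from the network."""
    advertised = [r for field in peer_fields for r in parse_complete_ledgers(field)]
    first, last = wanted
    return [seq for seq in range(first, last + 1)
            if not any(lo <= seq <= hi for lo, hi in advertised)]
```

Under this assumption, if the full-history server advertises only 1000-xxxx after a restart with --load, a peer wanting ledgers 5-999 would find no source for them, even though the database still holds them.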

Edited by rjremien
Additional information

10 hours ago, nikb said:

That is not accurate. Your server would grab history and backfill if you had configured it to do that.

Oh really? Even before sharding? What is the configuration?

5 minutes ago, Sukrim said:

[ledger_history] = full and remove the online_delete/advisory_delete stuff.

And what happens? It tries to fetch ledgers backward from random nodes?

Oh, now I see it in rippled.cfg: the default is 256. Could that be why it takes a few minutes (2-5) before it is ready?

