Horizon will report an error similar to:
Gap detected in stellar-core database. Please recreate Horizon DB.
Horizon and core run independently of each other.
core produces metadata that Horizon then imports.
A gap occurs when Horizon cannot find data for a ledger that it has not imported yet.
This can happen for a few reasons.
core uses "cursors" to ensure that garbage collection doesn't delete data needed by consumers.
If cursors have not been configured, core's garbage collection can run before Horizon has had a chance to set a cursor, deleting data that Horizon expects.
The solution is to define cursors properly in core's configuration:
KNOWN_CURSORS=["HORIZON"]
When running core with partial history (CATCHUP_COMPLETE=false), core's policy is to keep up with the network while retaining a contiguous tail of at least CATCHUP_RECENT ledgers.
As a consequence, if core is taken offline for any reason, its goal when it is powered back on is to catch up to the network's current ledger N as quickly as possible, by replaying a range that includes all ledgers from N-CATCHUP_RECENT up to N.
If core is offline for longer than roughly CATCHUP_RECENT * 5 seconds (5 seconds being the average time between ledgers), it may not replay some of the ledgers that closed while it was down.
Horizon, on the other hand, expects to replay every ledger past its initial ledger.
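As a rough illustration of the offline window described above, the arithmetic can be sketched in Python. The function names are hypothetical, and the 5-second figure is the average ledger interval quoted in the text, so the result is an estimate rather than a guarantee.

```python
# Average time between ledgers, as quoted in the text (an approximation).
AVERAGE_LEDGER_SECONDS = 5


def max_offline_seconds(catchup_recent: int) -> int:
    """Approximate time core can stay offline before its replayed tail
    no longer reaches back to the last ledger it closed."""
    return catchup_recent * AVERAGE_LEDGER_SECONDS


def max_offline_minutes(catchup_recent: int) -> float:
    """Same window, expressed in minutes."""
    return max_offline_seconds(catchup_recent) / 60


if __name__ == "__main__":
    # With CATCHUP_RECENT=1024 the window is 5120 seconds.
    print(max_offline_seconds(1024))
    print(round(max_offline_minutes(1024), 1))
```

Any outage longer than this window risks a gap, because core will resume from a ledger later than the one Horizon expects next.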
Here is an example to illustrate all this.
Assume Horizon started ingesting at ledger 10,000; it therefore expects core to emit data for all ledgers past 10,000.
core is configured with CATCHUP_RECENT=1024 (roughly 85 minutes of tail ledgers).
Everything runs fine until one day, at ledger 500,000, core is taken down for a day. At this point we have:
- core's latest ledger is 500,000
- Horizon ingested ledger 500,000 (or something like 499,999 if it was a bit behind)
Later, core is brought back online. The network happens to be at ledger 520,000 (about 27 hours later), which causes core to:
- catch up starting at 520,000-1024 = 518,976 (in reality right before that)
- replay ledgers up to 520,000; at this point it's in "Synced!" state
- close ledgers 520,001 and so on with the network
Horizon was at 500,000 but now sees ledger 518,976. Where is 500,001?
It panics with:
Gap detected in stellar-core database. Please recreate Horizon DB.
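The scenario above can be checked with a few lines of Python. This is a simplified model, not how core or Horizon detect gaps internally: the function names are hypothetical, and it assumes catchup starts exactly at N-CATCHUP_RECENT, whereas in reality core starts slightly before that.

```python
from typing import Optional, Tuple


def catchup_start(network_ledger: int, catchup_recent: int) -> int:
    """First ledger core replays after coming back online (simplified)."""
    return max(1, network_ledger - catchup_recent)


def missing_ledgers(
    horizon_latest: int, network_ledger: int, catchup_recent: int
) -> Optional[Tuple[int, int]]:
    """Return the inclusive range of ledgers Horizon expects but core will
    not replay, or None if the replayed tail covers Horizon's next ledger."""
    start = catchup_start(network_ledger, catchup_recent)
    next_expected = horizon_latest + 1
    if start <= next_expected:
        return None
    return (next_expected, start - 1)


if __name__ == "__main__":
    # Values from the example: Horizon at 500,000, network at 520,000.
    print(catchup_start(520_000, 1024))
    print(missing_ledgers(500_000, 520_000, 1024))
```

With the numbers from the example, the tail starts at 518,976 and ledgers 500,001 through 518,975 are never emitted, which is the gap Horizon reports.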
Some bugs can cause Horizon to get confused during ingestion.
In many cases resetting your instance is the simplest way to recover: the data will be reconstructed from history.
It's also a good occasion to double-check that your core and Horizon configurations are consistent with each other.
Pros:
- recovery is very likely as you will be importing data from a clean state
- data in Horizon is reconstructed with the latest code, providing the most detailed
Cons:
- importing a lot of ledgers can take a long time
- core and Horizon are not available during the recovery process
Note: if core is configured with CATCHUP_COMPLETE=true, you can either switch your node to partial history (as the full history was already ingested by Horizon) or reset everything.
- find out the latest ledger that Horizon ingested (let's refer to it as MY_LEDGER_NUMBER)
- you can get this by calling the '/metrics' endpoint on the Horizon server; the field is history.latest_ledger
- for example, for the SDF public network this data is at https://horizon.stellar.org/metrics
- stop core and Horizon
- look at the value of history.latest_ledger for the public network on SDF's instance: https://horizon.stellar.org/metrics
- this value will be somewhat larger than MY_LEDGER_NUMBER; subtract the two to get the gap between your instance and SDF's
- multiply this gap by a safety factor that accounts for how long you think you will need to get back in business; to be safe, you can use a factor of 10
- in your core config:
- set CATCHUP_RECENT to that computed value; your configuration will look something like
CATCHUP_COMPLETE=false
CATCHUP_RECENT=50000
- set cursors if not set already
KNOWN_CURSORS=["HORIZON"]
- call "newdb" on the core instance to reset core's database
- start stellar-core normally and wait for it to reach the "Synced!" state
- start Horizon; it should now see that core has data to ingest
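The CATCHUP_RECENT computation from the steps above can be sketched as follows. This is a minimal illustration: the function name is hypothetical, the default factor of 10 is the ratio suggested above, and the two ledger numbers are whatever you read from history.latest_ledger on your own instance and on SDF's. The sample values are made up to match the CATCHUP_RECENT=50000 configuration shown earlier.

```python
def recommended_catchup_recent(
    my_ledger_number: int, network_ledger_number: int, safety_factor: int = 10
) -> int:
    """Gap between the network and the local instance, scaled by a safety
    factor to leave time for the recovery itself."""
    gap = network_ledger_number - my_ledger_number
    if gap < 0:
        raise ValueError("network ledger should not be behind the local ledger")
    return gap * safety_factor


if __name__ == "__main__":
    # Hypothetical readings: local Horizon at 500,000, SDF at 505,000.
    value = recommended_catchup_recent(500_000, 505_000)
    print("CATCHUP_COMPLETE=false")
    print(f"CATCHUP_RECENT={value}")
```

A larger safety factor makes core replay more ledgers on startup, so the recovery takes longer but is less likely to leave a new gap if it is delayed.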