@yyforyongyu
Created September 20, 2022 04:17

The issue

An entry with LogIndex: 1009, ID: 670, Type: Settle cannot be reloaded into localUpdateLog because its parent Add entry (htlcCounter=670) cannot be found in remoteUpdateLog.

This entry is loaded from the db bucket remoteUnsignedLocalUpdates, and it settles an incoming HTLC sent by the remote. Comparing the remote's closing tx, our local commit, and the entries in remoteUnsignedLocalUpdates and unsignedAckedUpdates shows that the remote did receive this Settle entry, and that we removed the HTLC from our local commit but failed to remove the entry from remoteUnsignedLocalUpdates.
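The failure can be sketched in Go as a parent lookup during restore. This is a simplified model, not lnd's actual restore code: the names `update`, `restoreUpdates`, `localCommitIncoming`, and `storedUpdates` are hypothetical, and the remote log is assumed to be rebuilt only from the incoming HTLCs of local commit 1542 (HTLC indexes taken from the debug log in this note).

```go
package main

import (
	"errors"
	"fmt"
)

// updateType is a hypothetical stand-in for the kind of a log entry.
type updateType string

const (
	settle updateType = "Settle"
	fail   updateType = "Fail"
)

// update models one entry loaded from remoteUnsignedLocalUpdates.
type update struct {
	LogIndex uint64
	ParentID uint64 // HTLC index of the Add being settled or failed.
	Type     updateType
}

var errParentNotFound = errors.New("parent Add not found in remote log")

// restoreUpdates rejects the first entry whose parent Add is absent from
// the rebuilt remote log.
func restoreUpdates(remoteAdds map[uint64]bool, updates []update) error {
	for _, u := range updates {
		if !remoteAdds[u.ParentID] {
			return fmt.Errorf("%s %d (parent Add %d): %w",
				u.Type, u.LogIndex, u.ParentID, errParentNotFound)
		}
	}
	return nil
}

// localCommitIncoming lists the HTLC indexes of the seven incoming HTLCs
// on local commit 1542.
func localCommitIncoming() map[uint64]bool {
	adds := make(map[uint64]bool)
	for _, id := range []uint64{665, 669, 671, 672, 673, 674, 675} {
		adds[id] = true
	}
	return adds
}

// storedUpdates lists the three entries found in the bucket.
func storedUpdates() []update {
	return []update{
		{LogIndex: 1009, ParentID: 670, Type: settle},
		{LogIndex: 1010, ParentID: 669, Type: fail},
		{LogIndex: 1011, ParentID: 671, Type: fail},
	}
}

func main() {
	// Only the Settle for HTLC 670 has no parent; the two Fails restore fine.
	fmt.Println(restoreUpdates(localCommitIncoming(), storedUpdates()))
}
```

Under these assumptions the two Fail entries find their parents (669 and 671), while Settle 1009 does not, because HTLC 670 is no longer on the local commit.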

Remote's force close tx.

local commit:
outgoingHTLCs: len=1
    Timeout: (uint32) 751840,
    Amount: (lnwire.MilliSatoshi) 32425551 mSAT, <- found in remote's force close tx

incomingHTLCs: len=7
   Timeout: (uint32) 752149,
   Amount: (lnwire.MilliSatoshi) 100014250 mSAT, <- found in remote's force close tx
  
   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 100000400 mSAT,
   HtlcIndex: (uint64) 669, <- found in `remoteUnsignedLocalUpdates`
  
   Timeout: (uint32) 752284,
   Amount: (lnwire.MilliSatoshi) 42713792 mSAT,
   HtlcIndex: (uint64) 671, <- found in `remoteUnsignedLocalUpdates`

   Timeout: (uint32) 752036,
   Amount: (lnwire.MilliSatoshi) 50000051 mSAT, <- found in remote's force close tx

   Timeout: (uint32) 752251,
   Amount: (lnwire.MilliSatoshi) 30002761 mSAT, <- found in remote's force close tx

   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 17954862 mSAT, <- found in remote's force close tx

   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 11000725 mSAT, <- found in remote's force close tx
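The annotations above amount to a set comparison, which can be sketched in Go. The helper `missingFrom` is hypothetical, and the HTLC indexes are copied from the debug log at the end of this note (local commit 1542, remote commit 1543, and the parent IDs in remoteUnsignedLocalUpdates).

```go
package main

import "fmt"

// missingFrom returns the HTLC indexes present in a but absent from b,
// preserving the order of a.
func missingFrom(a, b []uint64) []uint64 {
	seen := make(map[uint64]bool, len(b))
	for _, id := range b {
		seen[id] = true
	}
	var out []uint64
	for _, id := range a {
		if !seen[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	localIncoming := []uint64{665, 669, 671, 672, 673, 674, 675}
	remoteIncoming := []uint64{665, 672, 673, 674, 675}
	pendingParents := []uint64{670, 669, 671}

	// HTLCs still on our commit but already gone from the remote's.
	fmt.Println(missingFrom(localIncoming, remoteIncoming))
	// Parents of the stored updates that the remote's commit no longer has.
	fmt.Println(missingFrom(pendingParents, remoteIncoming))
}
```

The first difference is 669 and 671, the parents of the two Fail entries. The second shows that all three parents (670, 669, 671) are absent from the remote's commit, which is consistent with the remote having received all three updates.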

This is our current state loaded from db,

┌──────────────────────────┐       
│  remoteCommitChain tip   │       
│                          │       
│ - height: 1544           │       
│ - ourMessageIndex: 1014  │       
│ - ourHtlcIndex: 280      │       
│ - theirMessageIndex: 955 │       
│ - theirHtlcIndex: 676    │       
└──────────────────────────┘       
┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tail  │ │  localCommitChain tail  │
│                          │ │                         │
│ - height: 1543           │ │ - height: 1542          │
│ - ourMessageIndex: 1012  │ │ - ourMessageIndex: 1010 │
│ - ourHtlcIndex: 280      │ │ - ourHtlcIndex: 280     │
│ - theirMessageIndex: 955 │ │ - theirMessageIndex: 955│
│ - theirHtlcIndex: 676    │ │ - theirHtlcIndex: 676   │
└──────────────────────────┘ └─────────────────────────┘
 ┌─────────────────────────┐ ┌─────────────────────────┐ 
 │remoteUnsignedLocalUpdate│ │   unsignedAckedUpdates  │
 │                         │ │                         │
 │  local update entries   │ │  remote update entries  │
 │-------------------------│ │-------------------------│
 │       Settle:1009       │ └─────────────────────────┘
 │      parent Add:670     │(we received, pendingLocalAck)
 │-------------------------│
 │        Fail:1010        │
 │      parent Add:669     │
 │-------------------------│
 │        Fail:1011        │
 │      parent Add:671     │
 └─────────────────────────┘
 (we sent, pendingRemoteAck)

We will now look at the tails to understand how remoteUnsignedLocalUpdates gets updated.

Arrive at remote commit chain tail

To arrive at remoteCommitChain tail 1543, there must exist a previous commit 1542, and we'd have the following state,

┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tip   │ │   localCommitChain tip  │
│                          │ │                         │
│ - height: 1543           │ │ - height: ?             │
│ - ourMessageIndex: 1012  │ │ - ourMessageIndex: ?    │
│ - ourHtlcIndex: 280      │ │ - ourHtlcIndex: ?       │
│ - theirMessageIndex: 955 │ │ - theirMessageIndex: ?  │
│ - theirHtlcIndex: 676    │ │ - theirHtlcIndex: ?     │
└──────────────────────────┘ └─────────────────────────┘
┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tail  │ │  localCommitChain tail  │
│                          │ │                         │
│ - height: 1542           │ │ - height: ?             │
│ - ourMessageIndex: ?     │ │ - ourMessageIndex: ?    │
│ - ourHtlcIndex: ?        │ │ - ourHtlcIndex: ?       │
│ - theirMessageIndex: ?   │ │ - theirMessageIndex: 955│
│ - theirHtlcIndex: ?      │ │ - theirHtlcIndex: 676   │
└──────────────────────────┘ └─────────────────────────┘

We don't actually know how many items are in localCommitChain, but we do know some info about its tail. When revoking remote commit 1542, we would filter update entries from localUpdateLog: if an entry's LogIndex is in the range [localCommitChain.tail.ourMessageIndex, remoteCommitChain.tip.ourMessageIndex), it'd be saved to the remoteUnsignedLocalUpdates bucket. This means localCommitChain.tail.ourMessageIndex is at most 1009 here, since we've loaded an entry with index 1009 from db.
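The half-open range check described above can be sketched as follows. This is a minimal illustration rather than lnd's implementation: `filterOnRemoteRevoke` is a hypothetical name, and log entries are reduced to their LogIndex values.

```go
package main

import "fmt"

// filterOnRemoteRevoke keeps the log indexes that fall inside
// [localTailIdx, remoteTipIdx), i.e. updates the remote has signed for
// but that our own commitment does not include yet.
func filterOnRemoteRevoke(logIdxs []uint64, localTailIdx, remoteTipIdx uint64) []uint64 {
	var kept []uint64
	for _, idx := range logIdxs {
		if idx >= localTailIdx && idx < remoteTipIdx {
			kept = append(kept, idx)
		}
	}
	return kept
}

func main() {
	localLog := []uint64{1008, 1009, 1010, 1011, 1012, 1013}

	// With the local tail at 1009, entry 1009 is saved to the bucket.
	fmt.Println(filterOnRemoteRevoke(localLog, 1009, 1012))
	// With the local tail at 1010, entry 1009 can never be saved.
	fmt.Println(filterOnRemoteRevoke(localLog, 1010, 1012))
}
```

The lower bound is what matters here: entry 1009 only reaches the bucket if the local tail's ourMessageIndex is 1009 or lower at the moment the filter runs.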

For simplicity, we'll assume the localCommitChain.tail has height=1541, ourMessageIndex=1009, ourHtlcIndex=280. This means when we revoke remote commit 1542, the state looks like this,

┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tip   │ │   localCommitChain tip  │
│                          │ │                         │
│ - height: 1543           │ │ - height: ?             │
│ - ourMessageIndex: 1012  │ │ - ourMessageIndex: ?    │
│ - ourHtlcIndex: 280      │ │ - ourHtlcIndex: ?       │
│ - theirMessageIndex: 955 │ │ - theirMessageIndex: ?  │
│ - theirHtlcIndex: 676    │ │ - theirHtlcIndex: ?     │
└──────────────────────────┘ └─────────────────────────┘
┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tail  │ │  localCommitChain tail  │
│                          │ │                         │
│ - height: 1542           │ │ - height: 1541          │
│ - ourMessageIndex: ?     │ │ - ourMessageIndex: 1009 │
│ - ourHtlcIndex: ?        │ │ - ourHtlcIndex: 280     │
│ - theirMessageIndex: ?   │ │ - theirMessageIndex: 955│
│ - theirHtlcIndex: ?      │ │ - theirHtlcIndex: 676   │
└──────────────────────────┘ └─────────────────────────┘

Our localUpdateLog must have at least entries 1009, 1010, and 1011, and remoteUpdateLog must have the corresponding Add entries, whose HTLC indexes are 669, 670, 671 and whose log indexes are 948, 949, 950.

Arrive at local commit chain tail

Similarly, there must exist a commit 1541 to arrive at localCommitChain tail 1542.

┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tip   │ │   localCommitChain tip  │
│                          │ │                         │
│ - height: ?              │ │ - height: 1542          │
│ - ourMessageIndex: ?     │ │ - ourMessageIndex: 1010 │
│ - ourHtlcIndex: ?        │ │ - ourHtlcIndex: 280     │
│ - theirMessageIndex: ?   │ │ - theirMessageIndex: 955│
│ - theirHtlcIndex: ?      │ │ - theirHtlcIndex: 676   │
└──────────────────────────┘ └─────────────────────────┘
┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tail  │ │  localCommitChain tail  │
│                          │ │                         │
│ - height: ?              │ │ - height: 1541          │
│ - ourMessageIndex: 1010  │ │ - ourMessageIndex: ?    │
│ - ourHtlcIndex: 280      │ │ - ourHtlcIndex: ?       │
│ - theirMessageIndex: ?   │ │ - theirMessageIndex: ?  │
│ - theirHtlcIndex: ?      │ │ - theirHtlcIndex: ?     │
└──────────────────────────┘ └─────────────────────────┘

When revoking local commit 1541, we'd also update what's saved in remoteUnsignedLocalUpdates by checking LogIndex >= localCommitChain.tip.ourMessageIndex. This means only entries with LogIndex >= 1010 are kept. Yet we did see log 1009 in db, which means either,

  1. by the time we revoked local commit 1541, remoteUnsignedLocalUpdates didn't have log 1009 yet, which means remote commit 1542 hadn't been revoked, or
  2. we somehow saved local commit 1542 to disk without updating remoteUnsignedLocalUpdates. Since these actions are wrapped inside a single db tx, this seems unlikely.
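A minimal Go sketch of the local-revoke filter, assuming the bucket already holds the three entries from this note. `filterOnLocalRevoke` is a hypothetical name and entries are again reduced to their LogIndex values.

```go
package main

import "fmt"

// filterOnLocalRevoke keeps only the stored entries that the new local
// commitment does not cover yet, i.e. LogIndex >= localTipIdx.
func filterOnLocalRevoke(stored []uint64, localTipIdx uint64) []uint64 {
	var kept []uint64
	for _, idx := range stored {
		if idx >= localTipIdx {
			kept = append(kept, idx)
		}
	}
	return kept
}

func main() {
	// If 1009 had been in the bucket when local commit 1541 was revoked,
	// the filter would have dropped it.
	fmt.Println(filterOnLocalRevoke([]uint64{1009, 1010, 1011}, 1010))
	// If the bucket was still empty at that time, there is nothing to drop.
	fmt.Println(filterOnLocalRevoke(nil, 1010))
}
```

So an entry 1009 that survives on disk must have been written after this filter ran, or the filter must not have run at all.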

On the other hand, if the above revoke is finished, we'd have something like the following,

┌──────────────────────────┐
│  remoteCommitChain tip   │
│                          │
│ - height: ?              │
│ - ourMessageIndex: ?     │
│ - ourHtlcIndex: ?        │
│ - theirMessageIndex: ?   │
│ - theirHtlcIndex: ?      │
└──────────────────────────┘
┌──────────────────────────┐ ┌─────────────────────────┐
│  remoteCommitChain tail  │ │  localCommitChain tail  │
│                          │ │                         │
│ - height: 1542           │ │ - height: 1542          │
│ - ourMessageIndex: 1010  │ │ - ourMessageIndex: 1010 │
│ - ourHtlcIndex: 280      │ │ - ourHtlcIndex: 280     │
│ - theirMessageIndex: ?   │ │ - theirMessageIndex: 955│
│ - theirHtlcIndex: ?      │ │ - theirHtlcIndex: 676   │
└──────────────────────────┘ └─────────────────────────┘

From this state, when revoking any remote commit, it's impossible to have log 1009 saved in db since we require the LogIndex >= localCommitChain.tail.ourMessageIndex.

All the updates are meant to happen in a single goroutine. However, if somehow the link.Start has been invoked multiple times, we might have multiple goroutines. In this case, we could have,

goroutine 1 (g-1) revokes remote commit 1542, and goroutine 2 (g-2) revokes local commit 1541.

  1. g-1 saves CommitDiff to disk.
  2. g-1 prepares unsigned local updates (accessing localCommitChain.tail).
  3. g-2 adds the new local commit to the chain tip.
  4. g-2 advances the local commit chain tail <- too late, we've already prepared the remoteUnsignedLocalUpdates.
  5. g-2 filters remoteUnsignedLocalUpdates, which requires a lock. It also saves the current local chain tail to disk.
  6. g-1 waits for the lock and then saves remoteUnsignedLocalUpdates.
  7. g-1 advances the chain tail.
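The interleaving above can be replayed deterministically in Go. This is a sketch under strong assumptions: the `node` type and its methods are hypothetical, real goroutines and locks are replaced by an explicit call order, and only the message indexes are modelled.

```go
package main

import "fmt"

// node is a hypothetical, heavily simplified model of the channel state.
type node struct {
	localTailIdx uint64   // localCommitChain.tail.ourMessageIndex (in memory)
	remoteTipIdx uint64   // remoteCommitChain.tip.ourMessageIndex
	localLog     []uint64 // log indexes in localUpdateLog
	bucket       []uint64 // remoteUnsignedLocalUpdates on disk
}

func newNode() *node {
	return &node{
		localTailIdx: 1009,
		remoteTipIdx: 1012,
		localLog:     []uint64{1008, 1009, 1010, 1011},
	}
}

// prepareUnsigned is step 2: snapshot entries in [localTailIdx, remoteTipIdx).
func (n *node) prepareUnsigned() []uint64 {
	var out []uint64
	for _, idx := range n.localLog {
		if idx >= n.localTailIdx && idx < n.remoteTipIdx {
			out = append(out, idx)
		}
	}
	return out
}

// advanceLocalTail is steps 3-4: the local tail moves to the new commit.
func (n *node) advanceLocalTail(idx uint64) { n.localTailIdx = idx }

// filterBucket is step 5: drop stored entries below the local tail.
func (n *node) filterBucket() {
	var kept []uint64
	for _, idx := range n.bucket {
		if idx >= n.localTailIdx {
			kept = append(kept, idx)
		}
	}
	n.bucket = kept
}

// saveBucket is step 6: overwrite the bucket with the prepared snapshot.
func (n *node) saveBucket(prepared []uint64) { n.bucket = prepared }

// replayRace follows the order of steps 2, 3-4, 5, 6 listed above.
func replayRace() []uint64 {
	n := newNode()
	prepared := n.prepareUnsigned() // g-1 reads tail 1009
	n.advanceLocalTail(1010)        // g-2 moves the tail
	n.filterBucket()                // g-2 filters a still-empty bucket
	n.saveBucket(prepared)          // g-1 writes the stale snapshot
	return n.bucket
}

// replaySerial lets g-2 finish before g-1 starts.
func replaySerial() []uint64 {
	n := newNode()
	n.advanceLocalTail(1010)
	n.filterBucket()
	n.saveBucket(n.prepareUnsigned())
	return n.bucket
}

func main() {
	fmt.Println("raced: ", replayRace())
	fmt.Println("serial:", replaySerial())
}
```

In this model the raced order leaves 1009 in the bucket, while the serial order leaves only 1010 and 1011, which matches the observation that entry 1009 should not have been persisted once the local tail reached 1010.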

Data parsed from debug log

local commit:
    height: (uint64) 1542
    ourMessageIndex: (uint64) 1010,
    ourHtlcIndex: (uint64) 280,
    theirMessageIndex: (uint64) 955,
    theirHtlcIndex: (uint64) 676,
    ourBalance: (lnwire.MilliSatoshi) 9453928956 mSAT,                                  
    theirBalance: (lnwire.MilliSatoshi) 158162652 mSAT,

outgoingHTLCs: len=1
    Timeout: (uint32) 751840,
    Amount: (lnwire.MilliSatoshi) 32425551 mSAT,
    LogIndex: (uint64) 1008,
    HtlcIndex: (uint64) 279,
    localOutputIndex: (int32) 3,
    remoteOutputIndex: (int32) 0,

incomingHTLCs: len=7
   Timeout: (uint32) 752149,
   Amount: (lnwire.MilliSatoshi) 100014250 mSAT,
   LogIndex: (uint64) 944,
   HtlcIndex: (uint64) 665,
   localOutputIndex: (int32) 7,
   remoteOutputIndex: (int32) 0,
  
   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 100000400 mSAT,
   LogIndex: (uint64) 948,
   HtlcIndex: (uint64) 669, <- found in `remoteUnsignedLocalUpdates`
   localOutputIndex: (int32) 6,
   remoteOutputIndex: (int32) 0,
  
   Timeout: (uint32) 752284,
   Amount: (lnwire.MilliSatoshi) 42713792 mSAT,
   LogIndex: (uint64) 950,
   HtlcIndex: (uint64) 671, <- found in `remoteUnsignedLocalUpdates`
   localOutputIndex: (int32) 4,
   remoteOutputIndex: (int32) 0,

   Timeout: (uint32) 752036,
   Amount: (lnwire.MilliSatoshi) 50000051 mSAT,
   LogIndex: (uint64) 951,
   HtlcIndex: (uint64) 672,
   localOutputIndex: (int32) 5,
   remoteOutputIndex: (int32) 0,

   Timeout: (uint32) 752251,
   Amount: (lnwire.MilliSatoshi) 30002761 mSAT,
   LogIndex: (uint64) 952,
   HtlcIndex: (uint64) 673,
   localOutputIndex: (int32) 2,
   remoteOutputIndex: (int32) 0,

   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 17954862 mSAT,
   LogIndex: (uint64) 953,
   HtlcIndex: (uint64) 674,
   localOutputIndex: (int32) 1,
   remoteOutputIndex: (int32) 0,

   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 11000725 mSAT,
   LogIndex: (uint64) 954,
   HtlcIndex: (uint64) 675,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 0,


remote commit: <- remote's force closing tx
    height: (uint64) 1543,
    ourMessageIndex: (uint64) 1012,
    theirMessageIndex: (uint64) 955,
    ourHtlcIndex: (uint64) 280,
    theirHtlcIndex: (uint64) 676,
    ourBalance: (lnwire.MilliSatoshi) 9454550956 mSAT,
    theirBalance: (lnwire.MilliSatoshi) 300876844 mSAT,

outgoingHTLCs: len=1
   Timeout: (uint32) 751840,
   Amount: (lnwire.MilliSatoshi) 32425551 mSAT,
   LogIndex: (uint64) 1008,
   HtlcIndex: (uint64) 279,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 3,

incomingHTLCs: len=5
   Timeout: (uint32) 752149,
   Amount: (lnwire.MilliSatoshi) 100014250 mSAT,
   LogIndex: (uint64) 944,
   HtlcIndex: (uint64) 665,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 5,

   Timeout: (uint32) 752036,
   Amount: (lnwire.MilliSatoshi) 50000051 mSAT,
   LogIndex: (uint64) 951,
   HtlcIndex: (uint64) 672,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 4,
  
   Timeout: (uint32) 752251,
   Amount: (lnwire.MilliSatoshi) 30002761 mSAT,
   LogIndex: (uint64) 952,
   HtlcIndex: (uint64) 673,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 2,
  
   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 17954862 mSAT,
   LogIndex: (uint64) 953,
   HtlcIndex: (uint64) 674,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 1,
  
   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 11000725 mSAT,
   LogIndex: (uint64) 954,
   HtlcIndex: (uint64) 675,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 0,


pending remote commit:
   height: (uint64) 1544,
   ourMessageIndex: (uint64) 1014,
   ourHtlcIndex: (uint64) 280,
   theirMessageIndex: (uint64) 955,
   theirHtlcIndex: (uint64) 676,
   ourBalance: (lnwire.MilliSatoshi) 9555187206 mSAT,
   theirBalance: (lnwire.MilliSatoshi) 350876895 mSAT,

outgoingHTLCs: len=1
   Timeout: (uint32) 751840,
   Amount: (lnwire.MilliSatoshi) 32425551 mSAT,
   LogIndex: (uint64) 1008,
   HtlcIndex: (uint64) 279,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 3,

incomingHTLCs: len=3
   Timeout: (uint32) 752251,
   Amount: (lnwire.MilliSatoshi) 30002761 mSAT,
   LogIndex: (uint64) 952,
   HtlcIndex: (uint64) 673,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 2,

   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 17954862 mSAT,
   LogIndex: (uint64) 953,
   HtlcIndex: (uint64) 674,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 1,

   Timeout: (uint32) 752076,
   Amount: (lnwire.MilliSatoshi) 11000725 mSAT,
   LogIndex: (uint64) 954,
   HtlcIndex: (uint64) 675,
   localOutputIndex: (int32) 0,
   remoteOutputIndex: (int32) 0,

remoteUnsignedLocalUpdates

LogIndex: (uint64) 1009,
ID: (uint64) 670,
Settle

LogIndex: (uint64) 1010, 
ID: (uint64) 669,
Fail

LogIndex: (uint64) 1011,
ID: (uint64) 671, 
Fail
@Crypt-iQ

if somehow the link.Start has been invoked multiple times

It's a bit trickier: one link can still exist, unable to send messages, yet still be processing a message (including invoking SignNextCommitment / ReceiveRevocation / AdvanceCommitChainTail / etc.) that was handed to the link right before the connection hit EOF. The other link, on the new connection, can send and process like a normal link at the same time.

  1. prepare unsigned local updates

Not sure what you mean here; the remoteUnsignedLocalUpdates updates are done in UpdateCommitment (local revoke) and AdvanceCommitChainTail (remote revoke).

filter remoteUnsignedLocalUpdates

I think this filtering step is actually a no-op. I couldn't figure out any situation in which it could happen. I think I added it to be safe, copying the logic from unsignedAckedUpdates.
