@covener
Last active September 16, 2024 20:01

newRoutingInfo is only ever assigned in the child process, where it is used for the initial routing info and all subsequent ones.

    case MT_S2C_BEGIN_ROUTING_INFO:
        TR_DEBUG(ctx,rec,"clientProcess: received begin routing info message\n");
        odrRoutingInfoRelease(odr->newRoutingInfo);
        odr->newRoutingInfo = odrRoutingInfoCreate(odr);
        break;

There is in fact no way to create a new odr->routingInfo in the server process; it is eternal.

If newRoutingInfo is always NULL in the server process, then pi->routingInfo is always reused across connectors:

    pi = createParseInfo(odr, odr->newRoutingInfo != NULL ? odr->newRoutingInfo : odr->routingInfo);

This means the objMap the server is working on is not replaced, and it's the objMap that retains info between redirects.

The objMap is also never wholesale destroyed on its own.
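
A minimal sketch of why the ternary above always lands on the same structure in the server process. The `RoutingInfo` and `Odr` structs here are hypothetical, stripped-down stand-ins (the real ones carry much more), and `selectRoutingInfo()` and `serverReusesRoutingInfo()` are illustration-only names; only the selection expression itself is taken from the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real structures. */
typedef struct RoutingInfo {
    int objCount;                 /* stands in for the objMap contents */
} RoutingInfo;

typedef struct Odr {
    RoutingInfo *routingInfo;     /* created once, never replaced in the server */
    RoutingInfo *newRoutingInfo;  /* only assigned in the child process */
} Odr;

/* Same expression that is passed to createParseInfo(). */
static RoutingInfo *selectRoutingInfo(const Odr *odr)
{
    assert(odr != NULL);
    return odr->newRoutingInfo != NULL ? odr->newRoutingInfo : odr->routingInfo;
}

/* Returns 1 if two consecutive connector sessions share one RoutingInfo. */
static int serverReusesRoutingInfo(void)
{
    RoutingInfo eternal = { 0 };
    Odr server = { &eternal, NULL };   /* newRoutingInfo stays NULL in the server */

    RoutingInfo *first = selectRoutingInfo(&server);   /* connector A */
    first->objCount += 3;                              /* objects learned from A */
    RoutingInfo *second = selectRoutingInfo(&server);  /* after redirect to B */

    /* Same pointer, and A's objects are still there. */
    return first == second && second->objCount == 3;
}
```

In a child process the same expression picks `newRoutingInfo` whenever it has been set, so only the child ever starts from a fresh structure.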

-/-

It appears the only way that entries can leave the objMap in the server process is to process a delete{} JSON operation, but there is no delete that happens when we get redirected from one connector to another.

Why is the lifecycle of this structure wrong? Because it is the amalgamation of objects from multiple connector groups (CGs), but the objects inside don't record which CG they came from, so they can't be taken out selectively. Each CG does have its own JSON, so perhaps when we are sending a non-delta, we could turn all the entries in the CG JSON into "del" operations to clear out the object map. Otherwise, we could have one objMap per connector and teach the lookups about it.
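
A rough sketch of the first idea, under the assumption that each entry can be tagged with the CG it came from. Instead of replaying the CG JSON as "del" operations, it purges by tag before applying a non-delta, which should have the same effect on the map. Everything here (`ObjMap`, `ObjEntry`, `objMapAdd`, `objMapPurgeCg`, `applyFullUpdate`, the fixed-size array) is hypothetical and not the real objMap API.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJS 32
#define MAX_KEY  64

/* Hypothetical entry: the key plus the connector group that supplied it. */
typedef struct ObjEntry {
    char key[MAX_KEY];
    int  cgId;
    int  inUse;
} ObjEntry;

typedef struct ObjMap {
    ObjEntry slots[MAX_OBJS];
} ObjMap;

static ObjEntry *objMapFind(ObjMap *m, const char *key)
{
    for (int i = 0; i < MAX_OBJS; i++) {
        if (m->slots[i].inUse && strcmp(m->slots[i].key, key) == 0) {
            return &m->slots[i];
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the key already exists, is too long, or the map is full. */
static int objMapAdd(ObjMap *m, const char *key, int cgId)
{
    assert(m != NULL && key != NULL);
    if (strlen(key) >= MAX_KEY || objMapFind(m, key) != NULL) {
        return -1;
    }
    for (int i = 0; i < MAX_OBJS; i++) {
        if (!m->slots[i].inUse) {
            strcpy(m->slots[i].key, key);
            m->slots[i].cgId = cgId;
            m->slots[i].inUse = 1;
            return 0;
        }
    }
    return -1;
}

/* Drops every entry owned by cgId; returns how many were removed. */
static int objMapPurgeCg(ObjMap *m, int cgId)
{
    int removed = 0;
    for (int i = 0; i < MAX_OBJS; i++) {
        if (m->slots[i].inUse && m->slots[i].cgId == cgId) {
            m->slots[i].inUse = 0;
            removed++;
        }
    }
    return removed;
}

/* Non-delta update: clear the CG's old entries first, then add the new ones.
 * Returns the number of adds that failed. */
static int applyFullUpdate(ObjMap *m, int cgId, const char *const keys[], int n)
{
    int failures = 0;
    objMapPurgeCg(m, cgId);
    for (int i = 0; i < n; i++) {
        if (objMapAdd(m, keys[i], cgId) != 0) {
            failures++;
        }
    }
    return failures;
}
```

Note this only helps when the stale entries and the new non-delta carry the same tag. If two CGs can describe the same key, the per-connector objMap variant is probably the safer direction.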

  • Why do we have an objMap in the server process in the first place?

    • We only set pi->callObjectHandlers=TRUE in the server process here:

          /*
           * Call main logic to process the json that we just received and return the vid (version ID).
           * The vid is per-OdrConnectorInternal.  It starts off at a value of 1 in the response to the initial POST.
           * Each delta message contains a new vid, which is incremented by one each time.
           * If we fail over to another connector, a new POST is sent and the vid starts at 1 again.
           */
          TR_DEBUG1(ctx,rec,"calling jsonHandle - resetRoutingInfo=%d\n",pi->resetRoutingInfo);
      
          pi->callObjectHandlers = odrIsServerProcess(odr);
      

      But we do not need the objects to rebuild the JSON, and we don't need them to make connections to CGs.

      It may be related to:

          /*
           * Anytime we see an object of a type that requires resetting the routing info,
           * we set the bit so that it can be checked upon return and the routing info
           * can be reset.  For example, whenever we change an application object or
           * a web module object, we must reset all routing info.  The reason for this is
           * that the request mapper and URLMatcher code is not able to add/remove in
           * a thread-safe way.  Once this is done, we can change this so that we don't
           * have to rebuild everything as often.
           */
          pi->resetRoutingInfo |= oh->requiresRoutingInfoReset;
          ...
          if (odrIsServerProcess(odr)) {
              TR_DEBUG(ctx,rec,"serverProcess: broadcast JSON to all client processes\n");
              if (pi->resetRoutingInfo) {
                  hmResetInfo(odr);
                  c2sResetInfo(odr,c);

                  /* ^^^ note: too late to clear the objmap, but hmResetInfo() is similar */

                  sendAllRoutingInfoToClient(odr,-1);
              } else {
                  OdrMessage msg;
                  /*
                   * Broadcast the json message to all client processes.
                   */
                  TR_DEBUG(ctx,rec,"serverProcess: broadcast JSON to all client processes\n");
                  msg.hdr.type = isDelta?MT_S2C_ROUTING_INFO_DELTA:MT_S2C_ROUTING_INFO;
                  msg.hdr.len = strlen(jsonStr)+1;
                  msg.body = jsonStr;
                  odrServerProcessBroadcastMessage(odr->serverProcess,&msg);
              }
          } else if (odrIsClientProcess(odr)) {
      
      

-/-

If all of the above is true, how does any connector failover scenario work?

Are we missing messages about "failed to add"? For example:

[14/Sep/2024:14:04:19.97311] 00000010 f01f4700 - ODR:DEBUG: failed to add cell object which already exists: /cell/cell1

Does it somehow only manifest in this sequence?

  • server1 is stopped
  • attached node agent killed and server1 started simultaneously?
  • reconnect completes and v1 JSON cannot "add" server1
  • new child is spawned and inherits the JSON
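
To make the suspected sequence concrete, here is a toy replay of it. This is a sketch under assumptions, not the real code: the `SrvState` field, the `ObjMap` layout, and the `addObj` / `setState` / `replayScenario` helpers are all hypothetical. The only behavior borrowed from the notes is that an add of an existing key fails and leaves the old entry in place.

```c
#include <assert.h>
#include <string.h>

#define N_SLOTS 8

typedef enum { SRV_STOPPED, SRV_STARTED } SrvState;   /* hypothetical state field */

typedef struct Obj {
    const char *key;
    SrvState    state;
    int         inUse;
} Obj;

typedef struct ObjMap {
    Obj slots[N_SLOTS];
    int failedAdds;        /* counts "failed to add ... which already exists" */
} ObjMap;

static Obj *findObj(ObjMap *m, const char *key)
{
    for (int i = 0; i < N_SLOTS; i++) {
        if (m->slots[i].inUse && strcmp(m->slots[i].key, key) == 0) {
            return &m->slots[i];
        }
    }
    return NULL;
}

/* Add fails (and changes nothing) when the key is already present. */
static int addObj(ObjMap *m, const char *key, SrvState state)
{
    assert(m != NULL && key != NULL);
    if (findObj(m, key) != NULL) {
        m->failedAdds++;
        return -1;
    }
    for (int i = 0; i < N_SLOTS; i++) {
        if (!m->slots[i].inUse) {
            m->slots[i].key = key;
            m->slots[i].state = state;
            m->slots[i].inUse = 1;
            return 0;
        }
    }
    return -1;
}

/* Delta-style update of an existing entry. */
static int setState(ObjMap *m, const char *key, SrvState state)
{
    Obj *o = findObj(m, key);
    if (o == NULL) {
        return -1;
    }
    o->state = state;
    return 0;
}

/* Replays the bullet list above; returns the state the map ends up holding. */
static SrvState replayScenario(ObjMap *m)
{
    const char *server1 = "/cell/cell1/server/server1";

    addObj(m, server1, SRV_STARTED);    /* initial POST via the first connector */
    setState(m, server1, SRV_STOPPED);  /* delta: server1 is stopped */

    /* Node agent killed, server1 restarted, reconnect to another connector.
     * Its v1 JSON is a full snapshot, so server1 arrives as an add. */
    addObj(m, server1, SRV_STARTED);    /* fails: entry already exists */

    return findObj(m, server1)->state;
}
```

If the real add path behaves like `addObj()` here, the map keeps reporting server1 as stopped after the reconnect, and a newly spawned child would inherit that view.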