While debugging after refactoring some tests and test helpers, I found myself struggling to make use of the debug log output I was adding: it easily gets lost among the structured logging and the myriad other debug log lines that print during integration tests.
The threshold at which I would typically write a script to solve a one-off problem like this is measured in many hours, or perhaps even days. Using LLMs to generate and test such a script can greatly reduce the time and directed attention required before you can actually use the script on your real problem (rather than debugging the script itself). So much so that I would argue it's worth reconsidering that threshold and giving LLMs a shot much earlier than one otherwise might.
Asking ChatGPT to accomplish the task directly resulted in failure for both GPT-3.5 and GPT-4, using the following prompt:
split the following log lines into 3 groups containing consecutive lines containing a peerID, then compare the peerIDs contained within each group to those in every other group and explain any discrepancies you see. Let's work step by step:
... (contents of log_lines.txt)
GPT-3.5 quickly ran out of tokens, so it had to be prompted to continue where it left off, and it introduced errors once it finally reached the comparison step.
GPT-4 took a very long time just to produce the groups. It also eventually ran out of tokens and had to be prompted to continue.
Continuing from GPT-3.5's incorrect discrepancy summary, I had the thought to prompt for code that would produce the solution, rather than for the solution itself:
write a python script which can produce a similar summary of the discrepancies given a text file which contains the log lines
The results are the contents of analyze_log.py.
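The generated script isn't reproduced inline here, but to give a sense of the kind of grouping and comparison it performs, here is a minimal sketch. The peer ID regex, the "a line without a peer ID ends a group" heuristic, and all names are assumptions about the log format, not the actual generated code.

```python
#!/usr/bin/env python3
"""Sketch of the kind of analysis analyze_log.py performs (hypothetical)."""
import re
import sys

# Assumed peer ID pattern; adjust to the real log format.
PEER_ID_RE = re.compile(r"peerID[=:]\s*(\S+)")


def group_peer_ids(lines):
    """Split lines into groups of consecutive lines that contain a peer ID."""
    groups = []
    current = set()
    for line in lines:
        match = PEER_ID_RE.search(line)
        if match:
            current.add(match.group(1))
        elif current:
            # A line without a peer ID closes the current group.
            groups.append(current)
            current = set()
    if current:
        groups.append(current)
    return groups


def summarize_discrepancies(groups):
    """Report peer IDs present in one group but missing from another."""
    for i, a in enumerate(groups):
        for j, b in enumerate(groups):
            if i == j:
                continue
            missing = a - b
            if missing:
                print(f"Group {i + 1} has peer IDs missing from group {j + 1}: {sorted(missing)}")


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        summarize_discrepancies(group_peer_ids(f.read().splitlines()))
```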
To ensure correctness and to speed up my understanding of the code, I asked it to write a test for the script it had produced; the contents are in analyze_logs_test.py.
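Again as a sketch rather than the generated test itself, a test along these lines would feed a few hand-written log lines through the grouping helper and assert on the resulting groups (the module name, sample lines, and expected groups below are illustrative assumptions that exercise the sketch above):

```python
"""Hypothetical sketch of a test in the spirit of analyze_logs_test.py."""
from analyze_log import group_peer_ids


def test_group_peer_ids_splits_on_non_peer_lines():
    lines = [
        "connected peerID=alpha",
        "connected peerID=beta",
        "--- boundary line with no peer id ---",
        "connected peerID=alpha",
    ]
    # Two groups: one before the boundary line, one after.
    assert group_peer_ids(lines) == [{"alpha", "beta"}, {"alpha"}]
```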
Love it!