@austintackaberry
Created September 18, 2024 20:37
Braintrust JSONDiff Inaccurate 100%
sharecal:eval: > NODE_ENV=production pnpx braintrust eval "./evals/json-repro.eval.ts"
sharecal:eval:
sharecal:eval: .../Library/pnpm/store/v3/tmp/dlx-5175 | Progress: resolved 1, reused 0, downloaded 0, added 0
sharecal:eval: .../Library/pnpm/store/v3/tmp/dlx-5175 | Progress: resolved 56, reused 27, downloaded 0, added 0
sharecal:eval: .../Library/pnpm/store/v3/tmp/dlx-5175 | Progress: resolved 84, reused 62, downloaded 0, added 0
sharecal:eval: .../Library/pnpm/store/v3/tmp/dlx-5175 | +106 +++++++++++
sharecal:eval: .../Library/pnpm/store/v3/tmp/dlx-5175 | Progress: resolved 127, reused 106, downloaded 0, added 0
sharecal:eval: .../Library/pnpm/store/v3/tmp/dlx-5175 | Progress: resolved 127, reused 106, downloaded 0, added 106, done
sharecal:eval: Processing 1 evaluators...
sharecal:eval: Experiment main-1726691757 is running at https://www.braintrust.dev/app/sharecal/p/JSONDiff%20repro/experiments/main-1726691757
sharecal:eval: ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ | JSONDiff repro | 1% | 1/100 datapoints
sharecal:eval:
sharecal:eval:
sharecal:eval: =========================SUMMARY=========================
sharecal:eval: main-1726691757 compared to main-1726691587:
sharecal:eval: 100.00% 'JSONDiff' score (0 improvements, 0 regressions)
sharecal:eval:
sharecal:eval: 0.00s 'duration' (0 improvements, 0 regressions)
sharecal:eval:
sharecal:eval: See results for main-1726691757 at https://www.braintrust.dev/app/sharecal/p/JSONDiff%20repro/experiments/main-1726691757
Tasks: 6 successful, 6 total
Cached: 4 cached, 6 total
Time: 19.055s
import { Eval } from "braintrust";
import { JSONDiff } from "autoevals";

// Task output: a single event on 2023-08-01.
const testFn = () => {
  return [
    {
      start: new Date("2023-08-01T14:00:00.000Z"),
      end: new Date("2023-08-01T15:00:00.000Z"),
    },
  ];
};

Eval<any, any>("JSONDiff repro", {
  data: () => {
    return [
      {
        input: {
          rawText: "test1",
        },
        // Expected: the same event on 2023-08-03, two days after the task output.
        // The values differ, yet the run above reports a 100% JSONDiff score.
        expected: [
          {
            start: new Date("2023-08-03T14:00:00.000Z"),
            end: new Date("2023-08-03T15:00:00.000Z"),
          },
        ],
      },
    ];
  },
  task: async (input) => testFn(),
  scores: [JSONDiff],
});
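
One possible explanation for the 100% score, which this repro does not confirm, is that Date instances have no enumerable own properties. A comparison that walks object keys would then see both dates as empty objects. The sketch below illustrates that hypothesis and a possible workaround. `naiveKeyEquality` and `serializeDates` are hypothetical helpers written for this illustration; they are not part of braintrust or autoevals, and `naiveKeyEquality` is only a stand-in, not the actual JSONDiff implementation.

```typescript
// Hypothetical stand-in for a key-based structural comparison.
// ASSUMPTION: this is NOT the autoevals JSONDiff code; it only shows how a
// comparison driven by Object.keys can miss differences between Dates.
function naiveKeyEquality(a: unknown, b: unknown): boolean {
  if (a !== null && b !== null && typeof a === "object" && typeof b === "object") {
    const objA = a as Record<string, unknown>;
    const objB = b as Record<string, unknown>;
    // Date instances contribute no keys here, so two different Dates
    // fall through this loop without a single comparison.
    const keys = Object.keys(objA).concat(Object.keys(objB));
    for (const key of keys) {
      if (!naiveKeyEquality(objA[key], objB[key])) {
        return false;
      }
    }
    return true;
  }
  return a === b;
}

// Hypothetical helper: recursively replaces Date instances with ISO-8601
// strings so that a key-based comparison sees the actual values.
function serializeDates(value: unknown): unknown {
  if (value instanceof Date) {
    return value.toISOString();
  }
  if (Array.isArray(value)) {
    return value.map(serializeDates);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = serializeDates(source[key]);
    }
    return out;
  }
  return value;
}

// Same shapes as the repro: output on 2023-08-01, expected on 2023-08-03.
const output = [{ start: new Date("2023-08-01T14:00:00.000Z") }];
const expected = [{ start: new Date("2023-08-03T14:00:00.000Z") }];

// Number of enumerable own keys on a Date instance.
const dateKeyCount = Object.keys(new Date("2023-08-01T14:00:00.000Z")).length;

// Comparison on raw Dates versus on ISO strings.
const equalAsDates = naiveKeyEquality(output, expected);
const equalAsStrings = naiveKeyEquality(serializeDates(output), serializeDates(expected));

console.log({ dateKeyCount, equalAsDates, equalAsStrings });
```

If the hypothesis holds, returning `serializeDates(testFn())` from `task` and passing `expected` through the same helper may give the scorer strings it can actually compare. Whether JSONDiff then produces a lower score for this data is untested here.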