Eval provides a framework and leaderboard for benchmarking different language models against a standardized set of tests that measure capabilities such as reasoning, knowledge, and fluency.
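To make the idea concrete, a leaderboard of this kind boils down to scoring each model on each test suite and ranking by the aggregate. The sketch below is a hypothetical, minimal version of that loop; the function names, the exact-match metric, and the toy suites are all illustrative assumptions, not Eval's actual API.

```python
def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the model's answer matches the reference exactly."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0

def run_suite(model, suite):
    """Average a model's exact-match score over one test suite."""
    scores = [exact_match(model(item["prompt"]), item["answer"]) for item in suite]
    return sum(scores) / len(scores)

def leaderboard(models, suites):
    """Rank models by their mean score across all suites (highest first)."""
    results = {
        name: sum(run_suite(fn, s) for s in suites.values()) / len(suites)
        for name, fn in models.items()
    }
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: two stub "models" scored on a reasoning and a knowledge suite.
suites = {
    "reasoning": [{"prompt": "2+2?", "answer": "4"}],
    "knowledge": [{"prompt": "Capital of France?", "answer": "Paris"}],
}
models = {
    "model_a": lambda p: "4" if "2+2" in p else "Paris",
    "model_b": lambda p: "5",
}
ranking = leaderboard(models, suites)
print(ranking)  # model_a ranks first with a perfect score
```

Real harnesses swap in task-appropriate metrics (multiple-choice accuracy, log-likelihood, human preference) per suite, but the aggregate-and-rank structure stays the same.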
Eval focuses on benchmarking core model quality; it is not a full testing solution for downstream applications and interfaces built on top of models.
The standardized tests measure narrow academic metrics, whereas real-world applications also require validating business logic, data correctness, personalized performance, and similar concerns.
Eval ranks public reference models; it does not offer an environment for building custom tests against proprietary systems.