See ‣ for more details.

Convert Questions

Convert existing cases into a format that can be easily run through an LLM engine.
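
As a rough sketch of what that conversion could look like, the snippet below flattens a case into a prompt/target record and writes the records as JSON Lines. The `Case` dataclass, its field names, and the example case are all hypothetical stand-ins for illustration; they are not the actual class layout in `pyetr/cases.py`, and JSON Lines is only one plausible target format.

```python
import json
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical stand-in for one reasoning case (not the PyETR class layout)."""
    name: str
    premises: list[str]
    question: str
    expected: str


def case_to_record(case: Case) -> dict:
    """Flatten a case into a prompt/target pair that an eval harness can consume."""
    prompt = "\n".join(case.premises) + "\n" + case.question
    return {"id": case.name, "prompt": prompt, "target": case.expected}


def cases_to_jsonl(cases: list[Case]) -> str:
    """Serialize cases as JSON Lines, one record per line."""
    return "\n".join(json.dumps(case_to_record(c)) for c in cases)


# Made-up example case, used only to show the shape of the output.
example = Case(
    name="example_case",
    premises=["Either there is an ace or there is a king.", "There is no ace."],
    question="What follows?",
    expected="There is a king.",
)
jsonl = cases_to_jsonl([example])
```

Keeping the premises and question together in a single `prompt` string means the same records can be fed to any engine that takes plain text, at the cost of losing the premise boundaries.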

Data

See https://github.com/dreamingspires/PyETR/blob/master/pyetr/cases.py for a list of cases.

GPT3logs.docx

Generated Questions

See https://github.com/Oxford-HAI-Lab/PyETR/blob/master/lm_eval/data_generation/random_logical/generated/generated_cases_medium.py for the generated cases.

Running Evals

Create a harness for running questions through LLMs. See ‣ for details; the upshot is that I’m going to use LM Evaluation Harness, which should cover what we need.
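
To make the shape of the job concrete, here is a minimal sketch of what such a harness does: loop over prompt/target records, call a model, and score the answers. This is not the LM Evaluation Harness API; `run_eval`, `echo_last_line`, the record keys, and the exact-match scoring rule are all assumptions made for illustration, and the "model" is a dummy function so the snippet runs without any LLM.

```python
from typing import Callable


def run_eval(records: list[dict], model: Callable[[str], str]) -> float:
    """Return exact-match accuracy of `model` over prompt/target records."""
    if not records:
        return 0.0
    correct = 0
    for record in records:
        # Normalize whitespace and case before comparing answer to target.
        answer = model(record["prompt"]).strip().lower()
        if answer == record["target"].strip().lower():
            correct += 1
    return correct / len(records)


def echo_last_line(prompt: str) -> str:
    """Dummy 'model' that returns the last line of the prompt."""
    return prompt.splitlines()[-1]


# Two toy records: the dummy model gets the first right and the second wrong.
records = [
    {"prompt": "Repeat this line.\nyes", "target": "yes"},
    {"prompt": "Repeat this line.\nno", "target": "maybe"},
]
accuracy = run_eval(records, echo_last_line)
```

Exact match is the simplest scoring rule; free-form answers to reasoning questions would likely need something looser, which is part of what an existing harness is meant to handle.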

High Level Desiderata

What Type of Question

See ‣

High Level Goals

Data Collection