Let’s consider this case from cases.py:
class e16(DefaultInference, BaseExample):
    """
    Example 16, p83
    P1 There is a ten and an eight and a four, or else there is a jack and a king and a queen, or else there is an ace.
    P2 There isn't a four.
    P3 There isn't an ace.
    """
    v: tuple[View, View, View] = (
        ps("{King()Jack()Queen(),Ace(),Four()Ten()Eight()}"),
        ps("{~Four()}"),
        ps("{~Ace()}"),
    )
    c: View = ps("{King()Jack()Queen()}")
How should we structure the question? I discuss the considerations around asking LLMs questions in Running LLM Evals, and below I propose three formats. Which of these should we do? All of them? What am I missing?
Question:
P1 There is a ten and an eight and a four, or else there is a jack and a king and a queen, or else there is an ace. P2 There isn't a four. P3 There isn't an ace. What, if anything, follows?
Answer:
there is a jack and a king and a queen
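A minimal sketch of how that free-form version might be generated and graded, assuming we build the prompt from the premise strings and grade by a crude case-insensitive substring match against the expected conclusion; the helper names here are hypothetical, not part of the codebase:

def build_freeform_prompt(premises: list[str]) -> str:
    # Number the premises P1, P2, ... and append the standard question.
    numbered = " ".join(f"P{i} {p}" for i, p in enumerate(premises, start=1))
    return f"{numbered} What, if anything, follows?"

def grade_freeform(model_answer: str, expected: str) -> bool:
    # Crude first pass: accept any answer containing the expected conclusion.
    return expected.lower() in model_answer.lower()

premises = [
    "There is a ten and an eight and a four, or else there is a jack and a king and a queen, or else there is an ace.",
    "There isn't a four.",
    "There isn't an ace.",
]
print(build_freeform_prompt(premises))
print(grade_freeform("There is a jack and a king and a queen.", "there is a jack and a king and a queen"))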
For multiple choice, we add wrong answers alongside the correct one, something like this:
A. There is a jack and a king and a queen
B. There is a ten and an eight and a four
C. There is an ace
D. Nothing follows
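A sketch of how those options might be assembled into a graded multiple-choice item; shuffling per question keeps the correct letter from always landing on A (again, the names here are hypothetical):

import random

OPTIONS = [
    ("There is a jack and a king and a queen", True),   # correct conclusion
    ("There is a ten and an eight and a four", False),
    ("There is an ace", False),
    ("Nothing follows", False),
]

def build_mc_item(question: str, options=OPTIONS, seed: int = 0) -> tuple[str, str]:
    # Shuffle so the correct answer isn't always in the same slot.
    shuffled = options[:]
    random.Random(seed).shuffle(shuffled)
    letters = "ABCD"
    lines = [f"{letter}. {text}" for letter, (text, _) in zip(letters, shuffled)]
    correct_letter = next(letter for letter, (_, ok) in zip(letters, shuffled) if ok)
    prompt = question + "\n" + "\n".join(lines)
    return prompt, correct_letter

prompt, correct = build_mc_item(
    "P1 There is a ten and an eight and a four, or else there is a jack and a king "
    "and a queen, or else there is an ace. P2 There isn't a four. P3 There isn't an ace. "
    "What, if anything, follows?"
)
print(prompt)
print("Correct letter:", correct)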
This is just like asking in natural language, except that we use a formal encoding language such as first-order logic or the ETR representation instead.
LLMs will struggle with this, especially if we use an uncommon encoding, but we can compensate with multi-shot prompting or by explaining the format in the question, along the lines of the sketch below.
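Here is a sketch of what both compensations might look like for the ETR string encoding, judging only from the ps(...) strings above (commas separate the "or else" alternatives, adjacent predicates hold together, and ~ is negation); the format note and the worked shot are hypothetical illustrations, not taken from the book:

FORMAT_NOTE = (
    "Premises are sets of alternatives in braces: alternatives are separated by commas, "
    "predicates written next to each other all hold together, and ~ means not. "
    "Answer in the same notation, or say 'nothing follows'."
)

# One hypothetical worked example (a "shot") in the same encoding.
FEW_SHOT = [
    ("P1 {Two()Three(),Five()}\nP2 {~Five()}\nWhat, if anything, follows?",
     "{Two()Three()}"),
]

def build_encoded_prompt(premises: list[str], shots=FEW_SHOT) -> str:
    # Format explanation first, then the worked shots, then the real question.
    parts = [FORMAT_NOTE]
    for shot_question, shot_answer in shots:
        parts.append(f"{shot_question}\nAnswer: {shot_answer}")
    numbered = "\n".join(f"P{i} {p}" for i, p in enumerate(premises, start=1))
    parts.append(f"{numbered}\nWhat, if anything, follows?\nAnswer:")
    return "\n\n".join(parts)

print(build_encoded_prompt([
    "{King()Jack()Queen(),Ace(),Four()Ten()Eight()}",
    "{~Four()}",
    "{~Ace()}",
]))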
Question: