What Questions to Ask?

Let’s consider this case from cases.py:

class e16(DefaultInference, BaseExample):
    """
    Example 16, p83

    P1 There is a ten and an eight and a four, or else there is a jack and a king and a queen, or else there is an ace.
    P2 There isn't a four.
    P3 There isn't an ace.
    """

    v: tuple[View, View, View] = (
        ps("{King()Jack()Queen(),Ace(),Four()Ten()Eight()}"),
        ps("{~Four()}"),
        ps("{~Ace()}"),
    )
    c: View = ps("{King()Jack()Queen()}")

How should we structure the question? I discuss the considerations for posing questions to LLMs in Running LLM Evals, then propose three formats:

  1. Multiple Choice (Natural Language)
  2. Multiple Choice (Encoded)
  3. Open Ended

Which of these should we do? All of them? What am I missing?

Running LLM Evals

Multiple Choice (Natural Language)

Question:

P1 There is a ten and an eight and a four, or else there is a jack and a king and a queen, or else there is an ace.
P2 There isn't a four.
P3 There isn't an ace.
What, if anything, follows?

Answer:

There is a jack and a king and a queen.

For multiple choice, the options would be the correct answer plus some wrong answers, something like this:

A. There is a jack and a king and a queen
B. There is a ten and an eight and a four
C. There is an ace
D. Nothing follows
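
As a rough sketch of how such a question could be assembled programmatically, the snippet below formats the premises and a shuffled set of options into one prompt and records which letter ends up correct. The names `MultipleChoiceQuestion` and `build_multiple_choice` are hypothetical helpers for illustration and are not part of cases.py.

```python
import random
from dataclasses import dataclass


@dataclass
class MultipleChoiceQuestion:
    prompt: str          # full text to send to the model
    correct_letter: str  # letter of the correct option after shuffling


def build_multiple_choice(premises, correct, distractors, seed=0):
    """Format premises and shuffled options into a single prompt.

    Hypothetical helper: the seed makes the option order reproducible.
    """
    options = [correct, *distractors]
    random.Random(seed).shuffle(options)
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    lines = [f"P{i} {p}" for i, p in enumerate(premises, start=1)]
    lines.append("What, if anything, follows?")
    lines.append("")
    for letter, option in zip(letters, options):
        lines.append(f"{letter}. {option}")

    correct_letter = letters[options.index(correct)]
    return MultipleChoiceQuestion("\n".join(lines), correct_letter)


# Example 16 from the text above.
question = build_multiple_choice(
    premises=[
        "There is a ten and an eight and a four, or else there is a jack "
        "and a king and a queen, or else there is an ace.",
        "There isn't a four.",
        "There isn't an ace.",
    ],
    correct="There is a jack and a king and a queen",
    distractors=[
        "There is a ten and an eight and a four",
        "There is an ace",
        "Nothing follows",
    ],
)
print(question.prompt)
```

Shuffling the options is a design choice rather than a requirement: it is meant to reduce the chance that a model scores well simply by favouring a particular position, at the cost of having to track the correct letter per question.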

Multiple Choice (Encoded)

This poses the same question as the natural-language version, but written in a formal encoding such as first-order logic or the ETR representation.

LLMs will likely struggle with this, especially if we use an uncommon encoding, but we can compensate with few-shot prompting (including worked examples in the prompt) or by explaining the format in the question.
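
A minimal sketch of how the two compensations could be combined: a short explanation of the format, followed by worked examples, followed by the target question. The helper `build_encoded_prompt` is hypothetical, the format explanation is only my reading of the e16 view strings above, and the single worked example is an invented illustration rather than a case from cases.py.

```python
def build_encoded_prompt(format_explanation, shots, premises):
    """Assemble a format explanation, worked examples, and the target question.

    Hypothetical helper. `shots` is a list of (premises, conclusion) pairs,
    where premises is a list of encoded strings.
    """
    parts = [format_explanation.strip(), ""]
    for shot_premises, shot_conclusion in shots:
        parts.append("Premises:")
        parts.extend(shot_premises)
        parts.append(f"Conclusion: {shot_conclusion}")
        parts.append("")
    parts.append("Premises:")
    parts.extend(premises)
    parts.append("Conclusion:")
    return "\n".join(parts)


# Assumed explanation, inferred from how e16 encodes its premises.
explanation = (
    "Each premise is a set written in braces. Commas separate alternatives, "
    "atoms written next to each other hold together, and ~ negates an atom."
)

# Invented worked example for illustration only.
shots = [(["{A(),B()}", "{~A()}"], "{B()}")]

# The encoded premises of e16, copied from the case above.
e16_premises = [
    "{King()Jack()Queen(),Ace(),Four()Ten()Eight()}",
    "{~Four()}",
    "{~Ace()}",
]

encoded_prompt = build_encoded_prompt(explanation, shots, e16_premises)
print(encoded_prompt)
```

Ending the prompt on an open "Conclusion:" line leaves the model to complete the answer in the same encoding as the worked examples; for the multiple-choice variant, the encoded options would be appended instead.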

Question: