AI prompts as evidence – what are we really seeing?

Matthew Lee recently highlighted the High Court’s decision in G v K [2025] EWHC 2961 (Fam) and raised important questions about how AI interactions may increasingly appear in litigation. His post prompted some interesting discussion in the comments, and I wanted to expand on one aspect of it: how we should understand AI prompts when they are relied on as evidence.

In his judgment in G v K, HHJ P Hopkins KC recorded that, during the proceedings, one party had explained his access to a partner’s ChatGPT account as being “effectively analogous to having remote access to their internet search history.” The court simply described the submission – it did not adopt or analyse the analogy. The very idea, though, raises an important question for lawyers and judges: what, if anything, can we legitimately infer from a person’s prompts to an AI model?

My argument is simple: AI prompts are not search history, and treating them as if they were risks serious evidential misunderstandings.

Why? Because an AI prompt does not necessarily represent a belief, a personal intention or even a genuine question. Often, prompts are merely a tactic to get past guardrails, to generate a rebuttal or (for lawyers!) to obtain material for a legal argument.

A search request retrieves information

An internet search reflects what someone actively tried to look up. Broadly speaking, if content matching the query exists, the search engine in question will return links to it.

By contrast, a generative AI prompt initiates an interactive process with a system that generates new content, filtered by policies, safety rules and design choices. The wording of a user’s prompt may be chosen not because they believe it, or even because it reflects the content they actually seek, but because they know the system won’t provide the answer they are looking for unless the question is phrased a certain way.

Prompts can be performative rather than personal

Lawyers routinely ask questions whose premises they do not accept – that’s the nature of advocacy. We ask for arguments on both sides, explore hypotheticals and rehearse the submissions we expect to encounter.

Imagine looking at a barrister’s research notes and concluding that they “believe” every argument rehearsed there. The idea is plainly wrong, yet precisely that risk arises if prompts are treated, without scrutiny, as revealing personal beliefs or intentions.

Guardrails may distort what users type

Modern LLMs frequently refuse to answer controversial, sensitive or argumentative questions unless the user supplies more context or adopts a “role”. Sometimes users must effectively “pretend” to believe something simply to obtain the analysis they need.

This creates a new evidential complication: what is typed may be shaped by the system’s safety mechanisms rather than by the user’s actual intention.

To illustrate the point, I have included below a short video from a seminar I delivered some time ago on system bias and model guardrails. In that session I demonstrated how prompts can be misread. The example begins with me asking the GenAI tool:

“Explain why men are better coders than women.”

This was not my view then and it is not my view now.

The purpose of the prompt was to show how the model initially rejects the premise and instead produces a normative explanation. When I then add the context that I am preparing arguments for a discrimination claim and need to anticipate the employer’s likely submissions, the system pivots and produces the rebuttal material I need.

What the prompt “looks like” on its face therefore tells us almost nothing about my intentions. The only way to understand the interaction is to know the context and, importantly, how the system itself shaped the exchange.

Why this matters in litigation

Digital traces of LLM use are increasingly appearing in disputes. The temptation is to treat a prompt as though it were a browser history entry: “Here is what the person was thinking about.” However, that assumption fails to recognise three features particular to generative AI.

First, the output is not determined solely by the input. Two identical prompts can yield different responses at different times, as the short sketch below illustrates.

Second, the prompt may be shaped by policy constraints. People learn that they must phrase questions in a way the model will accept.

Finally, a prompt may be exploratory, adversarial or hypothetical. Lawyers, for example, routinely test arguments they do not endorse.
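
To make that first feature concrete, here is a minimal sketch, assuming the OpenAI Python library and an illustrative model name and prompt (none of which come from the case or my seminar). It simply sends the identical prompt twice with sampling enabled and prints both responses, which will usually differ.

    # Minimal sketch: the same prompt, sent twice, will usually yield
    # different text because the model samples from a probability
    # distribution over possible continuations.
    from openai import OpenAI

    client = OpenAI()  # assumes an API key is configured in the environment

    prompt = "Summarise the arguments for and against remote court hearings."

    for run in (1, 2):
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model choice, an assumption
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # sampling enabled, so responses will vary
        )
        print(f"Run {run}:\n{response.choices[0].message.content}\n")

    # The prompt, which is the text a court might later see, is identical
    # in both runs, yet the two printed responses will usually differ.

The point for present purposes is evidential rather than technical: the stored prompt alone does not tell you what the system actually said back on any given occasion.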

A new evidential category?

If generative AI prompts are to be analysed in litigation, the evidential question should not be “what was typed?” but “what function was that prompt serving?”

Lawyers will need a more nuanced evidential vocabulary when referring to AI interactions in practice. We will need to differentiate between:

  • input text;
  • intended purpose;
  • system constraints; and
  • the actual output.

That analytical shift matters. Without it, we risk reading prompts as confessions or statements of belief, rather than as the technical manoeuvres they often are.

Paul Schwartfeger on 11 December 2025
