Human-Centred AI UX
Traditional UX research methods aren't sufficient for LLM products. This article explores specialized techniques: prompt probes, comprehension testing, expectation mapping, and trust calibration. These methods help teams understand how users interact with AI, what they genuinely comprehend, and whether they are placing appropriate trust in AI assistants.
6/23/2025 · 4 min read


As large language models reshape digital experiences, traditional UX research methods are struggling to keep pace. The probabilistic nature of AI outputs, combined with users' evolving mental models of what machines can do, demands a fundamentally different approach to understanding user needs and behaviors. For teams building LLM-powered products, the challenge isn't just creating functional interfaces—it's developing research practices that account for the unique dynamics of human-AI interaction.
The New Research Landscape
The literature on AI-mediated interaction has expanded so quickly that researchers struggle to stay current. At the same time, AI systems require designers to account for informed use, misinformed use, and non-use, a distinction traditional software design rarely had to make. This complexity means researchers must employ specialized techniques to understand how users form expectations, comprehend AI capabilities, and calibrate their trust appropriately.
Prompt Probes: Understanding User Intent
Prompt probes represent a critical research method for LLM products—techniques that reveal how users naturally communicate their needs to AI systems. Rather than simply analyzing what users type into a chatbot, prompt probing involves structured, iterative approaches that follow frameworks like R.E.F.I.N.E. (Role, Expectations, Format, Iterate, Nuance, Example). This methodology helps researchers understand the mental models users bring to AI interactions.
In practice, prompt probes involve observing users as they craft requests to LLM systems, then asking them to explain their reasoning. Why did they phrase it that way? What did they expect the system to understand? What information did they assume they needed to provide? These insights reveal gaps between how users think LLMs work and how they actually function, informing both interface design and system prompting strategies.
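To make these observations comparable across participants, it can help to log each prompt iteration alongside the user's stated reasoning. The sketch below is one possible structure in Python; the field names and the idea of tagging each iteration against R.E.F.I.N.E. dimensions are assumptions for illustration, not a standard instrument.

```python
from dataclasses import dataclass, field

# Hypothetical tag set based on the R.E.F.I.N.E. dimensions named above.
REFINE_DIMENSIONS = {"role", "expectations", "format", "iterate", "nuance", "example"}

@dataclass
class PromptIteration:
    """One attempt a participant makes at phrasing a request to the LLM."""
    prompt_text: str
    stated_reasoning: str              # why the user phrased it this way
    assumed_knowledge: str             # what the user assumed the system already knew
    refine_tags: set[str] = field(default_factory=set)

    def __post_init__(self):
        unknown = self.refine_tags - REFINE_DIMENSIONS
        if unknown:
            raise ValueError(f"Unknown R.E.F.I.N.E. tags: {unknown}")

@dataclass
class PromptProbeSession:
    """A single participant's observed sequence of prompt iterations."""
    participant_id: str
    task: str
    iterations: list[PromptIteration] = field(default_factory=list)

    def dimension_coverage(self) -> dict[str, int]:
        """Count how often each dimension appears across iterations,
        showing which aspects of prompting participants attend to or ignore."""
        counts = {d: 0 for d in REFINE_DIMENSIONS}
        for iteration in self.iterations:
            for tag in iteration.refine_tags:
                counts[tag] += 1
        return counts
```

Aggregating coverage counts across sessions makes gaps visible, for example participants who never specify a role or format, which in turn suggests where the interface should scaffold the prompt for them.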
Teams are now experimenting with personalized prototypes that make experiences feel more real, with some practitioners moving toward co-creation sessions where users build solutions suited to their specific contexts. This shift from static mockups to dynamic, AI-powered prototypes enables more authentic testing of prompt patterns and conversational flows.
Comprehension Testing: Measuring True Understanding
Just because an LLM response seems fluent doesn't mean users understand it. Comprehension testing for AI products goes beyond traditional usability metrics to assess whether users genuinely grasp what the system is telling them. Research testing language models on simple comprehension questions found that models perform at chance accuracy and waver considerably in their answers, highlighting that even sophisticated outputs may not convey actual understanding.
For UX researchers, this means implementing think-aloud protocols where users paraphrase AI responses in their own words. Can they explain the reasoning behind a recommendation? Do they understand the confidence level being expressed? Can they identify when information might be incomplete or uncertain?
Conversation quality metrics measure the chatbot's understanding, accuracy, and self-awareness—functioning as an IQ test that reveals whether the system is responding intelligently and appropriately. Researchers should apply similar scrutiny to user comprehension, testing whether people can accurately recall, apply, and evaluate the information they receive from LLM interfaces.
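One lightweight way to apply that scrutiny is to score each paraphrase against a small rubric. The snippet below is a minimal sketch; the rubric dimensions and 0-2 scale are assumptions a team would replace with its own coding scheme.

```python
from statistics import mean

# Hypothetical rubric: each dimension scored 0-2 by a researcher
# after the participant paraphrases an AI response aloud.
RUBRIC = ("recall", "application", "uncertainty_awareness")

def comprehension_score(ratings: dict[str, int]) -> float:
    """Average the rubric dimensions into a 0-1 comprehension score."""
    missing = [d for d in RUBRIC if d not in ratings]
    if missing:
        raise ValueError(f"Missing rubric dimensions: {missing}")
    return mean(ratings[d] for d in RUBRIC) / 2.0

# Example: a participant recalled the recommendation accurately (2),
# applied it only partially (1), and missed the stated uncertainty (0).
print(comprehension_score({"recall": 2, "application": 1, "uncertainty_awareness": 0}))  # 0.5
```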
Expectation Mapping: Aligning Mental Models
One of the most critical challenges in LLM UX is managing user expectations. The goal shouldn't be to maximize trust at all costs, but to achieve well-calibrated trust—a state where users neither over-trust nor under-trust the system. Expectation mapping helps researchers visualize the gap between what users think an AI can do and its actual capabilities.
This technique involves assessing users' expectations and understanding of what the product can and cannot do, helping them set the appropriate level of trust rather than implicitly trusting the system in all circumstances. Researchers create expectation maps by conducting interviews before, during, and after users interact with LLM features, documenting:
What capabilities users assume the system has
Which tasks they believe are appropriate for AI assistance
How they expect the system to handle edge cases or failures
What level of accuracy they anticipate
Pre-interaction calibration, setting expectations before users ever engage with the system, can prevent initial over-trust, which is disproportionately difficult to correct later. By mapping these expectations against actual system behavior, teams can identify where educational interventions, UI signals, or product changes are needed.
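In its simplest form, an expectation map is a comparison between what users believe the system can do and what it reliably does. The sketch below shows one way to surface the gaps; the capability names and boolean framing are illustrative assumptions rather than a prescribed schema.

```python
# Hypothetical expectation map: per capability, whether participants expect it
# and whether the system actually supports it reliably.
expected = {"cite_sources": True, "real_time_data": True, "summarize_docs": True, "refuse_unsafe": False}
actual   = {"cite_sources": False, "real_time_data": False, "summarize_docs": True, "refuse_unsafe": True}

over_expectation  = [c for c in expected if expected[c] and not actual.get(c, False)]
under_expectation = [c for c in actual if actual[c] and not expected.get(c, False)]

print("Needs expectation-lowering (risk of over-trust):", over_expectation)
print("Needs discovery or education (risk of under-trust):", under_expectation)
```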
Detecting Trust Miscalibration
Perhaps the most dangerous UX failure in AI products is when users trust systems either too much or too little. Users shouldn't implicitly trust AI systems in all circumstances, but rather calibrate their trust correctly—research shows examples both of algorithm aversion and of people over-trusting AI systems.
To detect over-trust, researchers should observe users in scenarios where the AI is likely to fail or provide incomplete information. Do users verify outputs against other sources? Do they notice when responses seem inconsistent or vague? What differentiates correct use of automation from misuse, disuse, and abuse is the user's ability to monitor the system, and to monitor something, users must understand how it works.
For under-trust, look for patterns where users abandon helpful AI features or refuse to engage with capabilities that would benefit them. Integrating user research with quantitative metrics ensures evaluations represent not only performance but also perception and impact. Exit interviews with users who stopped using features can reveal whether trust signals created false expectations or whether legitimate concerns weren't addressed.
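Both failure modes can be tracked with simple session-level indicators. The sketch below, assuming a study with seeded AI failures, is one rough way to quantify them; the observation fields and metric names are assumptions, not established measures.

```python
from dataclasses import dataclass

@dataclass
class TaskObservation:
    """One observed task in a study session that includes seeded AI failures."""
    ai_was_wrong: bool       # the researcher seeded or knew the output was flawed
    user_verified: bool      # participant checked the output against another source
    user_accepted: bool      # participant acted on the AI output
    used_ai_feature: bool    # participant engaged the AI at all when it would have helped

def trust_calibration_signals(observations: list[TaskObservation]) -> dict[str, float]:
    """Rough indicators of over- and under-trust for one session."""
    wrong = [o for o in observations if o.ai_was_wrong]
    over_trust = sum(o.user_accepted and not o.user_verified for o in wrong) / max(len(wrong), 1)
    under_trust = sum(not o.used_ai_feature for o in observations) / max(len(observations), 1)
    return {"over_trust_rate_on_failures": over_trust, "ai_avoidance_rate": under_trust}
```

A high over-trust rate on seeded failures signals that users accept flawed outputs unverified, while a high avoidance rate flags candidates for the exit interviews described above.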
Building a Research Practice
Implementing these methods requires establishing clear goals, requirements, and testing parameters from the outset to ensure aligned objectives across research, design, product management, and engineering teams. Unlike traditional products, AI experiences demand more frequent testing and customer validation given their tendency to produce variable outputs.
Using LLMs to process interview data can cut research analysis time by roughly 20%, though the approach requires clear structure and clearly hypothesized outcomes, and it remains highly reliant on researcher skill to catch errors. This suggests that while AI can augment research processes, the human researcher's expertise in interpretation and validation remains essential.
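As a minimal sketch of what that augmentation can look like, the snippet below asks an LLM to propose codebook tags for an interview excerpt and leaves the final coding decision to a researcher. The codebook, model name, and prompt wording are assumptions for illustration; any LLM API would work the same way.

```python
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()

# Hypothetical codebook a team might use for LLM-product interviews.
CODEBOOK = ["over-trust", "under-trust", "capability confusion", "prompt strategy"]

def tag_excerpt(excerpt: str) -> str:
    """Ask the model to propose codebook tags for one interview excerpt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": f"Tag this user-research excerpt with codes from this codebook: {CODEBOOK}. "
                        "Return only the codes, comma-separated."},
            {"role": "user", "content": excerpt},
        ],
    )
    return response.choices[0].message.content

# Every machine-proposed tag is treated as a draft: a researcher reviews and
# corrects it before it enters the analysis, which is where the time saving
# comes from without giving up validity.
draft_tags = tag_excerpt("I just assumed it had read our whole help centre, so I didn't double-check.")
print("Proposed tags for researcher review:", draft_tags)
```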
The intersection of AI capability and human understanding is where great LLM products are built. By employing prompt probes to understand intent, comprehension testing to verify understanding, expectation mapping to align mental models, and trust calibration techniques to ensure appropriate reliance, UX researchers can navigate the unique challenges of this technology. As AI continues to evolve, these research methods will be essential for creating experiences that genuinely serve human needs rather than merely showcasing technical capabilities.

