Overview
The goal
Understand and test how AI and Gen AI can help make the WhatsApp contact experience faster and more streamlined, for both customers and chat agents.
Throughout the process I focused on addressing the unique problems faced by the human agent teams and the customers of John Lewis. With this technology advancing so quickly, it can be tempting to build fancy features for their own sake, so I was keen for the project's goals and outcomes to be steered by the human experience (of both customers and human agents) and the intended business outcomes.
My contribution
Product design and product strategy
Responsible AI principles
User research
The team
1 × product manager
1 × product designer (me)
1 × business analyst
3 × engineers
Year
2023

Process
Laying down the groundwork
The design process for the AI WhatsApp chatbot involved several key stages. First, I negotiated a time-boxed discovery (a mix of workshops, observational studies and interviews, facilitated and led by me) to understand the needs and pain points of the agents in the customer service centres who were responding to customer messages.
First challenge: metrics to measure
I felt it was important to analyse the existing messaging transcripts. As this was the first time the analysis had been done, I worked with various teams to make sure it was done safely, and created a framework so the team could complete it faster in the future.
The existing metrics were very limited, so while creating an initial design for customer feedback functionality I also created a plan for implementing and baselining other metrics before launching our tests.
Shaping the hypotheses
Based on the insights from this research, I facilitated design workshops within the team to refine some of the initial ideas. These workshops helped shape the hypotheses for the design tests.
I also created a working prototype of the conversation, linked up with large language models, so that a more realistic experience could be shared with stakeholders. The design included features such as an order look-up that didn't need an order number, instead displaying a carousel of the customer's most recent orders.
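To make the "no order number needed" idea concrete, here is a minimal sketch of that look-up, assuming hypothetical order data and field names (`item`, `placed_at`); it does not reflect the real Salesforce or John Lewis APIs.

```python
# Illustrative sketch: shape a customer's most recent orders into
# simple carousel cards. Field names are assumptions for this example.

def recent_orders_carousel(orders, limit=3):
    """Return the newest `limit` orders as carousel card dicts."""
    # ISO date strings sort chronologically, so plain string sort works here
    newest_first = sorted(orders, key=lambda o: o["placed_at"], reverse=True)
    return [
        {"title": o["item"], "subtitle": f"Placed {o['placed_at']}"}
        for o in newest_first[:limit]
    ]

orders = [
    {"item": "Desk lamp", "placed_at": "2023-05-01"},
    {"item": "Sofa", "placed_at": "2023-06-10"},
    {"item": "Kettle", "placed_at": "2023-04-12"},
]
cards = recent_orders_carousel(orders, limit=2)
```

The customer then picks a card rather than typing an order number, which removes a common point of friction in the conversation.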
A continuing relationship with the human agents
From the beginning of the project I ensured we stayed grounded in the human experience of the agents in the customer service centres: through workshops, listening to their input on the designs, and keeping an open line of communication throughout.
Improvement plans
Upgrading to a newer version of the Salesforce back end, which would let us use more of the UI interactions, such as carousels for recent order look-up.
Whilst AI technology is constantly evolving, so are the ways in which we can ensure clarity and consistency of experience for users.
One of the challenges with intent-based chatbots is how to deal with queries that don't match any intent. This could be tackled through a combination of analysing the chatbot's "missed messages" (those it didn't understand) and feeding them back into training, or by using Gen AI to generate example utterances.
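A minimal sketch of that feedback loop, assuming a hypothetical intent matcher (the keyword matching here is a stand-in for a trained intent model, and all names are illustrative, not from a real chatbot framework):

```python
# Queries the bot could not match, queued for human review; the team
# could later use these (e.g. with Gen AI) to draft new example utterances.
missed_messages = []

INTENT_KEYWORDS = {
    "order_lookup": ["order", "delivery", "parcel"],
    "returns": ["return", "refund"],
}

def match_intent(query: str):
    """Naive keyword matcher standing in for a trained intent model."""
    words = query.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in words for k in keywords):
            return intent
    return None  # no intent matched

def handle(query: str) -> str:
    intent = match_intent(query)
    if intent is None:
        # Feedback loop: log the unmatched query before handing off
        missed_messages.append(query)
        return "handoff_to_agent"
    return intent
```

The key design point is that an unmatched query is never silently dropped: it is both handed to a human and captured as training material.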
However, holding such large amounts of training data affects both the performance and the cost of running the chatbot.
There will also be queries that are:
Simply out of scope of the customer service capabilities or the chatbot
Attempts to break the chatbot so it behaves in an undesirable way.
So I started to think about how, in future, I would work with the data science / machine learning engineering team to build a bot that could combine different ways of functioning (both Gen AI and intent-based training).
These queries could then be answered using a combination of:
- The traditional intent model
- RAG (retrieval-augmented generation), which references closed information sources such as documents and is much faster to train
- API calls to retrieve personal information, such as recent orders
- Gen AI to understand jailbreak and out-of-scope queries, as these can be difficult to predict with intents
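The combination above can be sketched as a simple router. The route names, threshold, and keyword heuristic below are all assumptions for illustration, not a real architecture:

```python
# Illustrative router for a hybrid bot: confident intent matches go to
# scripted flows or API calls, document-style questions go to RAG, and
# everything else falls through to a guarded Gen AI response.

POLICY_TERMS = {"policy", "warranty", "guarantee", "terms"}

def looks_like_policy_question(query: str) -> bool:
    """Crude stand-in for a retrieval-relevance check."""
    return bool(POLICY_TERMS & set(query.lower().split()))

def route(query: str, intent, confidence: float) -> str:
    if intent is not None and confidence >= 0.8:
        # Personal-data intents call internal APIs; others use intent flows
        return "call_orders_api" if intent == "order_lookup" else "intent_flow"
    if looks_like_policy_question(query):
        return "rag_answer"  # answer from closed documents via RAG
    # Out-of-scope and jailbreak attempts end up here, where the Gen AI
    # layer can respond safely or hand off to a human agent
    return "gen_ai_fallback"
```

Each branch maps to one bullet above, which keeps the trade-offs (cost, training effort, safety) visible per route.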
For example, in the future we could work out more accurate classifications: how confidently is a query matching a certain intent?
It would be great to work with the data science team to explore ways to classify the confidence levels we accept as a match, and what falls beneath that threshold (like the image below).
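One way to frame that exploration is as confidence bands; the thresholds below are illustrative placeholders a data science team would tune, not real figures from the project:

```python
# Sketch of confidence bands for intent matches. Above the upper band
# the bot answers automatically; in the middle it asks the customer to
# confirm; below it falls back to Gen AI or a human agent.

ACCEPT_THRESHOLD = 0.85   # illustrative value
CLARIFY_THRESHOLD = 0.55  # illustrative value

def classify_match(confidence: float) -> str:
    if confidence >= ACCEPT_THRESHOLD:
        return "accept"    # treat as a confident intent match
    if confidence >= CLARIFY_THRESHOLD:
        return "clarify"   # ask the customer to confirm the intent
    return "fallback"      # hand off or use the Gen AI fallback
```

The middle "clarify" band is the interesting design space: it turns an uncertain match into a quick confirmation question rather than a wrong answer.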






