AI-powered English Speaking Tutor
Role
Senior Product Designer
Team
Product Designer (me)
Product Manager
AI/ML Engineers
Full-stack Developer
Timeline
Research & Design — 5 weeks
Development — MVP in progress
My contribution
Product Strategy & Discovery
User Research (Interviews, JTBD, Personas)
AI Interaction Logic (ASR/LLM Prompting flows)
Information Architecture & UX Scenarios
UI Design System
Context
A cross-platform, AI-powered speaking tutor that helps users maintain operational fluency through daily 3–5 minute monologue practice followed by instant, professional-grade feedback. The product automates deep analysis of phonetics, grammar, and CEFR-aligned metrics, helping users overcome the "frozen speech effect" without a live tutor.
The Problem
Users at the Intermediate (B1) level and above face a challenge that goes beyond simply "learning" English: keeping their speech operationally ready. Through deep-dive interviews and affinity mapping, I found that the primary struggle isn't a lack of knowledge but the "Rusted" Speaking Muscle.
Key Insights & Strategic Pivots
Maintenance Over Learning: For B2+ users, speaking is a perishable skill; they don't need another academic course, but a 5-minute daily "warm-up" to keep their brain from feeling "empty" during actual meetings.
The Active vs. Passive Gap: While users can consume complex content (movies, articles), they lack the neural pathways to produce speech fluently, creating a frustrating disconnect between what they understand and what they can say.
The Feedback Void: In professional environments, peers are often too polite to correct mistakes, leading to a false sense of fluency while critical grammar and pronunciation defects persist.
Professionalism Over Gamification: High-level professionals (Engineers, PMs, Designers) are demotivated by "childish" rewards; they demand rigorous, teacher-like analysis across five metrics: fluency, pronunciation, grammar, vocabulary, and coherence.
Contextual Safety: Users require a "judgment-free sandbox" to make mistakes and build confidence before transitioning to high-stakes real-world interactions.
"I know the words, but when I start talking, my brain feels empty. I need to 'wake up' my speech before the actual meeting." — Insight from User Interviews.
Research Synthesis
I used the Affinity Diagramming method to synthesize qualitative data from user interviews. This process allowed me to identify a critical gap in current solutions: most apps focus on passive learning, while users suffer from 'The Speaking Barrier'. This insight led to the creation of the Recorder-first architecture, prioritizing active production over passive consumption.
Strategic Goal
To develop an MVP that transforms the stress of public speaking into a controlled daily habit. The goal is to provide professional-grade error analysis and a seamless difficulty gradient across topics, ensuring consistent growth and long-term user engagement.
Design Strategy
Bridging the Gap Between Passive and Active Fluency. Based on the identified pain points, my design strategy rested on two pillars: minimizing cognitive load at the point of entry and maximizing the value density of the output.
1. Eliminating Decision Fatigue
(Home Screen)
The Insight: Users often practice after a long workday and suffer from "choice paralysis" when faced with complex lesson catalogs.
The Decision: I prioritized a single, clear "Start Speaking" action combined with a "Daily Challenge" prompt.
The Value: This approach reduces friction, turning a high-effort task into a low-barrier daily ritual.
2. Supporting Cognitive Flow
(The Recorder)
The Insight: The "brain feels empty" phenomenon occurs when users lose their train of thought during a monologue.
The Decision: I implemented dynamic Hint Cards and talking points that guide the user without dictating a script.
The Value: By providing these "scaffolds," the design reduces the stress of speaking into a void and helps the user reach the 3–5 minute target duration.
3. Professional-Grade Feedback
(Results & Analysis)
The Insight: Senior professionals are demotivated by generic praise; they value objective, actionable data that mirrors a real tutor's feedback.
The Decision: I designed a "Professional Coaching" dashboard featuring color-coded error categories (Grammar, Pronunciation, Vocabulary) and a "Top 3 Mistakes" summary.
The Value: Aligning results with the CEFR scale provides users with a globally recognized benchmark of their growth, shifting the focus from "playing a game" to "professional development".
4. Bridging the "Content Leap"
(The Library)
The Insight: Moving from basic daily English to abstract or literary topics is too jarring for most intermediate learners.
The Decision: I structured the library with a granular difficulty gradient, ranging from "Ordering a Taxi" to "Reviewing a Philosophy Book".
The Value: This allows users to scale their complexity at their own pace, preventing the demotivation caused by sudden spikes in difficulty.
5. Orchestrating AI for Seamless UX
The Insight: High-quality LLM analysis takes time (latency), and waiting for results can break the user's flow.
The Decision: I introduced an Iterative Loading State. The user sees the transcribed text immediately (via ASR), while the deep linguistic analysis loads progressively in the background.
The Value: This "Optimistic UI" approach makes the technology feel instantaneous, even when performing complex processing under the hood.
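To make this iterative loading concrete, here is a minimal sketch of how the progressive result states could be modeled on the client. The state names and fields are my illustrative assumptions, not the production data model.

```typescript
// Minimal sketch of the iterative loading states behind the results screen.
// State names and fields are illustrative assumptions, not the shipped model.
type ResultStage =
  | { stage: "recording" }                                // user is speaking
  | { stage: "transcribing" }                             // ASR is streaming text
  | { stage: "transcriptReady"; transcript: string }      // shown immediately after stopping
  | { stage: "analysisReady"; transcript: string; summary: string }; // LLM output arrives last

// The screen always renders the richest stage it has received,
// so the user never stares at a blank loading spinner.
```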
Technical Logic & AI Orchestration: Behind the Screen
For an AI-driven product, design doesn't end with the interface. I designed the interaction logic between the frontend and backend to ensure that technological complexity never interferes with the user experience.
1. The Data Journey: From Voice to Insights
The process of turning raw audio into a structured professional report is divided into three synchronized stages:
Capture & Streaming: The React Native client captures audio and uses VAD (Voice Activity Detection) to filter out silence, sending only informative audio chunks over WebSockets.
Parallel ML Processing: While the recording is still in progress, the server runs real-time ASR (Whisper) to generate text. Once the recording ends, specialized workers run forced alignment for pronunciation analysis and LLM processing for grammatical and lexical evaluation.
Synthesis: An orchestrator aggregates results from all models into a single domain model (scores, mistakes, feedback) and delivers a final JSON payload to the frontend.
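Below is a hedged sketch of what that aggregated domain model could look like as a TypeScript contract. The field names, score scales, and CEFR values are assumptions for illustration; the actual payload may differ.

```typescript
// Hedged sketch of the aggregated report the orchestrator could deliver
// to the frontend. Field names and score scales are assumptions.
interface Mistake {
  category: "grammar" | "pronunciation" | "vocabulary";
  original: string;     // what the user actually said
  correction: string;   // suggested fix
  explanation: string;  // the "Explain & Fix" rationale
}

interface SpeakingScores {
  fluency: number;        // e.g. 0–100
  pronunciation: number;
  grammar: number;
  vocabulary: number;
  coherence: number;
  cefrEstimate: "B1" | "B2" | "C1" | "C2";
}

interface SessionReport {
  transcript: string;
  wordsPerMinute: number;
  scores: SpeakingScores;
  topMistakes: Mistake[];      // capped at 3–5 to protect motivation
  nextStepSuggestion: string;  // exercise recommendation for the next session
}
```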
2. Designing for Latency: UX vs. Processing Power
The main challenge in AI products is latency. To prevent the user from waiting in silence, I implemented several UX solutions:
Partial ASR (Streaming): Users see intermediate speech recognition in real-time, providing immediate feedback that the system is "listening".
Optimistic UI & Multi-Stage Loading: Results are delivered iteratively. The transcript and basic fluency metrics (WPM) appear first, while complex linguistic analysis from the LLM loads progressively as the model completes its task.
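As an illustration of this multi-stage delivery, here is a minimal sketch of how the client could consume staged results over the WebSocket connection. The message names and shapes are assumptions, not the real protocol.

```typescript
// Illustrative sketch of staged result delivery over a WebSocket.
// Message names and shapes are assumptions, not the real protocol.
type ServerMessage =
  | { type: "partial_transcript"; text: string }            // streamed while recording
  | { type: "transcript_final"; text: string; wpm: number } // available right after stopping
  | { type: "analysis_ready"; report: unknown };            // arrives once the LLM finishes

function listenForResults(
  socket: WebSocket,
  render: (update: ServerMessage) => void
): void {
  socket.onmessage = (event: MessageEvent) => {
    const message = JSON.parse(event.data as string) as ServerMessage;
    // Render each stage as soon as it arrives: live transcript first,
    // then WPM, then the full linguistic analysis.
    render(message);
  };
}
```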
3. Smart Prompting Logic
I collaborated on designing the LLM prompting logic to ensure a "Professional Coaching" tone. The system is tuned to:
Highlight no more than 3–5 key errors to maintain motivation.
Provide a clear "Explain & Fix" structure for every correction.
Suggest specific exercises for the next day based on identified gaps.
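For illustration, a system prompt enforcing these rules could look roughly like the sketch below. The wording is my approximation of the "Professional Coaching" tone, not the prompt actually shipped.

```typescript
// Hedged approximation of a "Professional Coaching" system prompt.
// The wording is illustrative; the production prompt is not shown here.
const COACHING_SYSTEM_PROMPT = `
You are a professional English speaking coach for B1–C1 learners.
Analyze the user's monologue transcript and respond with:
1. No more than 3–5 key errors, ranked by their impact on clarity.
2. For each error, an "Explain & Fix" block: quote the original phrase,
   explain why it is incorrect or unnatural, then give a corrected version.
3. One specific exercise for tomorrow that targets the biggest gap.
Keep the tone respectful and objective, like a senior tutor reviewing
a colleague's work. Avoid generic praise.
`;

// The prompt would be sent as the system message of a chat-completion
// request, together with the transcript as the user message.
```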
Success Framework
The MVP launch is only the beginning. The design strategy was developed with measurable business goals and a clear scaling plan in mind.
1. Defining Success
(North Star Metrics)
To evaluate the effectiveness of the solution, I selected key product metrics directly linked to the user experience:
D1/D7 Retention: Day-1 and Day-7 return rates, tracked alongside the share of users completing ≥3 challenges per week (habit formation).
Average Monologue Duration: A target range of 2:30–4:30 minutes to confirm the "speaking barrier" is being overcome.
CEFR Growth Rate: Tracking the transition from B1 to B2 and the reduction of errors per minute.
NPS/CSAT: Specifically targeting the quality of the "Analysis & Tips" block.
2. Monetization & Sustainability
The design accommodates a two-tier monetization model to offset AI processing costs:
Free Tier: One daily challenge with basic error analysis.
Plus Tier: Unlimited topics, detailed phonemic pronunciation maps, and personalized learning paths.
3. The Future Roadmap
The product is designed for future iterations focused on deep personalization:
Phase 1 (Foundational Experience): Launching the core "Recorder → Analysis → Results" loop and defining North Star Metrics.
Phase 2 (Engagement): Implementing detailed progress dashboards, achievement systems (streaks), and advanced phonemic alignment.
Phase 3 (Personalization): Utilizing vector search (FAISS/pgvector) for AI-generated topics based on user interests and introducing offline ASR.
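As a purely hypothetical illustration of the Phase 3 idea, a pgvector lookup for interest-based topics could look like the sketch below; the table, columns, and embedding source are all assumptions.

```typescript
// Hypothetical sketch of Phase 3 topic retrieval with pgvector.
// Table name, columns, and the embedding source are assumptions.
import { Pool } from "pg";

const pool = new Pool();

/** Return the topic titles closest to a user-interest embedding. */
async function suggestTopics(interestEmbedding: number[], limit = 5): Promise<string[]> {
  // pgvector accepts a bracketed literal such as '[0.1,0.2,...]'.
  const vectorLiteral = `[${interestEmbedding.join(",")}]`;
  const { rows } = await pool.query(
    // "<->" is pgvector's L2 distance operator; smaller means more similar.
    "SELECT title FROM topics ORDER BY embedding <-> $1::vector LIMIT $2",
    [vectorLiteral, limit]
  );
  return rows.map((row: { title: string }) => row.title);
}
```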
Final Thoughts: The Designer's Role in AI Evolution
This project demonstrates that a designer’s role in the AI era goes beyond visual aesthetics. It is about orchestrating technology to solve human problems. We turned the intimidating process of a monologue into an accessible 5-minute ritual, where every second of speech works toward building the user's confidence.
Let’s get in touch
I'd be glad to discuss this project in more detail or bring this experience to new products.









