This document details how ElevenLabs and OpenAI are integrated into the CityU Vet Sim application for interactive voice conversations and automated evaluation of student performance.
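The CityU Vet Sim application uses two primary AI services to enable realistic veterinary consultation simulations:
- ElevenLabs: the primary real-time voice conversation interface.
- OpenAI: an alternative voice conversation mode and automated evaluation of student performance.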
Both services rely on the prompt generation utility (src/utils/createPrompt.ts), which builds skill- and case-specific prompts so that the AI pet owner behaves realistically and students can practice specific communication skills.
ElevenLabs provides the primary voice conversation interface for the application, enabling real-time, bidirectional voice communication between students and the AI pet owner.
ElevenLabs is integrated using the @elevenlabs/react library, which provides a React hook (useConversation) for managing WebRTC-based voice conversations.
Key Components:
src/components/(chat)/ElevenLabsRecorder.tsx: The main component that manages the ElevenLabs conversation session. It handles:
src/components/ElevenLabsConversation.tsx: A simpler wrapper component that provides a basic interface for ElevenLabs conversations, useful for testing or simpler use cases.
Backend Integration (src/server/api/routers/elevenlabs.ts):
The ElevenLabs router provides server-side procedures for secure interaction with the ElevenLabs API:
- getAgentId: Returns the ElevenLabs agent ID from environment variables for client-side WebRTC connection.
- getCustomPrompt: Generates a custom system prompt for a specific skill and case using the createPrompt utility, then appends a base prompt for the AI assistant.
- getConversationToken: Generates a conversation token for WebRTC connection with private agents that have authentication enabled.
- getSignedUrl: Generates a signed URL for WebSocket connection with ElevenLabs, required for private agents.

The ElevenLabs agent is configured using environment variables:
- ELEVENLABS_AGENT_ID: The unique identifier for the ElevenLabs conversational AI agent.
- ELEVENLABS_API_KEY: The API key for authenticating requests to the ElevenLabs API.

When starting a conversation, the application:
1. Fetches Custom Prompt: Uses api.elevenlabs.getCustomPrompt.useQuery() to retrieve a skill- and case-specific prompt generated by createPrompt.ts. This prompt defines the pet owner's persona, behavior, and the specific communication skill being practiced.
2. Selects Voice: Uses getVoiceIdByGender() from src/utils/voice-utils.ts to select an appropriate ElevenLabs voice ID based on the case's gender setting (male or female).
3. Applies Overrides: Passes these customizations to the ElevenLabs session via the overrides parameter:
const overrides = {
  agent: {
    prompt: {
      prompt: customPrompt,
    },
  },
  tts: {
    voiceId: voiceId,
  },
};
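The three steps above could be wired together roughly as follows. This is a minimal sketch rather than the application's code: the VOICE_IDS values are placeholders, the body of getVoiceIdByGender is assumed, and SessionStarter is a hypothetical stand-in for the object returned by useConversation. Only the shape of the overrides object is taken from the snippet above.

```typescript
// Hypothetical voice table; real ElevenLabs voice IDs would go here.
const VOICE_IDS = {
  male: "voice-id-male-placeholder",
  female: "voice-id-female-placeholder",
} as const;

type Gender = keyof typeof VOICE_IDS;

// Assumed behavior of getVoiceIdByGender() from src/utils/voice-utils.ts.
function getVoiceIdByGender(gender: Gender): string {
  return VOICE_IDS[gender];
}

interface SessionOverrides {
  agent: { prompt: { prompt: string } };
  tts: { voiceId: string };
}

// Builds the overrides object in the shape shown above.
function buildOverrides(customPrompt: string, gender: Gender): SessionOverrides {
  return {
    agent: { prompt: { prompt: customPrompt } },
    tts: { voiceId: getVoiceIdByGender(gender) },
  };
}

// Hypothetical, minimal view of the conversation object this sketch needs.
interface SessionStarter {
  startSession(options: { agentId: string; overrides: SessionOverrides }): Promise<string>;
}

// Starts a session with the case-specific prompt and voice applied.
async function startConsultation(
  conversation: SessionStarter,
  agentId: string,
  customPrompt: string,
  gender: Gender,
): Promise<string> {
  return conversation.startSession({
    agentId,
    overrides: buildOverrides(customPrompt, gender),
  });
}
```

Keeping the overrides construction in a pure function makes it easy to unit test without opening a real voice session.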
OpenAI is used for multiple purposes in the application, providing both an alternative voice conversation mode and automated evaluation capabilities.
Location: src/server/api/routers/voice.ts
The voiceRouter provides an alternative voice conversation mode using OpenAI's multimodal audio capabilities. This router implements the converse mutation, which:
- Converts the recorded WebM audio to MP3 with the webmToMp3() utility (falls back to WebM if ffmpeg is unavailable).
- Calls createPrompt() to generate a skill- and case-specific prompt, then appends a base prompt.
- Calls openai.audio.transcriptions.create() with the gpt-4o-mini-transcribe model to transcribe the user's speech.
- Calls openai.chat.completions.create() with the gpt-4o-mini-audio-preview model, configured for both text and audio modalities, to generate both a textual response and audio output.

Models Used:
- gpt-4o-mini-transcribe: For speech-to-text transcription
- gpt-4o-mini-audio-preview: For generating both text and audio responses

The gpt-4o-mini-transcribe model transcribes user speech in the voice conversation mode. Transcription runs in parallel with the main conversation API call for efficiency.
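The parallel arrangement described above can be expressed with Promise.all. The sketch below is an assumption about how such a turn might be structured, not the router's actual code: transcribe and respond are hypothetical injected functions standing in for the calls to openai.audio.transcriptions.create() and openai.chat.completions.create().

```typescript
interface TurnDeps {
  // Stand-in for the speech-to-text call (gpt-4o-mini-transcribe).
  transcribe(audioBase64: string): Promise<string>;
  // Stand-in for the audio chat completion (gpt-4o-mini-audio-preview).
  respond(audioBase64: string): Promise<{ text: string; audioBase64: string }>;
}

interface TurnResult {
  userText: string;
  assistantText: string;
  assistantAudioBase64: string;
}

// Starts both requests before awaiting either one, so the total latency is
// roughly that of the slower call rather than the sum of both.
async function runTurn(deps: TurnDeps, audioBase64: string): Promise<TurnResult> {
  const [userText, reply] = await Promise.all([
    deps.transcribe(audioBase64),
    deps.respond(audioBase64),
  ]);
  return {
    userText,
    assistantText: reply.text,
    assistantAudioBase64: reply.audioBase64,
  };
}
```

Note that Promise.all rejects as soon as either call fails, so a transcription error would fail the whole turn in this sketch; a real implementation may prefer to degrade more gracefully.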
OpenAI's gpt-4o-mini-audio-preview model generates audio responses directly, using the "alloy" voice and WAV format. The audio is returned as base64-encoded data that can be played back in the browser.
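A request along these lines might look like the sketch below. It only builds the parameter object and makes no network call. The field names are intended to follow OpenAI's Chat Completions audio parameters and should be checked against the current API reference; buildAudioChatParams, toWavDataUrl, and the HistoryMessage type are hypothetical names introduced for illustration.

```typescript
type HistoryMessage = { role: "user" | "assistant"; content: string };

// Builds parameters for a chat completion that returns both text and audio.
function buildAudioChatParams(
  systemPrompt: string,
  history: HistoryMessage[],
  audioBase64: string,
  inputFormat: "mp3" | "wav", // assumed set of accepted input formats
) {
  return {
    model: "gpt-4o-mini-audio-preview",
    modalities: ["text", "audio"],
    audio: { voice: "alloy", format: "wav" },
    messages: [
      { role: "system", content: systemPrompt },
      ...history,
      {
        role: "user",
        content: [
          { type: "input_audio", input_audio: { data: audioBase64, format: inputFormat } },
        ],
      },
    ],
  };
}

// Wraps base64-encoded WAV data in a data URL that a browser can play.
function toWavDataUrl(assistantAudioBase64: string): string {
  return `data:audio/wav;base64,${assistantAudioBase64}`;
}
```

In the browser, the string returned by toWavDataUrl can be assigned to the src of an audio element for playback.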
Location: src/server/api/routers/evaluation.ts
The evaluation system uses OpenAI to automatically generate personalized feedback for students based on their conversation performance.
Process:
- The buildSystemPrompt() function creates a comprehensive evaluation prompt that includes, among other inputs, the relevant aims (aims array).
- The router then calls generateText() from the ai SDK with the openai("gpt-5-mini") model to generate personalized feedback.

Model Used:
- gpt-5-mini: For generating evaluation feedback

Personality-Based Tailoring:
The evaluation system tailors feedback to the student's personality profile, which is represented by a four-letter code.
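How such an evaluation call might be assembled is sketched below. Everything here is illustrative: the body of buildSystemPrompt, the EvaluationInput shape, and the injected generate function (a stand-in for generateText() from the ai SDK) are assumptions. The meaning of the individual letters in the personality code is deliberately left out because it is not specified here.

```typescript
interface ConversationMessage {
  role: "user" | "assistant";
  content: string;
}

interface EvaluationInput {
  aims: string[];
  personalityCode: string;
  conversation: ConversationMessage[];
}

// Assumed prompt layout: numbered aims plus a tailoring instruction.
function buildSystemPrompt(input: EvaluationInput): string {
  if (!/^[A-Za-z]{4}$/.test(input.personalityCode)) {
    throw new Error("personalityCode must be a four-letter code");
  }
  const aims = input.aims.map((aim, i) => `${i + 1}. ${aim}`).join("\n");
  return [
    "You are evaluating a veterinary student's consultation.",
    `Skill aims:\n${aims}`,
    `Tailor the feedback to a student with personality profile ${input.personalityCode.toUpperCase()}.`,
  ].join("\n\n");
}

// Stand-in for a text generation call that takes a system prompt and a prompt.
type Generate = (args: { system: string; prompt: string }) => Promise<{ text: string }>;

async function evaluate(input: EvaluationInput, generate: Generate): Promise<string> {
  const transcript = input.conversation
    .map((message) => `${message.role}: ${message.content}`)
    .join("\n");
  const { text } = await generate({ system: buildSystemPrompt(input), prompt: transcript });
  return text;
}
```

With the ai SDK, the generate argument could be a thin wrapper that forwards system and prompt to generateText() together with the openai("gpt-5-mini") model.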
Location: src/utils/createPrompt.ts
The prompt generation system is a critical component that ensures the AI pet owner behaves realistically and helps students practice specific communication skills.
Key Features:
Function Structure:
export function createPrompt(skillId: number, caseData: Case): string
The function:
Usage:
The prompt is used in two main contexts:
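- ElevenLabs conversations: the prompt is passed to the session via the overrides parameter.
- OpenAI voice conversations: the prompt is generated by createPrompt() inside the converse mutation.

ElevenLabs conversation flow:
1. The conversation is managed by ElevenLabsRecorder.tsx.
2. The component calls api.elevenlabs.getAgentId.useQuery() to get the agent ID.
3. It calls api.elevenlabs.getCustomPrompt.useQuery() with skillId and caseId to get the customized prompt.
4. It calls startSession() from the useConversation hook with the agent ID and overrides.
5. The onMessage callback receives messages from ElevenLabs, extracts the text and role, and updates the conversation state.
6. Ending the conversation calls endSession() and triggers cleanup callbacks.

OpenAI voice conversation flow:
1. RecorderBar.tsx captures audio as WebM.
2. The client calls api.voice.converse.useMutation() with:
   - audioBase64: Base64-encoded WebM audio
   - history: Previous conversation messages
   - skillId and caseId: For prompt generation
3. The server builds the prompt with createPrompt().
4. The mutation returns:
   - userText: Transcribed user speech
   - assistantText: AI's textual response
   - assistantAudioBase64: AI's audio response (WAV, base64)

Evaluation flow:
1. The client calls api.evaluation.evaluate.useMutation() with:
   - conversation: Array of messages with roles and content
   - skillId: The skill that was practiced
   - caseId: The case that was used
2. Feedback is generated with the gpt-5-mini model.

The following environment variables are required for AI integration: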
ElevenLabs:
- ELEVENLABS_AGENT_ID: Your ElevenLabs conversational AI agent ID
- ELEVENLABS_API_KEY: Your ElevenLabs API key

OpenAI:
- OPENAI_API_KEY: Your OpenAI API key (used for all OpenAI services)

These should be configured in your environment configuration file (typically src/env.js).
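One way to fail fast when one of these variables is missing is to validate them at startup. The sketch below is a dependency-free illustration and not the contents of src/env.js; readAiEnv, the AiEnv type, and the error wording are hypothetical.

```typescript
// The three variables this document lists as required for AI integration.
const REQUIRED_AI_ENV = [
  "ELEVENLABS_AGENT_ID",
  "ELEVENLABS_API_KEY",
  "OPENAI_API_KEY",
] as const;

type AiEnvKey = (typeof REQUIRED_AI_ENV)[number];
type AiEnv = Record<AiEnvKey, string>;

// Returns the required values, or throws an error naming every missing one.
function readAiEnv(source: Record<string, string | undefined>): AiEnv {
  const missing = REQUIRED_AI_ENV.filter((key) => !source[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AiEnv;
  REQUIRED_AI_ENV.forEach((key) => {
    env[key] = source[key] as string;
  });
  return env;
}
```

On the server this could be called once as readAiEnv(process.env), so that a misconfigured deployment stops at boot rather than on the first voice or evaluation request.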