Integration Concepts
Understanding the core concepts for Voice AI development
As a developer building Voice AI apps, you will work with seven core concepts, sketched as a single interface after this list:
- Configuring the session you start, with settings such as model, transcription, speed, etc.
- Sending the user's voice audio to the provider (e.g., OpenAI or Gemini)
- Sending tool call responses
- Receiving transcriptions of the user's voice
- Receiving the agent's response audio
- Receiving the agent's response transcript
- Receiving tool calls
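Regardless of provider, these seven touchpoints are the whole integration surface. Here is a minimal sketch of them as a single handler object; the names are illustrative placeholders, not any provider's actual API:
// A minimal, illustrative sketch of the seven touchpoints
// (placeholder names, not any provider's actual API).
const voiceSession = {
  configureSession(settings) {},        // 1. model, voice, transcription, turn detection, ...
  sendUserAudio(base64Chunk) {},        // 2. stream the user's microphone audio
  sendToolResponse(callId, result) {},  // 3. return the result of a tool call
  onUserTranscript(text) {},            // 4. transcription of what the user said
  onAgentAudio(base64Chunk) {},         // 5. the agent's synthesized speech
  onAgentTranscript(text) {},           // 6. transcript of the agent's reply
  onToolCall(name, args) {},            // 7. the agent asks you to run a tool
};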
Each provider has its own WebSocket specification for carrying out these seven core concepts. The implementation details, message formats, and event structures vary significantly between providers, creating integration complexity and vendor lock-in.
In the scenarios below, you will see which specification to use for OpenAI Realtime and Gemini Live, and how changing a single connection string gives you access to multiple providers without changing the API specification style you are already using.
A working example is available in our repository here, and setup details are in the README.
Scenario 1: You are already using OpenAI Realtime and want to integrate with RealtimeSwitch API
Here is a Node.js example showing how to migrate from a direct OpenAI connection to RealtimeSwitch:
Node.js OpenAI Migration Example
Before: Direct OpenAI Connection
import WebSocket from "ws";
const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "realtime=v1",
  }
});
ws.on("open", function open() {
console.log("Connected to OpenAI server.");
// 1. Configure user session
ws.send(JSON.stringify({
type: "session.update",
session: {
modalities: ["text", "audio"],
voice: "alloy",
input_audio_format: "pcm16",
output_audio_format: "pcm16",
turn_detection: { type: "server_vad", silence_duration_ms: 700 }
}
}));
// 2. Send user voice
ws.send(JSON.stringify({
type: "input_audio_buffer.append",
audio: base64AudioData
}));
});
// 4-7. Receive transcriptions, audio, and tool calls
// (concept 3, sending the tool result, is shown after this example)
ws.on("message", function incoming(message) {
  const data = JSON.parse(message.toString());
  switch (data.type) {
    case 'conversation.item.input_audio_transcription.completed':
      console.log('User said:', data.transcript);
      break;
    case 'response.audio.delta':
      playAudio(data.delta);
      break;
    case 'response.audio_transcript.delta':
      console.log('AI response:', data.delta);
      break;
    case 'response.function_call_arguments.done':
      handleToolCall(data.name, data.arguments);
      break;
  }
});
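The example above covers concepts 1, 2, and 4-7. For concept 3 (sending the tool call response), the OpenAI Realtime API expects the function output as a new conversation item followed by a response.create. Here is a minimal sketch, assuming your handleToolCall produced a result and you kept the call_id from the function-call event (data.call_id):
// 3. Send tool call response: return the tool's result to the model,
// then ask it to continue generating.
function sendToolResult(callId, result) {
  ws.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: callId,
      output: JSON.stringify(result)
    }
  }));
  ws.send(JSON.stringify({ type: "response.create" }));
}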
After: RealtimeSwitch Connection
We only change the host URL and add new query parameters, as shown below; all other code remains unchanged. Important parameters to note:
- rs_api - Defines the WebSocket API specification we are using. In this case, it's OPENAI.
- rs_core - Specifies the voice AI provider, which in this case is also OPENAI.
Notably, the rs_core parameter allows you to manually change the provider (e.g., to GEMINI) while keeping the original API style (OPENAI), as shown in the example after the note below, or to let RealtimeSwitch switch providers automatically based on performance rules and underlying uptime monitoring.
import WebSocket from "ws";
import crypto from "crypto";

// Generate HMAC authentication (backend only)
function generateAuthHash(sessionId, secretKey) {
  return crypto.createHmac('sha256', secretKey)
    .update(sessionId, 'utf8')
    .digest('hex');
}
const accountId = '996-sdassds-86-asd'; // Test credentials - replace with your own
const sessionId = 'session-123';
const authHash = generateAuthHash(sessionId, process.env.SECRET_KEY);
const url = `ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=OPENAI`;
const ws = new WebSocket(url);
ws.on("open", function open() {
// Same OpenAI session configuration and audio sending
// All rest remains same
});
Only the connection URL and authentication method change. All your existing OpenAI session handling and message processing code works unchanged!
Note: The account ID and secret key shown above are test credentials. After logging into your RealtimeSwitch account, you will find your actual account ID and secret key in your dashboard. Replace the test values with your own credentials. For more details on security, visit the Security section.
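As noted above, rs_core does not have to match rs_api. For example, to keep your OpenAI-style code but have RealtimeSwitch route the session to Gemini, only the connection string changes (same test credentials as above):
// Same OpenAI API specification, but the session is served by Gemini.
const url = `ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=GEMINI`;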
Scenario 2: You are already using Gemini Live and want to integrate with RealtimeSwitch API
Here is a Node.js example showing how to migrate from a direct Gemini Live connection to RealtimeSwitch:
Node.js Gemini Migration Example
Before: Direct Gemini Connection
import WebSocket from "ws";
const apiKey = process.env.GEMINI_API_KEY;
const url = `wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=${apiKey}`;
const ws = new WebSocket(url);
ws.on("open", function open() {
console.log("Connected to Gemini server.");
// 1. Configure user session
ws.send(JSON.stringify({
setup: {
model: "models/gemini-2.0-flash-live-001",
generation_config: {
response_modalities: ["AUDIO"],
speech_config: {
voice_config: { prebuilt_voice_config: { voice_name: "Aoede" } }
}
}
}
}));
// 2. Send user voice
ws.send(JSON.stringify({
realtimeInput: {
media_chunks: [{
mime_type: "audio/pcm",
data: base64AudioData
}]
}
}));
});
// 4-7. Receive transcriptions, audio, and tool calls
// (concept 3, sending the tool result, is shown after this example)
ws.on("message", function incoming(message) {
  const data = JSON.parse(message.toString());
  if (data.serverContent) {
    if (data.serverContent.inputTranscription) {
      console.log('User said:', data.serverContent.inputTranscription.text);
    }
    if (data.serverContent.modelTurn?.parts) {
      handleGeminiAudio(data.serverContent.modelTurn.parts);
    }
    if (data.serverContent.outputTranscription) {
      console.log('AI response:', data.serverContent.outputTranscription.text);
    }
  }
  // Tool calls arrive as a top-level toolCall message, not under serverContent
  if (data.toolCall) {
    data.toolCall.functionCalls.forEach(handleToolCall);
  }
});
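As with OpenAI, concept 3 (sending the tool call response) is the send-side piece not shown above. With Gemini Live, the function result goes back as a tool response message. Here is a minimal sketch, assuming handleToolCall produced a result and you kept the call's id and name from the toolCall message:
// 3. Send tool call response: return the function result to Gemini.
function sendToolResult(id, name, result) {
  ws.send(JSON.stringify({
    toolResponse: {
      functionResponses: [{
        id: id,
        name: name,
        response: { result: result }
      }]
    }
  }));
}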
After: RealtimeSwitch Connection
We only change the host URL and add new query parameters, as shown below; all other code remains unchanged. Important parameters to note:
- rs_api - Defines the WebSocket API specification we are using. In this case, it's GEMINI.
- rs_core - Specifies the voice AI provider, which in this case is also GEMINI.
Notably, the rs_core parameter allows you to manually change the provider (e.g., to OPENAI) while keeping the original API style (GEMINI), as shown in the example after the note below, or to let RealtimeSwitch switch providers automatically based on performance rules and underlying uptime monitoring.
import WebSocket from "ws";
import crypto from "crypto";

// Generate HMAC authentication (backend only)
function generateAuthHash(sessionId, secretKey) {
  return crypto.createHmac('sha256', secretKey)
    .update(sessionId, 'utf8')
    .digest('hex');
}
const accountId = '996-sdassds-86-asd'; // Test credentials - replace with your own
const sessionId = 'session-123';
const authHash = generateAuthHash(sessionId, process.env.SECRET_KEY);
const url = `ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=GEMINI&rs_core=GEMINI`;
const ws = new WebSocket(url);
ws.on("open", function open() {
// Same Gemini session configuration and audio sending
// All rest remains same
});
Your existing Gemini message handling logic works unchanged!
Note: The account ID and secret key shown above are test credentials. After logging into your RealtimeSwitch account, you will find your actual account ID and secret key in your dashboard. Replace the test values with your own credentials. For more details on security, visit the Security section.
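And as in Scenario 1, rs_core does not have to match rs_api. To keep your Gemini-style code but have RealtimeSwitch route the session to OpenAI, only the connection string changes (same test credentials as above):
// Same Gemini API specification, but the session is served by OpenAI.
const url = `ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=GEMINI&rs_core=OPENAI`;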