Imagine the all-too-common scene: a customer, frustrated with a recurring issue, navigates to your website for help. They open the chat widget and are greeted by a simplistic, text-based chatbot that fails to understand their nuanced problem. After several failed attempts to get a real answer, they finally type “talk to a human,” only to be told the wait time is over an hour. By the time an agent is available, the customer’s frustration has boiled over, and the opportunity for a positive interaction is long gone. On the other side of the screen, support agents are drowning in a sea of tickets, copy-pasting scripted responses, and struggling to keep up with the volume. This constant pressure leads to burnout and a decline in the quality of customer care, turning your support center into a cost center rather than a value driver.
The fundamental challenge is one of scale versus personality. As businesses grow, the personal touch that defined their early success becomes nearly impossible to maintain. Traditional solutions present a difficult choice: hire more agents, which is costly and difficult to scale, or rely on text-based automation, which often feels robotic and impersonal, damaging the customer experience. Customers crave speed, but they also crave empathy and understanding—qualities that standard chatbots sorely lack. This is where the next evolution of customer interaction comes into play, a solution that combines the instant response time of automation with the genuine warmth of human conversation. What if you could provide an immediate, spoken response to every customer query, directly within the chat interface they’re already using?
This guide will walk you through the exact steps to build a real-time voice response system that integrates the hyper-realistic AI voices of ElevenLabs directly into the Intercom messenger. This powerful combination allows you to automate initial responses with a voice that sounds genuinely human, providing instant engagement and satisfaction. We will explore the technical architecture, from setting up the webhooks in Intercom to generating voice with the ElevenLabs API and delivering it back to the customer in seconds. By the end of this walkthrough, you will have a clear blueprint for transforming your customer support from a reactive bottleneck into a proactive, engaging, and highly efficient engine for customer happiness. Prepare to bridge the gap between automation and personalization, creating a support experience that is not only instant but also impressively human.
Why Voice Is the Next Frontier in Customer Messaging
For years, the gold standard for digital customer service has been live chat. It’s faster than email and more convenient than a phone call. However, as chat has become dominated by simplistic, text-based bots, its effectiveness has waned. Customers are growing tired of robotic, unhelpful interactions. They can immediately sense when they aren’t talking to a real person, and the conversation becomes a game of figuring out the right keywords to escalate to a human agent.
This is where AI-powered voice changes the game. By responding with a spoken message instead of just text, you immediately break the robotic mold. A human-like voice conveys empathy, tone, and personality in a way that plain text never can. It makes the interaction feel more personal and significant, signaling to the customer that their query is being taken seriously. Research consistently shows that a majority of consumers still prefer human interaction for customer service. AI voice provides the perfect hybrid solution, delivering the speed and availability of a bot with the personal touch of a human agent.
Prerequisites: What You’ll Need to Get Started
Before we dive into the integration, let’s assemble the necessary tools. This is a technical walkthrough, but the components are straightforward and well-documented.
Your ElevenLabs API Key
ElevenLabs is the core of our voice generation engine. Their API allows you to convert text to speech with incredibly realistic and customizable voices. You will need an active ElevenLabs account to get your API key, which authenticates your requests to their service.
An Active Intercom Account
Intercom will serve as our customer communication platform. You’ll need an account with administrative privileges that allow you to create and manage apps within the Intercom Developer Hub. This is where we will configure the webhooks that trigger our voice response workflow.
A Server Environment for Webhooks
To connect Intercom and ElevenLabs, we need a small application to act as the intermediary. This server will listen for incoming messages from Intercom (via webhooks), send the text to ElevenLabs for voice generation, and then post the resulting audio file back into the Intercom conversation. For this guide, we’ll use a simple Node.js server with the Express framework, but the logic can be adapted to any language or serverless environment you prefer.
The Step-by-Step Guide to Integrating ElevenLabs with Intercom
Now we get to the fun part: building the integration. Follow these steps carefully to connect the two platforms and bring your real-time voice response system to life.
Step 1: Setting Up Your Intercom App and Webhooks
First, you need to tell Intercom where to send notifications when a new message arrives.
- Navigate to the Intercom Developer Hub: Log in to your Intercom account and go to the Developer Hub section in your settings.
- Create a New App: Give your app a name (e.g., “ElevenLabs Voice Responder”) and ensure you have access to the necessary scopes, primarily the ability to read conversations and write replies.
- Configure Webhooks: Find the “Webhooks” section of your app’s settings. You’ll need to provide a publicly accessible URL from your server (a tool like ngrok is very helpful for local development). Subscribe to the conversation.user.created topic. This will send a payload to your server every time a user starts a new conversation.
Step 2: Building the Server to Receive Intercom Messages
Next, we’ll create the server application that listens for the webhook from Intercom. Using Node.js and Express, the initial setup is minimal.
```javascript
const express = require('express');
const app = express();

// Express ships its own JSON body parser; the separate body-parser package is no longer needed
app.use(express.json());

const PORT = process.env.PORT || 3000;

app.post('/webhook/intercom', (req, res) => {
  // Log the entire incoming payload to inspect its structure
  console.log('Received Intercom webhook:', JSON.stringify(req.body, null, 2));

  // Intercom sends a test notification ("ping") when you verify the endpoint;
  // accept either field in case the payload shape differs from what you expect
  if (req.body.type === 'ping' || req.body.topic === 'ping') {
    console.log('Received ping from Intercom.');
    return res.status(200).send({ message: 'Ping successful' });
  }

  // Look for a new message in the payload; optional chaining guards against
  // payloads that don't match the expected shape
  if (req.body.topic === 'conversation.user.created') {
    const parts = req.body.data?.item?.conversation_parts?.conversation_parts;
    if (parts && parts.length > 0) {
      const messageText = parts[0].body.replace(/<[^>]+>/g, ''); // strip HTML tags
      // TODO: Call the ElevenLabs function with messageText.
      // Kick it off asynchronously; don't block the 200 response below.
    }
  }

  res.status(200).send(); // Respond immediately so Intercom doesn't retry the webhook
});

app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`);
});
```
This basic server listens on the /webhook/intercom endpoint. It parses the incoming JSON payload from Intercom, extracts the text from the user’s first message, and prepares it for the next step.
Step 3: Generating Speech with the ElevenLabs API
With the user’s message in hand, we now send it to ElevenLabs to generate the voice response. You’ll need axios or a similar HTTP client to make the API call.
```javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

async function generateVoiceResponse(text, conversationId) {
  // Read credentials from the environment rather than hard-coding them
  const XI_API_KEY = process.env.ELEVENLABS_API_KEY;
  const VOICE_ID = process.env.ELEVENLABS_VOICE_ID; // e.g., '21m00Tcm4TlvDq8ikWAM'

  const response = await axios({
    method: 'POST',
    url: `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    headers: {
      'Accept': 'audio/mpeg',
      'Content-Type': 'application/json',
      'xi-api-key': XI_API_KEY,
    },
    data: {
      text: `Hello! You said: "${text}". We have received your message and will be with you shortly.`,
      model_id: 'eleven_monolingual_v1',
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.5,
      },
    },
    responseType: 'stream',
  });

  // Stream the MP3 audio into a local file named after the conversation ID
  const filePath = path.join(__dirname, `${conversationId}.mp3`);
  const writer = fs.createWriteStream(filePath);
  response.data.pipe(writer);

  return new Promise((resolve, reject) => {
    writer.on('finish', () => resolve(filePath));
    writer.on('error', reject);
  });
}
```
This function takes the text, crafts a simple reply, and sends it to the ElevenLabs API. It streams the resulting MP3 audio and saves it to a local file named after the conversation ID for easy tracking.
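Since ElevenLabs bills by the character, it can also pay to normalize and cap the user’s text before echoing it back through TTS. The helper below is a small, optional sketch; the 300-character limit is an arbitrary example, not a recommendation from either API.

```javascript
// Optional pre-processing before TTS: strip markup, collapse whitespace,
// and cap the length of the text that will be spoken back to the user.
// The maxChars default is an arbitrary example value.
function prepareTtsText(rawHtml, maxChars = 300) {
  const plain = rawHtml
    .replace(/<[^>]+>/g, ' ') // strip HTML tags
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
  return plain.length > maxChars ? plain.slice(0, maxChars - 1) + '…' : plain;
}
```

You would call this on the message text extracted in Step 2 before passing it into the voice-generation function.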
Step 4: Replying in Intercom with the Voice Message
Finally, once the audio file is saved, we need to upload it and post it as a reply in the Intercom conversation. This requires using the Intercom API with proper authentication (an access token from your Intercom App).
```javascript
async function postVoiceReplyToIntercom(conversationId, audioFilePath) {
  const INTERCOM_ACCESS_TOKEN = process.env.INTERCOM_ACCESS_TOKEN;

  // Step A: Upload the file to get a public URL.
  // This step is pseudo-code: you need a way to host the file publicly,
  // for example by uploading it to an AWS S3 bucket.
  const publicUrl = await uploadFileAndGetPublicUrl(audioFilePath);

  // Step B: Post the reply to Intercom
  await axios({
    method: 'POST',
    url: `https://api.intercom.io/conversations/${conversationId}/reply`,
    headers: {
      'Authorization': `Bearer ${INTERCOM_ACCESS_TOKEN}`,
      'Content-Type': 'application/json',
      'Accept': 'application/json',
    },
    data: {
      type: 'admin',
      admin_id: process.env.INTERCOM_ADMIN_ID, // ID of the admin/bot user
      message_type: 'comment',
      body: `<a href="${publicUrl}">Listen to our voice response</a>`,
    },
  });
}
```
This final function uploads the saved MP3 file to a public hosting service (like Amazon S3 or a simple public folder on your server) and then uses the Intercom API to post a comment containing a link to that audio file. The customer can then click the link to hear the spoken response instantly.
Best Practices for Your AI Voice System
Building the integration is just the beginning. To make your voice response system truly effective, consider these best practices:
- Choose the Right Voice: Your brand isn’t generic, so your voice shouldn’t be either. Use ElevenLabs’ Voice Lab to clone a voice that represents your brand’s persona or design a new one that feels warm, empathetic, and professional.
- Craft Smart, Natural Responses: The initial automated response should be more than just an acknowledgment. Use it to set expectations, provide initial guidance, or ask clarifying questions to gather more context for the human agent who may take over later.
- Plan the Handoff: Not every query can be solved by an initial voice response. Design a clear and seamless escalation path. If the user replies to the voice message, the system should automatically assign the conversation to a human agent, who now has a full transcript and initial context.
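To sketch the handoff idea concretely: when a user replies to the voice message, the conversation can be reassigned to a human. The helper below only builds the request body; the endpoint and field names (POST /conversations/{id}/parts with a message_type of 'assignment', plus admin_id and assignee_id) are my understanding of Intercom’s conversation-management API and should be verified against the current API reference.

```javascript
// Hypothetical helper: build the payload for assigning an Intercom
// conversation to a human agent. Field names are assumptions based on
// Intercom's REST API; verify before use.
function buildAssignmentPayload(botAdminId, assigneeId) {
  return {
    message_type: 'assignment',
    type: 'admin',
    admin_id: botAdminId,    // the bot/admin performing the assignment
    assignee_id: assigneeId, // the human agent (or team) taking over
  };
}
```

You would send this payload, authenticated with the same access token as in Step 4, whenever your webhook sees a follow-up user reply in a conversation that already received a voice response.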
Remember the frustrated customer and the overwhelmed agent from the beginning? With this system, the customer now receives an immediate, personal-sounding voice message acknowledging their issue and assuring them it’s being handled. The agent, in turn, receives the ticket with the initial interaction already complete, allowing them to step in with more context and less pressure. You haven’t just automated a response; you’ve elevated the entire customer experience.
By integrating the leading-edge voice AI from ElevenLabs with a ubiquitous platform like Intercom, you can reclaim the personal touch that was lost to scale. This isn’t about replacing humans but empowering them, freeing them from repetitive tasks to focus on the high-value, complex interactions where their expertise truly matters. Ready to revolutionize your customer support with real-time voice? Click here to sign up for ElevenLabs.