How to Build a Seamless Text-to-Speech Feature in Next.js & Cloudflare Edge

Hey everyone, Ajit Kumar Pandit here. If you've been following my recent work, you know how much I love pushing the boundaries of what web applications can do. Lately, I've been experimenting with AI-generated voice responses for a personal web assistant.
Adding Text-to-Speech (TTS) sounds easy on paper. But when you mix Next.js, Cloudflare Pages, and the Edge Runtime, things get... complicated.
Today, I want to show you exactly how to build a lightning-fast Voice AI feature using Google's TTS endpoints — and more importantly, how to dodge the frustrating 500 Internal Server Errors that break it in production.
Let's dive in!
Why Google TTS (and Why Popular Packages Break on Edge)
When I first started building this feature, I tried heavy AI packages like @lobehub/tts and google-tts-api. They work flawlessly on your local machine. But the moment you push to Cloudflare Pages, your console lights up with errors like:
"WebSocket error occurred" or "adapter is not a function"
Why does this happen?
These packages rely on Node.js internals that the Edge runtime simply doesn't provide: axios with its Node HTTP adapter, and in some cases raw WebSocket clients. Cloudflare's Edge network doesn't ship the traditional Node.js HTTP environment natively. It expects standard Web APIs like fetch().
The fix? Bypass the heavy NPM packages entirely and hit Google's TTS endpoint directly using native fetch. It's faster, lighter, and 100% Edge-compatible.
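Before wiring up the full route, you can sanity-check the request construction on its own. Here's a minimal sketch of building the TTS URL with nothing but Web APIs; note that `tl` and `client=tw-ob` are conventions observed in the wild for this undocumented endpoint, not documented parameters:

```javascript
// Build the unofficial Google Translate TTS URL for a short phrase.
// URLSearchParams handles the percent-encoding of the query text for us.
function buildTtsUrl(text, lang = "en") {
  const params = new URLSearchParams({
    ie: "UTF-8",
    q: text,
    tl: lang,          // target language
    client: "tw-ob",   // client value the endpoint is known to accept
  });
  return `https://translate.googleapis.com/translate_tts?${params}`;
}

const url = buildTtsUrl("Hello, world!");
// Commas, spaces, etc. in the text are now safely encoded in the URL.
```

To actually retrieve audio, you'd pass this URL to `fetch()` with a browser-like User-Agent, which is exactly what the API route in Step 2 does.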
Step 1: Preparing Your wrangler.toml (Cloudflare Users Only)
Google TTS sends back raw audio bytes. To convert that binary data to Base64 with Node's Buffer in an Edge environment, we need to tell Cloudflare to enable Node.js compatibility.
Open your wrangler.toml in the root of your Next.js project and add the compatibility flag:
```toml
name = "my-awesome-app"
compatibility_date = "2025-10-15"

# Add this line explicitly:
compatibility_flags = ["nodejs_compat"]
```

(If your file still contains a `type = "javascript"` line, remove it. That key is a leftover from Wrangler v1 and isn't used by current versions of Wrangler.)
This flag tells Cloudflare to play nicely with native Node APIs — specifically allowing you to interact with node:buffer.
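To see what the flag buys you, here is a tiny, self-contained example of the exact conversion the API route will perform: raw audio bytes in, a JSON-friendly Base64 string out. With `nodejs_compat` enabled, the `node:buffer` import works inside the Edge runtime:

```javascript
import { Buffer } from "node:buffer";

// Simulate the bytes we'd get from ttsResponse.arrayBuffer().
const bytes = new Uint8Array([72, 101, 108, 108, 111]); // "Hello"

// Buffer.from(arrayBuffer) wraps the bytes; toString("base64") encodes them.
const base64 = Buffer.from(bytes.buffer).toString("base64");
// base64 === "SGVsbG8="
```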
Step 2: The Next.js API Route
Create a file at src/app/api/tts/route.js and add the following:
```javascript
// src/app/api/tts/route.js
import { Buffer } from "node:buffer";
import { NextResponse } from "next/server";

// Force the Edge Runtime for maximum performance
export const runtime = "edge";

export async function POST(request) {
  try {
    const { text } = await request.json();
    if (!text) {
      return NextResponse.json({ error: "Text is required" }, { status: 400 });
    }

    // Pro-Tip: Simple TTS endpoints get angry if your text is too long!
    // Truncate to ~200 characters to prevent 400 Bad Request errors.
    const truncatedText = text.substring(0, 200);

    // Call Google's direct Translate TTS endpoint
    const url = `https://translate.googleapis.com/translate_tts?ie=UTF-8&q=${encodeURIComponent(
      truncatedText
    )}&tl=en&client=tw-ob`;

    // Always spoof a User-Agent, or Google will reject the automated request
    const ttsResponse = await fetch(url, {
      headers: {
        "User-Agent":
          "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
      },
    });

    if (!ttsResponse.ok) {
      throw new Error(`TTS Fetch Failed: ${ttsResponse.statusText}`);
    }

    // Convert the audio stream into a Base64 string for the frontend
    const arrayBuffer = await ttsResponse.arrayBuffer();
    const audioBuffer = Buffer.from(arrayBuffer);
    const audioBase64 = audioBuffer.toString("base64");

    return NextResponse.json({ audioBase64 });
  } catch (error) {
    console.error("TTS pipeline crashed:", error.message);
    return NextResponse.json(
      { error: "Failed to generate audio." },
      { status: 500 }
    );
  }
}
```
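The 200-character truncation above simply drops the rest of the text. If you want a whole article read aloud, a common workaround is to split the input into chunks at word boundaries and request each chunk separately. The helper below is a hypothetical sketch, not part of the route above:

```javascript
// Split text into chunks of at most maxLen characters, preferring to
// break at whitespace so words are never cut in half.
function chunkText(text, maxLen = 200) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    // Find the last space within the limit; fall back to a hard cut.
    let cut = rest.lastIndexOf(" ", maxLen);
    if (cut <= 0) cut = maxLen;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be sent through the route individually and the resulting clips queued for playback in order on the frontend.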
🔍 Breaking Down the Key Decisions

| Decision | Why |
| --- | --- |
| `text.substring(0, 200)` | Google's undocumented API rejects overly long strings |
| `User-Agent` header | Without it, automated requests usually hit a 403 Forbidden |
| Native `fetch()` | No axios, so it works perfectly on Cloudflare Edge nodes globally |
Step 3: Playing the Audio on the Frontend
Now that the backend works, here's how to play the Base64 audio string in React:
```javascript
"use client";

import { useState } from "react";

export default function TextToSpeechButton() {
  const [loading, setLoading] = useState(false);

  const speakText = async () => {
    setLoading(true);
    try {
      const res = await fetch("/api/tts", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          text: "Hello! Ajit Kumar Pandit says welcome to the future.",
        }),
      });
      const data = await res.json();
      if (data.audioBase64) {
        // Construct a data URI and play via the browser's native Audio API
        const audio = new Audio(`data:audio/mpeg;base64,${data.audioBase64}`);
        audio.play();
      }
    } catch (error) {
      console.error("Failed to fetch audio", error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <button
      onClick={speakText}
      disabled={loading}
      className="px-4 py-2 bg-blue-600 text-white rounded hover:bg-blue-700 transition"
    >
      {loading ? "Generating Audio..." : "Speak! 🎤"}
    </button>
  );
}
```
By formatting the Base64 string as data:audio/mpeg;base64,..., the browser's native Audio object plays it instantly — no complex audio contexts or <audio> tags needed.
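For longer clips, a data: URI can get unwieldy, since the entire Base64 string lives in one giant URL. An alternative sketch, assuming the same `audioBase64` response shape, decodes the Base64 into a Blob and plays an object URL instead. The decoding helper is pure JavaScript; `playBase64Audio` itself is browser-only:

```javascript
// Decode a Base64 string into raw bytes (works in browsers and modern Node).
function base64ToBytes(audioBase64) {
  const binary = atob(audioBase64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Browser-only: wrap the bytes in a Blob and play it via an object URL,
// revoking the URL after playback to free memory.
function playBase64Audio(audioBase64) {
  const blob = new Blob([base64ToBytes(audioBase64)], { type: "audio/mpeg" });
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.addEventListener("ended", () => URL.revokeObjectURL(url));
  return audio.play();
}
```

For short snippets the data: URI approach in the component above is simpler; the Blob route mainly pays off once clips grow past a few hundred kilobytes.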
Final Thoughts
Building resilient applications is all about knowing your runtime environment.
While NPM packages like google-tts-api are excellent tools, dropping them blindly into Edge environments leads to painful axios adapter bugs that are hard to debug in production.
By reverting to the native fetch API, enabling Node.js buffer compatibility in wrangler.toml, and calling Google's translate_tts endpoint directly, we created a lightweight, lightning-fast edge AI audio generator with zero heavy dependencies.
I really enjoyed debugging this architecture challenge. Have you built any cool voice features recently? I'd love to hear about them!
Feel free to reach out: 📧 ajit@nakprc.com
Keep creating,
Ajit Kumar Pandit
Enhancing Future With Technology