How to Use the OpenAI Text-to-Speech API in 3 Minutes

May 16, 2024

Learn How to Use the OpenAI Text-to-Speech API in 3 Minutes

The ability to transform written text into natural-sounding speech has become increasingly valuable for various applications. OpenAI's Text-to-Speech (TTS) API empowers you to leverage this technology, bringing your text to life with a range of voices and functionalities. This guide explores the API's capabilities and equips you to integrate it into your projects.

What You'll Need:

Before diving in, ensure you have the following:

  • OpenAI Account: Sign up for a free account on https://openai.com/ to access the API. Log in to OpenAI and choose the API option, shown right beside ChatGPT.
  • API Key: On the API page, locate the API keys section in the left-hand menu and generate a new secret API key. This key authenticates your requests to the API. Copy the key as soon as it is created and save it somewhere safe and secure.

Getting Started

  1. This method requires writing code to send requests to the API and handle the responses. Set up a Python virtual environment and open it in an editor such as Visual Studio (VS) Code, then install OpenAI's Python library with pip install openai.
  2. Enter Key Inputs for the Speech Endpoint:
    • The model name: OpenAI offers two TTS models. Choose the one that fits your project's needs (tts-1 for real-time use, tts-1-hd for superior quality).
    • The text that should be turned into audio: Ensure your text is well-formatted and free of errors for optimal results.
    • The voice to be used for the audio generation: OpenAI provides several built-in voices, each with distinct characteristics (the sample request below uses alloy).
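
The three inputs above can be collected and sanity-checked before any request is sent. The following is a minimal sketch, not part of OpenAI's library: the helper name build_speech_params is hypothetical, and the model names, voice names, and 4096-character input limit are assumptions based on OpenAI's documentation at the time of writing, so check the current documentation before relying on them.

```python
# Hypothetical helper: gathers the three required inputs for the speech endpoint.
# The allowed values below are assumptions taken from OpenAI's docs at the time
# of writing and may change.
KNOWN_MODELS = {"tts-1", "tts-1-hd"}
KNOWN_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
MAX_INPUT_CHARS = 4096


def build_speech_params(text, model="tts-1", voice="alloy"):
    """Return keyword arguments for client.audio.speech.create()."""
    text = text.strip()
    if not text:
        raise ValueError("input text is empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input text exceeds {MAX_INPUT_CHARS} characters")
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model}")
    if voice not in KNOWN_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {"model": model, "voice": voice, "input": text}


params = build_speech_params("Today is a wonderful day to build something people love!")
print(params)
```

The resulting dictionary can be unpacked straight into the request, as in client.audio.speech.create(**params).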

Next, copy this sample request from OpenAI’s text-to-speech documentation:

from pathlib import Path
from openai import OpenAI

client = OpenAI()
speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!"
)

response.stream_to_file(speech_file_path)

3. This code won’t run until you supply an API key, and the API is not free to use. Head back to the OpenAI tab in your browser and open the Usage section under API. There you can see what your usage has cost so far and raise your limit to allow for more usage. By default, OpenAI() looks for the key in the OPENAI_API_KEY environment variable. To supply it directly, pass the secret key you generated earlier to the OpenAI() object.

For example:

client = OpenAI(api_key="secret key goes here")

4. Next, install python-dotenv into your virtual environment using the following command:

pip install python-dotenv


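Once the package is installed, your code can load environment variables from a .env file. Keeping the secret key in a .env file keeps it out of your source code, so it stays hidden when the code is shared publicly, provided the .env file itself is not shared (for example, exclude it from version control). Create a .env file and insert the following: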

SECRET_KEY = "insert your secret key token here"

Now, in your main.py file, you can call the environment variable using dotenv. The code looks like this:

import os
from pathlib import Path

from openai import OpenAI
from dotenv import load_dotenv

# Load the variables defined in .env into the process environment
load_dotenv()
SECRET_KEY = os.getenv("SECRET_KEY")

# Pass the key from the environment instead of hard-coding it
client = OpenAI(api_key=SECRET_KEY)

# Save the audio next to this script
speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!"
)

response.stream_to_file(speech_file_path)


Now you can run main.py in your virtual environment. The script sends your text to the API and saves the generated audio as speech.mp3 in the same folder as the script.
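
If the .env file is missing or the variable name is misspelled, os.getenv returns None rather than raising an error, which can make the cause harder to spot. A minimal sketch of a fail-fast check follows. The function name require_secret_key is hypothetical, and it assumes the SECRET_KEY variable name used in the .env file above.

```python
import os


def require_secret_key(env=None, name="SECRET_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    Hypothetical helper; `env` defaults to the real process environment
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file and call load_dotenv() first"
        )
    return key
```

In main.py, this could stand in for the os.getenv line: call load_dotenv() first, then SECRET_KEY = require_secret_key().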

Beyond the Basics

While this guide focuses on the core functionalities, the OpenAI TTS API offers advanced features for seasoned users. These include:

  • Customization Options: The speech endpoint accepts optional parameters such as speed (the speaking rate) and response_format (the output audio format, for example mp3 or wav) to tailor the generated speech to your specific needs.
  • Integration with Other OpenAI Services: Explore the potential of combining the TTS API with other OpenAI services, such as the GPT text-generation models, to create even more powerful applications.
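
As an illustration of the customization options, the sketch below passes the optional speed and response_format parameters to the same client.audio.speech.create call used earlier. The wrapper name synthesize is hypothetical, the 0.25 to 4.0 speed range is an assumption based on OpenAI's documentation at the time of writing, and the client is passed in as an argument so the function can be tried out without touching the network.

```python
from pathlib import Path


def synthesize(client, text, out_path, *, model="tts-1", voice="alloy",
               speed=1.0, response_format="mp3"):
    """Generate speech for `text` and save it to `out_path`.

    Hypothetical wrapper; `client` is expected to be an openai.OpenAI instance.
    """
    # Assumed valid range, per OpenAI's docs at the time of writing.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
        speed=speed,                      # 1.0 is the normal speaking rate
        response_format=response_format,  # for example "mp3" or "wav"
    )
    response.stream_to_file(out_path)  # same save call as the sample request
    return Path(out_path)
```

With the client from main.py, a call might look like synthesize(client, "Hello there!", "hello.wav", speed=1.25, response_format="wav").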

The Power of Speech at Your Fingertips

By leveraging the OpenAI Text-to-Speech API, you can unlock new possibilities for your projects. From enhancing eLearning experiences to creating interactive voice assistants, the API empowers you to transform text into a natural and engaging form of communication. Experiment, explore, and unleash the potential of speech within your projects!
