
OpenAI is introducing the Realtime API: developers can now build fast speech-to-speech experiences into their applications

3 October 2024

OpenAI is introducing a public beta of the Realtime API, enabling all paid developers to build low-latency, multimodal experiences in their apps. "Similar to ChatGPT's Advanced Voice Mode, the Realtime API supports natural speech-to-speech conversations using the six preset voices already supported in the API," according to the press release.

"We're also introducing audio input and output in the Chat Completions API to support use cases that don't require the low-latency benefits of the Realtime API. With this update, developers can pass any text or audio inputs into GPT-4o and have the model respond with their choice of text, audio, or both," the company said.

From language apps and educational software to customer support experiences, developers have already been leveraging voice experiences to connect with their users. "Now with the Realtime API, and soon with audio in the Chat Completions API, developers no longer have to stitch together multiple models to power these experiences. Instead, you can build natural conversational experiences with a single API call," the company explained.

How it works

Previously, to create a similar voice assistant experience, developers had to transcribe audio with an automatic speech recognition model like Whisper, pass the text to a text model for inference or reasoning, and then play the model’s output using a text-to-speech model. This approach often resulted in loss of emotion, emphasis and accents, plus noticeable latency.

With the Chat Completions API, developers can handle the entire process with a single API call, though it remains slower than human conversation. The Realtime API improves this by streaming audio inputs and outputs directly, enabling more natural conversational experiences. It can also handle interruptions automatically, much like Advanced Voice Mode in ChatGPT.

Under the hood, the Realtime API lets you create a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling, which makes it possible for voice assistants to respond to user requests by triggering actions or pulling in new context. For example, a voice assistant could place an order on behalf of the user or retrieve relevant customer information to personalize its responses.
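As a rough illustration of that message flow, here is a minimal sketch of the JSON events a client might send over the WebSocket. The event type names ("session.update", "response.create") follow the public beta announcement, but treat the exact field shapes as assumptions that may change during the beta:

```python
import json

def session_update(instructions, voice="alloy"):
    """Configure the session (system instructions, voice) after connecting."""
    return json.dumps({
        "type": "session.update",
        "session": {"instructions": instructions, "voice": voice},
    })

def response_create():
    """Ask the model to start generating a response."""
    return json.dumps({"type": "response.create"})

# Actually connecting requires the third-party `websockets` package and a
# real API key, so it is only sketched here in comments:
#
#   import websockets
#   ws = await websockets.connect(
#       "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
#       extra_headers={"Authorization": f"Bearer {API_KEY}",
#                      "OpenAI-Beta": "realtime=v1"})
#   await ws.send(session_update("You are a helpful voice assistant."))
#   await ws.send(response_create())
```

Function calling would follow the same pattern: the session configuration lists the tools, and the server emits events when the model wants one invoked.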

Availability & pricing

The Realtime API will begin rolling out today in public beta to all paid developers. Audio capabilities in the Realtime API are powered by the new GPT-4o model gpt-4o-realtime-preview.

Audio in the Chat Completions API will be released in the coming weeks, as a new model gpt-4o-audio-preview. With gpt-4o-audio-preview, developers can input text or audio into GPT-4o and receive responses in text, audio, or both.
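A hedged sketch of what such a request might look like once gpt-4o-audio-preview ships: the `modalities` and `audio` fields below are assumptions based on the announcement's description (text or audio in, text and/or audio out), not confirmed documentation.

```python
def build_audio_request(prompt, voice="alloy", audio_format="wav"):
    """Build a hypothetical Chat Completions payload requesting audio output.

    Field names here are assumptions; check the official API reference
    once audio support lands in the Chat Completions API.
    """
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],            # ask for both back
        "audio": {"voice": voice, "format": audio_format},
        "messages": [{"role": "user", "content": prompt}],
    }
```

In practice this dict would be passed to the SDK's chat-completions call; the point is that one request replaces the old transcribe-then-reason-then-speak chain.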

The Realtime API uses both text tokens and audio tokens. Text tokens are priced at $5 per 1M input tokens and $20 per 1M output tokens. Audio is priced at $100 per 1M input tokens and $200 per 1M output tokens. This equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output. Audio in the Chat Completions API will be priced the same.
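The per-minute figures can be cross-checked against the per-token prices. Working backwards, $0.06/min at $100 per 1M tokens implies roughly 600 audio tokens per minute of input, and $0.24/min at $200 per 1M implies roughly 1,200 tokens per minute of output. Note these tokens-per-minute rates are inferred here from the stated prices, not quoted from the announcement:

```python
def cost_per_minute(tokens_per_minute, usd_per_million_tokens):
    """Dollar cost of one minute of audio at a given token rate and price."""
    return tokens_per_minute * usd_per_million_tokens / 1_000_000

# Inferred token rates that reproduce the announced per-minute prices:
input_cost = cost_per_minute(600, 100)     # ~$0.06 per minute of audio input
output_cost = cost_per_minute(1200, 200)   # ~$0.24 per minute of audio output
```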

What’s next

As we work towards general availability, we’re actively collecting feedback to improve the Realtime API. Some of the capabilities we plan to introduce include:

More modalities: To start, the Realtime API will support voice, and we plan to add additional modalities like vision and video over time.

Increased rate limits: Today the API is rate limited to approximately 100 simultaneous sessions for Tier 5 developers, with lower limits for Tiers 1-4. We will increase these limits over time to support larger deployments.

Official SDK support: We will integrate support for Realtime API into the OpenAI Python and Node.js SDKs.

Prompt Caching: We will add support for Prompt Caching so previous conversation turns can be reprocessed at a discount.

Expanded model support: The Realtime API will also support GPT-4o mini in upcoming versions of that model.

We’re looking forward to seeing how developers leverage these new capabilities to create compelling new audio experiences for their users across a variety of use cases from education to translation, customer service, accessibility and beyond.

New funding to scale the benefits of AI

Every week, over 250 million people around the world use ChatGPT to enhance their work, creativity, and learning. Across industries, businesses are improving productivity and operations, and developers are leveraging our platform to create a new generation of applications. And we’re only getting started.

OpenAI raised $6.6B in new funding at a $157B post-money valuation. The new funding will allow the company "to double down on our leadership in frontier AI research, increase compute capacity, and continue building tools that help people solve hard problems," according to the press release.

"We aim to make advanced intelligence a widely accessible resource. By collaborating with key partners, including the U.S. and allied governments, we can unlock this technology's full potential," the company said.

