
OpenAI is introducing the Realtime API: developers can now build fast speech-to-speech experiences into their applications

3 October 2024

OpenAI is introducing a public beta of the Realtime API, enabling all paid developers to build low-latency, multimodal experiences in their apps. "Similar to ChatGPT's Advanced Voice Mode, the Realtime API supports natural speech-to-speech conversations using the six preset voices already supported in the API," according to the press release.

"We're also introducing audio input and output in the Chat Completions API to support use cases that don't require the low-latency benefits of the Realtime API. With this update, developers can pass any text or audio inputs into GPT-4o and have the model respond with their choice of text, audio, or both," the company said.

From language apps and educational software to customer support experiences, developers have already been leveraging voice experiences to connect with their users. "Now with the Realtime API, and soon with audio in the Chat Completions API, developers no longer have to stitch together multiple models to power these experiences. Instead, you can build natural conversational experiences with a single API call," the company explained.

How it works

Previously, to create a similar voice assistant experience, developers had to transcribe audio with an automatic speech recognition model like Whisper, pass the text to a text model for inference or reasoning, and then play the model’s output using a text-to-speech model. This approach often resulted in loss of emotion, emphasis and accents, plus noticeable latency.

With the Chat Completions API, developers can handle the entire process with a single API call, though it remains slower than human conversation. The Realtime API improves this by streaming audio inputs and outputs directly, enabling more natural conversational experiences. It can also handle interruptions automatically, much like Advanced Voice Mode in ChatGPT.

Under the hood, the Realtime API lets you create a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling, which makes it possible for voice assistants to respond to user requests by triggering actions or pulling in new context. For example, a voice assistant could place an order on behalf of the user or retrieve relevant customer information to personalize its responses.
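As a rough illustration of that message flow, here is a minimal sketch of the JSON events a client might send over the WebSocket. The event type names ("session.update", "response.create") follow the public beta announcement, but treat the exact field shapes as assumptions that may change during the beta:

```python
import json

def session_update(instructions, voice="alloy"):
    """Configure the session (system instructions, voice) after connecting."""
    return json.dumps({
        "type": "session.update",
        "session": {"instructions": instructions, "voice": voice},
    })

def response_create():
    """Ask the model to start generating a response."""
    return json.dumps({"type": "response.create"})

# Actually connecting requires the third-party `websockets` package and a
# real API key, so it is only sketched here in comments:
#
#   import websockets
#   ws = await websockets.connect(
#       "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
#       extra_headers={"Authorization": f"Bearer {API_KEY}",
#                      "OpenAI-Beta": "realtime=v1"})
#   await ws.send(session_update("You are a helpful voice assistant."))
#   await ws.send(response_create())
```

Function calling would follow the same pattern: the session configuration lists the tools, and the server emits events when the model wants one invoked.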

Availability & pricing

The Realtime API will begin rolling out today in public beta to all paid developers. Audio capabilities in the Realtime API are powered by the new GPT-4o model gpt-4o-realtime-preview.

Audio in the Chat Completions API will be released in the coming weeks, as a new model gpt-4o-audio-preview. With gpt-4o-audio-preview, developers can input text or audio into GPT-4o and receive responses in text, audio, or both.
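A hedged sketch of what such a request might look like once gpt-4o-audio-preview ships: the `modalities` and `audio` fields below are assumptions based on the announcement's description (text or audio in, text and/or audio out), not confirmed documentation.

```python
def build_audio_request(prompt, voice="alloy", audio_format="wav"):
    """Build a hypothetical Chat Completions payload requesting audio output.

    Field names here are assumptions; check the official API reference
    once audio support lands in the Chat Completions API.
    """
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],            # ask for both back
        "audio": {"voice": voice, "format": audio_format},
        "messages": [{"role": "user", "content": prompt}],
    }
```

In practice this dict would be passed to the SDK's chat-completions call; the point is that one request replaces the old transcribe-then-reason-then-speak chain.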

The Realtime API uses both text tokens and audio tokens. Text tokens are priced at $5 per 1M input tokens and $20 per 1M output tokens. Audio is priced at $100 per 1M input tokens and $200 per 1M output tokens. This equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output. Audio in the Chat Completions API will be priced the same.
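The per-minute figures can be cross-checked against the per-token prices. Working backwards, $0.06/min at $100 per 1M tokens implies roughly 600 audio tokens per minute of input, and $0.24/min at $200 per 1M implies roughly 1,200 tokens per minute of output. Note these tokens-per-minute rates are inferred here from the stated prices, not quoted from the announcement:

```python
def cost_per_minute(tokens_per_minute, usd_per_million_tokens):
    """Dollar cost of one minute of audio at a given token rate and price."""
    return tokens_per_minute * usd_per_million_tokens / 1_000_000

# Inferred token rates that reproduce the announced per-minute prices:
input_cost = cost_per_minute(600, 100)     # ~$0.06 per minute of audio input
output_cost = cost_per_minute(1200, 200)   # ~$0.24 per minute of audio output
```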

What’s next

As we work towards general availability, we’re actively collecting feedback to improve the Realtime API. Some of the capabilities we plan to introduce include:

More modalities: To start, the Realtime API will support voice, and we plan to add additional modalities like vision and video over time.

Increased rate limits: Today the API is rate limited to approximately 100 simultaneous sessions for Tier 5 developers, with lower limits for Tiers 1-4. We will increase these limits over time to support larger deployments.

Official SDK support: We will integrate support for Realtime API into the OpenAI Python and Node.js SDKs.

Prompt Caching: We will add support for Prompt Caching so previous conversation turns can be reprocessed at a discount.

Expanded model support: The Realtime API will also support GPT-4o mini in upcoming versions of that model.

We’re looking forward to seeing how developers leverage these new capabilities to create compelling new audio experiences for their users across a variety of use cases from education to translation, customer service, accessibility and beyond.

New funding to scale the benefits of AI

Every week, over 250 million people around the world use ChatGPT to enhance their work, creativity, and learning. Across industries, businesses are improving productivity and operations, and developers are leveraging our platform to create a new generation of applications. And we’re only getting started.

OpenAI raised $6.6B in new funding at a $157B post-money valuation. The new funding will allow the company "to double down on our leadership in frontier AI research, increase compute capacity, and continue building tools that help people solve hard problems," according to the press release.

"We aim to make advanced intelligence a widely accessible resource. By collaborating with key partners, including the U.S. and allied governments, we can unlock this technology's full potential," the company said.

