OpenAI Releases New Text-To-Speech Models

At its inaugural developer day, OpenAI unveiled a plethora of new APIs.

After initially appearing on ChatGPT and Bing Chat, OpenAI's text-to-image model, DALL-E 3, is now accessible via an API.

According to the company, the API has built-in moderation to help guard against misuse, as earlier DALL-E versions such as DALL-E 2 did.

Pricing for the DALL-E 3 API starts at $0.04 per generated image, and the API offers a range of quality, format, and resolution options, with resolutions ranging from 1024 x 1024 to 1792 x 1024.
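
For developers budgeting a batch of images, the pricing and resolution details above can be sketched roughly as follows. This is a minimal illustration, not OpenAI's client library: the helper names and the request field names (`model`, `prompt`, `size`, `n`) are assumptions, only the two resolutions named in this article are listed, and the request body is built but never sent.

```python
from decimal import Decimal

# Base price quoted in the article; other quality or resolution tiers may cost more.
PRICE_PER_IMAGE_USD = Decimal("0.04")

# Only the two resolutions named in the article are listed here.
SUPPORTED_SIZES = {"1024x1024", "1792x1024"}


def build_image_request(prompt: str, size: str = "1024x1024") -> dict:
    """Build a JSON-style body for an image generation request (nothing is sent)."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    # Field names are assumptions made for illustration.
    return {"model": "dall-e-3", "prompt": prompt, "size": size, "n": 1}


def estimate_image_cost(num_images: int) -> Decimal:
    """Estimate the cost in USD of num_images images at the base price."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE_USD * num_images
```

Under these assumptions, `estimate_image_cost(25)` comes to one dollar. `Decimal` is used instead of floats so that the estimate does not pick up rounding noise.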

However, at least for now, it is not as full-featured as the DALL-E 2 API.

OpenAI launches DALL-E 3 API

The DALL-E 3 API, in contrast to the DALL-E 2 API, does not allow users to create edited versions of images by instructing the model to change specific portions of an existing image or to make variants of an existing image.

Additionally, OpenAI says that when a generation request is sent to DALL-E 3, the model automatically rewrites the prompt "to add more detail" and "for safety reasons."
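
A developer who wants to know whether a prompt was rewritten could compare the text they sent with whatever the API reports back. The sketch below assumes each returned image entry carries the rewritten text in a field named `revised_prompt`; both that field name and the sample entry are illustrative stand-ins, not output captured from the API.

```python
def prompt_was_rewritten(original_prompt: str, image_entry: dict) -> bool:
    """Return True if the entry reports a prompt that differs from the one sent."""
    # "revised_prompt" is an assumed field name; adjust it to the real response shape.
    revised = image_entry.get("revised_prompt")
    if revised is None:
        return False
    return revised.strip() != original_prompt.strip()


# Hand-written sample entry for illustration only, not real API output.
sample_entry = {
    "url": "https://example.com/image.png",
    "revised_prompt": "A small red fox sitting in a snowy forest at dawn",
}
```

Logging the original and revised prompts side by side makes it easier to spot cases where the rewrite drifts away from what was asked for.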

Depending on the prompt, this might produce images that match the original request less closely.

OpenAI is also offering a text-to-speech API with two generative AI model variants and six preset voices to pick from: Alloy, Fable, Echo, Onyx, Nova, and Shimmer.

This text-to-speech API is called the Audio API. It is available now, with prices starting at $0.015 per 1,000 input characters.
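
The per-character pricing lends itself to a quick estimate before any text is submitted. The following is a minimal sketch under stated assumptions: the helper names, the request field names (`model`, `input`, `voice`), and the model identifier `tts-1` are not taken from this article, and the request body is only built, not sent.

```python
from decimal import Decimal

# Starting price quoted in the article: $0.015 per 1,000 input characters.
PRICE_PER_1K_CHARS_USD = Decimal("0.015")

# The six preset voices named in the article, lowercased for use as identifiers.
VOICES = ("alloy", "fable", "echo", "onyx", "nova", "shimmer")


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1") -> dict:
    """Build a JSON-style body for a text-to-speech request (nothing is sent)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    # Field names and the default model identifier are assumptions for illustration.
    return {"model": model, "input": text, "voice": voice}


def estimate_speech_cost(text: str) -> Decimal:
    """Estimate the cost in USD of synthesizing text at the starting price."""
    return Decimal(len(text)) / 1000 * PRICE_PER_1K_CHARS_USD
```

At the starting price, a 2,000-character script works out to about three cents, so `estimate_speech_cost` is mostly useful for long documents or high request volumes.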