OpenAI Showcases New Multimodal AI Model with Advanced Capabilities

TapTechNews, May 12: According to The Information, OpenAI recently demonstrated a new multimodal artificial intelligence model to some customers. The model can hold voice conversations and recognize objects. Insiders said it may be among the announcements OpenAI plans to make on May 13.

The report stated that, compared with OpenAI's existing separate image recognition and text-to-speech models, the new model can process image and audio information faster and more accurately. For example, it can help customer service representatives 'better understand the tone of callers and judge whether they are using sarcastic tone'. In theory, the model could also help students learn mathematics or translate real-world signs.

However, insiders also pointed out that although the model can outperform GPT-4 Turbo on certain kinds of problems, it can still confidently give incorrect answers.

TapTechNews noticed that developer Ananay Arora posted a screenshot of code related to phone calls, suggesting that OpenAI may be adding phone call functionality to ChatGPT. Arora also found evidence that OpenAI is configuring servers for real-time audio and video communication.

OpenAI CEO Sam Altman has explicitly denied that the upcoming product is the large language model codenamed GPT-5 (which is said to perform significantly better than GPT-4). The Information suggests that GPT-5 may be officially unveiled by the end of this year. Altman also stated that OpenAI will not release a new artificial intelligence search engine.

If The Information's report is accurate, OpenAI's release could still affect the upcoming Google I/O developer conference. Google is also known to be testing technology that uses artificial intelligence to make phone calls. In addition, Google reportedly has a project called 'Pixie' that is said to be launching soon. Pixie is a multimodal alternative to Google Assistant that can identify objects through the device's camera and provide information such as 'how to get to the purchase location' or 'how to use'.
