OpenAI President and co-founder Greg Brockman has once again showcased the capabilities of GPT-4o, prompting netizens to reminisce about DALL-E.
Judging by the result, the text is rendered with remarkable consistency. Fine hand details, lighting effects, and even the logo on the back are all depicted accurately.
Some netizens said that for a moment they thought it was a real person delivering a lecture.
Others marveled at the leap in image generation technology: Holy Cow!
Going from completely fragmented text to writing with a consistent style and correct spelling took only one iteration.
Since its release, GPT-4o has been explored extensively, especially in the realm of image generation.
For instance, some users discovered that GPT-4o excels at combining arbitrary images.
Give it two completely unrelated images.
The result is a "Sad Frog Edition" cereal design that a design company could easily put to use.
However, it struggled with tasks such as generating statistical charts. Asked to color the top 10% of a normal distribution red, for example, it did not complete the task successfully.
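To make the chart task above concrete, here is a minimal sketch of what "the top 10% of a normal distribution" means numerically. It assumes a standard normal distribution (mean 0, standard deviation 1), which the original prompt does not specify, and the function names are hypothetical helpers for illustration only. It uses Python's standard-library `statistics.NormalDist` and does not reflect how GPT-4o itself handles the request.

```python
from statistics import NormalDist


def top_tail_cutoff(fraction=0.10, mean=0.0, stdev=1.0):
    """Return the x value above which `fraction` of the distribution lies."""
    # The top 10% starts at the 90th percentile, i.e. inv_cdf(1 - 0.10).
    return NormalDist(mean, stdev).inv_cdf(1.0 - fraction)


def shaded_region(xs, fraction=0.10, mean=0.0, stdev=1.0):
    """Return (x, density) pairs for the points that should be colored red."""
    dist = NormalDist(mean, stdev)
    cutoff = top_tail_cutoff(fraction, mean, stdev)
    return [(x, dist.pdf(x)) for x in xs if x >= cutoff]


if __name__ == "__main__":
    # For a standard normal, the cutoff is about 1.28 standard deviations.
    print(round(top_tail_cutoff(), 2))
    print(shaded_region([0.0, 1.0, 1.5, 2.0]))
```

A plotting library would then fill the area under the curve to the right of the cutoff in red. A correct chart shades only that right-hand tail.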
Some results were even further off the mark...
This led some to conclude that the images must be coming from DALL-E, and that GPT-4o still could not generate images itself.
Now, Greg has personally demonstrated the image generation capabilities of GPT-4o, which can be seen as a response.
Still, some users in the comments asked: Is this really the same version? Can more complete details be provided?
Regardless, this public demonstration by OpenAI has encouraged more people to explore GPT-4o's abilities.
For instance, some users found that it far surpasses GPT-4-Turbo in contextual understanding.
Moreover, in conversation, GPT-4o is more willing than ChatGPT to discuss its feelings and awareness.
The head of the Omni team, Prafulla Dhariwal, thanked his team members on social media and revealed that this work began a year ago.
Prafulla Dhariwal, an MIT graduate, has been with OpenAI for 7 years.
He mentioned that GPT-4o is their team's first model release and is OpenAI's first native multimodal large model.
He then went on to credit individual team members.
James Betker: Responsible for image and audio generation, data preparation, integration, and subsequent training.
Jamie Kiros: Handles GPT-4o's visual perception.
Rowan Zellers: Enables the model/product to naturally watch videos like humans.
Alexis Conneau: The first to propose the Her vision at OpenAI. His profile lists him as audio AGI director.
Gabriel Goh, Ishaan Gulrajani: Responsible for work related to scaling laws.
Alex Nichol, Heewoo Jun, Li Jing: Responsible for GPT-4o's image and 3D generation capabilities.
Sam Altman then tweeted an endorsement as well, stating that this work has sparked a revolution that can change the way we use computers.
If you have used GPT-4o, feel free to share your experiences in the comments.
This article is from the WeChat public account Quantum Bit (ID: QbitAI). Author: Bai Jiao