Technology

OpenAI Enhances ChatGPT with Realistic Image Generation

Published March 25, 2025

OpenAI announced on Tuesday that it is integrating advanced image generation into ChatGPT, using the GPT-4o model introduced last year. The move is a notable shift: it brings text and image generation together in a single platform.

One-Stop Solution for AI Content

With this new technology, ChatGPT aims to serve as a single source for AI-generated content. GPT-4o replaces Dall-E 3, previously a separate image-creation model, as ChatGPT's default image generator. The same image generation is also available in Sora, OpenAI's video-generation platform. However, the company's announcement does not specify any video-generation features within ChatGPT itself.

Improved Image Rendering Capabilities

One of the key improvements in ChatGPT's image capability is text rendering. OpenAI has emphasized that the updated system can generate images containing clear, meaningful, and legible text, avoiding the distorted or poorly rendered text seen in earlier models. According to the company, it trained its models on a comprehensive dataset of online images and text, allowing the system to learn the relationships between images and language.

"We have developed a model that demonstrates surprising visual fluency, generating images that are contextually relevant and consistent," OpenAI stated in its press release. Additionally, the system can use user-uploaded images as a basis for new visuals and is better at following user instructions. The company claims that while many other systems can manage about 5-8 objects in a scene, GPT-4o can handle 10-20 distinct objects.

Trade-Offs and Limitations

Despite these advancements, there are some trade-offs. For instance, the model may crop longer images at the bottom and can invent details that aren't present. It also struggles to render text in non-Latin scripts and has difficulty with very small text. Nonetheless, GPT-4o image generation is available now on the Plus, Pro, Team, and Free ChatGPT tiers, with Plus subscribers getting higher usage limits than free users. OpenAI has indicated that the feature will soon be available to enterprise and education users, as well as to developers through the API.

Demonstrations of the New Technology

OpenAI shared several image demonstrations showcasing the capabilities of the new technology. For instance, one prompt asked the model to create the following scene:

A wide image taken with a phone of a glass whiteboard in a room overlooking the Bay Bridge. A woman is writing on the board, wearing a t-shirt with a prominent OpenAI logo. Her handwriting is natural and a bit messy, with the photographer's reflection visible in the image.

The text in the image shows a combination of pros and cons related to the technology, detailing concepts like augmented image generation and native in-context learning.

Additional prompts generated various scenarios, including a whimsical four-panel comic strip featuring a serious snail in a car showroom, a candid photo of Karl Marx at a shopping mall, and even photorealistic images of witches reading street signs.

These demonstrations reflect the versatility of GPT-4o's image generation and highlight how OpenAI continues to push the boundaries of artificial intelligence.

OpenAI, ChatGPT, GPT-4o