Why it matters: AI image generation is leaving the uncanny valley behind. With its latest model, Stability AI is pushing generated visuals ever closer to being indistinguishable from real photographs. But as rivals like DALL-E and Midjourney improve just as quickly, the contest isn't only about who renders the crispest text; it's about leading the next wave of AI innovation.

Stability AI is tantalizing AI art enthusiasts with an early preview of its next-generation text-to-image model, Stable Diffusion 3.0. The startup has opened a waitlist for early access to the upgraded AI system, which promises crisper images, improved multi-subject handling, and significantly enhanced text rendering.

Typography has long been an Achilles' heel for AI image generators like Stable Diffusion, even as their output has become nearly indistinguishable from reality in other respects. However, Stability AI asserts that the new 3.0 release offers a substantial improvement in rendering legible text and getting spelling right in generated visuals.

One example highlighted in the press release particularly caught our eye: an image of a city bus that looks virtually impossible to distinguish from an actual photograph, complete with impeccable text rendering on the road sign and the vehicle's side. While there are still minor imperfections (the license plate appears distorted), the overall quality represents a quantum leap from the model's predecessors.

That leap may be less surprising once you consider that, under the hood, Stable Diffusion 3.0 is a major architectural overhaul of its predecessors. It employs a new "diffusion transformer" approach, similar to OpenAI's recent Sora model and a stark departure from the original Stable Diffusion architecture, according to Stability AI CEO Emad Mostaque, who spoke with VentureBeat.
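For readers curious what a diffusion transformer looks like in practice, here is a minimal, illustrative sketch in PyTorch: it patchifies a latent image into tokens, conditions them on the diffusion timestep, and runs them through standard transformer layers. The class name, layer sizes, and simple additive timestep conditioning are assumptions for illustration, not Stability AI's actual implementation.

```python
# Minimal sketch of a diffusion-transformer ("DiT"-style) denoiser, assuming a
# PyTorch setting where the inputs are latents from a VAE encoder. Names and
# sizes are illustrative only, not Stability AI's implementation.
import math
import torch
import torch.nn as nn


def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of the diffusion timestep."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


class TinyDiT(nn.Module):
    def __init__(self, latent_channels=4, latent_size=32, patch=2, dim=256, depth=4, heads=4):
        super().__init__()
        self.patch = patch
        self.num_patches = (latent_size // patch) ** 2
        # Patchify: each (patch x patch x C) tile of the latent becomes one token.
        self.to_tokens = nn.Linear(latent_channels * patch * patch, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.time_mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)
        self.to_latent = nn.Linear(dim, latent_channels * patch * patch)

    def forward(self, x, t):
        b, c, h, w = x.shape
        p = self.patch
        # (B, C, H, W) -> (B, N, C*p*p) tokens.
        tokens = x.unfold(2, p, p).unfold(3, p, p)                  # B, C, H/p, W/p, p, p
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(b, self.num_patches, c * p * p)
        h_tok = self.to_tokens(tokens) + self.pos_emb
        # Condition every token on the timestep embedding (simple additive conditioning).
        h_tok = h_tok + self.time_mlp(timestep_embedding(t, h_tok.shape[-1]))[:, None, :]
        h_tok = self.blocks(h_tok)
        out = self.to_latent(h_tok)                                 # predicted noise / velocity per patch
        out = out.reshape(b, h // p, w // p, c, p, p).permute(0, 3, 1, 4, 2, 5)
        return out.reshape(b, c, h, w)


# Usage: noisy latents in, predicted noise (or velocity) out.
model = TinyDiT()
noisy_latents = torch.randn(2, 4, 32, 32)
timesteps = torch.randint(0, 1000, (2,))
print(model(noisy_latents, timesteps).shape)  # torch.Size([2, 4, 32, 32])
```

The key design shift this sketch captures is that the denoiser operates on a sequence of latent patches with attention, rather than on a convolutional U-Net feature pyramid.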

Stable Diffusion 3.0 also integrates other cutting-edge techniques, notably "flow matching": a recently proposed way of training generative models that learns a continuous velocity field carrying noise samples toward data, rather than the stepwise denoising objective of classic diffusion. The researchers behind flow matching report faster training, more efficient sampling, and improved overall performance compared with traditional diffusion methods.
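As a rough illustration of the idea, here is a minimal flow-matching training step under the common linear-path (rectified-flow style) formulation: sample a noise point and a data point, interpolate between them at a random time, and regress the model onto the constant velocity of that straight path. The function name and the reuse of the TinyDiT sketch above as the velocity model are assumptions, not Stability AI's training code.

```python
# Minimal sketch of a linear-path flow-matching training step, assuming PyTorch
# and some velocity model `model(x_t, t)`. Illustrative only.
import torch
import torch.nn.functional as F


def flow_matching_loss(model, data_batch):
    """One training loss: regress the model onto the straight-line velocity."""
    x0 = torch.randn_like(data_batch)            # noise sample
    x1 = data_batch                              # data (or latent) sample
    t = torch.rand(data_batch.shape[0], device=data_batch.device)
    t_expand = t.view(-1, *([1] * (data_batch.dim() - 1)))
    # Interpolate along the straight path between noise and data.
    x_t = (1.0 - t_expand) * x0 + t_expand * x1
    target_velocity = x1 - x0                    # constant along the linear path
    predicted_velocity = model(x_t, t)
    return F.mse_loss(predicted_velocity, target_velocity)


# Example usage with the TinyDiT sketch above as the velocity model:
# loss = flow_matching_loss(model, torch.randn(2, 4, 32, 32))
# loss.backward()
```

Because the regression target is a simple velocity along a straight path, training avoids simulating the full noising process at every step, which is where the claimed efficiency gains come from.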

The revamped model suite will span 800 million to 8 billion parameters when it eventually sees a full release. But before that public launch, Stability AI is putting the model through its paces in a closed preview to gather feedback and strengthen safety guardrails. The startup has implemented numerous safeguards for this preview release, with more in development through collaboration with researchers, experts, and, of course, its own community.

Stability AI's ambitions don't stop here, though. Mostaque has hinted that the new Stable Diffusion model will underpin the company's forthcoming work in 3D modeling, video synthesis, and other novel AI visual capabilities.

Interested parties can sign up for the waitlist.