Krea 2: Open-Weights Image Model That Caught the Frontier
Krea 2 is a 12B open-weights image model rivaling closed APIs. Here is what the technical report reveals and how to run it locally.
The closed frontier just got company. On June 22, 2026, Krea released the weights of Krea 2 — a 12.9-billion-parameter diffusion transformer trained from scratch on billions of real images — and the Hacker News thread hit 348 points within hours. The release ships as two complementary checkpoints: Krea 2 Raw, an undistilled base model built for fine-tuning and LoRA training, and Krea 2 Turbo, an 8-step distilled engine that generates 2K images in roughly two seconds on consumer hardware. Both are available on Hugging Face under a community license that allows free commercial use for individuals and small teams.
What makes this release different from the usual open-weights drop is the depth of what came with it. Krea published a full technical report detailing everything from data curation philosophy to distributed training infrastructure — the kind of document that frontier labs typically keep behind closed doors. As one community member put it: "Krea-2 is the single most uncensored nonlobotomized open source image model we've gotten in years."
What Krea 2 Actually Is
At its core, Krea 2 is a single-stream diffusion transformer. The architecture uses a 12.9B dense DiT backbone with 28 transformer blocks at width 6144, grouped-query attention with gated sigmoid attention, SwiGLU MLPs at 4x expansion, and 3D axial RoPE for positional encoding. The text encoder is Qwen3-VL-4B-Instruct with a novel multi-layer feature aggregation mechanism that dynamically selects coarse-to-fine text representations — a meaningful upgrade over relying solely on a language model's final-layer outputs.
The two-checkpoint system is intentional. Raw is the undistilled mid-training checkpoint — diverse, malleable, and designed specifically for researchers and fine-tuners to customize. Turbo is the production engine: an 8-step distilled version that runs with zero classifier-free guidance overhead. The transfer between them is engineered, not accidental: LoRAs trained on Raw are designed to apply directly to Turbo for inference.
Krea 2 ranks #1 among text-to-image models from independent labs on Artificial Analysis, and sits within 0.14 points of GPT Image 2 on style fidelity — closing the gap with the closed frontier more than any prior open-weights release.
What the Technical Report Reveals
Most open-weights releases come with a model card and a README. Krea dropped a technical report that reads more like a graduate thesis. Here is what stood out.
No Synthetic Data, by Design
The team explicitly rejects synthetic training data. Their position: "even a small proportion of AI-generated images introduces biases" that degrade output diversity. Instead, they built a multi-stage pipeline that processes billions of real images through increasingly selective filters — from Laplacian edge detection and RGB entropy checks at 256px, through quality and complexity scoring at 512px, to hierarchical k-means clustering with FAISS at 1024px.
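The cheapest stage of that pipeline can be pictured with a small sketch. It assumes the common "variance of the Laplacian" sharpness test and a per-channel Shannon entropy; the function names and thresholds are hypothetical and are not taken from the report.

```python
import math

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior of a 2D grayscale grid."""
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def channel_entropy(values, bins=256):
    """Shannon entropy (bits) of one 8-bit colour channel given as a flat list."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def passes_cheap_filters(gray, rgb_channels, min_sharpness=50.0, min_entropy=3.0):
    """Keep an image only if it has edges and every channel carries information.

    Both thresholds are illustrative placeholders, not values from the report.
    """
    sharp_enough = laplacian_variance(gray) >= min_sharpness
    varied_enough = min(channel_entropy(c) for c in rgb_channels) >= min_entropy
    return sharp_enough and varied_enough
```

A flat grey image scores zero on both measures and is dropped, which is the kind of blank or near-blank input such a first-pass filter is presumably meant to catch before the more expensive scoring stages run.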
They also ran PageRank over English Wikipedia to identify the top 5 million representable concepts, then prioritized sampling images that reference rare entities. The goal is not just high-quality outputs — it is broad world knowledge.
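As a rough illustration of the concept-ranking idea, here is a minimal power-iteration PageRank over a toy link graph. The graph, damping factor, and helper names are assumptions for the example; the report's actual implementation over Wikipedia is not described in this article.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps a page to the list of pages it links to; returns page -> score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages without outgoing links spread their score uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def top_concepts(links, k):
    """Return the k highest-ranked pages, ties broken alphabetically."""
    scores = pagerank(links)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]

# Toy stand-in for a Wikipedia link graph (hypothetical data).
toy_links = {
    "Dog": ["Mammal"],
    "Cat": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": [],
}
```

Calling `top_concepts(toy_links, 2)` surfaces the pages that many others point to, directly or indirectly. At Wikipedia scale the same ranking would give a cutoff for which concepts count as "representable".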
A Six-Stage Training Pipeline
The training pipeline runs six stages, each building on the last:
- Pretraining — progressive resolution from 256px to 1024px, using 8-bit training at lower resolutions for 15–20% speed gains
- Midtraining — bridges pretraining to SFT, equipping the model with high-resolution and text-rendering capabilities
- Supervised Fine-Tuning — small, hand-curated datasets targeting specific visual domains
- Preference Optimization — a custom method called STPO (Stabilized Temporal Preference Optimization) that prevents the model from degrading both winning and losing samples
- Reinforcement Learning — multi-reward GRPO with four independent signals: aesthetics, prompt-following, text rendering, and artifact detection
- Timestep Distillation — creates the Turbo checkpoint via Trajectory Distribution Matching
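To make the reinforcement learning stage more concrete, the sketch below shows the group-relative advantage at the heart of GRPO, extended to four reward signals. How the report actually combines the signals is not stated here, so the per-signal normalisation followed by a weighted sum is an assumption, as are all names.

```python
import statistics

REWARD_NAMES = ("aesthetics", "prompt_following", "text_rendering", "artifacts")

def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one group of samples drawn for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def multi_reward_advantages(samples, weights):
    """Combine several reward signals into one advantage per sample.

    samples: list of dicts mapping reward name -> score, all for the same prompt.
    weights: dict mapping reward name -> weight (assumed, not from the report).
    """
    combined = [0.0] * len(samples)
    for name in REWARD_NAMES:
        advantages = group_relative_advantages([s[name] for s in samples])
        for i, adv in enumerate(advantages):
            combined[i] += weights[name] * adv
    return combined
```

Normalising each signal inside the group before mixing keeps a reward with a wide numeric range from drowning out the others, which is one plausible reason to keep the four signals independent.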
Rubric-Based RL Rewards
The RL stage introduces what might be the report's most transferable innovation. Instead of asking a judge model for a single holistic score, the system decomposes each prompt into individually verifiable requirements. A prompt for "a golden retriever on a mountain trail at sunset" gets broken into checks for entity presence, composition, lighting, and style adherence — each scored independently.
This prevents the common failure mode where optimizing for a single aesthetic score leads to reward hacking. The team adds a dedicated artifact reward model that catches extra fingers, malformed limbs, and distorted text — structural errors that "are visually obvious to humans but are often missed by general-purpose VLM judges."
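A minimal sketch of the rubric idea follows. It assumes the prompt has already been decomposed into requirement strings and that a judge can answer each one with pass or fail; the `judge` callable, the requirement wording, and the mean-of-passes aggregation are all hypothetical.

```python
from typing import Callable, Dict, List

def rubric_reward(image: object,
                  requirements: List[str],
                  judge: Callable[[object, str], bool]) -> Dict[str, object]:
    """Score an image as the fraction of individually checked requirements it meets."""
    checks = {req: bool(judge(image, req)) for req in requirements}
    score = sum(checks.values()) / len(checks)
    return {"score": score, "checks": checks}

# Hand-written decomposition of the example prompt from the text above.
requirements = [
    "a golden retriever is present",
    "the setting is a mountain trail",
    "the lighting matches sunset",
]

def toy_judge(image, requirement):
    """Stand-in for a VLM judge: the 'image' is just a set of facts known to be true."""
    return requirement in image

result = rubric_reward(
    {"a golden retriever is present", "the setting is a mountain trail"},
    requirements,
    toy_judge,
)
```

Because each check is recorded separately, a low score points at the specific requirement that failed, which a single holistic number cannot do.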
Infrastructure Worth Reading About
For teams running their own training, the infrastructure section is unusually practical. Krea built a custom PostgreSQL-based system called Krablet that handles 208 TB of metadata and processes tens of thousands of contended UPSERT transactions per second. Their key finding on scaling: doubling GPU count produced "substantially more instability than anticipated," with runs above 128 GPUs failing to complete a single 24-hour run without crashes. Fabric instability — link flapping, packet errors, congestion — was the single largest contributor.
Checkpoint completion takes approximately 30 seconds on their Weka filesystem, which replaced Ceph after performance issues. The team optimizes for mean time between failures and mean time to recovery rather than building global checkpoint recovery systems.
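The reasoning behind that choice can be illustrated with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The hour values below are illustrative placeholders, not measurements from the report.

```python
def availability(mtbf_hours, mttr_hours):
    """Fraction of wall-clock time a training job spends making progress."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical cluster that fails once a day on average.
slow_recovery = availability(mtbf_hours=24.0, mttr_hours=2.0)
fast_recovery = availability(mtbf_hours=24.0, mttr_hours=0.25)
```

With failures held constant, shrinking recovery time alone lifts useful training time noticeably, which is why fast checkpointing and quick restarts can matter as much as preventing crashes.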
How Krea 2 Compares to the Closed Frontier
According to the BuildFastWithAI review, Krea 2 is the #1 text-to-image model from an independent lab on the Artificial Analysis leaderboard, ranking #6 globally. It closes the gap with GPT Image 2 on style fidelity to within 0.14 points while generating 2K images in approximately two seconds — matching FLUX.1-schnell's speed with broader aesthetic range.
Where Krea 2 differentiates from Midjourney and closed APIs is style control. Midjourney's controls live in text flags and the system is opinionated about what "style" means. Krea 2 extracts palette, line work, texture, lighting, and composition from reference images, with continuous strength sliders for each reference. The difference matters for creative studios that need a specific visual direction rather than a default AI aesthetic.
How to Actually Run Krea 2 Locally
The hardware floor is lower than the 12.9B parameter count suggests. Here is what you need.
The Quick Path: ComfyUI + FP8
The fastest route to running Krea 2 locally is through ComfyUI. The community has already produced FP8-quantized weights that shrink the transformer from 24.76 GiB (BF16) to 12.01 GiB, which fits on a 16GB GPU. The workflow runs its stages sequentially: the text encoder loads, encodes your prompt, and unloads before sampling begins, which keeps peak VRAM within the 16GB limit.
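The size figures can be sanity-checked with back-of-the-envelope arithmetic: parameter count times bytes per parameter. The sketch assumes exactly 12.9 billion parameters and ignores any tensors kept at higher precision, so it is an estimate rather than a reproduction of the published file sizes.

```python
GIB = 2 ** 30  # bytes in one gibibyte

def weight_gib(params, bytes_per_param):
    """Approximate size of the weights alone, in GiB."""
    return params * bytes_per_param / GIB

PARAMS = 12.9e9  # parameter count quoted in the article

fp8_gib = round(weight_gib(PARAMS, 1), 2)   # one byte per parameter
bf16_gib = round(weight_gib(PARAMS, 2), 2)  # two bytes per parameter
headroom_gib = round(16 - fp8_gib, 2)       # what is left on a 16 GiB card
```

The FP8 estimate lands on the quoted 12.01 GiB. The BF16 estimate comes out slightly below the quoted 24.76 GiB, so treat the naive formula as a lower bound. The remaining headroom on a 16GB card has to cover activations, the VAE, and framework overhead, which is why unloading the text encoder before sampling matters.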
Minimum hardware:
- GPU: 16GB VRAM or more (RTX 4060 Ti 16GB, RTX 5080, RTX 4090)
- System RAM: 16GB minimum, 32GB recommended
- Storage: ~18GB for model files
- Software: ComfyUI 0.25.0+ with CUDA 12.8+
Setup in three steps:
- Update ComfyUI to 0.25.0+
- Download the FP8 model files from Comfy-Org/Krea-2 on Hugging Face: krea2_turbo_fp8_scaled.safetensors and qwen3vl_4b_fp8_scaled.safetensors
- Load the native workflow JSON (no custom nodes required)
Enter a prompt, select a resolution, and click Queue. The defaults (8 steps, prompt enhancement enabled) produce a high-quality image with minimal configuration.
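If you prefer to script the download step, something like the following could work with the `huggingface_hub` package. The repository name and filenames are the ones quoted above. The assumption that both files sit at the repository root, and the choice of ComfyUI's `diffusion_models` and `text_encoders` folders as targets, are guesses to verify against the repository and the workflow's loader nodes.

```python
from pathlib import Path

REPO_ID = "Comfy-Org/Krea-2"  # repository name as given in the article

# Filenames come from the article; the target subfolders are assumptions.
FILES = {
    "krea2_turbo_fp8_scaled.safetensors": "diffusion_models",
    "qwen3vl_4b_fp8_scaled.safetensors": "text_encoders",
}

def download_plan(comfy_root):
    """Map each model file to the ComfyUI folder it would be placed in."""
    models_dir = Path(comfy_root) / "models"
    return {name: models_dir / subfolder for name, subfolder in FILES.items()}

def download_all(comfy_root):
    """Fetch every file in the plan (requires: pip install huggingface_hub)."""
    from huggingface_hub import hf_hub_download

    for name, target_dir in download_plan(comfy_root).items():
        target_dir.mkdir(parents=True, exist_ok=True)
        # Assumes the file is at the repo root; if it lives in a subfolder,
        # pass the full in-repo path as `filename` instead.
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target_dir)
```

Calling `download_all("/path/to/ComfyUI")` would then pull both files. Note that `hf_hub_download` recreates any in-repo subfolder under `local_dir`, so files stored in nested paths may need to be moved afterwards.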
The Developer Path: Official Inference Code
For programmatic access, the official GitHub repository provides the inference code. You will need the full BF16 weights from Hugging Face (Raw or Turbo) and a GPU with 24GB+ VRAM.
The Cloud Path
If local hardware is a constraint, day-zero integrations are already live on fal, Replicate, Together AI, Cloudflare, and SGLang. The fal integration was highlighted as "4x cheaper than NBP" in the HN discussion.
For more on running open-weight models on your own hardware, see our guide to running AI models locally.
For LoRA fine-tuning, train on Raw and deploy on Turbo. The transfer is specifically engineered — LoRAs trained on Raw "transfer strongly to Turbo" for production inference. Ostris AI Toolkit and standard HuggingFace diffusers workflows both work.
The License: What You Can and Cannot Do
The Krea 2 Community License is not Apache-2.0 or MIT — it is a custom agreement with clear commercial guardrails.
Free commercial use applies if your total company-wide annual revenue is under $1 million USD and you have fewer than 50 seats. That covers most solo developers, startups, and small studios.
Enterprise licensing is required for organizations above either threshold. The VentureBeat analysis notes this positions Krea 2 as enterprise-grade while maintaining accessibility for the community that drives adoption.
Content filtering is mandatory. Unlike truly permissive open-source licenses, the Krea 2 license legally binds deployers to implement content moderation — open-source classifiers, commercial moderation APIs, or manual review. This is a meaningful requirement for anyone building a product on top of these weights.
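The two thresholds translate into a simple check. This sketch encodes only the summary given above (revenue under $1 million and fewer than 50 seats); the function name is hypothetical and the license text itself remains the authority on edge cases.

```python
REVENUE_LIMIT_USD = 1_000_000  # total company-wide annual revenue
SEAT_LIMIT = 50

def qualifies_for_free_commercial_use(annual_revenue_usd, seats):
    """True only when the organisation is below BOTH thresholds."""
    return annual_revenue_usd < REVENUE_LIMIT_USD and seats < SEAT_LIMIT
```

A ten-person studio with $400,000 in revenue passes; a five-person team with $2 million in revenue does not, because exceeding either limit triggers the enterprise requirement.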
The Community Response
The community moved fast. Within hours of the Hugging Face release, Krea 2 was running in ComfyUI with quantized variants appearing almost immediately. SGLang added day-zero support, and early testers were already fine-tuning LoRAs.
Krea's own team emphasized that "the open-source community has always been vital for Krea, and having raw/undistilled models is something we always missed." The release of Raw alongside Turbo — giving the community the undistilled checkpoint that most labs keep internal — was the decision that earned the most goodwill.
Not everything was praise. The HN discussion surfaced VAE quality concerns, with some users reporting an "airbrushed" quality in certain outputs. The Krea team responded directly: "We tried to optimize for realistic focus and not over-sharpening, which leads to a hyper AI-look." Whether that trade-off works depends on your use case — product photography benefits from natural softness, while technical illustration may want sharper edges.
Fine-Tuning: Where Raw Earns Its Name
The dual-checkpoint design is not just a convenience — it is the release's architectural thesis. Raw ships as an undistilled mid-training checkpoint with no post-training alignment baked in, making it unusually malleable for custom fine-tuning. Train a LoRA on Raw targeting your specific aesthetic — product photography, architectural renders, editorial illustration — and then deploy that LoRA on Turbo for production inference at full speed.
The community has already validated the workflow. Apolinario from Hugging Face reported LoRA training and inference working smoothly within days of release, with demos and training notebooks available on Hugging Face Spaces. Tools like Ostris AI Toolkit, kohya-ss/musubi-tuner, and standard HuggingFace diffusers all support the Krea 2 architecture. Four official style LoRAs ship with the release as starting points.
This Raw-to-Turbo transfer pathway matters because it solves a persistent problem in the open-weights image space: most distilled models lose fine-tuning flexibility in exchange for speed. Krea 2 decouples those concerns by design, giving teams a research-grade base model and a production-grade inference engine that share the same latent space.
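The mechanics of carrying an adapter from one checkpoint to another are easy to picture with the standard LoRA formulation, where a low-rank product is added to a frozen weight matrix: W' = W + (alpha / r) * B * A. The toy sketch below applies one adapter to two different base matrices standing in for Raw and Turbo. The matrices and helper names are illustrative, and it says nothing about how well the transfer works in practice.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, alpha):
    """Weight update (alpha / r) * B @ A, where A is r x d_in and B is d_out x r."""
    rank = len(A)
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]

def apply_lora(W, delta):
    """Add the low-rank update to a base weight matrix of the same shape."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Rank-1 adapter, nominally trained against the Raw weights.
B = [[1], [2]]
A = [[1, 0, 1]]
delta = lora_delta(B, A, alpha=2)

raw_weights = [[0, 0, 0], [0, 0, 0]]    # stand-in for a Raw layer
turbo_weights = [[1, 1, 1], [1, 1, 1]]  # stand-in for the matching Turbo layer

raw_tuned = apply_lora(raw_weights, delta)
turbo_tuned = apply_lora(turbo_weights, delta)
```

Since the adapter is only an additive delta with the same shape as the layer, it can be applied to any checkpoint that shares the architecture. Whether the result is useful depends on how far the distilled weights have drifted from the base, which is the part Krea says it engineered.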
What This Means for the Open-Weights Race
Krea 2 is the strongest evidence yet that the closed-vs-open gap in image generation is compressing. A 12.9B model from an independent lab now sits within 0.14 points of GPT Image 2 on style fidelity, runs at comparable speed, and ships with the kind of style-control system that closed APIs still lack.
The technical report's roadmap hints at what is next: mixture-of-experts architectures, native 2K–4K resolution with sparse attention, NVFP4 training for further efficiency gains, and multi-teacher on-policy distillation. The AI Weekly coverage noted this positions Krea 2 not as a one-off release but as the foundation for a family of models.
For developers and creative studios evaluating their image-generation stack, the calculus has shifted. The floor you can actually own — download, fine-tune, deploy without API dependency — just moved up to the frontier. Whether that changes your architecture depends on your constraints: if you need style control beyond what closed APIs offer, or if API costs at scale make self-hosting attractive, Krea 2 is now the model to benchmark against.
For a broader comparison of where Krea 2 fits among current options, see our AI image generators roundup. And if you are evaluating open-weight models more broadly, our coverage of GLM-5.2's local setup and DiffusionGemma's block-parallel architecture covers the other recent entrants reshaping the open-weights landscape.
About ComputeLeap Team
The ComputeLeap editorial team covers AI tools, agents, and products — helping readers discover and use artificial intelligence to work smarter.