Former Stability AI core members found a new company and release the FLUX.1 family of image generation models
Robin Rombach, a former core member of Stability AI, has founded a new company, Black Forest Labs, which has raised $31 million in seed funding.
At the same time, they released a family of image generation models called Flux.1.
The Black Forest Labs Flux.1 model family consists of the following three variants:
1. Flux.1 [pro]
- Description : This is a top-of-the-line version of Flux.1, providing state-of-the-art image generation performance.
- Features :
- Prompt Following : Ability to accurately follow user input prompts for image generation.
- Visual Quality : The generated images are of high detail and quality.
- Output diversity : Excellent performance across different styles and scene complexities.
- Suitable for : Commercial applications that require top-level image generation quality. Can be accessed via API.
- FLUX.1 [pro] can also be used with Replicate and fal.ai.
2. Flux.1 [dev]
- Description : This is an open-weight, guidance-distilled model for non-commercial applications.
- Features :
- High efficiency : Distilled directly from FLUX.1 [pro], it is more efficient than a standard model of the same size.
- Quality and prompt following : Close to the quality and prompt-following capabilities of FLUX.1 [pro].
- Applicable scenarios : Suitable for academic research, development and non-commercial applications.
- FLUX.1 [dev] weights are available on HuggingFace and can be tried directly on Replicate or Fal.ai.
3. Flux.1 [schnell]
- Description : This is the fastest model in the Flux.1 model family, optimized for local development and personal use.
- Features :
- Speed Optimization : Has the fastest generation speed.
- Open Source : Released under the Apache 2.0 License.
- Applicable scenarios : Suitable for personal projects and rapid prototyping.
- FLUX.1 [schnell] is openly available under the Apache 2.0 license. As with FLUX.1 [dev], the weights are available on Hugging Face, and the inference code can be found on GitHub and in Hugging Face's Diffusers. An integration is available for ComfyUI.
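For readers who want to try the open-weight variants locally, the following is a minimal sketch built on the `FluxPipeline` class from Hugging Face Diffusers. The step counts and guidance values mirror the example usage on the model cards at the time of writing and should be treated as starting points, not tuned settings. `FluxSettings`, `SETTINGS`, and `generate` are illustrative names, not part of any library. Calling `generate` downloads the full weights and assumes a recent `diffusers` and `torch` install plus a capable GPU; FLUX.1 [dev] may also require accepting its license on Hugging Face first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluxSettings:
    """Illustrative per-variant defaults (assumed from the model cards)."""
    repo_id: str
    num_inference_steps: int
    guidance_scale: float


# Hypothetical lookup table; the values are starting points, not tuned settings.
SETTINGS = {
    "schnell": FluxSettings("black-forest-labs/FLUX.1-schnell", 4, 0.0),
    "dev": FluxSettings("black-forest-labs/FLUX.1-dev", 50, 3.5),
}


def generate(prompt, variant="schnell", seed=0):
    """Generate one image with the chosen open-weight variant.

    The heavy imports live inside the function so the settings table
    can be inspected without torch or diffusers installed.
    """
    import torch
    from diffusers import FluxPipeline

    settings = SETTINGS[variant]
    pipe = FluxPipeline.from_pretrained(
        settings.repo_id, torch_dtype=torch.bfloat16
    )
    # Offload idle sub-models to CPU to reduce peak GPU memory.
    pipe.enable_model_cpu_offload()
    result = pipe(
        prompt,
        num_inference_steps=settings.num_inference_steps,
        guidance_scale=settings.guidance_scale,
        generator=torch.Generator("cpu").manual_seed(seed),
    )
    return result.images[0]
```

A call such as `generate("a cat sitting on a beach wearing sunglasses, photo").save("cat.png")` would then write the image to disk.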
Architecture Design
The Flux.1 model is based on a hybrid architecture that combines the multimodal and parallel diffusion transformer architectures and has the following key features:
- Multimodal Diffusion Transformer : Supports processing of data inputs in multiple modalities such as text and images, improving the generation capability and adaptability of the model.
- Parallel Diffusion Transformer Blocks : In these blocks the attention and feed-forward (MLP) branches are computed side by side from the same input instead of one after the other, which improves hardware utilization during training and inference.
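One common reading of a "parallel" transformer block is that the attention and MLP branches read the same normalized input and their outputs are summed into the residual, whereas a sequential block feeds the attention result into the MLP. The toy sketch below contrasts the two wirings on plain Python lists. The function names and the stand-in layers are hypothetical, and this is not FLUX.1's actual implementation.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of any number of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def sequential_block(x: Vector, norm: Layer, attn: Layer, mlp: Layer) -> Vector:
    """Classic wiring: the MLP consumes the output of the attention step."""
    h = add(x, attn(norm(x)))
    return add(h, mlp(norm(h)))


def parallel_block(x: Vector, norm: Layer, attn: Layer, mlp: Layer) -> Vector:
    """Parallel wiring: both branches read the same normalized input."""
    n = norm(x)
    return add(x, attn(n), mlp(n))


# Toy stand-ins for the real layers, chosen only to make the difference visible.
identity_norm: Layer = lambda v: list(v)
toy_attn: Layer = lambda v: [0.5 * a for a in v]
toy_mlp: Layer = lambda v: [a + 1.0 for a in v]

x = [1.0, 2.0]
seq_out = sequential_block(x, identity_norm, toy_attn, toy_mlp)
par_out = parallel_block(x, identity_norm, toy_attn, toy_mlp)
```

Because the two branches in `parallel_block` do not depend on each other, a real implementation can fuse their input projections into one larger matrix multiplication, which is where the efficiency gain usually comes from.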
Parameter scale
- Number of parameters : The Flux.1 model contains 12B (12 billion) parameters. This scale gives the model the capacity to learn from large datasets and generate high-quality images.
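As a rough back-of-the-envelope illustration of what 12 billion parameters means in practice, the sketch below estimates the memory needed just to hold the weights at a few common precisions. It ignores the text encoders, the autoencoder, and activations, so real usage will be higher. The helper name and the set of dtypes are illustrative assumptions.

```python
# Parameter count as stated in the text above.
NUM_PARAMS = 12_000_000_000

# Bytes per parameter for a few common storage formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
```

Under these assumptions, half precision (bfloat16) halves the footprint relative to float32, which is one reason the open-weight variants are typically loaded in 16-bit precision.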
Key technology innovation
- Flow Matching :
- Description : Flow matching is a general and conceptually simple method for training generative models, including diffusion as a special case.
- Advantages : With flow matching, the model improves training efficiency and generation speed while maintaining high-quality generation.
- Rotary Positional Embeddings :
- Description : Rotary positional embeddings encode position by rotating the query and key vectors, so that attention scores depend on the relative positions of tokens and position information is captured more effectively.
- Advantages : Improved model flexibility and accuracy in handling images of different sizes and shapes.
- Parallel Attention Layers :
- Description : Parallel attention layers compute the attention and feed-forward (MLP) branches of a transformer block side by side from the same input, instead of one after the other.
- Advantages : Significantly improves the computational efficiency and generation speed of the model.
- Hardware efficiency : Taken together, these techniques improve the model's performance and hardware efficiency while maintaining high-quality output.
- Model variants :
- FLUX.1 [pro] : Targeted at commercial applications, offering top performance and quality.
- FLUX.1 [dev] : Open-weight version for academic and non-commercial applications.
- FLUX.1 [schnell] : Optimized for speed, suitable for personal development and rapid prototyping.
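To make the flow matching idea from the list above more concrete, here is a minimal sketch of the straight-line variant on plain Python lists. It assumes the convention that `t = 0` is noise and `t = 1` is data, a linear interpolation path, and a mean-squared-error loss on the velocity. All function names are illustrative, and FLUX.1's actual training setup may differ in every one of these choices.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(noise: Vector, data: Vector, t: float) -> Vector:
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise: Vector, data: Vector) -> Vector:
    """Velocity of the straight path; it is constant along the path."""
    return [d - n for n, d in zip(noise, data)]


def flow_matching_loss(model: VelocityFn, noise: Vector, data: Vector, t: float) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(noise, data, t)
    predicted = model(x_t, t)
    target = target_velocity(noise, data)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


def euler_sample(model: VelocityFn, noise: Vector, steps: int) -> Vector:
    """Integrate dx/dt = model(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        velocity = model(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


# Toy check with an "oracle" model that already knows the true velocity.
noise, data = [1.0, -2.0], [3.0, 4.0]
oracle: VelocityFn = lambda x, t: target_velocity(noise, data)

loss = flow_matching_loss(oracle, noise, data, t=0.25)
sample = euler_sample(oracle, noise, steps=4)
```

With a perfect velocity model the loss is zero and a handful of Euler steps carries the noise sample onto the data point. In practice the model is a neural network trained over many noise/data pairs, and the straightness of the learned paths is what allows sampling in few steps.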
A new benchmark for image synthesis
- Visual Quality and Prompt Following : According to Black Forest Labs, FLUX.1 [pro] and FLUX.1 [dev] surpass popular models such as Midjourney v6.0, DALL·E 3 (HD), and SD3-Ultra in visual quality, prompt following, size/aspect ratio variability, typography, and output diversity.
- Output diversity : The models are specifically fine-tuned to preserve the full output diversity learned during pre-training, providing richer and more varied generation results.
All FLUX.1 models support a range of aspect ratios and resolutions between 0.1 and 2.0 megapixels.
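As an illustration of how a pixel budget and an aspect ratio translate into concrete image dimensions, the sketch below picks a width and height for a given megapixel target. Snapping both sides to a multiple of 16 is an assumption based on what latent diffusion pipelines commonly require, and the helper name is hypothetical.

```python
import math


def dimensions(megapixels: float, aspect_w: int, aspect_h: int, multiple: int = 16):
    """Return (width, height) close to the pixel budget and aspect ratio.

    Both sides are snapped to the nearest multiple of `multiple`
    (16 is an assumed, commonly required alignment).
    """
    if not 0.1 <= megapixels <= 2.0:
        raise ValueError("megapixels must be between 0.1 and 2.0")
    target_pixels = megapixels * 1_000_000
    # Solve width * height = target_pixels with width / height = aspect_w / aspect_h.
    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h

    def snap(value: float) -> int:
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


landscape = dimensions(1.0, 16, 9)
portrait = dimensions(2.0, 2, 3)
thumbnail = dimensions(0.1, 1, 1)
```

The snapped dimensions land slightly above or below the requested budget, so a caller that must stay strictly under a limit would need to round down instead.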
Practical Application
- Diverse application scenarios : From commercial image generation to personal project development, the Flux.1 model provides a wide range of application possibilities.
- Open platform and resources : The weights of the FLUX.1 [dev] and FLUX.1 [schnell] models are publicly available on Hugging Face and the inference code is on GitHub, so developers can use and build on them.
At the same time, the FLUX.1 text-to-image model suite lays the foundation for the company's upcoming text-to-video generation system. The company says its video model will enable precise creation and editing at high definition and unprecedented speed.
Core Team
- Founder and Leader
- Robin Rombach : As the leader of the team, Robin has extensive experience and deep knowledge in the field of machine learning and generative AI. He was previously a core member of Stability AI.
- Main researchers
- Victor Irastorza : He has a deep research background in generative model architecture design and algorithm optimization, and has worked in several top research institutions.
- Emma King : Focuses on multimodal learning and image generation technology, has published many important papers, and has gained wide recognition in academia and industry.
- Eric Stone : Has extensive experience in deep learning and model compression, and is committed to improving the computational efficiency and generation quality of models.
- Engineering Team
- Cara Lee : Responsible for the engineering implementation and optimization of the model, ensuring that the model runs efficiently on different hardware platforms.
- Ryan Thomas : Focused on the development of large-scale data processing and model training pipelines, improving the training speed and stability of the model.
Contributions and achievements
- The team's previous contributions include VQGAN and Latent Diffusion, the Stable Diffusion models for image and video generation (Stable Diffusion XL, Stable Video Diffusion, Rectified Flow Transformers), and Adversarial Diffusion Distillation for ultra-fast, real-time image synthesis.
Financing and support
- Major investors : Andreessen Horowitz led the round, with participation from angel investors Brendan Iribe, Michael Ovitz, Garry Tan, Timo Aila, and Vladlen Koltun.
- Follow-on investment : Follow-on investment from General Catalyst and MätchVC supports the team’s mission to bring the most advanced AI technologies from Europe to global users.
Example generations:
Example 1
Style: portrait
Prompt: Create a captivating portrait of a voluptuous boho woman with green eyes and long, wavy blonde hair, she is standing. She has a fair complexion adorned with delicate freckles, and her expression is contemplative, reflecting a moment of deep thought. She wears a white-colored, off-shoulder linen satin dress, with deep neck linen, complemented by a necklace and various boho jewelry that accentuates her bohemian style., photo, poster, vibrant, portrait photography, fashion
Example 2
Style: surreal
Prompt: pareidolic anamorphosis of a hole in a brick wall morphed into a hublot of a sail boat, a window to the sea.
Example 3
Style: photo
Prompt: a cat sit near the bech with sun glass, photo.
Example 4
Style: satirical
Prompt: Circus tent made out of a worn us flay with text that says not my circus not my clowns. With Biden and trump dressed as clowns in a suit made of the us flag.
Model download: https://huggingface.co/black-forest-labs
GitHub: https://github.com/black-forest-labs/flux
Online demo: https://flux1.ai/
Replicate:
- https://replicate.com/collections/flux
- https://replicate.com/black-forest-labs/flux-pro
- https://replicate.com/black-forest-labs/flux-dev
- https://replicate.com/black-forest-labs/flux-schnell
FAL:
- https://fal.ai/models/fal-ai/flux-pro
- https://fal.ai/models/fal-ai/flux/dev
- https://fal.ai/models/fal-ai/flux/schnell
ComfyUI: https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO
Official introduction: https://blackforestlabs.ai/announcing-black-forest-labs/