Former Stability AI core members set up a new company to release Flux.1 open source image generation model

Robin Rombach, a former core member of Stability AI, has founded a new company, Black Forest Labs, which has raised $31 million in seed funding.

At the same time, the company released a family of image generation models called Flux.1.

The Black Forest Labs Flux.1 model family consists of the following three variants:

1. Flux.1 [pro]

  • Description : The top-of-the-line version of Flux.1, offering state-of-the-art image generation performance.
  • Features :
    • Prompt following : Accurately follows the user's prompt when generating images.
    • Visual quality : Generated images are highly detailed.
    • Output diversity : Performs well across different styles and scene complexities.
  • Suitable for : Commercial applications that require top-tier image generation quality. It is accessed through the API.
  • FLUX.1 [pro] is also available through Replicate and fal.ai.

2. Flux.1 [dev]

  • Description : An open-weight, guidance-distilled model for non-commercial applications.
  • Features :
    • High efficiency : Distilled directly from Flux.1 [pro], it is more efficient than a standard model of the same size.
    • Quality and prompt following : Close to the quality and prompt-following capabilities of Flux.1 [pro].
  • Applicable scenarios : Academic research, development, and other non-commercial use.
  • FLUX.1 [dev] weights are available on Hugging Face, and the model can be tried directly on Replicate or fal.ai.

3. Flux.1 [schnell]

  • Description : The fastest model in the Flux.1 family, tailored for local development and personal use.
  • Features :
    • Speed optimization : Has the fastest generation speed in the family.
    • Open source : Released under the Apache 2.0 license.
  • Applicable scenarios : Personal projects and rapid prototyping.
  • As with FLUX.1 [dev], the FLUX.1 [schnell] weights are available on Hugging Face, and the inference code can be found on GitHub and in Hugging Face's Diffusers. A ComfyUI integration is also available.
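
For the two open-weight variants, local inference through Diffusers might look roughly like the sketch below. It assumes a recent `diffusers` release that ships `FluxPipeline`, together with `torch` and `accelerate`. The step counts and guidance values mirror the examples on the Hugging Face model cards and are starting points, not requirements. `SETTINGS` and `generate` are illustrative names, not part of any library.

```python
# Suggested sampling settings per open-weight variant (assumed from the
# Hugging Face model cards; adjust freely).
SETTINGS = {
    "schnell": {
        "repo": "black-forest-labs/FLUX.1-schnell",
        "num_inference_steps": 4,
        "guidance_scale": 0.0,
        "max_sequence_length": 256,
    },
    "dev": {
        "repo": "black-forest-labs/FLUX.1-dev",
        "num_inference_steps": 50,
        "guidance_scale": 3.5,
        "max_sequence_length": 512,
    },
}


def generate(prompt, variant="schnell", seed=0):
    """Generate one image (a PIL image) with a FLUX.1 open-weight variant."""
    # Heavy imports stay local so the settings table is usable without them.
    import torch
    from diffusers import FluxPipeline

    cfg = dict(SETTINGS[variant])
    pipe = FluxPipeline.from_pretrained(cfg.pop("repo"), torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower GPU memory use
    generator = torch.Generator("cpu").manual_seed(seed)
    return pipe(prompt, generator=generator, **cfg).images[0]
```

Calling `generate("a cat sitting on a beach wearing sunglasses, photo").save("cat.png")` would then download the weights on first use and write the result to disk. The [dev] repository may additionally require accepting its license on Hugging Face before the download succeeds.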

Technical details of the Flux.1 model

Architecture Design

The Flux.1 models are based on a hybrid architecture that combines multimodal and parallel diffusion transformer blocks, with the following key features:

  • Multimodal Diffusion Transformer : Supports inputs in multiple modalities such as text and images, improving the model's generation capability and adaptability.
  • Parallel Diffusion Transformer Blocks : Within these blocks, the attention and feed-forward (MLP) branches are computed in parallel rather than one after the other, which speeds up training and inference.
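
A common reading of "parallel" transformer blocks is that the attention and MLP branches read the same normalized input and their outputs are summed, instead of the MLP waiting for the attention result. The toy sketch below contrasts the two layouts. `toy_attention` and `toy_mlp` are stand-in functions chosen only for illustration, and nothing here is taken from the FLUX.1 code.

```python
def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def toy_attention(x):
    # Stand-in for self-attention: mixes each element with the mean.
    mean = sum(x) / len(x)
    return [0.5 * (v + mean) for v in x]


def toy_mlp(x):
    # Stand-in for the feed-forward branch: ReLU followed by a scale.
    return [2.0 * max(v, 0.0) for v in x]


def sequential_block(x):
    """Classic layout: the MLP consumes the output of the attention step."""
    h = [a + b for a, b in zip(x, toy_attention(layer_norm(x)))]
    return [a + b for a, b in zip(h, toy_mlp(layer_norm(h)))]


def parallel_block(x):
    """Parallel layout: both branches read the same normalized input."""
    n = layer_norm(x)
    attn, mlp = toy_attention(n), toy_mlp(n)
    return [a + b + c for a, b, c in zip(x, attn, mlp)]
```

Because the two branches in `parallel_block` do not depend on each other, a real implementation can fuse their input projections into one larger matrix multiplication, which is where the speedup comes from.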

Parameter scale

  • Number of parameters : The Flux.1 models have 12B (12 billion) parameters, which gives them substantial capacity to learn and to generate high-quality images.
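
To put the parameter count in hardware terms, the arithmetic below estimates how much memory the weights alone occupy at different precisions. The 12 billion figure comes from the text. The helper name `weight_memory_gib` and the choice of precisions are illustrative assumptions.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


PARAMS = 12e9  # 12 billion parameters, as stated above

for name, nbytes in [("float32", 4), ("bfloat16", 2), ("8-bit", 1)]:
    print(f"{name}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

This counts weights only. Text encoders, the image autoencoder, and activations add to the total, which is why offloading or quantization is often used on consumer GPUs.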

Key technology innovation

  1. Flow Matching :
    • Description : Flow matching is a general and conceptually simple method for training generative models that includes diffusion as a special case.
    • Advantages : Flow matching improves training efficiency and generation speed while maintaining high-quality output.
  2. Rotary Positional Embeddings :
    • Description : Rotary positional embeddings encode position by rotating query and key vectors, which captures position information in the data more effectively.
    • Advantages : Improves the model's flexibility and accuracy when handling images of different sizes and shapes.
  3. Parallel Attention Layers :
    • Description : The attention and feed-forward layers of a block run in parallel on the same input instead of sequentially.
    • Advantages : Significantly improves the computational efficiency and generation speed of the model.
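
The flow matching idea from item 1 can be sketched in a few lines. The version below assumes the straight-line (rectified-flow style) path between a noise sample and a data sample, which is one common choice, and uses a one-dimensional toy problem. It is not the FLUX.1 training code, and all function names are illustrative.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the straight path; the network is trained to predict it."""
    return x1 - x0


def flow_matching_loss(model, x0, x1, t):
    """Squared error between predicted and target velocity for one sample."""
    xt = interpolate(x0, x1, t)
    return (model(xt, t) - target_velocity(x0, x1)) ** 2


def sample(model, x0, steps):
    """Euler integration of dx/dt = model(x, t) from t=0 to t=1."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += dt * model(x, k * dt)
    return x


# Toy case: the "dataset" is the single point DATA, so the ideal velocity
# field is known in closed form: v(x, t) = (DATA - x) / (1 - t).
DATA = 3.0


def ideal_model(x, t):
    return (DATA - x) / (1.0 - t)
```

In this toy setting `sample(ideal_model, x0, steps)` lands on `DATA` for any starting noise and any step count, because the learned path is a straight line. That property is the intuition behind fast, few-step samplers.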

Performance Optimization

  • Hardware efficiency : Together, the innovations above are intended to make efficient use of hardware while maintaining high-quality output.
  • Model variants :
    • FLUX.1 [pro] : Targeted at commercial applications, offering top performance and quality.
    • FLUX.1 [dev] : Open-weight version for academic and other non-commercial use.
    • FLUX.1 [schnell] : Optimized for speed, suitable for personal development and rapid prototyping.

A new benchmark for image synthesis

  • Visual quality and prompt following : According to Black Forest Labs, the Flux.1 models surpass popular models such as Midjourney v6.0, DALL·E 3 (HD), and SD3-Ultra in visual quality, prompt following, size/aspect ratio variability, typography, and output diversity.
  • Output diversity : The models are specifically fine-tuned to preserve the full output diversity from pre-training, giving richer and more varied results.

All FLUX.1 models support a range of aspect ratios and resolutions between 0.1 and 2.0 megapixels (100,000 to 2 million pixels).
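
When picking an output size, it can help to turn an aspect ratio and a pixel budget into concrete dimensions. The helper below is a minimal sketch. The name `flux_resolution` is hypothetical, and snapping each side to a multiple of 16 is an assumption about what the pipeline accepts, not something stated in the announcement.

```python
import math


def flux_resolution(aspect_w, aspect_h, megapixels=1.0, multiple=16):
    """Return (width, height) near the pixel budget for the given aspect ratio."""
    if not 0.1 <= megapixels <= 2.0:
        raise ValueError("megapixels is outside the 0.1 to 2.0 range quoted above")
    target = megapixels * 1_000_000
    height = math.sqrt(target * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h

    def snap(v):
        # Round to the nearest multiple (assumed constraint), never below one.
        return max(multiple, int(v / multiple + 0.5) * multiple)

    return snap(width), snap(height)
```

For example, `flux_resolution(16, 9)` gives a widescreen size of about one megapixel, and `flux_resolution(9, 16, megapixels=2.0)` gives a tall portrait size near the upper limit.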

Practical Application

  • Diverse application scenarios : From commercial image generation to personal projects, the Flux.1 models cover a wide range of uses.
  • Open platform and resources : The weights and inference code of the FLUX.1 [dev] and FLUX.1 [schnell] models are publicly available on Hugging Face and GitHub, making it easier for developers to use and build on them.

The FLUX.1 text-to-image suite also lays the foundation for the company's upcoming text-to-video generation system. The company says its video model will enable precise creation and editing at high definition and unprecedented speed.

Core Team

  1. Founder and Leader
    • Robin Rombach : As the founder and leader of the team, Robin has extensive experience and deep knowledge in machine learning and generative AI. Before founding Black Forest Labs he was a core member of Stability AI and a lead researcher behind Stable Diffusion.
  2. Main researchers
    • Victor Irastorza : He has a deep research background in generative model architecture design and algorithm optimization, and has worked in several top research institutions.
    • Emma King : Focuses on multimodal learning and image generation technology, has published many important papers, and has gained wide recognition in academia and industry.
    • Eric Stone : Has extensive experience in deep learning and model compression, and is committed to improving the computational efficiency and generation quality of models.
  3. Engineering Team
    • Cara Lee : Responsible for the engineering implementation and optimization of the model, ensuring that the model runs efficiently on different hardware platforms.
    • Ryan Thomas : Focused on the development of large-scale data processing and model training pipelines, improving the training speed and stability of the model.

Contributions and achievements

Financing and support

  • Major investors : Andreessen Horowitz led the round, with participation from angel investors Brendan Iribe, Michael Ovitz, Garry Tan, Timo Aila, and Vladlen Koltun.
  • Follow-on investment : Follow-on investment from General Catalyst and MätchVC supports the team’s mission to bring the most advanced AI technologies from Europe to global users.

Sample prompts:

Example 1

Style: portrait

Prompt: Create a captivating portrait of a voluptuous boho woman with green eyes and long, wavy blonde hair, she is standing. She has a fair complexion adorned with delicate freckles, and her expression is contemplative, reflecting a moment of deep thought. She wears a white-colored, off-shoulder linen satin dress, with deep neck linen, complemented by a necklace and various boho jewelry that accentuates her bohemian style., photo, poster, vibrant, portrait photography, fashion

Example 2

Style: surreal

Prompt: pareidolic anamorphosis of a hole in a brick wall morphed into a hublot of a sail boat, a window to the sea.

Example 3

Style: photo

Prompt: a cat sit near the bech with sun glass, photo.

Example 4

Style: satirical

Prompt: Circus tent made out of a worn us flay with text that says not my circus not my clowns. With Biden and trump dressed as clowns in a suit made of the us flag.

Model download: https://huggingface.co/black-forest-labs

GitHub: https://github.com/black-forest-labs/flux

Online demo: https://flux1.ai/

Replicate:

FAL:

ComfyUI: https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO

Official introduction: https://blackforestlabs.ai/announcing-black-forest-labs/