Getting Started with Workflows - Denoise Latents

Modified on Tue, 10 Sep, 2024 at 3:22 PM

Workflows are a flexible tool for building image generation pipelines. They automate interface steps that you’d otherwise perform manually when applying standard control and transformation techniques.

This document covers the core elements of using workflows for the standard image generation pipeline, along with some of the related nodes that can be used in the process.

To get started with Workflows, open the Library and select a Default Workflow. This provides a strong foundation for any edits you’d like to make to the standard Denoising process.

To understand how these workflows operate, start with the core element of the currently supported Diffusion architecture: the Denoise Latents node.

Denoise Latents Node

The core processing node for creating an image is the Denoise Latents node. It executes the integrated Denoising process based on the inputs that you provide.

The key inputs are detailed below, along with a note on which are optional.

Conditioning

Conditioning is the key guidance that shapes the image throughout the generation process. It is the encoded output of your prompts, and can be provided as Positive conditioning (elements that the output should be biased towards) and Negative conditioning (elements that the output should be biased against).

Conditioning inputs are typically generated by “Prompt” nodes for the model type that you are generating with. Currently, there are separate Conditioning nodes for SD1.5 models (Prompt) and SDXL models (SDXL Prompt).

The core inputs for these nodes are text strings, which are encoded into a Conditioning object that can be passed into either the Positive or Negative Conditioning input.

For SDXL Conditioning nodes, we recommend supplying the same full Prompt to both the Positive and Style inputs. The easiest way to do this is to connect the output of a single String Primitive node to both inputs.

Setting the Generation Model

You’ll need to specify the set of model weights to use during generation. To do this, add a Main Model node (or SDXL Main Model) and connect its UNet output to the Denoise Latents node.

To use a Concept Model (or LoRA), add a LoRA Model node and pass the UNet weights through it to patch in the weights of your concept model, then connect the resulting weights to the Denoise Latents node. This can be chained multiple times to apply several LoRA nodes.

Make sure that your Prompt nodes take their CLIP input from the last model node in the chain. The CLIP outputs are used to encode text into the conditioning output.
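The wiring described in this section can be sketched as a small graph. This is an illustration only: the node ids, field names, and the idea that each LoRA node passes CLIP through alongside the UNet are assumptions made for the example, not Invoke’s actual workflow schema.

```python
# Illustrative only: node ids and field names are hypothetical,
# not Invoke's real workflow format.
nodes = {
    "model": "Main Model",
    "lora_a": "LoRA",
    "lora_b": "LoRA",
    "positive": "Prompt",
    "negative": "Prompt",
    "denoise": "Denoise Latents",
}

# Each edge is ((source node, output field), (destination node, input field)).
edges = [
    (("model", "unet"), ("lora_a", "unet")),
    (("model", "clip"), ("lora_a", "clip")),
    (("lora_a", "unet"), ("lora_b", "unet")),
    (("lora_a", "clip"), ("lora_b", "clip")),
    # The last node in the chain feeds Denoise Latents and both Prompt nodes.
    (("lora_b", "unet"), ("denoise", "unet")),
    (("lora_b", "clip"), ("positive", "clip")),
    (("lora_b", "clip"), ("negative", "clip")),
    (("positive", "conditioning"), ("denoise", "positive_conditioning")),
    (("negative", "conditioning"), ("denoise", "negative_conditioning")),
]


def upstream_chain(node_id, field):
    """Follow one field backwards from node_id to the node that first produced it."""
    chain = [node_id]
    current = node_id
    while True:
        sources = [src for (src, dst) in edges if dst == (current, field)]
        if not sources:
            return chain
        current = sources[0][0]
        chain.append(current)
```

Calling upstream_chain("denoise", "unet") and upstream_chain("positive", "clip") should both lead back through the same LoRA nodes to the Main Model, which is the property to check for when chaining several LoRA nodes.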

Noise

The Noise node generates the base image latents that the Denoising process will operate on. The width and height set on the Noise node define the size of the generated image.

The Noise node can be combined with a Random Integer node to use a randomized noise seed for each generation.
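As a rough sketch of what seeded noise looks like, the example below builds a latent-sized grid of Gaussian values from a width, height, and seed. It assumes the 8x pixel-to-latent downscale and 4 latent channels used by SD1.5 and SDXL; the function names are hypothetical and this is not Invoke’s implementation.

```python
import random

LATENT_CHANNELS = 4  # assumption: latent channel count for SD1.5 / SDXL
DOWNSCALE = 8        # assumption: pixel-to-latent downscale factor of the VAE


def make_noise(width, height, seed):
    """Return seeded Gaussian noise shaped [channels][latent_height][latent_width]."""
    if width % DOWNSCALE or height % DOWNSCALE:
        raise ValueError("width and height must be multiples of 8")
    rng = random.Random(seed)  # the same seed always yields the same noise
    latent_w, latent_h = width // DOWNSCALE, height // DOWNSCALE
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(latent_w)] for _ in range(latent_h)]
        for _ in range(LATENT_CHANNELS)
    ]


def random_seed():
    """Stand-in for a Random Integer node feeding the seed input."""
    return random.randint(0, 2**32 - 1)
```

Reusing a seed reproduces the same starting noise, while drawing a fresh seed for each run gives a different image from otherwise identical settings.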

Latents (Optional)

If you’d like to use existing latents (for example, for an “Image > Image” transformation), you can pass Latents into the Denoise Latents node from an Image to Latents node. You can also pass in latents that have been partially processed by a previous Denoise Latents node, which lets you change conditioning or models at different points in the denoising pipeline. In either case, set Denoising Start to a value above 0. This value controls the Denoising Strength.
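The relationship between a strength value and Denoising Start can be sketched as below. This assumes the common convention that a strength of 1.0 means a full denoise from pure noise, so Denoising Start is 1 minus the strength; the function name is hypothetical.

```python
def denoising_start_for_strength(strength):
    """Map an image-to-image strength in [0, 1] to a Denoising Start value.

    Assumed convention: strength 1.0 ignores the input image entirely
    (start at 0), and lower strengths start later, preserving more of it.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return 1.0 - strength
```

For example, a strength of 0.75 corresponds to a Denoising Start of 0.25 under this convention.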

Controls (Optional)

Controls allow you to use ControlNet models to provide structural guidance to the generation process. To use a ControlNet model, you’ll typically add two nodes to your workflow that provide additional information to the denoising process:

ControlNet:

The ControlNet node selects a ControlNet model to interpret processed images and control the generation. You can connect its output directly to the Control input, or to a Collect node in order to provide multiple Control inputs to a single Denoise Latents node. You can learn more about each ControlNet model by reading the support article on ControlNet models.

Processor Node:

Each ControlNet model expects a specific type of image in order to effectively control the generation process. For each ControlNet model, there are Processor nodes that can prepare any input image for use with that model. For example, you can use a Canny processor for the Canny ControlNet.

IP Adapter

IP Adapter allows for Image Prompts to be passed in using the IP Adapter node. Similar to the ControlNet node, you can either pass the IP Adapter output directly into the Denoise Latents node, or you can use a Collect node to provide multiple IP Adapter image prompts directly into the Denoise Latents process.

T2I Adapter

T2I Adapters allow you to use T2I Adapter models to provide structural guidance to the generation process. Our standard recommendation is to use ControlNet instead of T2I Adapters, unless there’s a specific T2I Adapter you need. This is because T2I Adapters are primarily designed for smaller resource overhead, which may be less of a concern when using our product.

Similar to the ControlNet node, you can either pass the T2I Adapter output directly into the Denoise Latents node, or you can use a Collect node to provide multiple T2I Adapters directly into the Denoise Latents process.

Denoise Mask

When generating an image, a Denoise Mask can be created using the “Denoise Mask” or “Gradient Mask” nodes. This is how Invoke executes workflows like Inpainting on the Canvas: Denoising occurs within the area specified by the mask (the black section of a black/white mask), while the unmasked section is continually passed back into the generation process to preserve its content.
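Conceptually, the masking behaviour described above amounts to a per-element blend between the original latents and the newly denoised latents. The sketch below is a simplification under stated assumptions: it follows this article’s description of black (0.0) as the area to denoise, works on flat lists, and uses a hypothetical function name. A real pipeline would typically repeat this at every step, with the original latents noised to match the current step.

```python
def blend_with_mask(original, denoised, mask):
    """Blend two equally sized lists of latent values.

    mask value 0.0 (black): take the denoised value.
    mask value 1.0 (white): keep the original value.
    Values in between (as in a gradient mask) mix the two.
    """
    return [m * o + (1.0 - m) * d for o, d, m in zip(original, denoised, mask)]
```

With original values of 1.0, denoised values of 5.0, and a mask of [0.0, 1.0, 0.5], the blend yields the denoised value, the original value, and an even mix of the two, respectively.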

For more information on the settings below, see General Settings.

Steps

This sets the number of steps executed during the Denoising process, assuming a start and end of 0 and 1 respectively. Partial denoising runs with a later start or an earlier end will execute the corresponding proportion of your configured steps, depending on the scheduler selected.
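As an approximation of how a partial run maps onto the configured step count, the sketch below converts Denoising Start and End into a range of step indices. The rounding rule and function name are assumptions for illustration; the exact steps executed depend on the scheduler.

```python
def executed_step_range(steps, denoising_start=0.0, denoising_end=1.0):
    """Return (first, last) step indices; steps first .. last-1 are executed."""
    if not 0.0 <= denoising_start <= denoising_end <= 1.0:
        raise ValueError("expected 0 <= denoising_start <= denoising_end <= 1")
    first = round(denoising_start * steps)
    last = round(denoising_end * steps)
    return first, last


def executed_step_count(steps, denoising_start=0.0, denoising_end=1.0):
    """Number of steps actually run for a (possibly partial) denoise."""
    first, last = executed_step_range(steps, denoising_start, denoising_end)
    return last - first
```

With 30 steps, a run from 0.4 to 1 would execute roughly 18 steps under this approximation. Two chained Denoise Latents nodes covering 0 to 0.5 and 0.5 to 1 would together cover all 30.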

CFG Scale

The CFG Scale operates the same way as in the Generation Tab: higher values drive stronger adherence to prompt terms. The recommended value to avoid sub-optimal generations is typically between 5 and 8.
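Classifier-free guidance is commonly written as the negative prediction plus the scale times the difference between the positive and negative predictions. The sketch below shows that formula on plain lists of numbers. The function and argument names are hypothetical, and this is not taken from Invoke’s code.

```python
def apply_cfg(negative_pred, positive_pred, cfg_scale):
    """Combine the two noise predictions for one denoising step.

    cfg_scale 1.0 returns the positive prediction unchanged; larger values
    push the result further away from the negative prediction.
    """
    return [
        neg + cfg_scale * (pos - neg)
        for neg, pos in zip(negative_pred, positive_pred)
    ]
```

This makes it easier to see why very high values can degrade results: the combined prediction moves far outside the range of either individual prediction.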

Denoising Start & End

Denoising Start & End control how much, and which part, of the denoising process is executed. The process is represented as running from 0 (the beginning) to 1 (the end).

If denoising starts at a value > 0, it is assumed that Latents will be provided as an input, either to perform an Image to Image workflow (Noise + Latents) or to continue the denoising process with different settings or inputs (Latents only).

Scheduler

The Scheduler selection controls two elements of the denoising process: how information is sampled from images, and how steps are executed.

CFG Rescale Multiplier

The CFG Rescale Multiplier is an advanced setting intended for certain models that were trained to use it. You only need it if your model requires the setting, in which case follow the instructions for that model.
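For reference, guidance rescaling is usually described as follows: scale the guided prediction so that its standard deviation matches that of the positive prediction, then mix the rescaled and unrescaled results using the multiplier. The sketch below follows that description on flat lists. The names are hypothetical, and we have not confirmed that it matches Invoke’s implementation exactly.

```python
from statistics import pstdev


def rescale_cfg(guided_pred, positive_pred, rescale_multiplier):
    """Pull an over-amplified guided prediction back toward a normal range.

    guided_pred: prediction after CFG has been applied.
    positive_pred: prediction from the positive conditioning alone.
    rescale_multiplier: 0.0 leaves guided_pred unchanged, 1.0 fully rescales it.
    """
    std_guided = pstdev(guided_pred)
    if std_guided == 0.0:
        return list(guided_pred)  # nothing to rescale
    factor = pstdev(positive_pred) / std_guided
    return [
        rescale_multiplier * (g * factor) + (1.0 - rescale_multiplier) * g
        for g in guided_pred
    ]
```

A multiplier of 0 disables the effect entirely, which is why the setting can be ignored for models that do not call for it.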