Using ControlNet

Modified on Wed, 24 Jan 2024 at 07:41 PM

ControlNet

ControlNet is a set of features developed by the open-source community (notably Stanford researcher Lvmin Zhang) that lets you apply a secondary neural network model to guide your image generation process in Invoke.


With ControlNet, you gain more control over the output of your image generation: you can direct the network toward images that better fit your desired style or outcome.




Using ControlNet

To use ControlNet, select the desired model and adjust the ControlNet and pre-processor settings until you get the result you want. You can also use multiple ControlNet models at the same time to achieve more complex effects or styles in your generated images.


Each ControlNet has two settings:

Weight - Strength of the ControlNet model applied to the generation during the portion of the process defined by Start/End.

Start/End - The range of generation steps during which the ControlNet is applied. 0 represents the start of the generation and 1 represents the end. For example, a Start of 0 and an End of 0.5 applies the ControlNet only during the first half of the steps.
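As a rough illustration of how Weight and Start/End interact, the following Python sketch computes the weight each ControlNet contributes at every step. The `ControlSetting` class, the function names, and the rule for mapping a step number onto the 0 to 1 range are assumptions made for this example; Invoke's actual implementation may round or treat the boundaries differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSetting:
    """Hypothetical container for one ControlNet's settings (not an Invoke class)."""
    model: str           # label for the example, e.g. "canny"
    weight: float = 1.0  # strength while the ControlNet is active
    begin: float = 0.0   # Start: fraction of the generation, 0 = beginning
    end: float = 1.0     # End: fraction of the generation, 1 = end


def weight_at_step(setting: ControlSetting, step: int, total_steps: int) -> float:
    """Return the weight applied at `step`, or 0.0 outside the Start/End window."""
    # Assumed mapping: 0.0 for the first step, just under 1.0 for the last one.
    progress = step / total_steps
    if setting.begin <= progress < setting.end:
        return setting.weight
    return 0.0


def schedule(settings: list[ControlSetting], total_steps: int) -> list[dict[str, float]]:
    """List, for every step, the weight each ControlNet contributes."""
    return [
        {s.model: weight_at_step(s, step, total_steps) for s in settings}
        for step in range(total_steps)
    ]


settings = [
    ControlSetting("canny", weight=1.0, begin=0.0, end=0.5),   # first half only
    ControlSetting("depth", weight=0.6, begin=0.25, end=1.0),  # joins a quarter of the way in
]
for step, weights in enumerate(schedule(settings, total_steps=8)):
    print(step, weights)
```

In this example the Canny ControlNet acts only on the early steps, the depth ControlNet acts on the later ones, and both are active between one quarter and one half of the way through.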

Additionally, each ControlNet section can be expanded to adjust the settings of the image pre-processor, which prepares your uploaded image before it is used when you Invoke.


ControlNet can be used with all generation methods within InvokeAI: Text2Image, Image2Image, the Unified Canvas, and Workflows.


How it works

ControlNet works by analyzing an input image, pre-processing that image to identify relevant information that can be interpreted by each specific ControlNet model, and then inserting that control information into the generation process. This can be used to adjust the style, composition, or other aspects of the image to better achieve a specific result.
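The flow described above (input image, pre-processing, control information, generation) can be pictured with a deliberately simplified Python toy. Everything here is hypothetical: `preprocess` and `generate` are stand-ins invented for this sketch, the "image" is a short list of brightness values, and the loop simply pulls random noise toward the control map. A real ControlNet instead passes the control image through a neural network that conditions the denoising model at each step, so treat this only as a picture of the data flow.

```python
import random


def preprocess(image: list[float], threshold: float = 0.5) -> list[float]:
    """Stand-in pre-processor: keep only 'is this pixel bright?' as control information."""
    return [1.0 if pixel >= threshold else 0.0 for pixel in image]


def generate(control_map: list[float], steps: int = 20, weight: float = 1.0, seed: int = 0) -> list[float]:
    """Toy generation loop: start from noise and nudge it toward the control map each step."""
    rng = random.Random(seed)
    latent = [rng.random() for _ in control_map]  # random starting point
    for _ in range(steps):
        # Move each value a fraction of the way toward the control map.
        latent = [x + 0.2 * weight * (c - x) for x, c in zip(latent, control_map)]
    return latent


input_image = [0.1, 0.2, 0.9, 0.8, 0.3]
control_map = preprocess(input_image)         # [0.0, 0.0, 1.0, 1.0, 0.0]
guided = generate(control_map, weight=1.0)    # ends up close to the control map
unguided = generate(control_map, weight=0.0)  # stays as the initial random noise
print([round(v, 2) for v in guided])
print([round(v, 2) for v in unguided])
```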


Models

ControlNet comes with a variety of pre-trained models that can be used to achieve different effects or styles in your generated images. These models include:


Canny

When the Canny model is used in ControlNet, Invoke will attempt to generate images that match the edges detected. 


Canny edge detection finds the edges in an image by looking for abrupt changes in intensity. It is known for detecting edges accurately while reducing noise and false edges. Decreasing the pre-processor's thresholds lets it pick up more, fainter edges.
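To make the role of the thresholds concrete, here is a simplified Python sketch of threshold-based edge detection. It is not the full Canny algorithm and not Invoke's pre-processor: it skips the smoothing and edge-thinning stages and keeps only the intensity-gradient and two-threshold (hysteresis) ideas. The function names and the tiny test image are invented for this example.

```python
def gradient_magnitude(img: list[list[float]]) -> list[list[float]]:
    """Approximate how abruptly intensity changes at each pixel (central differences)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbours are clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def detect_edges(img: list[list[float]], low: float, high: float) -> set[tuple[int, int]]:
    """Keep pixels above `high`, plus pixels above `low` that connect to them."""
    mag = gradient_magnitude(img)
    h, w = len(mag), len(mag[0])
    edges = {(y, x) for y in range(h) for x in range(w) if mag[y][x] >= high}
    stack = list(edges)
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and (ny, nx) not in edges and mag[ny][nx] >= low:
                    edges.add((ny, nx))
                    stack.append((ny, nx))
    return edges


# 6x6 image: dark left half, bright right half, plus a faint patch at the bottom left.
image = [[0, 0, 0, 200, 200, 200] for _ in range(4)] + [[30, 30, 30, 200, 200, 200] for _ in range(2)]
strict = detect_edges(image, low=50, high=100)
loose = detect_edges(image, low=20, high=100)
print(len(strict), len(loose))  # lowering `low` keeps the faint edge as well
```

With the lower threshold, the faint boundary of the bottom-left patch is kept in addition to the strong vertical edge, which mirrors how lowering the thresholds gives the ControlNet more detail to follow.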




M-LSD

M-LSD is another line-based detection algorithm used in ControlNet. It stands for Mobile Line Segment Detection.


It detects straight line segments in an image, which makes it useful for architectural imagery or anything where straight-line structural information is needed in the resulting output.



Lineart

The Lineart model in ControlNet generates line drawings from an input image. The resulting pre-processed image is a simplified version of the original, with only the outlines of objects visible. The Lineart model is known for its ability to accurately capture the contours of the objects in the input image.

Lineart Anime: A variant of the Lineart model that generates line drawings with a distinct style inspired by anime and manga art styles.


Depth: Uses a depth map of the input image, which encodes how far each part of the scene is from the viewer, to guide the generated image toward the same spatial layout.


Normal Map (BAE): Uses a normal map estimated from the input image, which encodes the direction each surface faces, to guide the generated image toward the same surface geometry.
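For readers unfamiliar with normal maps, this short Python sketch shows a common convention for storing a surface direction as an RGB color: each component of the unit normal vector is remapped from the range -1 to 1 onto 0 to 255. The function name is made up for this example, and axis orientation conventions differ between tools, so this illustrates the idea rather than the exact format the BAE pre-processor produces.

```python
import math


def encode_normal(nx: float, ny: float, nz: float) -> tuple[int, int, int]:
    """Map a surface normal to an RGB triple using the usual (n + 1) / 2 remapping."""
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("a normal vector must have non-zero length")
    # Normalise to unit length, then remap each component from [-1, 1] to [0, 255].
    return tuple(round((c / length + 1) / 2 * 255) for c in (nx, ny, nz))


print(encode_normal(0, 0, 1))  # surface facing along +z: (128, 128, 255)
print(encode_normal(1, 0, 0))  # surface facing along +x: (255, 128, 128)
```

Under this convention, flat regions that face the viewer come out light blue, which is the dominant color of many normal maps.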



Image Segmentation: A model that divides input images into segments or regions, each of which corresponds to a different object or part of the image. (More details coming soon)


OpenPose: The OpenPose control model identifies the general pose of a character by pre-processing an existing image that shows a clear human figure. With advanced options, OpenPose can also detect the face or hands in the image.



Mediapipe Face

The MediaPipe Face identification processor is able to clearly identify facial features in order to capture vivid expressions of human faces.


Content Shuffle

The Content Shuffle processor shuffles the image in random ways. The Content Shuffle ControlNet model then reconstructs the image, interpreting its elements in slightly different ways.


This can be useful for reimagining content.


Tile (experimental) 

The Tile model fills in details to match the input image rather than the prompt. It has two main behaviors:

- It can reinterpret specific details within an image and create fresh, new elements.
- It can disregard global instructions (the prompt) when they conflict with the local context of a specific part of the image. In such cases, it uses the local context to guide the process.

The Tile model is useful for enhancing image quality and detail. It can remove undesirable artifacts, such as blurriness caused by resizing, resulting in cleaner, crisper images, and it can generate and add refined details that improve the overall quality of your images.



Pix2Pix (experimental)

With Pix2Pix, you can input an image into the ControlNet and then "instruct" the model to change it using your prompt. For example, you can say "Make it winter" to add more wintry elements to a scene.




Each of these models can be adjusted and combined with other ControlNet models to achieve different results, giving you even more control over your image generation process.

