ControlNet

Adding Conditional Control to Text-to-Image Diffusion Models

Figure: ControlNet block (../_images/he.png)

The full ControlNet is attached only to the encoder part.

Figure: full ControlNet (../_images/sd.png)

Model naming

Figure: ControlNet model naming method (../_images/cn_fmt.png)

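Naming format: {project name, usually control}_{version and revision count, e.g. v11 means version 1.1 and v11f1 means version 1.1 with one fix}{p for production, e for experimental, u for unfinished (still training)}_{base pretrained model, commonly sd15 for Stable Diffusion 1.5}_{image control method, e.g. canny for edge (line-sketch) maps}.pth

For example, control_v11p_sd15_canny.pth is the model file of the control project, version 1.1, production release, based on Stable Diffusion 1.5, guided by canny annotations.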
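The naming scheme is regular enough to parse mechanically. The sketch below is only an illustration: the regular expression and the function name parse_controlnet_name are assumptions derived from the file names listed in these notes, not an official grammar.

```python
import re
from typing import Optional

# Pattern inferred from names such as control_v11f1p_sd15_depth (an assumption,
# not an official specification of the naming scheme).
NAME_RE = re.compile(
    r"^(?P<project>[a-z]+)_"
    r"v(?P<version>\d+)(?:f(?P<fix>\d+))?"
    r"(?P<status>[peu])_"
    r"(?P<base>[a-z0-9]+)_"
    r"(?P<method>[a-z0-9_]+)$"
)
STATUS = {"p": "production", "e": "experimental", "u": "unfinished"}


def parse_controlnet_name(filename: str) -> Optional[dict]:
    """Split a ControlNet model file name into its naming fields."""
    stem = filename.rsplit(".", 1)[0]  # drop .pth / .yaml if present
    m = NAME_RE.match(stem)
    if m is None:
        return None
    digits = m.group("version")  # "11" stands for version 1.1
    version = f"{digits[0]}.{digits[1:]}" if len(digits) > 1 else digits
    return {
        "project": m.group("project"),
        "version": version,
        "fix": int(m.group("fix") or 0),
        "status": STATUS[m.group("status")],
        "base": m.group("base"),
        "method": m.group("method"),
    }
```

Names that do not follow the scheme return None instead of raising, so the helper can be run over a mixed model directory.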

Sudden Converge Phenomenon

Hint

Determine the step count at which the model converges, then enlarge the batch size through gradient accumulation and retrain.

Because we use zero convolutions, the SD should always be able to predict meaningful images. (If it cannot, the training has already failed.)

You will always find that, at some iteration, the model is suddenly able to fit some training conditions. This means that you will get a basically usable model at about 3k to 7k steps (further training will improve it, but the model after the first “sudden converge” should already be basically functional).

Note that 3k to 7k steps is not very many, so you should consider a larger batch size rather than more training steps. If you can observe the “sudden converge” at step 3k using batch size 4, then, rather than training for 300k further steps, a better idea is to use 100× gradient accumulation and re-train those 3k steps with a 100× batch size. Perhaps we should not take this too far (100× accumulation may be too extreme), but since the “sudden converge” will always happen at that point, getting a better convergence is more important.
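Why accumulation can stand in for a larger batch is easy to check on a toy problem. The following is a minimal pure-Python sketch on a made-up 1-D regression task (the model, data, and function names are illustrative assumptions, not ControlNet training code): averaging the gradients of equally sized micro-batches gives the same gradient as one large batch.

```python
def mse_grad(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Average the gradients of equally sized micro-batches (gradient accumulation)."""
    chunks = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    return sum(mse_grad(w, chunk) for chunk in chunks) / len(chunks)


# Toy data: 8 samples of y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

full = mse_grad(0.5, data)                                # one batch of 8
accum = accumulated_grad(0.5, data, micro_batch_size=4)   # 2 micro-batches of 4
```

The equality holds exactly only when the micro-batches have the same size and the loss is a plain mean over samples.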

Because the “sudden converge” always happens, let's say it will happen at step 3k and our budget covers 90k steps. Then we have two options: (1) train 3k steps, hit the sudden converge, then train 87k more steps; (2) use 30× gradient accumulation and train 3k steps (90k real computation steps), then hit the sudden converge.
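The arithmetic behind the two options can be written out as a small helper. This is only a sketch; the function names and the dictionary layout are made up for illustration.

```python
def accumulation_for_budget(budget_steps: int, converge_step: int) -> int:
    """Accumulation factor that spends the whole budget by the converge step."""
    if converge_step <= 0 or budget_steps < converge_step:
        raise ValueError("need 0 < converge_step <= budget_steps")
    return budget_steps // converge_step


def describe_options(budget_steps: int, converge_step: int) -> dict:
    """Compare plain training (1) with accumulated training (2)."""
    accum = accumulation_for_budget(budget_steps, converge_step)
    return {
        "option_1": {
            "accumulation": 1,
            "optimizer_steps": budget_steps,
            "steps_after_converge": budget_steps - converge_step,
        },
        "option_2": {
            "accumulation": accum,
            "optimizer_steps": converge_step,
            "steps_after_converge": 0,
        },
    }


plan = describe_options(budget_steps=90_000, converge_step=3_000)
```

With the numbers from the text, option (2) uses an accumulation factor of 30 and option (1) leaves 87k steps after the sudden converge.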

In my experiments, (2) is usually better than (1). In real cases, however, you may need to balance the steps before and after the “sudden converge” yourself. The training after the “sudden converge” is also important.

But usually, if your logical batch size is already larger than 256, extending the batch size further is not very meaningful. In that case, a better idea is probably to train for more steps. I tried some “common” logical batch sizes of 64, 96, or 128 (via gradient accumulation), and it seems that many complicated conditions can already be solved very well.
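The logical batch size is the per-step (micro) batch size multiplied by the number of accumulation steps. A small sketch for picking the accumulation steps follows; the function name is hypothetical, and the cap of 256 is simply the rule of thumb from the paragraph above, not a hard limit.

```python
import math


def accumulation_steps(micro_batch_size: int, target_logical_batch: int,
                       cap: int = 256) -> int:
    """Accumulation steps needed to reach a target logical batch size.

    The target is clamped to `cap`, following the rule of thumb that
    logical batch sizes beyond 256 bring little further benefit.
    """
    target = min(target_logical_batch, cap)
    return max(1, math.ceil(target / micro_batch_size))
```

For example, a micro batch of 4 needs 16 accumulation steps for a logical batch of 64 and 32 steps for a logical batch of 128.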

Model - Preprocessor

a photo of an astronaut riding a horse on mars

https://ai.plainenglish.io/controlnet-a-revolutionizing-game-changing-tool-for-image-generation-dea5ae1f0144

Old model repo: https://huggingface.co/webui/ControlNet-modules-safetensors
New 1.1 repo: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

Copied from: https://github.com/Mikubill/sd-webui-controlnet/discussions/564#discussioncomment-5719326

Tried to update this mapping for ControlNet 1.1:

model                                     preprocessor(s)
----------------------------------------  ----------------------------------------
control_v11p_sd15_canny                   canny
control_v11p_sd15_mlsd                    mlsd
control_v11f1p_sd15_depth                 depth_midas, depth_leres, depth_zoe
control_v11p_sd15_normalbae               normal_bae
control_v11p_sd15_seg                     seg_ofade20k, seg_ofcoco, seg_ufade20k
control_v11p_sd15_inpaint                 inpaint_global_harmonious?
control_v11p_sd15_lineart                 lineart_standard (?), lineart_realistic, lineart_coarse
control_v11p_sd15s2_lineart_anime         lineart_anime
control_v11p_sd15_openpose                openpose (body), openpose_face, openpose_faceonly, openpose_full (body+hand+face), openpose_hand
control_v11p_sd15_scribble                scribble_hed, scribble_pidinet
control_v11p_sd15_softedge                softedge_pidinet, softedge_pidisafe, softedge_hed, softedge_hedsafe
control_v11e_sd15_shuffle (experimental)  shuffle
control_v11e_sd15_ip2p (experimental)     -
control_v11u_sd15_tile (unfinished)       tile_gaussian?

preprocessor                                   ControlNet 1.1 model
---------------------------------------------  ---------------------------------
invert (from white bg & black line)
canny                                          control_v11p_sd15_canny
depth_leres                                    control_v11f1p_sd15_depth
depth_midas                                    control_v11f1p_sd15_depth
depth_zoe                                      control_v11f1p_sd15_depth
inpaint_global_harmonious                      control_v11p_sd15_inpaint
lineart_anime                                  control_v11p_sd15s2_lineart_anime
lineart_coarse                                 control_v11p_sd15_lineart
lineart_realistic                              control_v11p_sd15_lineart
lineart_standard (from white bg & black line)  control_v11p_sd15_lineart (?)
mediapipe_face
mlsd                                           control_v11p_sd15_mlsd
normal_bae                                     control_v11p_sd15_normalbae
normal_midas                                   abandoned in 1.1
openpose                                       control_v11p_sd15_openpose
openpose_face                                  control_v11p_sd15_openpose
openpose_faceonly                              control_v11p_sd15_openpose
openpose_full                                  control_v11p_sd15_openpose
openpose_hand                                  control_v11p_sd15_openpose
scribble_hed                                   control_v11p_sd15_scribble
scribble_pidinet                               control_v11p_sd15_scribble
scribble_xdog
seg_ofade20k                                   control_v11p_sd15_seg
seg_ofcoco                                     control_v11p_sd15_seg
seg_ufade20k                                   control_v11p_sd15_seg
shuffle                                        control_v11e_sd15_shuffle
softedge_hed                                   control_v11p_sd15_softedge
softedge_hedsafe                               control_v11p_sd15_softedge
softedge_pidinet                               control_v11p_sd15_softedge
softedge_pidisafe                              control_v11p_sd15_softedge
threshold
tile_gaussian                                  control_v11u_sd15_tile?
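For scripting, the two mappings above can be kept as one dictionary and inverted on demand. The sketch below transcribes only the rows without a question mark; the names MODEL_TO_PREPROCESSORS and model_for are illustrative and not part of sd-webui-controlnet.

```python
# Model -> preprocessors, transcribed from the mapping above.
# Rows marked "?" (inpaint, lineart_standard, tile) are left out as unconfirmed.
MODEL_TO_PREPROCESSORS = {
    "control_v11p_sd15_canny": ["canny"],
    "control_v11p_sd15_mlsd": ["mlsd"],
    "control_v11f1p_sd15_depth": ["depth_midas", "depth_leres", "depth_zoe"],
    "control_v11p_sd15_normalbae": ["normal_bae"],
    "control_v11p_sd15_seg": ["seg_ofade20k", "seg_ofcoco", "seg_ufade20k"],
    "control_v11p_sd15_lineart": ["lineart_realistic", "lineart_coarse"],
    "control_v11p_sd15s2_lineart_anime": ["lineart_anime"],
    "control_v11p_sd15_openpose": [
        "openpose", "openpose_face", "openpose_faceonly",
        "openpose_full", "openpose_hand",
    ],
    "control_v11p_sd15_scribble": ["scribble_hed", "scribble_pidinet"],
    "control_v11p_sd15_softedge": [
        "softedge_pidinet", "softedge_pidisafe",
        "softedge_hed", "softedge_hedsafe",
    ],
    "control_v11e_sd15_shuffle": ["shuffle"],
}

# Inverse lookup: preprocessor -> model.
PREPROCESSOR_TO_MODEL = {
    pre: model
    for model, pres in MODEL_TO_PREPROCESSORS.items()
    for pre in pres
}


def model_for(preprocessor: str):
    """Return the matching ControlNet 1.1 model name, or None if unknown."""
    return PREPROCESSOR_TO_MODEL.get(preprocessor)
```

Preprocessors without a listed model, such as normal_midas or threshold, return None.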

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_ip2p.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_ip2p.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_shuffle.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_shuffle.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_inpaint.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_inpaint.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_mlsd.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_mlsd.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_normalbae.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_normalbae.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15s2_lineart_anime.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15s2_lineart_anime.yaml
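Every entry in the list above follows the same pattern: one .pth and one .yaml per model under the same base URL. A small sketch that rebuilds the pair from a model name is shown here; the helper name model_urls is made up, and the code only builds strings without downloading anything.

```python
# Base URL taken from the download list above.
BASE_URL = "https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main"


def model_urls(model_name: str) -> dict:
    """Return the weight (.pth) and config (.yaml) URLs for one model name."""
    return {ext: f"{BASE_URL}/{model_name}.{ext}" for ext in ("pth", "yaml")}


urls = model_urls("control_v11p_sd15_canny")
```

The resulting URLs can be passed to any downloader, for example wget or aria2c.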

# https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/chilloutmix_NiPrunedFp32Fix.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Korean-doll-likeness.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Japanese-doll-likeness.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Taiwan-doll-likeness.safetensors

Prompt: dog on grassland
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers

How does it work?

Tile

https://github.com/lllyasviel/ControlNet-v1-1-nightly/blob/78631203a6739cde76a728062b475549a24f94c6/annotator/util.py#L30
The input image is resized so that its shorter side matches the target resolution.
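The resize rule can be sketched as a shape computation. This is a minimal illustration rather than the code in the linked file: the function name resized_shape is hypothetical, and snapping both sides to a multiple of 64 is an assumption added here, so check the linked source for the actual behavior.

```python
def resized_shape(height: int, width: int, resolution: int,
                  multiple: int = 64) -> tuple:
    """Target (height, width) after scaling the shorter side to `resolution`.

    Both sides are scaled by the same factor, then snapped to the nearest
    multiple of `multiple` (the snapping is an assumption of this sketch).
    """
    k = resolution / min(height, width)  # scale factor from the shorter side
    new_h = int(round(height * k / multiple)) * multiple
    new_w = int(round(width * k / multiple)) * multiple
    return new_h, new_w
```

For a 1080×1920 input and resolution 512, the shorter side becomes 512 and the longer side is scaled by the same factor before snapping.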


Reference

  1. "controlnet 1.1版本更新内容及参数条件详解" (ControlNet 1.1: update contents and parameter conditions explained in detail)