ControlNet

Adding Conditional Control to Text-to-Image Diffusion Models

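ControlNet only covers the encoder: its trainable copy mirrors the encoder (and middle block) of the Stable Diffusion UNet, while the decoder is not duplicated.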

Model Naming

Figure: ControlNet model naming method

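Format: {project name, usually control}_{version and revision, e.g. v11 means version 1.1 and v11f1 means version 1.1 with one fix}{status: p = production, e = experimental, u = unfinished (still training)}_{base pretrained model, commonly sd15 for Stable Diffusion 1.5}_{control method, e.g. canny for Canny edge maps}.pth

For example, control_v11p_sd15_canny.pth is the production release of version 1.1 of the control project, based on Stable Diffusion 1.5 and guided by Canny edge annotations.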
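The naming rule can be checked mechanically. Below is a minimal sketch of a parser; the regular expression, the `parse_controlnet_name` helper and the `ControlNetName` fields are hypothetical, inferred from the rule above and from the file names listed on this page, not taken from any official tool.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, inferred from the naming rule and the file names on this page.
_NAME_RE = re.compile(
    r"^(?P<project>[a-z]+)"   # project name, usually "control"
    r"_v(?P<version>\d+)"     # version digits, e.g. "11" -> 1.1
    r"(?:f(?P<fix>\d+))?"     # optional fix count, e.g. "f1"
    r"(?P<status>[peu])"      # p = production, e = experimental, u = unfinished
    r"_(?P<base>[a-z0-9]+)"   # base model, e.g. "sd15" (suffixes such as "s2" kept as-is)
    r"_(?P<method>.+)\.pth$"  # control method, e.g. "canny" or "lineart_anime"
)

_STATUS = {"p": "production", "e": "experimental", "u": "unfinished"}


class ControlNetName(NamedTuple):
    project: str
    version: str
    fix: Optional[int]
    status: str
    base: str
    method: str


def parse_controlnet_name(filename: str) -> ControlNetName:
    """Split a ControlNet checkpoint file name into its naming fields."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised ControlNet file name: {filename!r}")
    return ControlNetName(
        project=m["project"],
        version=".".join(m["version"]),  # one digit per component: "11" -> "1.1"
        fix=int(m["fix"]) if m["fix"] else None,
        status=_STATUS[m["status"]],
        base=m["base"],
        method=m["method"],
    )


print(parse_controlnet_name("control_v11f1p_sd15_depth.pth"))
# ControlNetName(project='control', version='1.1', fix=1, status='production', base='sd15', method='depth')
```

Treating each version digit as one component is an assumption that fits "v11" but would need revisiting for two-digit minor versions.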

Sudden Converge Phenomenon

Hint

Determine the step at which the model suddenly converges, then enlarge the batch size with gradient accumulation and retrain.

Because we use zero convolutions, the SD should always be able to predict meaningful images. (If it cannot, the training has already failed.)
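Why zero convolutions keep SD intact at the start can be sketched in plain Python. Assumptions: feature maps are stored as `[channel][pixel]` lists, the zero convolution is a 1x1 convolution whose weight and bias are initialised to 0, and its output is added to a skip connection of the locked model; all names and values are hypothetical and no deep-learning framework is used.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # feature map stored as [channel][pixel]


def conv1x1(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """1x1 convolution: a linear mix of channels applied at every pixel."""
    n_pixels = len(x[0])
    return [
        [sum(weight[o][i] * x[i][p] for i in range(len(x))) + bias[o] for p in range(n_pixels)]
        for o in range(len(weight))
    ]


def zero_conv(channels: int) -> Tuple[Matrix, List[float]]:
    """Weight and bias of a 1x1 convolution, both initialised to zero."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two feature maps of the same shape."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical features: a skip connection from the locked SD encoder and
# the matching feature from the trainable ControlNet copy.
locked_skip = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
control_feat = [[3.0, 4.0, 5.0], [-2.0, 7.0, 1.0]]

w, b = zero_conv(channels=2)
merged = add(locked_skip, conv1x1(control_feat, w, b))
print(merged == locked_skip)  # True: at step 0 the decoder sees the unchanged SD features
```

The zero weights still receive a gradient at step 0, because the derivative of the output with respect to a weight is the (non-zero) input feature; the branch can therefore start learning even though its initial contribution is exactly zero.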

You will always find that at some iteration the model suddenly becomes able to fit some training conditions. This means you will get a basically usable model at about 3k to 7k steps (further training will improve it, but the model after the first “sudden converge” should already be basically functional).

Note that 3k to 7k steps is not very large, so you should consider a larger batch size rather than more training steps. If you observe the “sudden converge” at 3k steps with batch size 4, then rather than training for 300k more steps, a better idea is to use 100× gradient accumulation to re-train those 3k steps with a 100× batch size. Perhaps this should not be pushed too far (100× accumulation may be too extreme), but since the “sudden converge” will happen at that point anyway, making that convergence better matters more.
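Gradient accumulation works because averaging the gradients of k equal-sized micro-batches gives the same gradient as one batch k times larger. A minimal sketch with a one-parameter linear model and mean-squared error; the model, data and helper names are illustrative assumptions, not the ControlNet training code.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) training pairs


def grad_mse(w: float, batch: Batch) -> float:
    """d/dw of mean((w * x - y) ** 2) over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, micro_batches: List[Batch]) -> float:
    """Accumulate gradients over equal-sized micro-batches, then average once."""
    total = 0.0
    for mb in micro_batches:
        total += grad_mse(w, mb)  # no parameter update between micro-batches
    return total / len(micro_batches)


# Illustrative data: 4 micro-batches of size 4 stand in for one batch of size 16.
data = [(float(i), 3.0 * i + 1.0) for i in range(16)]
micro_batches = [data[k:k + 4] for k in range(0, len(data), 4)]

w = 0.5
g_accum = accumulated_grad(w, micro_batches)
g_full = grad_mse(w, data)
print(abs(g_accum - g_full) < 1e-9)  # True
```

In a framework such as PyTorch the same effect is usually obtained by calling `backward()` on each micro-batch loss divided by k and calling `optimizer.step()` once every k micro-batches.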

Because the “sudden converge” always happens, suppose it happens at step 3k and our compute budget allows 90k steps. Then we have two options: (1) train 3k steps, sudden converge, then train 87k more steps; (2) use 30× gradient accumulation, train 3k steps (90k real computation steps), then sudden converge.

In my experiments, (2) is usually better than (1). In real cases, however, you may need to balance the number of steps before and after the “sudden converge” yourself. The training after the “sudden converge” also matters.
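The budget arithmetic of the two options, as a tiny sketch; `budget_options` is a hypothetical helper, and the budget is counted in real computation steps (micro-batch passes), as in the example above.

```python
def budget_options(converge_step: int, budget_steps: int) -> dict:
    """Compare the two ways of spending a fixed compute budget."""
    # Option 1: plain training; whatever is left after the converge point is spent afterwards.
    option1_after = budget_steps - converge_step
    # Option 2: spend the whole budget before the converge point via gradient accumulation.
    accumulation = budget_steps // converge_step
    return {
        "option1_steps_after_converge": option1_after,
        "option2_accumulation": accumulation,
        "option2_real_steps": converge_step * accumulation,
    }


print(budget_options(converge_step=3_000, budget_steps=90_000))
# {'option1_steps_after_converge': 87000, 'option2_accumulation': 30, 'option2_real_steps': 90000}
```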

Usually, though, if your logical batch size is already larger than 256, extending it further is not very meaningful, and a better idea is to train more steps. I tried some “common” logical batch sizes of 64, 96, and 128 (via gradient accumulation), and many complicated conditions already seem to be solved very well.
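The logical batch size is the micro-batch size times the number of accumulation steps. A minimal sketch that picks the accumulation steps for a target logical batch, with the 256 ceiling from the paragraph above applied as a cap; the function name and the micro-batch of 4 (borrowed from the earlier example) are illustrative assumptions.

```python
import math


def accumulation_steps(micro_batch: int, target_logical_batch: int, cap: int = 256) -> int:
    """Micro-batches to accumulate so that micro_batch * steps reaches the target.

    The default cap of 256 encodes the heuristic above: past that logical batch
    size, more steps are preferred over a bigger batch.
    """
    target = min(target_logical_batch, cap)
    return math.ceil(target / micro_batch)


for target in (64, 96, 128, 1024):
    k = accumulation_steps(micro_batch=4, target_logical_batch=target)
    print(f"target {target}: accumulate {k} steps -> logical batch {4 * k}")
```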

Model - Preprocessor

Example prompt: a photo of an astronaut riding a horse on Mars

https://ai.plainenglish.io/controlnet-a-revolutionizing-game-changing-tool-for-image-generation-dea5ae1f0144

Old model repo: https://huggingface.co/webui/ControlNet-modules-safetensors
New 1.1 model repo: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

Copied from: https://github.com/Mikubill/sd-webui-controlnet/discussions/564#discussioncomment-5719326

Tried to update this mapping for ControlNet 1.1:

model	preprocessor(s)
control_v11p_sd15_canny	canny
control_v11p_sd15_mlsd	mlsd
control_v11f1p_sd15_depth	depth_midas, depth_leres, depth_zoe
control_v11p_sd15_normalbae	normal_bae
control_v11p_sd15_seg	seg_ofade20k, seg_ofcoco, seg_ufade20k
control_v11p_sd15_inpaint	inpaint_global_harmonious?
control_v11p_sd15_lineart	lineart_standard (?), lineart_realistic, lineart_coarse
control_v11p_sd15s2_lineart_anime	lineart_anime
control_v11p_sd15_openpose	openpose (body), openpose_face, openpose_faceonly, openpose_full (body+hand+face), openpose_hand
control_v11p_sd15_scribble	scribble_hed, scribble_pidinet
control_v11p_sd15_softedge	softedge_pidinet, softedge_pidisafe, softedge_hed, softedge_hedsafe
control_v11e_sd15_shuffle (experimental)	shuffle
control_v11e_sd15_ip2p (experimental)	-
control_v11u_sd15_tile (unfinished; later released as control_v11f1e_sd15_tile)	tile_gaussian?

preprocessor	ControlNet 1.1 model
invert (from white bg & black line)	-
canny	control_v11p_sd15_canny
depth_leres	control_v11f1p_sd15_depth
depth_midas	control_v11f1p_sd15_depth
depth_zoe	control_v11f1p_sd15_depth
inpaint_global_harmonious	control_v11p_sd15_inpaint
lineart_anime	control_v11p_sd15s2_lineart_anime
lineart_coarse	control_v11p_sd15_lineart
lineart_realistic	control_v11p_sd15_lineart
lineart_standard (from white bg & black line)	control_v11p_sd15_lineart (?)
mediapipe_face	-
mlsd	control_v11p_sd15_mlsd
normal_bae	control_v11p_sd15_normalbae
normal_midas	abandoned in 1.1
openpose	control_v11p_sd15_openpose
openpose_face	control_v11p_sd15_openpose
openpose_faceonly	control_v11p_sd15_openpose
openpose_full	control_v11p_sd15_openpose
openpose_hand	control_v11p_sd15_openpose
scribble_hed	control_v11p_sd15_scribble
scribble_pidinet	control_v11p_sd15_scribble
scribble_xdog	-
seg_ofade20k	control_v11p_sd15_seg
seg_ofcoco	control_v11p_sd15_seg
seg_ufade20k	control_v11p_sd15_seg
shuffle	control_v11e_sd15_shuffle
softedge_hed	control_v11p_sd15_softedge
softedge_hedsafe	control_v11p_sd15_softedge
softedge_pidinet	control_v11p_sd15_softedge
softedge_pidisafe	control_v11p_sd15_softedge
threshold	-
tile_gaussian	control_v11u_sd15_tile? (later control_v11f1e_sd15_tile)
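The two tables above are inverses of each other. A sketch that keeps only the model-to-preprocessor table and derives the other from it, so the two cannot drift apart; entries marked with "?" are left out because their mapping is uncertain, and the variable and function names are hypothetical.

```python
from typing import Dict, List

# Transcribed from the model -> preprocessor(s) table; uncertain "?" entries omitted.
MODEL_TO_PREPROCESSORS: Dict[str, List[str]] = {
    "control_v11p_sd15_canny": ["canny"],
    "control_v11p_sd15_mlsd": ["mlsd"],
    "control_v11f1p_sd15_depth": ["depth_midas", "depth_leres", "depth_zoe"],
    "control_v11p_sd15_normalbae": ["normal_bae"],
    "control_v11p_sd15_seg": ["seg_ofade20k", "seg_ofcoco", "seg_ufade20k"],
    "control_v11p_sd15_lineart": ["lineart_realistic", "lineart_coarse"],
    "control_v11p_sd15s2_lineart_anime": ["lineart_anime"],
    "control_v11p_sd15_openpose": ["openpose", "openpose_face", "openpose_faceonly",
                                   "openpose_full", "openpose_hand"],
    "control_v11p_sd15_scribble": ["scribble_hed", "scribble_pidinet"],
    "control_v11p_sd15_softedge": ["softedge_pidinet", "softedge_pidisafe",
                                   "softedge_hed", "softedge_hedsafe"],
    "control_v11e_sd15_shuffle": ["shuffle"],
}


def invert(mapping: Dict[str, List[str]]) -> Dict[str, str]:
    """Build preprocessor -> model; fail if a preprocessor is listed under two models."""
    reverse: Dict[str, str] = {}
    for model, preprocessors in mapping.items():
        for pre in preprocessors:
            if pre in reverse and reverse[pre] != model:
                raise ValueError(f"{pre} is listed under {reverse[pre]} and {model}")
            reverse[pre] = model
    return reverse


PREPROCESSOR_TO_MODEL = invert(MODEL_TO_PREPROCESSORS)
print(PREPROCESSOR_TO_MODEL["depth_zoe"])  # control_v11f1p_sd15_depth
```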

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_ip2p.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_ip2p.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_shuffle.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_shuffle.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_inpaint.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_inpaint.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_mlsd.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_mlsd.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_normalbae.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_normalbae.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.yaml

https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15s2_lineart_anime.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15s2_lineart_anime.yaml
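Every model above is a `.pth` checkpoint paired with a `.yaml` config under the same Hugging Face path, so the list can be generated rather than maintained by hand. A sketch; `controlnet_urls` and `MODELS` are hypothetical names, and nothing is downloaded.

```python
from typing import List, Tuple

# Base path taken from the download links above.
REPO = "https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main"

MODELS: List[str] = [
    "control_v11e_sd15_ip2p",
    "control_v11e_sd15_shuffle",
    "control_v11f1e_sd15_tile",
    "control_v11f1p_sd15_depth",
    "control_v11p_sd15_canny",
    "control_v11p_sd15_inpaint",
    "control_v11p_sd15_lineart",
    "control_v11p_sd15_mlsd",
    "control_v11p_sd15_normalbae",
    "control_v11p_sd15_openpose",
    "control_v11p_sd15_scribble",
    "control_v11p_sd15_seg",
    "control_v11p_sd15_softedge",
    "control_v11p_sd15s2_lineart_anime",
]


def controlnet_urls(model_name: str) -> Tuple[str, str]:
    """Return the (.pth weights, .yaml config) URLs for one ControlNet 1.1 model."""
    return f"{REPO}/{model_name}.pth", f"{REPO}/{model_name}.yaml"


for name in MODELS:
    for url in controlnet_urls(name):
        print(url)
```

The printed list can be saved to a file and passed to a downloader, e.g. `wget -i urls.txt`.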

# https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/chilloutmix_NiPrunedFp32Fix.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Korean-doll-likeness.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Japanese-doll-likeness.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Taiwan-doll-likeness.safetensors

Prompt: dog on grassland; negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers

How Does It Work?

Tile

Input resizing: the image is resized so that its shorter side equals the target resolution (see https://github.com/lllyasviel/ControlNet-v1-1-nightly/blob/78631203a6739cde76a728062b475549a24f94c6/annotator/util.py#L30).
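The resize rule can be sketched as follows, assuming the linked function scales the image so that its shorter side equals the requested resolution and then rounds both sides to the nearest multiple of 64 (our reading of `resize_image` in the ControlNet annotator utilities). Only the size arithmetic is shown; the actual pixel resampling, done with OpenCV in that code, is omitted, and `target_size` is a hypothetical name.

```python
from typing import Tuple


def target_size(height: int, width: int, resolution: int = 512) -> Tuple[int, int]:
    """Scale so the shorter side equals `resolution`, then snap both sides to multiples of 64."""
    k = float(resolution) / min(height, width)
    # round() is round-half-to-even, like numpy.round
    h = int(round(height * k / 64.0)) * 64
    w = int(round(width * k / 64.0)) * 64
    return h, w


print(target_size(480, 640))    # (512, 704)
print(target_size(1080, 1920))  # (512, 896)
```

Because of the rounding to 64, the output aspect ratio can differ slightly from the input's: a 480x640 (HxW) input becomes 512x704 instead of about 512x683.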

Reference

ControlNet 1.1: update contents and a detailed explanation of its parameters and conditions