# ControlNet

[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543)

```{figure} /_static/aigc/he.png
controlnet block
```

The full ControlNet only covers the encoder part of the U-Net.

```{figure} /_static/aigc/sd.png
full controlnet
```

## Model Naming

```{figure} /_static/aigc/cn_fmt.png
controlnet model naming method
```

Model files are named `{project}_{version}{status}_{base model}_{control method}.pth`:

- `{project}`: the project name, usually `control`.
- `{version}`: the version and revision count. For example, `v11` means version 1.1 and `v11f1` means version 1.1 with one fix.
- `{status}`: `p` for production, `e` for experimental, `u` for unfinished (still training).
- `{base model}`: the pretrained model it is based on. The common `sd15` stands for Stable Diffusion 1.5.
- `{control method}`: the image control method, for example `canny` for Canny edge maps.

For example, `control_v11p_sd15_canny.pth` is the production release of version 1.1 of the `control` project. It is based on Stable Diffusion 1.5 and guided by Canny annotations.

## Sudden Converge Phenomenon

```{hint}
Find the step at which the model converges, then increase the batch size with gradient accumulation and retrain.
```

Because we use zero convolutions, SD should always be able to predict meaningful images. (If it cannot, the training has already failed.)

At some iteration you will always find that the model **suddenly** becomes able to fit some training conditions. This means you will get a basically usable model at about 3k to 7k steps. Further training will improve it, but the model after the first "sudden converge" should already be basically functional.

Note that 3k to 7k steps is not very large, so you should consider a larger batch size rather than more training steps. Suppose you observe the "sudden converge" at 3k steps using batch size 4. Rather than training for 300k further steps, a better idea is to use 100× gradient accumulation to re-train those 3k steps with a 100× larger batch size. Perhaps we should not take this to the extreme (100× accumulation may be too much). Still, the "sudden converge" always happens at roughly that point, so getting a better convergence is more important.

For example, say the "sudden converge" will happen at 3k steps and our budget allows 90k steps. We then have two options:

- (1) Train 3k steps, reach the sudden converge, then train 87k more steps.
- (2) Use 30× gradient accumulation and train 3k steps (90k real computation steps), then reach the sudden converge.
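Option (2) relies on gradient accumulation. Below is a minimal, framework-free sketch of why it works. It assumes a toy linear model with mean-squared-error loss, and `grad_mse` and `accumulated_grad` are hypothetical helpers that are not part of the ControlNet code base. Averaging the gradients of k equal micro-batches gives the same update as one batch k times larger. That is why 30× accumulation over 3k optimizer steps costs as many forward/backward passes as 90k plain steps.

```python
import random


def grad_mse(w, b, batch):
    """Gradient of mean((w * x + b - y) ** 2) over `batch` w.r.t. (w, b)."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def accumulated_grad(w, b, batch, accum_steps):
    """Split `batch` into `accum_steps` equal micro-batches, compute one
    gradient per micro-batch, and sum them scaled by 1 / accum_steps
    (the same effect as dividing each micro-batch loss by accum_steps)."""
    if len(batch) % accum_steps:
        raise ValueError("batch size must be divisible by accum_steps")
    size = len(batch) // accum_steps
    gw_sum, gb_sum = 0.0, 0.0
    for i in range(accum_steps):
        gw, gb = grad_mse(w, b, batch[i * size:(i + 1) * size])
        gw_sum += gw / accum_steps
        gb_sum += gb / accum_steps
    return gw_sum, gb_sum


if __name__ == "__main__":
    rng = random.Random(0)
    micro_batch, accum_steps = 4, 30  # logical batch size = 4 * 30 = 120
    # Toy data drawn from y = 3x + 1 with a little noise.
    data = []
    for _ in range(micro_batch * accum_steps):
        x = rng.uniform(-1, 1)
        data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))

    full = grad_mse(0.5, 0.0, data)
    accum = accumulated_grad(0.5, 0.0, data, accum_steps)
    print(full, accum)  # equal up to floating-point rounding

    # Compute budget of option (2): 3k optimizer steps, each made of
    # `accum_steps` forward/backward passes over micro-batches.
    print(3_000 * accum_steps)  # 90000
```

In a PyTorch training loop, the same idea is usually written as `(loss / accum_steps).backward()` on every micro-batch. You then call `optimizer.step()` and `optimizer.zero_grad()` once every `accum_steps` micro-batches.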
In my experiments, (2) is usually better than (1). In real cases, however, you may need to balance the steps before and after the "sudden converge" on your own. The training after the "sudden converge" is also important.

Usually, though, if your logical batch size is already bigger than 256, extending the batch size further is not very meaningful. In that case, a better idea is to train more steps. I tried some "common" logical batch sizes of 64, 96 or 128 (via gradient accumulation). With these, many complicated conditions already seem to be solved very well.

## Model - Preprocessor

Example prompt: `a photo of an astronaut riding a horse on mars`

https://ai.plainenglish.io/controlnet-a-revolutionizing-game-changing-tool-for-image-generation-dea5ae1f0144

- Old model repo: https://huggingface.co/webui/ControlNet-modules-safetensors
- New 1.1 model repo: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main

> Copied from: https://github.com/Mikubill/sd-webui-controlnet/discussions/564#discussioncomment-5719326

Tried to update this mapping for ControlNet 1.1:

| model | preprocessor(s) |
|---|---|
| control_v11p_sd15_canny | canny |
| control_v11p_sd15_mlsd | mlsd |
| control_v11f1p_sd15_depth | depth_midas, depth_leres, depth_zoe |
| control_v11p_sd15_normalbae | normal_bae |
| control_v11p_sd15_seg | seg_ofade20k, seg_ofcoco, seg_ufade20k |
| control_v11p_sd15_inpaint | inpaint_global_harmonious? |
| control_v11p_sd15_lineart | lineart_standard (?), lineart_realistic, lineart_coarse |
| control_v11p_sd15s2_lineart_anime | lineart_anime |
| control_v11p_sd15_openpose | openpose (body), openpose_face, openpose_faceonly, openpose_full (body+hand+face), openpose_hand |
| control_v11p_sd15_scribble | scribble_hed, scribble_pidinet |
| control_v11p_sd15_softedge | softedge_pidinet, softedge_pidisafe, softedge_hed, softedge_hedsafe |
| control_v11e_sd15_shuffle (experimental) | shuffle |
| control_v11e_sd15_ip2p (experimental) | - |
| control_v11u_sd15_tile (unfinished) | tile_gaussian? |

| preprocessor | ControlNet 1.1 model |
|---|---|
| invert (from white bg & black line) | |
| canny | control_v11p_sd15_canny |
| depth_leres | control_v11f1p_sd15_depth |
| depth_midas | control_v11f1p_sd15_depth |
| depth_zoe | control_v11f1p_sd15_depth |
| inpaint_global_harmonious | control_v11p_sd15_inpaint |
| lineart_anime | control_v11p_sd15s2_lineart_anime |
| lineart_coarse | control_v11p_sd15_lineart |
| lineart_realistic | control_v11p_sd15_lineart |
| lineart_standard (from white bg & black line) | control_v11p_sd15_lineart (?) |
| mediapipe_face | |
| mlsd | control_v11p_sd15_mlsd |
| normal_bae | control_v11p_sd15_normalbae |
| normal_midas | abandoned in 1.1 |
| openpose | control_v11p_sd15_openpose |
| openpose_face | control_v11p_sd15_openpose |
| openpose_faceonly | control_v11p_sd15_openpose |
| openpose_full | control_v11p_sd15_openpose |
| openpose_hand | control_v11p_sd15_openpose |
| scribble_hed | control_v11p_sd15_scribble |
| scribble_pidinet | control_v11p_sd15_scribble |
| scribble_xdog | |
| seg_ofade20k | control_v11p_sd15_seg |
| seg_ofcoco | control_v11p_sd15_seg |
| seg_ufade20k | control_v11p_sd15_seg |
| shuffle | control_v11e_sd15_shuffle |
| softedge_hed | control_v11p_sd15_softedge |
| softedge_hedsafe | control_v11p_sd15_softedge |
| softedge_pidinet | control_v11p_sd15_softedge |
| softedge_pidisafe | control_v11p_sd15_softedge |
| threshold | |
| tile_gaussian | control_v11u_sd15_tile? |

```none
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_ip2p.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_ip2p.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_shuffle.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11e_sd15_shuffle.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_inpaint.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_inpaint.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_mlsd.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_mlsd.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_normalbae.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_normalbae.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.yaml
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15s2_lineart_anime.pth
https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15s2_lineart_anime.yaml

# https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/chilloutmix_NiPrunedFp32Fix.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Korean-doll-likeness.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Japanese-doll-likeness.safetensors
https://huggingface.co/hirohironog/chilloutmix_NiPrunedFp32Fix/resolve/main/Taiwan-doll-likeness.safetensors
```

Prompt: `dog on grassland`

Negative prompt: `lowres, bad anatomy, bad hands, text, error, missing fingers`

## How It Works

### Tile

https://github.com/lllyasviel/ControlNet-v1-1-nightly/blob/78631203a6739cde76a728062b475549a24f94c6/annotator/util.py#L30

The input image is resized so that its shorter side equals the target resolution.

-----

## Reference

1. [ControlNet 1.1 update notes and a detailed explanation of its parameters and conditions](https://blog.csdn.net/qcwlmqy/article/details/130355876)