## Abstract
Recent advancements in text-to-image (T2I) diffusion models have enabled the creation of high-quality images from text prompts, but they still struggle to generate images with precise control over specific visual concepts. Existing approaches can replicate a given concept by learning from reference images, yet they lack the flexibility for fine-grained customization of individual components within the concept. In this paper, we introduce component-controllable personalization, a novel task that pushes the boundaries of T2I models by allowing users to reconfigure and personalize specific components of concepts. This task is particularly challenging due to two primary obstacles: semantic pollution, where unwanted visual elements corrupt the personalized concept, and semantic imbalance, which causes disproportionate learning of visual semantics. To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics and Dual-Stream Balancing (DS-Bal) to establish a balanced learning paradigm for visual semantics. Extensive comparisons, ablations, and analyses demonstrate that MagicTailor not only excels in this challenging task but also holds significant promise for practical applications, paving the way for more nuanced and creative image generation.

## 🔥 Updates
- 2024.10: Our code is released! Feel free to [contact me](mailto:dhzhou@link.cuhk.edu.hk) if anything is unclear.
- 2024.10: [Our paper](https://arxiv.org/pdf/2410.13370) is available. The code is coming soon!

## 🛠️ Installation
1. Create the conda environment:
```
conda env create -f environment.yml
```
2. Install PyTorch and torchvision (here we take CUDA 11.6 as an example; you can sanity-check the install with the snippet after this list):
```
conda activate magictailor
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
```
3. Clone the Grounded-SAM repository:
```
cd {PATH-TO-THIS-CODE}
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
```
4. Follow the ["Install without Docker"](https://github.com/IDEA-Research/Grounded-Segment-Anything) section to set up Grounded-SAM (please make sure that the CUDA version used for this installation matches the one PyTorch was built with).

> ❗You can skip Steps 3 and 4 if you just want to have a quick try using the example images we provide.
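After Step 2, a quick sanity check helps confirm that PyTorch sees your GPU. Below is a minimal sketch; the expected values assume the CUDA 11.6 wheels from Step 2, so adjust them if you installed a different build:
```
# Sanity-check the PyTorch install (run inside the "magictailor" environment).
import torch

print(torch.__version__)          # expect 1.13.1+cu116
print(torch.version.cuda)         # expect 11.6
print(torch.cuda.is_available())  # expect True on a working GPU setup
```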
## 🔬 Training and Inference

### Preparing Data
Directly use the example images in `./examples`, or prepare your own pair as follows:
1. Create a folder named `{CONCEPT}_{ID}+{COMPONENT}_{ID}`, where `{CONCEPT}` and `{COMPONENT}` are the category names of the concept and component respectively, and `{ID}` is a custom index (you can set it to whatever you want) that helps you distinguish between pairs.
2. Put the reference images into this folder, renaming them as `0_{CONCEPT}_{ID}{N}.jpg` for images of the concept and `1_{COMPONENT}_{ID}{N}.jpg` for images of the component, where `{N}` numbers the images (a scripted sketch of these steps follows the example below).
3. Finally, the data will be organized like:
```
person_a+hair_a/
├── 0_person_a0.jpg
├── 0_person_a1.jpg
├── 0_person_a2.jpg
├── 1_hair_a0.jpg
├── 1_hair_a1.jpg
└── 1_hair_a2.jpg
```
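If you prefer to script Steps 1 and 2, here is a minimal sketch; the source paths and the `person_b`/`hair_b` names are hypothetical, so substitute your own files and category names:
```
# Assemble a concept+component pair in the expected layout.
# All source paths below are hypothetical placeholders.
import shutil
from pathlib import Path

pair_dir = Path("person_b+hair_b")  # {CONCEPT}_{ID}+{COMPONENT}_{ID}
pair_dir.mkdir(exist_ok=True)

concept_imgs = ["my_photos/face0.jpg", "my_photos/face1.jpg"]     # concept references
component_imgs = ["my_photos/hair0.jpg", "my_photos/hair1.jpg"]   # component references

for n, src in enumerate(concept_imgs):
    shutil.copy(src, pair_dir / f"0_person_b{n}.jpg")   # 0_{CONCEPT}_{ID}{N}.jpg
for n, src in enumerate(component_imgs):
    shutil.copy(src, pair_dir / f"1_hair_b{n}.jpg")     # 1_{COMPONENT}_{ID}{N}.jpg
```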
### Training
You can train MagicTailor with the default hyperparameters:
```
python train.py --instance_data_dir {PATH-TO-PREPARED-DATA}
```
For example:
```
python train.py --instance_data_dir examples/person_k+hair_c
```
> ❗Please check the quality of the masks output by Grounded-SAM to ensure that the model runs correctly (one way to do so is sketched below).
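A quick way to eyeball a mask is to overlay it on its reference image. This is a minimal sketch; the mask path is hypothetical, so point it at wherever Grounded-SAM writes masks in your setup:
```
# Overlay a (hypothetical) mask on a reference image for visual inspection.
import numpy as np
from PIL import Image

image = np.array(Image.open("examples/person_k+hair_c/1_hair_c0.jpg").convert("RGB"))
mask_img = Image.open("path/to/mask.png").convert("L").resize(image.shape[1::-1])
mask = np.array(mask_img) > 127

overlay = image.copy()
overlay[mask] = (0.5 * overlay[mask] + 0.5 * np.array([255, 0, 0])).astype(np.uint8)
Image.fromarray(overlay).save("mask_overlay.jpg")  # red = masked region
```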
Alternatively, you can also train it with customized hyperparameters, such as:
```
python train.py \
  --instance_data_dir examples/person_k+hair_c \
  --phase1_train_steps 200 \
  --phase2_train_steps 300 \
  --phase1_learning_rate 1e-4 \
  --phase2_learning_rate 1e-5 \
  --lora_rank 32 \
  --alpha 0.5 \
  --gamma 32 \
  --lambda_preservation 0.2
```
You can refer to [our paper](https://arxiv.org/pdf/2410.13370) or `train.py` to understand the meaning of these arguments; adjusting them can help yield better results.

Moreover, we also provide a detailed training script in `scripts/train.sh` for research or development purposes, supporting further modification.
### Inference
After training, a model will be saved in `outputs/magictailor`. Placeholder tokens are registered during training to represent the concept and the component; include them in your prompts to reference the learned identities (see `train.py` for the exact token strings).
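As a rough sketch of how inference might look, assuming the run saves a full diffusers pipeline to `outputs/magictailor` (the placeholder tokens in the prompt are hypothetical, so replace them with the ones defined in `train.py`):
```
# Minimal inference sketch; the pipeline path and tokens are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "outputs/magictailor", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of <v0> person with <v1> hair"  # hypothetical placeholder tokens
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("result.jpg")
```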