Image Synthesis using Dual Guidance and Multi-Human Generation


To start, let’s break down what this fancy term means. Image synthesis is the process of creating new images from scratch or manipulating existing ones to generate something entirely different. Dual guidance refers to combining two sources of information, text prompts and reference human images, to guide image synthesis; the reference images serve as identity inputs that let the model preserve each person’s identity. Multi-human generation, on the other hand, involves generating multiple humans in a single image.
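To make that more concrete, here’s a rough sketch of one common way two guidance signals can be combined at each denoising step of a diffusion model, in the style of classifier-free guidance. This is an illustrative assumption about the mechanism, not our model’s actual code; the function, tensors, and weights below are stand-ins.

```python
import torch

# Illustrative sketch only: blend the unconditional, text-guided, and
# identity-guided noise estimates into a single dual-guided estimate.
# Names and guidance weights are stand-ins, not our model's implementation.

def dual_guided_noise(eps_uncond, eps_text, eps_id, w_text=7.5, w_id=3.0):
    """Combine text guidance and identity guidance into one noise estimate."""
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)  # steer toward the text prompt
            + w_id * (eps_id - eps_uncond))     # steer toward the identity references

# Toy tensors standing in for three forward passes of the denoiser.
shape = (1, 4, 64, 64)
eps_uncond, eps_text, eps_id = (torch.randn(shape) for _ in range(3))

print(dual_guided_noise(eps_uncond, eps_text, eps_id).shape)  # torch.Size([1, 4, 64, 64])
```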

Now that we’ve got that out of the way, let’s get right into how this works!

First, you need to download our model from GitHub (link provided below) and extract it using your favorite extraction tool. Once extracted, open up a terminal or command prompt in the folder where you saved everything.

Next, navigate to the folder containing the images you want to work with, whether you’re manipulating an existing image or generating new ones from scratch. Let’s say we have an image of a person standing alone in front of a building (Figure 1). We want to add two more people to this scene while preserving their identities and maintaining the layout of the original image.

To do that, you need to run our model using the following command:

```bash
#!/bin/bash
# Generate a new image that adds two people to an existing scene while
# preserving their identities and maintaining the layout of the original image.

# The input image: a person standing alone in front of a building.
input_image="input/person_standing.jpg"

# Reference images specifying the identities of the two people to add.
identity_input1="identity/person1.jpg"
identity_input2="identity/person2.jpg"

# The text prompt that guides the synthesis.
text="two people standing in front of a building"

# The folder where the generated image will be saved.
output_folder="output"

# Run the generation script with the specified parameters.
python generate_images.py --input-image "$input_image" \
                          --identity-input "$identity_input1" \
                          --identity-input "$identity_input2" \
                          --text "$text" \
                          --output-folder "$output_folder"
```

Let’s break down what each argument does:

– `--input-image` specifies the path to the original image you want to manipulate (or use as the starting point for a new scene).
– `--identity-input` specifies the path to a reference human image that guides our model in preserving that person’s identity; pass it once per person you want to add.
– `--text` specifies a text prompt that will be used as guidance for our model’s image synthesis process.
– `--output-folder` specifies where you want your generated images to be saved.
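If you’re wondering why `--identity-input` can be passed more than once, here’s a minimal sketch of how a script like generate_images.py might parse these flags. The actual script may differ; this parser is an assumption for illustration only.

```python
import argparse

# Hypothetical sketch of the flag parsing, not the actual generate_images.py.
# action="append" is what lets --identity-input be repeated once per person.
parser = argparse.ArgumentParser(description="Dual-guided multi-human generation")
parser.add_argument("--input-image", required=True, help="original scene image")
parser.add_argument("--identity-input", action="append", default=[],
                    help="reference image for one person; repeat per person")
parser.add_argument("--text", required=True, help="text prompt guiding synthesis")
parser.add_argument("--output-folder", default="output",
                    help="where generated images are saved")

# Parse a sample command line matching the bash script above.
args = parser.parse_args([
    "--input-image", "input/person_standing.jpg",
    "--identity-input", "identity/person1.jpg",
    "--identity-input", "identity/person2.jpg",
    "--text", "two people standing in front of a building",
])
print(args.identity_input)  # ['identity/person1.jpg', 'identity/person2.jpg']
```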

After running this command, our model will generate an image containing multiple humans while preserving their identities and maintaining the layout of the original scene (Figure 10).

Now that we’ve covered how to use our model, let’s look at some of its advantages! Our method is capable of synthesizing multi-human images thanks to the multi-identity cross-attention mechanisms we developed. This means you can add multiple people to a scene without losing their identities or having them blend together.
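To give a feel for the idea, here’s a minimal sketch of multi-identity cross-attention, assuming each identity contributes its own key/value tokens and a per-identity spatial mask that routes each person’s features to their own region of the image. The shapes, masks, and function below are illustrative assumptions, not our model’s exact implementation.

```python
import torch

# Illustrative sketch: each identity gets its own cross-attention pass, and a
# binary spatial mask decides which image tokens receive that identity's
# features. Masking per identity is what keeps the people from blending.

def multi_identity_cross_attention(queries, id_feats, id_masks):
    """
    queries:  (B, N, D)  image-token queries (N spatial tokens)
    id_feats: list of (B, M, D) identity-token features, one per person
    id_masks: list of (B, N) binary masks marking each person's region
    """
    out = torch.zeros_like(queries)
    d = queries.shape[-1]
    for feats, mask in zip(id_feats, id_masks):
        attn = torch.softmax(queries @ feats.transpose(1, 2) / d**0.5, dim=-1)
        # Only the masked region attends to this identity's features.
        out = out + mask.unsqueeze(-1) * (attn @ feats)
    return out

# Toy example: 64 image tokens, two identities with 4 tokens each.
B, N, M, D = 1, 64, 4, 32
q = torch.randn(B, N, D)
feats = [torch.randn(B, M, D) for _ in range(2)]
masks = [torch.zeros(B, N), torch.zeros(B, N)]
masks[0][:, :32] = 1.0  # person 1 occupies the first half of the tokens
masks[1][:, 32:] = 1.0  # person 2 occupies the second half

print(multi_identity_cross_attention(q, feats, masks).shape)  # torch.Size([1, 64, 32])
```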

We also conducted extensive experimental evaluations, both qualitative and quantitative, which have shown the advantages of our method (Figure 2). Our model is capable of generating high-quality images that preserve human identity while maintaining layout consistency with the original image.
