What is an AI Image Generator?
An AI image generator is a system that takes some sort of input, typically a text prompt or a seed image, and uses artificial intelligence techniques such as transformer neural networks to create images. The images can be of anything, and these systems are often used to generate images of things that do not exist in the real world, or to create images in the style of particular artists, mediums, or periods/movements.
AI image generation has been the subject of research in the AI community for years, but the field exploded in 2022 with the release of the first high-quality commercial AI image generators such as DALL-E 2 from OpenAI, Stable Diffusion from Stability AI, and Freeway ML. These systems allowed users to create high-resolution, photorealistic images by simply providing a textual prompt, and the results were often extremely impressive.
How are AI Image Generators Used?
AI image generators can be used for all sorts of purposes by a variety of different types of people. Common users and use cases for AI image generation include:
- Graphic Designers / Accelerating Productivity
When creating visuals for marketing or other purposes, a graphic designer can use an AI image generator to create initial ideas or entire assets. For example, a designer might use an AI to generate a set of images to be used as part of an ad campaign.
The designer will likely build upon these AI-generated concepts using familiar tools such as Adobe Photoshop or Illustrator, but having an AI system available to accelerate the creation of assets for use in a composition can be a huge productivity boost!
- Artists / Experimenting and pushing creative boundaries
Some artists use AI image generators as a tool to help them experiment and push creative boundaries. For example, an artist might use an AI to generate a series of images based on a particular concept as part of an exploration of that concept.
An artist might also use an AI to generate a series of images in a style that is unfamiliar to them, in order to better understand that style and potentially incorporate it into their own work.
- Interior Designers / Visualizing ideas
Interior designers often need to quickly generate images to visualize ideas for clients. For example, an interior designer might use an AI image generator to create a series of images showing different furniture arrangements in a room.
These images can then be used to help the client better understand the designer's vision for the space.
- Architects / Designing buildings
Architects often need to generate images to visualize their ideas for buildings. For example, an architect might use an AI image generator to create a series of images showing different exterior facades for a building.
These images can then be used to help the architect better understand the client's tastes before spending significant time on a final design.
- Concept Artists / Brainstorming ideas
Concept artists often need to generate a large number of images to brainstorm ideas for characters, locations, or objects. For example, a concept artist might use an AI image generator to create a series of images of different animals to help them come up with ideas for a new character design.
- Product Designers / Generating new concepts
Product designers often need to generate images of new product concepts. When coming up with creative new ideas, an AI image generator can be a valuable tool. For example, a product designer might use an AI image generator to create a series of images of different design concepts for a manufacturer's next big product release.
These images can be used to help the designer refine their ideas and develop a final product.
Major Modes of AI Image Generators
There are three major modes of AI image generators: text-to-image, image-to-image, and image-inpainting:
Text-to-image is where you provide a textual prompt to the system, and it will generate an image based on that prompt. For example, you could provide the prompt "a dragon flying over a castle" and the system would generate an image of a dragon flying over a castle.
Image-to-image is where you provide an image as input, and the system will generate a new image based on that input. For example, you could provide an image of a cat, and the system would generate a new image of a cat that is slightly different from the input image. These systems can also be combined with a prompt, in which case the textual input is used to modify the original source image. For instance, maybe you decide to put sunglasses and a hat on the cat in your photo.
Image-inpainting is where you provide an image with a transparent mask (typically created by removing part of the image with an image editor), and the system will fill in the masked area. Inpainting can be performed with or without a textual prompt. When performed without a prompt, image inpainting typically generates a plausible output for the masked area. For example, if you were inpainting a photo of a picnic in a park and you erased part of the field, the system would likely fill the area in with grass, dirt, or whatever matches the surrounding pixels. When performed with a textual prompt, inpainting generates content based on the provided text, even if the content it describes does not appear in the source image. For example, if you have an image of a dog and you erase its neck area, you could provide the text "the dog has a bowtie" and the system would generate a bowtie in the masked neck area of the dog.
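As a rough illustration, the three modes can be told apart purely by which inputs a request carries. The sketch below assumes a hypothetical `GenerationRequest` shape; the class, its field names, and `infer_mode` are made up for this example and do not correspond to any particular product's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical request shape; the field names are illustrative only."""
    prompt: Optional[str] = None    # textual prompt, if any
    image: Optional[bytes] = None   # source image data, if any
    mask: Optional[bytes] = None    # mask marking the region to fill, if any


def infer_mode(request: GenerationRequest) -> str:
    """Return which of the three modes a request corresponds to."""
    if request.mask is not None:
        if request.image is None:
            raise ValueError("a mask requires a source image")
        return "image-inpainting"   # the prompt is optional in this mode
    if request.image is not None:
        return "image-to-image"     # the prompt is optional here too
    if request.prompt:
        return "text-to-image"
    raise ValueError("a request needs a prompt, an image, or both")
```

Checking the mask first mirrors the way each mode builds on the previous one: inpainting is image-to-image plus a mask, and image-to-image is text-to-image plus a source image.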
As mentioned earlier, text-to-image is where you provide a textual prompt to the system, and it will generate an image based on that prompt. This is the most common type of AI image generation, and the one that is most often used to generate images of things that do not exist in the real world.
The ability of the system to properly understand the textual prompt is extremely important in this mode. If the system does not understand the prompt, it will not be able to generate a meaningful image. For example, if you provide the prompt "a dragon flying over a castle" and the system does not understand what a dragon or castle is, or what flying looks like, it will not be able to generate an image of a dragon flying over a castle.
Early text-to-image systems were rudimentary and could generate only low-resolution images (sometimes as small as 16x16 pixels!), but modern systems such as DALL-E 2 or Freeway ML output 1024x1024 resolution images by default, and are often capable of even larger generations using AI upscaling or tiling techniques.
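To give a feel for what tiling involves, here is a minimal sketch that splits a large canvas into fixed-size, overlapping tiles that could each be generated or upscaled separately. The function name, the 1024-pixel tile size, and the 128-pixel overlap are assumptions chosen for illustration; this is not a description of how any specific system tiles its output.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that together cover the canvas."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap

    def starts(length):
        # A canvas no larger than one tile needs a single tile at the origin.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # the last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

The overlap gives neighbouring tiles shared pixels, which is what makes it possible to blend seams afterwards.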
While text-to-image systems have seen an explosion of popularity in 2022, image-to-image builds upon this with the ability to modify any source image (a drawing, photograph, rendering, or anything else) with a textual prompt. For example, you could take a photograph of a coffee cup, and transform the inside of the cup into multi-colored latte foam art of a clown's face.
Image-to-image systems sometimes use a technique called "style transfer" to generate the output image. Style transfer takes the content of one image and the style of another, and generates a new image that combines the two. For example, you could take the content of an image of a dog and apply the style of an impressionist painting to generate a new image that still shows the dog but is styled like an impressionist painting.
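One well-known way to make "style" measurable, from the classic neural style transfer literature, is to compare Gram matrices of feature maps: they record which features occur together while ignoring where in the image they occur. The sketch below assumes the feature maps have already been extracted by some network and are passed in as plain Python lists (a real pipeline would use an array library); the function names are illustrative, and nothing here is meant to describe the approach used by any particular product.

```python
def gram_matrix(features):
    """Compute channel-by-channel correlations of a feature map.

    features: list of channels, each a flat list of activations (one per pixel).
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n for ch_j in features]
        for ch_i in features
    ]


def style_distance(features_a, features_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    gram_a, gram_b = gram_matrix(features_a), gram_matrix(features_b)
    diffs = [
        (x - y) ** 2
        for row_a, row_b in zip(gram_a, gram_b)
        for x, y in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)
```

Because the Gram matrix sums over pixel positions, rearranging the pixels of a feature map leaves its style distance to the original at zero, which is why this kind of measure captures texture and colour relationships rather than layout.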
Image-inpainting builds upon the concept of image-to-image by enabling specific portions of the source imagery to be modified through the use of masks. Masks can be created in a number of ways, but the most common method is to simply use an image editor to remove the desired portion of the image (by drawing a circle or other shape around it and then deleting that area).
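When part of an image is deleted in an editor, the erased pixels are typically left transparent, so a mask can be read straight from the alpha channel. The sketch below assumes the image is already available as rows of `(r, g, b, a)` tuples; in practice an imaging library would load and decode the file. The function name and the `threshold` parameter are hypothetical.

```python
def mask_from_alpha(rgba_rows, threshold=0):
    """Return a 2D list of 0/1 values, where 1 marks an erased pixel to fill.

    rgba_rows: list of rows, each a list of (r, g, b, a) tuples with a in 0..255.
    threshold: alpha values at or below this count as erased.
    """
    return [
        [1 if alpha <= threshold else 0 for (_r, _g, _b, alpha) in row]
        for row in rgba_rows
    ]
```

Raising `threshold` slightly above zero is one way to also catch the soft, semi-transparent edge that some eraser tools leave behind.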
Masks can also be generated automatically using computer vision techniques. For example, you could take an image of a person and have the system automatically generate a mask for the person's body or clothing. This can be used to enable the user to modify only a person's clothing without affecting the rest of the image.
Once a mask has been generated, the image-inpainting system will then generate new content to fill in the masked area. This content can be generated based on the surrounding pixels (known as "contextual inpainting"), or it can be generated based on a textual prompt (known as "semantic inpainting").
Contextual inpainting is often used to generate plausible results for masked areas. For example, if you have a landscape photo and you erase a tree from the photo, contextual inpainting would likely generate a new tree to fill in the masked area, or potentially sky if that seemed more appropriate. Semantic inpainting is often used to generate results that are based on a textual prompt, even if what the prompt describes does not appear in the source image. Textual prompts can be used to perform all sorts of edits to a source image, such as inserting implausible objects (a giant hamburger in the middle of the road?), adjusting facial features (smiles, eyes, expressions) and just about anything else.
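The idea of filling a hole "based on the surrounding pixels" can be illustrated with a deliberately simple, classical technique: repeatedly replacing each masked pixel with the average of its neighbours until the surrounding values have spread into the hole. This is only a toy stand-in under stated assumptions (a grayscale image held in plain Python lists, a hypothetical `diffuse_fill` function); modern AI inpainting uses generative models and produces far richer results such as new texture and objects.

```python
def diffuse_fill(pixels, mask, iterations=200):
    """Fill masked pixels of a grayscale image by averaging their neighbours.

    pixels: 2D list of floats. mask: 2D list of 0/1, where 1 marks a pixel to fill.
    """
    height, width = len(pixels), len(pixels[0])
    known = [
        value
        for row_p, row_m in zip(pixels, mask)
        for value, masked in zip(row_p, row_m)
        if not masked
    ]
    if not known:
        raise ValueError("mask covers the whole image; nothing to fill from")

    # Start the hole at the mean of the known pixels, then refine it.
    start = sum(known) / len(known)
    current = [
        [start if masked else value for value, masked in zip(row_p, row_m)]
        for row_p, row_m in zip(pixels, mask)
    ]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue  # known pixels are never modified
                neighbours = [
                    current[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width
                ]
                updated[y][x] = sum(neighbours) / len(neighbours)
        current = updated
    return current
```

A fill like this can only produce smooth blends of what is already nearby, which is a useful way to see why a prompt-driven approach is needed to add something that the surrounding pixels do not suggest.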
Styling Images with AI
The ability to style images is another important aspect of AI image generation. Often, users will want to generate images in a particular style, such as an impressionist painting or a photograph from a certain era. Some systems, such as Freeway ML, come with built-in styles that can be applied to images with just a few clicks. Other systems require users to supply their own style reference images (by taking photographs or downloading images from the internet) or their own prompt modifiers (by researching specific artistic styles or mediums).
Styling can often be performed during the image generation process (e.g., "I want a picture of a low-poly dog", or "an oil painting of a spaceship"), or after the fact by using an AI image generator's editing tools to modify an existing image.
In the above example, we take an image of a dog and use Freeway ML's AI image editing tool to turn it into a "low-poly" styled dog.
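Styling at generation time often comes down to wrapping the subject in a style phrase. A minimal sketch of that idea follows; the `STYLE_TEMPLATES` table and `styled_prompt` function are hypothetical, and the wording of each template is only an example, not the built-in styles of any real tool.

```python
# Hypothetical style presets; each template wraps a subject in a style phrase.
STYLE_TEMPLATES = {
    "low-poly": "a low-poly {subject}",
    "oil painting": "an oil painting of {subject}",
    "impressionist": "an impressionist painting of {subject}",
}


def styled_prompt(subject, style):
    """Combine a subject with a named style preset to form a prompt."""
    try:
        template = STYLE_TEMPLATES[style]
    except KeyError:
        raise ValueError(f"unknown style: {style!r}") from None
    return template.format(subject=subject)
```

For instance, `styled_prompt("dog", "low-poly")` yields the prompt `a low-poly dog`, and the same subject can be re-run through every preset to compare styles side by side.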
Understanding Textual Prompts
As mentioned earlier, the ability of the system to properly understand textual prompts is extremely important in the text-to-image and image-to-image modes. If the system does not understand the prompt, it will not be able to generate a meaningful image. Some prompts can be fairly basic ("a dog wearing a cape", "a smiling kid"); others can be quite sophisticated or abstract ("4k, hyper realistic, photo realism, human face with goat features wearing human clothing, anthropomorphic goat face and eyes with human features, half goat half human, close up, professional photo, gorgeous lighting, pretty bokeh, wearing human clothing, both eyes visible").
Choosing the right AI image generation system is important depending on the complexity of your prompts. While a variety of systems are capable of understanding simple prompts, a vanishingly small number excel at longer or more sophisticated prompts. Freeway ML is one of these systems, and is often used by artists, graphic designers, and others who need to generate a large number of images with complex or abstract prompts.
Searching and Refining Old Image Generations
For those who create a large number of images every week, the ability to go back through their history, search old creations, and then modify, enhance, or rework them is a great help.
The ability to mark images as favorites can also be quite useful as a tool for flagging images which may be useful at a later date.
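To make the idea concrete, a generation history with search and favorites can be modelled with very little structure. The sketch below is an assumption-laden illustration, not a description of how Freeway ML or any other tool stores or searches images: the `Generation` record, its fields, and `search_history` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Generation:
    """Hypothetical record of one past image generation."""
    prompt: str
    created: datetime
    favorite: bool = False


def search_history(history, query="", favorites_only=False):
    """Return generations whose prompt contains every query word, newest first."""
    words = query.lower().split()
    matches = [
        item
        for item in history
        if all(word in item.prompt.lower() for word in words)
        and (item.favorite or not favorites_only)
    ]
    return sorted(matches, key=lambda item: item.created, reverse=True)
```

Keeping the original prompt alongside each image is what makes this kind of search possible, and it also gives a starting point when reworking an old image with a modified prompt.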
There are many factors to consider when choosing an AI image generation tool, and the space is evolving rapidly. If you haven't checked out Freeway ML yet, give it a try! Freeway is a comprehensive AI image generator which has one of the best combinations of features which enable people to enhance their creativity and productivity. It comes with a large number of built-in styles which can be applied to images with just a few clicks, and it generates high-resolution 1024x1024 imagery right from the start. Freeway's editing tools are also top-notch, and enable users to modify existing images with global enhance, prompt edit, or variations functions, as well as AI-assisted mask-based inpainting. Finally, Freeway ML's search tool is extremely useful for those who need to go back and find old images or modify them for new purposes. If you try Freeway, be sure to join our Twitter, Discord, or Reddit communities and give us some feedback! We love hearing what you think.