Text-to-art AI generators have come under heavy scrutiny lately. Critics raise concerns about authenticity and ethics, while others view these programs as revolutionary tools for artists and other creatives. The debate echoes the criticism Photoshop faced after its release in 1990.
Although AI art dates back to the 1990s, it has exploded in recent years. Public awareness of the technology peaked when Jason Allen won an art competition with a piece he created using Midjourney, an advanced text-to-art AI program. The win, as expected, received mixed reactions.
For those unfamiliar, AI text-to-art generators are art generators powered by artificial intelligence. Artists write algorithms not to follow a set of rules, but to “learn” a specific aesthetic by analyzing thousands of images. The algorithm then tries to generate new images in keeping with the aesthetics it has learned. These generators revolve around processing images and recognizing aspects such as texture, color, and text. The models can modify existing images or create original ones. AI generators rely on different types of deep learning. The most common are Generative Adversarial Networks (GAN), Convolutional Neural Networks (CNN), and Neural Style Transfer (NST), each discussed in depth below.
Generative Adversarial Networks
This system has two components. The first, the generator, tries to produce original images. The second, the discriminator, is trained on a set of real images and “discriminates” whether an image it is shown is real or generated. Essentially, the two are adversaries, with the generator trying to outsmart the discriminator. A variation of this system is VQGAN+CLIP, which can produce images from natural language prompts. DALL-E 2 and Imagen, two popular generators, also work from natural language prompts, though they are built on diffusion models rather than on GANs.
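The tug-of-war between the two components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the commonly taught binary cross-entropy form of the adversarial losses, and the function names and the discriminator scores are made up for the example rather than taken from any particular generator.

```python
import math

def binary_cross_entropy(prediction, target):
    """Loss for one discriminator score in (0, 1) against a 0/1 label."""
    eps = 1e-12  # guards against log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(real_scores, fake_scores):
    """The discriminator wants real images scored 1 and generated images scored 0."""
    losses = [binary_cross_entropy(s, 1) for s in real_scores]
    losses += [binary_cross_entropy(s, 0) for s in fake_scores]
    return sum(losses) / len(losses)

def generator_loss(fake_scores):
    """The generator wants its images scored 1, i.e. mistaken for real ones."""
    return sum(binary_cross_entropy(s, 1) for s in fake_scores) / len(fake_scores)

# Hypothetical discriminator scores (probability that an image is real).
real = [0.9, 0.8]
caught = [0.1, 0.2]  # generated images the discriminator spotted
fooled = [0.9, 0.8]  # generated images the discriminator believed

# The generator is better off when its images fool the discriminator...
print(generator_loss(caught) > generator_loss(fooled))
# ...and the discriminator is worse off in exactly that situation.
print(discriminator_loss(real, fooled) > discriminator_loss(real, caught))
```

In a real system, each side would be a neural network that is updated in turn to lower its own loss, which is what pushes the generated images toward realism.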
Convolutional Neural Networks
This system functions much like a brain.
“A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.”
A CNN automatically detects important features without any human intervention. It works on three-dimensional data (height, width, and color channels) for image classification and object recognition. Convolutional Neural Networks are composed of multiple layers. First, a convolution layer sweeps a filter across the image to pick out features, computing a dot product between the filter values and the image pixel values at each position. Then the pooling layer replaces each region of that output with a summary value, typically its maximum. This makes the output smaller and cheaper to process while preserving the most important features. Last is the fully connected layer, in which every neuron is connected to every output of the previous layer and which produces the final prediction. CNNs are trained with backpropagation, which feeds the prediction error backward through the network so that it learns better weights and biases.
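To make the three layer types concrete, here is a minimal pure-Python sketch. It is a toy illustration, not how production CNNs are written: the tiny image, the edge-detecting filter, and the fully connected weights are all hypothetical values chosen so the arithmetic is easy to follow.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; each output value is the dot product
    of the kernel with the pixels underneath it (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Replace each size x size window with its maximum value."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def fully_connected(inputs, weights, bias):
    """Every input value feeds the single output through its own weight."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# A 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

features = convolve2d(image, kernel)  # 4x4 map, strongest where the edge sits
pooled = max_pool(features)           # 2x2 summary of that map
flat = [value for row in pooled for value in row]
score = fully_connected(flat, weights=[0.5, 0.5, 0.5, 0.5], bias=0.0)

print(features[0])  # [0, 2, 0, 0]
print(pooled)       # [[2, 0], [2, 0]]
print(score)        # 2.0
```

During training, backpropagation would adjust the filter values and the fully connected weights instead of leaving them fixed as they are here.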
Neural Style Transfer (NST)
NST is probably the type most people are familiar with, whether they realize it or not. Neural Style Transfer systems don’t produce original images; they stylize existing ones. This means a user does not necessarily receive an original image, unlike with generators built on other deep-learning systems. For example, a user could input a selfie and get the same selfie back in the style of Picasso or Van Gogh.
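One common way to express "keep the content, borrow the style" is a two-part loss, sketched below in plain Python. This follows the widely used Gram-matrix formulation of style transfer, which is an assumption here since the article does not say which method any given app uses. In practice the feature lists would come from a pretrained CNN; the function names, weights, and tiny numbers are placeholders.

```python
def gram_matrix(features):
    """features holds one list of activations per channel.
    Entry [i][j] measures how strongly channels i and j fire together,
    which serves as a rough fingerprint of an image's style."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def mean_squared_difference(a, b):
    """Average squared gap between two equally shaped nested lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

def style_transfer_loss(output, content, style,
                        content_weight=1.0, style_weight=10.0):
    """Low when the output keeps the content image's features
    while matching the style image's channel correlations."""
    content_loss = mean_squared_difference(output, content)
    style_loss = mean_squared_difference(gram_matrix(output), gram_matrix(style))
    return content_weight * content_loss + style_weight * style_loss

# Hypothetical two-channel features for a selfie and a painting.
selfie = [[1.0, 2.0], [3.0, 4.0]]
painting = [[2.0, 0.0], [0.0, 2.0]]

print(gram_matrix(selfie))                             # [[5.0, 11.0], [11.0, 25.0]]
print(style_transfer_loss(selfie, selfie, selfie))     # 0.0
print(style_transfer_loss(selfie, selfie, painting) > 0)  # True
```

A style transfer system starts from the selfie and repeatedly nudges the output image to lower this loss, so the result keeps the selfie's layout while taking on the painting's textures and colors.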
If you’re considering using a text-to-art AI generator, or you just want to know about all the different ones that have been emerging as of late, keep reading.
1. Deep AI
Deep AI creates an image from scratch from a text description. Deep AI utilizes convolutional neural networks.
Cons: The AI isn’t yet robust enough to create photorealistic images.
2. DALL-E Image generator
This software’s name is a portmanteau of WALL-E, Pixar’s robot character, and the surrealist painter Salvador Dalí. DALL-E is a powerful AI text-to-image generator. It has a distinctive feature called inpainting, in which it takes an original image and replaces part of it with AI-generated content. DALL-E can also respond to both text and image input, and it can depict things in relation to each other.
Cons: It’s still in beta and not yet open to the public.
3. Nightcafe’s Stable Diffusion
Unlike DALL-E, Nightcafe is currently available to the public and is widely considered able to go toe-to-toe with DALL-E. Nightcafe also offers 12 diverse styles to choose from, including photos, anime, modern comic style, CGI characters, pop art, and Neo-impressionism.
4. Runway ML
Runway ML is unique because it uses machine learning to create animations and edit videos. It even boasts the impressive ability to remove a video’s background without a green screen.
5. TikTok
The popular social media site TikTok offers a text-to-image AI feature. It employs a simple interface and excels in abstract art. Though TikTok’s feature has limited capabilities, this may well be intentional, so that users as young as thirteen cannot create troubling (and community guideline-violating) photorealistic images.
6. GLIDE
GLIDE, short for Guided Language-to-Image Diffusion for Generation and Editing, is a generative model based on text-guided diffusion that aims at more realistic image generation. It was released in December 2021. GLIDE makes use of a minor sampling delay. The best part about GLIDE is that it can convert line drawings into photorealistic images. This is great news for those of us who, like me, have awful artistic abilities and want to see our creations come to life.
7. Midjourney
Midjourney rose to fame after someone won an art contest with a piece created using it, which many people felt was unethical. The submission was called Théâtre D’opéra Spatial. Midjourney is a major player in the text-to-image AI space, and its output is often darker in theme than that of the average art AI.
8. Imagen
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. It is evaluated with DrawBench, a comprehensive benchmark, using side-by-side human evaluation. DrawBench systematically tests for compositionality, cardinality, spatial relations, long-form text, rare words, and challenging prompts. Human raters were found to strongly prefer Imagen over other methods in both image-text alignment and image fidelity.
9. Hot Pot
Hot Pot creates images as well as paintings and illustrations. It also helps create social media graphics, app graphics, and AI graphics, and it offers features for NFT creation, color generation, art personalization, logo and game asset creation, and basic photo editing, such as a face enhancer and an object remover. It is great for small businesses that need help creating the basics of a social media presence in-house.
10. Jasper Art
Jasper, which started as an AI copywriting app, recently released a new feature called Jasper Art. With this addition, Jasper is now a one-stop shop for small businesses and content creators who want professional-looking social media pages and websites.
11. Dream by Wombo
Dream is free to use, but not as robust as other AI generators, requiring more inputs for accuracy.
12. Starry AI
Starry AI creates images from text with a very simple interface. It is best for abstract art, photos, and images. The best part? It is available on mobile, in both the Android and Apple app stores.
13. Pixray
Pixray is a great option if you want to turn text into art photos. Pixray uses its perception engine and its ability to navigate latent space to create breathtaking concept art.
14. Deep Dream AI
Like DALL-E, the Deep Dream generator utilizes a neural network to interpret and generate images.
Deep Dream excels in the creation of abstract art.
15. Art Breeder
Unlike other generators, Art Breeder allows the user to “crossbreed” two images together or edit the “genes” of images that are produced.
16. Photosonic
Photosonic is a newcomer among AI art and image generators. It can generate a variety of images, including realistic, abstract, cartoon, and painting styles.
17. Hypotenuse AI
Hypotenuse differs from the rest because the AI has been trained by artists and graphic designers to help create specific images.
18. Upscaler
Upscaler is easy to use and gives the user ten free downloads to play with, as well as a step-by-step guide to getting the best and most accurate image possible. Upscaler runs a convolutional neural network on a graphics processing unit (GPU) server. It also generates images that stay as close as possible to the input condition while keeping the picture natural.
19. HeyFriday
HeyFriday concentrates on art photos in a myriad of styles.
20. Simplified
Simplified offers everything from AI image generation to AI long-form writing and AI content writing. It also uses AI and automation for graphic design and video animations.
21. Fotor
With more than 10,000 templates, Fotor makes it easy to create NFTs, which are directly downloadable.
22. Big Sleep
Okay, fair warning: this one requires technical knowledge, so it’s ideal for Python developers. Big Sleep is Python-based, which lets users write Python scripts to generate realistic images from scratch. Big Sleep also requires a module download, so it’s a bit more complex than its plug-and-play counterparts.
23. Text-To-Image AI
This API is developed with AI-friendly languages such as Python and Ruby. Users will find all the coding information they need to work with the API, and for casual use this basic text-to-image tool should suffice for most people.
24. Computer Vision Explorer
Computer Vision Explorer includes pose estimation via Human Centric Vision. In addition to standard text-to-image AI, it offers Scene Geometry, image segmentation, object detection, image captioning, and more.
25. Craiyon
Craiyon uses a machine learning model and was developed as a lighter alternative to DALL-E. Not only can it reproduce images, but it was also trained to combine concepts to create new images from any prompt it has references for.
26. Pinegraph
Pinegraph is unique because it includes drawing tools: the user can generate an image, draw on it, and ask the computer to generate another image that incorporates the drawn modifications.
27. Stability
Stability was made in collaboration with Runway ML and utilizes the Stable Diffusion model.
Part Two: Interesting AI tools that are worth mentioning but don’t technically fit the classic text-to-image AI category.
This particular text-to-image generator is unique. This application utilizes AI to analyze your selfie and produce a new photo, one with barely any similarities to the original, for identity protection.
2. Dummy Image
Dummy Image is an image generator with premade images and customizable options. Users can also choose the output format: GIF, PNG, or JPG.
3. Glitch Image Generator
A hand-tuned generator with adjustable parameters for mode, amount, opacity, and more. It doesn’t produce an original image but rather a derivative one. It is a good option for artists who want to experiment with different styles within their art.
4. Make-A-Video
Meta announced Make-A-Video, which can render a video on demand from a text description or images. Meta took image synthesis data and added unlabeled video training data, so the model learns a sense of where a text or image prompt might exist in time or space. It can then predict what comes after the image and display the scene in motion for a short period.
Okay, so this one is impractical, but it’s fun. This generator lets you create brand new Pokemon from any description including celebrities and famous characters.
Despite the criticism, AI art generators are increasing in popularity and have become useful tools for entrepreneurs, content creators, small business owners, and creatives alike.
January 7, 2023 at 11:57 pm
Thank you for a very comprehensive article that helped me understand what AI is, how it works and what applications are available to the average consumer. Your articles keep me up to date in these rapidly changing times and are always fun to read – wonderful work Ms. Canelakes!