Basics of Generative AI
Generative AI has become a widely discussed topic nowadays, primarily due to the release of ChatGPT. It is fascinating to witness the development of programs that can generate text, pictures, and audio that closely resemble what humans produce. Generative AI has the potential to revolutionize many industries, including art, entertainment, and marketing, by providing new ways to create and engage with audiences.
Given the advancement of the technology, it has become a necessity to know the basic terminology of Generative AI and its use cases. Therefore, the intent of this blog is to familiarise you with the basic concepts of generative AI. Next time someone asks you what generative AI is, you will be able to brag…
Discriminative Models vs. Generative Models
It is important to first learn about the counterpart of generative models. Traditionally, data scientists have worked with discriminative models, where the task is to identify whether an observation belongs to class 1, class 2, class 3, or any other. One very simple example would be identifying whether a picture shows a cat or a dog.
A generative model, on the other hand, is an advancement in the field of AI. It has the potential to create new information based on the input information. For example, if a set of cat images is supplied to it, it can generate a new image of a cat that is different from the supplied images.
Though there are a lot of differences between the two types of models, a few are highlighted below:

A discriminative model learns the boundary between classes, i.e. the conditional probability P(label | data); a generative model learns the distribution of the data itself.
A discriminative model answers “which class does this input belong to?”; a generative model answers “what would a new example of this data look like?”
A discriminative model needs labelled data to train; a generative model can, in its simplest form, learn from the data alone.
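To make the contrast concrete, here is a minimal sketch in Python (assuming NumPy and scikit-learn are installed). The toy one-dimensional “size” feature and the per-class Gaussian are illustrative stand-ins chosen for this post, not how real generative models are built:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 1-D data: "cat" sizes around 25 cm, "dog" sizes around 50 cm.
cats = rng.normal(loc=25, scale=4, size=100)
dogs = rng.normal(loc=50, scale=8, size=100)
X = np.concatenate([cats, dogs]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)  # 0 = cat, 1 = dog

# Discriminative: learn P(label | size) and classify a new observation.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[30.0]]))  # -> [0], i.e. "cat"

# Generative (simplest possible): learn the distribution of the cat class
# and sample a brand-new observation from it.
new_cat_size = rng.normal(cats.mean(), cats.std())
print(new_cat_size)  # a "new cat" that was never in the training data

The discriminative model can only ever answer “cat or dog?”, while even this tiny generative model can produce data it has never seen.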
The next question is how these generative models are able to generate something new out of the old images, given that traditional models are quite simple and are clearly based on past data and labels. The answer to this question lies in latent spaces.
Latent Space
Latent spaces are a very important concept in deep learning, yet the majority of practitioners in the domain are unaware of the concept and how it is used.
The word “latent” is derived from the Latin word latēre, which means “to lie hidden”. In simple terms, building a latent space is a process of dimensionality reduction that keeps the maximum amount of information intact.
A very simple example is how we describe people to others. If one of our friends has pink hair, we tend to describe her as “the one with pink hair”. But if several friends share the same hair colour, that single feature is no longer enough, and they will often be confused with each other.
Considering another example of cat and dog images: depending on the size of the image, there are thousands of data points to capture. These points need to be reduced in a way that they still represent the original images. In latent space, objects that are similar to each other remain close to each other. An illustration is presented below:
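To see the same idea in code, here is a minimal sketch (assuming NumPy and scikit-learn), where PCA stands in for a learned encoder and random high-dimensional vectors stand in for flattened cat and dog images:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Stand-ins for flattened 64x64 grayscale images: 100 "cats" and 100 "dogs",
# each a 4096-dimensional vector.
cats = rng.normal(loc=0.0, scale=1.0, size=(100, 4096))
dogs = rng.normal(loc=3.0, scale=1.0, size=(100, 4096))
X = np.vstack([cats, dogs])

# Reduce the 4096 dimensions down to a 2-D "latent" space.
latent = PCA(n_components=2).fit_transform(X)

# Similar objects land close together: the two classes form two clusters.
print("cat cluster centre:", latent[:100].mean(axis=0))
print("dog cluster centre:", latent[100:].mean(axis=0))

The two printed centres end up far apart, while points within each class stay close to their own centre, which is exactly the property the illustration conveys.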
The idea of latent space is used in auto-encoders as well. Here, the information is reduced to a smaller-dimensional representation, which is then capable of regenerating the original image. A small representation of auto-encoders is presented below:
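Complementing that diagram, here is a rough sketch of the idea in code (in PyTorch; the 784-dimensional input, 32-dimensional latent size, and layer widths are arbitrary choices for illustration):

import torch
from torch import nn

# Minimal auto-encoder: the encoder squeezes a flattened 28x28 image
# (784 values) into a 32-dimensional latent vector, and the decoder
# tries to regenerate the original image from that small representation.
class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)     # compress into the latent space
        return self.decoder(z)  # reconstruct from the latent space

model = AutoEncoder()
x = torch.rand(1, 784)  # one fake flattened image
x_hat = model(x)
# Training minimises the reconstruction error between x and x_hat.
print(nn.functional.mse_loss(x_hat, x).item())

Once trained, the 32 numbers in z are the latent representation: small, yet informative enough for the decoder to rebuild the input.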
But how is latent space used for new image generation?
The answer is intuitive: by manipulating the latent space. Small manipulations in the latent space lead to different images. The manipulation should be measured enough to keep a data point close to its alikes and farther away from the other types.
In the above image, different images have been generated by manipulating the latent space. Let’s look at an example of how this can be done.
Consider a transformation from “no beard” to “beard”. Following the explanation of projections into the latent space above, it is understandable that people with no beard and people with a beard can be represented as two clusters of points in the space, as depicted below:
The mean of each of these two clusters can be calculated and is depicted by the black dots in the respective clusters. Now, if we subtract the mean of no-beard from the mean of beard, we get a vector that points from no-beard to beard.
Converting a “no-beard face” into a “face with beard” then means moving its point along this vector. This requires the following manipulation:
x_new = x_old + f * vector_beard
where,
x_old -> latent space representation of the image with “no-beard face”
x_new -> new transformed point in the latent space
f -> intensity of the transformation
vector_beard -> vector pointing from “no-beard face” to “face with beard”
The higher the transformation intensity f, the stronger the beard on the resulting face.
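Here is a minimal sketch of this arithmetic in NumPy. The latent points below are randomly generated stand-ins; in a real system they would come from an encoder such as the auto-encoder above:

import numpy as np

rng = np.random.default_rng(7)
latent_dim = 32

# Pretend we already encoded two groups of face images into latent space.
no_beard_points = rng.normal(loc=0.0, scale=1.0, size=(200, latent_dim))
beard_points = rng.normal(loc=2.0, scale=1.0, size=(200, latent_dim))

# The attribute vector points from the "no beard" mean to the "beard" mean.
vector_beard = beard_points.mean(axis=0) - no_beard_points.mean(axis=0)

# Take one "no-beard" face and push it towards "beard" with intensity f.
x_old = no_beard_points[0]
f = 0.8
x_new = x_old + f * vector_beard

# Decoding x_new with the decoder would yield the same face with a beard;
# increasing f strengthens the effect.
print(x_new[:5])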
That is it for today. I will talk about more concepts related to Generative AI in the upcoming posts. Stay tuned!