Understanding Contrastive Learning for Self-Supervised Image Classification

Here’s how it works: let’s say you have an image of a dog (our “anchor” sample). We create another version of that same image (our “positive” sample) by, for example, flipping it horizontally or changing the brightness. Then we grab a completely different image from our dataset (our “negative” sample) and feed all three into our model.
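
To make that concrete, here’s a minimal sketch of how such a triple might be built with torchvision-style augmentations; the file names and the helper name `make_positive` are just placeholders for this example:

```python
# A minimal sketch of building an anchor / positive / negative triple.
# "dog.jpg" and "other_image.jpg" stand in for files from your own dataset.
from PIL import Image
from torchvision import transforms

# A random flip plus brightness jitter turns the anchor into a "positive" view.
make_positive = transforms.Compose([
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ColorJitter(brightness=0.4),
])

anchor = Image.open("dog.jpg")                # the original image
positive = make_positive(anchor)              # augmented copy of the same image
negative = Image.open("other_image.jpg")      # any different image from the dataset
```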

The goal is to make sure that similar images (like our anchor and positive samples) are grouped together, while dissimilar ones (our negative sample) are kept apart. To measure this, we use “cosine similarity,” which tells us how closely two vectors (in this case, our image embeddings) point in the same direction: 1 means the same direction, 0 means unrelated, and -1 means opposite.
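
Here’s a tiny illustration using PyTorch’s built-in cosine similarity; the embedding values below are made up purely for demonstration:

```python
import torch
import torch.nn.functional as F

# Toy embeddings (in practice these come from the model).
anchor_emb = torch.tensor([0.2, 0.9, 0.1])
positive_emb = torch.tensor([0.25, 0.85, 0.05])
negative_emb = torch.tensor([0.9, 0.1, 0.4])

# Cosine similarity is the dot product of the L2-normalized vectors.
print(F.cosine_similarity(anchor_emb, positive_emb, dim=0))  # high, close to 1
print(F.cosine_similarity(anchor_emb, negative_emb, dim=0))  # noticeably lower
```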

Here’s an example: let’s say we have two images of a dog (our anchor and positive samples) and one image of a cat (our negative sample). We feed all three into our model, which creates an “embedding” for each image: basically, a vector of numbers that captures its most important features.
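
One way to produce those embeddings is a small encoder network. The sketch below uses a torchvision ResNet-18 backbone with a 128-dimensional projection head; both choices are illustrative assumptions, not the only option:

```python
import torch
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    """Sketch of an encoder that maps images to embedding vectors."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()               # drop the classification head
        self.backbone = backbone
        self.projection = nn.Linear(512, embed_dim)

    def forward(self, x):
        features = self.backbone(x)               # (batch, 512) features
        return self.projection(features)          # (batch, embed_dim) embeddings

encoder = Encoder()
images = torch.randn(3, 3, 224, 224)              # anchor, positive, negative
embeddings = encoder(images)                      # one embedding per image
```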

Our goal is to make the embeddings for our two dog images as similar as possible, while keeping the embedding for our cat image as far from them as possible. To do this, we use a loss function called “contrastive loss” (this particular form is often known as InfoNCE or NT-Xent), which measures how well we’re doing at keeping similar images close together and dissimilar ones far apart.

The formula looks like this: loss = -log( exp(sim(anchor, positive) / τ) / Σ_k exp(sim(anchor, sample_k) / τ) ), where sim is the cosine similarity, the sum in the denominator runs over the positive and all of the negatives, and τ is the temperature described below. This might look complicated, but it basically means that we’re trying to maximize the similarity between our anchor and positive samples while minimizing the anchor’s similarity to every other image in the batch.
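
Here’s a rough PyTorch sketch of that loss for a single anchor with one positive and a batch of negatives; the function name `info_nce` and the tensor shapes are assumptions made for this example:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature: float = 0.07):
    """Contrastive (InfoNCE) loss for one anchor, one positive, N negatives."""
    # Normalize so the dot product equals cosine similarity.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)   # shape (1,)
    neg_sim = negatives @ anchor                           # shape (N,)

    # Softmax over [positive, negatives]; the loss is the negative log
    # probability assigned to the positive.
    logits = torch.cat([pos_sim, neg_sim]) / temperature
    return -F.log_softmax(logits, dim=0)[0]

loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(16, 128))
```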

The “temperature” hyperparameter τ (typically a small value such as 0.07 or 0.1) controls how much weight the loss puts on hard negative samples: a lower temperature sharpens the softmax, so the model can’t get too comfortable with easy-to-distinguish pairs and is pushed to learn more fine-grained features instead.
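
A quick way to see this effect: the snippet below (with made-up similarity scores) shows how lowering the temperature concentrates the softmax on the hardest negative while the easy ones fade away:

```python
import torch
import torch.nn.functional as F

# Made-up cosine similarities: one positive (0.9), one "hard" negative (0.8),
# and two easy negatives.
sims = torch.tensor([0.9, 0.8, 0.1, -0.3])

for temperature in (1.0, 0.1, 0.07):
    weights = F.softmax(sims / temperature, dim=0)
    print(temperature, [round(w, 3) for w in weights.tolist()])
# As the temperature drops, the hard negative (0.8) keeps a meaningful share of
# the weight while the easy negatives vanish, so it dominates the gradient.
```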

Overall, contrastive learning is a promising way to improve the accuracy and efficiency of self-supervised image classification: the representations it learns without labels can be reused for downstream tasks with far less labeled data, which is especially valuable when working with large unlabeled datasets or with limited resources like memory or computing power.
