Project 2:
Fun with Filters and Frequencies!
Here is something fun: Look at the image below. It looks like the girl is sticking out her tongue, right?
But if you look closer and enlarge it by clicking on the image, you will see that she is not actually sticking out her tongue. You can even see her lower lip.
Now look at the enlarged image and move away until you almost see her sticking out her tongue, but not quite. If you look at her eyes at this distance, you can see the tongue out of the corner of your eye, but if you look at her mouth, the tongue disappears again.
This is a fun example of how our brain can be tricked by the frequencies of an image. The image is a combination of a high-pass and a low-pass filtered image, which we will learn about in this project.
Google also recently released their new Pixel 9 phone, which has a new feature called Add Me that gets everyone into a single photo by combining two separate photos of the same scene.
We will learn how to layer two images on top of each other using a mask while creating a seamless transition between them.
Here I took 2 pictures of my money and combined them into one image.
Now I can show off and pretend that I have 2 monies!
The process of creating the image above. I used a mask to combine the two images.
The result looks so real!
Using Gaussian and Laplacian stacks we can blend two images together much more smoothly. Another example:
Going from this bad simple blending of an apple and an orange
To this smooth blending of an apple and an orange
(Please don't use the images of people here anywhere else, thanks!)
1. Fun with Filters
1.1 Finite Difference Operator
We can detect edges by looking at the intensity differences between neighboring pixels. We use the finite difference operators D_x = np.array([[1, -1]]) and D_y = np.array([[1], [-1]]) to detect edges in the x and y directions by convolving them with the image.
Original Image
Convolved with D_x, which picks out essentially all the vertical edges
Convolved with D_y, which picks out essentially all the horizontal edges
We get our gradient magnitude image by taking the square root of the sum of the squares of the two convolved images: grad_mag = np.sqrt(convoluted_dx**2 + convoluted_dy**2). The result is shown above.
To make our image clearer, we can threshold it: if the gradient magnitude is above a chosen threshold value, we set the pixel to 255 (white), otherwise to 0. I found that a threshold of 60 works pretty well for this image.
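The whole pipeline of this subsection (finite differences, gradient magnitude, threshold) can be sketched as follows. The function name is my own, and I use scipy's `convolve2d` with `mode='same'` so the output stays the same size as the input:

```python
import numpy as np
from scipy.signal import convolve2d

def gradient_edges(img, threshold=60):
    # Finite difference operators from the text.
    D_x = np.array([[1, -1]])
    D_y = np.array([[1], [-1]])
    # Convolve the image with each operator.
    dx = convolve2d(img, D_x, mode='same')
    dy = convolve2d(img, D_y, mode='same')
    # Gradient magnitude from the two partial derivatives.
    grad_mag = np.sqrt(dx**2 + dy**2)
    # Binarize: white where the magnitude exceeds the threshold, black elsewhere.
    return np.where(grad_mag > threshold, 255, 0).astype(np.uint8)
```

Running this on an image with a sharp vertical step produces a white line along the step and black everywhere else.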
1.2 Derivative of Gaussian (DoG) Filter
The problem here is that we still see some noise in the images, in the form of small white dots that are not edges.
We can blur the image first to get rid of the noise. Here I used a simple Gaussian blur with a kernel size of 5 and a sigma of 3, applied before running the finite difference operator.
We can create a 2D Gaussian kernel as the outer product of two 1D Gaussian kernels:
gaussian_kernel = cv2.getGaussianKernel(5,3).T * cv2.getGaussianKernel(5,3)
By blurring the image beforehand, we can see that the noise is gone.
We can simplify things by using a single convolution instead of two: we create derivative of Gaussian (DoG) filters by convolving the Gaussian with D_x and D_y.
Verify that you get the same result as before!
As we can see, the result with the DoG filter is almost the same as before.
What differences do you see?
Okay, this result is already pretty good. This is because we got rid of the noise in the grass and some other parts of the image, which was caused by pixels having a very different intensity than their surrounding pixels; blurring the image a little has now smoothed this out.
One important detail: when creating the DoG filter kernel, we used the mode full, so that no information is lost. Full adds zero padding: with our 5x5 Gaussian kernel, applying the 1x2 D_x kernel first pads the Gaussian with one zero on the left and one on the right, giving a 5x7 array; the convolution with D_x then yields a 5x6 D_x DoG kernel. The same applies to D_y (giving a 6x5 kernel).
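As a sketch of the equivalence claimed above, here is a minimal check using scipy's `convolve2d` and a NumPy-built Gaussian in place of `cv2.getGaussianKernel` (same 5x5 kernel and sigma of 3; the helper name is my own):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel_2d(ksize=5, sigma=3.0):
    # 1D Gaussian, normalized, then outer product -> 2D kernel
    # (equivalent to cv2.getGaussianKernel(ksize, sigma) squared up).
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

G = gaussian_kernel_2d(5, 3.0)
D_x = np.array([[1, -1]])

# DoG kernel: convolve the Gaussian with D_x in 'full' mode (5x5 -> 5x6).
dog_x = convolve2d(G, D_x, mode='full')

# Associativity check: blur-then-differentiate equals one DoG convolution.
img = np.random.rand(32, 32)
two_step = convolve2d(convolve2d(img, G, mode='full'), D_x, mode='full')
one_step = convolve2d(img, dog_x, mode='full')
assert np.allclose(two_step, one_step)
```

The assertion passes because convolution is associative, which is exactly why the single DoG convolution gives (almost) the same result as blurring first and differentiating second.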
2. Fun with Frequencies!
2.1 Image "Sharpening"
Any image can be decomposed into frequencies. The high frequencies often contain the fine details, whereas the low frequencies contain the overall structure of the image. We can boost the high frequencies to sharpen an image.
Our original Taj Mahal image
This is the high frequency part of the image, obtained by blurring the image with a kernel size of 5 and a sigma of 3 and then subtracting the blurred image from the original.
We add the high frequency part back to the original image to sharpen it, using the formula sharpened = original + alpha * high_frequency_part. Here I used an alpha of 0.5.
Here I used an alpha of 1 to sharpen it.
Here I used an alpha of 2 to sharpen it, which is the best result in my opinion.
Here I used an alpha of 4 to sharpen it, which feels like too much.
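The unsharp-mask formula above can be sketched like this. I use scipy's `gaussian_filter` in place of the cv2 blur (the kernel size is implied by sigma there), and the function name and defaults are my own:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, sigma=3.0, alpha=2.0):
    img = img.astype(float)
    # Low frequencies: Gaussian blur of the image.
    blurred = gaussian_filter(img, sigma)
    # High frequencies: original minus the blurred version.
    high = img - blurred
    # sharpened = original + alpha * high frequency part, clipped to valid range.
    return np.clip(img + alpha * high, 0, 255)
```

Note that a perfectly flat image has no high frequencies, so it passes through unchanged regardless of alpha.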
Let's try it with an image of my choice.
Here is our original image. If you find this image confusing, take a closer look: the girl is actually holding a bag of popcorn :D
Here I used an alpha of 3. I feel like the bag of popcorn is more visible now.
Let's take the Taj Mahal image again and blur it. We then sharpen the blurred image and compare it to the original.
We will now use this blurred version of the Taj Mahal image for sharpening.
This is the result of sharpening our blurred image.
This is the original image. Comparing the sharpened blurred image to the original, we can see that sharpening does not restore the original image quality.
(The final result is normalized via (final - final.min()) / (final.max() - final.min()).)
2.2 Hybrid Images
Let's try to create hybrid images as described in this SIGGRAPH 2006 paper by Oliva, Torralba, and Schyns. Hybrid images are images that we interpret differently from a distance and from close up, because they are a combination of low and high frequencies. At a distance, we can only see the low frequencies; from close up, we can see the high frequencies.
We choose two images that we want to combine and blur one of them with a Gaussian filter to get the low frequencies. Here I chose kernelsize = sigma * 6 + 1 as suggested in the lecture; the + 1 makes the kernel size odd, so the Gaussian has a peak in the middle instead of a flat top.
If the two images we want to combine are not the same size or are not aligned, we can use the provided Python script to resize and align them beforehand. We only have to select two points in each image; the script then aligns the images using the selected points.
For the Derek and Nutmeg hybrid image, I set the sigma to 9 (our cutoff frequency) and used Derek as the low frequency image and Nutmeg as the high frequency image. Finally, I combined both images by adding the low frequency image to the high frequency image and then dividing by 2.
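A minimal sketch of this hybrid-image recipe, again using scipy's `gaussian_filter` for the low-pass step (the function name and defaults are my own):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hybrid_image(low_img, high_img, sigma=9.0):
    # Low frequencies of one image: just a Gaussian blur.
    low = gaussian_filter(low_img.astype(float), sigma)
    # High frequencies of the other: original minus its blurred version.
    high = high_img.astype(float) - gaussian_filter(high_img.astype(float), sigma)
    # Combine by adding and dividing by 2, as described above.
    return (low + high) / 2
```

For example, hybrid_image(derek, nutmeg) would give an image that reads as Derek from a distance and as Nutmeg up close.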
Derek Nutmeg
This is our high frequency image of Nutmeg, correctly aligned.
This is our low frequency image of Derek, correctly aligned.
This is our hybrid image of derek and nutmeg. If you look at it from a distance, you will see Derek, but if you look at it from close up, you will see the cat.
Since we have iPhone releases every year and they barely change, I wanted to blend the different iPhone 14 and 15 models together for fun. Here is the result:
Is she sticking out her tongue? (Favorite)
This is our high frequency version of the no-tongue image, correctly aligned.
This is our low frequency version of the tongue image, correctly aligned.
(You can click on this image to enlarge it.)
This is our hybrid image of the tongue and no-tongue photos. From a distance you will see the girl sticking out her tongue, but from close up you will not see the tongue at all.
2D Fourier transform of the images
I could have grayscaled this but it looks so cool in color, so I kept it like this.
This is the 2D Fourier transform of the high frequency (no-tongue) image.
This is the 2D Fourier transform of the low frequency (tongue) image.
This is the 2D Fourier transform of the final hybrid image.
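Spectra like these can be computed with NumPy's FFT. Here is a minimal sketch for a grayscale input (the small epsilon is my addition to avoid log(0)):

```python
import numpy as np

def log_fft(gray):
    # 2D FFT, shifted so the zero frequency (DC) sits in the center,
    # then log-magnitude for display.
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(gray))) + 1e-8)
```

Plotting the returned array (e.g. with plt.imshow) gives the familiar spectrum view: low frequencies in the middle, high frequencies toward the edges.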
New or old iPhone? (Failure)
This one is a failure; the hybrid image is not as good as the others. I think the reason is that the two base images are too clean and lack distinctive features, so the combined notch at the top of the hybrid image just looks weird. Base images with many more features work much better for hybrid images.
This is our high frequency image of the old iPhone, correctly aligned.
This is our low frequency image of the new iPhone, correctly aligned.
Our final hybrid image, is it the new or the old iPhone?
Is this the Ikea shark?
Ikea shark!
Real shark!
Our final hybrid image, is it the Ikea shark or the real shark?
Bells and Whistles regarding color
I tried using color to enhance this effect, both on the high frequency component and on the low frequency component. From my observations, adding color makes the hybrid image effect weaker than black and white. I kept all my previous results in black and white because I think they give better results: black and white leaves more to the imagination, while color makes some parts of the image appear weird and abnormal.
Here is my favorite result with the girl sticking out her tongue, but with color:
This time we convolve and combine the high and low frequencies over all three color channels
While in black and white the part under the girl's lower lip appears normal (the shade could be attributed to a shadow or part of her chin), color adds more realism and makes the area under her lower lip look weird, since it is tongue-colored.
2.3 Gaussian and Laplacian Stacks
Here we will blend two images together using Gaussian and Laplacian stacks and a mask. We use multiresolution blending as described in this paper by Burt and Adelson: an image spline joins two images by gently distorting them, blending them at their different frequencies to create a much smoother seam. First we create a Gaussian and a Laplacian stack, with which we can blend the two images together.
The Gaussian stack is created by blurring the image with a Gaussian filter. I use a stack of 6 images, convolving with a Gaussian filter with sigma = (i + 1) * 2, where i is the stack index. I set the kernel size to kernelsize = sigma * 6 + 1.
To create the Laplacian stack, I take the i-th Gaussian layer of the stack and compute stacked_gau[i] - stacked_gau[i + 1]. We end up with a Laplacian stack of size 6 too, with the most blurred Gaussian layer as the last entry, so that the stack sums back to the original image.
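A minimal sketch of the stack construction, under my reading that level 0 of the Gaussian stack is the unblurred image and the most-blurred Gaussian level is appended to the Laplacian stack (the writeup leaves these details open; the function name is mine):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_stacks(img, levels=6):
    img = img.astype(float)
    # Gaussian stack: level 0 is the original, then progressively
    # stronger blurs of the original with sigma = (i + 1) * 2.
    gau = [img] + [gaussian_filter(img, (i + 1) * 2) for i in range(levels - 1)]
    # Laplacian stack: differences of consecutive Gaussian levels,
    # plus the last (most blurred) Gaussian level so that the
    # whole stack telescopes back to the original image.
    lap = [gau[i] - gau[i + 1] for i in range(levels - 1)] + [gau[-1]]
    return gau, lap
```

Because the differences telescope, summing all Laplacian levels reconstructs the input image exactly, which is the property the blending step relies on.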
Then we apply the same method we used for the Gaussian stack to the mask. This mask tells us how strongly to blend the two images at each pixel. We blend the left and right images together level by level:
left = apple_stack_lap[i] * half_mask_stack_gau[i]
right = orange_stack_lap[i] * (1 - half_mask_stack_gau[i])
final = left + right
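Putting the stacks and the per-level formula together, a self-contained sketch of the whole blend might look like this (level 0 is the unblurred image, the last Gaussian level is appended to the Laplacian stack, and the sigma schedule and function name are my own choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend(img_a, img_b, mask, levels=6):
    a, b, m = img_a.astype(float), img_b.astype(float), mask.astype(float)
    # Gaussian stacks of both images and of the mask (sigma = (i + 1) * 2).
    gau_a = [a] + [gaussian_filter(a, (i + 1) * 2) for i in range(levels - 1)]
    gau_b = [b] + [gaussian_filter(b, (i + 1) * 2) for i in range(levels - 1)]
    gau_m = [m] + [gaussian_filter(m, (i + 1) * 2) for i in range(levels - 1)]
    # Laplacian stacks: consecutive differences plus the last Gaussian level.
    lap_a = [gau_a[i] - gau_a[i + 1] for i in range(levels - 1)] + [gau_a[-1]]
    lap_b = [gau_b[i] - gau_b[i + 1] for i in range(levels - 1)] + [gau_b[-1]]
    out = np.zeros_like(a)
    for i in range(levels):
        # Per-level weighted sum: left part from A, right part from B.
        left = lap_a[i] * gau_m[i]
        right = lap_b[i] * (1 - gau_m[i])
        out += left + right
    return out
```

With a mask of all ones the result is exactly img_a, and with all zeros it is exactly img_b; a soft half-and-half mask produces the smooth seam shown in the apple/orange example.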
The left and middle columns show the original apple and orange images weighted by the mask.
This is the mask we are using; it has been convolved iteratively with the Gaussian filter.
This is level 0, the high frequency part of our laplacian stack.
This is level 2, the mid frequency part of our laplacian stack.
This is level 4, the low frequency part of our laplacian stack.
This is level 5, the lowest frequency part of our laplacian stack.
This is our final blended image.
The Oranapplewatermelon
Let's try it with an irregular mask and my favorite fruit, the watermelon, and create an Oranapplewatermelon:
This is the mask we are using; it has been convolved iteratively with the Gaussian filter.
Our Oranapple Laplacian stack
Our Watermelon Laplacian stack
This is our final blended image.
2 Monies
Having money is great, but what if we had 2 monies? Let's take two photos of our money in two different places and use a mask to fuse them together, so that it appears as if we have 2 monies!
This is the mask we are using
After running the same process as before...
We end up with 2 monies!
The result looks so real!
The most important thing I learned
While I learned a lot about manipulating images and how images can be sharpened or combined by playing with frequencies, the most important thing I learned is that things are easier than they seem. I always thought image processing was very complex, but once you understand the basic principles, it is much more approachable than I expected.