Filters & Frequencies

Ziteng (Ender) Ji

Introduction

This project explores spatial filtering and frequency-domain techniques for image processing. We first implement 2D convolution from scratch and use simple finite-difference kernels to compute image gradients, magnitudes, and thresholded edge maps. We then compare plain finite differences to Derivative-of-Gaussian (DoG) filtering to show how smoothing suppresses noise while preserving salient edges. Building on these tools, we implement unsharp masking to enhance high-frequency detail, and create hybrid images by low-passing one image and high-passing another so that perception changes with viewing distance. Finally, we construct Gaussian and Laplacian stacks and use multi-resolution blending with a smoothed mask to seamlessly combine images. Throughout, we avoid pyramid helper functions, visualize intermediate results, and emphasize clear qualitative comparisons between methods.

Filters

Convolutions From Scratch

I implement two “same-size” 2D convolutions that differ only in how the inner accumulation is computed. Before convolving, the kernel is flipped both vertically and horizontally (flip2d) to perform true convolution (not correlation), and the input is zero-padded by half the kernel size in each dimension so that each output pixel is aligned with the kernel center. In the four-loop version (conv2d_four_loops), I iterate over every output location $(y, x)$ and, for each, run two inner loops over the kernel indices $(k_y, k_x)$, explicitly accumulating padded[y+ky, x+kx] * k[ky, kx]. In the two-loop version (conv2d_two_loops), I keep only the outer loops over $(y, x)$; the inner double loop is replaced by slicing the corresponding $k_h \times k_w$ window from the padded image and computing the dot product in one line as np.sum(window * k). Both implementations therefore share the same padding and alignment logic; the two-loop variant is simply a partial vectorization of the inner accumulation, which makes it faster while remaining functionally identical. As a sanity check, convolving a discrete impulse returns the kernel itself (up to padding/cropping), which I verify by comparing the impulse response’s center patch to the original kernel and confirming a near-zero RMSE.
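Below is a minimal sketch of the two-loop variant, assuming a grayscale float image; the helper names (flip2d, pad2d_zero, conv2d_two_loops) mirror the ones described above, and the impulse test at the end is the sanity check mentioned in the previous paragraph.

    import numpy as np

    def flip2d(k):
        # Flip the kernel vertically and horizontally (true convolution, not correlation).
        return k[::-1, ::-1]

    def pad2d_zero(img, ph, pw):
        # Zero-pad by ph rows on top/bottom and pw columns on left/right.
        return np.pad(img, ((ph, ph), (pw, pw)), mode="constant")

    def conv2d_two_loops(img, k):
        kh, kw = k.shape
        kf = flip2d(k)
        padded = pad2d_zero(img, kh // 2, kw // 2)
        out = np.zeros_like(img, dtype=float)
        for y in range(img.shape[0]):
            for x in range(img.shape[1]):
                window = padded[y:y + kh, x:x + kw]   # (kh x kw) window centered on (y, x)
                out[y, x] = np.sum(window * kf)       # vectorized inner accumulation
        return out

    # Sanity check: convolving a discrete impulse should reproduce the kernel.
    impulse = np.zeros((9, 9)); impulse[4, 4] = 1.0
    box = np.ones((3, 3)) / 9.0
    resp = conv2d_two_loops(impulse, box)
    print(np.sqrt(np.mean((resp[3:6, 3:6] - box) ** 2)))  # near-zero RMSE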

Compared to scipy.signal.convolve2d, my implementation produces the same “same size, true convolution” result by explicitly flipping the kernel and using zero padding (pad2d_zero). The four-loop version accumulates one multiply-add per kernel element at each output pixel, while the two-loop version reduces only the inner double loop to a vectorized np.sum(window * kf). At the boundaries, my code strictly uses zero padding, extending the image by half the kernel size on each side, so every output pixel is computed with a fully centered kernel. Pixels “outside” the original image are treated as zeros, which can dampen filter responses near the edges. After convolution, the padded margins are discarded, yielding a same-size output aligned with the original image.
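And a small check, assuming SciPy is available, that this matches scipy.signal.convolve2d with the same zero-fill, same-size behavior (re-using conv2d_two_loops from the sketch above):

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    img = rng.random((32, 48))
    k = rng.random((5, 5))

    mine = conv2d_two_loops(img, k)  # from the sketch above
    ref = convolve2d(img, k, mode="same", boundary="fill", fillvalue=0)
    print(np.max(np.abs(mine - ref)))  # expected to be around 1e-12 (floating-point noise only)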

two for loops version

four for loops version

original image (taken in Alaska)

dx

result from 9x9 box filter

dy

The dx and dy results are not really visible because most pixels are not on an edge, so the derivative values are near zero almost everywhere. I plot zero as mid-gray and positive/negative changes as slightly lighter/darker, so the picture looks mostly gray with faint thin lines, which further compresses the contrast and makes edges look subtle. Please click on the image for better visualization. Additionally, I also provide an enhanced version of dx and dy: I take absolute values so any strong change shows up bright, then stretch the contrast and optionally threshold to highlight edges.
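A minimal sketch of the two display mappings, assuming ix and iy are the signed derivative images: show_signed is the mid-gray rendering described above, and show_enhanced is the absolute-value version with a percentile contrast stretch and an optional threshold.

    import numpy as np

    def show_signed(d):
        # Map zero to mid-gray (0.5); positive/negative changes become lighter/darker.
        m = np.max(np.abs(d)) + 1e-12
        return 0.5 + 0.5 * d / m

    def show_enhanced(d, p=99, tau=None):
        # Absolute value (any strong change shows up bright), stretched by the
        # p-th percentile; optionally thresholded into a binary edge image.
        a = np.abs(d)
        a = np.clip(a / (np.percentile(a, p) + 1e-12), 0, 1)
        if tau is not None:
            a = (a > tau).astype(float)
        return a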

enhanced dx

enhanced dy

Finite Difference Operator

I compute image derivatives by convolving the grayscale input $I$ with simple finite-difference kernels $D_x = [-1 \;\; 1]$ (row vector) and $D_y = D_x^\top$. The partials are obtained as $I_x = I \ast D_x$ and $I_y = I \ast D_y$ (2D convolution with zero padding), which I then visualize using a symmetric mapping to $[0, 1]$ so positive and negative slopes are both visible. I form the gradient magnitude as $|\nabla I| = \sqrt{I_x^2 + I_y^2}$, normalize it to $[0, 1]$ for display, and convert it into a binary edge map by thresholding, $E = \mathbf{1}[\,|\nabla I| > \tau\,]$. The threshold $\tau$ can be set explicitly or chosen automatically from the gradient distribution using quantiles, which qualitatively balances noise suppression against preserving real edges. The implementation uses scipy.signal.convolve2d when available (falling back to a NumPy two-loop convolution), and produces $I_x$, $I_y$, $|\nabla I|$, and several candidate edge maps to compare visually.
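A minimal sketch of this pipeline, assuming a grayscale float image in [0, 1]; the default quantile q = 0.90 is only an illustrative choice.

    import numpy as np
    from scipy.signal import convolve2d

    def gradient_and_edges(img, tau=None, q=0.90):
        dx = np.array([[-1.0, 1.0]])   # D_x (row vector)
        dy = dx.T                      # D_y = D_x transposed
        ix = convolve2d(img, dx, mode="same", boundary="fill")
        iy = convolve2d(img, dy, mode="same", boundary="fill")
        mag = np.sqrt(ix ** 2 + iy ** 2)      # gradient magnitude |grad I|
        if tau is None:
            tau = np.quantile(mag, q)         # threshold chosen from the gradient distribution
        edges = (mag > tau).astype(float)     # binary edge map E = 1[|grad I| > tau]
        return ix, iy, mag, edges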

dx

dy

gradient

edge

Derivative of Gaussian (DoG) Filter

I first smooth the image with a normalized 2D Gaussian $G$ and then apply finite differences as in Part 1.2: $I_{blur} = I \ast G$, $I_x^{(2)} = (I \ast G) \ast D_x$, $I_y^{(2)} = (I \ast G) \ast D_y$, and $|\nabla I|^{(2)} = \sqrt{(I_x^{(2)})^2 + (I_y^{(2)})^2}$. Compared to raw differences, the pre-smoothed gradients are visibly less noisy, and the binarized edge maps require a higher threshold to avoid thick edges, yet they better suppress texture noise and speckle while preserving real boundaries. I then form Derivative-of-Gaussian (DoG) filters in one step by convolving the Gaussian kernel with the difference operators, $DoG_x = G \ast D_x$ and $DoG_y = G \ast D_y$ (saved and visualized as signed images), and apply them directly: $I_x^{(dog)} = I \ast DoG_x$, $I_y^{(dog)} = I \ast DoG_y$, and $|\nabla I|^{(dog)} = \sqrt{(I_x^{(dog)})^2 + (I_y^{(dog)})^2}$. By associativity of convolution, $I \ast (G \ast D_x) = (I \ast G) \ast D_x$ (and similarly for $D_y$), so the one-step DoG and two-step smooth-then-differentiate pipelines match; I verify this numerically by reporting near-zero RMSE (Root Mean Squared Error, which measures the typical size of the difference between two arrays, e.g., two images, in the same units as the data) between $I_x^{(2)}$ vs. $I_x^{(dog)}$, $I_y^{(2)}$ vs. $I_y^{(dog)}$, and their gradient magnitudes, and I produce difference heatmaps for completeness. As in 1.2, I threshold $|\nabla I|^{(2)}$ (or $|\nabla I|^{(dog)}$) using either a fixed $\tau$ or a quantile-based $\tau = \mathrm{quantile}(|\nabla I|, q)$ to generate qualitatively clean edge maps that balance noise suppression and edge completeness.

Compared to plain finite differences, the DoG results are noticeably cleaner: background texture and speckle are suppressed, and the gradient maps look smoother. The trade-off is that very fine, low-contrast details can be slightly attenuated, so a higher threshold is usually needed when binarizing the DoG gradient magnitude to keep edges thin without reintroducing noise. In short, finite differences emphasize all high frequencies (including noise), while DoG emphasizes salient edges by first smoothing, yielding more stable, perceptually better edge maps.
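A minimal sketch of the associativity check, with an illustrative sigma; gaussian_kernel (an assumed helper) builds a normalized 2D Gaussian as the outer product of a 1D Gaussian with itself, and the RMSE is measured away from the borders, where the two pipelines place their zero padding differently.

    import numpy as np
    from scipy.signal import convolve2d

    def gaussian_kernel(sigma, radius=None):
        # Normalized 2D Gaussian from the outer product of a 1D Gaussian.
        radius = int(3 * sigma) if radius is None else radius
        x = np.arange(-radius, radius + 1)
        g1 = np.exp(-x ** 2 / (2 * sigma ** 2))
        g1 /= g1.sum()
        return np.outer(g1, g1)

    rng = np.random.default_rng(0)
    img = rng.random((64, 64))
    dx = np.array([[-1.0, 1.0]])
    g = gaussian_kernel(sigma=2.0)

    ix_two = convolve2d(convolve2d(img, g, mode="same"), dx, mode="same")  # smooth, then differentiate
    dog_x = convolve2d(g, dx)                                              # DoG filter G * D_x (full size)
    ix_one = convolve2d(img, dog_x, mode="same")                           # single convolution

    pad = g.shape[0]  # ignore the border strip where the padding order matters
    diff = (ix_two - ix_one)[pad:-pad, pad:-pad]
    print(np.sqrt(np.mean(diff ** 2)))  # near-zero RMSE in the interior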

dx

dy

gradient

edge

DoG x

DoG y

Frequencies

Image “Sharpening”

For this section, I implement classical unsharp masking by first low-pass filtering the image with a normalized Gaussian $G$ to get $I_{blur} = I \ast G$, extracting high frequencies $H = I - I_{blur}$, and then boosting them: the two-step sharpened result is $I_{sharp}^{(2)} = I + \alpha H = I + \alpha(I - I \ast G)$. I also fold this into a single convolution with the unsharp kernel $K = (1 + \alpha)\delta - \alpha G$, giving $I_{sharp}^{(1)} = I \ast K$; I construct $K$ as $-\alpha G$ and add $(1 + \alpha)$ to the center coefficient (so $\sum K = 1$). I apply both pipelines channel-wise (if RGB) and save the blurred image, a visualization of the high-pass $H$, and the sharpened outputs. In practice, increasing $\alpha$ strengthens edges and textures while risking halos/noise; $\sigma$ in $G$ controls which frequencies are treated as “detail.” Here I show results on the provided blurry image and additional images of my choice, demonstrating that adding a scaled high-frequency residual makes the images perceptually sharper while keeping overall brightness stable. I also show how varying the sharpening amount changes the result by running my code with different $\alpha$ values.
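A minimal sketch of both pipelines on a single channel (apply per channel for RGB), assuming illustrative sigma/alpha values and re-using gaussian_kernel from the DoG sketch above; the symmetric boundary mode is an assumption to avoid dark borders.

    import numpy as np
    from scipy.signal import convolve2d

    def unsharp_two_step(img, g, alpha):
        # I_sharp = I + alpha * (I - I * G): boost the high-frequency residual.
        blur = convolve2d(img, g, mode="same", boundary="symm")
        high = img - blur
        return np.clip(img + alpha * high, 0, 1), blur, high

    def unsharp_kernel(g, alpha):
        # K = (1 + alpha) * delta - alpha * G, so sum(K) = 1 and brightness is preserved.
        k = -alpha * g
        k[g.shape[0] // 2, g.shape[1] // 2] += 1 + alpha
        return k

    def unsharp_one_step(img, g, alpha):
        # The same sharpening folded into a single convolution I * K.
        return np.clip(convolve2d(img, unsharp_kernel(g, alpha), mode="same", boundary="symm"), 0, 1)

    # Hypothetical usage: g = gaussian_kernel(2.0)
    # sharp, blur, high = unsharp_two_step(img, g, alpha=1.5)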

original

original

high frequency

high frequency

blurry

blurry

sharpened

sharpened

alpha = 0.3

alpha = 0.6

alpha = 1.5

alpha = 2.0

Additional Result: Statue of Liberty

original

blurred the original

sharpen the blurred image

The original image was taken with a Sony Alpha 7 III camera during my trip to New York, so it is really crisp. After blurring the original a bit and then sharpening it with my algorithm, it is clear that the algorithm works well and has a strong effect on the blurred picture. However, the sharpened image is still not as good as the original.

Hybrid Images

For this section, I build hybrid images by first aligning the two inputs (interactive point-pair alignment in align_images) to ensure corresponding semantic parts overlay well, then separating frequencies and recombining. The low-frequency layer comes from a Gaussian blur of the first image $A$: $LP = A \ast G_{\sigma_{low}}$. The high-frequency layer comes from a high-pass of the second image $B$: $HP = B - (B \ast G_{\sigma_{high}})$, which is equivalent to convolving $B$ with the impulse-minus-Gaussian kernel $H = \delta - G_{\sigma_{high}}$. I then form the hybrid image by adding the components (optionally scaling the highs), $HY = LP + \alpha HP$, with $\alpha$ tuned visually. I expose $\sigma_{low}$, $\sigma_{high}$, and $\alpha$ as parameters and empirically choose cutoffs so that nearby viewing favors $B$’s details while distant viewing preserves $A$’s structure. For my favorite result (floccoli), I also include frequency-analysis panels by plotting the log-magnitude Fourier spectra of each stage: the two inputs, the filtered images (LP, HP), and the final hybrid, clearly showing low-pass energy near the origin and high-pass energy in the periphery.
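A minimal sketch of the frequency split and recombination, assuming pre-aligned single-channel float inputs and illustrative cutoffs; the interactive alignment step (align_images) is omitted, gaussian_kernel is the helper from the DoG sketch above, and log_spectrum produces the log-magnitude Fourier panels used in the frequency analysis.

    import numpy as np
    from scipy.signal import convolve2d

    def hybrid(a, b, sigma_low, sigma_high, alpha=1.0):
        # Low frequencies of A plus (scaled) high frequencies of B.
        lp = convolve2d(a, gaussian_kernel(sigma_low), mode="same", boundary="symm")
        hp = b - convolve2d(b, gaussian_kernel(sigma_high), mode="same", boundary="symm")
        hy = np.clip(lp + alpha * hp, 0, 1)
        return hy, lp, hp

    def log_spectrum(img):
        # Log-magnitude Fourier spectrum, origin shifted to the center, for the analysis panels.
        return np.log(np.abs(np.fft.fftshift(np.fft.fft2(img))) + 1e-8)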

derek

nutmeg

dermeg

cat

jaguar

caguar

flower

For the floccoli example:

broccoli

floccoli

low frequency layer A

high frequency layer B

HP

HY

LP

Multi-resolution Blending & the Oraple

Gaussian and Laplacian Stacks

For this section, I implement Gaussian and Laplacian stacks (no downsampling) entirely from scratch. Given an input image $I$, the Gaussian stack applies progressively stronger blurs while keeping the spatial resolution fixed: $G_0 = I \ast G_{\sigma_0}$, $G_1 = I \ast G_{\sigma_1}$, … with $\sigma_{k+1} = \sigma_k \cdot s$ (default $s = 2$); equivalently, each level is $G_k = I \ast G_{\sigma_k}$. The Laplacian stack captures band-pass detail per scale as differences of adjacent Gaussian levels, $L_k = G_k - G_{k+1}$ for $k = 0, \dots, L-2$, and the final level holds the residual $L_{L-1} = G_{L-1}$. Because stacks are never downsampled, every $G_k$ and $L_k$ has the same height and width as $I$, so grayscale stacks can be stored in a single 3D array. I visualize both stacks (Gaussian as clipped images, Laplacian with signed normalization). Finally, I apply these stacks to the apple image example to reproduce example-style panels (Gaussian on top, Laplacian below), which prepares the exact multi-scale ingredients needed for the next part’s multi-resolution blending.
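A minimal sketch of both stacks, assuming a grayscale float image and the defaults mentioned above (sigma doubling per level); gaussian_kernel is the helper from the DoG sketch.

    import numpy as np
    from scipy.signal import convolve2d

    def gaussian_stack(img, levels=5, sigma0=1.0, s=2.0):
        # G_k = I * G_{sigma_k} at full resolution, with sigma_{k+1} = s * sigma_k.
        stack, sigma = [], sigma0
        for _ in range(levels):
            g = gaussian_kernel(sigma)
            stack.append(convolve2d(img, g, mode="same", boundary="symm"))
            sigma *= s
        return np.stack(stack)          # shape (levels, H, W): no downsampling

    def laplacian_stack(gs):
        # Band-pass levels L_k = G_k - G_{k+1}; the last level keeps the coarsest Gaussian.
        ls = [gs[k] - gs[k + 1] for k in range(len(gs) - 1)]
        ls.append(gs[-1])
        return np.stack(ls)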

Multiresolution Blending (Oraple)

For this section, I implement multiresolution blending with stacks by building Laplacian stacks for the two input images $A, B$ and a Gaussian stack of the mask $M$. Concretely, I form $G_k^A = A \ast G_{\sigma_k}$, $G_k^B = B \ast G_{\sigma_k}$, and Laplacians $L_k^A = G_k^A - G_{k+1}^A$, $L_k^B = G_k^B - G_{k+1}^B$ (with the final level the coarsest Gaussian), along with mask levels $M_k = M \ast G_{\sigma_k}$. Blending happens per scale with the smoothed mask, $L_k^{blend} = M_k \odot L_k^A + (1 - M_k) \odot L_k^B$ (please note that here “$\odot$” means element-wise multiplication of two same-shaped arrays/matrices), and the final image is reconstructed by summation across levels (no up/downsampling needed): $O = \sum_{k=0}^{L-1} L_k^{blend}$. I generate step masks for vertical/horizontal seams, $M(x, y) \in \{0, 1\}$ (and a soft-step variant), to create the classic Oraple, then extend to irregular masks (loaded from image files) to produce two additional creative blends. To avoid ringing at the borders, convolutions use symmetric padding; all stacks keep the same spatial size. I visualize the Gaussian stacks for $A, B, M$, the Laplacian stacks for $A, B$, and the blended Laplacian stack $L_k^{blend}$, plus the masked inputs at level 0, reproducing the Figure 10-style panels from the paper.
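A minimal sketch of the per-level blend, re-using the stack helpers above and assuming a single-channel float mask in [0, 1] (apply per channel for RGB); the commented Oraple mask is a hypothetical usage example.

    import numpy as np

    def blend(a, b, mask, levels=5, sigma0=1.0, s=2.0):
        # Laplacian stacks of the inputs, Gaussian stack of the mask, blend per scale, then sum.
        la = laplacian_stack(gaussian_stack(a, levels, sigma0, s))
        lb = laplacian_stack(gaussian_stack(b, levels, sigma0, s))
        gm = gaussian_stack(mask, levels, sigma0, s)
        blended = gm * la + (1.0 - gm) * lb      # element-wise blend at every scale
        return np.clip(blended.sum(axis=0), 0, 1)

    # Classic Oraple with a vertical step mask (1 on the left half, 0 on the right):
    # mask = np.zeros(apple.shape); mask[:, : mask.shape[1] // 2] = 1.0
    # oraple = blend(apple, orange, mask)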

pepsi

Irregular Mask

below are examples of irregular masks

coke

pepoke

dog

dog mask

cat

cat mask

wolf

wolf mask

cheetah

cheetah mask

irregular mask

dolf

mask

catah

Most Important Thing I Learned

This project gave me a practical understanding of how spatial filtering and frequency analysis shape image perception. Building everything from scratch forced me to internalize core concepts including convolution, gradients, smoothing vs. detail enhancement, and how multi-resolution representations capture structure at different scales. I learned to reason about design trade-offs (for example, noise suppression vs. edge fidelity) and how parameters like $\sigma$ and mask smoothness govern what viewers perceive up close versus at a distance.