Filters & Frequencies
Ziteng (Ender) Ji
Introduction
This project explores spatial filtering and frequency-domain techniques for image processing. We first implement 2D convolution from scratch and use simple finite-difference kernels to compute image gradients, magnitudes, and thresholded edge maps. We then compare plain finite differences to Derivative-of-Gaussian (DoG) filtering to show how smoothing suppresses noise while preserving salient edges. Building on these tools, we implement unsharp masking to enhance high-frequency detail, and create hybrid images by low-passing one image and high-passing another so that perception changes with viewing distance. Finally, we construct Gaussian and Laplacian stacks and use multi-resolution blending with a smoothed mask to seamlessly combine images. Throughout, we avoid pyramid helper functions, visualize intermediate results, and emphasize clear qualitative comparisons between methods.
Filters
Convolutions From Scratch
I implement two “same-size” 2D convolutions that differ only in how the inner accumulation is computed. Before convolving, the kernel is flipped both vertically and horizontally (flip2d) to perform true convolution (not correlation), and the input is zero-padded by half the kernel size in each dimension so that each output pixel is aligned with the kernel center. In the four-loop version (conv2d_four_loops), I iterate over every output location $(y, x)$ and, for each, run two inner loops over the kernel indices $(ky, kx)$, explicitly accumulating padded[y+ky, x+kx] * k[ky, kx]. In the two-loop version (conv2d_two_loops), I keep only the outer loops over $(y, x)$; the inner double loop is replaced by slicing the corresponding window from the padded image and computing the dot product in one line as np.sum(window * k). Both implementations therefore share the same padding and alignment logic; the two-loop variant is simply a partial vectorization of the inner accumulation, which makes it faster while remaining functionally identical. As a sanity check, convolving a discrete impulse returns the kernel itself (up to padding/cropping), which I verify by comparing the impulse response’s center patch to the original kernel with near-zero RMSE.
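A minimal sketch of the two-loop variant is shown below (the helper names flip2d, pad2d_zero, and conv2d_two_loops follow the ones mentioned above, but the exact code in my implementation may differ slightly):

import numpy as np

def flip2d(k):
    # Flip vertically and horizontally so the sliding-window product
    # implements true convolution rather than correlation.
    return k[::-1, ::-1]

def pad2d_zero(img, ph, pw):
    # Zero-pad by half the kernel size so every output pixel is computed
    # with a fully centered kernel.
    return np.pad(img, ((ph, ph), (pw, pw)), mode="constant")

def conv2d_two_loops(img, k):
    # "Same-size" 2D convolution; assumes an odd-sized kernel.
    kf = flip2d(np.asarray(k, dtype=float))
    kh, kw = kf.shape
    ph, pw = kh // 2, kw // 2
    img = np.asarray(img, dtype=float)
    padded = pad2d_zero(img, ph, pw)
    out = np.zeros(img.shape, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            # Slice the kernel-sized window and accumulate in one
            # vectorized step instead of two inner kernel loops.
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kf)
    return out

# Sanity check: the response to a discrete impulse is the kernel itself.
impulse = np.zeros((9, 9)); impulse[4, 4] = 1.0
k = np.arange(9, dtype=float).reshape(3, 3)
resp = conv2d_two_loops(impulse, k)
print(np.sqrt(np.mean((resp[3:6, 3:6] - k) ** 2)))  # near-zero RMSE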
Compared to scipy.signal.convolve2d, my implementation produces the same “same-size, true convolution” result by explicitly flipping the kernel and using zero padding (pad2d_zero). The four-loop version accumulates one multiply-add per kernel element at each output pixel, while the two-loop version vectorizes only the inner double loop into np.sum(window * kf). At the boundaries, my code strictly uses zero padding, extending the image by half the kernel size on each side, so every output pixel is computed with a fully centered kernel. Pixels “outside” the original image are treated as zeros, which can dampen filter responses near the edges. After convolution, the padded margins are discarded, yielding a same-size output aligned with the original image.
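As a rough illustration of that comparison (reusing conv2d_two_loops from the sketch above; the random test image is only for demonstration), the from-scratch result should agree with SciPy's zero-padded "same" convolution up to floating-point error:

import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
img = rng.random((64, 64))
k = rng.random((5, 5))

mine = conv2d_two_loops(img, k)  # from-scratch version sketched above
ref = convolve2d(img, k, mode="same", boundary="fill", fillvalue=0)
print(np.sqrt(np.mean((mine - ref) ** 2)))  # expected to be near zero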
The dx and dy maps are hard to see at first glance because most pixels are not on an edge, so the derivative values are near zero almost everywhere. I plot zero as mid-gray and positive/negative changes as slightly lighter/darker, so the picture looks mostly gray with faint thin lines, which further compresses the contrast and makes edges look subtle. Please click on the image for better visualization. I also provide an enhanced version of dx and dy: I take absolute values so any strong change shows up bright, then stretch the contrast and optionally threshold to highlight edges.
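A small sketch of both display strategies (the function names and the 0.99 quantile are my own illustrative choices):

import numpy as np

def show_signed(d):
    # Map zero to mid-gray; positive/negative slopes become lighter/darker.
    m = np.max(np.abs(d)) + 1e-12
    return 0.5 + 0.5 * d / m

def show_enhanced(d, q=0.99, thresh=None):
    # Absolute value makes any strong change bright; clipping at a high
    # quantile stretches the contrast; an optional threshold binarizes.
    a = np.abs(d)
    a = np.clip(a / (np.quantile(a, q) + 1e-12), 0.0, 1.0)
    return (a >= thresh).astype(float) if thresh is not None else a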
Finite Difference Operator
I compute image derivatives by convolving the grayscale input $A$ with simple finite-difference kernels $D_x = [1 \;\; -1]$ (a row vector) and $D_y = D_x^{\top}$. The partials are obtained as $I_x = A * D_x$ and $I_y = A * D_y$ (2D convolution with zero padding), which I then visualize using a symmetric mapping to $[0, 1]$ so positive and negative slopes are both visible. I form the gradient magnitude as $\|\nabla A\| = \sqrt{I_x^2 + I_y^2}$, normalize it to $[0, 1]$ for display, and convert it into a binary edge map by thresholding $\|\nabla A\| > \tau$. The threshold $\tau$ can be set explicitly or chosen automatically from the gradient distribution using quantiles, which qualitatively balances noise suppression against preserving real edges. The implementation uses scipy.signal.convolve2d when available (falling back to a NumPy two-loop convolution), and produces $I_x$, $I_y$, $\|\nabla A\|$, and several candidate edge maps to compare visually.
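A condensed sketch of this pipeline (parameter values such as quantile=0.90 are illustrative defaults, not necessarily the ones used for the figures):

import numpy as np
from scipy.signal import convolve2d

def gradient_and_edges(gray, tau=None, quantile=0.90):
    # Finite-difference kernels: a row vector for d/dx, its transpose for d/dy.
    Dx = np.array([[1.0, -1.0]])
    Dy = Dx.T

    Ix = convolve2d(gray, Dx, mode="same", boundary="fill")
    Iy = convolve2d(gray, Dy, mode="same", boundary="fill")

    # Gradient magnitude, normalized to [0, 1] for display.
    mag = np.sqrt(Ix ** 2 + Iy ** 2)
    mag = mag / (mag.max() + 1e-12)

    # Threshold either explicitly (tau) or from a quantile of the distribution.
    if tau is None:
        tau = np.quantile(mag, quantile)
    return Ix, Iy, mag, (mag >= tau).astype(float)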
Derivative of Gaussian (DoG) Filter
I first smooth the image with a normalized 2D Gaussian $G$ and then apply finite differences as in Part 1.2: $A_s = A * G$, $I_x = A_s * D_x$, $I_y = A_s * D_y$, and $\|\nabla A_s\| = \sqrt{I_x^2 + I_y^2}$. Compared to raw differences, the pre-smoothed gradients are visibly less noisy, and the binarized edge maps require a higher threshold to avoid thick edges, yet they better suppress texture noise and speckle while preserving real boundaries. I then form Derivative-of-Gaussian (DoG) filters in one step by convolving the Gaussian kernel with the difference operators, $\mathrm{DoG}_x = G * D_x$ and $\mathrm{DoG}_y = G * D_y$ (saved and visualized as signed images), and apply them directly to the image: $I_x' = A * \mathrm{DoG}_x$, $I_y' = A * \mathrm{DoG}_y$, and $\|\nabla A\|' = \sqrt{I_x'^2 + I_y'^2}$. By associativity of convolution, $A * (G * D_x) = (A * G) * D_x$ (and similarly for $D_y$), so the one-step DoG and two-step smooth-then-differentiate pipelines match; I verify this numerically by reporting near-zero RMSE (root mean squared error, which measures the typical size of the difference between two arrays, e.g. two images, in the same units as the data) between $I_x$ and $I_x'$, $I_y$ and $I_y'$, and their gradient magnitudes, and I produce difference heatmaps for completeness. As in 1.2, I threshold $\|\nabla A_s\|$ (or $\|\nabla A\|'$) using either a fixed or quantile-based $\tau$ to generate qualitatively clean edge maps that balance noise suppression and edge completeness.
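The equivalence check can be sketched as follows (the kernel size and sigma are illustrative, and boundary cropping makes the match approximate rather than exact right at the image border):

import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=9, sigma=1.5):
    # Normalized 2D Gaussian as the outer product of a 1D Gaussian.
    ax = np.arange(size) - (size - 1) / 2.0
    g1 = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    g2 = np.outer(g1, g1)
    return g2 / g2.sum()

Dx = np.array([[1.0, -1.0]])
G = gaussian_kernel()

rng = np.random.default_rng(0)
A = rng.random((128, 128))

# Two-step: smooth first, then differentiate.
Ix_two = convolve2d(convolve2d(A, G, mode="same"), Dx, mode="same")

# One-step: build the DoG filter, then convolve once with the image.
DoGx = convolve2d(G, Dx)            # full convolution keeps the filter intact
Ix_one = convolve2d(A, DoGx, mode="same")

# Associativity of convolution: the two results should agree almost
# everywhere (small differences only near the image border).
print(np.sqrt(np.mean((Ix_two - Ix_one) ** 2)))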
Compared to plain finite differences, the DoG results are noticeably cleaner: background texture and speckle are suppressed, and the gradient maps look smoother. The trade-off is that very fine, low-contrast details can be slightly attenuated, so a higher threshold is usually needed when binarizing the DoG gradient magnitude to keep edges thin without reintroducing noise. In short, finite differences emphasize all high frequencies (including noise), while DoG emphasizes salient edges by first smoothing, yielding more stable, perceptually better edge maps.
Frequencies
Image “Sharpening”
For this section, I implement classical unsharp masking by first low-pass filtering the image with a normalized Gaussian $G$ to get $B = A * G$, extracting the high frequencies $H = A - B$, and then boosting them: the two-step sharpened result is $S = A + \alpha H$. I also fold this into a single convolution with the unsharp kernel $U = (1 + \alpha)\,\delta - \alpha G$ (where $\delta$ is the unit impulse), so that $S = A * U$; I construct $U$ as $-\alpha G$ and add $1 + \alpha$ to the center coefficient (so $A * U = A + \alpha (A - A * G)$). I apply both pipelines channel-wise (if RGB) and save the blurred image, a visualization of the high-pass $H$, and the sharpened outputs. In practice, increasing $\alpha$ strengthens edges and textures while risking halos/noise; the $\sigma$ of the Gaussian controls which frequencies are treated as “detail.” Here I show results on the provided blurry image and additional images of my choice, demonstrating that adding a scaled high-frequency residual makes the images perceptually sharper while keeping overall brightness stable. I also show how varying the sharpening amount $\alpha$ changes the result by running my code with different values.
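A compact sketch of the single-kernel version (it assumes an odd, square Gaussian G such as the gaussian_kernel sketched earlier and images scaled to [0, 1]):

import numpy as np
from scipy.signal import convolve2d

def unsharp_kernel(G, alpha):
    # U = (1 + alpha) * delta - alpha * G, with delta a unit impulse at the center.
    U = -alpha * G
    c = G.shape[0] // 2
    U[c, c] += 1.0 + alpha
    return U

def sharpen(img, G, alpha):
    # Channel-wise convolution with U; by linearity this equals
    # img + alpha * (img - img blurred with G).
    U = unsharp_kernel(G, alpha)
    if img.ndim == 2:
        return np.clip(convolve2d(img, U, mode="same", boundary="symm"), 0, 1)
    channels = [convolve2d(img[..., c], U, mode="same", boundary="symm")
                for c in range(img.shape[-1])]
    return np.clip(np.stack(channels, axis=-1), 0, 1)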
Additional Result: Statue of Liberty
The original image was taken with a Sony Alpha 7 III camera during my trip to New York, so it is very crisp. After blurring the original slightly and then sharpening it with my algorithm, the effect on the blurred picture is obvious and shows that the algorithm works well. However, the sharpened image is not as good as the original, since sharpening amplifies the high frequencies that remain but cannot recover detail that the blur removed.
Hybrid Images
For this section, I build hybrid images by first aligning the two inputs (interactive point-pair alignment in align_images) to ensure corresponding semantic parts overlay well, then separating frequencies and recombining. The low-frequency layer comes from a Gaussian blur of the first image, $\mathrm{LP} = A * G_{\sigma_1}$. The high-frequency layer comes from a high-pass of the second image, $\mathrm{HP} = B - B * G_{\sigma_2}$, which is equivalent to convolving with the impulse-minus-Gaussian kernel $\delta - G_{\sigma_2}$. I then form the hybrid image by adding the components (optionally scaling the highs), $\mathrm{hybrid} = \mathrm{LP} + \alpha\,\mathrm{HP}$, with $\alpha$ tuned visually. I expose $\sigma_1$, $\sigma_2$, and $\alpha$ as parameters and empirically choose cutoffs so that nearby viewing favors $B$’s details while distant viewing preserves $A$’s structure. For my favorite result (floccoli), I also include frequency-analysis panels by plotting the log-magnitude Fourier spectra of each stage for the two inputs, the filtered images (LP, HP), and the final hybrid, clearly showing low-pass energy near the origin and high-pass energy in the periphery.
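A sketch of the frequency split and recombination (grayscale, already-aligned inputs; G1 and G2 are Gaussian kernels with the two chosen sigmas, e.g. built with the gaussian_kernel helper sketched earlier):

import numpy as np
from scipy.signal import convolve2d

def low_pass(img, G):
    return convolve2d(img, G, mode="same", boundary="symm")

def high_pass(img, G):
    # Equivalent to convolving with (impulse - Gaussian).
    return img - low_pass(img, G)

def make_hybrid(im_low, im_high, G1, G2, alpha=1.0):
    # Low frequencies of the first image plus scaled high frequencies of the second.
    return np.clip(low_pass(im_low, G1) + alpha * high_pass(im_high, G2), 0, 1)

def log_spectrum(img):
    # Log-magnitude Fourier spectrum, shifted so the origin is at the center.
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(img))) + 1e-8)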
Multi-resolution Blending & the Oraple
Gaussian and Laplacian Stacks
For this section, I implement Gaussian and Laplacian stacks (no downsampling) entirely from scratch. Given an input image $I$, the Gaussian stack applies progressively stronger blurs while keeping the spatial resolution fixed: $G_0 = I$, $G_1 = G_0 * g_{\sigma}$, $G_2 = G_1 * g_{\sigma}$, and so on; equivalently, each level is $I$ blurred with a progressively larger effective $\sigma$. The Laplacian stack captures band-pass detail per scale as differences of adjacent Gaussian levels, $L_i = G_i - G_{i+1}$ for $i = 0, \dots, N-2$, and the final level holds the residual $L_{N-1} = G_{N-1}$. Because stacks are never downsampled, every $G_i$ and $L_i$ has the same height and width as $I$, so grayscale stacks can be stored in a single 3D array. I visualize both stacks (Gaussian as clipped images, Laplacian with signed normalization). Finally, I apply these stacks to the apple image example to reproduce example-style panels (Gaussian on top, Laplacian below), which prepares the exact multi-scale ingredients needed for the next part's multi-resolution blending.
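A minimal sketch of the two stacks (here using scipy.ndimage.gaussian_filter for brevity in place of the from-scratch Gaussian convolution; the level count and sigma are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(img, levels=5, sigma=2.0):
    # Blur repeatedly at full resolution: each level is blurred more
    # strongly than the previous one, and nothing is downsampled.
    stack = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        stack.append(gaussian_filter(stack[-1], sigma))
    return stack

def laplacian_stack(img, levels=5, sigma=2.0):
    # Band-pass levels are differences of adjacent Gaussian levels; the
    # last level keeps the coarsest Gaussian as the residual.
    g = gaussian_stack(img, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]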
Multiresolution Blending (Oraple)
For this section, I implement multiresolution blending with stacks by building Laplacian stacks for the two input images $A$, $B$ and a Gaussian stack of the mask $M$. Concretely, I form $G^A_i$, $G^B_i$, and Laplacians $L^A_i = G^A_i - G^A_{i+1}$, $L^B_i = G^B_i - G^B_{i+1}$ (with the final level the coarsest Gaussian), along with mask levels $M_i$. Blending happens per scale with the smoothed mask, $L^{\mathrm{blend}}_i = M_i \odot L^A_i + (1 - M_i) \odot L^B_i$ (please note that here “⊙” means element-wise multiplication of two same-shaped arrays/matrices), and the final image is reconstructed by summation across levels (no up/downsampling needed): $\mathrm{result} = \sum_i L^{\mathrm{blend}}_i$. I generate step masks for vertical/horizontal seams (and a soft-step variant) to create the classic Oraple, then extend to irregular masks (loaded from image files) to produce two additional creative blends. To avoid ringing at the borders, convolutions use symmetric padding; all stacks keep the same spatial size. I visualize the Gaussian stack for $M$, the Laplacian stacks for $A$ and $B$, and the blended Laplacian stack, plus the masked inputs at level 0, reproducing the Figure 10-style panels from the paper.
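A sketch of the per-level blend and reconstruction, reusing gaussian_stack and laplacian_stack from the previous sketch (grayscale inputs; the step mask at the end is just one illustrative example):

import numpy as np

def blend_multiresolution(A, B, mask, levels=5, sigma=2.0):
    # Laplacian stacks for the two images, Gaussian stack for the mask.
    LA = laplacian_stack(A, levels, sigma)
    LB = laplacian_stack(B, levels, sigma)
    GM = gaussian_stack(mask, levels, sigma)

    # Per scale: L_blend_i = M_i * L_A_i + (1 - M_i) * L_B_i,
    # then reconstruct by summing all blended levels (no up/downsampling).
    blended = [m * la + (1.0 - m) * lb for la, lb, m in zip(LA, LB, GM)]
    return np.clip(np.sum(blended, axis=0), 0.0, 1.0)

# Example step mask for the classic Oraple: left half comes from A.
# mask = np.zeros_like(apple_gray); mask[:, : mask.shape[1] // 2] = 1.0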
Most Important Thing I Learned
This project gave me a practical understanding of how spatial filtering and frequency analysis shape image perception. Building everything from scratch forced me to internalize core concepts including convolution, gradients, smoothing vs. detail enhancement, and how multi-resolution representations capture structure at different scales. I learned to reason about design trade-offs (for example, noise suppression vs. edge fidelity) and how parameters like the Gaussian $\sigma$, the sharpening amount $\alpha$, and mask smoothness govern what viewers perceive up close versus at a distance.