Panorama
Ziteng (Ender) Ji
Introduction
This project is about turning multiple photos of the same scene into coherent, perspective-correct results using projective geometry and image warping. We estimate planar homographies from point correspondences to relate views, then use them to map pixels between images so that surfaces align in a common frame. With these tools, we rectify slanted objects to a fronto-parallel view and stitch overlapping photographs into seamless mosaics that widen the field of view beyond a single shot. Simple feathering or multi-scale blending reduces visible seams and exposure differences so the composites look natural. Overall, the focus is on understanding how camera motion induces projective transformations and using that insight to align, warp, and blend images into clean visual outcomes.
Manual
Images used in this project
Recover Homographies
I estimate a planar homography $H$ (8 DoF, with $h_{33} = 1$) from point correspondences $(x_i, y_i) \leftrightarrow (x_i', y_i')$ using two implementations: a direct least-squares solver and a normalized DLT variant. Given $n$ pairs, I build the linear system $A\,\mathbf{h} = \mathbf{b}$ with one row per coordinate,

$$\begin{bmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -x_i x_i' & -y_i x_i' \\ 0 & 0 & 0 & x_i & y_i & 1 & -x_i y_i' & -y_i y_i' \end{bmatrix} \begin{bmatrix} h_1 \\ \vdots \\ h_8 \end{bmatrix} = \begin{bmatrix} x_i' \\ y_i' \end{bmatrix},$$

then solve by np.linalg.lstsq, assemble $H$ from $\mathbf{h}$, and scale so $h_{33} = 1$. So we will have

$$H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \end{bmatrix}.$$
This matrix maps a homogeneous source point $[x, y, 1]^\top$ to $[wx', wy', w]^\top$. Geometrically, the top two rows encode the affine effects (rotation/scale/shear plus translation), while the third row ($h_7$, $h_8$) introduces perspective foreshortening; if $h_7 = h_8 = 0$, $H$ reduces to a purely affine transform. To improve numerical stability beyond the minimal 4-point solution (which is noise sensitive), I collect more than 4 correspondences (via a Matplotlib ginput UI or by loading a CSV), making the system overdetermined and solving it in least squares. I also provide a normalized DLT: each point set is centered and isotropically scaled to mean distance $\sqrt{2}$ via similarity transforms $T$ and $T'$; I then build the DLT matrix, take the last right-singular vector from the SVD as $\hat{H}$, and denormalize with $H = T'^{-1}\hat{H}\,T$ before fixing the scale. For deliverables, my script visualizes the clicked correspondences side by side with indices, prints and saves the first rows of the linear system (and the full matrices), and outputs the recovered $H$. Images are read as 8-bit (PIL with HEIF support), and points are stored/loaded from CSV so you don't have to click them again every time.
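To make the setup concrete, here is a minimal sketch of the least-squares estimator described above, assuming the correspondences arrive as two (n, 2) NumPy arrays; the function name and array layout are illustrative rather than the exact code in my script.

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H (src -> dst) from n >= 4 point pairs by solving A h = b in least squares.

    src, dst: (n, 2) arrays of (x, y) correspondences.
    Returns a 3x3 matrix with H[2, 2] = 1.
    """
    n = src.shape[0]
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
        # Row for the x' equation: h1*x + h2*y + h3 - h7*x*x' - h8*y*x' = x'
        A[2 * i]     = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        b[2 * i]     = xp
        # Row for the y' equation: h4*x + h5*y + h6 - h7*x*y' - h8*y*y' = y'
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * i + 1] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    H = np.append(h, 1.0).reshape(3, 3)   # fix the scale so h33 = 1
    return H
```

With more than 4 pairs the same call simply returns the least-squares fit, which is exactly why the overdetermined system is preferred over the minimal one.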
Doe Library & Campanile

North Reading Room (Doe Library)


Warp the Image & Rectification
For this section, I implement inverse warping with two from-scratch interpolators and an explicit alpha mask to avoid holes. For a given homography $H$ (source → target), I first predict the output canvas by mapping the four source corners through $H$ and taking the min/max to form an integer bounding box, then create a regular integer grid of output pixel centers (integer coordinates are treated as pixel centers). Each output location is back-projected with $H^{-1}$ to continuous source coordinates $(x_s, y_s)$. For nearest neighbor, I round $(x_s, y_s)$ to the nearest integer pixel, copy that pixel if it lies in bounds, and set alpha = 1 there (else 0). For bilinear, I take the four neighbors around $(x_s, y_s)$ with $x_0 = \lfloor x_s \rfloor$, $y_0 = \lfloor y_s \rfloor$, compute weights from the fractional offsets $(x_s - x_0,\, y_s - y_0)$, and form the weighted sum channel-wise, marking alpha only where all four neighbors are valid; outputs are clipped to $[0, 255]$ and cast back to the input dtype. Both warpers return (image, α, meta), where meta records the output bbox and the saved alpha visualizes coverage; I also report the valid-pixel fraction to compare hole behavior. I provide a driver that accepts $H$ from disk, builds $H$ from a CSV of correspondences (so there is no need to click all the points manually again), or runs a rectification mode: the user clicks 4+ points on a planar object and I map them to a user-specified rectangle of size (rect_w, rect_h) (for exactly 4 points I use the rectangle's four corners; for more I distribute targets around the perimeter). In terms of speed, nearest neighbor is faster than bilinear, but the two produce similar quality.
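A compact sketch of the bilinear inverse warp under these conventions, assuming the output bounding box has already been computed from the projected corners; the signature and bookkeeping are simplified placeholders for the actual routine.

```python
import numpy as np

def warp_bilinear(img, H, out_bbox):
    """Inverse-warp img (H, W, 3 uint8) with homography H (source -> target).

    out_bbox = (x_min, y_min, x_max, y_max): integer bounding box of the output canvas.
    Returns (warped, alpha), where alpha marks pixels whose four bilinear neighbors are in bounds.
    """
    x_min, y_min, x_max, y_max = out_bbox
    out_h, out_w = y_max - y_min, x_max - x_min
    # Regular grid of output pixel centers, in homogeneous coordinates.
    xs, ys = np.meshgrid(np.arange(x_min, x_max), np.arange(y_min, y_max))
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project every output location into continuous source coordinates.
    src = np.linalg.inv(H) @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    dx, dy = sx - x0, sy - y0
    h, w = img.shape[:2]
    valid = (x0 >= 0) & (y0 >= 0) & (x0 + 1 < w) & (y0 + 1 < h)
    x0c, y0c = np.clip(x0, 0, w - 2), np.clip(y0, 0, h - 2)
    f = img.astype(np.float64)
    # Weighted sum of the four neighbors, channel-wise.
    top = f[y0c, x0c] * (1 - dx)[:, None] + f[y0c, x0c + 1] * dx[:, None]
    bot = f[y0c + 1, x0c] * (1 - dx)[:, None] + f[y0c + 1, x0c + 1] * dx[:, None]
    vals = top * (1 - dy)[:, None] + bot * dy[:, None]
    warped = np.zeros((out_h * out_w, 3))
    warped[valid] = vals[valid]
    alpha = valid.reshape(out_h, out_w).astype(np.float64)
    warped = np.clip(warped, 0, 255).reshape(out_h, out_w, 3).astype(np.uint8)
    return warped, alpha
```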




Blend the Images into a Mosaic
For this section, I build mosaics by first choosing a reference image and estimating homographies $H_i$ for each non-reference view (loaded from text files or computed from CSV correspondences via my solver), then predicting a global canvas by projecting all source corner points through their $H_i$'s, taking the min/max to get the canvas bounding box, and applying a translation offset so every warp lands in positive pixel coordinates. Each image is then inverse-warped into this common canvas using either my nearest-neighbor or bilinear routine (the same code as A.3), producing a warped RGB image and a binary valid mask. To reduce edge seams, I assign each original image a feather alpha map that is 1 at the center and falls off linearly toward the borders (computed from the minimum distance to the image edges and normalized by half the shorter side); this soft alpha is warped to the canvas (bilinear) and multiplied by the valid mask to form the final per-image weights $w_i$. For blending, I do simple weighted averaging (feathering): I accumulate $\sum_i w_i I_i$ and $\sum_i w_i$ across all warped images (channel-wise in float64), then divide with an $\epsilon$ to avoid division by zero and clip/cast to uint8. The script supports one-shot stacking (all images warped into the same canvas at once), saves all warped images and their alphas, and outputs the final mosaic plus a panel figure that shows every warped layer and the final result; I also save a contact sheet of the source photos for each mosaic. This feathered weighted averaging removes hard seams and most exposure steps while remaining fast and robust. For even stronger suppression of high-frequency “ghosting,” a small Laplacian-pyramid blend could be substituted, but the feathered approach is sufficient for the results shown below, each with its corresponding source images and documented homographies.
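The feathered weighted average can be sketched as follows, assuming each warped layer and its per-image weight map (warped feather alpha times the valid mask) already live on the common canvas; names and defaults here are illustrative.

```python
import numpy as np

def feather_alpha(h, w):
    """Per-pixel weight: 1 at the image center, falling off linearly toward the borders.

    Uses the minimum distance to any edge, normalized by half the shorter side.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.minimum.reduce([xs, ys, w - 1 - xs, h - 1 - ys]).astype(np.float64)
    return np.clip(dist / (min(h, w) / 2.0), 0.0, 1.0)

def blend_feathered(warped_imgs, weights, eps=1e-8):
    """Weighted average of warped layers on a shared canvas.

    warped_imgs: list of (H, W, 3) arrays already in canvas coordinates.
    weights:     list of (H, W) per-image weights (warped feather alpha * valid mask).
    """
    num = np.zeros(warped_imgs[0].shape, dtype=np.float64)
    den = np.zeros(warped_imgs[0].shape[:2], dtype=np.float64)
    for img, w in zip(warped_imgs, weights):
        num += img.astype(np.float64) * w[..., None]   # accumulate sum_i w_i * I_i
        den += w                                        # accumulate sum_i w_i
    mosaic = num / (den[..., None] + eps)               # eps avoids division by zero
    return np.clip(mosaic, 0, 255).astype(np.uint8)
```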
Doe Library & Campanile


Heyns Reading Room (Doe Library)


Wheeler Hall & Campanile


Automatic
Harris Corner Detection
I detect corners with a single-scale Harris detector and then thin them with Adaptive Non-Maximal Suppression (ANMS). From a grayscale image normalized to $[0, 1]$, I compute Sobel gradients $I_x$, $I_y$, form the second-moment terms $I_x^2$, $I_y^2$, $I_x I_y$, smooth each with a Gaussian ($\sigma$ = window_sigma, default 1.5) to obtain $S_{xx}$, $S_{yy}$, $S_{xy}$, and evaluate the Harris response with

$$R = \det(M) - \kappa\,\mathrm{trace}(M)^2 = S_{xx}S_{yy} - S_{xy}^2 - \kappa\,(S_{xx} + S_{yy})^2,$$

with $\kappa$ = kappa (default 0.04). I then normalize $R$ to $[0, 1]$ and perform non-max suppression (NMS), keeping pixels above a quantile threshold (harris_quantile, default 0.995) and selecting local maxima; I cap this pre-set to the max_candidates strongest responses. For ANMS, I follow the standard radius rule: for each candidate $i$ with response $R_i$, compute the suppression radius $r_i = \min_j \lVert x_i - x_j \rVert$ over corners $j$ whose responses satisfy $R_i < c\,R_j$ (robustness $c$ = anms_robust, default 0.9); if none exist, I fall back to the nearest neighbor. I then keep the anms_keep (default 1000) points with the largest radii to ensure spatial diversity. Here I show the denser, clumped detections before ANMS versus the well-distributed set after ANMS.
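A sketch of the Harris response and the ANMS radius rule using the parameter names above; the scipy filters stand in for whatever gradient and blur code the script actually uses, and the nearest-neighbor fallback for isolated maxima is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(gray, window_sigma=1.5, kappa=0.04):
    """Harris response R = det(M) - kappa * trace(M)^2 on a [0, 1] grayscale image."""
    Ix = sobel(gray, axis=1)                       # derivative along x
    Iy = sobel(gray, axis=0)                       # derivative along y
    Sxx = gaussian_filter(Ix * Ix, window_sigma)
    Syy = gaussian_filter(Iy * Iy, window_sigma)
    Sxy = gaussian_filter(Ix * Iy, window_sigma)
    R = (Sxx * Syy - Sxy ** 2) - kappa * (Sxx + Syy) ** 2
    return (R - R.min()) / (R.max() - R.min() + 1e-12)   # normalize to [0, 1]

def anms(coords, responses, keep=1000, c=0.9):
    """Adaptive non-maximal suppression: keep the `keep` points with the largest radii.

    coords: (N, 2) array of (x, y); responses: (N,) Harris scores for those points.
    """
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Radius = distance to the nearest point that is sufficiently stronger.
        stronger = responses[i] < c * responses
        if stronger.any():
            radii[i] = np.linalg.norm(coords[stronger] - coords[i], axis=1).min()
    order = np.argsort(-radii)[:keep]
    return coords[order], responses[order]
```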
Feature Descriptor Extraction
For each keypoint from B.1, I build a blurred base image by converting to grayscale in $[0, 1]$ and applying a separable Gaussian blur. Around each keypoint I sample a $40 \times 40$ window using bilinear interpolation on a sub-pixel grid centered at the keypoint (half-pixel offsets), then downsample by average pooling over non-overlapping $5 \times 5$ cells to form an axis-aligned $8 \times 8$ patch. I then bias/gain normalize each descriptor by flattening to 64-D, subtracting the mean, and dividing by the standard deviation (with an $\epsilon$ for stability), yielding zero-mean/unit-std descriptors. To avoid boundary artifacts, keypoints within 20 pixels of the border (more generally, < win/2) are discarded so the full window stays in bounds; I also cap the processed set to --keep keypoints for efficiency. For the image in the section above, I get the features shown below.
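A simplified sketch of the descriptor extraction, assuming a 40x40 window pooled to 8x8 (5x5 cells); it samples at integer offsets rather than the sub-pixel grid described above, and the blur σ here is only a placeholder default.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(gray, keypoints, win=40, patch=8, sigma=2.0, eps=1e-8):
    """Axis-aligned descriptors: a win x win window around each keypoint on a blurred
    grayscale image, average-pooled to patch x patch, then bias/gain normalized to 64-D.

    gray: (H, W) float image in [0, 1]; keypoints: (N, 2) array of (x, y).
    Keypoints closer than win/2 to the border are skipped so the window stays in bounds.
    """
    blurred = gaussian_filter(gray, sigma)          # blur before downsampling to avoid aliasing
    h, w = gray.shape
    half, cell = win // 2, win // patch
    descriptors, kept = [], []
    for x, y in keypoints:
        x, y = int(round(x)), int(round(y))
        if x < half or y < half or x + half > w or y + half > h:
            continue                                # too close to the border
        window = blurred[y - half:y + half, x - half:x + half]
        # Average-pool non-overlapping cell x cell blocks down to patch x patch.
        pooled = window.reshape(patch, cell, patch, cell).mean(axis=(1, 3))
        d = pooled.flatten()
        d = (d - d.mean()) / (d.std() + eps)        # bias/gain normalization
        descriptors.append(d)
        kept.append((x, y))
    return np.array(descriptors), np.array(kept)
```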

Feature Matching
I match the B.2 descriptors by computing all pairwise squared Euclidean distances between the zero-mean/unit-std 64-D vectors from the two images. For each descriptor in image 1, I find the nearest (1-NN) and second-nearest (2-NN) neighbors in image 2 using a fast partial sort, and apply the ratio test with a tunable threshold (--ratio, default 0.8). I optionally enforce mutual (cross-check) consistency (--mutual) by requiring that image 2's nearest neighbor of the chosen match also points back to the same image 1 feature. As for implementation details, the distance matrix is computed in a single vectorized pass for speed, I guard small denominators in the ratio, and I trim the descriptor/point lists to a common length to keep indices aligned. In the end, I write a B3_matches.csv table with one row per match, and produce a visualization (shown below) that stacks the two images side by side, plots the matched keypoints, and draws connecting lines (capped by --max_plot, default 300).
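A sketch of the matching step; the squared-distance expansion and np.argpartition are my choices for the vectorized distances and the fast partial sort, and the flag names are only mirrored from the prose.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8, mutual=True, eps=1e-12):
    """Ratio-test matching between two sets of zero-mean/unit-std descriptors.

    desc1: (N1, 64), desc2: (N2, 64). Returns a list of (i, j) index pairs.
    """
    # All pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    # then square roots for the ratio test.
    d2 = (np.sum(desc1 ** 2, axis=1)[:, None]
          + np.sum(desc2 ** 2, axis=1)[None, :]
          - 2.0 * desc1 @ desc2.T)
    d = np.sqrt(np.maximum(d2, 0.0))
    # Indices of the two smallest distances per row (fast partial sort).
    nn2 = np.argpartition(d, 1, axis=1)[:, :2]
    matches = []
    for i in range(desc1.shape[0]):
        a, b = nn2[i]
        if d[i, a] > d[i, b]:
            a, b = b, a                              # a = nearest, b = second nearest
        if d[i, a] / (d[i, b] + eps) < ratio:        # Lowe-style ratio test
            matches.append((i, a))
    if mutual:
        # Cross-check: image 2's best match for j must point back to i.
        best_back = np.argmin(d, axis=0)
        matches = [(i, j) for i, j in matches if best_back[j] == i]
    return matches
```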


RANSAC for Robust Homography
From the B.3 matches $\{(p_i, p_i')\}$, I run 4-point RANSAC to estimate a homography robustly: each iteration randomly samples 4 correspondences (skipping degenerate quads with near-zero area), fits $H$ (both the normalized DLT and the $Ah = b$ least-squares solver are tried; the better model is kept), and scores all matches with a reprojection error (by default the symmetric transfer error $d(p', Hp)^2 + d(p, H^{-1}p')^2$) against an inlier threshold of 3 px. I keep the model with the largest inlier count, then re-estimate $H$ on all inliers using the chosen estimator. The implementation exposes --iters (default 3000), --thresh (px), --method (normalized/Ahb), and --one_sided (use only the forward error $d(p', Hp)$) for more possibilities that you can test on your own. It also logs inlier stats and saves per-pair inlier visualizations overlaying green (inlier) vs. red (outlier) matches. For mosaicing, I choose a reference image, set $H_{\text{ref}} = I$, and compose each non-reference $H_i$ with the translation into the global canvas (computed by warping all image corners, then applying a translation offset). Each image is inverse-warped (bilinear by default; --nn optional) and blended with feathered alpha averaging (alpha falls off toward the edges) to produce an automatic mosaic. Below I show the comparison between stitching manually and automatically.
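A sketch of the RANSAC loop, reusing a compute_homography-style estimator like the one sketched in the homography section; for brevity it scores with the one-sided transfer error and handles degenerate samples with a simple try/except instead of the near-zero-area check.

```python
import numpy as np

def ransac_homography(src, dst, estimator, iters=3000, thresh=3.0, seed=0):
    """4-point RANSAC: sample minimal sets, score by transfer error, refit on inliers.

    src, dst: (N, 2) matched points; estimator(src, dst) -> 3x3 H (src -> dst).
    Returns (H, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    n = src.shape[0]
    src_h = np.hstack([src, np.ones((n, 1))])        # homogeneous source points
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        try:
            H = estimator(src[idx], dst[idx])        # the full script instead skips
        except np.linalg.LinAlgError:                # near-zero-area quads
            continue
        proj = src_h @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)     # one-sided transfer error (px)
        inliers = np.isfinite(err) & (err < thresh)
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise RuntimeError("RANSAC did not find enough inliers")
    # Final least-squares fit on all inliers of the best model.
    return estimator(src[best_inliers], dst[best_inliers]), best_inliers
```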
