Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection
Ziteng (Ender) Ji
Introduction
This project reconstructs color photographs from Sergei Prokudin-Gorskii’s digitized glass plates by splitting each plate into three grayscale channels (B, G, R) and aligning G and R to B using only x-y translations. We score candidate shifts with simple similarity metrics (e.g., L2/SSD or NCC) and report the chosen displacement vectors. To handle large, high-resolution scans efficiently, we accelerate alignment with a coarse-to-fine image pyramid. The result is a single, well-aligned RGB image with minimal visual artifacts.
Background
Sergei Mikhailovich Prokudin-Gorskii pioneered early color photography by capturing three sequential exposures of the same scene through blue, green, and red filters on a single glass plate. Decades later, the Library of Congress digitized these B-G-R plates, revealing remarkable views of the late Russian Empire but also exposing practical issues: the three channel images are vertically stacked, often misaligned, and can differ in intensity or contrast. Reconstructing a faithful color photo therefore requires separating the plate into its three grayscale channels and precisely aligning the green and red images to the blue reference using a translation model.
Methodology
I split each glass plate into three grayscale images B, G, R (top→bottom) and treat alignment as integer translations of G and R onto the base channel B. For any candidate shift (dy, dx), I compare only the valid overlap between the two images (no wraparound), then choose the shift that maximizes a similarity score.
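The valid-overlap comparison can be sketched in numpy as follows; the helper name shift_overlap is illustrative, not the project's actual API:

```python
import numpy as np

def shift_overlap(ref, mov, dy, dx):
    """Return the overlapping regions of ref and mov after translating
    mov by (dy, dx), with no wraparound (pixels shifted out are dropped)."""
    h, w = ref.shape
    # Region of ref covered by the translated mov image.
    y0, y1 = max(dy, 0), min(h, h + dy)
    x0, x1 = max(dx, 0), min(w, w + dx)
    ref_part = ref[y0:y1, x0:x1]
    mov_part = mov[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
    return ref_part, mov_part
```

Only these cropped slices are scored, so border pixels that one channel lacks never contaminate the similarity metric.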
Before scoring, each image is optionally converted to an edge map with a Sobel filter to suppress cross-channel intensity differences, then standardized to zero mean and unit variance: x̂ = (x − μ) / σ, where μ and σ are the image's mean and standard deviation.
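A minimal numpy sketch of this preprocessing (the report uses skimage's Sobel filter; the hand-rolled 3×3 version here is an equivalent stand-in):

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (edge-padded)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T

    def conv3(im, k):
        out = np.zeros_like(im)
        p = np.pad(im, 1, mode='edge')
        for i in range(3):
            for j in range(3):
                out += k[i, j] * p[i:i + im.shape[0], j:j + im.shape[1]]
        return out

    return np.hypot(conv3(img, kx), conv3(img, ky))

def standardize(img, eps=1e-8):
    """Zero-mean, unit-variance normalization (eps guards flat images)."""
    return (img - img.mean()) / (img.std() + eps)
```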
Similarity Metric
I support SSD and NCC, both framed so that larger is better (SSD is negated):

score_SSD(A, B) = −Σᵢ (Aᵢ − Bᵢ)²
score_NCC(A, B) = ⟨Â, B̂⟩ / (‖Â‖ ‖B̂‖), where Â and B̂ denote the zero-mean images.
By default I use NCC on edges, which is insensitive to global brightness and works well when channel intensities differ.
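Both metrics are a few lines of numpy; this sketch assumes float images and negates SSD so that both scores are maximized:

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equal-size images, in [-1, 1]."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def neg_ssd(a, b):
    """Sum of squared differences, negated so larger is better."""
    return -float(np.sum((a - b) ** 2))
```

NCC's per-image mean subtraction and norm division are what make it insensitive to global brightness and contrast shifts between channels.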
Single-Scale Exhaustive Search
Given a center shift (cy, cx) and a search radius r (default r = 15), I evaluate all integer shifts (dy, dx) with |dy − cy| ≤ r and |dx − cx| ≤ r. For each candidate, I compute the valid overlapping slices, apply the interior crop, score with SSD/NCC, and keep the argmax. This yields one displacement vector for G and one for R.
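Putting the overlap, crop, and NCC pieces together, the single-scale search can be sketched as below (align_exhaustive is an illustrative name; the 10% interior crop matches the margin described later in this report):

```python
import numpy as np

def align_exhaustive(ref, mov, center=(0, 0), radius=15, crop=0.1):
    """Brute-force search over integer shifts within `radius` of `center`,
    scoring NCC on the interior-cropped valid overlap. Returns (dy, dx)."""
    h, w = ref.shape
    best, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            # Valid overlap of ref and mov shifted by (dy, dx), no wraparound.
            y0, y1 = max(dy, 0), min(h, h + dy)
            x0, x1 = max(dx, 0), min(w, w + dx)
            if y1 - y0 < 2 or x1 - x0 < 2:
                continue
            a = ref[y0:y1, x0:x1]
            b = mov[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
            # Discard a fixed interior margin to suppress border artifacts.
            my, mx = int(crop * a.shape[0]), int(crop * a.shape[1])
            a = a[my:a.shape[0] - my, mx:a.shape[1] - mx]
            b = b[my:b.shape[0] - my, mx:b.shape[1] - mx]
            av = a.ravel() - a.mean()
            bv = b.ravel() - b.mean()
            score = av @ bv / (np.linalg.norm(av) * np.linalg.norm(bv) + 1e-8)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```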
Coarse-to-Fine Pyramid
To handle large .tif scans efficiently, I build a Gaussian pyramid by repeatedly downscaling by a factor of 2 (anti-aliased) until the minimum dimension falls below a small threshold. Starting from the coarsest level, I run the exhaustive search (a wide radius at the coarsest level, a small radius at finer levels) and propagate shifts upward by doubling the previous estimate to seed the next level. The finest level returns the pixel-accurate integer shift.
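A compact recursive sketch of the coarse-to-fine scheme, assuming the threshold and radii below (the report's exact values may differ), with 2×2 mean pooling standing in for the anti-aliased Gaussian downscale:

```python
import numpy as np

def downscale2(img):
    """Half-size via 2x2 mean pooling (a simple anti-aliasing stand-in)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(ref, mov, center, radius):
    """Minimal NCC brute force over the window (see the single-scale section)."""
    h, w = ref.shape
    best, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            y0, y1 = max(dy, 0), min(h, h + dy)
            x0, x1 = max(dx, 0), min(w, w + dx)
            if y1 - y0 < 4 or x1 - x0 < 4:
                continue
            a = ref[y0:y1, x0:x1].ravel()
            b = mov[y0 - dy:y1 - dy, x0 - dx:x1 - dx].ravel()
            a, b = a - a.mean(), b - b.mean()
            s = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            if s > best_score:
                best_score, best = s, (dy, dx)
    return best

def align_pyramid(ref, mov, min_size=32, coarse_radius=15, fine_radius=2):
    """Recurse to the coarsest level, then refine: double the coarse
    estimate to seed a narrow search at each finer level."""
    if min(ref.shape) <= min_size:
        return search(ref, mov, (0, 0), coarse_radius)
    dy, dx = align_pyramid(downscale2(ref), downscale2(mov), min_size,
                           coarse_radius, fine_radius)
    return search(ref, mov, (2 * dy, 2 * dx), fine_radius)
```

Because each finer level only searches a small window around the doubled seed, the total cost grows roughly linearly in image area instead of quadratically in the search radius.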
Warping & Composition
I apply the estimated shifts with safe overlap (no wrapping) to obtain the aligned G and R channels, then stack them with B into an RGB image. The result is clipped to [0, 1] and saved as 8-bit (img_as_ubyte) to avoid format issues.
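The composition step can be sketched as follows; compose_rgb is an illustrative name, and the final conversion approximates skimage's img_as_ubyte for float inputs in [0, 1]:

```python
import numpy as np

def compose_rgb(b, g, r, shift_g, shift_r):
    """Translate G and R onto B with zero padding (no wraparound) and
    stack into an 8-bit RGB image."""
    def place(ch, dy, dx):
        out = np.zeros_like(ch)
        h, w = ch.shape
        y0, y1 = max(dy, 0), min(h, h + dy)
        x0, x1 = max(dx, 0), min(w, w + dx)
        out[y0:y1, x0:x1] = ch[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
        return out

    rgb = np.dstack([place(r, *shift_r), place(g, *shift_g), b])
    # Clip to [0, 1] and quantize to 8-bit, as img_as_ubyte would.
    return (np.clip(rgb, 0.0, 1.0) * 255).round().astype(np.uint8)
```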
Conclusion: Why These Choices?
Edges + NCC increase robustness to channel-dependent brightness; interior cropping avoids border artifacts; integer translations match the assignment model; and the pyramid preserves accuracy while cutting the brute-force search cost by orders of magnitude.
Result
If you are using my codebase, run python proj1.py xxx.jpg or python proj1.py xxx.tif.

G:(12, 54), R:(9, 111)
Problems Encountered
A recurring challenge was ambiguous alignment in low-texture regions. On plates with large sky/water areas or smooth façades, many candidate shifts produce nearly identical scores, and raw pixel SSD often snaps to a wrong local maximum; border artifacts further confuse the match. I mitigated this by (i) computing scores on Sobel edges with zero-mean/unit-variance normalization to emphasize structure and suppress brightness differences, (ii) discarding a fixed 10% interior crop to remove noisy borders, and (iii) using a coarse-to-fine pyramid so fine-level search starts near the correct basin. These steps eliminated most gross failures, though in extremely textureless scenes a small ±1 pixel residual can remain.