Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

Ziteng (Ender) Ji

Introduction

This project reconstructs color photographs from Sergei Prokudin-Gorskii’s digitized glass plates by splitting each plate into three grayscale channels (B, G, R) and aligning G and R to B using only x-y translations. We score candidate shifts with simple similarity metrics (e.g., L2/SSD or NCC) and report the chosen displacement vectors. To handle large, high-resolution scans efficiently, we accelerate alignment with a coarse-to-fine image pyramid. The result is a single, well-aligned RGB image with minimal visual artifacts.

Background

Sergei Mikhailovich Prokudin-Gorskii pioneered early color photography by capturing three sequential exposures of the same scene through blue, green, and red filters on a single glass plate. Decades later, the Library of Congress digitized these B-G-R plates, revealing remarkable views of the late Russian Empire but also exposing practical issues: the three channel images are vertically stacked, often misaligned, and can differ in intensity or contrast. Reconstructing a faithful color photo therefore requires separating the plate into its three grayscale channels and precisely aligning the green and red images to the blue reference using a translation model.

Methodology

I split each glass plate into three grayscale images B, G, R (top→bottom) and treat alignment as integer translations of G and R onto the base channel B. For any candidate shift (Δy, Δx), I compare only the valid overlap between the two images (no wraparound), then choose the shift that maximizes a similarity score.
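As a minimal sketch of the overlap computation (assuming NumPy; the helper name and exact slicing are illustrative, not the project's verbatim code):

```python
import numpy as np

def overlap(moving, base, dy, dx):
    # Valid overlapping regions when `moving` is translated by (dy, dx)
    # onto `base`, with no wraparound: pixel (y, x) of `moving` lands at
    # (y + dy, x + dx) in `base` coordinates.
    h, w = base.shape
    y0, y1 = max(dy, 0), min(h + dy, h)   # rows of `base` covered after the shift
    x0, x1 = max(dx, 0), min(w + dx, w)   # cols of `base` covered after the shift
    return moving[y0 - dy:y1 - dy, x0 - dx:x1 - dx], base[y0:y1, x0:x1]
```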

Before scoring, each image is optionally converted to an edge map with a Sobel filter to suppress color differences, then standardized to zero-mean, unit-variance:

$$\hat{I}=\frac{\operatorname{Sobel}(I)-\mu}{\sigma+\varepsilon}$$
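A sketch of this preprocessing step, assuming scikit-image's sobel filter (the eps default is an illustrative choice):

```python
from skimage.filters import sobel

def preprocess(img, eps=1e-8):
    # Edge map (Sobel gradient magnitude) followed by zero-mean,
    # unit-variance standardization; eps guards against division by zero.
    e = sobel(img)
    return (e - e.mean()) / (e.std() + eps)
```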

Similarity Metric

I support SSD and NCC; both are written so that larger is better (SSD is negated):

$$S_{\text{SSD}}(A,B)=-\sum_{(y,x)}\left(A(y,x)-B(y,x)\right)^2,\qquad S_{\text{NCC}}(A,B)=\frac{\sum A\cdot B}{\|A\|_2\,\|B\|_2}$$

By default I use NCC on edges, which is insensitive to global brightness and works well when channel intensities differ.
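Both metrics are a few lines of NumPy. A sketch, with SSD negated so the same argmax convention works for both (the small constant in NCC is an assumption to avoid division by zero):

```python
import numpy as np

def score_ssd(a, b):
    # Negated sum of squared differences: higher is better.
    return -np.sum((a - b) ** 2)

def score_ncc(a, b):
    # Normalized cross-correlation: cosine similarity of the two
    # (already standardized) images.
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```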

Single-Scale Exhaustive Search

Given a center (c_y, c_x) and radius r (default r = 15), I evaluate all integer shifts (Δy, Δx) ∈ [c_y ± r] × [c_x ± r]. For each candidate, I compute the valid overlapping slices, apply the interior crop, score with SSD/NCC, and keep the argmax. This yields (Δy_G, Δx_G) for G→B and (Δy_R, Δx_R) for R→B.
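Putting the pieces together, a sketch of the exhaustive search built on the overlap and score helpers above (the 10% crop fraction follows the Problems Encountered section; the signature is illustrative):

```python
import numpy as np

def search(moving, base, cy=0, cx=0, r=15, score=score_ncc, crop=0.10):
    # Evaluate every integer shift in [cy - r, cy + r] x [cx - r, cx + r]
    # and return the one with the best similarity score.
    best, best_shift = -np.inf, (cy, cx)
    for dy in range(cy - r, cy + r + 1):
        for dx in range(cx - r, cx + r + 1):
            a, b = overlap(moving, base, dy, dx)
            if a.size == 0:
                continue
            # Interior crop: drop a fixed fraction of the border on each side.
            my, mx = int(a.shape[0] * crop), int(a.shape[1] * crop)
            s = score(a[my:a.shape[0] - my, mx:a.shape[1] - mx],
                      b[my:b.shape[0] - my, mx:b.shape[1] - mx])
            if s > best:
                best, best_shift = s, (dy, dx)
    return best_shift
```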

Coarse-to-Fine Pyramid

To handle large .tif scans efficiently, I build a Gaussian pyramid by repeatedly downscaling by 0.5 (anti-aliased) until the minimum dimension is ≤ 400. Starting from the coarsest level, I run the exhaustive search (wide radius at the coarsest level, radius = 4 at finer levels) and propagate shifts upward by doubling the previous estimate to seed the next level. The final level returns the pixel-accurate integer shift.
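A sketch of the coarse-to-fine driver, assuming skimage.transform.rescale for the anti-aliased 0.5× downscaling and the search/preprocess helpers above:

```python
from skimage.transform import rescale

def pyramid_align(moving, base, min_dim=400):
    # Build the pyramid: halve resolution until the smaller side is <= min_dim.
    levels = [(moving, base)]
    while min(levels[-1][1].shape) > min_dim:
        m, b = levels[-1]
        levels.append((rescale(m, 0.5, anti_aliasing=True),
                       rescale(b, 0.5, anti_aliasing=True)))
    # Coarse-to-fine: wide search at the coarsest level, then refine.
    dy = dx = 0
    for i, (m, b) in enumerate(reversed(levels)):
        if i > 0:
            dy, dx = 2 * dy, 2 * dx      # double the coarse estimate as the seed
        r = 15 if i == 0 else 4
        dy, dx = search(preprocess(m), preprocess(b), cy=dy, cx=dx, r=r)
    return dy, dx
```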

Warping & Composition

I apply the estimated shifts with safe overlap (no wraparound) to obtain G′ and R′, then stack into RGB as [R′, G′, B]. The result is clipped to [0, 1] and saved as 8-bit (img_as_ubyte) to avoid format issues.
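A sketch of the final warp and stack, assuming B, G, R are the split channels as float images in [0, 1] (the output filename is illustrative):

```python
import numpy as np
from skimage import img_as_ubyte
from skimage.io import imsave

def shift_no_wrap(img, dy, dx):
    # Integer translation with zero fill instead of np.roll's wraparound.
    h, w = img.shape
    out = np.zeros_like(img)
    y0, y1 = max(dy, 0), min(h + dy, h)
    x0, x1 = max(dx, 0), min(w + dx, w)
    out[y0:y1, x0:x1] = img[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
    return out

# B, G, R: grayscale channels split from the plate (assumed already loaded).
dyg, dxg = pyramid_align(G, B)           # G -> B displacement
dyr, dxr = pyramid_align(R, B)           # R -> B displacement
rgb = np.dstack([shift_no_wrap(R, dyr, dxr),
                 shift_no_wrap(G, dyg, dxg), B])
imsave('out.jpg', img_as_ubyte(np.clip(rgb, 0, 1)))
```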

Conclusion: Why These Choices?

Edges + NCC increase robustness to channel-dependent brightness; interior cropping avoids border artifacts; integer translations match the assignment model; and the pyramid preserves accuracy while cutting the brute-force search cost by orders of magnitude.

Results

If you are using my codebase, run python proj1.py xxx.jpg or python proj1.py xxx.tif. Each line below lists the estimated (Δy, Δx) displacement of G and R relative to B for one plate.

G:(2, 5), R:(3, 12)

G:(2, -3), R:(2, 3)

G:(3, 3), R:(3, 6)

G:(4, 25), R:(-4, 58)

G:(24, 49), R:(40, 107)

G:(17, 60), R:(17, 124)

G:(17, 42), R:(23, 90)

G:(22, 38), R:(36, 77)

G:(-2, -3), R:(-8, 76)

G:(-17, 41), R:(-29, 92)

G:(10, 80), R:(13, 177)

G:(29, 78), R:(37, 176)

G:(-6, 49), R:(-24, 96)

G:(12, 54), R:(9, 111)

G:(-8, 48), R:(-15, 111)

G:(-17, 41), R:(-29, 92)

G:(-16, 33), R:(-25, 78)

G:(-12, 24), R:(-22, 96)

Problems Encountered

A recurring challenge was ambiguous alignment in low-texture regions. On plates with large sky/water areas or smooth façades, many candidate shifts produce nearly identical scores, and raw pixel SSD often snaps to a wrong local maximum; border artifacts further confuse the match. I mitigated this by (i) computing scores on Sobel edges with zero-mean/unit-variance normalization to emphasize structure and suppress brightness differences, (ii) applying a fixed 10% interior crop to discard noisy borders, and (iii) using a coarse-to-fine pyramid so the fine-level search starts near the correct basin. These steps eliminated most gross failures, though in extremely textureless scenes a small ±1 pixel residual error can remain.