Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

Ziteng (Ender) Ji

Introduction

This project reconstructs color photographs from Sergei Prokudin-Gorskii’s digitized glass plates by splitting each plate into three grayscale channels (B, G, R) and aligning G and R to B using only x-y translations. We score candidate shifts with simple similarity metrics (e.g., L2/SSD or NCC) and report the chosen displacement vectors. To handle large, high-resolution scans efficiently, we accelerate alignment with a coarse-to-fine image pyramid. The result is a single, well-aligned RGB image with minimal visual artifacts.

Background

Sergei Mikhailovich Prokudin-Gorskii pioneered early color photography by capturing three sequential exposures of the same scene through blue, green, and red filters on a single glass plate. Decades later, the Library of Congress digitized these B-G-R plates, revealing remarkable views of the late Russian Empire but also exposing practical issues: the three channel images are vertically stacked, often misaligned, and can differ in intensity or contrast. Reconstructing a faithful color photo therefore requires separating the plate into its three grayscale channels and precisely aligning the green and red images to the blue reference using a translation model.

Methodology

I split each glass plate into three grayscale images B, G, R (top→bottom) and treat alignment as integer translations of G and R onto the base channel B. For any candidate shift (Δy, Δx), I compare only the valid overlap between the two images (no wraparound), then choose the shift that maximizes a similarity score.
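As a minimal sketch of the overlap computation (assuming NumPy; the helper name and exact slicing are illustrative, not the project's verbatim code):

```python
import numpy as np

def overlap(moving, base, dy, dx):
    # Valid overlapping regions when `moving` is translated by (dy, dx)
    # onto `base`, with no wraparound: pixel (y, x) of `moving` lands at
    # (y + dy, x + dx) in `base` coordinates.
    h, w = base.shape
    y0, y1 = max(dy, 0), min(h + dy, h)   # rows of `base` covered after the shift
    x0, x1 = max(dx, 0), min(w + dx, w)   # cols of `base` covered after the shift
    return moving[y0 - dy:y1 - dy, x0 - dx:x1 - dx], base[y0:y1, x0:x1]
```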

Before scoring, each image is optionally converted to an edge map with a Sobel filter to suppress color differences, then standardized to zero-mean, unit-variance:

$$\hat{I}=\frac{\operatorname{Sobel}(I)-\mu}{\sigma+\varepsilon}$$
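A sketch of this preprocessing step, assuming scikit-image's sobel filter (the eps default is an illustrative choice):

```python
from skimage.filters import sobel

def preprocess(img, eps=1e-8):
    # Edge map (Sobel gradient magnitude) followed by zero-mean,
    # unit-variance standardization; eps guards against division by zero.
    e = sobel(img)
    return (e - e.mean()) / (e.std() + eps)
```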

Similarity Metric

I support SSD and NCC; both are written so that larger is better (SSD is negated):

$$S_{\text{SSD}}(A,B)=-\sum_{(y,x)}\left(A(y,x)-B(y,x)\right)^2,\qquad S_{\text{NCC}}(A,B)=\frac{\sum A\cdot B}{\|A\|_2\,\|B\|_2}$$

By default I use NCC on edges, which is insensitive to global brightness and works well when channel intensities differ.
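Both metrics are a few lines of NumPy. A sketch, with SSD negated so the same argmax convention works for both (the small constant in NCC is an assumption to avoid division by zero):

```python
import numpy as np

def score_ssd(a, b):
    # Negated sum of squared differences: higher is better.
    return -np.sum((a - b) ** 2)

def score_ncc(a, b):
    # Normalized cross-correlation: cosine similarity of the two
    # (already standardized) images.
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```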

Single-Scale Exhaustive Search

Given a center (c_y, c_x) and radius r (default r = 15), I evaluate all integer shifts (Δy, Δx) ∈ [c_y ± r] × [c_x ± r]. For each candidate, I compute the valid overlapping slices, apply the interior crop, score with SSD/NCC, and keep the argmax. This yields (Δy_G, Δx_G) for G→B and (Δy_R, Δx_R) for R→B.
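Putting the pieces together, a sketch of the exhaustive search built on the overlap and score helpers above (the 10% crop fraction follows the Problems Encountered section; the signature is illustrative):

```python
import numpy as np

def search(moving, base, cy=0, cx=0, r=15, score=score_ncc, crop=0.10):
    # Evaluate every integer shift in [cy - r, cy + r] x [cx - r, cx + r]
    # and return the one with the best similarity score.
    best, best_shift = -np.inf, (cy, cx)
    for dy in range(cy - r, cy + r + 1):
        for dx in range(cx - r, cx + r + 1):
            a, b = overlap(moving, base, dy, dx)
            if a.size == 0:
                continue
            # Interior crop: drop a fixed fraction of the border on each side.
            my, mx = int(a.shape[0] * crop), int(a.shape[1] * crop)
            s = score(a[my:a.shape[0] - my, mx:a.shape[1] - mx],
                      b[my:b.shape[0] - my, mx:b.shape[1] - mx])
            if s > best:
                best, best_shift = s, (dy, dx)
    return best_shift
```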

Coarse-to-Fine Pyramid

To handle large .tif scans efficiently, I build a Gaussian pyramid by repeatedly downscaling by 0.5 (anti-aliased) until the minimum dimension is ≤ 400. Starting from the coarsest level, I run the exhaustive search (wide radius at the coarsest level, radius = 4 at finer levels) and propagate shifts upward by doubling the previous estimate to seed the next level. The final level returns the pixel-accurate integer shift.
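A sketch of the coarse-to-fine driver, assuming skimage.transform.rescale for the anti-aliased 0.5× downscaling and the search/preprocess helpers above:

```python
from skimage.transform import rescale

def pyramid_align(moving, base, min_dim=400):
    # Build the pyramid: halve resolution until the smaller side is <= min_dim.
    levels = [(moving, base)]
    while min(levels[-1][1].shape) > min_dim:
        m, b = levels[-1]
        levels.append((rescale(m, 0.5, anti_aliasing=True),
                       rescale(b, 0.5, anti_aliasing=True)))
    # Coarse-to-fine: wide search at the coarsest level, then refine.
    dy = dx = 0
    for i, (m, b) in enumerate(reversed(levels)):
        if i > 0:
            dy, dx = 2 * dy, 2 * dx      # double the coarse estimate as the seed
        r = 15 if i == 0 else 4
        dy, dx = search(preprocess(m), preprocess(b), cy=dy, cx=dx, r=r)
    return dy, dx
```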

Warping & Composition

I apply the estimated shifts with safe overlap (no wraparound) to obtain G′ and R′, then stack into RGB as [R′, G′, B]. The result is clipped to [0, 1] and saved as 8-bit (img_as_ubyte) to avoid format issues.
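A sketch of the final warp and stack, assuming B, G, R are the split channels as float images in [0, 1] (the output filename is illustrative):

```python
import numpy as np
from skimage import img_as_ubyte
from skimage.io import imsave

def shift_no_wrap(img, dy, dx):
    # Integer translation with zero fill instead of np.roll's wraparound.
    h, w = img.shape
    out = np.zeros_like(img)
    y0, y1 = max(dy, 0), min(h + dy, h)
    x0, x1 = max(dx, 0), min(w + dx, w)
    out[y0:y1, x0:x1] = img[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
    return out

# B, G, R: grayscale channels split from the plate (assumed already loaded).
dyg, dxg = pyramid_align(G, B)           # G -> B displacement
dyr, dxr = pyramid_align(R, B)           # R -> B displacement
rgb = np.dstack([shift_no_wrap(R, dyr, dxr),
                 shift_no_wrap(G, dyg, dxg), B])
imsave('out.jpg', img_as_ubyte(np.clip(rgb, 0, 1)))
```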

Conclusion: Why These Choices?

Edges + NCC increase robustness to channel-dependent brightness; interior cropping avoids border artifacts; integer translations match the assignment model; and the pyramid preserves accuracy while cutting the brute-force search cost by orders of magnitude.

Results

If you are using my codebase, run python proj1.py xxx.jpg or python proj1.py xxx.tif. Each line below lists the estimated (Δy, Δx) displacement of G and R relative to B for one plate.

G:(2, 5), R:(3, 12)

G:(2, -3), R:(2, 3)

G:(3, 3), R:(3, 6)

G:(4, 25), R:(-4, 58)

G:(24, 49), R:(40, 107)

G:(17, 60), R:(17, 124)

G:(17, 42), R:(23, 90)

G:(22, 38), R:(36, 77)

G:(-2, -3), R:(-8, 76)

G:(-17, 41), R:(-29, 92)

G:(10, 80), R:(13, 177)

G:(29, 78), R:(37, 176)

G:(-6, 49), R:(-24, 96)

G:(12, 54), R:(9, 111)

G:(-8, 48), R:(-15, 111)

G:(-17, 41), R:(-29, 92)

G:(-16, 33), R:(-25, 78)

G:(-12, 24), R:(-22, 96)

Problems Encountered

A recurring challenge was ambiguous alignment in low-texture regions. On plates with large sky/water areas or smooth façades, many candidate shifts produce nearly identical scores, and raw pixel SSD often snaps to a wrong local maximum; border artifacts further confuse the match. I mitigated this by (i) computing scores on Sobel edges with zero-mean/unit-variance normalization to emphasize structure and suppress brightness differences, (ii) applying a fixed 10% interior crop to discard noisy borders, and (iii) using a coarse-to-fine pyramid so the fine-level search starts near the correct basin. These steps eliminated most gross failures, though in extremely textureless scenes a small ±1 pixel residual error can remain.