Paper 62: Amplitude Structure Analysis

Authors: John Mobley & MASCOM PhotonicMind
Date: 2026-03-07
Status: PARTIALLY VALIDATED – amplitudes are moderately structured but NOT corpus-predictable
Experiment: mascom_data/ct_experiment/amplitude_structure_exp.py

Abstract

Paper 61 identified 16,384 Gaussian amplitudes as the irreducible training signal in a 10.2M-parameter model (623x compression). This paper tests whether those amplitudes have internal structure that could enable further compression, or even full crystallization. The amplitudes ARE moderately structured (global participation ratio PR = 4.26 out of a maximum of 8, with one layer at PR = 1.03), but the structure is NOT predictable from corpus statistics (score-prediction R^2 = -0.027). Amplitude-only SFT achieves 35.4% of full-SFT efficiency with 5.2% of the parameters, confirming that weight information concentrates in the projection layers.

Key Results

| Metric | Value | Interpretation |
|---|---|---|
| Global PR | 4.26 / 8 | Moderately structured (~53% of max dimensionality) |
| PC1 variance | 42.5% | Dominant but not overwhelming |
| Components for 95% | 7 / 8 | Nearly full rank globally |
| Score prediction R^2 | -0.027 | NOT predictable from corpus |
| PC-EtE alignment | 0.674 | Moderate basis alignment |
| Amp-only SFT efficiency | 35.4% | 1/3 of full SFT with 5.2% params |

Per-Layer Analysis

The structure varies dramatically across layers:

| Layer | PR | PC1% | Max Corr | Kurtosis |
|---|---|---|---|---|
| L0 c_attn | 2.95 | 53.9% | 0.742 | 89.0 |
| L3 c_attn | 2.84 | 53.3% | 0.734 | 150.9 |
| L4 c_proj | 1.03 | 98.5% | 0.997 | 246.4 |
| L5 c_proj | 3.41 | 48.8% | 0.810 | 56.7 |
| L7 c_proj | 6.18 | 25.8% | 0.424 | 1.3 |

L4 c_proj is essentially 1-dimensional – a single principal component captures 98.5% of its amplitude variance. This matrix is fully crystallizable. In contrast, L7 c_proj is nearly full-rank (PR=6.18).
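The PR and PC1% columns can be computed from an (n_rows, 8) amplitude matrix with standard PCA. A minimal sketch (function names are ours, not the experiment script's); a rank-1 matrix recovers PR near 1, like L4 c_proj, while isotropic noise recovers PR near 8:

```python
import numpy as np

def participation_ratio(amps):
    """PR = (sum lambda)^2 / sum(lambda^2) over the PCA eigenvalues of the
    amplitude matrix; ranges from 1 (rank-1) to n_dims (isotropic)."""
    centered = amps - amps.mean(axis=0)
    evals = np.clip(np.linalg.eigvalsh(np.cov(centered, rowvar=False)), 0.0, None)
    return evals.sum() ** 2 / (evals ** 2).sum()

def pc1_fraction(amps):
    """Fraction of amplitude variance captured by the first principal component."""
    centered = amps - amps.mean(axis=0)
    evals = np.clip(np.linalg.eigvalsh(np.cov(centered, rowvar=False)), 0.0, None)
    return evals.max() / evals.sum()

rng = np.random.default_rng(0)
rank1 = np.outer(rng.normal(size=2048), rng.normal(size=8))  # crystallizable case
print(participation_ratio(rank1))                   # close to 1, like L4 c_proj
print(participation_ratio(rng.normal(size=(2048, 8))))  # close to 8, like L7 c_proj
```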

The Crystallization Gradient

Rather than a sharp crystallization boundary, the model exhibits a GRADIENT:

- L4 c_proj: PR=1.03 – CRYSTALLIZABLE (1 free parameter per row)
- L0/L3 attention: PR~2.9 – COMPRESSIBLE (3 free parameters per row)
- L7 c_proj: PR=6.18 – IRREDUCIBLE (needs all 8 amplitudes)

This suggests early layers learn simpler patterns (lower PR) while later layers encode more complex, higher-dimensional information.
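The gradient can be bucketed with two PR cutoffs. The thresholds below are illustrative choices of ours that separate the reported values (1.03, ~2.9, 6.18), not cutoffs from the experiment:

```python
def crystallization_class(pr, crystal_max=1.5, compressible_max=4.0):
    """Bucket a layer by participation ratio. Thresholds are illustrative,
    chosen only to separate the paper's reported per-layer values."""
    if pr <= crystal_max:
        return "CRYSTALLIZABLE"
    if pr <= compressible_max:
        return "COMPRESSIBLE"
    return "IRREDUCIBLE"

for name, pr in {"L4 c_proj": 1.03, "L0 c_attn": 2.95, "L7 c_proj": 6.18}.items():
    print(name, crystallization_class(pr))
```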

Amplitude-Only SFT

Training only c_proj weights (5.2% of total parameters):

- Baseline loss: 6.35
- Full SFT (50 steps): 2.26 (64.4% improvement)
- Amp-only SFT (50 steps): 4.90 (22.8% improvement)
- Efficiency ratio: 35.4%

This confirms that projection layers carry disproportionate information, but attention matrices and MLPs also contribute significantly (the remaining 64.6%).
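The efficiency ratio is the amp-only loss improvement divided by the full-SFT improvement; the reported figures are reproduced by:

```python
def improvement(baseline, final):
    """Fractional loss reduction relative to the untrained baseline."""
    return (baseline - final) / baseline

baseline = 6.35
full_sft = improvement(baseline, 2.26)   # 64.4% improvement
amp_only = improvement(baseline, 4.90)   # 22.8% improvement
efficiency = amp_only / full_sft         # 35.4% of full SFT
print(f"{full_sft:.1%} {amp_only:.1%} {efficiency:.1%}")
```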

Effective Parameter Count

| Stage | Parameters | Compression |
|---|---|---|
| Raw model | 10,200,000 | 1x |
| L1 amplitudes only | 16,384 | 623x |
| Subspace (PR=4.3) | 14,392 | 709x |

The subspace gain is modest (623x to 709x) because the amplitudes are nearly full rank globally: 7 of 8 principal components are needed to capture 95% of the variance (global PR = 4.26). The amplitudes are structured enough to be interesting, but not enough for dramatic further compression.
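The 14,392 subspace figure is consistent with storing 7 per-row coordinates for the 2,048 rows plus a shared 7x8 basis; that decomposition is our assumption, but it reproduces the table exactly:

```python
raw_params = 10_200_000
n_rows, n_dims = 2048, 8              # 2048 rows x 8 Gaussian amplitudes = 16,384
k95 = 7                               # components needed for 95% of variance

amp_params = n_rows * n_dims                    # amplitudes only
subspace_params = n_rows * k95 + k95 * n_dims   # per-row coords + shared basis

print(amp_params, round(raw_params / amp_params))          # 16384, 623x
print(subspace_params, round(raw_params / subspace_params))  # 14392, 709x
```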

Implications

For the Crystallization Boundary (Updated)

For Zero-Training Genesis

Full crystallization remains blocked. The amplitude subspace basis does NOT align well enough with E^T@E eigenvectors (mean alignment 0.674, but R^2=-0.03 for score prediction). The amplitudes encode training-trajectory information that corpus statistics alone cannot provide.
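The paper does not spell out its alignment metric; one plausible definition is the mean best-match |cosine| between each amplitude principal component and the E^T@E eigenvectors, sketched here under that assumption:

```python
import numpy as np

def basis_alignment(pcs, eigvecs):
    """Mean of the best |cosine| match for each principal component (rows of
    pcs) against the E^T @ E eigenvectors (rows of eigvecs); both unit-norm.
    Returns 1.0 for identical bases, lower for misaligned ones."""
    sims = np.abs(pcs @ eigvecs.T)    # (k, k) matrix of |cos| overlaps
    return sims.max(axis=1).mean()

# An orthonormal basis aligns perfectly with itself:
basis = np.linalg.qr(np.random.default_rng(1).normal(size=(8, 8)))[0]
print(basis_alignment(basis, basis))  # 1.0
```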

For Efficient Training

The PR gradient suggests an adaptive training strategy: freeze low-PR layers early (they converge fast), keep high-PR layers trainable longer (they need more exploration). This could accelerate SFT by 2-3x.
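One simple way to realize this schedule is to freeze each layer after a number of steps proportional to its PR. This is a heuristic sketch of ours, not the paper's implementation:

```python
def freeze_schedule(layer_pr, total_steps=50, pr_max=8.0):
    """Assign each layer a freeze step proportional to its PR: low-PR
    (near-crystallized) layers stop training early, high-PR layers
    remain trainable for most of the run."""
    return {name: max(1, round(total_steps * pr / pr_max))
            for name, pr in layer_pr.items()}

sched = freeze_schedule({"L4 c_proj": 1.03, "L0 c_attn": 2.95, "L7 c_proj": 6.18})
print(sched)  # L4 freezes first, L7 trains nearly the full 50 steps
```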


“The model is a gradient of crystallization – some weights are frozen music, others are still being composed.”