Authors: John Mobley & MASCOM PhotonicMind
Date: 2026-03-07
Status: PARTIALLY VALIDATED – amplitudes are moderately structured but NOT corpus-predictable
Experiment: mascom_data/ct_experiment/amplitude_structure_exp.py
Paper 61 identified 16,384 Gaussian amplitudes as the irreducible training signal in a 10.2M-parameter model (623x compression). This paper tests whether those amplitudes have internal structure that could enable further compression or even full crystallization. The amplitudes ARE moderately structured (global PR=4.26 out of 8, with one layer at PR=1.03), but the structure is NOT predictable from corpus statistics (score prediction R^2=-0.027). Amplitude-only SFT achieves 35.4% of full SFT efficiency with 5.2% of the parameters, confirming that weight information concentrates in projection layers.
| Metric | Value | Interpretation |
|---|---|---|
| Global PR | 4.26 / 8 | Moderately structured (~53% of max dimensionality) |
| PC1 variance | 42.5% | Dominant but not overwhelming |
| Components for 95% | 7 / 8 | Nearly full rank globally |
| Score prediction R^2 | -0.027 | NOT predictable from corpus |
| PC-EtE alignment | 0.674 | Moderate basis alignment |
| Amp-only SFT efficiency | 35.4% | 1/3 of full SFT with 5.2% params |
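The PR values above are presumably the standard PCA-based participation ratio. A minimal sketch on synthetic data, assuming the 16,384 amplitudes split into 2,048 rows of 8 (the per-row count of 8 is implied by "PR out of 8"; the row count is an assumption):

```python
import numpy as np

def participation_ratio(amps):
    """PCA participation ratio of per-row amplitude vectors.

    amps: (n_rows, k) array, k = 8 amplitudes per row here.
    PR = (sum lambda_i)^2 / sum lambda_i^2 over covariance eigenvalues;
    ranges from 1 (one dominant component) to k (isotropic).
    """
    centered = amps - amps.mean(axis=0)
    eig = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    eig = np.clip(eig, 0.0, None)  # guard tiny negative numerical eigenvalues
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
iso = rng.normal(size=(2048, 8))   # isotropic amplitudes -> PR close to 8
print(round(participation_ratio(iso), 1))
```

A perfectly rank-1 amplitude matrix drives this measure to 1, matching the "L4 c_proj: PR=1.03" reading as near-crystallized.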
The structure varies dramatically across layers:
| Layer | PR | PC1% | Max Corr | Kurtosis |
|---|---|---|---|---|
| L0 c_attn | 2.95 | 53.9% | 0.742 | 89.0 |
| L3 c_attn | 2.84 | 53.3% | 0.734 | 150.9 |
| L4 c_proj | 1.03 | 98.5% | 0.997 | 246.4 |
| L5 c_proj | 3.41 | 48.8% | 0.810 | 56.7 |
| L7 c_proj | 6.18 | 25.8% | 0.424 | 1.3 |
L4 c_proj is essentially 1-dimensional – a single principal component captures 98.5% of its amplitude variance. This matrix is fully crystallizable. In contrast, L7 c_proj is nearly full-rank (PR=6.18).
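With PR=1.03, a rank-1 truncated SVD should reconstruct that layer's amplitude matrix almost losslessly: one shared 8-amplitude template plus one scale per row. A sketch on synthetic near-rank-1 data (shapes and noise level are illustrative, not the actual L4 c_proj values):

```python
import numpy as np

# Hypothetical near-crystallized amplitude matrix: one shared 8-amplitude
# pattern, one free scale per row, plus small noise.
rng = np.random.default_rng(1)
scale = rng.normal(size=(2048, 1))    # 1 free parameter per row
pattern = rng.normal(size=(1, 8))     # shared template
amps = scale @ pattern + 0.01 * rng.normal(size=(2048, 8))

# Keep only the top singular triplet.
U, S, Vt = np.linalg.svd(amps, full_matrices=False)
rank1 = S[0] * np.outer(U[:, 0], Vt[0])

explained = S[0] ** 2 / (S ** 2).sum()  # top component's share of energy
print(f"top-component energy: {explained:.1%}")
```

For a layer like L4 c_proj (PC1 at 98.5%), `rank1` is the "crystallized" form: 2,048 scales plus an 8-value template instead of 16,384 free amplitudes.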
Rather than a sharp crystallization boundary, the model exhibits a GRADIENT:

- L4 c_proj: PR=1.03 – CRYSTALLIZABLE (1 free parameter per row)
- L0/L3 attention: PR~2.9 – COMPRESSIBLE (3 free parameters per row)
- L7 c_proj: PR=6.18 – IRREDUCIBLE (needs all 8 amplitudes)
This suggests early layers learn simpler patterns (lower PR) while later layers encode more complex, higher-dimensional information.
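The three-way gradient reading can be stated as a simple bucketing rule (the thresholds below are illustrative, not from the experiment):

```python
def crystallization_class(pr, k=8):
    """Bucket a layer by participation ratio (illustrative thresholds).

    near 1 -> crystallizable (one free parameter per row)
    < k/2  -> compressible (a few free parameters per row)
    else   -> irreducible (needs all k amplitudes)
    """
    if pr < 1.5:
        return "CRYSTALLIZABLE"
    if pr < k / 2:
        return "COMPRESSIBLE"
    return "IRREDUCIBLE"

layers = {"L4 c_proj": 1.03, "L0 c_attn": 2.95,
          "L3 c_attn": 2.84, "L7 c_proj": 6.18}
for name, pr in layers.items():
    print(name, crystallization_class(pr))
```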
Training only c_proj weights (5.2% of total parameters):

- Baseline loss: 6.35
- Full SFT (50 steps): 2.26 (64.4% improvement)
- Amp-only SFT (50 steps): 4.90 (22.8% improvement)
- Efficiency ratio: 35.4%
This confirms that projection layers carry disproportionate information, but attention matrices and MLPs also contribute significantly (the remaining 64.6%).
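The efficiency ratio follows directly from the reported losses (the printed ratio differs from the quoted 35.4% only by rounding of the underlying loss values):

```python
baseline, full_sft, amp_only = 6.35, 2.26, 4.90

full_gain = baseline - full_sft   # 4.09 absolute loss reduction
amp_gain = baseline - amp_only    # 1.45 absolute loss reduction

# Improvements relative to baseline, and amp-only's share of the full gain.
print(f"full SFT: {full_gain / baseline:.1%}")   # ~64.4%
print(f"amp-only: {amp_gain / baseline:.1%}")    # ~22.8%
efficiency = amp_gain / full_gain
print(f"efficiency ratio: {efficiency:.1%}")
```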
| Stage | Parameters | Compression |
|---|---|---|
| Raw model | 10,200,000 | 1x |
| L1 amplitudes only | 16,384 | 623x |
| Subspace (PR=4.3) | 14,392 | 709x |
The subspace compression is modest (623x to 709x) because the global PR is 4.26 – close to the maximum of 8. The amplitudes are structured enough to be interesting but not enough for dramatic further compression.
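The 14,392 figure is consistent with storing, for each row, coordinates in the 7-component subspace that captures 95% of variance, plus the shared 7x8 basis (assuming the 16,384 amplitudes split as 2,048 rows x 8; that split is an assumption, not stated in the table):

```python
total_params = 10_200_000
rows, k = 2_048, 8                    # assumed split: 2,048 rows x 8 amplitudes
amps = rows * k                       # 16,384 raw amplitudes

keep = 7                              # components needed for 95% variance
subspace = rows * keep + keep * k     # per-row coords + shared 7x8 basis

print(amps, round(total_params / amps))          # 16384 623
print(subspace, round(total_params / subspace))  # 14392 709
```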
Full crystallization remains blocked. The amplitude subspace basis does NOT align well enough with E^T@E eigenvectors (mean alignment 0.674, but R^2=-0.03 for score prediction). The amplitudes encode training-trajectory information that corpus statistics alone cannot provide.
The PR gradient suggests an adaptive training strategy: freeze low-PR layers early (they converge fast), keep high-PR layers trainable longer (they need more exploration). This could accelerate SFT by 2-3x.
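One way to turn the PR gradient into a schedule: map each layer's PR to the fraction of the run it stays trainable. A minimal sketch (the linear PR-to-step mapping is a hypothetical choice, not the paper's method):

```python
def freeze_step(pr, total_steps, k=8):
    """Illustrative schedule: near-crystallized (low-PR) layers freeze early,
    high-PR layers remain trainable for most of the run."""
    return int(total_steps * min(pr / k, 1.0))

for name, pr in {"L4 c_proj": 1.03, "L0 c_attn": 2.95, "L7 c_proj": 6.18}.items():
    print(name, "freezes at step", freeze_step(pr, 50))
```

Under this mapping, L4 c_proj would freeze within the first few of 50 SFT steps while L7 c_proj trains almost to the end, matching the converge-fast vs. needs-exploration intuition.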
“The model is a gradient of crystallization – some weights are frozen music, others are still being composed.”