Authors: John Mobley & MASCOM PhotonicMind
Date: 2026-03-07
Status: FALSIFIED
Experiment:
mascom_data/ct_experiment/weight_prediction_exp.py
This paper tested whether the ENTIRE model weight tensor could be predicted from corpus statistics alone, combining Paper 55’s depth formula (L_opt = ceil(PR/3)) with Paper 56’s L1 center prediction (R²=0.983) and Paper 58’s PCA basis (2 components capture 99.5%). The hypothesis is falsified: while PCA captures L1 parameter variance within a layer, the PCA components themselves are NOT derivable from corpus statistics. Weight values depend on training trajectory, not corpus structure.
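The depth formula is simple enough to sketch directly. A minimal illustration, assuming PR is the standard participation ratio of the embedding spectrum, (Σλ)² / Σλ² over the eigenvalues of EᵀE (the papers' exact definition of PR may differ; the toy E below is random, not a CT-derived embedding):

```python
import numpy as np

def participation_ratio(E: np.ndarray) -> float:
    """Participation ratio of the embedding spectrum: (sum lam)^2 / sum lam^2."""
    # Eigenvalues of E^T E are the squared singular values of E.
    lam = np.linalg.svd(E, compute_uv=False) ** 2
    return lam.sum() ** 2 / (lam ** 2).sum()

def optimal_depth(E: np.ndarray) -> int:
    """Paper 55's depth formula: L_opt = ceil(PR / 3)."""
    return int(np.ceil(participation_ratio(E) / 3))

# Toy embedding matrix (vocab x dim); a real E would come from CT (Paper 51).
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 64))
print(participation_ratio(E), optimal_depth(E))
```

For a random Gaussian E the spectrum is nearly flat, so PR sits close to the embedding dimension; a real corpus spectrum decays and gives a much smaller PR, hence a shallow L_opt.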
| Prediction Method | Result (Mean R² unless noted) | Status |
|---|---|---|
| PCA mean as weight template | -0.004 | FAIL |
| Spectral-scaled PCA mean | -0.004 | FAIL |
| E^T@E direct mapping | -0.983 | CATASTROPHIC FAIL |
| Predicted model perplexity | 11,416 vs 82 (trained) | FAIL |
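The negative R² values in the table have a simple reading: R² below zero means the prediction is worse than just using the mean. A self-contained illustration with synthetic weights (not the experiment's data) showing the three regimes in the table:

```python
import numpy as np

def r2(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Coefficient of determination; negative when predictions beat nothing."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
w = rng.normal(size=1000)                 # stand-in for trained weights

print(r2(w, w))                           # 1.0: perfect prediction
print(r2(w, np.zeros_like(w)))            # ~0 (slightly negative): flat template, like -0.004
print(r2(w, -w))                          # strongly negative: anti-correlated, the shape of -0.983
```

A flat or uncorrelated template lands near zero; only a systematically wrong (anti-correlated) mapping produces a large negative score, which is why the E^T@E direct mapping row is labeled catastrophic.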
What corpus CAN determine (training-free):
- Embedding matrix E (via CT, Paper 51) — R² ≈ 1
- Optimal depth L_opt (via PR, Paper 55) — exact for natural language
- Asymmetry direction (via transitions, Paper 57) — 92.3% spectral match at L0
- L1 Gaussian centers (via E^T@E, Paper 56) — R² = 0.983
What corpus CANNOT determine:
- Actual weight VALUES within layers — R² ≈ 0
- The specific 2 PCA components — they vary per layer and are not predictable from E^T@E
- The cause is solution non-uniqueness (Paper 54): many weight configurations achieve the same loss
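Solution non-uniqueness is easy to demonstrate for attention logits: any invertible reparameterization of the query and key projections leaves the function unchanged, so no corpus statistic can single out one weight configuration. A toy sketch (generic linear algebra, not Paper 54's construction):

```python
import numpy as np

rng = np.random.default_rng(4)
d, dk = 32, 16
X = rng.normal(size=(10, d))          # a batch of hidden states
Wq = rng.normal(size=(d, dk))
Wk = rng.normal(size=(d, dk))

# Any invertible R yields different weights with identical logits:
# (X Wq R)(X Wk R^-T)^T = X Wq R R^-1 Wk^T X^T = X Wq Wk^T X^T
R = np.linalg.qr(rng.normal(size=(dk, dk)))[0]   # a random orthogonal matrix

logits1 = (X @ Wq) @ (X @ Wk).T
logits2 = (X @ Wq @ R) @ (X @ Wk @ np.linalg.inv(R).T).T
print(np.allclose(logits1, logits2))             # True: same function, different weights
```

Every such R is a distinct point on the solution manifold with identical loss, which is exactly why weight VALUES cannot be read off the corpus even when the function is fully constrained.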
The crystallization boundary:
- Embeddings: fully crystallizable (training-free)
- Architecture: fully determinable (depth from PR)
- Weight structure: compressible (L1×L2 = 100x) but not predictable
- Weight values: require gradient information (SFT)
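The "compressible but not predictable" distinction can be illustrated: when a layer's weight rows lie near a 2-D subspace, SVD recovers both the structure and the storage ratio, yet the basis itself still has to come from the trained weights. A synthetic sketch (the 100x figure is the papers'; the dimensions and noise level here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy "layer weights": rows lie near a 2-D subspace plus small noise,
# mimicking Paper 58's finding that 2 PCA components dominate the variance.
basis = rng.normal(size=(2, 256))
coeffs = rng.normal(size=(512, 2)) * 10.0
W = coeffs @ basis + 0.1 * rng.normal(size=(512, 256))

Wc = W - W.mean(axis=0)                          # center before PCA
_, s, Vt = np.linalg.svd(Wc, full_matrices=False)
var = s ** 2 / (s ** 2).sum()
print(var[:2].sum())                             # near 1.0: two components dominate

# Compression: store 2 basis vectors + per-row coefficients instead of full W.
full = W.size
compressed = 2 * W.shape[1] + 2 * W.shape[0]
print(full / compressed)                         # large ratio, in the spirit of the 100x claim
```

Knowing that two components suffice tells you the storage cost; it does not tell you which two directions they are, and that is the boundary the note draws.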
Zero-SFT remains blocked not by missing information in the corpus, but by the fundamental non-uniqueness of the attention solution manifold. The corpus constrains WHERE in parameter space the model should be (via embeddings + architecture), but the specific POINT on the solution manifold requires training.
mNaught (imaginary infinity inverse for effective parameters) requires one of:
1. Solving the attention manifold geometry to pick a canonical point (Paper 54's gap)
2. Showing that any point on the manifold works equally well (it does, but you still need SFT to reach it)
3. A fundamentally different approach to zero-training attention
The bottleneck is no longer “what information does the corpus contain?” (answered: everything needed). It’s “how do you reach a valid solution without gradient descent?”
“The compass shows north perfectly. But you still have to walk there.”