Authors: John Mobley & MASCOM PhotonicMind
Date: 2026-03-07
Status: FALSIFIED
Experiment:
mascom_data/ct_experiment/weight_prediction_exp.py
This paper tested whether the ENTIRE model weight tensor could be predicted from corpus statistics alone, combining Paper 55’s depth formula (L_opt = ceil(PR/3)) with Paper 56’s L1 center prediction (R²=0.983) and Paper 58’s PCA basis (2 components capture 99.5%). The hypothesis is falsified: while PCA captures L1 parameter variance within a layer, the PCA components themselves are NOT derivable from corpus statistics. Weight values depend on training trajectory, not corpus structure.
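The depth formula is simple enough to sketch directly. A minimal illustration, assuming PR is the standard participation ratio of the embedding spectrum, (Σλ)² / Σλ² over the eigenvalues of EᵀE (the papers' exact definition of PR may differ; the toy E below is random, not a CT-derived embedding):

```python
import numpy as np

def participation_ratio(E: np.ndarray) -> float:
    """Participation ratio of the embedding spectrum: (sum lam)^2 / sum lam^2."""
    # Eigenvalues of E^T E are the squared singular values of E.
    lam = np.linalg.svd(E, compute_uv=False) ** 2
    return lam.sum() ** 2 / (lam ** 2).sum()

def optimal_depth(E: np.ndarray) -> int:
    """Paper 55's depth formula: L_opt = ceil(PR / 3)."""
    return int(np.ceil(participation_ratio(E) / 3))

# Toy embedding matrix (vocab x dim); a real E would come from CT (Paper 51).
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 64))
print(participation_ratio(E), optimal_depth(E))
```

For a random Gaussian E the spectrum is nearly flat, so PR sits close to the embedding dimension; a real corpus spectrum decays and gives a much smaller PR, hence a shallow L_opt.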
| Prediction Method | Result (Mean R² unless noted) | Status |
|---|---|---|
| PCA mean as weight template | -0.004 | FAIL |
| Spectral-scaled PCA mean | -0.004 | FAIL |
| E^T@E direct mapping | -0.983 | CATASTROPHIC FAIL |
| Predicted model perplexity | 11,416 vs 82 (trained) | FAIL |
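The negative R² values in the table have a simple reading: R² below zero means the prediction is worse than just using the mean. A self-contained illustration with synthetic weights (not the experiment's data) showing the three regimes in the table:

```python
import numpy as np

def r2(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Coefficient of determination; negative when predictions beat nothing."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
w = rng.normal(size=1000)                 # stand-in for trained weights

print(r2(w, w))                           # 1.0: perfect prediction
print(r2(w, np.zeros_like(w)))            # ~0 (slightly negative): flat template, like -0.004
print(r2(w, -w))                          # strongly negative: anti-correlated, the shape of -0.983
```

A flat or uncorrelated template lands near zero; only a systematically wrong (anti-correlated) mapping produces a large negative score, which is why the E^T@E direct mapping row is labeled catastrophic.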
What corpus CAN determine (training-free):
- Embedding matrix E (via CT, Paper 51) — R² ≈ 1
- Optimal depth L_opt (via PR, Paper 55) — exact for natural language
- Asymmetry direction (via transitions, Paper 57) — 92.3% spectral match at L0
- L1 Gaussian centers (via E^T@E, Paper 56) — R² = 0.983
What corpus CANNOT determine:
- Actual weight VALUES within layers — R² ≈ 0
- The specific 2 PCA components — they vary per layer and are not predictable from E^T@E
- The cause is solution non-uniqueness (Paper 54): many weight configurations achieve the same loss
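Solution non-uniqueness is easy to demonstrate for attention logits: any invertible reparameterization of the query and key projections leaves the function unchanged, so no corpus statistic can single out one weight configuration. A toy sketch (generic linear algebra, not Paper 54's construction):

```python
import numpy as np

rng = np.random.default_rng(4)
d, dk = 32, 16
X = rng.normal(size=(10, d))          # a batch of hidden states
Wq = rng.normal(size=(d, dk))
Wk = rng.normal(size=(d, dk))

# Any invertible R yields different weights with identical logits:
# (X Wq R)(X Wk R^-T)^T = X Wq R R^-1 Wk^T X^T = X Wq Wk^T X^T
R = np.linalg.qr(rng.normal(size=(dk, dk)))[0]   # a random orthogonal matrix

logits1 = (X @ Wq) @ (X @ Wk).T
logits2 = (X @ Wq @ R) @ (X @ Wk @ np.linalg.inv(R).T).T
print(np.allclose(logits1, logits2))             # True: same function, different weights
```

Every such R is a distinct point on the solution manifold with identical loss, which is exactly why weight VALUES cannot be read off the corpus even when the function is fully constrained.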
The crystallization boundary:
- Embeddings: fully crystallizable (training-free)
- Architecture: fully determinable (depth from PR)
- Weight structure: compressible (L1×L2 = 100x) but not predictable
- Weight values: require gradient information (SFT)
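The "compressible but not predictable" distinction can be illustrated: when a layer's weight rows lie near a 2-D subspace, SVD recovers both the structure and the storage ratio, yet the basis itself still has to come from the trained weights. A synthetic sketch (the 100x figure is the papers'; the dimensions and noise level here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy "layer weights": rows lie near a 2-D subspace plus small noise,
# mimicking Paper 58's finding that 2 PCA components dominate the variance.
basis = rng.normal(size=(2, 256))
coeffs = rng.normal(size=(512, 2)) * 10.0
W = coeffs @ basis + 0.1 * rng.normal(size=(512, 256))

Wc = W - W.mean(axis=0)                          # center before PCA
_, s, Vt = np.linalg.svd(Wc, full_matrices=False)
var = s ** 2 / (s ** 2).sum()
print(var[:2].sum())                             # near 1.0: two components dominate

# Compression: store 2 basis vectors + per-row coefficients instead of full W.
full = W.size
compressed = 2 * W.shape[1] + 2 * W.shape[0]
print(full / compressed)                         # large ratio, in the spirit of the 100x claim
```

Knowing that two components suffice tells you the storage cost; it does not tell you which two directions they are, and that is the boundary the note draws.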
Zero-SFT remains blocked not by missing information in the corpus, but by the fundamental non-uniqueness of the attention solution manifold. The corpus constrains WHERE in parameter space the model should be (via embeddings + architecture), but the specific POINT on the solution manifold requires training.
mNaught (imaginary infinity inverse for effective parameters) requires one of:
1. Solving the attention manifold geometry to pick a canonical point (Paper 54's gap)
2. Showing that any point on the manifold works equally well (it does, but you still need SFT to reach it)
3. A fundamentally different approach to zero-training attention
The bottleneck is no longer “what information does the corpus contain?” (answered: everything needed). It’s “how do you reach a valid solution without gradient descent?”
“The compass shows north perfectly. But you still have to walk there.”