Authors: John Mobley & MASCOM PhotonicMind

Date: 2026-03-07

Status: VALIDATED (individual redundancy confirmed, collective freezing fails)

Experiment: mascom_data/ct_experiment/progressive_freezing_exp.py

Paper 63 showed that the L4 c_proj can be frozen with zero quality loss. This paper tests how many layers can be frozen simultaneously. Freezing a single c_proj costs less than 1% in loss for every layer, but freezing ALL c_proj matrices together costs 5.3%, and freezing one complete layer costs 10.6%. The model exhibits HIGH INDIVIDUAL REDUNDANCY but LOW COLLECTIVE SUBSTITUTABILITY: each layer’s contribution is small but unique, and compensations don’t stack.
| Layer | Loss Delta | Verdict |
|---|---|---|
| L1 | -0.02% | Redundant |
| L4 | -0.20% | Redundant |
| L6 | -0.32% | Redundant |
| L7 | -0.29% | Redundant |
| L0 | +0.60% | Near-redundant |
| L3 | +0.89% | Near-redundant |
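All individual c_proj freezes cost less than 1%. Five of the eight actually IMPROVE performance (negative delta), suggesting slight overfitting in those projection layers. The table lists six of the eight layers, four of them with a negative delta; L2 and L5 are not shown.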
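The per-layer verdicts can be re-derived from the table. The sketch below is a minimal illustration that uses only the six rows listed (L2 and L5 are left out rather than guessed). The `verdict` rule is a hypothetical one inferred from the table, not taken from the experiment script: a non-positive delta counts as redundant, a positive delta under 1% as near-redundant.

```python
# Loss deltas (%) for freezing a single c_proj, copied from the table above.
# Only the six listed layers are included.
individual_delta = {
    "L1": -0.02,
    "L4": -0.20,
    "L6": -0.32,
    "L7": -0.29,
    "L0": 0.60,
    "L3": 0.89,
}


def verdict(delta_pct: float) -> str:
    """Hypothetical rule inferred from the table, not the authors' code."""
    if delta_pct <= 0.0:
        return "Redundant"
    if delta_pct < 1.0:
        return "Near-redundant"
    return "Needed"


verdicts = {layer: verdict(d) for layer, d in individual_delta.items()}

# Layers whose loss went down when their c_proj was frozen.
improved = sorted(layer for layer, d in individual_delta.items() if d < 0.0)

# Largest single-layer cost among the listed rows.
worst = max(individual_delta.values())

print(verdicts)
print("improved when frozen:", improved)
print("largest individual cost: %.2f%%" % worst)
```

The layers with a negative delta are the same four that the text later names as the best freezing candidates.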
| What’s Frozen | Frozen % | Loss Delta |
|---|---|---|
| 1 c_proj | 0.6% | < 1% |
| ALL c_proj | 5.2% | +5.3% |
| c_proj + c_attn | 20.7% | +15.2% |
| All attention | 20.7% | +15.9% |
| Attn + MLP | 62.1% | +79.3% |
| Everything except embeddings | 62.2% | +81.2% |
| Everything except lm_head | 100% | +165.8% |
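Each row of this table corresponds to freezing a group of parameters selected by name. A minimal sketch of that selection follows, under several assumptions: the parameter names use a GPT-2 style layout (`h.<i>.attn.c_proj.weight`), `c_proj` means the attention output projection (consistent with c_proj + c_attn matching the "All attention" share), and the sizes are made up for a toy model. `Param` and `freeze_where` are hypothetical helpers for illustration, not part of the experiment script.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Param:
    """Stand-in for a framework parameter: a size and a requires_grad flag."""
    size: int
    requires_grad: bool = True

    def numel(self) -> int:
        return self.size


def freeze_where(
    named_params: Iterable[Tuple[str, Param]],
    should_freeze: Callable[[str], bool],
) -> float:
    """Freeze every parameter whose name matches; return the frozen share in %."""
    total = 0
    frozen = 0
    for name, p in named_params:
        if should_freeze(name):
            p.requires_grad = False
        total += p.numel()
        if not p.requires_grad:
            frozen += p.numel()
    return 100.0 * frozen / total


# Toy two-block model with made-up sizes (NOT the model from the experiment).
toy = [
    ("wte.weight", Param(1000)),
    ("h.0.attn.c_attn.weight", Param(300)),
    ("h.0.attn.c_proj.weight", Param(100)),
    ("h.0.mlp.c_fc.weight", Param(400)),
    ("h.1.attn.c_attn.weight", Param(300)),
    ("h.1.attn.c_proj.weight", Param(100)),
    ("h.1.mlp.c_fc.weight", Param(400)),
]

# "ALL c_proj" configuration: freeze every attention output projection.
pct = freeze_where(toy, lambda name: name.endswith("attn.c_proj.weight"))
print("frozen: %.1f%%" % pct)
```

With a PyTorch module, `model.named_parameters()` yields `(name, parameter)` pairs that expose the same `requires_grad` attribute and `numel()` method, so it could be passed in place of `toy`. Other rows of the table would only change the name predicate.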
| Layers Frozen | Loss | Delta |
|---|---|---|
| 0 | 2.22 | -0.6% |
| 1 | 2.47 | +10.6% |
| 2 | 2.77 | +23.8% |
| 4 | 3.14 | +40.4% |
| 8 | 4.02 | +79.9% |
Breaking point: freezing even one complete layer costs 10.6%, which already exceeds 5% degradation.
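The table can be checked for internal consistency. The sketch below assumes each delta is measured relative to one common baseline loss, which this section does not state, so `baseline = loss / (1 + delta/100)` is recovered from every row. It also locates the first row that crosses the 5% limit named in the text. The variable names are illustrative only.

```python
# (layers frozen, final loss, reported delta in %), copied from the table above.
rows = [
    (0, 2.22, -0.6),
    (1, 2.47, 10.6),
    (2, 2.77, 23.8),
    (4, 3.14, 40.4),
    (8, 4.02, 79.9),
]

THRESHOLD_PCT = 5.0  # degradation limit named in the text

# If delta = (loss - baseline) / baseline, then baseline = loss / (1 + delta/100).
implied_baseline = [loss / (1.0 + delta / 100.0) for _, loss, delta in rows]

# Smallest number of fully frozen layers whose delta exceeds the threshold.
breaking_point = min(n for n, _, delta in rows if delta > THRESHOLD_PCT)

for (n, loss, delta), base in zip(rows, implied_baseline):
    print(f"{n} layers: loss {loss:.2f}, delta {delta:+.1f}%, implied baseline {base:.3f}")
print("breaking point:", breaking_point, "layer(s)")
```

Every row implies a baseline between 2.23 and 2.24, so the reported losses and deltas agree with each other under this assumption.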
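Each c_proj contributes < 1% individually, but ALL together contribute 5.3%. This means each layer’s contribution is small but unique, and the compensations for individual freezes don’t stack.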
This is analogous to how removing any single letter from the alphabet barely hurts communication, but removing 8 letters makes text unreadable.
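The gap between individual and collective cost can be bounded with simple arithmetic. The sketch below is hedged: it uses the six individual deltas listed in the per-layer table and, for the two layers not listed, pessimistically assumes the full 1% cap reported for every individual freeze. It also treats deltas from separate runs as directly comparable, which ignores run-to-run noise.

```python
# Individual c_proj deltas (%) for the six layers listed in the per-layer table.
listed = [-0.02, -0.20, -0.32, -0.29, 0.60, 0.89]

N_LAYERS = 8         # "five of eight" in the text
PER_LAYER_CAP = 1.0  # every individual freeze is reported as < 1%
COLLECTIVE = 5.3     # cost of freezing all c_proj together (%)

listed_sum = sum(listed)
unlisted = N_LAYERS - len(listed)

# Upper bound on what purely additive costs would predict:
# assume each unlisted layer costs the full cap.
additive_upper_bound = listed_sum + unlisted * PER_LAYER_CAP

# Whatever remains cannot be explained by adding up individual costs.
interaction_at_least = COLLECTIVE - additive_upper_bound

print("sum of listed deltas: %.2f%%" % listed_sum)
print("additive upper bound: %.2f%%" % additive_upper_bound)
print("interaction term at least: %.2f%%" % interaction_at_least)
```

Even under the pessimistic assumption, additive costs account for at most about half of the 5.3% collective cost. The remainder is the non-stacking effect described above.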
Individual weight matrices CAN be crystallized (Paper 63), but the model needs most of its matrices trainable. The crystallization boundary is:

- Per-matrix: any single matrix can be frozen (proven)
- Collective: must train >= 95% of matrices (proven)
- Sweet spot: freeze 3-4 c_proj layers (verified: L1, L4, L6, L7 = best candidates)

“Each brick can be removed and the wall stands. Remove half the bricks and the wall falls.”