We present the Möbius Learning Bundle, a mathematical framework and working implementation that simultaneously closes three disconnected learning loops in a multi-agent cognitive simulation by treating them as fiber sections over a shared neurochemical base manifold. The base space \(B = [0,1]^7\) encodes seven neurochemicals (dopamine, serotonin, norepinephrine, oxytocin, GABA, cortisol, endorphins). Three fiber sections—policy modulation (\(\sigma_1\)), training rate (\(\sigma_2\)), and world model balance (\(\sigma_3\))—are transported simultaneously via a \(7 \times 3\) connection matrix \(\Gamma\). Updates propagate as damped wave ringlets rather than instantaneous jumps, preserving biological plausibility. A Möbius twist arising from self-play role alternation produces holonomy—the non-zero section difference after parallel transport around a closed loop—which serves as the learning signal that single-perspective observation cannot generate. We demonstrate the system running in Haven, a village simulation of 16 cognitive agents, where the bundle closes behavioral, training, and world-modeling feedback loops that were previously dead-ended.
Consider a cognitive agent with genuine neurochemistry—not a reward scalar, but a 7-dimensional chemical state that modulates perception, consciousness, memory, and behavior. In our Haven system [1], each of 16 agents (“beings”) implements this architecture based on Lövheim’s Cube of Emotion [2], Global Workspace Theory [3], and TD learning [4].
Three feedback loops existed in principle but were disconnected in practice:
Behavior Loop: MOBA match outcomes shifted neurochemistry (dopamine for wins, cortisol for losses), but only PlanningProcessor read the attention_profile(). The connection was narrow: six other cognitive processors competed for consciousness without feeling the match outcome.
Training Loop: Match data was written to JSONL on disk. Nothing read it. The sovereign language model (PhotonicGPT) trained on a fixed curriculum. The agents’ lived experience had zero effect on model plasticity.
World Model Loop: Each agent maintained a WorldModel (count-based transition model, cf. Graves [5]) that accumulated state-action-next_state observations. Prediction errors were computed but never fed back into neurochemistry. The curiosity signal died at the world model boundary.
These are not three separate engineering problems. They are three fiber sections over the same base manifold. The base space is neurochemistry. Each loop reads from neurochemistry and should write back to it (or to systems that neurochemistry governs). The missing piece is a connection—a parallel transport operator that maps movements in the base space to movements in the fiber.
In differential geometry, a fiber bundle \((E, B, \pi, F)\) consists of a total space \(E\), base space \(B\), projection \(\pi: E \to B\), and fiber \(F\). A connection on the bundle specifies how to “parallel transport” vectors in the fiber as you move through the base space. The connection is what makes the bundle more than a trivial product \(B \times F\).
The Möbius twist enters through self-play. When an agent plays both sides of a match (blue then red), the loop through neurochemistry space is closed but non-orientable. The holonomy—the discrepancy between the starting and ending fiber sections after traversing the closed loop—is precisely the learning signal that monocular observation cannot produce.
The base space is the neurochemical state:
\[B = [0,1]^7\]
with coordinates \(\mathbf{n} = (n_\text{DA}, n_\text{5HT}, n_\text{NE}, n_\text{OT}, n_\text{GABA}, n_\text{cort}, n_\text{end})\).
Each coordinate represents a normalized concentration of a neurotransmitter, hormone, or neuropeptide. The Lövheim Cube [2] maps the monoamine subspace \((n_\text{DA}, n_\text{5HT}, n_\text{NE})\) to 8 basic emotions. The remaining four chemicals modulate social bonding (oxytocin), inhibition (GABA), stress (cortisol), and reward (endorphins).
At each point \(\mathbf{n} \in B\), we attach three real-valued fiber sections:
\(\sigma_1(\mathbf{n}) \in [-1, 1]\): Policy section. Biases which cognitive processor wins the Global Workspace competition. Positive values favor reward-seeking processors (planning, predictive). Negative values favor threat-detection (emotion).
\(\sigma_2(\mathbf{n}) \in [-1, 1]\): Weight section. Modulates the learning rate of the shared language model. Mapped to \([0.5, 1.5]\) via \(\text{lr\_mod} = 1 + 0.5\sigma_2\). Positive = more plasticity. Negative = consolidation.
\(\sigma_3(\mathbf{n}) \in [-1, 1]\): World section. Controls the exploration/exploitation balance. Mapped to \([0.05, 0.40]\) via \(\epsilon = 0.225 - 0.175\sigma_3\). Positive = exploit. Negative = explore.
The total space is \(E = B \times [-1,1]^3\), but the connection makes it non-trivial.
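The section-to-control mappings above are simple affine maps. A minimal sketch in Python (the function names here are illustrative, not the module's API; the formulas are taken directly from the section definitions):

```python
from dataclasses import dataclass


@dataclass
class FiberSection:
    """One real-valued fiber section sigma_i in [-1, 1], with momentum state."""
    value: float = 0.0
    velocity: float = 0.0


def lr_modulation(sigma2: float) -> float:
    """Weight section -> learning-rate multiplier in [0.5, 1.5]."""
    return 1.0 + 0.5 * sigma2


def exploration_epsilon(sigma3: float) -> float:
    """World section -> epsilon in [0.05, 0.40]; positive sigma3 means exploit."""
    return 0.225 - 0.175 * sigma3
```

Note that \(\sigma_3 = 0\) lands exactly on the midpoint \(\epsilon = 0.225\), so a neutral world section neither explores nor exploits preferentially.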
The connection \(\Gamma\) is a \(7 \times 3\) matrix mapping base space tangent vectors (neurochemistry deltas) to fiber transport:
\[\Gamma = \begin{pmatrix} +0.30 & +0.20 & +0.15 \\ +0.10 & -0.10 & -0.10 \\ +0.25 & +0.15 & +0.20 \\ +0.05 & +0.00 & +0.05 \\ -0.10 & -0.05 & -0.05 \\ +0.20 & +0.12 & +0.15 \\ -0.05 & +0.00 & -0.05 \end{pmatrix}\]
Rows correspond to chemicals (DA, 5HT, NE, OT, GABA, cortisol, endorphins); columns correspond to fiber sections (policy, weight, world). The signs encode neurobiological priors: dopamine, norepinephrine, and cortisol drive all three sections upward (reward-driven and stress-driven adaptation), GABA and endorphins push them downward (inhibition and satiety), and serotonin biases policy toward reward-seeking while lowering plasticity and favoring exploration.
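Movement in the base space maps to fiber forces through \(\Gamma^\mathsf{T}\). A sketch of that product (numpy, with the matrix transcribed from above; `transport_force` is an illustrative name, not necessarily the module's API):

```python
import numpy as np

# Connection matrix: rows = chemicals (DA, 5HT, NE, OT, GABA, cort, end),
# columns = fiber sections (policy, weight, world).
GAMMA = np.array([
    [+0.30, +0.20, +0.15],  # dopamine
    [+0.10, -0.10, -0.10],  # serotonin
    [+0.25, +0.15, +0.20],  # norepinephrine
    [+0.05, +0.00, +0.05],  # oxytocin
    [-0.10, -0.05, -0.05],  # GABA
    [+0.20, +0.12, +0.15],  # cortisol
    [-0.05, +0.00, -0.05],  # endorphins
])

ALPHA = 0.1  # coupling strength (alpha in the update rule)


def transport_force(delta_n: np.ndarray) -> np.ndarray:
    """Force on the three sections from a neurochemistry delta: alpha * Gamma^T dn."""
    return ALPHA * GAMMA.T @ delta_n


# A dopamine win-spike of +0.2 nudges all three sections positive:
dn = np.zeros(7)
dn[0] = 0.2
force = transport_force(dn)  # approximately [0.006, 0.004, 0.003]
```

A single 3-vector of forces per tick is the entire cross-loop coupling cost.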
The update rule for each fiber section is:
\[\sigma_i(t+1) = \sigma_i(t) + \alpha \cdot [\Gamma \cdot \Delta\mathbf{n}]_i + \beta \cdot v_i(t)\]
where:

- \(\Delta\mathbf{n} = \mathbf{n}(t) - \mathbf{n}(t-1)\) is the neurochemistry delta,
- \(v_i(t) = \sigma_i(t) - \sigma_i(t-1)\) is the section velocity (momentum),
- \(\alpha = 0.1\) is the coupling strength,
- \(\beta = 0.3\) is the momentum coefficient.
This is equivalent to a damped harmonic oscillator driven by neurochemical forces:
\[\ddot{\sigma}_i + (1 - \beta)\dot{\sigma}_i = \alpha \cdot [\Gamma \cdot \Delta\mathbf{n}]_i\]
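As a consistency check, subtracting consecutive velocities in the discrete update rule gives

\[v_i(t+1) - v_i(t) = \alpha \cdot [\Gamma \cdot \Delta\mathbf{n}]_i - (1 - \beta)\, v_i(t)\]

and identifying \(v_i = \dot{\sigma}_i\) (so the left-hand difference is \(\ddot{\sigma}_i\)) recovers the damped form, with damping coefficient \(1 - \beta = 0.7\).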
The term “ringlet” refers to the bounded oscillatory trajectory. With soft clamping \(\sigma_i \in [-1, 1]\), the section cannot diverge. A single neurochemical impulse produces a characteristic ring-down:
| Tick | Force | Velocity | Value |
|---|---|---|---|
| 0 | 0.5 | 0.050 | 0.050 |
| 1 | 0.5 | 0.065 | 0.115 |
| 2 | 0.5 | 0.070 | 0.185 |
| 3 | 0.0 | 0.021 | 0.206 |
| 4 | 0.0 | 0.006 | 0.212 |
| 5 | 0.0 | 0.002 | 0.214 |
The impulse at ticks 0-2 drives the section upward with accelerating velocity (momentum accumulates). When the impulse stops (tick 3), momentum carries the section further but decays exponentially with factor \(\beta = 0.3\). The section settles at a new equilibrium without oscillating back through zero—this is critically damped behavior, biologically appropriate for neurochemical aftereffects.
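The ring-down above can be reproduced in a few lines of plain Python (the table's intermediate values were rounded to three decimals, so exact arithmetic agrees only to within that rounding):

```python
def ringlet(forces, alpha=0.1, beta=0.3):
    """Simulate one fiber section under the wave-ringlet update rule."""
    sigma, velocity, trace = 0.0, 0.0, []
    for force in forces:
        velocity = alpha * force + beta * velocity      # drive plus decaying momentum
        sigma = max(-1.0, min(1.0, sigma + velocity))   # soft clamp to [-1, 1]
        trace.append((force, velocity, sigma))
    return trace


# Impulse at ticks 0-2, release at tick 3: matches the table to its rounding.
trace = ringlet([0.5, 0.5, 0.5, 0.0, 0.0, 0.0])
```

Setting `beta` closer to 1 lengthens the ring-down; setting it to 0 removes momentum entirely and the section tracks the force instantaneously.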
Self-play creates a closed loop in neurochemistry space. When being \(A\) plays as blue against past-self \(A'\) as red, the neurochemical trajectory follows path \(\gamma_1\). When roles reverse (being \(A\) as red, \(A'\) as blue), the trajectory follows \(\gamma_2\). The concatenation \(\gamma = \gamma_1 \cdot \gamma_2\) is a closed loop in \(B\).
On an orientable bundle, parallel transport around a closed loop returns the section to its starting value. On a non-orientable (Möbius) bundle, it does not. The holonomy is:
\[h_i = \sigma_i(\text{after 2 twists}) - \sigma_i(\text{before 1st twist})\]
This holonomy is the learning signal: it captures information invisible from any single perspective, and it feeds back into the sections with its own gains:
\[\sigma_i \leftarrow \sigma_i + 0.15 \cdot h_i + 0.2 \cdot v_i\]
This is a second wave ringlet, triggered by the holonomy itself, creating a two-timescale learning dynamic: fast within-match transport and slow cross-match holonomy accumulation.
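The twist bookkeeping can be sketched as follows (the class and method names here are hypothetical; the real module wraps this logic in MobiusLearningBundle, and the gains 0.15 and 0.2 come from the update rule above):

```python
import numpy as np

H_GAIN = 0.15  # holonomy gain
V_GAIN = 0.20  # momentum gain


class MobiusTwist:
    """Track fiber sections around a two-sided self-play loop (illustrative sketch)."""

    def __init__(self):
        self.sigma = np.zeros(3)     # (policy, weight, world)
        self.velocity = np.zeros(3)
        self.pre_twist = None

    def begin_loop(self):
        """Record the sections before the first role (blue) is played."""
        self.pre_twist = self.sigma.copy()

    def end_loop(self):
        """After both roles (blue then red), apply the holonomy update."""
        h = self.sigma - self.pre_twist  # net section drift around the closed loop
        self.sigma = np.clip(
            self.sigma + H_GAIN * h + V_GAIN * self.velocity, -1.0, 1.0
        )
        return h
```

On an orientable bundle `h` would vanish identically; any non-zero return value is transport that the two role-reversed traversals failed to cancel.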
The implementation consists of four files:

- `mobius_learning_bundle.py` (~230 lines): Core module containing the `ConnectionCoefficients`, `FiberSection`, and `MobiusLearningBundle` classes.
- `mind.py` (6 insertion points): Integration into the cognitive architecture at `WorldModel.surprise()`, `Mind.__init__()`, `Mind._load()`, `Mind.experience_moment()`, `GlobalWorkspace.compete()`, and `Mind._save()`.
- `haven_daemon.py` (1 insertion): Möbius twist call in `_run_moba_self_play()` for both fighters after each match.
- `quantum_pretrain.py` (1 insertion): Q6 callback reads `bundle_lr_signal.json` and modulates the optimizer learning rate.
Loop 1 (Behavior):

```
match outcome → neurochemistry Δ → parallel_transport()
  → policy section σ₁ → _bundle_modulation dict
  → GlobalWorkspace.compete() reads it
  → biases which processor wins consciousness
  → behavior changes → next match outcome
```
Loop 2 (Training):

```
match outcome → neurochemistry Δ → parallel_transport()
  → weight section σ₂ → sftt_lr_modulation
  → bundle_lr_signal.json (aggregated across 16 beings)
  → quantum_pretrain Q6 callback reads it
  → optimizer LR scaled by [0.5, 1.5]
  → PhotonicGPT weights shift → narration quality changes
  → being experience changes → next match outcome
```
Loop 3 (World):

```
match outcome → neurochemistry Δ → parallel_transport()
  → world section σ₃ → exploration_epsilon
  → PlanningProcessor explore/exploit balance
  → WorldModel.surprise() → prediction error
  → NE boost (curiosity) → neurochemistry Δ
  → next parallel_transport()
```
Each of the 16 beings maintains its own bundle with independent fiber sections. The training loop (Loop 2) requires aggregation because there is one shared language model. The `bundle_lr_signal.json` file stores per-being modulation signals and computes a fleet-wide aggregate:
\[\text{lr\_mod}_\text{fleet} = \frac{1}{N} \sum_{k=1}^{N} (1 + 0.5 \cdot \sigma_2^{(k)})\]
This collective signal means PhotonicGPT’s learning rate reflects the neurochemical state of the entire village, not any individual being. A fleet-wide dopamine surge (many beings winning matches) accelerates learning. A fleet-wide cortisol spike (many beings losing) also accelerates learning, but for different reasons (stress-driven adaptation vs. reward-driven plasticity).
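The aggregation itself is one line over the per-being weight sections. A sketch of the signal-file handling (the JSON schema and function name here are assumptions for illustration, not taken from the source):

```python
import json
from pathlib import Path


def write_fleet_lr_signal(sigma2_by_being: dict[str, float], path: Path) -> float:
    """Aggregate per-being weight sections into one fleet-wide LR multiplier.

    Each being contributes lr_mod = 1 + 0.5 * sigma_2; the fleet value is the mean.
    """
    mods = {name: 1.0 + 0.5 * s2 for name, s2 in sigma2_by_being.items()}
    fleet = sum(mods.values()) / len(mods)
    path.write_text(json.dumps({"per_being": mods, "fleet_lr_mod": fleet}, indent=2))
    return fleet
```

An all-neutral fleet (\(\sigma_2 = 0\) everywhere) yields a multiplier of exactly 1.0; a fleet-wide dopamine surge pushes it above 1, toward the 1.5 ceiling.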
After 4 ticks of Haven simulation, fiber sections show differentiated dynamics across beings:
| Being | Policy \(\sigma_1\) | Weight \(\sigma_2\) | World \(\sigma_3\) |
|---|---|---|---|
| Claudine | +0.00451 | +0.00185 | +0.00253 |
| Gigi | +0.00446 | +0.00277 | +0.00277 |
| Johnny | +0.00006 | -0.00011 | +0.00005 |
| Alpha | +0.00294 | +0.00172 | +0.00156 |
Claudine and Gigi (high dopamine, high norepinephrine) show positive sections across the board—reward-seeking, plastic, exploitative. Johnny (angry: high cortisol, low serotonin) shows near-zero weight section, indicating his neurochemical state balances the excitatory cortisol signal against the inhibitory serotonin depletion. This differentiation emerges purely from the connection matrix acting on each being’s unique neurochemical profile.
The fleet-wide LR modulation after 4 ticks was 1.0007 (18 beings contributing). This slight upward pressure reflects the population’s net-positive dopamine/NE state. After MOBA self-play tournaments (which produce large neurochemical swings), we expect this signal to oscillate in the range \([0.95, 1.10]\).
The momentum term \(\beta = 0.3\) produces critically damped behavior. After a dopamine impulse of \(\Delta n_\text{DA} = 0.2\), the policy section rises and settles monotonically, following the same ring-down profile as the table above.
No ringing. No overshoot past the force-implied equilibrium. The section approaches its new rest point monotonically, which matches the biological time course of monoamine effects (seconds to minutes, not milliseconds).
One could close these three loops with three independent controllers. The fiber bundle formulation offers three advantages:
Shared geometry: All three loops respond to the same neurochemical event through a single matrix multiplication. This is computationally cheaper and conceptually cleaner than three separate feedback mechanisms.
Coupling via curvature: The connection matrix encodes cross-talk between loops. A dopamine spike affects policy and training and exploration simultaneously, with different magnitudes and signs. This coupling is the curvature of the bundle—it cannot be decomposed into three independent 1D feedback loops without losing information.
Holonomy is a genuinely new signal: The Möbius twist produces a learning signal that no single loop can generate. Holonomy measures how the agent’s fiber state changes after seeing the same situation from two perspectives. This is analogous to stereoscopic depth perception: each eye alone gives a flat image; the difference between them gives depth.
The connection \(\Gamma\) is a gauge field on the neurochemistry bundle. The holonomy around a closed loop in neurochemistry space is the Wilson loop integral:
\[W(\gamma) = \mathcal{P} \exp \left( -\oint_\gamma \Gamma \cdot d\mathbf{n} \right)\]
In the linearized regime (small \(\Delta\mathbf{n}\)), this reduces to our discrete holonomy computation. The gauge freedom corresponds to choosing which fiber section value is “zero”—biologically, this is the adaptation level. The bundle is gauge-invariant: only holonomy (relative section changes) matters, not absolute section values.
Standard reinforcement learning treats the agent as occupying one side of the game. Self-play [8] puts the agent on both sides sequentially. The Möbius topology arises because the “identity” of the agent (blue vs. red) reverses after each match. Traversing the loop twice returns you to the same role but with altered fiber sections. On an orientable bundle, two traversals would cancel. On a Möbius bundle, they accumulate.
This is why self-play produces qualitatively different learning than single-perspective play. The holonomy is the formalization of “learning by seeing both sides.”
The connection matrix is currently hand-tuned from neurobiological priors. Future work should learn \(\Gamma\) from data, treating it as a trainable parameter. The wave ringlet parameters (\(\alpha = 0.1\), \(\beta = 0.3\)) were chosen for critical damping; a formal stability analysis should confirm the conjectured critical damping condition \(\beta_\text{crit} = 1 - 2\sqrt{\alpha}\).
The soft clamping \(\sigma \in [-1, 1]\) prevents divergence but introduces a nonlinearity that breaks the exact gauge theory interpretation at the boundaries. A compactified fiber (circle-valued sections) would preserve gauge invariance but complicate the biological interpretation.
The Möbius Learning Bundle demonstrates that three apparently separate learning deficits—behavioral feedback, training modulation, and world model curiosity—are fiber sections over a common base manifold. A single connection operator closes all three loops simultaneously. The wave ringlet propagation rule provides biologically plausible temporal dynamics without explicit decay timers. The Möbius twist from self-play generates holonomy, a learning signal inaccessible to single-perspective agents. The system is implemented and running in a 16-agent cognitive simulation, producing differentiated fiber dynamics that reflect each agent’s unique neurochemical personality.
The deeper implication is architectural: when multiple subsystems share a common substrate, the right abstraction is not three controllers but one connection on a bundle. The geometry does the work.
[1] Mobley, J.A. “Haven: Genuine Cognitive Architecture for AI Beings.” MHSCOM Internal, 2026.
[2] Lövheim, H. “A new three-dimensional model for emotions and monoamine neurotransmitters.” Medical Hypotheses, 78(2):341-348, 2012.
[3] Baars, B.J. A Cognitive Theory of Consciousness. Cambridge University Press, 1988.
[4] Schultz, W. “A neural substrate of prediction and reward.” Science, 275(5306):1593-1599, 1997.
[5] Graves, A. et al. “Neural Turing Machines.” arXiv:1410.5401, 2014.
[6] Daw, N.D. et al. “Opponent interactions between serotonin and dopamine.” Neural Networks, 15(4-6):603-616, 2002.
[7] Yerkes, R.M. & Dodson, J.D. “The relation of strength of stimulus to rapidity of habit-formation.” Journal of Comparative Neurology and Psychology, 18(5):459-482, 1908.
[8] Silver, D. et al. “Mastering the game of Go without human knowledge.” Nature, 550:354-359, 2017.
The 21 entries of \(\Gamma\) were derived from a synthesis of the Lövheim emotion model [2], dopamine–serotonin opponent interactions [6], and the Yerkes–Dodson relation between arousal and performance [7].
The signs are more constrained than the magnitudes. Future work should treat magnitudes as learnable parameters while preserving sign constraints as architectural priors.
- Implementation: `MASCOM/ventures/gamegob/mobius_learning_bundle.py`
- Integration points: `mind.py` (6), `haven_daemon.py` (1), `quantum_pretrain.py` (1)
- Persistence: `mascom_data/village/bundle_{being_id}.json` per agent
- Training signal: `mascom_data/training/bundle_lr_signal.json` (aggregated)