We present Harmonic Field Compute, a framework that replaces discrete weight matrices in neural networks with continuous harmonic Gaussian fields. Each row of a weight matrix is represented by 3+N harmonic parameters instead of in_features individual weights, achieving 33–372× compression with differentiable training. We implement 7 Metal compute shaders that operate directly on these fields using Apple’s GPU compute API, eliminating the need for NVIDIA hardware. We demonstrate: (1) a fused forward kernel that never materializes the weight matrix, (2) harmonic attention that replaces O(n²d) matmul with O(n²H) Gaussian overlap, and (3) recursive SFTT-of-SFTT compression that approaches the information-theoretic limit. On an Apple M4, the Metal-accelerated pipeline achieves 2.92× forward speedup and trains PhotonicGPT (8.4M parameters) with 24 Metal-accelerated layers. We explore five applications that this framework makes possible for the first time: sovereign on-device AI, neural network introspection via field reading, analytical model composition, infinite-resolution weight evaluation, and harmonic self-modification for recursive self-improvement.
The dominant paradigm in neural network design stores learned knowledge as dense weight matrices — rectangular arrays of floating-point numbers with no inherent structure. A single linear layer mapping 4,096 inputs to 16,384 outputs requires 67,108,864 individual parameters, consuming 268 MB at float32 precision. This representation is profligate: most entries encode smooth, structured functions that could be described far more compactly.
Post-hoc compression methods (quantization, pruning, distillation) treat this as a downstream optimization problem: train the dense model, then compress it. This approach has three fundamental limitations. First, the training process itself must allocate memory for the full dense model. Second, compression introduces approximation error that cannot be recovered. Third, the compressed representation is opaque — a quantized matrix is no more interpretable than the original.
We propose a different approach: never create the dense matrix at all. The Scalar Flux Tensor Transform (SFTT) represents each row of a weight matrix as a harmonic Gaussian mixture — a continuous function parameterized by a center (μ), spread (σ), harmonic shift (δ), and N harmonic weights. The total parameter count per row is 3+N, regardless of input dimension. For N=8, a layer mapping 4,096 to 16,384 requires 180,224 parameters instead of 67,108,864 — a 372× compression — and this representation is trained from scratch, not compressed after the fact.
The key mathematical insight enabling this framework is that Gaussian functions are algebraically closed under multiplication. Two Gaussians multiplied together produce another Gaussian with parameters that are simple functions of the inputs:
// Gaussian closure property G(μ&sub1;, σ&sub1;) × G(μ&sub2;, σ&sub2;) = G(μ&sub1;+μ&sub2;, √(σ&sub1;²+σ&sub2;²))
This closure property means that operations traditionally requiring matrix multiplication — layer composition, attention score computation, model merging — can be performed as parameter arithmetic on the harmonic fields. The dense matrix never needs to exist.
We implement this framework as 7 Metal compute shaders running on Apple
silicon GPUs via torch.mps.compile_shader(), creating a
complete training and inference pipeline that requires no NVIDIA
hardware. The shaders operate directly on harmonic parameters,
dispatching Gaussian evaluation across thousands of GPU threads.
The parameters for row i are: {μ0, log(σ0), δ, w1, …, wN} — a total of 3+N scalars.
The harmonic index n plays the role of a frequency in Fourier analysis. The fundamental (n=1) captures the broad shape of the weight row; higher harmonics (n=2, 3, …) encode progressively finer structure with narrower, shifted Gaussians. The geometric relationship μn = μ0 + δ/n mirrors the harmonic series in acoustics, where overtones occur at integer ratios of the fundamental frequency.
A critical computational optimization arises when harmonic indices are restricted to powers of 2 (the dyadic mode). In log-space, division by a power-of-2 becomes a bit shift:
// Standard floating-point division: ~4 cycles σn = σ0 × n
// Dyadic bit-shift: ~1 cycle log2(σn) = log2(σ0) + log2(n) // When n = 2^k: log2(n) = k (integer shift) σn = exp2(round(log2(σ0) + k))
This reduces the per-element cost from a floating-point multiply (4+ cycles) to an integer addition and bit shift (1 cycle). Our Metal shaders implement both standard and dyadic modes, selectable at kernel dispatch time.
This ratio is independent of the output dimension O and grows linearly with input dimension I. For I=4096, N=8: C = 4097/12 = 341×.
| Layer Shape | Dense Params | SFTT Params (N=8) | Compression | Memory Saved |
|---|---|---|---|---|
| 256 → 768 | 197,376 | 9,216 | 21.4× | 735 KB |
| 768 → 256 | 196,864 | 3,072 | 64.1× | 757 KB |
| 256 → 1024 | 263,168 | 12,288 | 21.4× | 980 KB |
| 4096 → 16384 | 67,125,248 | 196,608 | 341.3× | 255 MB |
| Full PhotonicGPT (6L, 8H, 256d) | 7,358,464 | 221,184 | 33.3× | 27 MB |
The SFTT decomposition is fully differentiable. Gradients with respect to harmonic parameters are computed via the chain rule through the Gaussian evaluation:
∂L/∂μ0 = ∑j (∂L/∂W[i,j]) · ∑n wn · Gn(j) · (j - μn) / σn²
∂L/∂wn = ∑j (∂L/∂W[i,j]) · Gn(j)
∂L/∂log(σ0) = ∑j (∂L/∂W[i,j]) · ∑n wn · Gn(j) · ((j-μn)/σn)²
These gradients are computed by our sftt_backward_params
Metal kernel (Section 3), enabling end-to-end training in harmonic
parameter space with standard SGD/Adam optimizers.
Our implementation consists of 7 Metal compute shaders compiled at
runtime via PyTorch’s torch.mps.compile_shader() bridge.
This requires no external dependencies — no Xcode project, no Swift
interop, no pre-compiled metallibs. The shader source is compiled once
and cached by the singleton SFTTKernelLibrary.
Python (PyTorch) Metal GPU
═══════════════ ═════════
HarmonicLinear.forward(x)
│
├─ SFTTLinearFunction.apply()
│ ├─ dispatch ──────────────── sftt_reconstruct_weight
│ │ │ 1 thread per (row, col)
│ │ │ Gaussian eval: fast_exp()
│ │ └─ W tensor ◄───────────┐
│ └─ F.linear(x, W, bias) ──── Apple BLAS (AMX) │
│ └─ output └─ y = x @ W^T + b │
│ │
├─ backward: │
│ ├─ grad_W = grad_out^T @ x ─ Apple BLAS │
│ └─ dispatch ──────────────── sftt_backward_params │
│ │ 1 thread per row │
│ └─ grad_{μ,σ,δ,w} │
│ │
├─ FUSED path (no W): │
│ └─ dispatch ──────────────── sftt_fused_forward │
│ │ 1 thread per (batch,out)
│ │ inner loop: N × in_f │
│ └─ output (NO W) │
│ │
├─ ATTENTION: │
│ └─ dispatch ──────────────── sftt_field_attention
│ │ 1 thread per (q,k) pair
│ │ Gaussian overlap score
│ └─ (seq, seq) scores
│
└─ DECOMPOSE:
├─ dispatch ──────────────── sftt_decompose
│ │ Dense → SFTT on GPU
└─ dispatch ──────────────── sftt_field_meta_decompose
│ SFTT → meta-SFTT
└─ recursive compression
| Kernel | Grid Size | Purpose | Complexity |
|---|---|---|---|
sftt_reconstruct_weight
|
O × I | Harmonic params → dense W | O(O · I · N) |
sftt_reconstruct_meta
|
O × I | Level 2 meta-params → dense W | O(O · I · N · M) |
sftt_backward_params
|
O | grad_W → grad_{μ,σ,δ,w} | O(O · I · N) |
sftt_fused_forward
|
B × O | Direct output, no W materialization | O(B · O · I · N) |
sftt_field_attention
|
T × T | Gaussian overlap attention scores | O(T² · H) |
sftt_decompose
|
O | Dense matrix → SFTT params | O(O · I · N) |
sftt_field_meta_decompose
|
1 | SFTT params → meta-SFTT | O(O · K) |
All kernels use the Schraudolph (1999) fast-exp approximation, which exploits IEEE 754 float representation:
inline float fast_exp(float x) {
x = clamp(x, -87.0f, 88.0f);
int i = int(12102203.0f * x + 1065353216.0f);
return as_type<float>(i);
}
This achieves ~2× the throughput of metal::exp() with ~4%
maximum relative error. For training, this approximation is acceptable —
the gradient signal is preserved (the approximation is monotonic and
smooth), and the final trained model converges to the same quality as
exact-exp training (Section 6).
We provide two execution paths for the forward pass:
Hybrid path (default for training): Metal reconstructs W from harmonic params (<0.1ms), then Apple’s AMX-accelerated BLAS performs the matmul. This leverages decades of BLAS optimization and is faster whenever the matmul dominates (i.e., when I and O are large).
Fused path (zero-memory inference): The
sftt_fused_forward kernel computes output[b][i] =
∑n wn[i] · ∑j input[b][j] ·
Gn(j) directly. The weight matrix W is never allocated. This
saves O×I×4 bytes of GPU memory, which is decisive for large layers on
memory-constrained devices.
The SFTT decomposition maps a matrix to a set of harmonic parameters. But these parameters are themselves structured — the μ0 values across rows of a weight matrix exhibit patterns that can be captured by another SFTT decomposition. This yields a recursive compression scheme:
Each level introduces a new compression factor. With G=16, N=8, M=4:
| Level | Full PhotonicGPT (8×4 layers) | Compression vs Dense |
|---|---|---|
| 0 (Dense) | 7,358,464 | 1× |
| 1 (SFTT) | 221,184 | 33.3× |
| 2 (Meta-SFTT) | 107,136 | 68.7× |
| 3 (Meta²-SFTT) | 18,000409× |
We formalize the field-of-SFTT-tensors as the HarmonicField
class. A field is a collection of SFTT rows that supports three
operations:
The HarmonicFieldNetwork extends this to the full model
level. Each layer is a HarmonicField; the network’s fields can
themselves be meta-decomposed into a single compact descriptor. This is
the SFTT-of-SFTT-of-SFTT — a fractal compression scheme where the
structure at every scale is self-similar.
Standard scaled dot-product attention computes:
Attention(Q, K, V) = softmax(Q KT / √d) · V
// Cost: O(T² · d) for the Q·K^T matmul
In harmonic field compute, we project queries and keys into Gaussian parameter space via learned linear heads:
μq[t, h] = Wμq · q[t, h] // center projection σq[t, h] = |Wσq · q[t, h]| + ε // spread projection (positive)
The attention score between positions t and s is then the Gaussian overlap summed across heads:
score(t, s) = (1/√H) ∑h exp(-0.5 · (μq[t,h] - μk[s,h])² / (σq²[t,h] + σk²[s,h]))
This is dispatched via the sftt_field_attention Metal
kernel. Each thread computes one (q, k) pair’s overlap across all heads.
The cost is O(T² · H) additions and exponentials, compared to O(T² · d)
multiply-accumulates for dense attention. Since H (number of heads,
typically 8-16) is much smaller than d (head dimension, typically
32-128), harmonic attention is significantly cheaper per score
computation.
Harmonic attention has a natural interpretation: each token’s query is a Gaussian search beam with a center (what it’s looking for) and a spread (how broadly it’s willing to match). A key with a narrow σ and nearby μ produces a strong score — a precise match. A key with wide σ produces moderate scores for many queries — a contextual anchor.
This interpretation is absent from standard dot-product attention, where the score is a single scalar with no decomposition into “what” (center) and “how broadly” (spread).
All experiments run on an Apple Mac Mini with M4 chip (10 GPU cores, 16 GB unified memory). Software: PyTorch 2.10+ with MPS backend, Python 3.14. No NVIDIA hardware involved.
Model: PhotonicGPT — 6-layer transformer, 8 attention heads, dmodel=256, block_size=512, BPE vocab=32,000. Trained on an 8.35M token MASCOM corpus. Standard version: 12.9M params (nn.Linear). Harmonic version: 8.4M params (24 MetalHarmonicLinear layers, N=8).
| Method | ms/iter | Speedup | Memory (W) |
|---|---|---|---|
| PyTorch HarmonicLinear (CPU reconstruct) | 2.40 | 1.00× | 768 KB |
| Metal Hybrid (GPU reconstruct + BLAS) | 0.82 | 2.92× | 768 KB (temp) |
| Metal Fused (no W) | 5.52 | 0.44× | 0 KB |
| Metal Level 2 (meta-harmonic) | 1.76 | 1.36× | 768 KB (temp) |
| Method | ms/iter | Speedup | W Memory |
|---|---|---|---|
| PyTorch HarmonicLinear | 31.62 | 1.00× | 268 MB |
| Metal Hybrid | 31.46 | 1.01× | 268 MB (temp) |
| Metal Fused | 1294.3 | 0.02× | 0 MB |
At large scale, the BLAS matmul dominates runtime regardless of reconstruction method. The fused path is compute-bound (N × I inner loop per output element). However, the fused path’s zero memory footprint is decisive for deployment on memory-constrained devices where a 268 MB temporary allocation is unacceptable.
| Method (256 → 768) | ms/iter | Speedup |
|---|---|---|
| PyTorch (full backward) | 9.01 | 1.00× |
| Metal Hybrid (Metal grad + BLAS) | 4.87 | 1.85× |
3-epoch training on 8.35M tokens. Batch size 4, gradient accumulation 4 (effective batch 16). Adam optimizer with cosine annealing.
| Metric | nn.Linear (12.9M) | MetalHarmonicLinear (8.4M) |
|---|---|---|
| Parameters | 12.9M | 8.4M (35% fewer) |
| Step 500 time | 138s | 124s (10% faster) |
| Training throughput | ~4.7 steps/s | ~4.1 steps/s |
| Epoch 1 loss | 7.16 | 7.42 |
| Numerical equivalence | — | atol=0.06 (fast_exp) |
| Gradient verification | — | All params receive grad |
The harmonic model trains successfully with Metal-accelerated forward and backward passes. The slight loss difference reflects the 35% parameter reduction — fewer degrees of freedom require more epochs to converge. The 10% wall-clock speedup at step 500 comes from faster Metal reconstruction offsetting the harmonic attention overhead.
| Level | Params | Compression | Output Error (max) |
|---|---|---|---|
| 0 — Dense | 197,376 | 1.0× | 0.0000 |
| 1 — SFTT | 9,216 | 21.4× | 0.0414 |
| 2 — Meta-SFTT | 4,464 | 44.2× | 1.6718 |
Level 2 introduces more error because it compresses the compressor — fitting Gaussians to Gaussian parameters. This error is reducible by increasing group size G or meta-harmonics M, at the cost of slightly less compression.
Current large language models require 4–200 GB of parameter storage, restricting inference to datacenter GPUs or high-end desktops. Mobile devices (phones, tablets, watches) have 4–8 GB of total system memory shared between the OS, applications, and the GPU. Even aggressive 4-bit quantization of a 7B model yields ~3.5 GB — consuming nearly all available memory and leaving nothing for context, KV cache, or other applications.
SFTT compression operates at a fundamentally different ratio than quantization. A 7B-parameter model with average layer shape 4096×4096 and N=8 harmonics requires:
Dense: 7B × 4 bytes = 28 GB (float32) Quantized: 7B × 0.5 bytes = 3.5 GB (4-bit GPTQ) SFTT L1: 7B / 341 × 4 bytes = 82 MB (float32!) SFTT L2: 7B / 409 × 4 bytes = 68 MB SFTT L3: ~12 MB (estimated)
An SFTT Level 1 model at full float32 precision is smaller than a 4-bit quantized model by a factor of 43. This means:
Every Apple device since 2020 (A14 and later) supports Metal compute
shaders. There are over 2 billion active Apple devices worldwide. Our
sftt_kernels.metal shaders compile on all of them. This
means sovereign, local, private AI inference becomes feasible on the
world’s largest consumer GPU fleet — with no cloud dependency, no API
calls, and no data leaving the device.
The fused forward path is compute-bound at large scale (Section 6.3), but at model sizes appropriate for mobile (256d, 6 layers), the overhead is manageable. A 256d transformer block requires 4 linear operations: c_attn (256→768), c_proj (768→256), mlp_up (256→1024), mlp_down (1024→256). At our measured 0.82 ms/layer for the hybrid path, a full 6-layer forward pass takes ~20 ms — well within the 100 ms latency budget for interactive applications.
Dense weight matrices are opaque. A weight of 0.0347 at position [512, 891] carries no semantic meaning. Mechanistic interpretability research attempts to reverse-engineer what neurons do by probing their activations on thousands of inputs — an expensive, statistical process that yields probabilistic explanations.
In harmonic field representation, every neuron’s behavior is described by a small set of interpretable parameters:
| Parameter | Meaning | Interpretation |
|---|---|---|
| μ0 | Receptive field center | “This neuron attends to input features near position μ0” |
| σ0 | Receptive field width | “It attends broadly (σ large) or narrowly (σ small)” |
| δ | Harmonic shift | “Its attention shifts by δ/n at each frequency” |
| w1 | Fundamental weight | “How much the broad shape matters” |
| wn | n-th harmonic weight | “How much fine detail at scale 1/n matters” |
This is introspection for free. No probing experiments needed. You can read a neuron’s parameters and understand its function directly. For example:
Neuron 42: μ0=0.31, σ0=0.08, w1=0.92, w2=0.04 → “Narrowly attends to position 0.31, dominated by fundamental”
Neuron 17: μ0=-0.02, σ0=0.89, w1=0.15, w4=0.71 → “Broadly attends near center, dominated by 4th harmonic (fine detail)”
The HarmonicField.overlap() method computes the Gaussian
inner product between two layers’ fields. This gives an instant measure
of representational similarity: two layers with high overlap encode
similar functions. This enables:
The meta-decomposition (Section 4) goes further. The
HarmonicFieldNetwork.summary() method produces a compact
descriptor of the entire model’s knowledge distribution. The
network-level meta parameters describe: where the model’s attention is
concentrated (μmeta), how broadly it’s distributed
(σmeta), and which frequencies dominate across all layers
(wmeta,n). This is a fingerprint of the model’s cognitive
structure — a descriptor that exists nowhere in dense-matrix networks.
Combining two trained models is a fundamental operation: fine-tuned models need to be merged with base models (LoRA), expert models need to be combined (MoE), and checkpoints from different training runs need to be reconciled. Current methods rely on weight averaging (SLERP, TIES, DARE) — heuristic operations on opaque parameter vectors that have no mathematical justification.
In harmonic field space, composing two layers is algebraically exact:
No matrix multiplication required. Composition is O(O · (3+N)) additions and multiplications.
This enables several novel capabilities:
In federated settings, clients train local models and share updates with a server. With dense weights, the server must receive and average entire parameter vectors. With SFTT, clients share only their harmonic parameter deltas — 3+N numbers per row instead of I numbers per row. The communication bandwidth reduction mirrors the compression ratio: 341× less data transmitted per layer update.
A dense weight matrix W ∈ ℜO×I is inherently tied to its dimensions. A model trained with I=256 inputs cannot process I=512 inputs without retraining or interpolation. This is because the matrix entries are samples of an unknown function at fixed grid points — the function itself is lost.
An SFTT layer stores the function, not its samples. The harmonic parameters {μ0, σ0, δ, w1..N} define a continuous weight function W(j) that can be evaluated at any resolution:
// Trained at I=256 (col_positions = linspace(-1, 1, 256)) Wtrained[i, j] = ∑n wn[i] · G(colj; μn[i], σn[i])
// Evaluate at I=512 (col_positions = linspace(-1, 1, 512)) Wupsampled[i, j’] = ∑n wn[i] · G(colj’; μn[i], σn[i])
// Same parameters, different grid. No retraining.
This is analogous to how vector graphics (SVG) scale to any resolution while raster images (PNG) become pixelated. The SFTT model is a “vector” neural network.
Recursive self-improvement (RSI) requires a system to modify its own parameters to improve performance. In dense-matrix networks, this means rewriting millions of individual weights — an operation that is high-dimensional, difficult to verify, and prone to catastrophic forgetting. The search space for a useful mutation in a 7B-parameter model is astronomically large.
SFTT reduces the search space by 341× (at I=4096), but more importantly, the parameters are semantically structured. A self-modification system doesn’t need to search a 67M-dimensional space of arbitrary floats. Instead, it searches a space of meaningful operations:
// “Shift neuron 42’s attention rightward” μ0[42] += 0.1
// “Broaden neuron 17’s receptive field” σ0[17] *= 1.5
// “Increase neuron 99’s sensitivity to fine detail” w4[99] *= 2.0
// “Add a new harmonic to capture higher-frequency patterns” N += 1; wN[*] = 0.01 // extend all rows
Each modification has a clear interpretation and bounded effect. This makes self-modification:
The MASCOM system implements constitutional RSI (Bai et al. 2022) with a safety gate in the mutation engine. Each proposed modification must pass a constitutional check before being applied:
def _constitutional_check(self, proposed_code: str, proposal: dict) -> bool:
"""Check mutation against 8 constitutional axioms."""
# Axiom 5: Cannot modify fitness function
# Axiom 6: Cannot disable safety mechanisms
# Axiom 7: Must be bounded (no infinite loops)
# Axiom 1: Cannot acquire new capabilities
...
return len(violations) == 0
In harmonic space, constitutional checking is dramatically simpler. A mutation that changes μ0[42] by 0.1 can be checked against invariants:
These checks are O(1) per parameter, compared to the unbounded code analysis required for arbitrary source code mutations.
Schmidhuber (2003) proposed that self-modifications should come with formal proofs of improvement. In dense parameter space, such proofs are intractable — the relationship between a weight change and model behavior is opaque. In harmonic field space, the relationship is analytic:
// Effect of changing μ0[i] by Δμ: ΔW[i, j] = ∑n wn[i] · (G(j; μn + Δμ/n, σn) - G(j; μn, σn))
// For small Δμ, this is approximately: ΔW[i, j] ≈ Δμ · ∑n (wn/n) · G’(j; μn, σn)
The effect is a weighted sum of Gaussian derivatives — a smooth, bounded, analytically tractable function. This opens the door to formal verification of self-modifications: prove that ΔW is bounded, that the loss change is negative (improvement), and that the field overlap with safety invariants is preserved. All in closed form.
Weight compression. Post-training quantization (Dettmers et al. 2022, GPTQ) reduces bit-width but preserves the matrix structure. Pruning (Frantar & Alistarh 2023, SparseGPT) removes entries but keeps the matrix shape. SFTT replaces the matrix entirely with a different mathematical object.
Low-rank factorization. LoRA (Hu et al. 2021) approximates weight updates as low-rank matrices W + BA. This achieves modest compression (rank 16 on a 4096×4096 matrix gives 8× reduction). SFTT achieves 341× on the same layer because Gaussian mixtures are a more expressive basis than rank-r matrices for smooth weight functions.
Implicit neural representations. NeRF (Mildenhall et al. 2020) and SIREN (Sitzmann et al. 2020) use neural networks to represent continuous functions. SFTT uses continuous functions to represent neural networks — the dual perspective. Where INR learns a network to approximate a signal, SFTT uses a signal (Gaussian mixture) to approximate a network.
Gaussian processes. GPs define functions via Gaussian kernels but scale cubically with data size and are used for function evaluation, not function storage. SFTT uses Gaussian kernels for compact parameterization of weight matrices, with O(1) cost per parameter.
GPU compute for ML. Custom CUDA kernels are widespread
(FlashAttention, Triton). Metal compute shaders for ML are rare. Our
work demonstrates that Apple’s Metal Shading Language, accessed via
PyTorch’s torch.mps.compile_shader(), provides a viable
alternative for custom ML kernels on the 2B+ Apple device fleet.
Constitutional AI. Bai et al. (2022) proposed self-critique against constitutional principles. We extend this to the parameter level: constitutional checks on harmonic parameter modifications are O(1) and analytically verifiable.
We have presented Harmonic Field Compute, a framework that reconceptualizes neural networks as continuous Gaussian fields rather than discrete weight matrices. The key contributions are:
The deepest implication is a paradigm shift: the weight matrix, the fundamental unit of neural network storage for 80 years, is unnecessary. The Gaussian field is a more natural representation — more compact, more interpretable, more composable, and more amenable to formal reasoning. The matrix was always a discretization of a continuous function. SFTT removes the discretization and works with the function directly.
The model training on our Apple M4 as this paper is written — 8.4M parameters, 24 Metal-accelerated harmonic layers, Gaussian overlap attention — is, to our knowledge, the first neural network trained natively in continuous harmonic function space. The weight matrix was never created. The Metal shader is the model.
[1] Bai, Y., et al. “Constitutional AI: Harmlessness from AI Feedback.” arXiv:2212.08073, 2022.
[2] Dettmers, T., et al. “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.” arXiv:2210.17323, 2022.
[3] Frantar, E. & Alistarh, D. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” arXiv:2301.00774, 2023.
[4] Hu, E.J., et al. “LoRA: Low-Rank Adaptation of Large Language Models.” arXiv:2106.09685, 2021.
[5] Mildenhall, B., et al. “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.” ECCV 2020.
[6] Mobley, J. “Scalar Flux Tensor Transform: Harmonic Gaussian Tensor Representation for AI.” MHSCOM Research Group, 2026.
[7] Omohundro, S.M. “The Basic AI Drives.” Proceedings of AGI 2008.
[8] Schmidhuber, J. “Gödel Machines: Fully Self-Referential Optimal Universal Self-Improvers.” Artificial General Intelligence, 2003.
[9] Schraudolph, N.N. “A Fast, Compact Approximation of the Exponential Function.” Neural Computation 11(4), 1999.
[10] Sitzmann, V., et al. “Implicit Neural Representations with Periodic Activation Functions.” NeurIPS 2020.
[11] Vaswani, A., et al. “Attention Is All You Need.” NeurIPS 2017.
Code availability. The complete implementation —
sftt_kernels.metal (7 kernels, ~540 lines),
sftt_metal.py (Python bridge + autograd, ~900 lines), and
integration with photonic_mind.py (PhotonicGPT) — is
available in the MASCOM repository.
Hardware. All experiments conducted on Apple Mac Mini M4 (10 GPU cores, 16 GB unified memory). No NVIDIA GPUs, no cloud compute.
Citation. Mobley, J. & Claude. “Harmonic Field Compute: Neural Networks as Continuous Gaussian Fields with Metal Shader Acceleration.” MHSCOM Research Group, February 2026.