All the posts

Thoughts on AI, machine learning, distributed systems, and open-source development

Safer LLMs require open search - Building the AI Memory Layer

AI safety through topology‑aware, energy‑informed retrieval that separates stable facts from risky intuitions.

  • Shows how geometry‑only vector search and semantic caching accumulate retrieval errors, turning context drift into subtle hallucinations.
  • Introduces arrowspace as an “open search” layer where graph Laplacians, energy dispersion, and topology‑quality scores expose and constrain off‑manifold results instead of hiding them inside black‑box similarity.
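
A minimal sketch of the kind of quantity involved (illustrative only, not the arrowspace implementation): on a graph with Laplacian L = D − A, the Rayleigh quotient xᵀLx / xᵀx measures how much a signal varies across the graph's edges, so a result that sits off-manifold relative to its neighbours shows up as high energy instead of hiding behind a similarity score.

```rust
/// Illustrative sketch: Dirichlet (Laplacian) energy of a signal over a small
/// undirected graph, i.e. the Rayleigh quotient x^T L x / x^T x.
/// Edges are (i, j, weight); `x` assigns one value per node.
fn laplacian_energy(n_nodes: usize, edges: &[(usize, usize, f64)], x: &[f64]) -> f64 {
    assert_eq!(x.len(), n_nodes);
    // For L = D - A, x^T L x expands to a sum of weighted squared differences
    // across edges, so the Laplacian never needs to be materialised here.
    let quadratic: f64 = edges
        .iter()
        .map(|&(i, j, w)| w * (x[i] - x[j]).powi(2))
        .sum();
    let norm: f64 = x.iter().map(|v| v * v).sum();
    if norm == 0.0 { 0.0 } else { quadratic / norm }
}

fn main() {
    // Triangle graph: a smooth signal has low energy, a spiky one high energy.
    let edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)];
    let smooth = [1.0, 1.0, 1.0];
    let spiky = [1.0, -1.0, 1.0];
    println!("smooth: {:.3}", laplacian_energy(3, &edges, &smooth)); // 0.000
    println!("spiky:  {:.3}", laplacian_energy(3, &edges, &spiky));  // high
}
```
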
Read more →

Why arrowspace is game-changing for data operations at scale

Test‑bed milestone for a unified vector, graph, and key‑value engine built on spectral indexing and energy‑informed search.

  • Turns any dataset into a features graph, enabling manifold‑aware search, matching, ranking, and dataset characterization at any lifecycle stage.
  • Designed for high dimensions by default: robust on biotech‑scale sequences, large vocabularies, and model‑sized embedding spaces.

Read more →

Efficient GPT training: a dive into the architecture of a Rust-powered GPT-2

A deep dive into a Rust implementation of a decoder-only transformer inspired by Karpathy's nanochat.

  • Breaks down the architecture of a modern LLM, explaining the role of key components for an experienced audience.
  • Covers modern techniques such as Rotary Position Embeddings (RoPE), Multi-Query Attention (MQA), RMSNorm, and the use of a Squared ReLU in the MLP.
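
As a quick illustration of two of the techniques mentioned in the post (a sketch under the standard definitions, not the post's actual code): RMSNorm rescales a vector by its root mean square without subtracting a mean or adding a bias, and Squared ReLU simply squares the positive part of each activation in the MLP.

```rust
/// Sketch of RMSNorm: x_i * g_i / sqrt(mean(x^2) + eps). Unlike LayerNorm,
/// no mean is subtracted and no bias is added.
fn rms_norm(x: &[f64], gain: &[f64], eps: f64) -> Vec<f64> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f64>() / x.len() as f64;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * scale * g).collect()
}

/// Sketch of the Squared ReLU activation used in the MLP: max(0, x)^2.
fn squared_relu(x: f64) -> f64 {
    let r = x.max(0.0);
    r * r
}

fn main() {
    let x = [1.0, -2.0, 3.0];
    let gain = [1.0, 1.0, 1.0];
    println!("{:?}", rms_norm(&x, &gain, 1e-6));
    println!("{:?}", x.map(squared_relu)); // [1.0, 0.0, 9.0]
}
```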

Read more →

ArrowSpace v0.21.0: Proof of Concept for Energy-Informed Context Search

Milestone release completes the search–matching–ranking pipeline with a stabilized energymaps module, delivering spectral vector search that finds matches beyond geometric proximity.

  • Two complete build paths: eigenmaps (spectral indexing from Laplacians) and energymaps (pure energy-first with optical compression, diffusion-split subcentroids, and automatic λτ computation).
  • CVE corpus diffusion sweep (300K docs) achieves Avg MRR 0.75, NDCG@10 0.7239 (η=0.22, steps=8) with stable 75–83s build times, confirming negligible diffusion overhead and strong spectral ranking quality.
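
For readers unfamiliar with the metrics quoted above, here is a self-contained sketch of how MRR and NDCG@k are conventionally computed over ranked result lists (standard definitions, not the arrowspace evaluation harness):

```rust
/// Mean Reciprocal Rank: average of 1 / (rank of the first relevant result),
/// with 1-based ranks; a query with no relevant hit contributes 0.
fn mrr(ranked_relevance: &[Vec<bool>]) -> f64 {
    let total: f64 = ranked_relevance
        .iter()
        .map(|r| match r.iter().position(|&rel| rel) {
            Some(idx) => 1.0 / (idx as f64 + 1.0),
            None => 0.0,
        })
        .sum();
    total / ranked_relevance.len() as f64
}

/// NDCG@k: DCG@k = sum(rel_i / log2(i + 1)) over 1-based positions i,
/// normalised by the DCG of an ideal ordering of the same relevance labels.
fn ndcg_at_k(relevance: &[f64], k: usize) -> f64 {
    let dcg = |rels: &[f64]| -> f64 {
        rels.iter()
            .take(k)
            .enumerate()
            .map(|(i, rel)| rel / (i as f64 + 2.0).log2())
            .sum()
    };
    let mut ideal = relevance.to_vec();
    ideal.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let idcg = dcg(&ideal);
    if idcg == 0.0 { 0.0 } else { dcg(relevance) / idcg }
}

fn main() {
    // Two toy queries: first relevant doc at rank 2, then at rank 1.
    let runs = vec![vec![false, true, false], vec![true, false, false]];
    println!("MRR = {:.3}", mrr(&runs)); // (0.5 + 1.0) / 2 = 0.75
    println!("NDCG@10 = {:.3}", ndcg_at_k(&[0.0, 1.0, 0.0], 10));
}
```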

Read more →

DeepSeek-OCR Optical Compression Meets Energy Search: Rust Implementation in ArrowSpace v0.18.0

Rust implementation of DeepSeek-OCR compression achieves 10× token reduction, while ArrowSpace v0.18.0 introduces energy-informed retrieval that replaces cosine similarity with spectral graph properties.

  • DeepEncoder architecture (SAM + CLIP + projector) replicated in Rust using burn.dev with cross-platform GPU support and five resolution modes from 64 to 400 tokens.
  • Energy search with diffusion parameter sweep on CVE corpus achieves NDCG@10 ≈ 0.99 (η=0.05, steps=6) and MRR=1.0 (η=0.05, steps=4) without any cosine similarity.

Read more →

taumode: Beyond Cosine Similarity on the CVE dataset

Evaluation on a CVE corpus spanning 1999 to 2025 shows spectral modes preserve head agreement with cosine while enhancing long‑tail relevance for analyst discovery.

  • Dataset loader sweeps years 1999 to 2025, generating 384‑D embeddings and shared candidate pools for cosine, hybrid, and taumode.
  • taumode achieves the highest Tail/Head ratio (≈0.9593) with the lowest tail variability across queries.

Read more →

Road for `arrowspace` to scale: Condense, Project, and Sparsify

This release rethinks how `arrowspace` builds and queries graph structure from high‑dimensional embeddings, scaling up to 10⁵ items and 10³ features.

The Laplacian computation now:
  • condenses data with clustering and density‑aware sampling,
  • projects dimensionality proportionally to the problem size (centroids) and keeps queries consistent with that projection, and
  • sparsifies the graph with a fast spectral method to preserve structure while slashing cost.
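
As a rough illustration of the "project and keep queries consistent" step (a sign-random-projection sketch in the spirit of Achlioptas-style projections, not the actual arrowspace code): the indexed centroids and every incoming query must pass through the same projection, otherwise distances in the reduced space are meaningless.

```rust
/// Illustrative sketch: a shared +1/-1 random projection. The same
/// `RandomProjection` is applied to indexed vectors and to every query so the
/// reduced space stays consistent between build time and query time.
struct RandomProjection {
    in_dim: usize,
    out_dim: usize,
    seed: u64,
}

impl RandomProjection {
    fn new(in_dim: usize, out_dim: usize, seed: u64) -> Self {
        Self { in_dim, out_dim, seed }
    }

    /// Deterministic pseudo-random sign for matrix entry (row, col),
    /// using SplitMix64-style bit mixing so no matrix is stored.
    fn sign(&self, row: usize, col: usize) -> f64 {
        let mut z = self
            .seed
            .wrapping_add((row as u64).wrapping_mul(0x9E3779B97F4A7C15))
            .wrapping_add((col as u64).wrapping_mul(0xBF58476D1CE4E5B9));
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        if (z ^ (z >> 31)) & 1 == 0 { 1.0 } else { -1.0 }
    }

    /// Project a vector from `in_dim` down to `out_dim`.
    fn project(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.in_dim);
        let scale = 1.0 / (self.out_dim as f64).sqrt();
        (0..self.out_dim)
            .map(|r| {
                scale
                    * x.iter()
                        .enumerate()
                        .map(|(c, v)| self.sign(r, c) * v)
                        .sum::<f64>()
            })
            .collect()
    }
}

fn main() {
    let proj = RandomProjection::new(1000, 64, 42);
    let centroid = vec![0.01_f64; 1000];
    let query = vec![0.02_f64; 1000];
    // Index-time and query-time vectors go through the *same* projection.
    let (c_lo, q_lo) = (proj.project(&centroid), proj.project(&query));
    println!("{} -> {} dims", centroid.len(), c_lo.len());
    assert_eq!(q_lo.len(), 64);
}
```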

Read more →

Three Improvements That Open the Door to Graph-Based Spectral Analysis

`ArrowSpace` has evolved with three critical enhancements that improve both performance and analytical capabilities for high-dimensional data processing. These improvements address fundamental challenges in graph construction, data scaling, and computational efficiency, delivering measurable gains that matter to production systems.

Read more →

The Next Evolution in AI Memory: Energy-Informed Vector Search

Vector databases have become the backbone of modern AI workflows, particularly in RAG systems. But traditional approaches are fundamentally limited: they miss the deeper structural patterns that define how information relates within domains. Discover how ArrowSpace introduces energy-informed indexing through taumode, giving AI systems a memory that truly understands domain contexts through spectral signatures and graph Laplacian energy.

Read more →