All the posts

Thoughts on AI, machine learning, distributed systems, and open-source development

Safer LLMs require open search - Building the AI Memory Layer

AI safety through topology‑aware, energy‑informed retrieval that separates stable facts from risky intuitions.

  • Shows how geometry‑only vector search and semantic caching accumulate retrieval errors, turning context drift into subtle hallucinations.
  • Introduces arrowspace as an “open search” layer where graph Laplacians, energy dispersion, and topology‑quality scores expose and constrain off‑manifold results instead of hiding them inside black‑box similarity.
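
A minimal sketch of the kind of quantity involved (illustrative only, not the arrowspace implementation): on a graph with Laplacian L = D − A, the Rayleigh quotient xᵀLx / xᵀx measures how much a signal varies across the graph's edges, so a result that sits off-manifold relative to its neighbours shows up as high energy instead of hiding behind a similarity score.

```rust
/// Illustrative sketch: Dirichlet (Laplacian) energy of a signal over a small
/// undirected graph, i.e. the Rayleigh quotient x^T L x / x^T x.
/// Edges are (i, j, weight); `x` assigns one value per node.
fn laplacian_energy(n_nodes: usize, edges: &[(usize, usize, f64)], x: &[f64]) -> f64 {
    assert_eq!(x.len(), n_nodes);
    // For L = D - A, x^T L x expands to a sum of weighted squared differences
    // across edges, so the Laplacian never needs to be materialised here.
    let quadratic: f64 = edges
        .iter()
        .map(|&(i, j, w)| w * (x[i] - x[j]).powi(2))
        .sum();
    let norm: f64 = x.iter().map(|v| v * v).sum();
    if norm == 0.0 { 0.0 } else { quadratic / norm }
}

fn main() {
    // Triangle graph: a smooth signal has low energy, a spiky one high energy.
    let edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)];
    let smooth = [1.0, 1.0, 1.0];
    let spiky = [1.0, -1.0, 1.0];
    println!("smooth: {:.3}", laplacian_energy(3, &edges, &smooth)); // 0.000
    println!("spiky:  {:.3}", laplacian_energy(3, &edges, &spiky));  // high
}
```
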
Read more →

Why arrowspace is game-changing for data operations at scale

Test‑bed milestone for a unified vector, graph, and key‑value engine built on spectral indexing and energy‑informed search.

  • Turns any dataset into a features graph, enabling manifold‑aware search, matching, ranking, and dataset characterization at any lifecycle stage.
  • Designed for high dimensions by default: robust on biotech‑scale sequences, large vocabularies, and model‑sized embedding spaces.

Read more →

Efficient GPT training: a dive into the architecture of a Rust-powered GPT-2

A deep dive into a Rust implementation of a decoder-only transformer inspired by Karpathy's nanochat.

  • Breaks down the architecture of a modern LLM, explaining the role of key components for an experienced audience.
  • Covers modern techniques such as Rotary Position Embeddings (RoPE), Multi-Query Attention (MQA), RMSNorm, and the use of a Squared ReLU in the MLP.
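
As a quick illustration of two of the techniques mentioned in the post (a sketch under the standard definitions, not the post's actual code): RMSNorm rescales a vector by its root mean square without subtracting a mean or adding a bias, and Squared ReLU simply squares the positive part of each activation in the MLP.

```rust
/// Sketch of RMSNorm: x_i * g_i / sqrt(mean(x^2) + eps). Unlike LayerNorm,
/// no mean is subtracted and no bias is added.
fn rms_norm(x: &[f64], gain: &[f64], eps: f64) -> Vec<f64> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f64>() / x.len() as f64;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * scale * g).collect()
}

/// Sketch of the Squared ReLU activation used in the MLP: max(0, x)^2.
fn squared_relu(x: f64) -> f64 {
    let r = x.max(0.0);
    r * r
}

fn main() {
    let x = [1.0, -2.0, 3.0];
    let gain = [1.0, 1.0, 1.0];
    println!("{:?}", rms_norm(&x, &gain, 1e-6));
    println!("{:?}", x.map(squared_relu)); // [1.0, 0.0, 9.0]
}
```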

Read more →

ArrowSpace v0.21.0: Proof of Concept for Energy-Informed Context Search

Milestone release completes the search–matching–ranking pipeline with a stabilized energymaps module, delivering spectral vector search that finds matches beyond geometric proximity.

  • Two complete build paths: eigenmaps (spectral indexing from Laplacians) and energymaps (pure energy-first with optical compression, diffusion-split subcentroids, and automatic λτ computation).
  • CVE corpus diffusion sweep (300K docs) achieves Avg MRR 0.75, NDCG@10 0.7239 (η=0.22, steps=8) with stable 75–83s build times, confirming negligible diffusion overhead and strong spectral ranking quality.
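
For readers unfamiliar with the metrics quoted above, here is a self-contained sketch of how MRR and NDCG@k are conventionally computed over ranked result lists (standard definitions, not the arrowspace evaluation harness):

```rust
/// Mean Reciprocal Rank: average of 1 / (rank of the first relevant result),
/// with 1-based ranks; a query with no relevant hit contributes 0.
fn mrr(ranked_relevance: &[Vec<bool>]) -> f64 {
    let total: f64 = ranked_relevance
        .iter()
        .map(|r| match r.iter().position(|&rel| rel) {
            Some(idx) => 1.0 / (idx as f64 + 1.0),
            None => 0.0,
        })
        .sum();
    total / ranked_relevance.len() as f64
}

/// NDCG@k: DCG@k = sum(rel_i / log2(i + 1)) over 1-based positions i,
/// normalised by the DCG of an ideal ordering of the same relevance labels.
fn ndcg_at_k(relevance: &[f64], k: usize) -> f64 {
    let dcg = |rels: &[f64]| -> f64 {
        rels.iter()
            .take(k)
            .enumerate()
            .map(|(i, rel)| rel / (i as f64 + 2.0).log2())
            .sum()
    };
    let mut ideal = relevance.to_vec();
    ideal.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let idcg = dcg(&ideal);
    if idcg == 0.0 { 0.0 } else { dcg(relevance) / idcg }
}

fn main() {
    // Two toy queries: first relevant doc at rank 2, then at rank 1.
    let runs = vec![vec![false, true, false], vec![true, false, false]];
    println!("MRR = {:.3}", mrr(&runs)); // (0.5 + 1.0) / 2 = 0.75
    println!("NDCG@10 = {:.3}", ndcg_at_k(&[0.0, 1.0, 0.0], 10));
}
```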

Read more →

DeepSeek-OCR Optical Compression Meets Energy Search: Rust Implementation in ArrowSpace v0.18.0

Rust implementation of DeepSeek-OCR compression achieves 10× token reduction, while ArrowSpace v0.18.0 introduces energy-informed retrieval that replaces cosine similarity with spectral graph properties.

  • DeepEncoder architecture (SAM + CLIP + projector) replicated in Rust using burn.dev with cross-platform GPU support and five resolution modes from 64 to 400 tokens.
  • Energy search with diffusion parameter sweep on CVE corpus achieves NDCG@10 ≈ 0.99 (η=0.05, steps=6) and MRR=1.0 (η=0.05, steps=4) without any cosine similarity.

Read more →

taumode: Beyond Cosine Similarity on the CVE dataset

Evaluation on a CVE corpus spanning 1999 to 2025 shows spectral modes preserve head agreement with cosine while enhancing long‑tail relevance for analyst discovery.

  • Dataset loader sweeps years 1999 to 2025, generating 384‑D embeddings and shared candidate pools for cosine, hybrid, and taumode.
  • taumode achieves the highest Tail/Head ratio (≈0.9593) with the lowest tail variability across queries.

Read more →

Road for `arrowspace` to scale: Condense, Project, and Sparsify

This release rethinks how `arrowspace` builds and queries graph structure from high‑dimensional embeddings, scaling up to 10⁵ items and 10³ features.

The Laplacian computation now:
  • condenses data with clustering and density‑aware sampling,
  • projects dimensionality proportionally to the problem size (centroids) and keeps queries consistent with that projection, and
  • sparsifies the graph with a fast spectral method to preserve structure while slashing cost.
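
As a rough illustration of the "project and keep queries consistent" step (a sign-random-projection sketch in the spirit of Achlioptas-style projections, not the actual arrowspace code): the indexed centroids and every incoming query must pass through the same projection, otherwise distances in the reduced space are meaningless.

```rust
/// Illustrative sketch: a shared +1/-1 random projection. The same
/// `RandomProjection` is applied to indexed vectors and to every query so the
/// reduced space stays consistent between build time and query time.
struct RandomProjection {
    in_dim: usize,
    out_dim: usize,
    seed: u64,
}

impl RandomProjection {
    fn new(in_dim: usize, out_dim: usize, seed: u64) -> Self {
        Self { in_dim, out_dim, seed }
    }

    /// Deterministic pseudo-random sign for matrix entry (row, col),
    /// using SplitMix64-style bit mixing so no matrix is stored.
    fn sign(&self, row: usize, col: usize) -> f64 {
        let mut z = self
            .seed
            .wrapping_add((row as u64).wrapping_mul(0x9E3779B97F4A7C15))
            .wrapping_add((col as u64).wrapping_mul(0xBF58476D1CE4E5B9));
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        if (z ^ (z >> 31)) & 1 == 0 { 1.0 } else { -1.0 }
    }

    /// Project a vector from `in_dim` down to `out_dim`.
    fn project(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.in_dim);
        let scale = 1.0 / (self.out_dim as f64).sqrt();
        (0..self.out_dim)
            .map(|r| {
                scale
                    * x.iter()
                        .enumerate()
                        .map(|(c, v)| self.sign(r, c) * v)
                        .sum::<f64>()
            })
            .collect()
    }
}

fn main() {
    let proj = RandomProjection::new(1000, 64, 42);
    let centroid = vec![0.01_f64; 1000];
    let query = vec![0.02_f64; 1000];
    // Index-time and query-time vectors go through the *same* projection.
    let (c_lo, q_lo) = (proj.project(&centroid), proj.project(&query));
    println!("{} -> {} dims", centroid.len(), c_lo.len());
    assert_eq!(q_lo.len(), 64);
}
```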

Read more →

Three Improvements That Open the Door to Graph-Based Spectral Analysis

`ArrowSpace` has evolved with three critical enhancements that improve both performance and analytical capabilities for high-dimensional data processing. These improvements address fundamental challenges in graph construction, data scaling, and computational efficiency, delivering measurable gains that matter to production systems.

Read more →

The Next Evolution in AI Memory: Energy-Informed Vector Search

Vector databases have become the backbone of modern AI workflows, particularly in RAG systems. But traditional approaches are fundamentally limited: they miss the deeper structural patterns that define how information relates within domains. Discover how ArrowSpace introduces energy-informed indexing through taumode, giving AI systems a memory that truly understands domain contexts through spectral signatures and graph Laplacian energy.

Read more →