Testing arrowspace on a new wave of embeddings

I ran new experiments on the CVE dataset to compare ArrowSpace search with two embedding backends: the original 384‑dimensional DenoisingAutoEncoder (test 15) and Perplexity’s newest 1024‑dimensional embeddings based on pretrained Qwen3 base models (0.6B parameters, “pplx‑embed”) (test 16). The goal is to see how ArrowSpace behaves as embedding dimensionality grows, and how much extra quality we actually gain relative to the cost in memory, training, and inference.

TL;DR

ArrowSpace is consistently better than plain cosine on highly semantic vector spaces, and its advantages remain coherent as embedding dimensionality grows.

Even when we upgrade to strong 1024‑dim embeddings, ArrowSpace’s spectral modes (hybrid/taumode) keep their edge on structure‑aware retrieval and tail behaviour, while the cost gap between 384 and 1024 dimensions stays significant.

Previous blog posts on testing ArrowSpace on CVE here.

Code and data:

1. Experimental setups (test 15 vs 16)

The only difference between the two setups is the base embedding model (384 vs 1024 dimensions); the ArrowSpace spectral layer and evaluation protocol are held constant.

2. Global quality metrics

From the per‑run summaries:

Interpretation: with 1024‑dim embeddings, cosine and hybrid rankings are much closer (NDCG ≈ 0.97 vs ≈ 0.75), meaning the raw embedding space is already better aligned and cosine leverages the added dimensionality. Taumode stays very close to hybrid in both settings (≈ 0.98 in both), so ArrowSpace preserves the neighbourhood structure across dimensions and continues to add structure on top of whatever the base embedding provides.
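For readers unfamiliar with the metric: NDCG compares a ranking's discounted gain against the ideal ordering of the same items. A minimal pure-Python sketch (the relevance grades below are made up for illustration, not taken from the experiments):

```python
import math

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k over a ranked list of graded relevance scores."""
    def dcg(rels):
        # Each position i contributes its relevance discounted by log2(i + 2)
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / idcg if idcg > 0 else 0.0

# Hypothetical relevance grades of the top-5 results returned by one mode
print(round(ndcg_at_k([3, 2, 3, 0, 1], k=5), 3))
```

A perfectly ordered list scores 1.0, so the ≈ 0.97 vs ≈ 0.75 gap above means the 1024-dim cosine ranking is much closer to the hybrid reference ordering than the 384-dim one.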

Fig.1 MRR-Top0 metric

Interpretation: moving to 1024‑dim increases topology‑weighted reciprocal rank by roughly 3–4× for cosine, and by about 2× for hybrid/taumode. The richer embedding clearly improves overall graph‑consistent ranking, but ArrowSpace remains consistently ahead of vanilla cosine in both regimes, with taumode ≥ hybrid ≥ cosine.
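Plain MRR (without the topology weighting used in the experiments, which I don't reproduce here) can be sketched as:

```python
def mrr(first_hit_ranks):
    """Mean reciprocal rank. `first_hit_ranks` holds, per query, the
    1-based rank of the first relevant hit (None if nothing relevant
    was retrieved)."""
    reciprocal = [1.0 / r if r else 0.0 for r in first_hit_ranks]
    return sum(reciprocal) / len(reciprocal)

# Four hypothetical queries: hits at ranks 1, 3, 2, and one miss
print(mrr([1, 3, 2, None]))  # (1 + 1/3 + 1/2 + 0) / 4
```

The topology-weighted variant additionally rewards hits whose graph neighbourhood agrees with the query's, which is why the spectral modes benefit from it.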

Net effect: the 1024‑dim model improves absolute retrieval quality and topology‑weighted ranking, but ArrowSpace + 384‑dim already yields a very coherent, stable ranking where taumode and hybrid nearly match, and ArrowSpace’s taumode remains the best of the three in both regimes. This is a strong confirmation of the strengths of Graph Wiring: ArrowSpace consistently improves on pure cosine, and that advantage is stable as you scale up embedding capacity.

Fig. 2 summary metrics for 384-dim and 1024-dim

Note on Fig.2: on average Test 16 performs better except for the average Tail/Head score.

3. Per‑query ranking behaviour

Looking at the detailed comparison metrics:

For test 16 (1024‑dim):

This supports the view that ArrowSpace’s spectral layer adds topology‑aware smoothing and robustness that is especially helpful on tricky natural‑language patterns, independent of base dimension. This happens thanks to the structural information carried by ArrowSpace’s Graph Laplacian, as demonstrated in the paper [“Epiplexity And Graph Wiring: An Empirical Study for the design of a generic algorithm”](https://github.com/tuned-org-uk/graph-wiring-epiplexity/blob/main/paper/Epiplexity_A_measure_on_Graph_Wiring.pdf).

Fig. 3 metrics per-query

Note on Fig.3: the additional dimensions provide the extra information needed to make query 14 perform like the others under cosine (taumode still performs better on this query in terms of head/tail topological quality).

Fig.4 Spearman agreement between the different modes in test 15 and test 16

Note on Fig. 4: the additional dimensions make cosine agree more with taumode, hinting that taumode really does embed structural information that was absent from the 384-dim embeddings; by roughly tripling the number of dimensions, part of the graph information can be encoded in plain embedding space. Why add so many dimensions for something that can be encoded in a sparse matrix?
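The agreement in Fig. 4 is rank correlation between the orderings the two modes assign to the same retrieved documents. For rankings without ties, Spearman's rho reduces to a closed form over squared rank differences:

```python
def spearman(rank_a, rank_b):
    """Spearman's rho between two rankings of the same n items,
    assuming no ties: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Ranks assigned to the same 5 documents by two scoring modes
# (illustrative values, not the experiment's actual rankings)
cosine_ranks  = [1, 2, 3, 4, 5]
taumode_ranks = [2, 1, 3, 4, 5]
print(spearman(cosine_ranks, taumode_ranks))  # 0.9
```

A rho near 1 means the two modes order documents almost identically, which is what the 1024-dim cosine-vs-taumode comparison moves towards.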

4. Tail behaviour and stability

Tail metrics (head_mean, tail_mean, tail_to_head_ratio, tail_cv, tail_decay_rate) show:

So in both dimensions, ArrowSpace spectral indexing is improving tail coherence and score regularity — better calibrated similarity across the top‑k list — which is exactly what you want for downstream re‑ranking, multi‑hop retrieval, and multi‑agent workflows.
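To make the tail statistics concrete, here is an illustrative sketch of how such metrics can be computed from a sorted top‑k score list; the definitions below (head = first few scores, CV = std/mean of the tail) are plausible reconstructions, not ArrowSpace's exact formulas:

```python
import statistics

def tail_metrics(scores, head_size=3):
    """Split a descending top-k score list into head and tail and
    compute simple coherence statistics (illustrative definitions)."""
    head, tail = scores[:head_size], scores[head_size:]
    head_mean = statistics.mean(head)
    tail_mean = statistics.mean(tail)
    return {
        "head_mean": head_mean,
        "tail_mean": tail_mean,
        "tail_to_head_ratio": tail_mean / head_mean,
        # Coefficient of variation: lower = more regular tail scores
        "tail_cv": statistics.pstdev(tail) / tail_mean,
    }

m = tail_metrics([0.95, 0.93, 0.90, 0.72, 0.70, 0.69, 0.68])
print(m["tail_to_head_ratio"], m["tail_cv"])
```

A higher tail-to-head ratio with a low tail CV is the "better calibrated similarity across the top‑k list" described above.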

5. Cost and efficiency implications

Given identical infrastructure and dataloader, increasing the embedding size from 384 to 1024 has these implications:

Because ArrowSpace’s spectral graph effectively concentrates structural information (epiplexity) into its λτ indices and topology factors, you recover much of the benefit of higher‑dimensional representations while working in a lower‑dimensional ambient space. In other words: you can let ArrowSpace encode graph structure instead of pushing ever more complexity into the embedding model.

Fig.5 cost factors in test 15 and test 16

6. Is the 1024‑dim gain worth it?

Putting the measurements and costs together:

So for a production system where infrastructure and GPU budget matter, you can reasonably say:

In more straightforward terms:

7. Computing costs

Using ArrowSpace with 384‑dim embeddings instead of 1024‑dim gives roughly a 2.5–3× advantage across compute, storage, and throughput for a typical medium‑scale workload, while preserving most of the retrieval quality thanks to the spectral index. Here are some back-of-the-envelope computations about scaling costs.

Storage

Compute (training and indexing)

Net advantage on a “medium company” workload

For a realistic mid‑size deployment (millions of documents, tens of QPS, periodic re‑indexing), remembering that adding a document to ArrowSpace does not require recomputing the whole index but just computing a lambda‑score on the new vector:

Because ArrowSpace’s λτ index recovers much of the structural signal that a larger embedding would otherwise encode directly, these savings come with only a modest quality loss relative to using 1024‑dim embeddings without such spectral supervision — and ArrowSpace still maintains its advantage over plain cosine in both settings.
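The "lambda‑score on the new vector" is essentially a smoothness measure of the vector against the graph Laplacian. A rough pure-Python sketch of a Rayleigh-quotient-style score (illustrative only; ArrowSpace's actual taumode computation adds a tau normalisation and uses its own Laplacian construction):

```python
def lambda_score(x, L):
    """Rayleigh quotient x . (L x) / (x . x) against a graph Laplacian.
    0 means the vector is constant across connected nodes (perfectly
    smooth); larger values mean more oscillation on the graph."""
    n = len(x)
    Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(xi * lxi for xi, lxi in zip(x, Lx)) / sum(xi * xi for xi in x)

# Laplacian of a 3-node path graph (L = D - A)
L = [[ 1, -1,  0],
     [-1,  2, -1],
     [ 0, -1,  1]]

print(lambda_score([1.0, 1.0, 1.0], L))   # constant vector -> 0.0
print(lambda_score([1.0, -1.0, 1.0], L))  # oscillating vector -> 8/3
```

The point is the cost profile: scoring a new vector is one sparse matrix–vector product, so adding a document never triggers a full re-index.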

Storage example (100M docs)

Assume float32 embeddings (4 bytes per dimension), no extra overheads:

So moving from 1024 → 384 dims saves ≈ 256 GB (≈ 238 GiB) of embedding storage on 100M documents, before indexing overhead and replicas.
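The arithmetic behind that figure (note the GB vs GiB distinction: 256 decimal GB ≈ 238 binary GiB):

```python
def embedding_storage_gb(n_docs, dims, bytes_per_value=4):
    """Raw float32 embedding storage in decimal GB, with no index
    overhead or replication factored in."""
    return n_docs * dims * bytes_per_value / 1e9

n = 100_000_000
big = embedding_storage_gb(n, 1024)   # 409.6 GB
small = embedding_storage_gb(n, 384)  # 153.6 GB
print(big, small, big - small)        # saving: 256.0 GB
```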

Throughput and latency impact

Because most ArrowSpace hot‑path operations (cosine, λ‑aware scoring) scale linearly with dimension, 384‑dim vs 1024‑dim gives:

For “50 agent instances for 200 users per day”:

Cost and scaling intuition

Using 1024‑dim did make cosine similarity better (though still not as good as spectral indexing in terms of context‑awareness in the retrieved document space across all ranks) — but at what cost? Can ArrowSpace’s Laplacian be, in the majority of cases, a cheaper compressed proxy instead of tripling the number of dimensions? Spectral indexing is designed to handle higher‑dimensional spaces and does not lose retrieval quality when dimensions grow, but the more complex training and heavier inference phase at the model level put extra strain on your infrastructure.
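The per-comparison cost question is easy to make concrete: cosine (like the other dimension-linear hot-path operations mentioned above) does O(d) work per vector pair, so the 1024-vs-384 ratio applies directly to FLOPs and memory bandwidth. A minimal sketch:

```python
def cosine(a, b):
    """Cosine similarity; the work is O(d) in the vector length,
    so per-comparison cost scales linearly with dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

# Dimension-linear cost ratio between the two embedding sizes
print(round(1024 / 384, 2))  # 2.67
```

That ≈ 2.67× multiplier hits every query against every candidate, which is where the throughput concern below comes from.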

At 100M documents:

Throughput is the most problematic expense: storage and raw compute will probably keep getting cheaper, but the cost of serving ever higher dimensions at inference time is a rising concern. That is why the compression of structural information that ArrowSpace provides becomes critical for efficiency and cost savings. If the target scale is “web‑scale retrieval”, having similar or better structural performance using a fraction of the dimensions is a real comparative advantage — especially when ArrowSpace keeps outperforming cosine regardless of how big your embedding model becomes (this should be confirmed on the 4B‑parameter model, but at that point costs move to a different order of magnitude).

Conclusions

Just for reference: for an average developer, fine‑tuning the embeddings for Test 15 (384‑dim) took a few hours of compute on a local laptop; the Test 16 embeddings (the 0.6B‑parameter version) required 20 GB of RAM for approximately the same amount of time, using a dedicated Colab environment with an A100 machine.

Increasing the embedding dimensions did improve the embeddings, supplying part of the information that was missing from the 384‑dim space, but at a cost; so the team behind these embeddings has indeed delivered value for the cosine pipeline.

ArrowSpace’s taumode still performs better on average at carrying the topological information that is structural to the feature space of the vector collection.

These numbers may improve further with the 4B‑parameter version, but the question is whether the cost of scaling up that much is worth it, given that better results are achieved on the 0.6B version by complementing the search with a few‑kilobyte artifact (the Graph Laplacian), as ArrowSpace does. This is demonstrated formally in information‑theoretic terms in the paper “Epiplexity And Graph Wiring: An Empirical Study for the design of a generic algorithm”.

Thanks for reading, please share and consider sponsoring my research.