Condense, Project, and Sparsify

You can find arrowspace in the:

A road for arrowspace to scale: release v0.13.3 rethinks how arrowspace builds and queries graph structure from high‑dimensional embeddings, up to 10⁵ items and 10³ features. The Laplacian computation now: 1) condenses the data with clustering and density‑aware sampling, 2) projects the dimensionality proportionally to the problem size (the number of centroids) and keeps queries consistent with that projection, and 3) sparsifies the graph with a fast spectral method that preserves structure while slashing cost.

1) Condense the data first: clustering + density‑adaptive sampling

Building a Laplacian directly on millions of items and thousands of features is expensive; most structure can be captured by a compact set of representatives.
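
To make the condensation step concrete, here is a minimal sketch, not the arrowspace implementation: the function name `condense` and the per-cluster quota rule are illustrative assumptions. It clusters the data with k-means and keeps a density-aware sample of points per cluster, so the later Laplacian is built on a compact set of representatives.

```python
# Hedged sketch: condense N x F data to a compact set of representatives.
# Names (condense, the quota rule) are illustrative, not the arrowspace API.
import numpy as np
from sklearn.cluster import KMeans

def condense(X: np.ndarray, n_centroids: int, reps_per_cluster: int = 4, seed: int = 0):
    """Cluster X and keep a density-aware subset of points per cluster."""
    km = KMeans(n_clusters=n_centroids, n_init=10, random_state=seed).fit(X)
    rng = np.random.default_rng(seed)
    keep = []
    for c in range(n_centroids):
        idx = np.flatnonzero(km.labels_ == c)
        # one plausible density-aware rule: larger (denser) clusters contribute
        # more representatives, but every cluster keeps at least one point
        avg_size = len(X) / n_centroids
        quota = max(1, min(len(idx), int(round(reps_per_cluster * len(idx) / avg_size))))
        keep.append(rng.choice(idx, size=quota, replace=False))
    return km.cluster_centers_, np.concatenate(keep)

X = np.random.default_rng(1).normal(size=(3000, 384)).astype(np.float32)
centroids, sample_idx = condense(X, n_centroids=128)
print(centroids.shape, sample_idx.shape)  # (128, 384) centroids, ~512 sampled rows
```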

Practical notes:

2) Reduce dimensionality: project to the problem size

After condensing to C centroids, distance preservation only needs to hold between those C points; the Johnson–Lindenstrauss (JL) lemma gives a principled target for the projected dimension r.
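
A hedged sketch of how that target can be computed and applied, using scikit-learn's random-projection utilities as stand-ins for whatever arrowspace does internally. The classical JL bound is conservative, so in practice a smaller r often works; the key point is that queries must be projected with the same matrix as the centroids.

```python
# Hedged sketch: pick a JL target dimension r for C centroids and project
# both the condensed data and any query with the same matrix.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection, johnson_lindenstrauss_min_dim

C, F, eps = 128, 384, 0.5              # centroids, original features, distortion budget
r = int(min(johnson_lindenstrauss_min_dim(n_samples=C, eps=eps), F))  # never project "up"
proj = GaussianRandomProjection(n_components=r, random_state=0)

centroids = np.random.default_rng(0).normal(size=(C, F))
centroids_r = proj.fit_transform(centroids)   # build the graph in r dimensions

query = np.random.default_rng(1).normal(size=(1, F))
query_r = proj.transform(query)               # queries stay consistent with the projection
print(F, "->", r, centroids_r.shape, query_r.shape)
```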

Practical notes:

3) Make graphs lighter: SF‑GRASS spectral sparsification

Even after condensing and projecting, dense graphs and Laplacians can still be costly. A simplified, fast spectral sparsification (SF‑GRASS) step prunes edges while approximately preserving the spectrum that matters for diffusion, λ computation, and spectral similarity.
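
SF‑GRASS itself is more elaborate; the sketch below shows only the underlying idea that spectral sparsifiers share, namely effective‑resistance edge sampling with reweighting (Spielman–Srivastava style), which is tractable at centroid scale where a dense pseudo-inverse is affordable. It is illustrative, not the arrowspace code.

```python
# Hedged sketch of spectral sparsification by effective-resistance sampling.
# Not SF-GRASS itself, only the idea it accelerates: keep few edges while
# approximately preserving the Laplacian spectrum used for diffusion and lambda.
import numpy as np

def sparsify(edges, weights, n_nodes, n_samples, seed=0):
    weights = np.asarray(weights, dtype=float)
    # a dense Laplacian is fine at centroid scale (a few hundred nodes)
    L = np.zeros((n_nodes, n_nodes))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)                            # pseudo-inverse gives effective resistances
    reff = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
    p = weights * reff
    p = p / p.sum()                                   # sampling distribution over edges
    rng = np.random.default_rng(seed)
    picked = rng.choice(len(edges), size=n_samples, p=p)
    new_w = np.zeros(len(edges))
    for e in picked:
        new_w[e] += weights[e] / (n_samples * p[e])   # reweight so E[L_sparse] = L
    kept = np.flatnonzero(new_w > 0)
    return [edges[i] for i in kept], new_w[kept]

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
weights = [1.0, 1.0, 1.0, 0.5]
kept_edges, kept_w = sparsify(edges, weights, n_nodes=4, n_samples=3)
print(kept_edges, kept_w)
```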

Practical notes:


Usage at a glance

Migration guide

Why this works well together

New Metrics

Here are the new metrics for these improvements, with a short reference on what each one displays. They are all partial and indicative of trends, as they are computed on a limited dataset (N×F: 3000×384) of synthetic data.

Figure 1: Similarity metrics comparison

top-left: Ranking Metrics

As one of the main objectives of arrowspace is to spot alternative pathways for vector similarities, I use different metrics to compare the new measurements against the current implementations. The Jaccard metric is quite limited for this purpose, as it only spots perfect overlap between cosine similarities and arrowspace similarities; to widen the landscape I have introduced NDCG, MAP and MRR into the visualisation. You can see how they compare on the usual scale from alpha=0.0 to alpha=1.0 (the weight of cosine similarity in the taumode computation). My reading is that taumode remains equivalent to cosine similarity between alpha=0.6 and alpha=0.8, while making results more interesting because the tails may contain alternative pathways.
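
For orientation only, a tiny sketch of what an alpha-weighted blend looks like under the assumption stated above (alpha weights the cosine term); the actual taumode formula lives in the arrowspace source and may differ.

```python
# Hedged illustration only: alpha weights the cosine term, the remainder goes
# to a spectral (taumode/lambda-based) term. The real formula is in arrowspace.
def blended_score(cosine_sim: float, spectral_sim: float, alpha: float) -> float:
    return alpha * cosine_sim + (1.0 - alpha) * spectral_sim
```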

A brief summary of these metrics:

A comparison matrix of these metrics' characteristics:

| Feature | NDCG | MAP | MRR |
|---|---|---|---|
| Relevance type | Binary or graded | Binary only | Binary only |
| Scope | Full ranked list | All relevant items | First relevant only |
| Discount | Logarithmic | Precision-based | None (single position) |
| Interpretability | Low (complex formula) | Medium (area under PR curve) | High (average rank) |
| Use case | Multi-item recommendations | Top-K search results | Single-answer search |
| Sensitivity | Balanced across ranks | Top-heavy | First position only |
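
For reference, a compact sketch of the three metrics for binary relevance (MAP and MRR are the means of average precision and reciprocal rank over queries); the small example at the end is made up purely for illustration.

```python
# Reference-style sketch of the three ranking metrics for binary relevance.
# `ranked` is a list of item ids in ranked order; `relevant` is a set of ids.
import math

def ndcg(ranked, relevant, k=None):
    ranked = ranked[:k] if k else ranked
    dcg = sum(1.0 / math.log2(i + 2) for i, x in enumerate(ranked) if x in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), len(ranked))))
    return dcg / ideal if ideal > 0 else 0.0

def average_precision(ranked, relevant):
    hits, score = 0, 0.0
    for i, x in enumerate(ranked):
        if x in relevant:
            hits += 1
            score += hits / (i + 1)           # precision at each relevant position
    return score / len(relevant) if relevant else 0.0

def mrr(ranked, relevant):
    for i, x in enumerate(ranked):
        if x in relevant:
            return 1.0 / (i + 1)              # reciprocal rank of the first hit
    return 0.0

ranked, relevant = [3, 1, 7, 2, 9], {1, 9}
print(ndcg(ranked, relevant), average_precision(ranked, relevant), mrr(ranked, relevant))
```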

bottom-right: Metric Performance Heatmap

I leave the reading of the other two panels to you. The bottom-right diagram is interesting, as it shows that taumode becomes reliable from alpha=0.6 onward across all the metrics used.

Next steps include, but are not limited to:

Interested in learning more? Whether you’re evaluating ArrowSpace for your data infrastructure, considering sponsorship, or want to discuss integration strategies, please check the Contact page.

Please consider sponsoring my research to improve your company's understanding of LLMs and vector databases.

Book a call on Calendly to discuss how ArrowSpace can accelerate discovery in your analysis and storage workflows, or to talk through these improvements.