| Title: | Lightweight Vector Embeddings for the Tidyverse |
|---|---|
| Description: | A lightweight vector database for storing and querying embeddings within the tidyverse framework. Supports multimodal (text and image) embeddings, nearest neighbor search, and visualization, all with a tidyverse-friendly API. |
| Authors: | Nicolas Gauthier [aut, cre] |
| Maintainer: | Nicolas Gauthier <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-14 05:51:32 UTC |
| Source: | https://github.com/flmnh-ai/tidyvec |
Similarity operator
a %~% b

| Argument | Description |
|---|---|
| a | First vector or tidyvec object |
| b | Second vector or tidyvec object |
| method | Similarity method to use ("cosine", "euclidean", or "dot") |

Similarity score or tidyvec object with similarity scores
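
The three similarity methods named above can be illustrated on plain numeric vectors. The following is a minimal base R sketch, not tidyvec's implementation: `similarity()` is a hypothetical helper, and the mapping of Euclidean distance to a similarity score via `1 / (1 + d)` is an assumption made for illustration only.

```r
# Hypothetical helper (not part of tidyvec): similarity between two numeric vectors.
similarity <- function(a, b, method = c("cosine", "euclidean", "dot")) {
  method <- match.arg(method)
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  switch(method,
    # Cosine: dot product divided by the product of the vector norms
    cosine = sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))),
    # Dot: unnormalized inner product
    dot = sum(a * b),
    # Assumption: Euclidean distance d is turned into a similarity in (0, 1]
    euclidean = 1 / (1 + sqrt(sum((a - b)^2)))
  )
}

a <- c(1, 0, 1)
b <- c(1, 1, 0)

similarity(a, b)               # cosine: 0.5
similarity(a, b, "dot")        # 1
similarity(a, b, "euclidean")  # 1 / (1 + sqrt(2))
```

Cosine similarity ignores vector length, so it is usually the safer choice when embeddings are not normalized; the dot product rewards larger magnitudes.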
Cluster embeddings using k-means
cluster_embeddings(x, n_clusters = 5, cluster_column = "cluster")

| Argument | Description |
|---|---|
| x | A tidyvec object |
| n_clusters | Number of clusters |
| cluster_column | Name for the cluster assignment column (default: "cluster") |

tidyvec object with cluster assignments added
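
To show what k-means clustering of embeddings amounts to, here is a small base R sketch using `stats::kmeans()` on a toy embedding matrix. The data, the `items` data frame, and the choice of `nstart = 10` are illustrative assumptions; how tidyvec calls k-means internally is not documented here.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Toy embeddings: 20 rows near (0, 0, 0) and 20 rows near (5, 5, 5)
emb <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 3),
  matrix(rnorm(60, mean = 5, sd = 0.3), ncol = 3)
)
items <- data.frame(id = seq_len(nrow(emb)))

# Fit k-means and store the assignment in a "cluster" column
fit <- stats::kmeans(emb, centers = 2, nstart = 10)
items$cluster <- fit$cluster

table(items$cluster)
```

Cluster labels are arbitrary integers, so the same grouping may be numbered differently between runs with different seeds.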
Compute embeddings for items in a tidyvec collection
embed(x, content_column, embedding_fn = NULL, force = FALSE, ...)

| Argument | Description |
|---|---|
| x | A tidyvec object |
| content_column | Column containing content to embed |
| embedding_fn | Embedding function to use (overrides collection's function) |
| force | Whether to overwrite existing embeddings |
| ... | Additional arguments passed to the embedding function |

Updated tidyvec object with embeddings
Create a HuggingFace embedding function
embedder_hf(model_name, modality = c("multimodal", "text", "image"), device = "cpu", cache_dir = NULL)

| Argument | Description |
|---|---|
| model_name | Name of the HuggingFace model |
| modality | Type of model ("text", "image", or "multimodal") |
| device | Device to use ("cpu", "cuda", or "mps") |
| cache_dir | Optional directory for caching model files |

An embedding function
Create a simple TF-IDF embedding function for text
embedder_tfidf(corpus, min_freq = 2)

| Argument | Description |
|---|---|
| corpus | Text corpus to build vocabulary |
| min_freq | Minimum term frequency |

An embedding function
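
The idea of building a vocabulary from a corpus and returning an embedding function can be sketched in base R as follows. This is an illustration of TF-IDF in general, not tidyvec's code: `tokenize()` and `make_tfidf_embedder()` are hypothetical names, the tokenizer is deliberately crude, and treating `min_freq` as a total-occurrence threshold is an assumption.

```r
# Hypothetical sketch of a TF-IDF embedder factory; not tidyvec's implementation.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")
  lapply(tokens, function(tok) tok[nzchar(tok)])  # drop empty tokens
}

make_tfidf_embedder <- function(corpus, min_freq = 2) {
  docs <- tokenize(corpus)
  # Assumption: min_freq counts total occurrences of a term across the corpus
  term_counts <- table(unlist(docs))
  vocab <- names(term_counts)[term_counts >= min_freq]
  stopifnot(length(vocab) > 0)
  # Document frequency: number of documents that contain each term
  doc_freq <- vapply(vocab, function(term) {
    sum(vapply(docs, function(d) term %in% d, logical(1)))
  }, numeric(1))
  idf <- log(length(docs) / doc_freq) + 1  # +1 keeps ubiquitous terms nonzero
  # The returned closure maps texts to a matrix: one row per text, one column per term
  function(text) {
    rows <- lapply(tokenize(text), function(d) {
      tf <- as.numeric(table(factor(d, levels = vocab)))
      tf * idf
    })
    mat <- do.call(rbind, rows)
    colnames(mat) <- vocab
    mat
  }
}

corpus <- c("red fox jumps", "red fox sleeps", "blue whale swims", "blue whale sleeps")
embed_fn <- make_tfidf_embedder(corpus, min_freq = 2)
m <- embed_fn(c("red fox", "blue whale"))
dim(m)       # 2 rows, 5 vocabulary terms
colnames(m)  # "blue" "fox" "red" "sleeps" "whale"
```

Terms below the frequency threshold ("jumps" and "swims" here) are dropped from the vocabulary, which keeps the vectors short at the cost of ignoring rare words.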
Print details about a tidyvec collection
inspect_collection(x)

| Argument | Description |
|---|---|
| x | A tidyvec object |

Invisibly returns the input object
Find nearest neighbors for a query in a tidyvec collection
nearest(x, query, n = 5, as_embedding = FALSE, method = c("cosine", "euclidean", "dot"), min_score = 0, keyword_weight = 0, keyword_column = NULL)

| Argument | Description |
|---|---|
| x | A tidyvec object |
| query | Query item (content or embedding) |
| n | Number of results to return |
| as_embedding | Whether the query is already an embedding vector |
| method | Similarity method ("cosine", "euclidean", "dot") |
| min_score | Minimum similarity score |
| keyword_weight | Weight for keyword matching (0-1, default 0 for pure vector search) |
| keyword_column | Column to search for keywords (required if keyword_weight > 0) |

Filtered tidyvec object with similarity scores
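
A brute-force nearest neighbor search with a score threshold can be sketched in base R as below. `top_n_cosine()` is a hypothetical helper that covers only the cosine case with a query that is already an embedding; it does not model keyword weighting and is not how tidyvec is known to implement `nearest()`.

```r
# Hypothetical helper: rank rows of an embedding matrix by cosine similarity to a query.
top_n_cosine <- function(emb, query, n = 5, min_score = 0) {
  stopifnot(is.matrix(emb), length(query) == ncol(emb))
  # Cosine similarity of every row against the query vector
  scores <- as.vector(emb %*% query) /
    (sqrt(rowSums(emb^2)) * sqrt(sum(query^2)))
  keep <- which(scores >= min_score)                    # apply the threshold
  keep <- keep[order(scores[keep], decreasing = TRUE)]  # best matches first
  keep <- head(keep, n)                                 # at most n results
  data.frame(row = keep, similarity = scores[keep])
}

emb <- rbind(c(1, 0), c(0.9, 0.1), c(0, 1), c(-1, 0))
res <- top_n_cosine(emb, query = c(1, 0), n = 2)
res$row  # rows 1 and 2 are closest to the query
```

Scanning every row is fine for small collections; larger ones typically need an approximate index, which is outside the scope of this sketch.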
Read a tidyvec collection from disk
read_vec(file)

| Argument | Description |
|---|---|
| file | Path to tidyvec collection file |

A tidyvec object
Creates a vector collection from a data frame or tibble
vec(x, embedding_column = "embedding", embedding_fn = NULL)

| Argument | Description |
|---|---|
| x | A data frame or tibble |
| embedding_column | Name of column containing embeddings (or to be created) |
| embedding_fn | Function to generate embeddings (optional) |

A tidyvec object
Visualize embedding space using dimensionality reduction
viz_embeddings(x, method = c("umap", "tsne", "pca"), labels = NULL, color = NULL, n_neighbors = 15, perplexity = 30, images_column = NULL, ...)

| Argument | Description |
|---|---|
| x | A tidyvec object |
| method | Dimensionality reduction method ("umap", "tsne", or "pca") |
| labels | Column to use for point labels |
| color | Column to use for point colors |
| n_neighbors | Number of neighbors (for UMAP) |
| perplexity | Perplexity parameter (for t-SNE) |
| images_column | Optional column containing image paths to use instead of points |
| ... | Additional arguments passed to the plotting function |

A ggplot object
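
For the "pca" method, the underlying idea is to project the embeddings onto their first two principal components and plot the result. The sketch below uses `stats::prcomp()` on random toy data and builds a ggplot only if ggplot2 happens to be installed; the data, the `group` column, and the plotting choices are assumptions and do not reflect tidyvec's internals.

```r
set.seed(1)

# Toy embeddings: 50 items in 8 dimensions
emb <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)

# Project onto principal components (centered, not rescaled)
pca <- stats::prcomp(emb, center = TRUE, scale. = FALSE)
coords <- data.frame(
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2],
  group = rep(c("a", "b"), each = 25)  # hypothetical column used for coloring
)

# Optional: scatter plot of the 2D projection
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(coords, ggplot2::aes(x = PC1, y = PC2, color = group)) +
    ggplot2::geom_point()
}
```

PCA is linear and deterministic, which makes it a quick first look; UMAP and t-SNE tend to separate local neighborhoods better but depend on the `n_neighbors` and `perplexity` settings.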
Visualize images in a tidyvec collection using magick
viz_images(x, path_column, n = NULL, ncol = 3, width = 200, include_similarity = TRUE, label_columns = NULL)

| Argument | Description |
|---|---|
| x | A tidyvec object containing images |
| path_column | Column containing image paths or URLs |
| n | Number of images to display (default: all) |
| ncol | Number of columns in the grid (default: 3) |
| width | Width of each image in pixels (default: 200) |
| include_similarity | Whether to show similarity scores if available (default: TRUE) |
| label_columns | Additional columns to use as labels |

A magick image object
Save a tidyvec collection to disk
write_vec(x, file)

| Argument | Description |
|---|---|
| x | A tidyvec object |
| file | Path to save file (recommended: .qs extension) |

Invisibly returns the input object