Technology and local AI in the collection

This pilot is built as a fully local AI system, where all processing takes place within the museum’s own infrastructure.
The goal is to maintain control over data, reduce external dependencies, and test whether smaller models can deliver sufficient quality.

This represents a clear shift from earlier prototypes, where external AI services were used for image analysis and embeddings.


Why we use local AI models

The system is designed to run without external AI services.

This approach is based on three priorities:

  • data control
    Images and metadata remain on the museum’s infrastructure
  • predictability
    The system behaves consistently and is not affected by external changes
  • resource efficiency
    Smaller models reduce computational cost and energy use

This allows the museum to evaluate AI use in a controlled and sustainable way.


Vision–language model

The system uses a locally hosted vision–language model:

  • Qwen 3.5 (9B)

The model analyses images and generates textual descriptions of visible content. These descriptions focus on:

  • objects and motifs
  • composition
  • colours and visual structure

This removes the need to send images to external services for analysis.
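
The exact serving setup is not documented on this page. Purely as an illustration, assuming the model is served behind an Ollama-style HTTP endpoint, a description request might be assembled like this (the endpoint shape, model name, and prompt wording are all assumptions, not the pilot's actual configuration):

```python
import base64
import json

# Hypothetical prompt; the pilot's actual prompt wording is not documented here.
DESCRIPTION_PROMPT = (
    "Describe the visible content of this image: "
    "objects and motifs, composition, colours and visual structure."
)

def build_description_request(image_bytes: bytes, model: str = "qwen-vl") -> dict:
    """Assemble a JSON payload for an Ollama-style /api/generate endpoint.

    The endpoint and field names are assumptions; adapt to the actual
    local serving stack.
    """
    return {
        "model": model,
        "prompt": DESCRIPTION_PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Example: build a request for a (dummy) image and serialise it.
payload = build_description_request(b"\x89PNG...")
print(json.dumps(payload)[:60])
```

The key point is that the image bytes never leave the museum's network: the payload is posted to a locally hosted model, not to an external service.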


Embedding model

All text is converted into embeddings using a local model:

  • BAAI BGE-M3

Embeddings represent meaning as numerical vectors.
This enables the system to compare content based on similarity rather than exact wording, which is central to semantic search.

The same process is applied to:

  • generated descriptions
  • existing metadata
  • user queries
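
Vector comparison itself is model-independent. A minimal sketch of cosine similarity, the standard measure behind this kind of semantic matching (toy three-dimensional vectors stand in for real, much higher-dimensional embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
painting = [0.9, 0.1, 0.2]
artwork  = [0.8, 0.2, 0.3]   # close in meaning -> similar direction
invoice  = [0.1, 0.9, 0.1]   # unrelated -> different direction

print(cosine_similarity(painting, artwork))  # high
print(cosine_similarity(painting, invoice))  # low
```

Because similarity is computed between vectors rather than words, a query and a description can match even when they share no exact vocabulary.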


System architecture

The system is implemented as a modular pipeline.

Main components:

  • image processing (vision model)
  • text processing (embedding model)
  • vector storage and retrieval
  • search interface

All components run within the museum’s infrastructure and are connected through the museum’s own API.
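
With each component stubbed out, the pipeline wiring can be sketched as follows. All function bodies here are placeholders; in the real system they call the vision model, the embedding model, and the vector store respectively:

```python
def describe_image(image_id: str) -> str:
    """Placeholder for the vision model: returns a generated description."""
    return f"description of {image_id}"

def embed(text: str) -> list[float]:
    """Placeholder for the embedding model: returns a small vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 100)]

class VectorStore:
    """Placeholder vector store: keeps (id, vector) pairs in memory."""
    def __init__(self):
        self.items: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: list[float]) -> None:
        self.items[item_id] = vector

    def search(self, query_vector: list[float], top_k: int = 3) -> list[str]:
        # Nearest by squared distance; a real store would use cosine/ANN search.
        def dist(item_id: str) -> float:
            return sum((x - y) ** 2 for x, y in zip(self.items[item_id], query_vector))
        return sorted(self.items, key=dist)[:top_k]

# Ingest: image -> description -> embedding -> store
store = VectorStore()
for image_id in ["obj-001", "obj-002"]:
    store.add(image_id, embed(describe_image(image_id)))

# Query: user text -> embedding -> retrieval
results = store.search(embed("description of obj-001"))
print(results)
```

The modular shape matters more than the stubs: each stage can be swapped (a different vision model, a different embedding model, a different store) without changing the rest of the pipeline.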


The museum's own API and integration

The system uses the museum’s own API to:

  • access collection data
  • process images and metadata
  • serve search results

This allows full control over how data is structured, processed, and exposed to users.

It also makes it possible to integrate semantic search with existing systems over time.
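
The API itself is not documented on this page. Purely as an illustration, a thin client with assumed endpoint paths could look like this (all paths and parameter names are hypothetical):

```python
from urllib.parse import urlencode

class CollectionAPI:
    """Thin client for the museum's API. All endpoint paths are assumptions."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def object_url(self, object_id: str) -> str:
        # Hypothetical endpoint for collection data.
        return f"{self.base_url}/objects/{object_id}"

    def search_url(self, query: str, limit: int = 10) -> str:
        # Hypothetical endpoint serving semantic search results.
        return f"{self.base_url}/search?" + urlencode({"q": query, "limit": limit})

api = CollectionAPI("https://collections.example.museum/api")
print(api.search_url("winter landscape"))
```

Keeping one API in front of the pipeline means the data model and access rules stay in one place, which is what makes later integration with existing systems feasible.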


Model selection and trade-offs

A significant part of the work has been to identify models that are:

  • small enough to run locally
  • large enough to provide useful results

This involves trade-offs between:

  • quality
  • speed
  • resource use

The pilot tests whether this balance is sufficient for real use in the collection.
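
One simple way to make the speed side of this trade-off concrete is to time a model over a fixed batch. In this sketch a stub stands in for the real embedding model; swapping in the actual call gives a comparable texts-per-second figure for each candidate:

```python
import time

def embed_stub(text: str) -> list[float]:
    """Stand-in for a real embedding model; replace with the actual call."""
    return [float(ord(c)) for c in text[:8]]

def throughput(texts: list[str], embed_fn) -> float:
    """Return texts processed per second for the given embedding function."""
    start = time.perf_counter()
    for t in texts:
        embed_fn(t)
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")

batch = ["sample text"] * 1000
rate = throughput(batch, embed_stub)
print(f"{rate:.0f} texts/second")
```

Quality still has to be judged separately (for example against a set of known-good search results), but measured throughput makes the speed and resource side of the comparison objective.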


What this enables

This technical approach makes it possible to:

  • run semantic search without external dependencies
  • process images without sending them outside the museum
  • maintain control over data and infrastructure
  • experiment with AI in a controlled environment


Related pages

  • → How semantic search works
  • → Responsible use of AI
  • → About the project