This pilot tests how semantic search can improve access to the collection using locally hosted AI models.
The work builds on an earlier prototype in which the National Museum was an early adopter of AI-based search. In this version, we test a fully local approach using our own models and infrastructure.
The project is developed at the National Museum as part of ongoing work with digital innovation and artificial intelligence in collections, led by Tord Nilsen.
Why we test semantic search
The pilot evaluates whether semantic search provides more relevant access to the collection than traditional keyword-based search.
Traditional search depends on exact terms and predefined metadata. This limits discovery, especially for users who do not know how objects are described internally.
Semantic search focuses on meaning rather than exact wording. It allows users to search in natural language and retrieve results based on similarity.
We test:
- whether relevance improves
- whether search becomes easier to use
- whether local AI models provide sufficient quality
What changed from version 1
This pilot introduces a different technical and strategic approach.
- from external AI services to locally hosted models
- from sending images out to processing them internally
- from dependency on third-party infrastructure to full control
In the previous prototype, images were analysed using external services.
In this version, all processing takes place within the museum’s own infrastructure.
How semantic search works
The system connects images, text, and queries through a shared representation of meaning.
Images are processed with a locally hosted vision–language model, which generates textual descriptions of visible content. These descriptions, together with existing metadata, are converted into vector representations (embeddings).
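The combination of generated descriptions and existing metadata can be sketched as a simple text-assembly step before embedding. This is an illustrative sketch, not the museum's actual pipeline; the function name and metadata fields are hypothetical.

```python
def build_document(caption: str, metadata: dict) -> str:
    """Combine a model-generated caption with catalogue metadata
    into a single text that will be converted into an embedding.
    (Hypothetical helper; field names are examples only.)"""
    fields = [caption] + [f"{k}: {v}" for k, v in metadata.items() if v]
    return "\n".join(fields)

# Example: one object's caption plus selected metadata fields
doc = build_document(
    "An oil painting of a fjord at dusk, with a small boat in the foreground.",
    {"artist": "Unknown", "technique": "Oil on canvas", "year": "1890"},
)
```

The resulting text is what the embedding model sees, so both the visual description and the catalogue terms contribute to the object's position in the vector space.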
When a user searches, the query is processed in the same way. The system compares vectors and retrieves results based on similarity.
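The comparison step can be illustrated with cosine similarity over toy vectors. This is a minimal sketch of the retrieval idea, not the production implementation; real embeddings from a model such as BGE-M3 have around a thousand dimensions, and the object IDs below are invented.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vector, collection, top_k=3):
    """Rank collection objects by similarity to the query embedding."""
    scored = [(cosine_similarity(query_vector, vec), obj_id)
              for obj_id, vec in collection.items()]
    scored.sort(reverse=True)
    return [obj_id for _, obj_id in scored[:top_k]]

# Toy 3-dimensional embeddings (hypothetical objects)
collection = {
    "painting_01": [0.9, 0.1, 0.0],
    "sculpture_07": [0.1, 0.8, 0.2],
    "sketch_12": [0.7, 0.3, 0.1],
}
results = search([1.0, 0.0, 0.0], collection, top_k=2)
# → ['painting_01', 'sketch_12']
```

Because the query and the collection live in the same vector space, results are ranked by closeness of meaning rather than by exact keyword overlap.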
→ Read more about how semantic search works
Technology and local AI
The system is implemented as a fully local solution.
- locally hosted vision–language model (Qwen 3.5, 9B)
- locally hosted embedding model (BAAI BGE-M3)
- processing of images and metadata within the museum’s infrastructure
- use of the museum’s own API
No external AI services are used.
→ Read more about the technology
Responsible use of AI
The pilot follows defined principles for responsible AI use, including transparency, human control, inclusion, and data security.
All processing is done locally, and images and metadata are not used to train external models.
→ Read more about responsible use of AI
What we test in the pilot
The pilot evaluates:
- relevance of search results
- quality of generated descriptions
- user understanding of how the system works
- performance of local AI models
What we’re NOT testing
To keep the scope contained, we are not testing:
- combination of semantic and metadata search
- onDisplay functionality
- filtering
What happens next
The pilot is deliberately limited in scope and serves as a basis for evaluation.
Results will inform further development, integration with existing systems, and requirements for quality, governance, and infrastructure.
The aim is to improve access to the collection while maintaining control over data and technology.