Technology and local AI in the collection

This pilot is built as a fully local AI system, where all processing takes place within the museum’s own infrastructure.
The goal is to maintain control over data, reduce external dependencies, and test whether smaller models can deliver sufficient quality.

This represents a clear shift from earlier prototypes, where external AI services were used for image analysis and embeddings.


Why we use local AI models

The system is designed to run without external AI services.

This approach is based on three priorities:

  • data control
    Images and metadata remain on the museum’s infrastructure
  • predictability
    The system behaves consistently and is not affected by external changes
  • resource efficiency
    Smaller models reduce computational cost and energy use

This allows the museum to evaluate AI use in a controlled and sustainable way.


Vision–language model

The system uses a locally hosted vision–language model:

  • Qwen 3.5 (9B)

The model analyses images and generates textual descriptions of visible content. These descriptions focus on:

  • objects and motifs
  • composition
  • colours and visual structure

This removes the need to send images to external services for analysis.
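
The exact serving setup is not documented on this page. Purely as an illustration, assuming the model is served behind an Ollama-style HTTP endpoint, a description request might be assembled like this (the endpoint shape, model name, and prompt wording are all assumptions, not the pilot's actual configuration):

```python
import base64
import json

# Hypothetical prompt; the pilot's actual prompt wording is not documented here.
DESCRIPTION_PROMPT = (
    "Describe the visible content of this image: "
    "objects and motifs, composition, colours and visual structure."
)

def build_description_request(image_bytes: bytes, model: str = "qwen-vl") -> dict:
    """Assemble a JSON payload for an Ollama-style /api/generate endpoint.

    The endpoint and field names are assumptions; adapt to the actual
    local serving stack.
    """
    return {
        "model": model,
        "prompt": DESCRIPTION_PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Example: build a request for a (dummy) image and serialise it.
payload = build_description_request(b"\x89PNG...")
print(json.dumps(payload)[:60])
```

The key point is that the image bytes never leave the museum's network: the payload is posted to a locally hosted model, not to an external service.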


Embedding model

All text is converted into embeddings using a local model:

  • BAAI BGE-M3

Embeddings represent meaning as numerical vectors.
This enables the system to compare content based on similarity rather than exact wording, which is central to semantic search.

The same process is applied to:

  • generated descriptions
  • existing metadata
  • user queries
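
Vector comparison itself is model-independent. A minimal sketch of cosine similarity, the standard measure behind this kind of semantic matching (toy three-dimensional vectors stand in for real, much higher-dimensional embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
painting = [0.9, 0.1, 0.2]
artwork  = [0.8, 0.2, 0.3]   # close in meaning -> similar direction
invoice  = [0.1, 0.9, 0.1]   # unrelated -> different direction

print(cosine_similarity(painting, artwork))  # high
print(cosine_similarity(painting, invoice))  # low
```

Because similarity is computed between vectors rather than words, a query and a description can match even when they share no exact vocabulary.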


System architecture

The system is implemented as a modular pipeline.

Main components:

  • image processing (vision model)
  • text processing (embedding model)
  • vector storage and retrieval
  • search interface

All components run within the museum’s infrastructure and are connected through the museum’s own API.
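
With each component stubbed out, the pipeline wiring can be sketched as follows. All function bodies here are placeholders; in the real system they call the vision model, the embedding model, and the vector store respectively:

```python
def describe_image(image_id: str) -> str:
    """Placeholder for the vision model: returns a generated description."""
    return f"description of {image_id}"

def embed(text: str) -> list[float]:
    """Placeholder for the embedding model: returns a small vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 100)]

class VectorStore:
    """Placeholder vector store: keeps (id, vector) pairs in memory."""
    def __init__(self):
        self.items: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: list[float]) -> None:
        self.items[item_id] = vector

    def search(self, query_vector: list[float], top_k: int = 3) -> list[str]:
        # Nearest by squared distance; a real store would use cosine/ANN search.
        def dist(item_id: str) -> float:
            return sum((x - y) ** 2 for x, y in zip(self.items[item_id], query_vector))
        return sorted(self.items, key=dist)[:top_k]

# Ingest: image -> description -> embedding -> store
store = VectorStore()
for image_id in ["obj-001", "obj-002"]:
    store.add(image_id, embed(describe_image(image_id)))

# Query: user text -> embedding -> retrieval
results = store.search(embed("description of obj-001"))
print(results)
```

The modular shape matters more than the stubs: each stage can be swapped (a different vision model, a different embedding model, a different store) without changing the rest of the pipeline.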


The museum's own API and integration

The system uses the museum’s own API to:

  • access collection data
  • process images and metadata
  • serve search results

This allows full control over how data is structured, processed, and exposed to users.

It also makes it possible to integrate semantic search with existing systems over time.
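
The API itself is not documented on this page. Purely as an illustration, a thin client with assumed endpoint paths could look like this (all paths and parameter names are hypothetical):

```python
from urllib.parse import urlencode

class CollectionAPI:
    """Thin client for the museum's API. All endpoint paths are assumptions."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def object_url(self, object_id: str) -> str:
        # Hypothetical endpoint for collection data.
        return f"{self.base_url}/objects/{object_id}"

    def search_url(self, query: str, limit: int = 10) -> str:
        # Hypothetical endpoint serving semantic search results.
        return f"{self.base_url}/search?" + urlencode({"q": query, "limit": limit})

api = CollectionAPI("https://collections.example.museum/api")
print(api.search_url("winter landscape"))
```

Keeping one API in front of the pipeline means the data model and access rules stay in one place, which is what makes later integration with existing systems feasible.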


Model selection and trade-offs

A significant part of the work has been to identify models that are:

  • small enough to run locally
  • large enough to provide useful results

This involves trade-offs between:

  • quality
  • speed
  • resource use

The pilot tests whether this balance is sufficient for real use in the collection.
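
One simple way to make the speed side of this trade-off concrete is to time a model over a fixed batch. In this sketch a stub stands in for the real embedding model; swapping in the actual call gives a comparable texts-per-second figure for each candidate:

```python
import time

def embed_stub(text: str) -> list[float]:
    """Stand-in for a real embedding model; replace with the actual call."""
    return [float(ord(c)) for c in text[:8]]

def throughput(texts: list[str], embed_fn) -> float:
    """Return texts processed per second for the given embedding function."""
    start = time.perf_counter()
    for t in texts:
        embed_fn(t)
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")

batch = ["sample text"] * 1000
rate = throughput(batch, embed_stub)
print(f"{rate:.0f} texts/second")
```

Quality still has to be judged separately (for example against a set of known-good search results), but measured throughput makes the speed and resource side of the comparison objective.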


What this enables

This technical approach makes it possible to:

  • run semantic search without external dependencies
  • process images without sending them outside the museum
  • maintain control over data and infrastructure
  • experiment with AI in a controlled environment


Related pages

  • → How semantic search works
  • → Responsible use of AI
  • → About the project