This pilot is built as a fully local AI system, where all processing takes place within the museum’s own infrastructure.
The goal is to maintain control over data, reduce external dependencies, and test whether smaller models can deliver sufficient quality.
This represents a clear shift from earlier prototypes, where external AI services were used for image analysis and embeddings.
Why we use local AI models
The system is designed to run without external AI services.
This approach is based on three priorities:
- data control: images and metadata remain on the museum’s infrastructure
- predictability: the system behaves consistently and is not affected by external changes
- resource efficiency: smaller models reduce computational cost and energy use
This allows the museum to evaluate AI use in a controlled and sustainable way.
Vision–language model
The system uses a locally hosted vision–language model:
- Qwen 3.5 (9B)
The model analyses images and generates textual descriptions of visible content. These descriptions focus on:
- objects and motifs
- composition
- colours and visual structure
This replaces the need to send images to external services for analysis.
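As an illustration, the description step could look like the request below, assuming the model is served behind an OpenAI-compatible chat API (a common setup for local inference servers). The endpoint, model name, and prompt are illustrative assumptions, not the pilot’s actual configuration.

```python
import base64
import json

def build_description_request(image_bytes: bytes, model: str = "qwen-vl-local") -> dict:
    """Build a chat-completion payload asking a locally hosted
    vision-language model to describe an image.
    The model name and prompt are illustrative, not the pilot's own."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the visible content: objects and motifs, "
                         "composition, colours and visual structure."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }

# The payload would then be POSTed to the local server, e.g.
# requests.post("http://localhost:8000/v1/chat/completions", json=payload)
payload = build_description_request(b"\xff\xd8\xff")  # placeholder JPEG bytes
print(json.dumps(payload)[:60])
```

Because the request goes to a server on the museum’s own infrastructure, the image never leaves it.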
Embedding model
All text is converted into embeddings using a local model:
- BAAI BGE-M3
Embeddings represent meaning as numerical vectors.
This enables the system to compare content based on similarity rather than exact wording, which is central to semantic search.
The same process is applied to:
- generated descriptions
- existing metadata
- user queries
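Comparison by similarity can be sketched with cosine similarity between a query vector and stored description vectors. The vectors below are toy three-dimensional values, not real BGE-M3 embeddings (which have around a thousand dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means
    similar meaning, close to 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for real model output.
query = [0.9, 0.1, 0.0]          # e.g. the query "sailing boat"
descriptions = {
    "ship painting": [0.8, 0.2, 0.1],
    "portrait":      [0.1, 0.9, 0.3],
}
ranked = sorted(descriptions,
                key=lambda k: cosine_similarity(query, descriptions[k]),
                reverse=True)
print(ranked[0])  # the description closest in meaning to the query
```

Because generated descriptions, existing metadata, and user queries all pass through the same embedding model, they can be compared directly in this way.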
System architecture
The system is implemented as a modular pipeline.
Main components:
- image processing (vision model)
- text processing (embedding model)
- vector storage and retrieval
- search interface
All components run within the museum’s infrastructure and are connected through the museum’s own API.
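The modular pipeline can be sketched as independent stages wired together. The stage functions below are stubs standing in for the vision model, the embedding model, and the vector store; only the overall flow reflects the architecture described above.

```python
from dataclasses import dataclass, field

@dataclass
class VectorStore:
    """Minimal in-memory stand-in for vector storage and retrieval."""
    items: dict = field(default_factory=dict)  # id -> vector

    def add(self, item_id: str, vector: list[float]) -> None:
        self.items[item_id] = vector

    def nearest(self, query: list[float]) -> str:
        # Rank by dot product; a real store would use cosine or ANN search.
        return max(self.items, key=lambda i: sum(
            a * b for a, b in zip(self.items[i], query)))

def describe_image(image_id: str) -> str:
    return f"description of {image_id}"            # stub for the vision model

def embed(text: str) -> list[float]:
    return [float(len(text)), float(text.count("1"))]  # stub embedding model

# Ingest: image -> description -> embedding -> store
store = VectorStore()
for image_id in ["obj-001", "obj-002"]:
    store.add(image_id, embed(describe_image(image_id)))

# Search: query -> embedding -> retrieval
result = store.nearest(embed("description of obj-001"))
print(result)
```

Because each stage only consumes the previous stage’s output, a model can be swapped without changing the rest of the pipeline.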
The museum’s own API and integration
The system uses the museum’s own API to:
- access collection data
- process images and metadata
- serve search results
This allows full control over how data is structured, processed, and exposed to users.
It also makes it possible to integrate semantic search with existing systems over time.
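Serving search results can be sketched as a handler behind a search endpoint of the museum’s API: embed the query, rank stored records by similarity, and return a JSON-serializable response. The record IDs, descriptions, response shape, and the bag-of-words stub embedding are all illustrative assumptions.

```python
def embed(text: str) -> list[float]:
    """Crude bag-of-words stub; the pilot uses a local BGE-M3 model."""
    return [float(text.count(w)) for w in ("boat", "portrait", "sea")]

# Pre-computed "index": record id -> embedding of its description/metadata.
INDEX = {
    "rec-14": embed("wooden fishing boat, grey sea"),
    "rec-27": embed("formal portrait, dark background"),
}

def handle_search(query: str, top_k: int = 1) -> dict:
    """Hypothetical handler behind a /search endpoint."""
    q = embed(query)
    scored = sorted(INDEX.items(),
                    key=lambda kv: sum(a * b for a, b in zip(kv[1], q)),
                    reverse=True)
    return {"query": query,
            "results": [{"id": rid} for rid, _ in scored[:top_k]]}

response = handle_search("fishing boat at sea")
print(response["results"][0]["id"])
```

Keeping the endpoint inside the museum’s own API means the response shape can evolve together with the systems that consume it.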
Model selection and trade-offs
A significant part of the work has been to identify models that are:
- small enough to run locally
- large enough to provide useful results
This involves trade-offs between:
- quality
- speed
- resource use
The pilot tests whether this balance is sufficient for real use in the collection.
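One side of this balance, speed, can be probed with a small harness that times candidate models on the same inputs. The "models" below are stubs, and a real evaluation would also score output quality and measure memory use.

```python
import time

def benchmark(model_fn, inputs, repeats: int = 3) -> float:
    """Return mean seconds per input for a callable model."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            model_fn(x)
    return (time.perf_counter() - start) / (repeats * len(inputs))

# Stubs standing in for a smaller and a larger local model.
small_model = lambda text: text.lower()
large_model = lambda text: "".join(sorted(text)) * 10

inputs = ["museum object description"] * 100
for name, fn in [("small", small_model), ("large", large_model)]:
    print(f"{name}: {benchmark(fn, inputs):.2e} s/input")
```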
What this enables
This technical approach makes it possible to:
- run semantic search without external dependencies
- process images without sending them outside the museum
- maintain control over data and infrastructure
- experiment with AI in a controlled environment
Related pages
- → How semantic search works
- → Responsible use of AI
- → About the project