We are prototyping a new semantic search feature for Nasjonalmuseet’s online collection. This search method focuses on context and meaning within the image, offering an alternative to the conventional approach of matching the user’s query against terms in our collection management system. In this prototyping stage we are testing the semantic search on approximately 6,000 objects. Your feedback will be of great value as we refine this prototype. Join us in enhancing the way art is discovered and experienced digitally at Nasjonalmuseet.
In-Depth Understanding of Semantic Search in Art
Our prototype utilises semantic search, transcending traditional metadata keyword search by interpreting the context and meaning behind visitors’ queries. We use OpenAI’s GPT-4 Vision API, which analyses digital images of collection objects and returns detailed, descriptive text about each object. This text is much richer than the metadata from our collection management system, capturing pictorial content and motifs as well as subtleties like themes, emotions, and narratives in art.
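To make the first step concrete, here is a minimal sketch of the kind of request sent to OpenAI’s Chat Completions endpoint to obtain a description of a collection object. The model identifier, prompt wording, and image URL are illustrative assumptions, not our exact production values.

```python
def build_description_request(image_url: str) -> dict:
    """Build a Chat Completions payload asking GPT-4 Vision to describe an artwork.
    Model name and prompt text are assumptions for illustration."""
    return {
        "model": "gpt-4-vision-preview",  # assumed model identifier
        "max_tokens": 500,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this artwork in detail: motifs, "
                                "themes, emotions and narrative.",
                    },
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Hypothetical image URL; the real pipeline iterates over the collection.
payload = build_description_request("https://example.org/collection/ng-1234.jpg")
```

The payload mixes a text part and an image part in one user message, which is how the Chat Completions API accepts image input for vision models.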
The prototype currently allows search within approximately 6,000 images, the majority from the Fine Art collections, along with a substantial number from the Design and Architecture collections. We are continuously indexing more works.
Advanced Implementation: From Vision API to MongoDB Atlas Vector Search
Our process begins with extracting detailed descriptions from the GPT-4 Vision API. We then transform these descriptions into numerical embeddings using OpenAI’s Embeddings API. These embeddings, which encapsulate the core characteristics of the object, are stored in a MongoDB Atlas database. Here, we employ Atlas’s Vector Search feature, a robust tool that merges operational database functionality with vector search, enabling us to efficiently handle and query these complex data representations.
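The stored record and the accompanying vector index can be sketched as follows. The embedding call is stubbed with a dummy vector so the example runs without API credentials; the inventory number, field names, and index definition are illustrative assumptions (OpenAI’s text-embedding-ada-002 model, for instance, returns 1536-dimensional vectors).

```python
def embed(text: str) -> list:
    """Stand-in for OpenAI's Embeddings API. The real call returns a
    1536-dimensional vector capturing the meaning of the text."""
    return [0.0] * 1536  # dummy vector for illustration only

description = "A red wooden house by a fjord under a pale winter sky."
document = {
    "object_id": "NG.M.01234",       # hypothetical inventory number
    "description": description,       # text from the Vision API step
    "embedding": embed(description),  # stored alongside the object record
}

# An Atlas Search index definition mapping the "embedding" field as a
# KNN vector field with cosine similarity (names are assumptions):
index_definition = {
    "mappings": {
        "dynamic": True,
        "fields": {
            "embedding": {
                "type": "knnVector",
                "dimensions": 1536,
                "similarity": "cosine",
            }
        },
    }
}
```

Storing the embedding in the same document as the object’s metadata is what lets Atlas combine ordinary database queries with vector search over the same collection.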
To facilitate user interaction, we present the artworks on a webpage equipped with a search field. Here, users can enter queries, which are also converted into embeddings. Utilising MongoDB Atlas’s Vector Search with a K-nearest neighbours (KNN) algorithm, we can efficiently match user queries with the most relevant works in our collection.
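The query side can be sketched as an aggregation pipeline built around Atlas’s `$vectorSearch` stage. Index name, field names, and candidate count are assumptions; the limit of 30 matches the number of works each search displays.

```python
def build_search_pipeline(query_vector: list) -> list:
    """Sketch of a MongoDB aggregation pipeline matching a query embedding
    against stored artwork embeddings. Index/field names are assumptions."""
    return [
        {
            "$vectorSearch": {
                "index": "embedding_index",   # assumed index name
                "path": "embedding",          # field holding object vectors
                "queryVector": query_vector,  # embedding of the user's query
                "numCandidates": 200,         # candidates considered by KNN
                "limit": 30,                  # we always display 30 works
            }
        },
        {
            "$project": {
                "object_id": 1,
                "description": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

pipeline = build_search_pipeline([0.0] * 1536)
```

In production the pipeline would be passed to `collection.aggregate(pipeline)`; the projected score is what we display next to each work as its relevance score.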
This system offers a more intuitive, comprehensive, and technically sophisticated approach to art discovery, allowing users to make connections with works in ways never before possible.
Search Query Interpretation
The user can input any query, which will be interpreted by OpenAI’s Embeddings API. We encourage users to try out different queries, from descriptions of visual motifs (e.g. “red house”) to abstractions, metaphors and emotions (e.g. “man’s best friend”, “translucent material”, “love”, “fear” and “companionship”). You can also try longer queries, such as “a man who has trouble walking”.
As the user’s query is interpreted by the Embeddings API, the query can be in any language. The API provides vector representations of text that capture semantic relationships between words and phrases, and is language agnostic in a similar way to ChatGPT. Thus, searching for the words “datter” (Norwegian), “hija” (Spanish), “بنت” (Arabic) or “娘” (Japanese) will return similar, but not necessarily identical, results to a search for “daughter”. This is a powerful feature with potential for greater international outreach, for instance for collections that have limited metadata in foreign languages. On the other hand, it has its pitfalls, for instance with words that have conflicting meanings in different languages, or that are profane in one language but not another.
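The cross-lingual behaviour rests on a simple geometric fact: embeddings of semantically related words end up close together in vector space, regardless of language. The three-dimensional vectors below are toy values, not real Embeddings API output, but the cosine-similarity comparison is the same one the search performs on 1536-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction (same meaning), values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration only:
daughter = [0.80, 0.10, 0.10]  # "daughter"
datter   = [0.79, 0.12, 0.10]  # "datter" (Norwegian) — nearby, not identical
house    = [0.10, 0.90, 0.20]  # an unrelated word

# The translation pair is far more similar than the unrelated pair.
assert cosine_similarity(daughter, datter) > cosine_similarity(daughter, house)
```

Because “datter” and “daughter” land near, but not exactly on, the same point, their nearest neighbours in the collection overlap heavily without being identical, which is exactly the behaviour described above.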
Interestingly, this language flexibility also extends to the language of emojis. Try, for instance, a search with the snowflake emoji ❄️!
Beta Testing – Deep Dive into Evaluation and Optimisation
This prototyping phase is crucial for shaping the feature’s effectiveness. We have implemented a system where each query always displays 30 artworks, sorted by relevance. Accompanying each artwork is its relevance score, a low-resolution thumbnail, and a URL linking to the work in our online collection.
This setup is designed to evaluate the optimal threshold for relevance in search results. User feedback is vital in this process. We encourage visitors to:
- use the link under each object in the search result to report an object that you consider less relevant, or not relevant at all, to your query
- use the feedback form at the bottom of the page to share your thoughts on how the semantic search works for you.
These two types of feedback will help us to define where the relevance cutoff should be set, as well as provide us with other information that can guide us in fine-tuning the search algorithm to better match user expectations.
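The relevance-cutoff idea described above can be sketched as a simple filter over the 30 scored results a query returns. The scores and the candidate threshold below are illustrative, not values derived from actual feedback.

```python
def apply_relevance_cutoff(results, threshold):
    """Keep only results whose vector-search score meets the threshold.
    The threshold itself is what user feedback helps us calibrate."""
    return [r for r in results if r["score"] >= threshold]

# Hypothetical scored results for one query (object IDs are invented):
results = [
    {"object_id": "NG.M.00001", "score": 0.93},
    {"object_id": "NG.M.00002", "score": 0.81},
    {"object_id": "NG.M.00003", "score": 0.62},  # likely reported as irrelevant
]

relevant = apply_relevance_cutoff(results, threshold=0.75)
print([r["object_id"] for r in relevant])  # → ['NG.M.00001', 'NG.M.00002']
```

Reports of irrelevant objects cluster below some score; the feedback tells us where that boundary lies, and the cutoff then trims the tail of each result list.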
Currently, the query process has not been optimised for speed. We prioritise accuracy and relevance of search results over response time. However, we acknowledge the importance of a swift search experience and plan to address this in future updates.
A significant challenge we are encountering is the extraction of accurate descriptions for some artworks, particularly figure studies. The OpenAI API, which we utilise for this purpose, often flags these images as depicting nudity or sexual activity due to its content sensitivity guidelines. This situation illustrates a critical drawback of relying on commercial AI solutions, which may not always align with the nuanced needs of cultural heritage.
Next Steps: Enhancing User Experience and Data-Driven Refinement
As we progress with the development of our semantic search feature, our focus shifts to expanding user engagement and refining the tool through data analysis. We intend to increase user participation in the beta test through outreach to the museum community and by forging collaborations with art and cultural heritage groups.
In parallel, we will be analysing user feedback in depth, particularly concerning the threshold value for relevance. This analysis is important for identifying user preferences and pinpointing areas that require enhancement. These insights will help us refine the search functionality, making it more intuitive, user-friendly, and responsive to the specific needs of our audience.
Another critical aspect of our ongoing work is the optimisation of the search tool’s performance. Based on user experiences and technical evaluations, we will focus on enhancing the speed and efficiency of the search function. This optimisation is key to improving the overall user experience, ensuring that our tool is not only accurate in its search results but also fast and reliable.
We also acknowledge the challenges that arise when artworks are interpreted by AI from large tech firms like OpenAI, particularly concerning content sensitivity and social biases. We have been in contact with OpenAI, who recognise the issues in interpreting art through AI, but they do not currently offer a tailored solution for these challenges. This underscores the importance of being aware of these limitations as we utilise these technologies in our work.
Looking towards the future, we are considering the integration of open-source models into our semantic search system. This move aims to reduce our reliance on external online APIs, granting us more autonomy and flexibility in the operation of the search function. Embracing open-source technologies will allow us to tailor the tool more precisely to the unique requirements of our collection and users, with a view to building a more robust and self-sufficient digital art exploration platform.
Through these initiatives, we aim to not only meet but exceed the expectations of our users, enriching their digital interaction with our art collection and lifting the bar for how digital art can be explored.