Generate alt-texts for historical images
Description
Can we make historical images searchable using AI, without relying (much) on their metadata, similar to the prototype by the National Museum of Norway? There are two image sets among the AI Sauna Resources, as well as a vector database, and LUMI is available for running AI models; these could be combined into a prototype.
The team
Created by: Osma Suominen
Team members: Main coding by Osma Suominen. Prompting and documentation by Mona Lehtinen, Harri Hihnala, Julia Isotalo, Lu Chen, and Vertti Luostarinen.
Process
Alt-text best practices
- Describe what can be seen in the picture; avoid biases and your own interpretations.
- Use plain language.
- Put the most important information first.
  - What is in the foreground?
  - There is no point in describing the background before the main subjects.
- Don't start with "image of" or "photo of".
- Don't include information that is already in the written image description or adjacent text.
- Leave out additional information, such as the name of the photographer.
- Always end with a full stop.
- Include text that appears within the image.
  - How should this be prompted? If the text is handwritten and not easily readable, write e.g. "the photograph includes handwritten text".
- Recognize when the subject is widely known, and use its specific name (for example, not just "a large building" but "the central railway station").
  - Could the AI get this information from the metadata?
- Consider the context of the page.
Prompt
This is an alt text description. What can be seen in the front? what can be seen in the back? Is the photo coloured or black and white? indicate in the description if there's text in the picture. Do not use words image or picture in the description. Don't count the amount of things.
Testing
We tested prompting manually with the llava-13b model on replicate.com, then did batch processing with a Jupyter Notebook on LUMI.
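The testing code itself is not reproduced on this page, but a manual single-image test against the hosted llava-13b model could look roughly like the sketch below. The model slug, the example file name, and the output handling are assumptions based on the replicate.com Python client, not the team's exact code.

# A minimal sketch of a single-image test against llava-13b hosted on
# replicate.com, using the replicate Python client. Requires a
# REPLICATE_API_TOKEN in the environment; the model slug and the example
# file name are placeholders, not necessarily the team's actual setup.
import replicate

PROMPT = (
    "This is an alt text description. What can be seen in the front? "
    "what can be seen in the back? Is the photo coloured or black and white? "
    "indicate in the description if there's text in the picture. Do not use "
    "words image or picture in the description. Don't count the amount of things."
)

with open("example_photo.jpg", "rb") as image_file:
    # replicate.run streams the answer in pieces, so join them at the end.
    output = replicate.run(
        "yorickvp/llava-13b",  # assumed model slug; pin a version in practice
        input={"image": image_file, "prompt": PROMPT},
    )
    alt_text = "".join(output)

print(alt_text)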
Results
Our method
The data set was pre-made for testing. We started by gathering best practices for alt-texts. Then we manually tried the llava-13b model on replicate.com: we chose pictures from the data set and prompted the model to generate alt-texts, aiming to find a good prompt for the task and to see what the results looked like. For batch processing the same task, a Jupyter Notebook was made and run on LUMI, as sketched below.
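The notebook itself is not included here. As a rough illustration of what batch generation on LUMI could look like, the sketch below loads an open LLaVA checkpoint with the Hugging Face transformers library; the checkpoint name (llava-hf/llava-1.5-13b-hf), the folder layout, and the generation settings are assumptions, not necessarily what the notebook used.

# A rough sketch of batch alt-text generation, assuming the open
# llava-hf/llava-1.5-13b-hf checkpoint from Hugging Face; the team's
# actual notebook on LUMI may differ in model, paths and settings.
from pathlib import Path

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-13b-hf"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

ALT_PROMPT = "This is an alt text description."  # abbreviated; use the full prompt from the Prompt section
# LLaVA 1.5 expects a chat-style template around the prompt.
prompt = "USER: <image>\n" + ALT_PROMPT + " ASSISTANT:"

alt_texts = {}
for path in sorted(Path("images").glob("*.jpg")):  # assumed folder layout
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, torch.float16
    )
    output = model.generate(**inputs, max_new_tokens=150, do_sample=False)
    decoded = processor.decode(output[0], skip_special_tokens=True)
    # The decoded string repeats the prompt; keep only the model's answer.
    alt_texts[path.name] = decoded.split("ASSISTANT:")[-1].strip()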
Resources we used
Computational resources
The LUMI supercomputer, Jupyter Notebook, replicate.com ...
Data set
The data set consists of 5947 old photographs (dating up to 1917) from the collections of the Helsinki City Museum, obtained through the Finna.fi discovery service. The data set, with a full description, can be found on Hugging Face.
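For reference, loading such a data set from Hugging Face typically looks like the sketch below. The repository identifier here is a placeholder, since the actual name is given on the data set's Hugging Face page, and the record fields are assumptions.

# A minimal sketch of loading the photographs with the Hugging Face
# datasets library. The repository id below is a placeholder; use the
# actual name from the data set's Hugging Face page.
from datasets import load_dataset

dataset = load_dataset("example-org/helsinki-photographs", split="train")
print(len(dataset))   # should report 5947 records
print(dataset[0])     # one record: the image plus its metadata fields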
Conclusion
In general, making a good prompt is important and can be difficult. As it stands, the LLM works surprisingly well but is not perfect: the setup cannot be trusted to produce human-quality alt-texts on its own, so human intervention is needed.
What next
How about tagging or indexing the generated alt-texts, for example to make the images searchable (one possible direction is sketched below)? And how could the automatically generated alt-texts be made better?
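As a hint of what indexing could look like, the sketch below embeds generated alt-texts with a sentence-embedding model and searches them by cosine similarity. The model name and the example texts are illustrative assumptions; a real prototype would store the vectors in the vector database mentioned in the description.

# An illustrative sketch: embed generated alt-texts and search them by
# cosine similarity. The model name and example texts are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
alt_texts = [
    "A horse-drawn tram on a cobblestone street, in black and white.",
    "A wooden house by the shore, with two sailing boats in the background.",
]
# Normalized embeddings make the dot product equal to cosine similarity.
embeddings = model.encode(alt_texts, normalize_embeddings=True)

query = model.encode(["tram in a city street"], normalize_embeddings=True)
scores = embeddings @ query[0]
print(alt_texts[int(np.argmax(scores))])  # best-matching alt-text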