Wikimedia Foundation Annual Plan/2023-2024/Draft/External Trends/Community call notes

Challenges

Erosion of trust in written knowledge

  • Most worried not about how we can use AI ourselves, but about having the Internet flooded with AI content – how will that change the relationship between readers and our projects?
  • Our movement comes from a tradition of boiling down things into text – takes more time to create than consume. But with AI, that’s reversed. Response from knowledge seekers might be to rely on signals to glean which text has human authorship. This might exacerbate the trend of young people turning to video to gain information, because it becomes harder & harder to trust the written word.
  • What I see among young people (such as my kids) is that video-based creators / educators / commentators — especially ones that are building social communities around their work — are the main starting point for trying to figure out what to trust.

Erosion of the Wikimedia movement’s model for producing high-quality content

  • Wiki projects have a huge amount of high quality content (reliable, structured, categorized). This is the value we bring to the world. What scares me the most is not that people use something like GPT to get knowledge, but that there’ll be an overflow of AI-generated content submitted to new page patrol. If that happens, it might damage the thing of value we have (large amount of high-quality content).
  • OpenAI provides a model where you take ChatGPT output and submit corrections. This introduces a vulnerability. E.g., a corporation could submit “corrections” that are biased toward their product and add that content en masse to Wikipedia. Even if most of these contributions are caught and undone by our community, we could still end up with biased content. Trying to catch that pattern is very difficult (because editing trends are very spiky), so it’s a huge risk.
  • Do we have trust that most people won’t just create large amounts of text using AI to present themselves as great content contributors? That they’d be as diligent using AI tools as if they were writing it themselves?
  • We shouldn’t care if something was written by a bot – but we should care that it’s factually correct, follows norms of the wiki, follows citation rules. Question is, how do we check to make sure that’s true in an automated way? How do automated tools do this with different projects/languages’ norms & policies? We can experiment to see if it’s possible, but it might not be solvable with technology – we will still need to rely on human judgment.
  • Concerned about how LLMs could be used to generate fictitious content at scale on Wikipedia
  • Ability to disrupt our projects requires very little technical know-how – the main protection from this in the past has been lack of will. Historically, disruption has been easy to identify. LLMs might remove the ability to identify large-scale abuse, especially when it’s productionized and automated.
  • Speed/efficiency with which AI can produce content could overrun our projects’ ability to function.
  • ChatGPT and tools like it will automatically make citation checking more complicated, as there will be a much greater amount of content to sort through.

Potential for creation/dissemination of harmful misinformation

  • Very concerned and afraid about what’s happening, especially what ChatGPT does. It behaves as if it’s a human, but it doesn’t take much to make it spit out nonsense. It can’t tell you where its knowledge comes from. I have to do my own research to figure out if what it's saying is true.
  • Trying to understand why people are excited about this. We’re putting people’s lives in the hands of this technology – e.g. people might ask this technology for medical advice, it may be wrong, and people will die.

Bias in training data amplifying bias in output

  • We already have a developed language bias. How might that be exacerbated by these tools?
  • Need to understand what the sources/training data are that trained the tool to produce content. Was it only English-speaking authors?

  • Need the perspective of smaller communities and non-EN projects to measure the success of AI outputs. Need a human validation system to measure bias from English development.

  • Another worry: which data are models getting? Because we have a content bias, we’ll also have a generated content bias.

Copyright concerns

  • There are already artists who are in court against generative AI art. The argument is that you need consent to use works as training data.

  • Some voices say AI-generated art can go against copyright – there is already a very radical counterpoint to that (Felix Reda of the Pirate Party in the EU argued that AI can’t hold IP rights).

Surveillance concerns

  • There are arguments against using CC licenses for pictures of people at all. Anything that depicts a human being can and may be used for surveillance. CC licenses are legally too weak to protect vulnerable people from ending up in these databases. The whole point of knowledge equity – cf. the Wikimania Cape Town presentation / Coded Bias – is to protect vulnerable people.

Creative Commons and the Face Recognition Problem https://adam.harvey.studio/creative-commons

Reliance on opaque proprietary/closed-source models

  • LLMs are opaque, covered by trade secrets, not in line with our mission.

Opportunities

Increased importance/reliance on our content to establish ground truth

  • What happens when there’s tons of AI-generated content? Could imagine an Internet of trillions of pages of low quality/trustworthiness. When that is the case, both people who create LLMs and readers will need heuristics/hints for where to go to find reliable information – they might go to Wikipedia more. Maybe they’ll use an LLM that only grabs from Wikipedia, or go directly to projects – but they’ll use our content more & the work of our community will be even more valuable.

Content generation

Wikipedia text articles

  • Started first WP article using ChatGPT (Artwork Title). There’s a proposal page on English Wikipedia and probably others for an LLM policy – some progress there on how LLMs should and shouldn’t be used, who should use them (i.e. probably not newbies, not without a human in the loop similar to current machine translation policies, etc.)
  • Some of us have been experimenting on Wikispore to post AI-generated articles there and think about how to grade and develop them over time. There’s a template for that.

Images

  • But the rule on English Wikipedia is that photos are not always preferred over drawings. Considering this, other types of illustrations, and the Wiki Unseen project – and the speed at which the technology and the legal/regulatory space are evolving – maybe we (communities) should wait and see while staying open.

  • NVIDIA just started a project partnering with artists and Getty Images. Want to create a system where there’s no problem with images & copyright because it’s opt-in. Push from certain corporations to do the right thing – create materials w/permission of authors.
  • Can use it to create images for mythology, other things that don’t have images – could make a difference on how we allow readers to interact with knowledge.
  • The use of AI images in Wikibooks could be really useful

Multimedia

  • Realistic AI-generated explainer videos are perhaps not far off.

Content summarization

  • There are other places where we’re using AI besides translation: e.g. a children’s version of Wikipedia. AI is great at summarizing for children. Take a long & technical article and ask it to make a children’s version and it works very well (see the sketch below).

https://eu.wikipedia.org/wiki/Txikipedia:Energiaren_kontserbazioaren_printzipioa
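
A minimal sketch of the children’s-summary idea above, assuming the OpenAI Python client; the model name and prompt wording are illustrative choices, not an endorsed workflow, and (as noted elsewhere in these notes) the output would still need human review before being used on a project such as Txikipedia.

```python
# Illustrative only: model name, prompt, and article text are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

article_text = "..."  # plain text of a long, technical article

response = client.chat.completions.create(
    model="gpt-4",  # assumed model choice
    messages=[
        {"role": "system",
         "content": "Rewrite encyclopedia articles for 8-12 year olds: "
                    "keep the facts, simplify the vocabulary and sentences."},
        {"role": "user", "content": article_text},
    ],
)
print(response.choices[0].message.content)  # draft for human review
```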

Translation

  • As a teacher, I have been using AI generative tools in the classroom for a while. Wanted to have an Earth Day translate-a-thon: decided that this year it would be assisted by ChatGPT. (Haven’t done it yet but planning on Spanish first)
  • Translation is the one thing we can do without worrying about copyright and other concerns. Seems like a more straightforward opportunity

  • GPT-3 is quite good at translation and GPT-4 is likely to be better. The idea is to have humans vet the translated articles (see the sketch after this list).

  • I think that ChatGPT is amazingly good with small languages that it wasn't specifically trained on.
  • GPT-4 works in 25 languages more fluently than GPT-3 worked in English.
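
A rough sketch of the human-in-the-loop translation workflow described above; the model name, prompt, and the English-to-Spanish pair are assumptions, not a recommended setup.

```python
# Illustrative only: a GPT-assisted translation draft that a human editor still vets.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

wikitext = "..."  # source wikitext, e.g. an article chosen for a translate-a-thon

draft = client.chat.completions.create(
    model="gpt-4",  # assumed model choice
    messages=[
        {"role": "system",
         "content": "Translate the following Wikipedia wikitext from English to Spanish. "
                    "Preserve all wiki markup, templates, and references unchanged."},
        {"role": "user", "content": wikitext},
    ],
).choices[0].message.content

print(draft)  # a starting point only – human vetting happens before publishing
```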

Improving search

  • Searching in Google already provides summarized knowledge from Wikipedia. This is great for Google but maybe not for us because it likely reduces site visits to Wikipedia and browsing/additional page exploration. Can we explore how to use AI to improve internal search so as not to lose visits?

Technical task simplification for technical and non-technical contributors

  • An important way of contributing to our movement is not just individuals but institutions that contribute at a larger scale. We have issues with supporting these large, at-scale contributions. We have a coding gap – not many people who can build and maintain tools, or even create smaller-scale automations. The human factor is super important, but I do see opportunities for LLMs to help non-technical and technical contributors develop better tools, software, and scripts.
  • Super interested in exploring this at hackathons & other venues with other volunteers.
  • Would love to see a collaboration between volunteers and developers to strengthen the community we already have using AI.
  • I’ve trained AI to make SPARQL queries on Wikidata. Was initially very bad but now much better, saves lots of time for queries.

  • Templates: with GPT-4 in Bing, if you put in a URL, it can generate a reference template as wiki markup.

  • Expecting WMF to create tools based on these use-cases
  • I think that there are very good possibilities to convert different things (such as category names) to Wikidata properties and values. It is good for generating SPARQL-queries from text.
  • Reconciliation for Wikidata items can be a really interesting use case for LLM
  • There's so much messy data on all the projects that is really hard right now to find. LLMs could be incredibly useful to solve this.
  • I tried to use ChatGPT text chat for converting Commons categories to Wikidata property-value pairs.
  • Experimenting with ways of writing database queries in regular language to create a better interface with the tools we have (see the sketch below).
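
A rough sketch of the plain-language-to-SPARQL idea mentioned in this list; the prompt, model name, and example question are assumptions, and a generated query should be reviewed before its results are trusted. Only the public Wikidata Query Service endpoint is taken as given.

```python
# Illustrative only: ask an LLM for a SPARQL query, then run it against Wikidata.
import requests
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
question = "List paintings by Frida Kahlo with their inception dates."  # example question

sparql = client.chat.completions.create(
    model="gpt-4",  # assumed model choice
    messages=[
        {"role": "system",
         "content": "Return only a SPARQL query for the Wikidata Query Service, with no prose."},
        {"role": "user", "content": question},
    ],
).choices[0].message.content

# Run the (human-reviewed) query against the public Wikidata endpoint.
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": sparql, "format": "json"},
    headers={"User-Agent": "llm-sparql-sketch/0.1 (example)"},
)
for row in resp.json()["results"]["bindings"]:
    print(row)
```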

Text-to-speech and speech-to-text

  • Audio generation and interpretation – working for the last couple of days with Whisper (an OpenAI-trained model that does speech-to-text). Loaded a Dutch podcast and was seeing English transcriptions of the audio – the model can both detect speech and translate (see the sketch after this list).
  • Opportunity for accessibility, capturing sources only available in audio (e.g. podcasts - right now hard to use those as references, but may be able to link to specific parts/text), and immediate translation could all be useful for our movement
  • Automated audio transcriptions of video and audio on Commons
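
A minimal sketch of the Whisper experiment described above, using the open-source openai-whisper package; the file name and model size are placeholders.

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")  # model size is a placeholder

# task="transcribe" keeps the original language;
# task="translate" produces an English translation, as with the Dutch podcast example.
result = model.transcribe("dutch_podcast.mp3", task="translate")

print(result["language"])  # detected source language
print(result["text"])      # English text of the audio
```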

Improved detection of harmful content

  • Can LLMs improve ORES accuracy? (See the ORES scoring sketch after this list.)
  • With undisclosed paid editing, organizations are trying to manipulate how the world thinks about some topic by submitting convincing but wrong text. There is a lot of overlap between identifying persuasive but wrong LLM outputs and paid editing (which is a long-term pain point). We don’t have ways to measure paid editing but suspect it’s increasing year over year. The rise of LLMs may give us a tool for doing better large-scale analysis of this problem.
  • We could massively expand ORES to look not just at individual contributions but patterns/texture of contributions overall to help us monitor and reduce large-scale violations of our terms of use like undisclosed paid editing.
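
As one possible starting point for comparing LLM-based classifiers against the current system, a small sketch that fetches baseline ORES scores for a single revision from the public ORES v3 API; the wiki and revision ID are placeholders.

```python
import requests

wiki = "enwiki"          # placeholder wiki
rev_id = "123456789"     # placeholder revision ID

resp = requests.get(
    f"https://ores.wikimedia.org/v3/scores/{wiki}/",
    params={"models": "damaging|goodfaith", "revids": rev_id},
    headers={"User-Agent": "ores-baseline-sketch/0.1 (example)"},
)
scores = resp.json()[wiki]["scores"][rev_id]
for model_name, result in scores.items():
    print(model_name, result.get("score", {}).get("prediction"))
```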

Creating/fostering an LLM movement in line with our movement’s values

  • Don’t see a viable open LLM movement – maybe something for a new WM sister project? Could we upload free LLMs to Commons?
  • Meta decided to release its own LLM and then immediately Stanford built a FOSS version of it (with a low cost of $600). Very soon might be able to have our own Wiki GPT that is ours, modeled to what we need. May not need to depend on these large corporations.
  • Lot of hype – we should make our voice heard, recognize that we are a provider of the data in these LLMs. While they can be wrong, WM projects are the place where knowledge that is true (whatever that means) can be used directly or as structured data to input into external sources. We want a world in which knowledge is created by humans.
  • One of the most important things about us having these conversations is to create different center/approach to ML than the more commercial options. Can draw on lessons learned from them, but we want a Wikipedia/Wikidata/Wikimedia perspective that matches our values and our own perspective
  • Translation: The Content Translation tool we already have is really amazing. Recently switched to NLLB-200 – an open-source model that makes it easier to translate and works on smaller languages. Think we should host that model and give it out to translate anything we want (see the NLLB sketch after this list).

  • Hosting our own LLMs: definitely an approach we should pursue. Lots of FOSS models are already being developed. BLOOM is one – we could open this up to the community for free.

  • In an ideal world, the Foundation would start internal projects to replicate ROME and RARR.
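
A minimal sketch of running the open NLLB-200 model locally with Hugging Face transformers, as an example of the kind of model the movement could host itself; the checkpoint and the English-to-Basque language codes are illustrative choices.

```python
# pip install transformers sentencepiece
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",  # a small public NLLB-200 checkpoint
    src_lang="eng_Latn",
    tgt_lang="eus_Latn",  # Basque (illustrative target)
)

print(translator("Energy is neither created nor destroyed.")[0]["translation_text"])
```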

Open questions

  • If/when there is community agreement to use AI to generate content, can we bring in language models to check the output? Is it possible for AIs to vet each other?
  • How does this relate to Abstract Wikipedia? (Answer: Abstract Wiki team working on a blog post to summarize this – stay tuned!)
  • Tested putting in a proposal for Wikidata and asked it to create a new proposal. This technology can mimic comments of people, can be used in a discussion – but should this be allowed? We need more guidelines on what we can/can’t do.
  • When do we slow/stop the use of AI because it has potential to fundamentally disrupt what’s going well on Wikipedia, while the benefit (getting incremental content) may be marginal?
  • We’ve talked about bias in LLM output, but LLMs can be tuned to produce content with a specific bias. Challenge of trying to detect those biases, potentially hundreds/thousands, to see if contributions are trying to shift articles or wikis in a particular direction. How do you do that? What are the signals we can use?
  • Does ChatGPT create a moment for regulatory bodies in the EU to put the brakes on things, or will companies lobby to leverage these technologies because of their huge potential?
  • There seems to be a bubbling debate about whether or not Wikipedia should accept AI-generated content, but I think the bigger problem is how does Wikipedia even identify such content because almost certainly someone will try to submit it at some point.
  • Is there any (big) open-source generic LLM (i.e. ChatGPT-style, able to translate text, summarize, and answer questions) being worked on that would use Wikimedia content?

Other comments/thoughts

  • As a movement, we’ve been using AI in many ways for a long time (machine translation, bots, etc.). We should think of new AI tools as more of a continuum.
  • Things are changing fast – technology has evolved tremendously in just the last 6 months.
  • There’s a split in the community over whether or not to use these tools to generate content. Some people see a lot of interesting things with generating/summarizing content, but some are very wary. As WMF, we could do things like build tools to make it easy to identify bot-generated content, but the worry is that chat-generated content has to be balanced with a lot of human review. It could overwhelm a wiki (especially smaller ones) with bad content.
  • Some longtime Wikimedians have a rigid vision of what the projects are and are not accepting of any new vision, which makes it hard to discuss this topic. For example, the conversation around whether to include structured data on Commons was very heated and long – can imagine similar things will happen with AI discussion.
  • Need to have more meetings – ideally in-person – to discuss this topic and to have good brainstorming around problems and potential solutions.
  • Should discuss this at upcoming Wikimedia Hackathon and Wikimania.
  • Wikipedia is about humans coming together to try to define the truth. These tools are unreliable and are a distraction from our actual mission. We should be careful about how fast we try to follow this trend just to avoid missing out on it. We should focus on people producing knowledge.
  • Would be very helpful if the Foundation asked LLM producers who train on CC-BY-SA content to acknowledge those sources and disclaim accuracy and fidelity until robust attribution and verification becomes available. (Bing and Bard try now but do not do a good job.)
  • English Wikipedia’s block of machine-generated translation needs a revisit. The guideline that says “a machine translation is worse than no translation” is about a decade old and has not been updated.
  • The Foundation should make a public statement in support of increasing the accuracy of attribution and verification systems such as RARR [ https://arxiv.org/abs/2210.08726 ]
  • We should look into open source AIs like Open Assistant. That would solve the problem of opacity.