
Research:How LLMs impact knowledge production processes

Created: December 2024
Contact / Collaborators: Soobin Cho (no affiliation)
Duration: December 2024 – June 2025

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


The advent and increased accessibility of large language models (LLMs) have drastically transformed the landscape of content generation. The remarkable capabilities of LLMs introduce novel opportunities to enhance productivity, improve access to information, and lower barriers to participation. However, LLMs also pose risks, such as introducing bias, hallucinating content that misleads end-users, and fostering over-reliance on AI-generated content.

In response to LLMs’ growing popularity, knowledge-building platforms have adopted cautious approaches toward their use. For example, Wikipedia strongly discourages using LLMs to generate content in a non-transparent way [1]. While Wikipedia regulates LLM usage through policy, community members nonetheless continue to incorporate LLMs into their contribution processes. For example, AI detectors flagged more than 5% of recently created English Wikipedia articles as likely AI-generated [2], and members of WikiProject AI Cleanup have increasingly tagged pages with AI-generated-content templates [3].

However, a gap remains in understanding the challenges and impacts of incorporating LLMs into knowledge production workflows, and this gap motivates our work. Given the increased adoption of LLMs, it is critical to investigate how knowledge contributors use LLMs in practice and how they perceive the impacts of those practices. We ask the following research questions: (1) How do knowledge contributors make sense of and incorporate LLMs into their knowledge production workflows? (2) What do contributors perceive as the long-term impacts of LLM adoption on knowledge production? (3) How do knowledge contributors navigate and mitigate the negative impacts or risks of LLMs? Through semi-structured interviews with Wikipedia editors who have interacted with and adopted LLMs in their editing processes, we hope to make the following contributions:

  • Crystallizing the knowledge contribution workflow and providing empirical evidence on how LLMs are used across its stages;
  • Surfacing the tensions between community expectations and actual editorial practices concerning LLM usage;
  • Discussing the broader implications of LLMs and offering design opportunities centered on knowledge contributors;
  • Revealing the design needs for AI tools that better align with collaborative tasks.


Methods


Participants will first complete a pre-study survey to determine their experience with LLMs in relation to their Wikipedia editing practices. Based on their responses, researchers will select participants for a semi-structured interview lasting approximately 45-60 minutes. Following the interview, participants will have the option to receive updates about the manuscript and future research. All participation will be anonymous.

Interviews will continue until thematic saturation is reached. Researchers will then collaboratively analyze the data using open coding, following the principles of thematic analysis. The initial codes will be clustered into broader themes to inform the findings, which will be presented in a manuscript.


Call for Participation


We call for participation in this study.

If you have used LLMs (e.g., GPT, Llama, Claude) when contributing to Wikipedia (e.g., editing Wikipedia articles with LLMs or using LLMs when interacting with other contributors), we’d love for you to join the study! You will take part in a 45–60 minute interview, talking about and reflecting on your experience with Wikipedia and your perception and usage of LLMs in Wikipedia. Your valuable input will not only help us understand practical ways to incorporate LLMs into the knowledge production process, but also help us develop guardrails around these practices. All participation will be anonymous.

To learn more and sign up, please visit https://umn.qualtrics.com/jfe/form/SV_bqIjhNRg9Zqsuvs.

Timeline

  • Beginning of December: Start recruitment & conduct interviews
  • Beginning of March: Finish interviews (Goal: 10-11 participants)
  • End of April: Finish data analysis

Policy, Ethics and Human Subjects Research


This study was approved by the Institutional Review Board (IRB) at the University of Minnesota on November 25, 2024, under STUDY00023793.


References

  1. Wikipedia:Large language model policy. 2024. https://en.wikipedia.org/wiki/Wikipedia:Large_language_model_policy
  2. Brooks, C., Eggert, S., & Peskoff, D. (2024). The Rise of AI-Generated Content in Wikipedia. arXiv preprint arXiv:2410.08044.
  3. Wikipedia:WikiProject AI Cleanup. 2024. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_AI_Cleanup