
User:APaskulin (WMF)/Research/Writing for multilingual communication

From Meta, a Wikimedia project coordination wiki

Status: Developed between July 2022 and June 2023
Last updated: 2023-07-01 by APaskulin (WMF)

This page is a collection of notes and findings about best practices for writing for multilingual communication.

The document or site works when target users can find what they need, understand what they find, and act on it confidently. — Center For Plain Language, "Five Steps to Plain Language", centerforplainlanguage.org

Prototypes


Research questions

  • Does Wikimedia documentation follow best practices for writing English for understandability and translation?
  • Does improving these best practices improve the experience of translators and readers?
  • What guides and automated tools can help people write better English for understandability and translation?

Principles


Emphasize, celebrate, and support multilingual communication: This research seeks to improve our use of English to support better multilingual communication and to reduce inequality as a result of bias in favor of English and certain styles of English.

Scope


This research is initially focused on English written by the Wikimedia Foundation, but I hope that it will be widely applicable across the movement and across languages. This research is focused on readability and ease of translation, both translation done by humans and by machines. While vitally important, inclusive language isn't specifically addressed, although these tools and strategies can likely be adapted to apply to inclusive language as well.

Terms

plain language, Plain English
Plain language is a well-established concept in research, governments, and international organizations. Although most plain language resources don't call out translation as a key benefit, language that is plain is also easier to translate.
Global English, International English
I found a lack of consistency in the way these terms are used.[1] Some uses of these terms refer to a prescribed universal language, an idea which ignores the value of multilingual communication.
Basic English
This term refers primarily to a formalized, simplified English created by Charles Kay Ogden in the 1930s.[2]
internationalization, localization, translation
Internationalization refers to the work required to allow something to be localized, such as having translatable strings.
Localization is the process of fully adapting something for another locale, including translating the text and making other adjustments, such as changing currencies.
Translation is the specific process of converting text from one language to another.
machine translation
The use of software to translate text or speech from one language to another[3]

Background: Methodologies


Controlled or simplified English


There are several standards that enforce a simplified version of English through allowed words and structures. These controlled forms of English are created locally by organizations or individuals, by industries, or using a reduced dictionary. Some examples include Ogden's Basic English[4], the international aviation industry's Simplified Technical English[5], and Randall Munroe's Thing Explainer[6]. Although useful, these standards place too much emphasis on brevity at the expense of clarity. While it makes sense for aviation documentation to be rigidly controlled, I don't think it would work for the breadth and diversity of the Wikimedia movement.

Syntactic cues guidelines


In The Global English Style Guide, John R. Kohl makes a persuasive argument for what he calls syntactic cues guidelines. These guidelines focus on revising text at the sentence, clause, and phrase level. These guidelines, outlined in the book, are great tools, but they require a deep knowledge of grammar and sentence structure.

Readability and grade level scores


The popular Flesch–Kincaid tests give either a reading-ease score out of 100 or a grade level that roughly corresponds with US school grade levels.[7] However, some consider other tests, such as SMOG[8], to be more reliable.
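
As a rough illustration of how such scores are computed, the sketch below implements the published Flesch reading-ease and Flesch–Kincaid grade-level formulas in Python. The function names, the vowel-group syllable counter, and the regular-expression sentence splitter are assumptions made for this example, so the results will differ from the scores reported by established tools.

```python
import re


def count_syllables(word: str) -> int:
    """Naive estimate: count groups of consecutive vowels (an assumption, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_scores(text: str) -> dict:
    """Return the Flesch reading ease and Flesch-Kincaid grade level for English text."""
    # Simplified splitting: sentences end at ., !, or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher reading ease means easier text.
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Grade level approximates a US school grade.
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

Both formulas depend only on average sentence length and average syllables per word, which is why shortening sentences and replacing long words changes the score so directly.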

Narrative guidelines and training


The US government provides narrative guidelines on how to write plain language, supplemented by examples and trainings.[9] The before and after examples are particularly helpful.[10] Simple English Wikipedia also uses narrative guidelines to teach writers.[11]

Case studies


Before-and-after examples of plain language on plainlanguage.gov

Wordiness Made Spare
These examples are from US government regulations. They demonstrate how to use plain language with an authoritative tone. A few of the examples use bulleted list items to complete an introductory sentence fragment, which is an anti-pattern when writing for translation.
Ambiguous Wording Rewritten
These changes are too minor to be useful as examples. There's also an instance of using bulleted list items to complete an introductory sentence fragment.
Monthly Due Date
This is an interesting example of replacing a paragraph with a table. The phrases are fairly atomic, so it might work for translation.
Application Due Date
This example replaces passive voice with a pronoun that isn't defined in the context of the example.
Car Safety
This example replaces a paragraph with a graphic, which is a great, practical example of plain language.

Testing samples with Hemingway, Expresso, and Write Clearly


For this case study, I assessed four writing samples using three tools:

Hemingway
Closed-source, paid app with a free web version
Expresso
Free, open-source web tool
Write Clearly
Free, open-source, cross-browser bookmarklet

With each tool, I noted the grade level score and suggestions for improvement. With simpler samples, I revised the text and noted the new grade level score. In these cases, I was able to improve the score by 2-3 points by making the changes suggested by the tools. With more complex samples, I found that it was difficult to improve the readability of the text without a clear understanding of the subject matter.

Of the three tools, I found that Expresso was the most useful in improving the readability of the text.

  • Hemingway gives a limited set of metrics that overemphasize removing passive voice and adverbs. Long, complex sentences are labeled "hard to read" and "very hard to read" without providing a cause.
  • Write Clearly provides some good suggestions, including "shorten sentence" and "replace complex words". However, the interface and unusual implementation make the tool difficult to use.
  • Expresso provides options to highlight different sets of words based on their type or contribution to readability, making it easy to focus on improving one aspect of the text at a time. Groupings like pronouns, clustered nouns, and rare words allowed me to easily improve readability.

Within Expresso, here are the metrics I found most helpful for improving the text, in rough order of value:

readability grade
Noting this score before and after editing provides a tangible measure of improvements made.
rare words
This filter highlights words outside the 5000 most frequent words based on the Corpus of Contemporary American English.
Caught: anchored (and "anchor"), allocation, Wikimedia, outward, inward, disjointed, focused (not "focus"), radically, changed (not "change"), concurrently, reallocate (and "allocate"), haphazard
False positives: January, 2022
detached subjects
This filter highlights subjects that are more than 8 words apart from corresponding predicates. Although I'd expect this to appear rarely, it is a key improvement for translation.
entity substitutions
This filter highlights all non-possessive pronouns as well as vague words like "thing" and "stuff". Pronouns are prone to ambiguity and can be problematic for translation. In fact, Kohl makes several specific recommendations for using it, this, that, these, those, and which clearly (95). Weiss goes a step further and recommends replacing all pronouns with their equivalent nouns for ease of translation (69). Using this filter helps the reviewer ensure that all pronouns have clear referents and remove pronouns where unnecessary.
negations
Good filter to check for confusing negations and places where negations could be removed.
clustered nouns
This filter highlights noun clusters of three or more words. Great filter for reducing complexity and improving ease of translation.
extra long sentences
This filter highlights sentences with 40 or more words.
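
To make a few of these filters concrete, here is a minimal Python sketch of rare words, entity substitutions, negations, and extra long sentences. It is not Expresso's implementation: the function names, the tiny `COMMON_WORDS` set (a stand-in for a real list of the 5000 most frequent words), and the pronoun and negation lists are all assumptions for this example. Detached subjects and clustered nouns are left out because they need part-of-speech analysis.

```python
import re

# Hypothetical stand-in for a real frequency list (for example, the top 5000 words of a corpus).
COMMON_WORDS = {
    "the", "a", "an", "we", "you", "use", "this", "to", "and", "it",
    "is", "are", "of", "in", "words", "text",
}

# Non-possessive pronouns and vague words, loosely following the description above.
ENTITY_SUBSTITUTIONS = {
    "it", "this", "that", "these", "those", "which",
    "he", "she", "they", "them", "thing", "things", "stuff",
}

NEGATION_WORDS = {"no", "not", "never", "none", "nothing", "neither", "nor"}

LONG_SENTENCE_WORDS = 40  # threshold named in the text


def split_sentences(text):
    """Split after ., !, or ? followed by whitespace (a simplification)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text.lower())


def rare_words(text, common_words=COMMON_WORDS):
    """Return the distinct words that are missing from the frequency list, sorted."""
    return sorted({w for w in tokenize(text) if w not in common_words})


def entity_substitutions(text):
    """Return pronouns and vague words in the order they appear."""
    return [w for w in tokenize(text) if w in ENTITY_SUBSTITUTIONS]


def negations(text):
    """Return negation words and contractions ending in n't, in order."""
    return [w for w in tokenize(text) if w in NEGATION_WORDS or w.endswith("n't")]


def extra_long_sentences(text, limit=LONG_SENTENCE_WORDS):
    """Return sentences with at least `limit` words."""
    return [s for s in split_sentences(text) if len(tokenize(s)) >= limit]
```

Because the rare-words check compares exact word forms, inflected forms such as "focused" are flagged even when "focus" is in the list, which matches the behavior noted above.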

Conclusions


September 2022


Plain language is a well-established discipline backed by research, but some of its recommendations go against best practices for writing for translation. In order to write for multilingual communication, we need to go beyond plain language and add best practices for writing for translation.

In order to write using plain language, it's necessary to thoroughly understand the subject matter. There's an opportunity here to use plain language to improve our thinking, which will in turn further improve our writing.

Long lists of grammar rules are too difficult to understand and remember when editing. The most important and useful guidance on how to write plainly comes from examples. We'll need to experiment with how to present examples and how to encourage people to submit examples in a blameless environment.

As a next step in this research, I'd like to test Expresso, along with some guidance and a prescribed process, as a tool for improving text for multilingual communication.

Example editing process


1. Structure the text for readability

  • Organize the text into small paragraphs with 2-3 sentences each.
  • Use headings, lists, tables, images, and callouts to break up the paragraphs and make the page easy to scan.

2. Edit the text

Open Expresso, and note the grade level score. After each edit, reanalyze the text. When edits are complete, note the new grade level score.

  • Replace rare words with simpler alternatives.
  • Fix detached subjects.
  • Check that all pronouns used are clear. Remove pronouns where possible.
  • Break up any long sentences.
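
The structural check in step 1 can be roughly automated. The Python sketch below flags paragraphs with more than three sentences. The function name, the blank-line paragraph convention, and the regular-expression sentence splitter are assumptions for this example and are not part of Expresso or the other tools described here.

```python
import re

MAX_SENTENCES_PER_PARAGRAPH = 3  # upper end of the "2-3 sentences" guideline


def long_paragraphs(text, max_sentences=MAX_SENTENCES_PER_PARAGRAPH):
    """Return (paragraph_number, sentence_count) for each paragraph over the limit."""
    flagged = []
    # Assume paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Split after ., !, or ? followed by whitespace (a simplification).
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
        if len(sentences) > max_sentences:
            flagged.append((number, len(sentences)))
    return flagged
```

A check like this only points to paragraphs worth a second look; deciding where to split a paragraph still requires understanding the subject matter.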

April 2023


Use plain language as the central concept to teach writing clearly.

The concept of plain language is well established in governments, international organizations, and academic research, including the United States government, New Zealand government, International Organization for Standardization, World Wide Web Consortium, United Nations, and World Health Organization. The influence and reach of plain language make it the best terminology to use when training writers to write clearly. Writers will be able to search for plain language and find resources that help them reach the level of clarity and ease of understanding that we're aiming for, more so than if we used more general terminology around writing well. Additionally, by using plain language as the central concept, we can avoid wordy phrases like "writing for translation" and "writing for multilingual communication".

Although most plain language guidelines don't call out ease of translation as a benefit, writing that is plain is easier to translate. Plain language principles cover most of the principles of writing for translation, although there are additional principles that support writing for translation that are not part of typical plain language guidelines. Instead of separating writer-facing guidance into plain language principles and writing for translation principles, I've chosen to combine these two sets of principles under an umbrella of "plain language at Wikimedia", making the material more cohesive.

Define when to use plain language.

Another technique for separating writing plainly and writing well is to define when it is important to use plain language and when it is not important. To do this, I'd like to avoid discussing specific genres of writing (like technical writing) or tools, and instead focus on purpose: Use plain language when writing to explain or instruct. The goal of plain language is to get the right information, at the right time, to the right people, so they can take action. Asking writers to change their writing in specific contexts is easier than asking writers to use plain language everywhere.

Plain language is as important for writing in English for English speakers as it is for writing in English for translation.

When writing is confusing, it is not only difficult to translate; it is difficult to understand for everyone. There are real risks to using confusing writing styles when we write to explain or instruct. Confusing writing is gatekeeping; it excludes people who cannot (or don't have time to) decipher the writer's intent. Frustrated readers can react negatively to information they would otherwise agree with if presented differently, which can damage relationships and block collaboration. Writing clearly is even more important as we rely more on digital, asynchronous communication methods to connect global communities.

Additionally, even if a text is not scheduled for immediate translation, Wikimedia-related writing always exists in a multilingual context. Wikimedia readers likely speak more than one language, and the techniques that make writing easier to translate also make it easier for multilingual speakers and speakers with varying levels of proficiency in English to understand.

Confusing writing is a result of habit.

Even with the best intentions, communication is hard. There are a lot of factors that influence a writer's ability to convey information and a reader's ability to understand it. Furthermore, we turn to documentation specifically when things are complicated. As a result, it is no surprise that our efforts to explain and instruct often result in writing that is confusing and hard to understand. Why do writers write in confusing ways? Why do we use rare words instead of common ones, complicated sentence structures instead of simple ones?

The first factor is that people naturally mimic the language they're exposed to. When people read things that are confusing and filled with jargon, they are more likely to use that style in their own writing. The more confusing writing we consume, the more familiar it feels and the more naturally we fall back on phrases that we've heard before, even if these phrases are hard to understand.

The second factor is that confusing writing is easier to write than clear writing. By using ambiguous words, we avoid having to decide what we really mean to say. We unconsciously place the burden of deciphering our intentions on the reader. For example, it's much easier to just say that we're "connecting the dots" than to specify exactly what we mean by that.

Thirdly, in teaching English writing, we emphasize style, tone, and rhythm. This methodology teaches writers to vary the words they use and the cadence of their sentences. When we apply these concepts to documentation, the results make the text much harder to understand. For example, writers are taught that it sounds better to use three items in a list instead of two. To try to achieve this, writers will often add a third item to the list even if it doesn't make sense or repeats another list item in other words.

Teaching plain language by teaching grammar doesn't work.

Plain language guidelines often resort to lists of rules of allowed and disallowed grammatical structures. These rules are very difficult for writers to act on. These rules assume a level of expertise in English grammar that most writers don't have. Furthermore, these rules treat writers like machines, asking writers to iterate through a set of rules as they check the text for specific elements of grammar. As a result, these lists create a heavy cognitive load for writers. You can sometimes see the result of this cognitive burden in cases where writers actually produce more confusing writing as a result of being exposed to these types of guidelines. At their worst, grammar rules act like another form of gatekeeping, sidetracking efforts to make writing clearer into debates about what is grammatically correct.

Instead of teaching the minutiae of grammar, we need to re-frame the way we teach writing. We should reserve lists of grammar rules for machines (and professional writers), and teach better writing through changing the way people think about writing and, more concretely, through examples of clear and unclear writing. After all, grammar prescriptivism goes against the natural development of language. The rules for comma usage aren't an inherent property of English; those rules are merely a convention, established by authorities and enforced through education. In the same way that language naturally evolves based on the creativity of speakers, grammatical conventions can evolve away from gatekeeping, shame, and an overemphasis on consistency. We can refocus the way we teach writing on what matters in communication: empathy. If we succeed, we can stop defining good explanatory or instructive writing by grammatical correctness and start defining good explanatory or instructive writing by how easy it is to understand.

Omit guidance about how to structure documents for readability.

Although I originally planned to include structuring for readability in guidance for writers, I found that it wasn't necessary. In my research, I found that almost all the texts I looked at were well structured for readability, using short paragraphs, lists, and callouts to break up the text.

Examples

Instead of

You do not harm our technology infrastructure and follow the policies for that infrastructure.

Use

You do not harm our technology infrastructure, and you follow the policies for that infrastructure.

Repeating the word "you" makes it clear that you should follow the policies but you should not harm the infrastructure. You could also fix this by re-arranging the actions in the sentence, but repeating "you" makes the sentence easier to read.
Instead of

This code can be shared with other users.

Use

You can share this code with other users.

Clearly stating the actor makes the sentence easier to understand.
Instead of

Get feedback from all members of the team, users, leadership and other relevant stakeholders to ensure the goals are clearly understood by all.

Use

To ensure the goals are clearly understood by all stakeholders, get feedback from users, leadership, all members of the team, and other relevant stakeholders.

Splitting the sentence into two parts makes it easier to read. Moving "all members of the team" later in the list makes it clear that we're not talking about all users or all members of the leadership team.
Instead of

This is a double-edged guide: to writing content with an eye to making it translatable, and to translating Wikimedia content effectively.

Use

This is a guide with two purposes: to help write content that is easier to translate, and to help translate Wikimedia content effectively.

Removing idioms makes the sentence easier to understand.
Instead of

Architectural artifacts are created in order to describe a system, solution, or state of the enterprise.

Use

Architectural artifacts describe a system, solution, or state of the enterprise.

Removing unnecessary words makes the sentence simpler to translate and easier to understand.
Instead of

Feedback (positive and constructive) as a reaction to the goals should be recorded and addressed before moving to the next phase.

Use

Before moving to the next phase, document and respond to constructive feedback that you received on the goals.

Splitting up the sentence and focusing on the actions makes the sentence easier to understand.
Instead of

If the feedback was incorporated that should be communicated to the person(s) that gave the feedback, if it was not, the decision maker should equally share why the constructive feedback was not incorporated into the goals.

Use

Respond to all the constructive feedback that you receive. In your response, indicate whether you incorporated the feedback and why.

Breaking up the sentence into shorter blocks that highlight the actions makes the sentences easier to understand.
Instead of

There is no hard and fast limit on API requests, but be considerate and try not to take a site down. Most system administrators reserve the right to unceremoniously block you if you do endanger the stability of their site.

Use

There is no official limit on API requests, but administrators may block your client if it impacts the stability of a site.

Removing extra words and unusual words makes the sentence simpler and easier to understand.
Instead of

As used throughout the rest of the terms our services consist of: The websites we host, technological infrastructure that we maintain, and any technical spaces that we host for the maintenance and improvement of our projects.

Use

As used throughout the rest of the terms, our services consist of any technical spaces that we host for the maintenance and improvement of our projects, including websites and technological infrastructure.

Breaking up the sentence along the main points makes it easier to understand.
Instead of

However, we act only as a hosting service provider, maintaining the infrastructure and organizational framework that allows our users to build the Wikimedia Projects by contributing and editing content themselves, and to reuse that content, including specialized technological infrastructure that enables users to programmatically interact with and re-use content on Wikimedia Projects ("APIs"), and mobile applications.

Use

However, we act only as a hosting service provider. We maintain the infrastructure and the organizational framework that allow our users to build the Wikimedia Projects by contributing and editing content themselves.

The infrastructure we maintain includes mobile applications and programmatic interfaces ("APIs"). APIs are specialized technological infrastructure that enables users to programmatically interact with and re-use content on Wikimedia Projects.

Breaking up the sentence into smaller parts and grouping related ideas together makes the sentence easier to understand.
Instead of

The Revisions Committee appreciates your time in reviewing and commenting on these changes. They will be reading suggestions and incorporating changes as part of the review process.

Use

The Revisions Committee appreciates your time in reviewing and commenting on these changes. The Revisions Committee will be reading suggestions and incorporating changes as part of the review process.

Repeating "The Revisions Committee" instead of using "they" makes the text easier to translate.

Word list

Instead of | Use | Explanation
"set in stone" | permanent | idiom
"here" in link text | Use the title of the page as the link text. | unclear
"and/or" | Rephrase. | confusing
"detail" as a verb | describe or explain | confusing
"connect the dots" | Rephrase. | idiom
"big picture" | Rephrase. | figurative language
"zoom out" (figuratively) | Rephrase. | figurative language
"build on" (figuratively) | Rephrase. | figurative language

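A word list like the one above can also be checked automatically. The Python sketch below searches a text for the phrases in the list and returns the suggested replacement for each match. The function name and matching rules are assumptions for this example: it matches exact phrases only, so it misses inflected forms such as "connecting the dots", and it cannot tell figurative from literal uses of "zoom out" or "build on". The entries for "here" in link text and "detail" as a verb are left out because they depend on context.

```python
import re

# Phrases and suggestions copied from the word list above.
WORD_LIST = {
    "set in stone": "permanent",
    "and/or": "Rephrase.",
    "connect the dots": "Rephrase.",
    "big picture": "Rephrase.",
    "zoom out": "Rephrase.",  # may be a false positive when used literally
    "build on": "Rephrase.",  # may be a false positive when used literally
}


def find_flagged_terms(text, word_list=WORD_LIST):
    """Return (phrase, suggestion) pairs for each listed phrase found in the text."""
    found = []
    for phrase, suggestion in word_list.items():
        # Match the phrase only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append((phrase, suggestion))
    return found
```

The results are prompts for a human reviewer, not automatic replacements, because most entries ask the writer to rephrase.
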
Content audit


A non-exhaustive list of pages related to writing for multilingual communication. This is an experimental type of content audit that looks for topic coverage across collections. Pages in this section are not linked to avoid spamming WhatLinksHere.

Findings

  • Content related to writing best practices on wiki can be split into two groups:
    • general writing advice (either for all languages or for a specific language)
    • writing advice within a specific context (such as on English Wiktionary)
  • Only two pages provide in-page advice for "writing for translation":
    • Writing clearly on Meta-Wiki
    • Documentation/Style guide on mediawiki.org
  • There are several pages that link to resources that provide writing advice, both internally and externally.

Ideas

  • Use Writing clearly as the main page for information about writing for multilingual communication.
    • Try to increase traffic to Writing clearly by linking to it from places where people are likely to benefit from it.
      • Tech/News/Manual
      • Multilingual communication
      • Multilingualism
      • Style guides
      • Documentation/Style guide
      • Documentation/Technical style guides and templates
      • Meta:About (Add a list of all Category:Meta-Wiki guidelines)
      • Project:Help on mediawiki.org (Add a list of policies and guidelines)
      • (There appears to be no project-level information on Wikitech, so I'm skipping it for now.)
    • Try to centralize Writing clearly as a place for external links to writing advice (including tools), so we don't have to maintain these lists in several places
      • Tech/News/Manual
      • Translatability
    • Be clear about what advice applies across languages and which applies only to some languages, but don't try to document the grammar of every language.
    • Add a corpus of English examples at Writing clearly/Examples
    • Add a word list of English terms at Writing clearly/Word list
  • Use Language guides on Meta as the canonical list for writing advice in a specific context.
    • Update this page to include relevant pages found during this audit.
      • [done] Inclusive language on mediawiki.org
      • [done] Documentation/Style guide on mediawiki.org
      • [too specific?] Wikipedia:WikiProject Military history/Academy/Copy-editing essentials on English Wikipedia
      • [not a policy] Technical writing/Style on English Wikiversity
      • [done] Wikinews:Style guide on English Wikinews
      • [done] Help:Writing definitions on English Wiktionary
    • Find instances where pages link to a specific one of these and link to the list instead.
  • As a stretch goal, update Multilingualism on Meta.

On Meta-Wiki

Tech/News/Manual
Short list of tools and resources in Language section
Does not link to Writing clearly
No translations
Last updated 2023
~100 pageviews monthly
Multilingual communication
Mostly outdated information about multilingualism specifically on Wikipedia
No translations
Last updated 2012
~50 pageviews monthly
Multilingualism
Marked as needing an update
Information about multilingual communication specifically on Meta-Wiki
Links to resources for translators
Has translations
Last updated 2023
~40 pageviews monthly
Writing clearly
Marked as a Meta-Wiki guideline
Covers general writing advice and information about translating content
No translations
Last updated 2020
~30 pageviews monthly
Translatability
Short, likely incomplete list of tips for creating translatable pages
Links to Writing clearly
Has translations
Last updated 2019
~10 pageviews monthly
Language guides
List of writing guides across languages
Does not link to Writing clearly or the MediaWiki style guide
No translations
Last updated 2022
~10 pageviews monthly
Style guides
Marked as a disambiguation page
Incomplete content
Links to Language guides
No translations
Last updated 2017
~10 pageviews monthly
Meta:Style guide
Stub
No translations
Last updated 2009
~5 pageviews monthly

Obsolete and historical

Style guide
Wikimedia style guide
Copyediting

Categories

Multilingualism
Communication
Translation

On the fringe

Thoughts on Multilingualism
Short description of ideas for improving multilingualism on English Wikipedia
Linguistic democracy in a multilingual project
Interesting example of multilingual communication

On mediawiki.org

Documentation/Style guide
MediaWiki style guide
Has translations
Does not link to Writing clearly
Last updated 2022
~280 pageviews monthly
Inclusive language
Marked as a MediaWiki development guideline
Non-inclusive terms to avoid
Has translations
Does not link to Writing clearly
Last updated 2023
~140 pageviews monthly
Help:Extension:Translate/Translation best practices
Information about how to translate content
Has translations
Does not link to Writing clearly
Last updated 2015
~70 pageviews monthly
Documentation/Technical style guides and templates
Landing page for writing guides in a technical context
No translations
Last updated 2023
Does not link to Writing clearly
~30 pageviews monthly
Documentation/Toolkit/Style review
Short checklist for improving writing style
No translations
Last updated 2022
~5 pageviews monthly

On the projects

English Wikipedia
Wikipedia:Manual of Style
Wikipedia:WikiProject Military history/Academy/Copy-editing essentials
Translatewiki
Localisation guidelines
FAQ
English Wikiversity
Technical writing/Style
English Wikinews
Wikinews:Style guide
English Wiktionary
Help:Writing definitions

User pages

Meta-Wiki
User:Tom Morris/WMFers Say The Darndest Things
User:The Land/Why do They always do It wrong
English Wikipedia
User:Tony1/How to improve your writing
User:Tony1/How to use hyphens and dashes
User:Tony1/Build your linking skills
User:Tony1/Spot the ambiguity
User:Tony1/Advanced editing exercises
User:Tony1/Redundancy exercises: removing fluff from your writing

References


Notes from Wikimedia Foundation quarterly learning sessions on translation

  • Contacts: Runa B, Mayur P
  • 2022/2023 Wikimedia Foundation annual plan objective: "Strengthen and streamline shared translation / interpretation services across the organization"[12]
  • June 2022[13]
    • "The massive multilingualism of the movement is a great strength that is often overlooked."
    • This research fits under the "Tools / Technical Side"
    • Relevant comments
      • We have been relying more on machine translation since translation is tedious work for humans.
      • We should measure the effectiveness of translation. Did people actually read and understand something that was translated?
      • In some cases, translations can have high stakes, and errors can have serious consequences.
      • We should use simple language, shorter sentences, readability metrics, and the information pyramid[14] to help reduce the amount of content that needs translation.
      • We've created a database of terms[15] to help translators and reduce the "amount of failure we get from translation agencies".
      • Style guides are helpful for writing for translation.
      • We should measure how our translations are perceived by communities.
  • September 2022
    • 80% of Foundation staff use translation services more than once during a quarter, and 65% need them almost every week.
    • More than 40% of translations are related to announcements and wiki pages.
    • 82% of the content is for affiliates and volunteers.
    • More than 50% use between 10 and 20 languages.

WCAG 2


Center for Plain Language

  • Five Steps to Plain Language
    • "The document or site works when target users can find what they need, understand what they find, and act on it confidently."
    • "Leave out details that don’t help or may distract readers, even if they are interesting."

Essentials of Plain Language, Katherine Spivey

  • Recorded talk on YouTube
    • Plain language improves transparency.
    • Rough guidelines: Paragraphs should be no more than 7 lines; sentences should be no longer than 20 words.

Citations
