Wikimedia Foundation Annual Plan/2024-2025/Product & Technology OKRs

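This document is the first part of the 2024-25 annual planning process for the Wikimedia Foundation's Product & Technology departments. It presents the departments' draft "objectives and key results" (OKRs) and continues the work portfolios (called "buckets") that were started last year.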

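Last November, I spoke with you all about what I believe to be the most pressing question facing the Wikimedia movement: how do we ensure that Wikipedia and all Wikimedia projects are multigenerational? I want to thank everyone who took the time to consider that question seriously and respond to me directly. Now that I have read through your responses carefully, I would like to share what I have learned.

First, there is no single reason why volunteers contribute. To nurture multiple generations of volunteers, we need to understand more deeply the many reasons why people give their time to our projects. Next, we need to focus on what sets us apart: our ability to provide trustworthy content while disinformation and misinformation proliferate across the internet and on platforms competing for the attention of new generations. This includes making sure we fulfill our mission to assemble and deliver the sum of all human knowledge to the world by covering missing information whose absence may be rooted in inequity, discrimination, or bias. Our content also needs to work well and stay vital in an internet that is changing rapidly, driven by artificial intelligence and rich experiences. Lastly, to fund this work for the long term, we need to find sustainable ways of funding our movement by building a shared strategy across our products and revenue.

These ideas will be reflected in the Wikimedia Foundation's 2024-2025 annual plan, and today I am sharing the first part of it with you in the form of draft objectives for our product and technology work. As last year, our annual plan as a whole is centered on the technology needs of our audiences and platforms, and we would like your feedback on whether we are focusing on the right problems. These objectives build on ideas about our product and technology strategy for the year ahead that we have heard from community members over the last several months through Talking: 2024, on mailing lists and talk pages, and at community events. You can view the full list of draft objectives below.

An "objective" is a high-level direction that will shape the product and technology projects the Foundation takes on in the next fiscal year. Objectives are intentionally broad. They represent the direction of our strategy and, importantly, which challenges we propose to prioritize in the coming year among the many possible focus areas. We are sharing them now so that community members can help shape our early-stage thinking before budgets and measurable targets are committed for the year.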

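Feedback

One area where we would particularly like feedback is the body of work we have named "Wiki Experiences". "Wiki Experiences" is about how we efficiently deliver, improve, and innovate on the ways people directly use the wikis, whether as contributors, consumers, or donors. This includes work that supports our core technology and capabilities, so that we can improve the experience of volunteer editors (in particular, editors with extended rights) through better features and tools, translation services, and platform upgrades.

Here are some reflections from our recent planning discussions, along with questions to help all of you think about how we should refine our ideas:

Volunteering on the Wikimedia projects should feel rewarding. We also think that the experience of online collaboration should be a major reason why volunteers keep coming back. What does it take for volunteers to find unpaid editing rewarding and to work together to build trustworthy content?
The trustworthiness of our content is part of Wikimedia's unique contribution to the world, and it is what keeps people coming to our platform and using our content. What can we build that will help grow trustworthy content more quickly while staying within the quality guardrails set by the communities on each project?
To stay relevant and compete with other large online platforms, Wikimedia needs a new generation of consumers to feel connected to our content. How can we make our content easier to discover and interact with for readers and donors?
In an age where online abuse thrives, we need to make sure our communities, our platform, and our serving systems are protected. At the same time, we face evolving compliance obligations as policymakers around the world look to shape privacy, identity, and information sharing online. What improvements to our abuse-fighting capabilities will help us address these challenges?
MediaWiki, the software platform and interfaces that allow Wikipedia to function, needs ongoing support so that it can provide creation, moderation, storage, and discovery of open, multilingual content at scale. What decisions and platform improvements can we make this year to ensure that MediaWiki is sustainable?

Discussion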

–– Selena Deckelmann

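Explanation

At this time, we have published the highest level of the plan: the "objectives".

The next level, the "key results" (KRs) for each finalized objective, was published on this page in March.

The "hypotheses" underlying each KR are shown below. They will be updated on the relevant project or team wiki pages throughout the year, whenever lessons are learned.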

Wiki Experiences (WE) objectives
Objective	Objective area	Objective text	Objective context	Owner
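WE1 Discuss	Contributor experience	Both experienced and new contributors rally together online to build a trustworthy encyclopedia, with more ease and less frustration.	For Wikipedia to remain vibrant in the years to come, we must do work that supports multiple generations of volunteers and makes contributing something people want to do. Different generations of volunteers need different investments: experienced contributors need their powerful workflows streamlined and repaired, while newcomers need new ways to edit that make sense to them. And to do their most impactful work, all contributors, whatever their generation, need to be able to connect and collaborate with each other. Through this objective we will improve critical workflows for experienced contributors, lower the barriers to constructive contributions for newcomers, and invest in ways for volunteers with common interests to find and communicate with each other.	Marshall Miller
WE2 Discuss	Encyclopedic content	Communities are supported to close knowledge gaps effectively through tools and support systems that are easier to access, adapt, and improve, ensuring increased growth in trustworthy encyclopedic content.	Encyclopedic content, primarily on Wikipedia, can be increased and improved through continuous engagement and innovation. The tools and resources (both technical and non-technical) that contributors use for their needs can be made more discoverable and more reliable. WMF should support these tools better, through feature improvements achievable in short cycles. In view of recent trends in AI-assisted content generation and changing user behavior, we will also explore groundwork for substantial changes (e.g. Wikifunctions) that can support growth at scale in content creation and reuse. Mechanisms for identifying content gaps should be easier to discover and act on. Resources that support the growth of encyclopedic content, including content on sister projects, projects such as the Wikipedia Library, and campaigns, can be better integrated with contribution workflows. At the same time, to ensure continued trust in the process, the methods used for growth should have guardrails against growing threats and stay true to the basic tenets of encyclopedic content as recognized across the Wikimedia projects. Audience: editors, translators	Runa Bhattacharjee
WE3 Discuss	Consumer experience (Reading & Media)	A new generation of consumers arrives at Wikipedia and finds a preferred destination for discovering, engaging with, and building a lasting connection with encyclopedic content.	Goal: retain existing and new generations of consumers and donors. Increase our relevance to existing and new generations of consumers by making our content easier to discover and interact with. Work across platforms to adapt our experiences and existing content so that encyclopedic content can be explored and curated by and for a new generation of consumers and donors.	Olga Vasileva
WE4 Discuss	Trust & Safety	Improve our infrastructure, tools, and processes so that we are well equipped to protect the communities, the platform, and our serving systems from different kinds of scaled and directed abuse while maintaining compliance with an evolving regulatory environment.	Several facets of our abuse-fighting capabilities need an upgrade. IP-based abuse mitigation is becoming less effective, several admin tools need efficiency improvements, and we need a unified strategy for combating scaled abuse that uses the various signals and mitigation mechanisms (captchas, blocks, etc.) in concert. Over this year, we will begin working on the largest problems in this space. In addition, this investment in abuse protection has to be balanced by an investment in understanding and improving community health, several aspects of which appear in various regulatory requirements.	Suman Cherukuwada
WE5 Discuss	Knowledge Platform I (Platform evolution)	Evolve the MediaWiki platform and its interfaces to better meet Wikipedia's core needs.	MediaWiki was built to enable the creation, moderation, storage, discovery, and consumption of open, multilingual content at scale. In this second year of Knowledge Platform, we will take a close look at the system and work on platform improvements that will effectively support the core needs of the Wikimedia projects over the next decade, starting with Wikipedia. This includes continuing the work to define our knowledge production platform, strengthening the sustainability of the platform, focusing on the extensions/hooks system to clarify and streamline feature development, and continuing to invest in knowledge sharing so that people can contribute to MediaWiki.	Birgit Müller
WE6 Discuss	Knowledge Platform II (Developer Services)	Technical staff and volunteer developers have the tools they need to support the Wikimedia projects effectively.	We will continue the work we started to improve (and scale) development, testing, and deployment workflows in Wikimedia production, and widen its definition to include services for tool developers. We also aim to improve our ability to answer frequently asked questions about developer/engineering workflows and audiences, and to make relevant data accessible to enable informed decision making. Part of this work is to examine current practices (or the lack of them) where they present a challenge for our ecosystem.	Birgit Müller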

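Signals & Data Services (SDS) objectives
Objective	Objective area	Objective text	Objective context	Owner
SDS1 Discuss	Shared insights	Our decisions about how to support the Wikimedia mission and movement are informed by high-level metrics and insights.	To build technology, support volunteers, and advocate for policies that protect and advance access to knowledge effectively and efficiently, we need to understand the Wikimedia ecosystem and align on what success looks like. This means tracking a common set of metrics that are reliable, understandable, and available in a timely manner. It also means surfacing research and insights that help us understand the whys and hows behind our measurements.	Kate Zimmerman
SDS2 Discuss	Experimentation platform	Product managers can quickly, easily, and confidently evaluate the impact of product features.	To enable and accelerate data-informed decisions about product feature development, product managers need an experimentation platform in which they can define features, select treatment audiences of users, and see measurements of impact. Shortening the time from launch to analysis is critical, because a shorter learning timeline accelerates experimentation and, ultimately, innovation. Manual tasks and bespoke approaches to measurement have been identified as barriers to speed. The ideal scenario is that product managers can get from experiment launch to discovery on their own, with little or no manual intervention from engineers and analysts.	Tajh Taylor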

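Future Audiences (FA) objectives
Objective	Objective area	Objective text	Objective context	Owner
FA1 Discuss	Test hypotheses	Provide recommendations on strategic investments for the Wikimedia Foundation to pursue, based on insights from experiments that sharpen our understanding of how knowledge is shared and consumed online, and so help our movement serve new audiences in a changing internet.	Because technology and online user behavior keep changing (e.g. a growing preference for getting information through social apps, the popularity of short edutainment videos, the rise of generative AI), the Wikimedia movement faces challenges in attracting and retaining readers and contributors. These changes also bring opportunities to serve new audiences by creating and delivering information in new ways. However, we as a movement do not have a clear, data-informed picture of the benefits and tradeoffs of the different strategies we could pursue to overcome the challenges or seize the new opportunities. For example, should we invest in large new features such as chatbots or social video on our own platform? Bring Wikimedia knowledge and pathways to contribution to popular third-party platforms? Something else? To ensure that Wikimedia becomes a multigenerational project, we will test hypotheses so that we can better understand and recommend promising strategies for the Wikimedia Foundation and the Wikimedia movement to pursue in order to attract and retain future audiences.	Maryana Pinchuk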

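Product and Engineering Support (PES) objectives
Objective	Objective area	Objective text	Objective context	Owner
PES1 Discuss	Efficiency of operations	Make the Foundation's work faster, cheaper, and more impactful.	Staff do a great deal of work every day to make our operations faster, cheaper, and more impactful. This objective highlights specific initiatives that both a) make substantial gains towards being faster, cheaper, or more impactful, and b) require coordinated effort and changes to formal and informal practices at the Foundation. Essentially, the KRs included in this objective are the hardest and best improvements we can make this year to the operational efficiency of the work that touches our products and technology.	Amanda Bittaker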

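Draft key results

The "key results" (KRs) for each finalized objective are now available. They correspond to the objectives listed above on this page.

The "hypotheses" underlying each KR are listed below. They will be updated throughout the year, on this page and on the wiki pages of the relevant project or team, as lessons are learned.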

Wiki Experiences (WE) key results
Key result short name	Key result text	Key result context	Owner
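WE1.1 Discuss	Develop or improve one workflow that helps contributors with common interests to connect with each other and contribute together.	We think that community spaces and interactions on the wikis make people happier and more productive as contributors. Community spaces also help onboard and mentor newcomers, model best contribution practices, and address knowledge gaps. However, the existing resources, tools, and spaces that support human connection on the wikis are subpar and do not meet the challenges and needs of the majority of today's editors. Meanwhile, the work of the Campaigns team has shown that many organizers are eager to adopt and experiment with new tools that offer structured workflows for their community work. For these reasons, we want to focus on encouraging and promoting a sense of belonging among contributors on the wikis.	Ilana Fried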
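WE1.2 Discuss	Constructive activation: widespread deployment of interventions shown by controlled experiments to cause a relative year-over-year increase of 10% on mobile web and 25% on iOS in the share of newcomers who publish at least one constructive edit in the main namespace on a mobile device. Note: this KR will be measured on a per-platform basis.	The current full-page editing experience requires too much context, patience, and trial and error for many newcomers to contribute constructively. To support a new generation of volunteers, we will increase the number and availability of smaller, structured, and more task-specific editing workflows (e.g. Edit Check and Structured Tasks). Note: baselines will be established towards the end of Q4 of the current fiscal year, after which the KR target percentages will be set.	Peter Pelberg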
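WE1.3 Discuss	Increase user satisfaction with 4 moderation products by 5 percentage points each.	Editors with extended rights use a wide range of existing features, extensions, tools, and scripts to perform moderation tasks on the Wikimedia projects. This year we want to focus on improving this tooling rather than building new functionality in this space. We aim to touch a number of products over the course of the year and to make impactful improvements to each, and in doing so to improve the overall experience of moderating content. We will define baselines for the common moderator tools that this workstream may target, so that we can determine the increase in satisfaction per tool. The Community Wishlist will contribute substantially to deciding the priorities for this KR.	Sam Walton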
WE1.4 議論	Implement at least 2 interventions to diversify the user base of the CampaignEvents extension, with the goal of extension tools being used by 3 new communities or activity types by the end of FY24/25	The CampaignEvents extension provides tools to manage and promote events on the wikis, so that people can more easily connect and collaborate together. We want more people to be able to use the tools, so that more people can also organize/participate in events or find new ways to connect with others. To do this, we want to generalize some of our existing tools (such as Event Registration and the Collaboration List) so that they can be used in different ways on the wikis and can be customizable to different people's needs. We also want to release the extension to more wikis, so that more people can use its tools with the goal of fostering greater community and collaboration.	Ilana Fried
WE1.5 議論	Create a strategy for the Contributors' experience by the end of Q3, including metrics and goals, to guide our work until 2030.	This KR reflects the work we're completing to create a long-term strategy for the contributor space as a whole, including the following teams (as of Jan 2025): Editing, Growth, Campaigns, and Moderator Tools. With our strategy we're aiming to provide more clarity for contributors over the next 5 years in order to fuel volunteer growth and create a more meaningful contributor experience.	Sonja Perry
WE1.6 議論	200 users favorite 5+ templates by the end of Q4	This KR reflects the work we're completing to create a long-term strategy for the contributor space as a whole, including the following teams (as of Jan 2025): Editing, Growth, Campaigns, and Moderator Tools. With our strategy we're aiming to provide more clarity for contributors over the next 5 years in order to fuel volunteer growth and create a more meaningful contributor experience.	Jack Wheeler
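WE2.1 Discuss	By the end of Q2, provide organizers, contributors, and institutions with three accessible and relevant publication points on specific tools, insights, and ways of organizing, so that they can work on increasing the coverage of quality content in key topic areas, i.e. Gender (women's health, women's biographies) and Geography (biodiversity).	This KR is about improving topic coverage in order to reduce existing knowledge gaps. We have established that communities benefit from effective tools paired with campaigns aimed at increasing the quality of content in our projects. This year we want to focus on improving existing tools and experimenting with new ways of prioritizing the key topic areas that address knowledge gaps.	Purity Waigi & Fiona Romeo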
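WE2.2 Discuss	By the end of Q2, implement and test two recommendations (one social, one technical) to support language onboarding for small language communities, with an evaluation that analyzes community feedback.	Wikipedia currently exists in approximately 300 language versions, yet many languages spoken by millions of people have no Wikipedia or any other wiki of their own. The Wikimedia Incubator is where wikis in potential new language versions of Wikimedia projects can be arranged, written, tested, and proven worthy of being hosted by the Wikimedia Foundation. When the Incubator was launched in 2006, it was assumed that its users would already know how to edit a wiki. This stands in the way of our vision that every single human being can freely share in the sum of all knowledge. The problem is compounded by the expectation that this process will be carried out by the people who have spent the least time in our movement and have the least experience. Editing on Wikimedia wikis has improved significantly since then, but because of technical limitations the Incubator has not received these updates. Currently it takes several weeks for a wiki to graduate from the Incubator, and only around 12 new wikis are created each year, which makes this a significant bottleneck. Existing research and materials reveal technical challenges in every phase of language onboarding: adding new languages to the Incubator, the complexity of developing and reviewing content, and the slow process of creating the wiki site when a language graduates from the Incubator. Each phase is complex, manual, and slow, which shows the need for improvement. Addressing this problem will make it quicker and easier to create wikis in new languages and will allow more people to share knowledge. Various stakeholders and existing research and resources have highlighted and proposed recommendations, both social and technical. This key result proposes to test two recommendations, one social and one technical, and to evaluate the feedback received from the community.	Satdeep Gill & Mary Munyoki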
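WE2.3 Discuss	By the end of Q2, 2 new features guide contributors to add source materials that comply with project guidelines, and 3-5 partners have contributed source material that addresses language and geography gaps.	To grow access to the quality source material that is needed to close strategic content gaps, we will: partner with the Biodiversity Heritage Library, AfLIA, and the Wikisource Loves Manuscripts learning network; support the acquisition and retention of content partners through more accessible reuse metrics; and guide contributors to add images and references that comply with project guidelines and increase trust in content, for example by flagging potential issues during upload or addition.	Fiona Romeo & Alexandra Ugolnikova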
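WE2.4 Discuss	By the end of Q2, enable Wikifunctions calls on tesi2wiki to provide a scalable way to seed new content.	To close knowledge gaps effectively, we need to improve the workflows that support the growth of quality content and make them scalable, especially in smaller language communities.	Amy Tsay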
WE2.5 議論	By the end of Q4, support organizers, contributors, and institutions to increase the coverage of quality content in key topic areas i.e. Gender (women's health, women's biographies), and Geography (biodiversity) by 138 articles through experiments.	A direct followup to WE2.1, this KR is about improving topic coverage towards reducing existing knowledge gaps. We’ve established that communities benefit from effective tools paired with campaigns targeted at increasing the quality of content in our projects. This year we want to focus on improving existing tools and experimenting with new ways of prioritizing key topic areas that address knowledge gaps.	Purity Waigi & Satdeep Gill
WE2.6 議論	By the end of Q4, Wikifunctions will be used across at least 5 Wikimedia projects.	To validate our idea that Wikifunctions can support scalable growth in quality content, we need to roll the integration out to more Wikis and iterate to increase its value to multilingual communities. These learnings will give us more confidence as we scale up.	Amy Tsay
WE2.7 Discuss	By the end of Q4, one feature guides contributors to add source materials that comply with project guidelines on Commons, and one community collaboration with a strategic partner is completed on a topic of impact.	To grow access to the quality source material that’s needed to close strategic content gaps, we will: partner with the Biodiversity Heritage Library, AfLIA, and the Wikisource Loves Manuscripts learning network; support the acquisition and retention of content partners through more accessible reuse metrics; and guide contributors to add images and references that comply with project guidelines and increase trust in content, for example by flagging potential issues during their upload or addition.	Alexandra Ugolnikova
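WE3.1 Discuss	Release two curated, accessible, and community-driven browsing and learning experiences to representative wikis, with the goal of increasing the logged-out reader retention of users of these experiences by 5%.	This KR focuses on increasing the retention of a new generation of readers on our website, so that a new generation can build a lasting connection with Wikipedia, by exploring how readers can more easily discover and learn from content they are interested in. It will include exploring and developing new curated, personalized, and community-driven browsing and learning experiences (for example, feeds of relevant content, topical content recommendations and suggestions, and opportunities to explore community-curated content). We plan to begin the fiscal year with a series of experiments on browsing experiences to determine which ones we want to scale for production use, and on which platform (web, apps, or both). We will then focus on scaling these experiments and testing how effectively they increase reader retention in production environments. Our goal by the end of the year is to launch at least two experiences on representative wikis and to measure accurately whether retention for readers engaged with these experiences increases by 5%. To achieve this KR as effectively as possible, we will need the ability to run A/B tests with logged-out users and instruments capable of measuring reader retention. We may also need new APIs or services to present recommendations and other curation mechanisms.	Olga Vasileva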
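WE3.2 Discuss	50% increase, per platform, in the number of donations made through touch points other than the annual banner and email appeals.	Our goal is to diversify our revenue sources while recognizing our existing donors. Guided by feedback and data, we will focus on increasing the number of donations beyond the methods the Foundation has relied on in the past, particularly the annual banner appeals. We want to show that by investing in more integrated donor experiences, and so offering an alternative to donors and potential donors who do not respond to banner appeals, we can sustain our work and expand our impact. The 50% figure is an initial estimate based on two observations: the donate button on the web became harder to find after the introduction of Vector 2022, and the FY 2023-2024 pilot project to enhance donor experiences in the Wikipedia apps increased the number of donations (a 50.1% increase). Evaluating this metric per platform will help us understand platform-level trends and whether different tactics should be deployed in future, depending on whether each platform's audience behaves differently.	Jazmin Tanner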
WE3.3 Discuss	By the end of Q2 2024-25, volunteers will start converting legacy graphs to the new graph extension on production Wikipedia articles.	The Graph extension has been disabled for security reasons since April 2023, leaving readers unable to view many graphs that community members have invested time and energy into over the last 10 years. Data visualization plays a role in creating engaging encyclopedic content, so in FY 2024-25, we will build a new secure service to replace the Graph extension that will handle the majority of simple data visualization use cases on Wikipedia article pages. This new service will be built in an extensible way to support more sophisticated use cases if WMF or community developers choose to do so in the future. We will know we’ve achieved success when community members are successfully converting legacy graphs and publishing new graphs using the new service. We will determine which underlying data visualization library to use and which graph types to support during the initial phase of the project.	Christopher Ciufo
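WE3.4 Discuss	Develop the capability model to improve website performance through smaller scale cache site deployments that take one month to implement while maintaining technical capabilities, security and privacy.	The Traffic team is responsible for maintaining the Content Delivery Network (CDN). This layer caches frequently accessed content, pages, and so on in memory and on disk, which reduces the time it takes to process user requests. It also stores content physically closer to users, which reduces the time it takes for data to reach them (latency). Last year we opened a site in Brazil to reduce latency in the South American region. Setting up new data centers would be great, but it is expensive, time consuming, and requires a lot of work: last year's work, for example, took the whole year to complete. We would love to have centers in Africa and Southeast Asia, and indeed all around the world. Our hypothesis is that we can instead spin up smaller sites in other places around the world where traffic is lower. These sites would need no more than 4 or 5 servers, which would reduce our costs. They would still help reduce latency for users in those regions while taking less time and effort to maintain.	Kwaku Ofori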
WE3.5 議論	By the end of Q4 2024-25, interested volunteers from any Wikipedia can create charts and the task force successfully hands off maintenance to the Reader Experiences group.	The Chart extension is live in production and enabled for a select list of pilot wikis (itwiki, svwiki, hewiki). The goal of the pilot is to uncover early bugs and usability issues before we scale up the rollout to more wikis. The project mandate includes providing a successor to the graph extension on all wikis, and there is more work to enable that. The task force is also temporary, meaning the maintenance and any future feature development need to be handed off when the project winds down.	Chris Ciufo
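WE4.1 Discuss	By the end of Q3, provide a proposal for 3 countermeasures against harassment and harmful content that are informed by data and in line with the evolving regulatory environment.	Ensuring user safety and well-being is a fundamental responsibility of online platforms. Many jurisdictions have laws and regulations that require online platforms to take action against harassment, cyberbullying, and other harmful content. Failing to address these issues can expose platforms to legal liability and regulatory sanctions. At present we do not have a strong understanding of how big these problems are or of the reasons behind them. We rely heavily on anecdotal evidence and manual processes, which exposes us not only to legal risk but also to other far-reaching consequences: underestimating the problem, escalation of harm, reputational damage, and erosion of user trust. We need to build a strong culture of measuring the incidence of harassment and harmful content, and to implement countermeasures proactively.	Madalina Ana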
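WE4.2 Discuss	By the end of Q3, develop at least two signals for use in anti-abuse workflows to improve the precision of actions taken against bad actors.	The wikis rely heavily on IP address blocking as a mechanism for blocking vandalism, spam, and abuse. But IP addresses are increasingly less useful as stable identifiers of individual actors, and blocking an IP address can have unintended negative effects on good-faith users who happen to share that address with a bad actor. The combination of the decreasing stability of IP addresses and our heavy reliance on IP blocking has led to high levels of collateral damage for good-faith users and to lower precision and effectiveness in targeting bad actors. We want the opposite situation: lower levels of collateral damage and higher precision in mitigations aimed at bad actors. To better support the anti-abuse work of functionaries and to provide building blocks for use and reuse in existing tools (e.g. CheckUser, Special:Block) and new ones, this KR proposes to explore ways of reliably associating individuals with their actions (sockpuppet mitigation) and to combine existing signals (e.g. IP addresses, account history, request attributes) so that bad actors can be targeted more precisely.	Kosta Harlan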
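WE4.3 Discuss	Reduce the effectiveness of a large-scale distributed attack by 50%, as measured in a simulation by the time it takes us to adapt our countermeasures and the traffic volume we can sustain.	The internet landscape has evolved, with the rise of large-scale botnets and more frequent attacks, and this has made our traditional methods of limiting large-scale abuse obsolete. Such attacks can make our sites unavailable by flooding our infrastructure with requests, or can overwhelm our community's ability to combat large-scale vandalism. They also put an unreasonable strain on our high-privilege editors and our technical community. We urgently need to improve our ability to automatically detect, withstand, and mitigate or stop such attacks. To measure our improvements we cannot rely solely on the frequency or intensity of actual attacks, because that would make us dependent on external actions and would make it hard to get a clear quantitative picture of our progress. By setting up multiple simulated attacks of varying nature, complexity, and duration, running them safely against our infrastructure, and repeating them every quarter, we will be able both to test new countermeasures while not under attack and to report objectively on our improvements.	Giuseppe Lavagetto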
WE4.4 Discuss	Make temporary accounts available on all wiki projects.	Temporary accounts are a solution for complying with various regulatory requirements around the exposure of IPs on various surfaces of our platform. This work involves updating many products, data pipelines, functionary tools, and various volunteer workflows to cope with the existence of an additional type of account.	Niharika Kohli
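WE5.1 Discuss	By the end of Q3, complete at least 5 interventions intended to increase the sustainability of the platform.	The sustainability of the MediaWiki platform is an evergreen effort that is important for our ability to scale, to increase developer satisfaction or avoid its degradation, and to grow our technical community. It is hard to measure and depends on both technical and social factors. However, we carry tacit knowledge about specific areas where improvement is strategic for sustainability. We intend to carry out interventions that may increase the sustainability and maintainability of the platform or avoid its degradation. We plan to evaluate the impact of this work in Q4 and to recommend sustainability goals for the future. Examples of such interventions are: simplifying complex code domains that are core to MediaWiki but that only a handful of people understand; increasing the use of code analysis tooling to report on the quality of our codebase; and streamlining processes such as packaging and releases.	Mateus Santos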
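WE5.2 Discuss	Identify by the end of Q2, and complete by the end of Q4, one or more interventions to evolve the programming interfaces of the MediaWiki ecosystem so as to empower decoupled, simpler, and more sustainable feature development.	The main goal of KR 5.2 is to improve and clarify the interaction between MediaWiki's core platform and its extensions, skins, and other parts. Our intent is to make functional improvements to MediaWiki's architecture that enable practical modularity and maintainability, make it easier to develop extensions, and support the requirements of the wider MediaWiki product vision. This work also aims to inform what should (or should not) exist within core, within extensions, and in the interfaces between them. The year will be divided into two phases: a 5-month research and experimentation phase, which will determine what specific interventions are carried out in the second phase.	Jonathan Tweed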
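WE5.3 Discuss	By the end of Q2, complete one data-gathering initiative and one performance improvement experiment to inform follow-up product and platform interventions that take advantage of MediaWiki's ability to model a page as a composition of structured fragments.	The primary goal here is to empower developers and product managers to use new platform capabilities to meet the current and future needs of encyclopedic content, both by enabling new product offerings that are currently difficult to implement and by improving the performance and resilience of the platform. Specifically, at the platform level, we want to shift MediaWiki's processing model from treating a page as a monolithic unit to treating it as a composition of structured content units. Parsoid-based read views, Wikidata integration, and the integration of Wikifunctions into the wikis are all implicit moves in that direction. As part of this KR, we want to experiment and gather data more intentionally to inform future interventions based on these new capabilities, so that we can achieve the intended impact on infrastructure and products.	Subramanya Sastry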
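WE5.4 Discuss	By the end of Q2, execute the 1.43 LTS release using a new MW release process that is synchronized with PHP upgrades.	To remain secure and sustainable, the MediaWiki software platform relies on regular upgrades to the next PHP version. This is critically important for modernizing our infrastructure, and it is a pain point in our process. At the same time, we regularly release new versions of the MediaWiki software. Platforms such as translatewiki.net depend on these releases; translatewiki.net is a platform for translating software messages and is used by the Wikimedia projects and many other open source projects.	Mateus Santos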
WE5.5 議論	By the end of Q4, define an actionable strategy to evolve our API product offering to better meet staff, volunteer, and content reuser needs by simplifying the developer journey through consistent experiences, centralized access, and higher quality options for integration.	The main goal of KR 5.5 is to identify and deeply understand all existing public developer pathways for content reuse and platform integration, so that we can create a more streamlined and sustainable offering. We will do this by 1) gathering additional metrics to highlight current utilization across data access channels, 2) conducting one or more experiments that will simplify the Wikimedia developer journey, 3) delivering a comprehensive 6-pager API strategy document, and 4) creating a tactical roadmap of additional value-add opportunities that will drive adoption of supported data consumption channels. Understanding why different developer cohorts prefer certain entry points or data models will empower us to use our existing leverage as a highly trusted, high value data source, and ultimately drive downstream behaviors through our preferred mechanisms. Additionally, by streamlining our integration offerings and Wikimedia developer journey, we will increase overall sustainability by reducing maintenance costs and developer onboarding complexity to ensure the multi-generational future of the mission.	Halley Coplin
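WE6.1 Discuss	By the end of Q4, resolve 5 questions to enable efficiency and informed decision making by making relevant data on developer and engineering workflows and services accessible.	"It's complicated" is a frequent answer to questions such as "Which repositories are deployed to Wikimedia production?". In this KR we will explore the "evergreens" of engineering productivity and experience: recurring questions that seem easy but are hard to answer, questions that can be answered but whose data is not accessible and requires custom queries by subject matter experts, and questions that are hard to answer because of process gaps or other reasons. We will define what "resolve" means for each question. For some, it may simply mean making existing, accurate data accessible. Others will need more research and engineering time before they can be answered. The overarching goal of this work is to reduce the time, workarounds, and effort it takes to gain insight into key aspects of the developer experience, and so to enable improvements to engineering and developer workflows and services.	[TBD]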
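WE6.2 Discuss	By the end of Q4, enhance an existing project and perform at least two experiments aimed at providing maintainable, targeted environments that move us towards safe, semi-continuous delivery.	Developers and users depend on the Wikimedia Beta Cluster (beta) to catch bugs before they affect users in production. Over time, the uses of beta have grown and come into conflict: they are too diverse to fit in a single environment. We will enhance one existing alternative environment and perform experiments aimed at replacing a single high-priority testing need currently met by beta with a maintainable alternative environment that better serves the needs of each use case.	Tyler Cipriani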
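WE6.3 Discuss	By Q4 Create and implement a Toolforge sustainability scoring framework, designed to easily accommodate the addition of new categories and items over time. Apply it to improve at least one critical platform aspect by Q4 and inform longer-term strategy.	Toolforge, the main platform for tools built by Wikimedia volunteers, plays a crucial role in everything from editing to anti-vandalism. Our goal is to improve Toolforge's usability, lower the barriers to contribution, improve community practices, and promote adherence to established policies. To that end, we plan to introduce on the Toolforge platform, by the end of Q2, a scoring system that evaluates sustainability with attention to both technical and social aspects. Using this system as a baseline, we aim to improve one critical technical aspect by 50%.	Joanna Borun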

Signals & Data Services (SDS) key results
Key result short name	Key result text	Key result context	Owner
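SDS1.1 Discuss	By the end of Q3, 2 program or KR-driven initiatives have evaluated the direct impact of their work on one or more core metrics.	Our core organizational metrics serve as the key tools for assessing the Foundation's progress towards its goals. As we allocate resources to programs and set up key result (KR) driven workstreams, these high-level metrics should guide how we link those investments to the Foundation's overarching goals as defined in the annual plan. The work in this KR acknowledges that the Foundation as a whole is still at an early stage in its ability to link the impact of all planned interventions quantitatively to high-level or core metrics. In pursuit of that ultimate goal, this KR aims to develop the process by which we share the logical and theoretical links between our initiatives and our high-level metrics. In practice, this means working with initiative owners across the Foundation to understand how the outputs of their work at the project level link to, and affect, our core metrics at the Foundation level. The Foundation is currently in the early stages of running program or product-led initiatives and attributing their impact to core Foundation-level metrics. In pursuit of this goal, this KR aims to identify at least 2 candidate program or product-led initiatives, design an evaluation strategy for assessing their impact on core metrics, and execute that strategy. Starting with 2 initiatives lets us learn early about the evaluation challenges that arise when doing the analysis, so that we can understand in tangible terms how our work affected the core metrics. Learnings from this KR will inform a broader strategy for widening and increasing the set of initiatives to which the Foundation applies this measurement approach.	Omari Sefu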
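SDS1.2 Discuss	Answer 3 strategic open research questions by December 2024 in order to provide recommendations for, or otherwise inform, FY26 annual planning.	There are still many open research questions in the Wikimedia ecosystem, and answering some of them is strategic for WMF or the affiliates. The answers can inform future product or technology development and can support decision making and advocacy in the policy space. Some of these questions can be answered purely with research or research engineering expertise, but given the socio-technical nature of the Wikimedia projects, arriving at reliable insights often requires careful design of data collection, context building, user interaction, and experiments coordinated across teams. Through this KR we aim to prioritize some of our resources towards answering at least one such question. The work in this KR includes prioritizing a list of strategic open questions and doing the experimental work to find answers to X of them (the current goal is 2). Ideally, the questions we tackle are those whose answers unlock insights for multiple other teams or groups, helping them do their product, technology, or policy work in a more informed way. The work in this KR is intended to complement the following KRs: PES1.3, which focuses on experiments with on-platform product or feature ideas based on existing products; and FA1.1, which focuses on experiments with AI/ML for future audiences.	Leila Zia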
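SDS1.3 Discuss	Achieve at least a 50% reduction in the average time data stakeholders need to understand and track data for 3 core metrics.	This is a requirement for data governance standards. Tracing the origin and transformations of datasets is difficult and requires knowledge of different repositories and systems. It is important that we make it easier to understand how data flows through our systems, so that data stakeholders can work in a more self-service manner. This work will support workflows that transform and use data for analytics, features, APIs, and data quality jobs. There will be a follow-up KR on the documentation of metrics.	Luke Bowmaker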
SDS1.4 Discuss	By the end of Q4, three data pipelines dependent on the wikitext history data set will have weekly delivery guarantees (SLOs) for the wikitext history input sources.	Currently 6 known internal data pipelines rely on the monthly Dumps 1 data dumps. Some of them are blocked or cause downstream system failures if Dumps 1 data dumps are not provided in time. Migration to the new tables will provide guarantees on delivery, i.e. improve reliability. Even in the case of a pipeline outage of a few days, a timely weekly update can be guaranteed. This work: improves reliability of the delivery of data; reduces time to deliver critical data used for essential and core metric reporting from 1 month to 1 day, which further reduces the impact of short term data incidents; mitigates risk from dumps monthly run taking more than a month to complete due to data growth and limitations of current implementation; removes blocker for PHP 8 upgrade, which eventually runs risk from not being able to run on the latest supported version of Linux; prevents the splintering of internal Mediawiki deployment systems as Dumps 1.0 is not migratable to MW on k8s; and places the Dumps system in a sustainable state using established standard technologies, therefore reducing overall cost of ownership, and removing knowledge silos.	Andreas Hoelzl
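SDS2.1 Discuss	By the end of Q2, we can support 1 product team in evaluating a feature or product through basic A/B testing that reduces their time to user interaction data by 50%.	We think that adopting shared tools will help product teams make data-driven decisions and will increase efficiency and productivity, benefiting product strategy and innovation. Establishing the UX and technical system for logged-in users moves us towards the long-term goal of SDS 2.3, which means that we can support A/B testing for logged-out users in parallel with that work. For each adopting team, we will baseline their time to user interaction data and aim for a 50% reduction. We will also look across all product teams to consider how to place these gains in a fuller context. We expect feedback from adopting teams and the results of SDS 2.2 to help us identify and prioritize experience improvements and capability enhancements.	Virginia Poundstone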
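SDS2.2 Discuss	By the end of Q2, we will have 3 essential metrics for analyzing experiments (A/B tests), to support the testing of product and feature hypotheses related to FY24-25 KRs.	When a product manager (PM) or designer has a hypothesis that a product or feature will address a problem or need of users or of the organization, an experiment is how they test that hypothesis and learn about the potential impact of the idea on a metric. The results of the experiment help the PM decide what action to take next: abandon the idea and try a different hypothesis, continue development if the experiment was run early in the development lifecycle, or release the product or feature to more users. PMs must be able to make such decisions with confidence, supported by evidence they trust and understand. A significant hurdle for product teams today is that they formulate hypotheses with custom, project-specific metrics, which requires dedicated analysts to help define, measure, analyze, and report on them. Switching to a set of essential metrics for formulating testable product and feature hypothesis statements would make it easier and quicker to design, launch, and analyze experiments, and easier to communicate results and learnings to decision makers (PMs) and to other audiences (senior leaders and other individuals and communities within the organization). A set of metrics that is widely understood and consistently used, and that is informed by industry-standard metrics, would also improve organizational data literacy and promote a culture of review, experimentation, and learning. We are focusing on essential metrics that (1) are needed to best measure and evaluate the success and impact of products and features related to 2 Wiki Experiences KRs, WE3.1 and WE1.2, and (2) reflect or map to industry-standard metrics used in web analytics.	Mikhail Popov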
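SDS2.3 Discuss	Deploy a unique agent tracking mechanism to our CDN that enables A/B testing of product features with anonymous readers.	Without such a tracking mechanism, it is not feasible to A/B test product features with anonymous readers. This is essentially a milestone-based result: it creates a new technical capability on which others can build measurable things. The key priority use case is A/B testing of features with anonymous readers, but this work also enables other important future capabilities. These are likely to lead to follow-on hypotheses later in WE4.x (for request risk ratings and mitigating large-scale attacks) and, as resourcing and priorities allow, to metrics and research on unique device counts.	Brandon Black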
SDS2.4 議論	By end of Q4 FY24/25, successfully enable one product team to run an A/B test on anonymous users for a first paint feature while maintaining privacy compliance and data integrity.	In order to centralize the work, we will merge outstanding FY24-25 SDS 2.3 KR work into this new KR SDS 2.4. This release introduces anonymous user experimentation capabilities, focusing on enabling A/B testing for “first paint features” that render on pageload. This is a new A/B testing capability that will unlock our ability to better design for anonymous readers and learn more about the differences and similarities between users who are both logged out or logged in. Success depends on the completion of Edge Uniques deployment and involves close collaboration between Traffic, Experiment Platform Team, Legal, Security, SRE, Product, Data Engineering, Data Platform SRE, and Movement Communications teams. While the system will have known limitations as an MVP, it provides essential learnings about our shared components and scalability needs.	Virginia Poundstone
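
As an illustration of the experimentation KRs above (SDS2.1 to SDS2.4), the following is a minimal sketch of how a stable, opaque agent identifier could be mapped to an experiment arm, and how a "50% reduction" target can be checked as a relative change. It is not the Foundation's implementation. The function names `assign_bucket` and `relative_change`, the use of SHA-256, and the identifier and experiment strings are all hypothetical choices made for this example.

```python
import hashlib


def assign_bucket(agent_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map an opaque agent identifier to an experiment arm.

    Hypothetical helper: the identifier format and the hashing scheme are
    assumptions for illustration only.
    """
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be between 0 and 1")
    # Salt the hash with the experiment name so the same agent can land in
    # different arms in different experiments.
    digest = hashlib.sha256(f"{experiment}:{agent_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    position = int.from_bytes(digest[:8], "big")
    threshold = int(treatment_share * 2**64)
    return "treatment" if position < threshold else "control"


def relative_change(baseline: float, observed: float) -> float:
    """Return the relative change from baseline, e.g. -0.5 for a 50% reduction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline


if __name__ == "__main__":
    arms = [assign_bucket(f"agent-{i}", "hypothetical-feature") for i in range(10_000)]
    print("treatment share:", arms.count("treatment") / len(arms))
    print("relative change:", relative_change(baseline=10.0, observed=5.0))
```

Because the assignment depends only on the identifier and the experiment name, the same agent sees the same arm on every request without any server-side state, which is one reason a stable identifier is a precondition for testing with logged-out readers.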

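Future Audiences (FA) key results
Key result short name	Key result text	Key result context	Owner
FA1.1 Discuss	As a result of Future Audiences experimental insights and recommendations, by the end of Q3 at least one objective or key result owned by a non-Future Audiences team is present in the draft of the following year's annual plan.	Since 2020, the Foundation has been tracking external trends that may affect our ability to serve future generations of knowledge consumers and knowledge contributors and to remain a thriving free knowledge movement for generations to come. Future Audiences, a small R&D team, will: run rapid, time-bound experiments (at least 3 per fiscal year) to explore ways of addressing these trends; and, based on insights from those experiments, recommend during our regular annual planning period new non-experimental investments for WMF to pursue, i.e. new products or programs that need to be taken on by a full team or teams. This key result will be met if at least one objective or key result that is owned by a team outside Future Audiences and is driven by a Future Audiences recommendation appears in the draft annual plan for the next fiscal year.	Maryana Pinchuk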

Product and Engineering Support (PES) key results
Key result short name	Key result text	Key result context	Owner
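PES1.1 Discuss	Culture of review: incrementally improve P+T staff sentiment scores in a quarterly survey covering delivery, alignment, direction, and team health.	A culture of review is a product development culture based on shorter cycles of iteration, learning, and adaptation. Our organization may set yearly goals, but what we do to achieve them changes and adapts throughout the year as we learn. Building a culture of review has two components: processes and behaviors. This KR focuses on the latter. Behavioral changes can grow and reinforce a culture of review. This includes changing individual habits and routines as we move towards more iterative product development. This KR will be based on self-reported changes in individual behavior and will measure any resulting changes in staff sentiment.	Amy Tsay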
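PES1.2 Discuss	By the end of Q2, the new Wishlist better connects movement ideas and requests to the Foundation's Product and Technology (P+T) activities: items from the Wishlist backlog are addressed via a 2024-25 key result (KR), the Foundation has completed 10 smaller wishes, and the Foundation has partnered with volunteers to identify 3 or more areas of opportunity for the 2025-26 fiscal year.	The Community Wishlist represents a narrow slice of the Wikimedia movement. Roughly 1,000 people participate, most of whom are contributors or admins. People often bypass the Wishlist by writing feature requests and bug reports in Phabricator, where it is hard to tell requests from the Foundation apart from requests from the community. For participants, the Wishlist is a high-cost investment of time with minimal return. They still engage with it because they feel it is the only vehicle for calling attention to impactful bugs and feature improvements, or for signaling the need for broader, strategic opportunities. Wishes are often written as solutions rather than as problems. The solutions may seem sensible on paper, but they do not necessarily account for technical complexity or for the implications for movement strategy. In some cases the scope and breadth of a wish exceed the remit or capacity of Community Tech or of any single team, which prolongs frustration and leads to RFCs or even calls to dismantle the Wishlist altogether. Community participants want to use the Wishlist for project ideation, while Foundation teams tend to weigh the Wishlist against other intake processes when prioritizing, because wishes arrive at a poor time for the annual planning cycle and are hard to fold into roadmaps and/or OKRs. The Future Wishlist should be a bridge between the community and the Foundation: communities provide structured wishes, we can act on them, and volunteers are happy with the outcome. We are creating a new, always-open intake process through which logged-in volunteers can submit a wish at any time of year. A wish can report or highlight a bug, request an improvement, or propose an idea for a new feature. Anyone can comment on a wish, workshop it, or support it to influence prioritization. The Foundation will not categorize wishes as "too big" or "too small". Wishes that map thematically to a larger problem area can influence annual planning and team roadmaps and offer strategic directions and opportunities. Wishes will be visible to the movement in a dashboard that categorizes them by project, product and/or problem area, and wish type. The Foundation will respond to wishes in a timely manner and partner with the community to categorize and prioritize them. We will partner with Wikimedians to identify and prioritize 3 areas for improvement to incorporate into the Foundation's 2025-26 annual plan, which should improve the adoption and fulfillment rates of impactful wishes. We will flag well-scoped wishes for both the volunteer developer community and Foundation teams, which should deepen team and developer engagement, lead to more wishes being fulfilled, and increase community satisfaction. Fulfilling more wishes should increase contributor happiness, sense of efficacy, and retention, which should generate higher-quality edits and content and, in turn, more readers.	Jack Wheeler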
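PES1.3 Discuss	In Q1 and Q2, run and conclude two experiments, chosen from existing exploratory products and/or features, that provide data and insights into how we can grow Wikipedia as a knowledge destination for our current consumer and volunteer audiences. By the end of Q3, complete and share learnings and recommendations that could be adopted in future OKR work in the Wiki Experiences bucket.	This work is a counterpart to the Future Audiences objective, but it focuses on testing more on-platform product ideas more quickly and on surfacing opportunities to increase and deepen the engagement of existing audiences (Wikipedia's consumers and contributors). It sits under PES1 as an energizer and multiplier: it focuses on the more promising features and leverages time that individuals and teams have "already" spent experimenting and hacking on side projects. Rather than letting these side projects stagnate (an ineffective use of limited resources), this KR offers a path for some of these ideas to enter the larger APP setting as proven experiments. This in turn should help staff use their working time more efficiently and should boost creativity, productivity, and motivation. Running more of these small, short-term projects also widens and diversifies our "bets" on learning about and trialing ideas that could transform Wikipedia and better match the changing needs and expectations of current audiences. This helps the Foundation work more effectively and quickly, and adjust course toward its goals sooner.	Rita Ho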
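PES1.4 Discuss	Learn the process: setting, monitoring, and making decisions with SLOs. Pick and define something new for each SLO release. Define each specific SLO in collaboration with the appropriate team or teams (usually product, development, or SRE). Create and iterate on guidelines for future releases that describe what should have an SLO and how to set it.	Where this KR is headed: we set up a process and rudimentary tooling to set and monitor SLOs for new releases. We report quarterly and use the reports to decide when to prioritize (or not prioritize) corrective work. We share the reports with the community. Rationale: we do not currently know when to prioritize work to fix something, and we have a great deal of code. As our footprint keeps growing and we increasingly have to decide whether to focus on addressing problems or on innovation, that timing becomes ever more uncertain. Separately, it is not clear to the community or to staff how our teams handle reliability and performance, or how much focus they give them. Stating the level of service we expect should make it possible to judge when resources should or should not be allocated to a given target.	Mark Bergsma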
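PES1.5 Discuss	Define ownership and commitments (including SLOs) on services and learn how to track, report and make decisions as a standard and scalable practice by trialing it in 3 teams across senior leaders in the department.	As part of PES1.5, after collaboratively defining an SLO for Edit Check, we will trial the SLO in practice, learn from it, and use it to help prioritize reliability work. We will also document the roles and responsibilities of code and service ownership so that we can make clear, shared commitments about ongoing levels of support. Three teams across the department will use these and establish them as practice.	Mark Bergsma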
PES1.6 Discuss	The 2025/26 annual plan (i.e. by the end of Q4) includes hypotheses NOT assigned to the CommTech team that directly address at least 3 different Wishlist focus areas.	The revamped community wishlist bets on our ability to influence WMF product teams to incorporate and adopt wishes, so that the community tech team better disperses responsibilities of fulfilling wishes across the movement.	Jack Wheeler
PES1.7 Discuss	By Q4, we have completed 1 experiment and identified 3 further experiments for improving initial response to wishes (resulting in a status change) next FY.	We want to bring added clarity and rigor to the way in which the foundation engages with new wishes, to improve contributor satisfaction and engagement with the wishlist. In improving how we process wishes, we believe that the foundation will be better equipped to prioritize wishes.	Jack Wheeler
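
PES1.4 and PES1.5 describe setting SLOs, reporting on them, and using the reports to decide when corrective work takes priority. The sketch below shows one way such a report could be computed. It is a minimal illustration, not the Foundation's tooling: the `SloWindow` class, the 99% target, the event counts, and the "budget consumed" ratio (a common SRE convention that this plan does not mention) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """One reporting window for a single SLO (all values are hypothetical)."""
    service: str
    target: float        # e.g. 0.99 means 99% of events should be good
    good_events: int
    total_events: int

    def attainment(self) -> float:
        # Fraction of events in the window that met the objective.
        if self.total_events == 0:
            return 1.0
        return self.good_events / self.total_events

    def budget_consumed(self) -> float:
        # Bad events observed, as a fraction of the bad events the target tolerates.
        allowed_bad = (1.0 - self.target) * self.total_events
        actual_bad = self.total_events - self.good_events
        if allowed_bad == 0:
            return 0.0 if actual_bad == 0 else float("inf")
        return actual_bad / allowed_bad

    def needs_corrective_work(self) -> bool:
        # The window missed its objective, so fixes would take priority.
        return self.attainment() < self.target


# Hypothetical quarterly window for an example feature.
window = SloWindow(service="example-feature", target=0.99,
                   good_events=9950, total_events=10_000)
print(f"{window.service}: attainment={window.attainment():.4f}, "
      f"budget consumed={window.budget_consumed():.0%}, "
      f"prioritize fixes={window.needs_corrective_work()}")
```

A report like this gives a yes/no signal per window. Teams would still need to agree on what counts as a "good" event for each service, which is the part the KR sets out to learn.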

Hypotheses

The hypotheses below are the specific things we work on each quarter to address the associated key results above.

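Each hypothesis is an experiment, or a stage of an experiment, that we believe will help achieve the key result. Teams make a hypothesis, test it, and then either iterate on their findings or develop an entirely new hypothesis. You can think of a hypothesis as a bet of a team's time: teams can make a small bet of a few weeks or a big bet of several months, but the time a team spends should be commensurate with the risk-adjusted reward. Our hypotheses are meant to be agile and to adapt quickly. We may retire, adjust, or start a hypothesis at any point in the quarter.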

To see the most up-to-date status of a hypothesis and/or to discuss a hypothesis with the responsible team, click the link to its project page below.

Q1

The first quarter (Q1) of the Wikimedia Foundation annual plan covers July to September.

Wiki Experiences (WE) Hypotheses [ WE Key Results ] Discuss
Hypothesis shortname	Q1 text	Details & Discussion
WE1.1.1	If we expand the Event List to become a Community List that includes WikiProjects, then we will be able to gather some early learnings in how to engage with WikiProjects for product development.
WE1.1.2	If we identify at least 15 WikiProjects in 3 separate Wikipedias to be featured in the Community List, then we will be able to advise Campaigns Product on the key characteristics needed to build an MVP of the Community List that includes WikiProjects.
WE1.1.3	If we consult 20 event organizers and 20 WikiProject organizers on the best use of topics available via LiftWing, then we can prioritize revisions to the topic model that will improve topical connections between events and WikiProjects.
WE1.2.1	If we build a first version of the Edit Check API, and use it to introduce a new Check, we can evaluate the speed and ease with which other teams and volunteers could use the API to create new Checks and Suggested Edits.
WE1.2.2	If we build a library of UI components and visual artefacts, Edit Check’s user experience can extend to accommodate Structured Tasks patterns.
WE1.2.3	If we conduct user tests on two or more design prototypes introducing structured tasks to newcomers within/proximate to the Visual Editor, then we can quickly learn which designs will work best for new editors, while also enabling engineers to assess technical feasibility and estimate effort for each approach.	mw:Growth/Constructive activation experimentation
WE1.2.4	If we train an LLM on detecting "peacock" behavior, then we can learn if it can detect this policy violation with >70% precision and >50% recall and ultimately, decide if said LLM is effective enough to power a new Edit Check and/or Suggested Edit.
WE1.2.5	If we conduct an A/B/C test with the alt-text suggested edits prototype in the production version of the iOS app we can learn if adding alt-text to images is a task newcomers are successful with and ultimately, decide if it's impactful enough to implement as a suggested edit on the Web and/or in the Apps.	mw:Wikimedia Apps/iOS Suggested edits project/Alt Text Experiment
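WE1.3.1	If we enable additional customisation of Automoderator's behaviour and make changes based on pilot project feedback in Q1, more moderators will be satisfied with its feature set and reliability, and will opt to use it on their Wikimedia project, thereby increasing adoption of the product.	mw:Automoderator
WE1.3.2	If we reframe a subset of wishes as moderator-related focus areas and share these focus areas for community input across Q1 and Q2, we will have greater confidence that the selected focus area will improve moderator satisfaction when it is released in Q3.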
WE2.1.1	If we build a country-level inference model for Wikipedia articles, we will be able to filter lists of articles to those about a specific region with >70% precision and >50% recall.	m:Research:Language-Agnostic Topic Classification/Countries
WE2.1.2	If we build a proof-of-concept providing translation suggestions that are based on user-selected topic areas, we will be set up to successfully test whether translators will find more opportunities to translate in their areas of interest and contribute more compared to the generic suggestions currently available.	mw:Translation suggestions: Topic-based & Community-defined lists
WE2.1.3	If we offer list-making as a service, we’ll enable at least 5 communities to make more targeted contributions in their topic areas as measured by (1) change in standard quality coverage of relevant topics on the relevant wiki and (2) a brief survey of organizer satisfaction with topic area coverage on-wiki.
WE2.1.4	If we developed a proof of concept that adds translation tasks sourced from WikiProjects and other list-building initiatives, and present them as suggestions within the CX mobile workflow, then more editors would discover and translate articles focused on topical gaps. By introducing an option that allows editors to select translation suggestions based on topical lists, we would test whether this approach increases the content coverage in our projects.	mw:Translation suggestions: Topic-based & Community-defined lists
WE2.2.1	If we expand Wikimedia's State of Languages data by securing data sharing agreements with UNESCO and Ethnologue, at least one partner will decide to represent Wikimedia’s language inclusion progress in their own data products and communications. On top of being useful to our partner institutions, our expanded dataset will provide important contextual information for decision-making and provide communities with information needed to identify areas for intervention.
WE2.2.2	If we map the language documentation activities that Wikimedians have carried out over the last 2 years, we will develop a data-informed baseline for community experiences in onboarding new languages.
WE2.2.3	If we provide production wiki access to 5 new languages, with or without Incubator, we will learn whether access to a full-fledged wiki with modern features such as those available on English Wikipedia (including ContentTranslation and Wikidata support, advanced editing and search results) aids in faster editing. Ultimately, this will inform us if this approach can be a viable direction for language onboarding for new or existing languages, justifying further investigation.	mw:Future of Language Incubation
WE2.3.1	If we make two further improvements to the media upload flow on Commons and share them with the community, the feedback will be positive and it will help uploaders make fewer bad uploads (with the focus on copyright) as measured by the ratio of deletion requests within 30 days of upload. This will include defining designs for further UX improvements to the release rights step in the Upload Wizard on Commons and rolling out an MVP of logo detection in the upload flow.	phab:T347298 phab:T349641
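WE2.4.1	If we build a prototype of Wikifunctions calls embedded within MediaWiki content, we will be ready to use MediaWiki's async content processing pipeline and test its performance feasibility in Q2.	phab:T261472
WE2.4.2	If we create an initial design prototype of an initial Wikifunctions use case on one Wikipedia, we will be ready to build and test the integration in Q2 once performance feasibility has been validated, as stated in Hypothesis 1.	phab:T363391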
WE2.4.3	If we make it possible for Wikifunctions users to access Wikidata lexicographical data, they will begin to create natural language functions that generate sentence phrases, including those that can handle irregular forms. If we see an average monthly creation rate of 31 for these functions, after the feature becomes available, we will know that our experiment is successful.	phab:T282926
WE3.1.1	Designing and qualitatively evaluating three proofs of concept focused on building curated, personalized, and community-driven browsing and learning experiences will allow us to estimate the potential for increased reader retention (experiment 1: providing recommended content in search and article contexts; experiment 2: summarizing and simplifying article content; experiment 3: making multitasking easier on wikis).
WE3.1.3	If we develop models for remixing content, such as content simplification or summarization, that can be hosted and served via our infrastructure (e.g. LiftWing), we will establish the technical direction for work focused on increasing reader retention through new content discovery features.
WE3.1.4	If we analyze the projected performance impact of hypotheses WE3.1.1 and WE3.1.2 on the Search API, we can scope and address performance and scalability issues before they negatively affect our users.
WE3.1.5	If we enhance the search field in the Android app to recommend personalized content based on a user's interest and display better results, we will learn if this improves user engagement by observing whether it increases the impression and click-through rate (CTR) of search results by 5% in the experimental group compared to the control group over a 30-day A/B test. This improvement could potentially lead to a 1% increase in the retention of logged out users.	phab:T370117
WE3.2.1	If we create a clickable design prototype that demonstrates the concept of a badge representing donors championing article(s) of interest, we can learn if there would be community acceptance for a production version of this method for fundraising in the Apps.	Fundraising Experiment in the iOS App
WE3.2.2	Increasing the prominence of entry points to donations on the logged-out mobile and desktop web experiences will increase the clickthrough rate of the donate link by 30% year over year.	phab:T368765
WE3.2.3	If we make the “Donate” button in the iOS App more prominent by making it one click or less away from the main navigation screen, we will learn if discoverability was a barrier to non-banner donations.
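WE3.3.1	If we select a data visualization library and make an initial version of a new server-rendered graph service available by the end of July, we can learn from volunteers at Wikimania whether the solution we are working towards is one they would use to replace legacy graphs.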
WE4.1.1	If we implement a way in which users can report potential instances of harassment and harmful content present in discussions through an incident reporting system, we will be able to gather data around the number and type of incidents being reported and therefore have a better understanding of the landscape and the actions we need to take.
WE4.2.1	If we explore and define Wikimedia-specific methods for a unique device identification model, we will be able to define the collection and storage mechanisms that we can later implement in our anti-abuse workflows to enable more targeted blocking of bad actors.	phab:T368388
WE4.2.9	If we provide contextual information about reputation associated with an IP that is about to be blocked, we will see fewer collateral damage IP and IP range blocks, because administrators will have more insight into potential collateral damage effects of a block. We can measure this by instrumenting Special:Block and observing how behavior changes when additional information is present, vs when it is not.	WE4.2.9 Talk page
WE4.2.2	If we define an algorithm for calculating a user account reputation score for use in anti-abuse workflows, we will prepare the groundwork for engineering efforts that use this score as an additional signal for administrators targeting bad actors on our platform. We will know the hypothesis is successful if the algorithm for calculating a score maps with X% precision to categories of existing accounts, e.g. a "low" score should apply to X% of permanently blocked accounts.	WE4.2.2 Talk page
WE4.2.3	If we build an evaluation framework using publicly available technologies similar to the ones used in previous attacks we will learn more about the efficacy of our current CAPTCHA at blocking attacks and could recommend a CAPTCHA replacement that brings a measurable improvement in terms of the attack rate achievable for a given time and financial cost.
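WE4.3.1	If we apply machine learning and data analysis tools to webrequest logs during known attacks, we will be able to identify, with >80% precision, abusive IP addresses that send largely malicious traffic, which we can then rate-limit at the edge, improving reliability for our users.	phab:T368389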
WE4.3.2	If we limit the load that known IP addresses of persistent attackers can place on our infrastructure, we'll reduce the number of impactful cachebusting attacks by 20%, improving reliability for our users.
WE4.3.3	If we deploy a proof of concept of the 'Liberica' load balancer, we will measure a 33% improvement in our capacity to handle TCP SYN floods.
WE4.3.4	If we make usability improvements and also perform some training exercises on our 'requestctl' tool, then SREs will report higher confidence in using the tool.	phab:T369480
WE4.4.1	If we run at least 2 deployment cycles of Temp Accounts we will be able to verify this works successfully.
WE5.1.1	If we successfully roll out Parsoid Read Views to all Wikivoyages by Q1, this will boost our confidence in extending Parsoid Read Views to all Wikipedias. We will measure the success of this rollout through detailed evaluations using the Confidence Framework reports, with a particular focus on Visual Diff reports and the metrics related to performance and usability. Additionally, we will assess the reduction in the list of potential blockers, ensuring that critical issues are addressed prior to wider deployment.
WE5.1.2	If we disable unused Graphite metrics, target migrating metrics using the db-prefixed data factory and increase our outreach efforts to other teams and the community in Q1, then we would be on track to achieve our goal of making Graphite read-only by Q3 FY24/25, by observing an increase of 30% in migration progress.
WE5.1.3	If we implement a canonical URL structure with versioning for our REST API, then we can enable service migration and testing for Parsoid endpoints and similar services by Q1.	phab:T344944
WE5.1.4	If we complete the remaining work to mitigate the impact of browser anti-tracking measures on CentralAuth autologin and to move to a more resilient authentication infrastructure (SUL3), we will be ready to roll out to production wikis in Q2.
WE5.1.5	If we increase the coverage of Sonar Cloud to include key MediaWiki Core repos, we will be able to improve the maintainability of the MediaWiki codebase. This hypothesis will be measured by splitting the selected repos into test and control groups. These groups will then be compared over the course of a quarter to measure the impact of commit-level feedback to developers.
WE5.2.1	If we make a classification of the types of hooks and extension registry properties used to influence the behavior of MediaWiki core, we will be able to focus further research and interventions on the most impactful.	Simplify feature development
WE5.2.2	If we explore a new architecture for notifications in MW core and Echo, we will discover new ways to provide modularity and new ways for extensions to interact with core.	Simplify feature development
WE5.3.1	If we instrument parser and cache code to collect template structure and fine-grained timing data, we can quantify the expected performance improvement which could be realized by future evolution of the wikitext parsing platform.	T371713
WE5.3.2	On template edits, if we can implement an algorithm in Parsoid to reuse HTML of a page that depends on the edited template without processing the page from scratch and demonstrate 1.5x or higher processing speedup, we will have a potential incremental parsing solution for efficient page updates on template edits.	T363421
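WE5.4.1	If the MediaWiki engineering group succeeds in taking accountability for the release process and strengthens its communication process by the end of Q2, in alignment with the product strategy, we will eliminate the parts of the current process that rely on unplanned or volunteer work and improve community satisfaction with the release process. This will be measured by community feedback on the 1.43 LTS release, combined with a significant reduction in the unplanned staff and volunteer hours needed for the release process.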
WE5.4.2	If we research and build a process to more regularly upgrade PHP in conjunction with our MediaWiki release process we will increase speed and security while reducing the complexity and runtime of our CI systems, by observing the success of PHP 8.1 upgrade before 1.43 release.
WE6.1.1	If we design and complete the initial implementation of an authorization framework, we’ll establish a system to effectively manage the approval of all LDAP access requests.
WE6.1.2	If we research available documentation metrics, we can establish metrics that measure the health of Wikimedia technical documentation, using MediaWiki Core documentation as a test case.	mw:Wikimedia Technical Documentation Team/Doc metrics
WE6.1.3	If we collect insights on how different teams are making technical decisions, we will be able to gather good practices and insights that can enable and scale similar practices across the organization.
WE6.2.1	If we publish a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration at least once per day, we will uncover new constraints and establish a baseline of the wallclock time needed to perform a build.	mw:Wikimedia Release Engineering Team/Group -1
WE6.2.2	If we replace the backend infrastructure of our existing shared MediaWiki development and testing environments (from Apache virtual servers to Kubernetes), it will enable us to extend its uses by enabling MediaWiki services in addition to the existing ability to develop MediaWiki core, extensions, and skins in an isolated environment. We will develop one environment that includes MediaWiki, one or more Extensions, and one or more Services.	wikitech:Catalyst
WE6.2.3	If we create a new deployment UI that provides more information to the deployer and reduces the amount of privilege needed to do deployment, it will make deployment easier and open deployments to more users as measured by the number of unique deployers and number of patches backported as a percentage of our overall deployments.	Wikimedia Release Engineering Team/SpiderPig
WE6.2.4	If we migrate votewiki, wikitech and commons to MediaWiki on Kubernetes, we will reap the benefits of consistency and no longer need to maintain 2 different infrastructure platforms in parallel, allowing us to reduce the amount of custom-written tooling and making deployments easier and less toilsome for deployers. This will be measured by a decrease in total deployment times and a reduction in deployment blockers.	Task T292707
WE6.2.5	If we move MultiVersion routing out of MediaWiki, we'll be able to ship single version MediaWiki containers, largely cutting down the size of containers allowing for faster deployments, as measured by the deployment tool.	SingleVersion MW: Routing options
WE6.3.1	By consulting Toolforge maintainers about the least sustainable aspects of the platform, we will be able to gather a list of potential categories to measure.
WE6.3.2	By creating a "standard" tool to measure the number of steps for a deployment we will be able to assess the maximal improvement in the deployment process.
WE6.3.3	If we conduct usability tests, user interviews, and competitive analysis to explore the existing workflows and use cases of Toolforge, we can identify key areas for improvement. This research will enable us to prioritize enhancements that have the most significant impact on user satisfaction and efficiency, laying the groundwork for a future design of the user interface.
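
Two hypotheses in the table above (WE1.2.4 and WE2.1.1) define success as ">70% precision and >50% recall". The sketch below shows how such a check could be run against a hand-labelled evaluation sample. It is only an illustration: the `Counts` class, the function names, and the sample numbers are hypothetical, and the plan does not describe how the teams actually evaluate their models.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts from a hand-labelled evaluation sample (hypothetical)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: Counts) -> float:
    # Share of flagged items that were correctly flagged.
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    # Share of items that should have been flagged and actually were.
    relevant = c.true_positives + c.false_negatives
    return c.true_positives / relevant if relevant else 0.0


def meets_thresholds(c: Counts, min_precision: float = 0.70,
                     min_recall: float = 0.50) -> bool:
    # Defaults mirror the ">70% precision and >50% recall" wording (strict comparison).
    return precision(c) > min_precision and recall(c) > min_recall


# Hypothetical sample: 80 items flagged, 60 of them correctly; 40 relevant items missed.
sample = Counts(true_positives=60, false_positives=20, false_negatives=40)
print(f"precision={precision(sample):.2f} recall={recall(sample):.2f}")
print("meets thresholds:", meets_thresholds(sample))
```

Precision and recall pull against each other as a model's decision threshold moves, so a hypothesis that names both numbers pins down a usable operating point rather than a single score.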

Signals & Data Services (SDS) Hypotheses [ SDS Key Results ] Discuss
Hypothesis shortname	Q1 text	Details & Discussion
SDS 1.1.1	If we partner with an initiative owner and evaluate the impact of their work on Core Foundation metrics, we can identify and socialize a repeatable mechanism by which teams at the Foundation can reliably impact Core Foundation metrics.
SDS1.2.2	If we study the recruitment, retention, and attrition patterns among long-tenure community members in official moderation and administration roles, and understand the factors affecting these phenomena (the ‘why’ behind the trends), we will better understand the extent, nature, and variability of the phenomenon across projects. This will in turn enable us to identify opportunities for better interventions and support aimed at producing a robust multi-generational framework for editors.	phab:T368791
SDS1.2.1	If we gather use cases from product and feature engineering managers around the use of AI in Wikimedia services for readers and contributors, we can determine if we should test and evaluate existing AI models for integration into product features, and if yes, generate a list of candidate models to test.	phab:T369281 Meta Page
SDS1.3.1	If we define the process to transfer all data sets and pipeline configurations from the Data Platform to DataHub we can build tooling to get lineage documentation automatically.
SDS 1.3.2	If we implement a well documented and understood process to produce an intermediary table representing MediaWiki Wikitext History, populated using the event platform, and monitor the reliability and quality of the data we will learn what additional parts of the process are needed to make this table production ready and widely supported by the Data Platform Engineering team.
SDS2.1.2	If we investigate the Data Products team's current SDLC, we will be able to determine inflection points where QTE knowledge can be applied in order to have a positive impact on Product Delivery.
SDS2.1.3	If the Growth team learns about the Metrics Platform by instrumenting a Homepage module on it, then we will be prepared to outline a measurement plan in Q1 and to complete an A/B test on the new Metrics Platform by the end of Q2.
SDS2.1.4	If we conduct usability testing on our prototype among pilot users of our experimentation process, we can identify and prioritize the primary pain points faced by product managers and other stakeholders in setting up and analyzing experiments independently. This understanding will lead to the refinement of our tools, enhancing their efficiency and impact.
SDS2.1.5	If we design a documentation system that guides the experience of users building instrumentation using the Metrics Platform, we will enable those users to independently create instrumentation without direct support from Data Products teams, except in edge cases.	phab:T329506
SDS2.2.1	If we define a metric for logged-out mobile app reader retention, which is applicable for analyzing experiments (A/B test), we can provide guidance for planning instrumentation to measure retention rate of logged out readers in the mobile apps and enable the engineering team to develop an experiment strategy targeting logged out readers.
SDS2.2.2	If we define a standard approach for measuring and analyzing conversion rates, it will help us establish a collection of well-defined metrics to be used for experimentation and baselines, and start enabling comparisons between experiments/projects to increase learning from these.
SDS2.2.3	If we define a standard way of measuring and analyzing clickthrough rate (CTR) in our products/features, it will help us design experiments that target CTR for improvement, standardize click-tracking instrumentation, and enable us to make CTR available as a target metric to users of the experimentation platform.
SDS2.3.1	If we conduct a legal review of proposed unique cookies for logged out users, we can determine whether there are any privacy policy or other legal issues which inform the community conversation and/or affect the technical implementation itself.
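
SDS2.2.3 calls for a standard way of measuring clickthrough rate (CTR), and hypotheses such as WE3.1.5 compare CTR between a control group and an experimental group. The sketch below illustrates one conventional calculation: CTR per group, relative lift, and a pooled two-proportion z statistic. The function names and the example counts are hypothetical, and treating every impression as an independent trial is a simplifying assumption; the plan does not specify the teams' actual methodology.

```python
import math


def ctr(clicks: int, impressions: int) -> float:
    # Clickthrough rate: clicks divided by impressions.
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(control_ctr: float, treatment_ctr: float) -> float:
    # Relative change of the treatment CTR versus the control CTR.
    return (treatment_ctr - control_ctr) / control_ctr


def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    # Pooled two-proportion z statistic (assumes independent impressions).
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical counts for a 30-day test.
control = ctr(clicks=2000, impressions=10_000)
treatment = ctr(clicks=2200, impressions=10_000)
lift = relative_lift(control, treatment)
z = two_proportion_z(2000, 10_000, 2200, 10_000)
print(f"control={control:.3f} treatment={treatment:.3f} lift={lift:.1%} z={z:.2f}")
```

A shared definition like this is what makes results comparable across experiments. A target such as "increase CTR by 5%" is ambiguous until the team states whether it means relative lift or percentage points.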

Future Audiences (FA) Hypotheses [ FA Key Results ] Discuss
Hypothesis shortname	Q1 text	Details & Discussion
FA1.1.1	If we make off-site contribution very low effort with an AI-powered “Add a Fact” experiment, we can learn whether off-platform users could help grow/sustain the knowledge store in a possible future where Wikipedia content is mainly consumed off-platform.	m:Future Audiences/Experiment:Add a Fact

Product and Engineering Support (PES) Hypotheses [ PES Key Results ] Discuss
Hypothesis shortname	Q1 text	Details & Discussion
PES1.1.1	If the P&T leadership team syncs regularly on how they’re guiding their teams towards a more iterative software development culture, and we collect baseline measurements of current development practices and staff sentiment on how we work together to ship products, we will discover opportunity areas for change management. The themes that emerge will enable us to build targeted guidance or programs for our teams in coming quarters.
PES1.2.2	If the Moderator Tools team researches the Community Wishlist and develops 2+ focus areas in Q1, then we can solicit feedback from the Community and identify a problem that the Community and WMF are excited about tackling.
PES1.2.3	If we bundle 3-5 wishes that relate to selecting and inserting templates, and ship an improved feature in Q1, then CommTech can take the learnings to develop a Case Study for the foundation to incorporate more "focus areas" in the 2025-26 annual plan.
PES1.3.1	If we provide insights to audiences about their community and their use of Wikipedia over a year, it will stimulate greater connection with Wikipedia – encouraging greater engagement in the form of social sharing, time spent interacting on Wikipedia, or donation. Success will be measured by completing an experimental project that provides at least one recommendation about “Wikipedia insights” as an opportunity to increase onwiki engagement.	Wikipedia user insights
PES1.3.2	If we create a Wikipedia-based game for daily use that highlights the connections across vast areas of knowledge, it will encourage consumers to visit Wikipedia regularly and facilitate active learning, leading to longer and increased interaction with content on Wikipedia. Success will be measured by completing an experimental project that provides at least one recommendation about gamification of learning as an opportunity to increase onwiki engagement.	Wikipedia games
PES1.3.3	If we develop a new process/track at a Wikimedia hackathon to incubate future experiments, it will increase the impact and value of such events by making them a pipeline for future annual plan projects, while fostering greater connection between volunteers and engineering/design staff and deeper involvement in strategic initiatives. Success will be measured by at least one PES1.3 project being initiated at a Foundation-supported event and advancing to an OKR.	Incubator space
PES1.4.1	If we draft an SLO with the Editing team releasing Edit Check functionality, we will begin to learn and understand how to define and track user-facing SLOs together, and iterate on the process in the future.
PES1.4.2	If we define and publish SLAs for putting OOUI into “maintenance mode”, growth of new code using OOUI across Wikimedia projects will stay within X% in Q1.
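PES1.4.3	If we map ownership of known owned services using the proposed service catalog in Q1, we will be able to identify significant gaps in the service catalog by the end of the year, which in turn will support the SLO culture.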

Q2

The second quarter (Q2) of the Wikimedia Foundation annual plan covers October to December.

Wiki Experiences (WE) Hypotheses [ WE Key Results ] Discuss
Hypothesis shortname	Q2 text	Details & Discussion
WE1.1.1	If we expand the Event list to become a Community List that includes WikiProjects, then we will be able to gather some early learnings in how to engage with WikiProjects for product development.	Campaigns/Foundation Product Team/Event list
WE1.1.2	If we launch at least 1 consultation focused on on-wiki collaborations, and if we collect feedback from at least 20 people involved in such collaborations, then we will be able to advise Campaigns Product on the key characteristics needed to develop a new or improved way of connecting.	Campaigns/WikiProjects
WE1.1.3	If we consult 20 event organizers and 20 WikiProject organizers on the best use of topics available via LiftWing, then we can prioritize revisions to the topic model that will improve topical connections between events and WikiProjects.
WE1.1.4	If we integrate CampaignEvents into Community Configuration in Q2, at least 5 wikis will opt to enable the extension in Q3, thereby increasing use of the tool.
WE1.2.2	If we build a library of UI components and visual artifacts, Edit Check’s user experience can extend to accommodate Structured Tasks patterns.
WE1.2.5	If we conduct an A/B/C test with the alt-text suggested edits prototype in the production version of the iOS app we can learn if adding alt-text to images is a task newcomers are successful with and ultimately, decide if it's impactful enough to implement as a suggested edit on the Web and/or in the Apps.
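WE1.2.6	If we introduce new account holders to the "Add a Link" Structured Task in Wikipedia articles, we expect the percentage of new account holders who constructively activate on mobile to increase by 10% compared to the baseline.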
WE1.3.1	If we enable additional customisation of Automoderator's behaviour and make changes based on pilot project feedback in Q1, more moderators will be satisfied with its feature set and reliability, and will opt to use it on their Wikimedia project, thereby increasing adoption of the product.	mw:Moderator Tools/Automoderator
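WE1.3.3	If we improve the user experience and features of the Nuke extension during Q2, administrator satisfaction with the product will increase by 5 percentage points by the end of the quarter.	mw:Extension:Nuke/2024 Moderator Tools project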
WE2.1.3	If we offer list-making as a service, we’ll enable at least 5 communities to make more targeted contributions in their topic areas as measured by (1) change in standard quality coverage of relevant topics on the relevant wiki and (2) a brief survey of organizer satisfaction with topic area coverage on-wiki.
WE2.1.4	If we develop a proof of concept that adds translation tasks sourced from WikiProjects and other list-building initiatives, and presents them as suggestions within the CX mobile workflow, then more editors will discover and translate articles focused on topical gaps. By introducing an option that allows editors to select translation suggestions based on topical lists, we will test whether this approach increases the content coverage in our projects.
WE2.1.5	If we expose topic-based translation suggestions more broadly and analyze their initial impact, we will learn which aspects of the translation funnel to act on in order to obtain more quality translations.
WE2.2.4	If we provide production wiki access to 5 new languages, with or without Incubator, we will learn whether access to a full-fledged wiki with modern features such as those available on English Wikipedia (including ContentTranslation and Wikidata support, advanced editing and search results) aids in faster editing. Ultimately, this will inform us if this approach can be a viable direction for language onboarding for new or existing languages, justifying further investigation.
WE2.2.5	If we move addwiki.php to core and customize it to Wikimedia, we will improve code quality in our wiki creation system, making it testable and robust, and we will make things easier for creators of new wikis, thereby taking significant steps towards simplifying the wiki creation process.	phab:T352113
WE2.3.2	If we make two further improvements to the media upload flow on Commons and share them with the community, the feedback will be positive and it will help uploaders make fewer bad uploads (with a focus on copyright), as measured by the ratio of deletion requests within 30 days of upload. This will include the release of further UX improvements to the release rights step in the Upload Wizard on Commons and automated detection of external sources.
WE2.3.3	If the BHL-Wikimedia Working Group creates Commons categories and descriptive guidelines for the South American and/or African species depicted in publications, they will make 3,000 images more accessible to biodiversity communities. (BHL = Biodiversity Heritage Library)
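WE2.4.1	If we build a prototype of Wikifunctions calls embedded within MediaWiki content and test it locally for stability, we will be ready to run performance feasibility tests in Q2 using MediaWiki's async content processing pipeline.	phab:T261472
WE2.4.2	If we create an initial design prototype of a first Wikifunctions use case on one Wikipedia wiki, we will be ready to build and test the integration in Q2, once performance feasibility has been validated as stated in Hypothesis 1.	phab:T363391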
WE2.4.3	If we make it possible for Wikifunctions users to access Wikidata lexicographical data, they will begin to create natural language functions that generate sentence phrases, including those that can handle irregular forms. If we see an average monthly creation rate of 31 for these functions, after the feature becomes available, we will know that our experiment is successful.	phab:T282926
WE3.1.3	If we develop models for remixing content such as a content simplification or summarization that can be hosted and served via our infrastructure (e.g. LiftWing), we will establish the technical direction for work focused on increasing reader retention through new content discovery features.	Research
WE3.1.6	If we introduce a personalized rabbit hole feature in the Android app and recommend condensed versions of articles based on the types of topics and sections a user is interested in, we will learn if the feature is sticky enough to result in multi-day usage by 10% of users exposed to the experiment over a 30-day period, and a higher pageview rate than users not exposed to the feature.	Rabbit Holes
WE3.1.7	If we run a qualitative experiment focused on presenting article summaries to web readers, we will determine whether or not article summaries have the potential to increase reader retention, as proxied by clickthrough rate and usage patterns
WE3.1.8	If we build one feature which provides additional article-level recommendations, we will see an increase in clickthrough rate of 10% over existing recommendation options and a significant increase in external referrals for users who actively interact with the new feature.
WE3.2.2	Increasing the prominence of entry points to donations on the logged-out experiences of the Vector web mobile and desktop experience will increase the clickthrough rate of the donate link by 30% YoY.	mw:Readers/2024 Reader and Donor Experiences
WE3.2.3	If we make the “Donate” button in the iOS App more prominent by making it one click or less away from the main navigation screen, we will learn if discoverability was a barrier to non banner donations.	Navigation Refresh
WE3.2.4	If we update the contributions page for logged-in users in the app to include an active badge for someone that is an app donor and display an inactive state with a prompt to donate for someone that decided not to donate in app, we will learn if this recognition is of value to current donors and encourages behavior of donating for prospective donors, informing if it is worth expanding on the concept of donor badges or abandoning it.	Private Donor Recognition Experiment
WE3.2.5	If we create a Wikipedia in Review experiment in the Wikipedia app, to allow users to see and share personalized data about their reading, editing, and donation habits, we will see 2% of viewers donate on iOS as a result of this feature, 5% click share and, 65% of users rating the feature neutral or satisfactory.	Personalized Wikipedia Year in Review
WE3.2.7	Increasing the prominence of entry points to donations on the logged-out experiences of the Minerva web mobile and desktop experience will increase the clickthrough rate of the donate link by 30% YoY.
WE3.3.2	If we develop the Charts MVP and get it working end-to-end in production test wikis, at least two Wikipedias + Commons agree to pilot it before the code freeze in December.
WE3.4.1	If we were to explore the feasibility by doing an experiment of setting up smaller PoPs in cloud providers like Amazon, we can expand our data center map and reach more users around the world, at reduced cost and increased turn-around time.
WE4.1.2	If we deploy at least one iteration of the Incident Reporting System MVP on pilot wikis, we will be able to gather valuable data around the frequency and type of incidents being reported.	Incident Reporting System
WE4.2.1	If we explore and define Wikimedia-specific methods for a unique device identification model, we will be able to define the collection and storage mechanisms that we can later implement in our anti-abuse workflows to enable more targeted blocking of bad actors.
WE4.2.9	If we provide contextual information about reputation associated with an IP that is about to be blocked, we will see fewer collateral damage IP and IP range blocks, because administrators will have more insight into potential collateral damage effects of a block. We can measure this by instrumenting Special:Block and observing how behavior changes when additional information is present, vs when it is not.
WE4.2.2	If we define an algorithm for calculating a user account reputation score for use in anti-abuse workflows, we will prepare the groundwork for engineering efforts that use this score as an additional signal for administrators targeting bad actors on our platform. We will know the hypothesis is successful if the algorithm for calculating a score maps with X% precision to categories of existing accounts, e.g. a "low" score should apply to X% of permanently blocked accounts.
WE4.2.3	If we build an evaluation framework using publicly available technologies similar to the ones used in previous attacks we will learn more about the efficacy of our current CAPTCHA at blocking attacks and could recommend a CAPTCHA replacement that brings a measurable improvement in terms of the attack rate achievable for a given time and financial cost.
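WE4.3.1	If we apply machine learning and data analysis tools to web request logs during known attacks, we will be able to identify, with at least 80% precision, abusive IP addresses that send only malicious traffic and rate-limit them at the edge, improving reliability for our users.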
WE4.3.3	If we deploy a proof of concept of the 'Liberica' load balancer, we will measure a 33% improvement in our capacity to handle TCP SYN floods.
WE4.3.5	By creating a system that spawns and controls thousands of virtual workers in a cloud environment, we will be able to simulate Distributed Denial of Service (DDoS) attacks and effectively measure the system's ability to withstand, mitigate, and respond to such attacks.
WE4.3.6	If we integrate the output of the models we built in WE 4.3.1 with the dynamic thresholds of per-IP concurrency limits we've built for our TLS terminators in WE 4.3.2, we should be able to automatically neutralize attacks with 20% more volume, as measured with the simulation framework we're building.
WE4.3.7	If we roll out a user-friendly web application that enables assisted editing and creation of requestctl rules, SREs should be able to mitigate cache-busting attacks in 50% less time than the established baseline.
WE4.4.2	If we deploy Temporary Accounts to a set of small-to-medium sized projects, we will be able to confirm whether the functionality works as intended and will be able to gather data to inform necessary future work.	Trust and Safety Product/Temporary Accounts
WE5.1.1	If we successfully roll out Parsoid Read Views to all Wikivoyages by Q1, this will boost our confidence in extending Parsoid Read Views to all Wikipedias. We will measure the success of this rollout through detailed evaluations using the Confidence Framework reports, with a particular focus on Visual Diff reports and the metrics related to performance and usability. Additionally, we will assess the reduction in the list of potential blockers, ensuring that critical issues are addressed prior to wider deployment.
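WE5.1.3	If we re-route the endpoints currently exposed under the rest_v1/page/html and rest_v1/page/title paths to comparable MediaWiki (MW) content endpoints, we should be able to unblock the sunsetting of RESTbase in Q1 without disrupting clients.
WE5.1.4	If we mitigate the impact of browser anti-tracking measures on CentralAuth autologin, complete the remaining work, and move to a more resilient authentication infrastructure (SUL3), we will be ready to roll out to production wikis in Q2.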
WE5.1.5	If we increase the number of relevant SonarCloud rules enabled for key MediaWiki Core repositories and refine the quality of feedback provided to developers, we will optimize the developer experience and enable them to improve the maintainability of the MediaWiki codebase in the future. This will be measured by tracking developer satisfaction levels and whether test group developers feel the tool is becoming more useful and effective in their workflow. Feedback will be gathered through surveys and direct input from developers to evaluate the perceived impact on their confidence in the tool and the overall development experience.
WE5.1.7	If we represent all content module endpoint responses (10 in total) in our MediaWiki REST API OpenAPI spec definitions, we will be able to implement programmatic validation to guarantee that our generated documentation matches the actual responses returned in code.
WE5.1.8	If we introduce support for endpoint description translation (i.e., not including actual object definitions or payloads) into our generated MediaWiki REST API OpenAPI specs, we can lay the foundation to support Wikimedia’s expected internationalization standards.
WE5.2.3	If we conduct an experiment to reimplement at least [1-3] existing Core and Extension features using a new Domain Event and Listener platform component pattern as an alternative to traditional hooks, we will be able to confirm our assumption of this intervention enabling simpler implementation with more consistent feature behavior.
WE5.3.3	If we instrument both parsers to collect availability of prior parses and timing of template expansions, and to classify updates and dependencies, we can prioritize work on selective updates (Hypothesis 5.3.2) informed by the quantification of the expected performance benefits.
WE5.3.4	If we can increase the capability of our prototype selective update implementation in Parsoid using the learnings from the 5.3.1 hypothesis, we can leverage more opportunities to increase the performance benefit from selective update.
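WE5.4.1	If the MediaWiki engineering group takes accountability for the release process and strengthens its communication process by the end of Q2, in alignment with the product strategy, we will eliminate the current reliance on unplanned or volunteer work and improve community satisfaction with the release process. This will be measured by community feedback on the 1.43 LTS release and by a significant reduction in the hours of unplanned work that volunteers and Foundation staff spend on the release process.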
WE5.4.2	If we research and build a process to more regularly upgrade PHP in conjunction with our MediaWiki release process we will increase speed and security while reducing the complexity and runtime of our CI systems, by observing the success of PHP 8.1 upgrade before 1.43 release.
WE6.1.3	If we collect insights on how different teams are making technical decisions we are able to gather good practices and insights that can enable and scale similar practices across the organization.
WE6.1.4	If we research solutions for indexing the code of all projects hosted in WMF’s code repositories, we will be able to pick a solution that allows our users to quickly discover where the code is located whenever dealing with incident response or troubleshooting.
WE6.1.5	If we test a subset of draft metrics on an experimental group of technical documentation collections, we will be able to make an informed decision about which metrics to implement for MediaWiki documentation.	Wikimedia Technical Documentation Team/Doc metrics
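WE6.2.1	If we create a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration and publish it at least once per day, we will uncover new constraints and establish a baseline of the wall-clock time needed to perform a build.	mw:Wikimedia Release Engineering Team/Group -1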
WE6.2.2	If we replace the backend infrastructure of our existing shared MediaWiki development and testing environments (from apache virtual servers to kubernetes), it will enable us to extend its uses by enabling MediaWiki services in addition to the existing ability to develop MediaWiki core, extensions, and skins in an isolated environment. We will develop one environment that includes MediaWiki, one or more Extensions, and one or more Services.	wikitech:Catalyst
WE6.2.3	If we create a new deployment UI that provides more information to the deployer and reduce the amount of privilege needed to do deployment, it will make deployment easier and open deployments to more users as measured by the number of unique deployers and number of patches backported as a percentage of our overall deployments.	mw:SpiderPig
WE6.2.5	If we move MultiVersion routing out of MediaWiki, we'll be able to ship single-version MediaWiki containers, largely cutting down the size of containers and allowing for faster deployments, as measured by the deployment tool.	https://docs.google.com/document/d/1_AChNfiRFL3VdNzf6QFSCL9pM2gZbgLoMyAys9KKmKc/edit
WE6.2.6	If we gather feedback from QTE, SRE, and individuals with domain specific knowledge and use their feedback to write a design document for deploying and using the wmf/next OCI container, then we will reduce friction when we start deploying that container.	T379683
WE6.3.4	If we enable the automatic deployment of a minimal tool, we will be able to evaluate the end to end flow and set the groundwork to adding support for more complex tools and deployment flows.	phab:T375199
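WE6.3.5	If we assess the relative importance of each sustainability category and its associated metrics, we can create a standardized scoring system. Implementing and documenting this system will give us a baseline for measuring and comparing Toolforge's sustainability progress over time.	phab:T376896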
WE6.3.6	If we conduct discovery, such as target user interviews and competitive analysis, to identify existing Toolforge pain points and improvement opportunities, we will be able to recommend a prioritized list of features for the future Toolforge UI.	Phab:T375914

Signals & Data Services (SDS) Hypotheses [ SDS Key Results ] Discuss
Hypothesis shortname	Q2 text	Details & Discussion
SDS 1.1.1	If we partner with an initiative owner and evaluate the impact of their work on Core Foundation metrics, we can identify and socialize a repeatable mechanism by which teams at the Foundation can reliably impact Core Foundation metrics.
SDS1.2.1.B	If we test the accuracy and infrastructure constraints of 4 existing AI language models for 2 or more high-priority product use-cases, we will be able to write a report recommending at least one AI model that we can use for further tuning towards strategic product investments.	Phab:T377159 Learn more.
SDS1.2.2	If we study the recruitment, retention, and attrition patterns among long-tenure community members in official moderation and administration roles, and understand the factors affecting these phenomena (the ‘why’ behind the trends), we will better understand the extent, nature, and variability of the phenomenon across projects. This will in turn enable us to identify opportunities for better interventions and support aimed at producing a robust multi-generational framework for editors.	Learn more.
SDS1.2.3	If we combine existing knowledge about moderators with quantitative methods for detecting moderation activity, we can systematically define and identify Wikipedia moderators.	T376684
SDS1.3.1.B	If we integrate the Spark / DataHub connector for all production Spark jobs, we will get column-level lineage for all Spark-based data platform jobs in DataHub.
SDS1.3.2.B	If we implement a frequently run Spark-based MariaDB MW history data querying job, and reconcile and enrich missing events, we will provide a daily updated MW history wikitext content data lake table.
SDS2.1.1	If we create an integration test environment for the proposed 3rd party experimentation solution, we can collaborate practically with Data SRE, SRE, QTE, and Product Analytics to evaluate the solution’s viability within WMF infrastructure in order to make a confident build/install/buy recommendation.	mw:Data Platform Engineering/Data Products/work focus
SDS2.1.3	If the Growth team learns about the Metrics Platform by instrumenting a Homepage module on it, we will be prepared to outline a measurement plan in Q1 and complete an A/B test on the new Metrics Platform by the end of Q2.
SDS2.1.4	If we conduct usability testing on our prototype among pilot users of our experimentation process, we can identify and prioritize the primary pain points faced by product managers and other stakeholders in setting up and analyzing experiments independently. This understanding will lead to the refinement of our tools, enhancing their efficiency and impact.
SDS2.1.5	If we design a documentation system that guides the experience of users building instrumentation using the Metrics Platform, we will enable those users to independently create instrumentation without direct support from Data Products teams, except in edge cases.	Task T329506
SDS2.1.7	If we provide a function for user enrollment and a mechanism to capture and store CTR events to a monotable in a pre-declared event stream, we can ship MPIC Alpha in order to launch a basic split A/B test on logged-in users.
SDS2.2.2	If we define a standard approach for measuring and analyzing conversion rates, it will help us establish a clearly defined, cohesive set of metrics for experiments and baselines, make experiments and projects easier to compare, and increase what we learn from them.
SDS2.3.1	If we conduct a legal review of proposed unique cookies for logged out users, we can determine whether there are any privacy policy or other legal issues which inform the community conversation and/or affect the technical implementation itself.
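
Several hypotheses on this page state their targets either as a relative change against a control group or baseline (for example, "by 10% compared to the control group") or as a change in percentage points ("5pp"), and SDS2.1.7 refers to capturing CTR events for a split A/B test. The two kinds of target are easy to confuse, so the following is a minimal sketch of how they differ. It does not describe the Metrics Platform or any Foundation tooling: the `Arm` class, the function names, and all counts are hypothetical illustration values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """One experiment arm; all counts used below are made-up illustration values."""
    users: int        # users enrolled in the arm
    conversions: int  # users who clicked through / converted

    @property
    def rate(self) -> float:
        """Conversion (or clickthrough) rate as a fraction between 0 and 1."""
        return self.conversions / self.users


def relative_lift(control: Arm, treatment: Arm) -> float:
    """Relative change of the treatment rate over the control rate (0.10 means +10%)."""
    return (treatment.rate - control.rate) / control.rate


def percentage_point_diff(control: Arm, treatment: Arm) -> float:
    """Absolute difference between the two rates, expressed in percentage points."""
    return (treatment.rate - control.rate) * 100


# Hypothetical arms: a 5.0% control rate and a 5.5% treatment rate.
control = Arm(users=10_000, conversions=500)
treatment = Arm(users=10_000, conversions=550)

print(f"relative lift: {relative_lift(control, treatment):+.1%}")
print(f"difference: {percentage_point_diff(control, treatment):+.2f} pp")
```

With these made-up counts, the same result is a 10% relative lift but only a half percentage point difference, which is why it matters which of the two a hypothesis names. A real analysis would also need a significance test and a pre-declared sample size, which this sketch leaves out.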

Future Audiences (FA) Hypotheses [ FA Key Results ] Discuss
Hypothesis shortname	Q2 text	Details & Discussion
FA1.1.1	If we make off-site contribution very low effort with an AI-powered “Add a Fact” experiment, we can learn whether off-platform users could help grow/sustain the knowledge store in a possible future where Wikipedia content is mainly consumed off-platform.	Experiment:Add a Fact

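Product and Engineering Support (PES) Hypotheses [ PES Key Results ] Discuss
Hypothesis shortname	Q1 text	Details & Discussion
PES1.2.4	If we research the Community Wishlist at the start of Q2 and determine focus areas for prioritizing tasks, we will be able to identify and prioritize work that improves moderator satisfaction, with deployment beginning in Q3.
PES1.2.5	If we publish at least 6 focus areas in Q2 and receive community feedback on them, we will have the confidence to present and incorporate at least 3 focus areas in the 2025-26 annual plan.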
PES1.2.6	By introducing favouriting templates, we will improve the number of templates added via the template dialog by 10%.
PES1.3.4	If we create an experience that provides insights to Wikipedia Audiences about their community over the year, it will stimulate greater connection with Wikipedia – encouraging engagement in the form of social sharing, time spent interacting on Wikipedia, or donation.
PES1.4.1	If we draft an SLO with the Editing team releasing Edit Check functionality, we will begin to learn and understand how to define and track user-facing SLOs together, and iterate on the process in the future.
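PES1.4.2	If we define and publish SLAs and move OOUI into “maintenance mode”, new code written with OOUI across Wikimedia projects in Q1 should stay within X%.
PES1.4.3	If we use the proposed Service Catalog to map ownership of known owned services during Q1, we will be able to identify critical gaps in the Service Catalog by the end of the year, which will help build an SLO culture.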
PES1.5.1	If we finalize and publish the Edit Check SLO draft, practice incorporating it in regular workflows and decisions, and draft a Citoid SLO, we’ll continue learning how to define and track user-facing and cross-team SLOs together.
PES1.5.2	If we clarify and define in a written document the set of roles and responsibilities of stakeholders throughout the service lifecycle, this will enable teams to make informed commitments in the Service Catalog, including SLOs.

Third quarter

The third quarter (Q3) of the WMF annual plan covers January-March.

Wiki Experiences (WE) Hypotheses [ WE Key Results ] Discuss
Hypothesis shortname	Q3 text	Details & Discussion
WE1.1.3	If we consult 20 event organizers and 20 WikiProject organizers on the best use of topics available via LiftWing, then we can prioritize revisions to the topic model that will improve topical connections between events and WikiProjects.
WE1.1.5	If we implement at least 2 methods to discover the Collaboration List, then we will increase pageviews of the Collaboration List, thereby allowing more people to discover events and WikiProjects that interest them
WE1.1.6	If we identify and then contact 20 affiliates and/or groups connected to wikis that have high organizer activity in Q2, we can build advocacy networks that will set the stage for the extension being enabled on 3 more wikis by the end of Q3.
WE1.1.7	If we add at least 2 improvements to the Collaboration List for events, then at least 50% of surveyed respondents will find the Collaboration List to be more useful in finding events than before the changes were made.
WE1.2.5	If we conduct an A/B/C test with the alt-text suggested edits prototype in the production version of the iOS app, we can learn whether adding alt-text to images is a task newcomers are successful with and, ultimately, decide whether it is impactful enough to implement as a suggested edit on the Web and/or in the Apps.
WE1.2.7	If we deploy the Multi-Check sidebar (desktop) at all wikis where the Reference Check is available, we will unlock our ability to present multiple Edit Checks within a new "mid-edit" moment without negatively impacting the quality of new content edits newcomers publish.
WE1.2.9	If we surface the ‘Add a Link’ Structured Task to new account holders who are reading Wikipedia articles through an A/B test on pilot wikis, then we expect to increase the percentage of these people who constructively activate on mobile by 10% compared to the control group.
WE1.2.10	If the Structured Content team improves the code health of the Article-level Image Suggestions data pipeline to meet 90% of code deduplication, article and section level image suggestion separation on the index level; and adapt the image suggestion evaluation tool to be able to get baselines for quality of suggestions for target wikis, then the “Add an Image” task can be released to newcomers on additional Wikipedias. This will enable the Growth team to pursue a follow-up hypothesis focused on increasing constructive activation across at least 10 additional Wikipedias.
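WE1.2.11	If we roll out the “Add a Link” Structured Task to at least 5% of newcomers on English Wikipedia, we expect newcomers with access to the task to constructively activate on mobile at a rate 10% higher than the baseline, as measured by an A/B test.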
WE1.3.3	If we improve the user experience and features of the Nuke extension during Q2, we will increase administrator satisfaction of the product by 5pp by the end of the quarter.
WE1.3.4	If we improve the user experience and features of Recent Changes, we will increase administrator satisfaction of the product by 5pp.
WE1.5.1	If we create a strategy brief by February 2025, including a prioritized strategy and trade-offs, we can use it as one of the main inputs for APP25/26.
WE1.5.2	If we develop a unified measurement strategy, we will enable evaluation of the multi-year product strategy for contributors and set the landscape for prioritization of next steps in metric development and reporting
WE2.1.5	If we expose topic-based translation suggestions more broadly and analyze their initial impact, we will learn which aspects of the translation funnel to act on in order to obtain more quality translations.
WE2.1.6	If we offer list-making as a service, we’ll enable at least 5 communities to make more targeted contributions in their topic areas as measured by (1) change in standard quality coverage of relevant topics on the relevant wiki and (2) a brief survey of organizer satisfaction with topic area coverage on-wiki.
WE2.1.7	If we develop a proof of concept that adds translation tasks sourced from WikiProjects and other list-building initiatives, and presents them as suggestions within the CX mobile workflow, then more editors will discover and translate articles focused on topical gaps. By introducing an option that allows editors to select translation suggestions based on topical lists, we will test whether this approach increases the content coverage in our projects.
WE2.2.4	If we document the pre-incubator, incubator, and post-incubator journeys for the five pilot wikis with quantitative and qualitative data, we will be able to better support new languages in the future.
WE2.4.4	If we develop a live proof-of-concept, using MediaWiki’s async content processing pipeline, for the first use case of Wikifunctions in Wikipedia, we will be ready to switch it on in the new year for the Dagbani community.
WE2.6.1	If we propagate the integration of Wikifunctions from Test2Wiki to a small production Wikipedia with the MVP user experience, we will see the feature used organically without being reverted.
WE2.6.2	If we make it possible to translate sentences in Wikifunctions from something “abstract” like a function, we will see an organic increase of at least 5 multilingual functions that generate natural language sentences. This is a milestone towards building an Abstract Wikipedia.
WE3.1.6	If we introduce a personalized rabbit hole feature in the Android app and recommend condensed versions of articles based on the types of topics and sections a user is interested in, we will learn if the feature is sticky enough to result in multi-day usage by 10% of users exposed to the experiment over a 30-day period, and a higher pageview rate than users not exposed to the feature.
WE3.1.8	(Q2-Q3, web) If we build one feature which provides additional article-level recommendations, we will see an increase in clickthrough rate of 10% over existing recommendation options and a significant increase in external referrals for users who actively interact with the new feature.
WE3.1.9	If we create a daily-use Wikipedia-based trivia game in the Android app, logged-out readers who engage with this feature will open the app on multiple days within a 20-day period at a rate at least 5% higher than those who do not engage with the feature.
WE3.1.10	If we develop and test design prototypes for tabbed browsing in the Wikipedia iOS app, we will gain and incorporate actionable insights on usability, while also enabling engineers to assess technical feasibility of different approaches, building a solid foundation for adding Tabs to the app in Q4.
WE3.1.11	If we make the article search bar more prominent, we will increase the number of users who initiate searches by 8%, possibly leading to a 1% increase in search retention rate for logged out users.
WE3.2.3	If we make the “Donate” button in the iOS App more prominent by making it one click or less away from the main navigation screen, we will learn if discoverability was a barrier to non banner donations.
WE3.2.4	If we update the contributions page for logged-in users in the app to include an active badge for someone that is an app donor and display an inactive state with a prompt to donate for someone that decided not to donate in app, we will learn if this recognition is of value to current donors and encourages behavior of donating for prospective donors, informing if it is worth expanding on the concept of donor badges or abandoning it.
WE3.2.7	Increasing the prominence of entry points to donations on the logged-out experiences of the Minerva web mobile and desktop experience will increase the clickthrough rate of the donate link by 30% YoY.
WE3.2.8	If we make improvements to the personalised and collective content of the iOS apps’ Year in Review, and scale its availability, we will learn if this is an effective fundraising method.
WE3.4.1	If we were to explore the feasibility by doing an experiment of setting up smaller PoPs in cloud providers like Amazon, we can expand our data center map and reach more users around the world, at reduced cost and increased turn-around time.
WE3.5.1	If we make it possible for Commons Data namespace pages to be categorized and surface their usage across wikis, Commons admins will have the minimum tools they need to manage the increased usage of the Data namespace, ensuring we can sustainably scale up deployment to all wikis
WE3.5.2	If we improve test coverage and documentation for Charts, we will be comfortable handing off maintenance and future feature development [to reading engineering, contractors, and volunteers], allowing us to wind down the project and task force.
WE3.5.3	If we seed the Community Wishlist with Charts features we know volunteers have asked for that are out of scope for the MVP, there will be a central place for volunteers and staff to discuss future Charts-related work, allowing the future maintainers to manage expectations and source input for annual planning
WE4.1.3	If we deploy the Incident Reporting System MVP to x more wikis (representative sample) we will be able to gather valuable data that will help us identify patterns of harmful conduct across wikis
WE4.1.4	If we engage stakeholders across key departments in structured discussions, we can collaboratively define a shared vision and realistic scope for the Incident Reporting System, aligned with organizational priorities and compliance requirements, providing valuable insights to inform annual planning.
WE4.2.11a	If we define a terminology and thresholds for revert risk scores across wikis, we will make it possible to use revert risk scores in a wider range of user facing anti-abuse tools. This hypothesis impacts the WE4.2 KR by doing the background work necessary to build upon revert risk scores.
WE4.2.20	Implement a trial enablement which will gather data on the efficacy of the new CAPTCHA on enabled wikis at preventing sockpuppet account creation and bot-based spam edits, to measure the efficacy and value of a production rollout of the new technology.
WE4.2.15	If we analyze attributes of blocked user accounts on multiple wikis, we will identify patterns across these accounts and assign weights based on the relative importance of each attribute on block rates to use in calculating a user account reputation score. The success of this hypothesis would be measured by whether we are successful in defining a formula for multiplying attributes of an account to provide an account reputation score that maps to blocked users.
WE4.2.10	If we add two more data points to the client hints collection pipeline, we will have more entropy to better identify sockpuppets and potential ban evasion. We will know we are successful when we are able to use the client hints data to identify X% of confirmed sockpuppets on en:Wikipedia:Sockpuppet investigations, or when we are able to use the collected data to identify Y% of suspected ban evasion pairs. This hypothesis directly contributes to the KR by providing new signals (browser canvas fingerprint, list of fonts) that will allow CheckUsers to more precisely target sockpuppets and accounts attempting to evade bans.
WE4.2.14b	If we introduce IP reputation data variables in AbuseFilter variables, we will enable mitigations that can reduce the amount of submissions of vandalism, spam and abuse. Context: This directly contributes to the KR goal by introducing a new signal (IP reputation) to allow for more precision in mitigations (only actions matching the variable are impacted). We could measure the impact of this hypothesis by examining the volume of reverted edits on wikis before/after the variables are introduced. (Other ideas?) We would initially introduce variables like “is likely a VPN” or “is likely a proxy”. We could also consider exposing other variables, depending on discussions in T354599: Make IP reputation available as a variable in AbuseFilter.
WE4.2.14a	If we analyze IP reputation data associated with problematic editing activity and user accounts, we will be able to prioritize a set of IP reputation facets that can be provided as variables in AbuseFilter. This analysis would then be used by WE4.2.14b later in Q3 to build out the variables in AbuseFilter, along with specific guidance about what mitigations would be reasonable to use alongside a given set of IP reputation variables. For example, the recommended mitigation for one IP reputation variable might be to block edits outright, while the recommended mitigation for a different IP reputation variable might be to tag the edit for further review, or to show a CAPTCHA.
WE4.2.18a	If we design and build a clickable component to display public data related to user account reputation to functionaries, we will be able to learn if this is useful to them by observing the number of repeat usages of the tool
WE4.3.3b	If we deploy a proof of concept of the 'Liberica' load balancer, we will measure a 33% improvement in our capacity to handle TCP SYN floods
WE4.3.6b	If we integrate the output of the models we built in WE 4.3.1 with the dynamic thresholds of per-IP concurrency limits we've built for our TLS terminators in WE 4.3.2, we should be able to automatically neutralize attacks with 20% more volume, as measured with the simulation framework we're building.
WE4.3.8	If we deploy the Liberica load balancers to all datacenters, we will increase the capacity to handle TCP SYN floods by 33% everywhere
WE4.3.9	If we establish and follow a verified procedure for the regular testing of large-scale abuse scenarios, then we will consistently measure and improve our ability to respond effectively to such incidents.
WE4.3.10	If we define a policy for review and maintenance of requestctl rules, we will keep the system understandable and manageable over time
WE4.3.11	If we can identify patterns and separate web scraping from general traffic, we will be able to report on it systematically, reduce that traffic, and maintain the sustainability of our serving infrastructure.
WE4.4.3	If we improve the interface of the iOS app, we will be able to clearly communicate how temporary accounts work to users as they edit without logging in, and the iOS app will be prepared for the imminent release of temporary accounts to all projects.
WE4.4.4	If we update the data models in the data lake, and the corresponding data pipelines and dashboards, to accurately represent the new user account types, we'll be able to provide accurate analytics reporting related to activities of corresponding user types.
WE4.4.5	If we resolve all remaining product, design and legal blockers for the engineering work that needs to be done before the major pilots deployment, we will be able to complete the engineering work on time for the next round of pilot deployment.
WE5.1.9	If we enable Parsoid on Incubator and all newly created Wikis by Q2, we’ll further ensure sustainability by not allowing the number of wikis that run on the legacy parser to grow. We will measure the success of this rollout through detailed evaluations using the Confidence Framework reports, with a particular focus on Visual Diff reports and the metrics related to performance and usability. Additionally, we will assess the reduction in the list of potential blockers, ensuring that critical issues are addressed prior to wider deployment.
WE5.1.11	The Observability team aims to sunset Graphite by enabling read-only mode and disabling new metric ingest by the end of Q3 FY2024/2025. To achieve this goal, the team has set a 90% coverage target for converting the remaining dashboards and retiring legacy metrics and panels that point to Graphite metrics.
WE5.1.12	If we release an interactive documentation sandbox for MediaWiki REST APIs, it will introduce a repeatable pattern for low maintenance, high quality API documentation while making the APIs easier to adopt for developers around the world. This will ensure that our API documentation is fully up to date, testable, and localized for generations of developers, while reducing the maintenance cost and increasing sustainability for API publishers.
WE5.1.13	If we roll out SUL3 for all existing accounts and new account creation across all wikis, we will ensure compatibility with browser anti-tracking measures and improve security, by moving authentication to a dedicated domain that requires user interaction and further prevents XSS vulnerabilities.
WE5.2.5	If we model at least one more page state change (e.g. PageDelete) as a PHP event and drive further adoption of in-process domain events across MediaWiki components and extensions currently utilizing event-like hooks, then we will build confidence in events as a platform sustainability pattern by improving component boundaries, improving interface flexibility, and reducing high risk boilerplate code.
WE5.2.6	If we explore designing an architecture for serializing and broadcasting events generated within MediaWiki core, we will create a foundation for offering first class event support that will enable us to consume events outside of the originating MediaWiki PHP process (e.g. JobQueue, EventBus). This will make MediaWiki data more reusable beyond the MediaWiki platform.
WE5.2.7	If we identify and align on a set of domains that can be used for MediaWiki platform events by the end of Q3, we will have an initial map of core component boundaries and can improve consistency across MediaWiki interfaces by utilizing the same domains for the MediaWiki REST API modules.
WE5.2.8	If we clearly define the concept of extension interfaces in the MediaWiki documentation, we can make it easier to develop new functionality on top of MediaWiki and provide a clearer path for defining new extension interfaces, such as Domain Events. We will measure this by identifying places in the documentation where extension interfaces are presented as “extension types” and replacing 100% of those instances.
WE5.4.3	If we provide developers with PHP 8.1 MediaWiki images and infrastructure for testing them on Kubernetes, they will be able to validate and certify them to be deployed to production. If we also develop infrastructure for progressive traffic migration and use it to safely migrate production to PHP 8.1, this helps MediaWiki drop unsupported PHP versions in the upcoming May release. Success will be observed by the ability to ramp up production traffic to PHP 8.1 instances.
WE5.4.4	If we decouple the legacy dumps processes from their current bare-metal hosts and instead run them as workloads on the DSE Kubernetes cluster, this will bring about demonstrable benefit to the maintainability of these data pipelines and facilitate the upgrade of PHP to version 8.1 by using shared MediaWiki containers.
WE5.4.6	If the beta cluster is configured to run MediaWiki with PHP 8.1 then the Data Platform Engineering group and their SRE team will be able to validate whether the existing dumps code functions correctly, or whether any significant functional changes would be required.
WE5.5.1	If, by the end of January, we are able to measure and monitor Wikimedia hosted dumps traffic using log data, we will have clarity on how users are consuming the different dumps formatting options and access points. This will unblock additional metrics for overall consumption across streams, and improve our understanding of what users care about in terms of recency, data completion, and structure, so that we can tailor the overall API strategy accordingly.
WE5.5.2	If, by the end of Q3, we create a consolidated view of developer personas and use cases collected through a listening and discovery tour, then we will uncover lesser understood gaps and opportunities in this space. This will leverage existing work completed by stakeholder teams in their respective areas (eg: Dumps, WME), in addition to creating new insights by conducting interviews with WMF staff, technical volunteers, and high impact content reuse partners (eg: WME customers and prospects).
WE6.1.7	If we review the user feedback, decide on a code search and code browsing solution, deploy it to the production infrastructure as an officially supported service and enable indexing of both existing and new repositories from both code tracking systems, we will increase the scope of code that is indexed and searchable and simplify the process of locating code in day to day operations as well as during incident response.
WE6.1.8	If we analyze the documentation metrics scores from our test dataset, we can evaluate the usefulness and effectiveness of the draft metrics, collect feedback, and provide actionable insights for implementing automated metrics computation
WE6.1.9	If we transition 5 additional access groups to management within the Identity Management system, it will enhance access governance by improving efficiency, significantly reducing TOIL and improving the onboarding experience for incoming Wikimedia staff and new members of the technical communities.
WE6.2.2	If we replace the backend infrastructure of our existing shared MediaWiki development and testing environments (from apache virtual servers to kubernetes), it will enable us to extend its uses by enabling MediaWiki services in addition to the existing ability to develop MediaWiki core, extensions, and skins in an isolated environment. We will develop one environment that includes MediaWiki, one or more Extensions, and one or more Services.
WE6.2.3	If we create a new deployment UI that provides a web interface for deployments that is open to existing deployers it will allow backporters to have a shared view of deployments in progress and provide greater visibility for deployments in progress.
WE6.2.5	If we publish a planning doc to move single-version routing out of MediaWiki and gather comments from stakeholders on the implementation, then we will reduce friction during implementation.
WE6.2.6	If we gather feedback from QTE, SRE, and individuals with domain specific knowledge and use their feedback to write a design document for deploying and using the wmf/next OCI container, then we will reduce friction when we start deploying that container.
WE6.2.7	If we make a deployment web UI available behind our single sign-on system and open it to the Wikimedia development community it will increase the number of backport deployers.
WE6.2.8	Continuing on the capabilities of Catalyst to deliver pre-merge test environments of MediaWiki and its extensions & skins on Kubernetes, if we facilitate deployments of pre-merge patches for MediaWiki services, by running pre-merge tests for Wikifunctions, then contributors will be able to test more MediaWiki projects with stable, well-defined, isolated test environments.
WE6.2.9	If we test the proposed MediaWiki routing implementation with a single wiki, we will have proven the plan works and can proceed with an accelerated rollout to other wikis and we will be able to route a single version container to Wikimedia’s wiki hosting infrastructure.
WE6.3.7	By establishing detailed measurement criteria and evolution guidelines for our sustainability framework, we will create an actionable scoring system for platform improvements.
WE6.3.8	Engaging with prospective users to explore Toolforge UI’s early design prototype will help us uncover improvement opportunities and risks to be addressed in a follow-up iteration.
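
As an illustration of the weighting idea in WE4.2.15 above, the following is a minimal sketch in Python of a weighted account reputation score. The attribute names and weights are hypothetical placeholders, not values from any Wikimedia analysis, and a normalized weighted sum is only one possible form of the formula the hypothesis sets out to define.

```python
from typing import Mapping

# Hypothetical weights for illustration only. WE4.2.15 proposes deriving real
# weights from an analysis of blocked accounts across multiple wikis.
EXAMPLE_WEIGHTS = {
    "reverted_edit_ratio": 0.5,
    "account_younger_than_one_day": 0.3,
    "edits_from_flagged_ip": 0.2,
}


def reputation_score(attributes: Mapping[str, float],
                     weights: Mapping[str, float]) -> float:
    """Return a weighted average of attribute values.

    Each attribute value is expected to lie in [0, 1]; attributes that are
    missing from the input count as 0. With these example weights a higher
    score means the account looks more like previously blocked accounts.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(
        weight * attributes.get(name, 0.0) for name, weight in weights.items()
    )
    return weighted_sum / total_weight
```

Whether such a score "maps to blocked users", as the hypothesis puts it, would be checked by comparing scores of blocked and unblocked accounts; that evaluation step is not shown here.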

Signals & Data Services (SDS) Hypotheses [ SDS Key Results ] Discussion
Hypothesis shortname	Q3 text	Details & Discussion
SDS1.1.1	If we partner with an initiative owner and evaluate the impact of their work on Core Foundation metrics, we can identify and socialize a repeatable mechanism by which teams at the Foundation can reliably impact Core Foundation metrics.
SDS1.1.2	If we assess the impact of the new South American data center (MAGRU) on our relevance metric (unique devices), we will be able to produce a report that provides insights into the return on investment of current and future data center investments.
SDS1.3.1.B	If we integrate the Spark / DataHub connector for all production Spark jobs, we will get column-level lineage for all Spark-based data platform jobs in DataHub.
SDS1.3.2.B	If we implement a frequently run Spark-based MariaDB MW history data querying job, reconcile missing events and enrich them, we will provide a daily updated MW history wikitext content data lake table.

Future Audiences (FA) Hypotheses [ FA Key Results ] Discussion
Hypothesis shortname	Q3 text	Details & Discussion
FA1.1.1	If we make off-site contribution very low effort with an AI-powered “Add a Fact” experiment, we can learn whether off-platform users could help grow/sustain the knowledge store in a possible future where Wikipedia content is mainly consumed off-platform.

Product and Engineering Support (PES) Hypotheses [ PES Key Results ] Discussion
Hypothesis shortname	Q3 text	Details & Discussion
PES1.1.2	If we choose three main areas in which to highlight efforts being made to improve our culture of review, and communicate about them in the right channels, we will see improvements in the responses for iterative development, decision-making, and collaboration in the next culture survey (Jan 2025).
PES1.1.3	If we send a revised culture survey, we will identify areas where we can provide support to managers to continue strengthening our culture of review.
PES1.3.5	If we create a Wikipedia-based game for daily use that highlights the connections across vast areas of knowledge, it will encourage consumers to visit Wikipedia regularly and facilitate active learning, leading to increased interaction with content on Wikipedia and longer session lengths.
PES1.3.6	If we apply lessons from the first Sprinthackular to a second event focused on improving prototyping tools and processes, at least one Sprinthackular project will show enough value and promise that it can be integrated into the APP. We'll also be able to develop a repeatable Sprinthackular framework that other teams will recognize that they can adopt to explore any focus area!
PES1.5.1	(Starting Oct 1) If we finalize and publish the Edit Check SLO draft, practice incorporating it in regular workflows and decisions, and draft a Citoid SLO, we’ll continue learning how to define and track user-facing and cross-team SLOs together.
PES1.5.2	(Starting Oct 1) If we clarify and define in a written document the set of roles and responsibilities of stakeholders throughout the service lifecycle, this will enable teams to make informed commitments in the Service Catalog, including SLOs

Q4

The last quarter (Q4) of the WMF annual plan covers April-June.

Wiki Experiences (WE) Hypotheses [ WE Key Results ] Discussion
Hypothesis shortname	Q4 text	Details & Discussion
WE1.2.9	If we surface the ‘Add a Link’ Structured Task to new account holders who are reading Wikipedia articles through an A/B test on pilot wikis, then we expect to increase the percentage of these people who constructively activate on mobile by 10% compared to the control group.
WE1.2.12	If we show multiple Reference Checks within an edit session to newcomers participating in an A/B test, we will learn whether this change in Check payload/edit session causes desirable shifts in edit quality and edit completion.
WE1.2.13	If we conduct usability tests of an initial engineered version of Peacock Check with ≥10 newcomers and Junior Contributors and ≥80% of them describe the experience using terms like "helpful," "makes sense," and "clear", then we can be confident the proposed UX has the potential to lower the rate at which the new content edits are reverted on the grounds of WP:WTW (and related policies)
WE1.2.14	If we build a model that can detect peacock language within in-progress edits with 90% precision and Y inference latency, then we’ll be able to provide an editing experience that doesn't fully rely on human moderators to detect peacock language in newly-published edits.
WE1.3.4	If we improve the user experience and features of Recent Changes, we will increase administrator satisfaction with the product by 5pp.
WE1.3.6	If we improve the user experience and features of the Watchlist, we will increase patroller satisfaction with the product by 5pp.
WE1.4.1	If we develop a plan to release the CampaignEvents extension in batches based on regional targets, the extension will be released to at least 10 more wikis by mid-Q4.
WE1.4.3	If we expand how people can access Event Registration on the wikis, then we will be able to diversify the user base of the CampaignEvents extension, as measured by at least X collaborations from underrepresented audiences (such as: backlog drives, writing contests, and events organized by WikiProjects) using Event Registration by the end of Q4.
WE1.5.2	If we develop a unified measurement strategy, we will enable evaluation of the multi-year product strategy for contributors and set the landscape for prioritization of next steps in metric development and reporting
WE1.6.1	If we introduce the ability for volunteers to add template favourites, then at least 1,000 contributors will favourite 1 template.
WE2.5.2	If we make Collections and Topic-based filters easier to access for translators on desktop and mobile, more users would discover these suggestions, leading to an increase in the publication of translations suggested through these filters.
WE2.5.3	If we identify upcoming translation campaigns in Q3 and Q4, provide list-building support to organizers where needed, and make the lists visible under Collections in the Content Translation tool, we will increase the number of high-quality published articles that address topical gaps.
WE2.6.1	If we propagate the integration of Wikifunctions from Test2Wiki to a small production Wikipedia with the MVP user experience, we will see the feature used organically without being reverted.
WE2.6.2	If we make it possible to translate sentences in Wikifunctions from something “abstract” like a function, we will see an organic increase of at least 5 multilingual functions that generate natural language sentences. This is a milestone towards building an Abstract Wikipedia.
WE2.6.3	If the Content Transform Team resolves the wikitext-support tasks necessary for using Wikifunctions on wiki pages cross-wiki, it unblocks the Abstract Wikipedia team's work on integrating Wikifunctions on a small-language Wikipedia.
WE2.6.4	If we establish and meet performance standards, we can have confidence that rolling out Wikifunctions access to more wikis will not disrupt those wikis' experiences or colleagues' work.
WE2.6.5	If we roll out Wikifunctions access to more Wikimedia wikis, we will see wider use to deliver content and learn how well it works with different languages and communities to address content gaps.
WE2.7.2	If 3,000 well-described images of South American and/or African species are released to the wider biodiversity community through 2-3 editing events and an on-wiki worklist, 300 new images will be utilized on Spanish, French, and Portuguese Wikimedia projects.
WE3.1.9	If we create a daily-use Wikipedia-based trivia game in the Android app, logged-out readers who engage with this feature will open the app on multiple days within a 20-day period at a rate at least 5% higher than those who do not engage with the feature.
WE3.1.12	If we introduce a pre-generated summary feature as an opt-in feature on the mobile site of a production wiki, we will be able to measure a CTR greater than 4%, ensure no negative effects to session length, pageviews, or internal referrals, and use this data to decide how and if we will further scale the summary feature.
WE3.1.13	If we approach summary moderation design collaboratively with communities, through surveys and other on-wiki discussions, we will be able to determine the minimal viable moderation workflow required for initial scaling of the feature and clarify whether moderation should be community-led, automated (at the prompt level), or some combination of both.
WE3.1.14	If we scale a daily-use Wikipedia-based trivia game in the Android app, logged-out readers who complete the game will open the app on multiple days at a rate 5% higher than those who do not receive a promotion for the game, and thus do not play it.
WE3.1.15	If we introduce personalized reading lists in the Android app and recommend articles based on articles users are interested in, we will see a 5% increase in reading list feature retention.
WE3.1.16	If we put an ideal version of a WikiPodcasting feature and a scrappy Wiki Text-to-speech feature on Android in front of users, they will convey they’d repeatedly use the WikiPodcasting feature, but would not use the scrappy Text-to-Speech version outside of accessibility needs.
WE3.2.7	Increasing the prominence of entry points to donations on the logged-out experiences of the Minerva web mobile and desktop experience will increase the clickthrough rate of the donate link by 30% Year over Year.
WE3.4.1	If we run an experiment to explore the feasibility of setting up smaller PoPs in cloud providers like Amazon, we can expand our data center map and reach more users around the world, at reduced cost and with faster turn-around time.
WE3.5.2	If we address the major formatting and display issues with charts raised during the pilot wiki phase, we will feel confident scaling up deployments to more wikis by the end of Q3.
WE3.5.3	If we implement a solution for filtering data sets used to generate charts using Lua, volunteers will have the flexibility they need to cover the majority of their data management needs and will be satisfied with the state of the MVP when the project winds down in Q4.
WE3.5.4	If we improve test coverage and documentation for Charts, we will be comfortable handing off maintenance and future feature development [to reading engineering, contractors, and volunteers], allowing us to wind down the project and task force
WE3.5.5	If we seed the Community Wishlist with Charts features we know volunteers have asked for that are out of scope for the MVP, there will be a central place for volunteers and staff to discuss future Charts-related work, allowing the future maintainers to manage expectations and source input for annual planning.
WE4.1.3	If we deploy the Incident Reporting System MVP to x more wikis (representative sample) we will be able to gather valuable data that will help us identify patterns of harmful conduct across wikis.
WE4.1.4	If we engage stakeholders across key departments in structured discussions, we can collaboratively define a shared vision and realistic scope for the Incident Reporting System in the coming year, aligned with organizational priorities and compliance requirements, providing valuable insights to inform annual planning.
WE4.1.5	If we create a dashboard to monitor key metrics, we will be able to evaluate how people are using the system and what type of incidents are being reported which will help us make decisions about possible countermeasures in Q4.
WE4.2.11a	If we define a terminology and thresholds for revert risk scores across wikis, we will make it possible to use revert risk scores in a wider range of user facing anti-abuse tools. This hypothesis impacts the WE4.2 KR by doing the background work necessary to build upon revert risk scores.
WE4.2.14a	If we analyze IP reputation data associated with problematic editing activity and user accounts, we will be able to prioritize a set of IP reputation facets that can be provided as variables in AbuseFilter. This analysis would then be used by WE4.2.14b later in Q3 to build out the variables in AbuseFilter, along with specific guidance about what mitigations would be reasonable to use alongside a given set of IP reputation variables. For example, the recommended mitigation for one IP reputation variable might be to block edits outright, while the recommended mitigation for a different IP reputation variable might be to tag the edit for further review, or to show a CAPTCHA.
WE4.2.14b	If we introduce IP reputation data as variables in AbuseFilter, we will enable mitigations that can reduce the amount of submissions of vandalism, spam and abuse. Context: This directly contributes to the KR goal by introducing a new signal (IP reputation) to allow for more precision in mitigations (only actions matching the variable are impacted). We could measure the impact of this hypothesis by examining the volume of reverted edits on wikis before/after the variables are introduced. (Other ideas?) We would initially introduce variables like “is likely a VPN” or “is likely a proxy”. We could also consider exposing other variables, depending on discussions in T354599: Make IP reputation available as a variable in AbuseFilter.
WE4.2.15	If we analyze attributes of blocked user accounts on multiple wikis, we will identify patterns across these accounts and assign weights based on the relative importance of each attribute on block rates to use in calculating a user account reputation score. The success of this hypothesis would be measured by whether we are successful in defining a formula for multiplying attributes of an account to provide an account reputation score that maps to blocked users.
WE4.2.18	If we design and build a clickable component to display public data related to user account reputation, we will be able to learn if this is useful to them by observing the number of repeat usages of the tool
WE4.2.20	Implement a trial enablement that gathers data on how well the new CAPTCHA prevents sockpuppet account creation and bot-based spam edits on enabled wikis, in order to measure the efficacy and value of a production rollout of the new technology.
WE4.3.3b	If we deploy a proof of concept of the 'Liberica' load balancer, we will measure a 33% improvement in our capacity to handle TCP SYN floods
WE4.3.6b	If we integrate the output of the models we built in WE 4.3.1 with the dynamic thresholds of per-IP concurrency limits we've built for our TLS terminators in WE 4.3.2, we should be able to automatically neutralize attacks with 20% more volume, as measured with the simulation framework we're building.
WE4.3.8	If we deploy the Liberica load balancers to all datacenters, we will increase the capacity to handle TCP SYN floods by 33% everywhere
WE4.3.9	If we establish and follow a verified procedure for the regular testing of large-scale abuse scenarios, then we will consistently measure and improve our ability to respond effectively to such incidents.
WE4.3.10	If we define a policy for review and maintenance of requestctl rules, we will keep the system understandable and manageable over time
WE4.3.11	If we can create an algorithm for analyzing web request patterns in our logs, we will be able to differentiate between user behaviors. We will be able to manually run the analysis to generate txt files for reviewing possible scraping patterns that are of high cost to the Foundation.
WE4.4.3	If we improve the interface of the iOS app, we will be able to clearly communicate how temporary accounts work to users as they edit without logging in, and the iOS app will be prepared for the imminent release of temporary accounts to all projects.
WE4.4.4	If we update the data models in the data lake, and the corresponding data pipelines and dashboards, to accurately represent the new user account types, we'll be able to provide accurate analytics reporting related to activities of corresponding user types.
WE4.4.5	If we resolve all remaining product, design and legal blockers for the engineering work that needs to be done before the major pilots deployment, we will be able to complete the engineering work on time for the next round of pilot deployment.
WE5.1.11	The Observability team aims to sunset Graphite by enabling read-only mode and disabling new metric ingest by the end of Q3 FY2024/2025. To achieve this goal, the team has set a 90% coverage target for converting the remaining dashboards and retiring legacy metrics and panels that point to Graphite metrics.
WE5.2.11	If we finalise the new interface for Notifications in MediaWiki core, we will be able to deprecate the existing interfaces used by Echo and move to more sustainable Notifications feature development by moving extensions to an interface that is simpler and more decoupled.
WE5.2.12	We will effectively demonstrate a sustainable domain event pattern if we complete modeling, implementation, and adoption of in-process PHP domain events for all page state changes (update, delete, move, create, undelete, protection, visibility) and document the intended next steps to achieve the long term value of this work.
WE5.2.13	If we update the EventBus extension to utilize in-process PHP events for page state changes and conduct initial research to verify the feasibility of implementing a long-lived PHP Kafka listener, we will demonstrate that domain events are a viable option for both broadcasting and consuming events for use cases beyond MediaWiki.
WE5.4.4	If we decouple the legacy dumps processes from their current bare-metal hosts and instead run them as workloads on the DSE Kubernetes cluster, this will bring about demonstrable benefit to the maintainability of these data pipelines and facilitate the upgrade of PHP to version 8.1 by using shared MediaWiki containers.
WE5.4.7	If we use the newly developed and tested infrastructure for progressively deploying PHP 8.1 to production completely, this will help MediaWiki drop unsupported PHP versions in the upcoming May release.
WE5.5.2	If, by the end of May, we create a consolidated view of developer personas and use cases collected through a listening and discovery tour, then we will uncover lesser understood gaps and opportunities in this space. This will leverage existing work completed by stakeholder teams in their respective areas (eg: Dumps, WME), in addition to creating new insights by conducting interviews with WMF staff, technical volunteers, and high impact content reuse partners (eg: WME customers and prospects).
WE6.1.7	If we review the user feedback, decide on a code search and code browsing solution, deploy it to the production infrastructure as an officially supported service and enable indexing of both existing and new repositories from both code tracking systems, we will increase the scope of code that is indexed and searchable and simplify the process of locating code in day to day operations as well as during incident response.
WE6.1.10	If we publish a machine-readable list of WMF-deployed repositories that aligns with Bitergia’s schema and maintain it through CI/CD, we will reduce maintenance overhead, ensure data accuracy and enable efficient filtering of repository data in our developer experience dashboards enabling us to answer two questions systematically.
WE6.2.7	If we make a deployment web UI available behind our single sign-on system and open it to the Wikimedia development community it will increase the number of backport deployers.
WE6.2.8	Continuing on the capabilities of Catalyst to deliver pre-merge test environments of MediaWiki and its extensions & skins on Kubernetes, if we facilitate deployments of pre-merge patches for MediaWiki services, by running pre-merge tests for Wikifunctions, then contributors will be able to test more MediaWiki projects with stable, well-defined, isolated test environments.
WE6.3.7	By establishing detailed measurement criteria and evolution guidelines for our sustainability framework, we will create an actionable scoring system for platform improvements.
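
WE4.2.14a above gives an example in which different IP reputation variables call for different mitigations (block the edit, show a CAPTCHA, or tag the edit for review). A minimal Python sketch of that idea follows. The variable names, the pairing of variables with mitigations, and the severity ordering are all hypothetical, and this is not AbuseFilter rule syntax; the actual guidance is what the hypothesis is meant to produce.

```python
from typing import Mapping, Optional

# Hypothetical pairing of IP reputation variables with recommended mitigations.
RECOMMENDED_MITIGATION = {
    "is_likely_proxy": "block",
    "is_likely_vpn": "captcha",
    "is_shared_network": "tag",
}

# Mitigations ordered from least to most restrictive (an assumption).
SEVERITY = ("tag", "captcha", "block")


def choose_mitigation(ip_flags: Mapping[str, bool]) -> Optional[str]:
    """Return the most restrictive mitigation among the variables that are set.

    Returns None when no known variable is set, meaning the edit proceeds
    without any IP-reputation-based mitigation.
    """
    candidates = [
        RECOMMENDED_MITIGATION[name]
        for name, is_set in ip_flags.items()
        if is_set and name in RECOMMENDED_MITIGATION
    ]
    if not candidates:
        return None
    return max(candidates, key=SEVERITY.index)
```

For example, an edit whose IP is flagged only as a likely VPN would be shown a CAPTCHA under this mapping, while one flagged as both a likely VPN and a likely proxy would be blocked.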

Signals & Data Services (SDS) Hypotheses [ SDS Key Results ] Discussion
Hypothesis shortname	Q4 text	Details & Discussion
SDS1.1.1	If we partner with an initiative owner and evaluate the impact of their work on Core Foundation metrics, we can identify and socialize a repeatable mechanism by which teams at the Foundation can reliably impact Core Foundation metrics.
SDS1.1.2	If we assess the impact of the new South American data center (MAGRU) on our relevance metric (unique devices), we will be able to produce a report that provides insights into the return on investment of current and future data center investments.
SDS1.4.1	If the DPE team supports the migration of the knowledge gaps metric pipeline to the new wmf_content.mediawiki_content_history_v1 table by expediting resolution of blocking issues, then we can prove the usefulness of this new table, while also improving the reliability of the knowledge gaps pipeline.
SDS1.4.2	The research team will adopt the wmf_content.mediawiki_content_history_v1 table for all existing use cases in which they currently use the deprecated wmf.mediawiki_wikitext_history.
SDS1.4.3	If we provide a daily updated table wmf_content.mediawiki_content_current_v1 in the datalake that includes the content of the current revision for all pages for all wikis, we will then simplify the integration work and reduce compute resources necessary for downstream consumers that only care about the latest state.
SDS1.4.4	If we adopt the new wmf_content.mediawiki_content_history_v1 datalake table to produce image suggestions, then the IS data pipelines will be more stable.
SDS2.4.3	If we prepare for external community engagement, we can engage stakeholders, structure the narrative to clarify what we want to achieve, and establish the project plan for a productive community engagement that determines whether or not we will greenlight the deployment of unique cookies for logged out users in the future.
SDS2.4.4	If we successfully implement and deploy Edge Uniques cookies in our production CDN, we will have a basis upon which robust A/B testing for anonymous readers can be implemented.
SDS2.4.7	If we update the Experiment Platform’s Javascript and PHP client libraries to handle experiment enrollment data for logged-out users, we can enable A/B testing on anonymous users.
SDS2.4.8	If we modify EventGate to accept experiment enrollment data and opt out of collecting user agents, we can enable collection of data for A/B testing, and teams can lower their data collection risk tier.
SDS2.4.9	If hashed versions of edge unique cookie hash values can be generated, transmitted, and validated as being collision-resistant, then it will become possible to use them for experiment analysis using both current bespoke methods as well as Growthbook.
SDS2.4.10	If we create an Experiment Manager API in MediaWiki, we can standardize experiment configuration and data collection.
SDS2.4.11	If we conduct at least one end-to-end A/A test on anonymous users using Edge Uniques (see SDS 2.4.4), we can validate our experiment enrollment sampling algorithm (SDS 2.4.9) working with Edge Uniques and the accuracy of our data collection.
SDS2.4.13	If we make Experimentation Lab’s UI compatible with the technical infrastructure that supports experimenting with anonymous users, we’ll enable experiment owners to A/B test with logged-out traffic and validate the new functionality end-to-end, using our MVP platform.
SDS2.4.16	If we create a Superset dashboard to report experiment results based on interaction (non-fundraising) metrics from the measurement plan (SDS 2.4.14), the product team will have fast and ready access to initial insights as soon as data becomes available in our Data Lake – and full insights shortly after the A/B test has concluded – without depending on a data specialist (product analyst).

Future Audiences (FA) Hypotheses [ FA Key Results ] Discussion
Hypothesis shortname	Q4 text	Details & Discussion
FA1.1.2	Can we reach less-engaged younger audiences by remixing community-curated Wikipedia content into short video and posting on popular short video platforms?
FA1.1.3	Can a Discord bot help us learn about whether and how people might want to interact with a conversational-AI-powered Wikipedia off-platform, and help us reach/increase engagement with Wikipedia among younger audiences?
FA1.1.4	If we build a new Wikipedia experience on Roblox, we will learn if this could be an effective way to introduce our brand to younger (Gen Alpha) audiences.

Product and Engineering Support (PES) Hypotheses [ PES Key Results ] Discussion
Hypothesis shortname	Q1 text	Details & Discussion
PES1.1.2	If we choose three main areas in which to highlight efforts being made to improve our culture of review, and communicate about them in the right channels, we will see improvements in the responses for iterative development, decision-making, and collaboration in the next culture survey (Jan 2025).
PES1.3.5	If we create a Wikipedia-based game for daily use that highlights the connections across vast areas of knowledge, it will encourage consumers to visit Wikipedia regularly and facilitate active learning, leading to increased interaction with content on Wikipedia and longer session lengths.
PES1.5.3	If we contextualize the relevance of the Roles and Responsibilities document and the Service Catalog for a senior leadership audience, by connecting these tools' value to dept strategic goals, then we will be prepared to deliver a thorough Decision Brief for leadership to review. The decisions from the Decision Brief will determine if we have the necessary foundation for tracking, reporting, and decision-making via SLOs as a standard and scalable practice.
PES1.5.4	If we draft an Experiments Lab SLO, and practice incorporating the EditCheck and Citoid SLOs into regular workflows and decisions, we'll expand our understanding of cross-team SLOs by applying the approach to projects in new stages of their organizational lifecycle.
PES1.7.1	If we research how wishes should be processed (internally and externally), then we can gather actionable insights to augment the Wishlist in the short term, and put forth long-term recommendations on the needs of the Wishlist to serve internal stakeholders.
PES1.7.2	If we draft a decision brief about addressing the long-term maintainability of the Wishlist software, including inputs from stakeholder research and feedback from wishlist-consultants, we'll be able to choose a specific approach that meets the needs of our users.
PES1.7.3	If someone runs point between the wishes and wishlist-consultants, we can effectively communicate responses back to volunteers, where “effective” means that we communicate the “why” of decisions and that the consequences don’t cause negative ripple effects in community sentiment.
PES1.7.4	If we run a Wishathon, we will get at least 20 patches on wishes during the event, which will tell us if those wishes are suitable to be worked on.

Explanation of buckets

Wiki Experiences

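The purpose of this bucket is to efficiently deliver, improve, and innovate on wiki experiences that enable the distribution of free knowledge worldwide. This bucket aligns with Movement Strategy recommendations #2 (Improve User Experience) and #3 (Provide for Safety and Inclusion). Our audiences include not only everyone who contributes to our websites, but also readers and other consumers of free knowledge. We support a top-10 global website and many other important free culture resources. These systems have performance and uptime requirements on par with the biggest tech companies in the world. We provide user interfaces to the wikis, translation, developer APIs (and more), along with the supporting applications and infrastructure, which together form a robust, global platform where volunteers collaborate to produce free knowledge. Our objectives for this bucket should improve our core technology and capabilities, continuously improve the experience of the volunteer editors and moderators of our projects, improve the experience of all technical contributors working to improve or enhance the wiki experiences, and ensure a great experience for readers and consumers of free knowledge worldwide. We will do this through product and technology work, as well as through research and marketing. We expect to have up to five objectives for this bucket.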

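Knowledge is constructed by people! For that reason, our annual plan will focus not only on content, but also on the people who contribute to that content and the people who access and read it.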

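Our aim is to produce an operating plan based on existing strategy, centered on our hypotheses about the “flywheel” of contributors, consumers, and content. The main shift I want to pursue is to emphasize the content portion of the flywheel and to explore what moderators and functionaries need from us now, with the aim of identifying community health metrics in the future.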

Signals & Data Services

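To meet the Movement Strategy recommendations on Ensuring Equity in Decision Making (Recommendation #4), Improving User Experience (Recommendation #2), and Evaluating, Iterating and Adapting (Recommendation #10), decision makers across the Wikimedia movement need data, models, insights, and tools in order to make better strategic decisions. These must be reliable, relevant, and accessible in a timely way, and they must help decision makers assess the impact (both realized and potential) of their own work and the work of their communities.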

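In the Signals & Data Services bucket, we have identified four primary audiences: Wikimedia Foundation staff, Wikimedia affiliates and user groups, developers who reuse our content, and Wikimedia researchers. We will prioritize and address the data and insights needs of these audiences. Our work spans a range of activities: defining gaps, developing metrics, building pipelines for computing metrics, and developing data and signals exploration experiences and pathways that help decision makers interact with the data and insights more effectively and more enjoyably.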

Future Audiences

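This bucket explores strategies for expanding beyond our existing audiences of consumers and contributors, so that we truly reach everyone in the world as the essential infrastructure of the ecosystem of free knowledge. It aligns with Movement Strategy recommendation #9 (Innovate in Free Knowledge). We still offer our traditional service, a website with articles, yet people increasingly consume information through different experiences and formats: using voice assistants, spending time watching video, engaging with AI, and more. In this bucket we want to propose and test hypotheses about several possible long-term futures for the free knowledge ecosystem, and about how we remain its essential infrastructure in each of them. We will do this not only through product and technology work, but also through research, partnerships, and marketing. As we identify promising future states, learnings from this bucket will flow into Buckets #1 and #2 and be expanded across successive annual plans, positioning the Foundation's product and technology offerings to serve the knowledge-seekers of the future. Because we are working to bring a vision for the future of free knowledge into focus, the objectives for this bucket must encourage experimentation and exploration.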

Sub-buckets

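In addition to the above, we have two other “sub-buckets”. Both consist of areas of critical function that must exist at the Foundation to support our basic operations, some of which we have in common with any software organization. These “sub-buckets” will not have top-level objectives of their own, but they will give input on and support the top-level objectives of the other groups. They are: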

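Infrastructure Foundations. This bucket covers the teams that sustain and evolve our data centers, our compute and storage platforms, the services that operate them, and the tools and processes that enable the operation of our public-facing sites and services.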
Product and Engineering Support. This bucket includes teams that operate “at scale”, providing services to other teams that improve those teams' productivity and operations.