WikiContrib/Proposed implementation

From Meta, a Wikimedia project coordination wiki

Background information

Wikimedia has numerous tools to gather development statistics, one of them being Bitergia's analytics tool. This tool provides useful information and is convenient for community managers who know it inside out. However, it is cumbersome for others: obtaining statistics on a topic takes too many steps, and there is a learning curve before one is comfortable with the tool. For example, scholarship committee reviewers for Wikimedia events, who need developer contribution statistics while reviewing applications, have to juggle between platforms such as GitHub, Gerrit, and Phabricator to view an applicant's developer activity before deciding on the application.

Proposed workflow

The WikiContrib tool is currently a work in progress and aims to give a sneak peek into a developer's contributions on Wikimedia platforms: Gerrit, Phabricator, and GitHub. Event organizers can log in to the app and follow three steps to retrieve the results:

  1. Organizers type in a list of users with their Wikitech/Gerrit, MediaWiki/Phabricator, and GitHub usernames, or provide the same data in CSV format.
  2. Organizers choose to filter the data by timestamp, the status of the commit (merged, abandoned, declined), and project name.
  3. The tool displays the data in a tabular format, with the ability to sort it by username or activity.
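The CSV input from step 1 could be parsed as sketched below. This is a minimal sketch assuming a hypothetical CSV layout with one row per person and the header columns `gerrit`, `phabricator`, and `github`; the proposal does not define the actual column names, and `UserHandles` and `parse_users_csv` are illustrative names.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical CSV header; the proposal does not fix a format.
FIELDS = ("gerrit", "phabricator", "github")


@dataclass
class UserHandles:
    """Usernames for one person; None where the person has no account."""
    gerrit: Optional[str]
    phabricator: Optional[str]
    github: Optional[str]


def _cell(row: dict, key: str) -> Optional[str]:
    # Missing columns and blank cells both become None.
    value = (row.get(key) or "").strip()
    return value or None


def parse_users_csv(text: str) -> List[UserHandles]:
    """Parse organizer-supplied CSV text into one UserHandles per row."""
    users = []
    for row in csv.DictReader(io.StringIO(text)):
        handles = UserHandles(*(_cell(row, key) for key in FIELDS))
        if any((handles.gerrit, handles.phabricator, handles.github)):
            users.append(handles)  # skip rows where every cell is blank
    return users


sample = "gerrit,phabricator,github\njdoe,JDoe,jdoe-gh\nasmith,,\n"
for user in parse_users_csv(sample):
    print(user)
```

Keeping a missing account as `None` lets later steps skip a platform for that user instead of querying it with an empty username.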

Anyone will be able to use this tool, but event organizers will be able to authenticate and gain access to additional features such as search history and uploading data in CSV format. As of now, the plan is for an existing authenticated user to validate each newly registered user.

The idea for the WikiContrib tool is inspired by AWMD stats, which generates monthly statistics on technical contributors to Wikimedia projects from Africa.

This tool will be available for use on Toolforge.


Note: The project was called Contraband during its initial development phase and was renamed WikiContrib afterward.

Mockups & wireframes

Mockups

a) Enter a list of usernames
b) Provide a filter
c) Display user contributions in tabular format
d) Display user contributions in graphical format

Wireframes

a) Input usernames
b) Provide advanced filters
c) Display the fetched result in a tabular format
d) Display the graphs and timeline of user contributions
e) Show user activity for a certain timeframe in tabular format

Technical implementation

Fetch Gerrit and Phabricator contributions of a user

For Gerrit contributions, all changesets (new, merged, and abandoned) will be considered. For Phabricator, all tasks authored by the user and all tasks assigned to the user will be considered. The visualization will look as shown in mockup a).

[WIP Section] The solutions to both of the above questions are not my own proposal: they follow the way wikimedia.bitergia.io gathers its statistics.

For retrieving contributions, here are some identified solutions:

ElasticSearch

Request payload
{
  "aggs":{
    "2":{
      "terms":{
        "field":"status",
        "order":{
          "_count":"desc"
        }
      }
    }
  },
  "query":{
    "query_string":{
      "query": "*Rammanojpotla"
    }
  }
}
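The payload above asks ElasticSearch for a `terms` aggregation (named `"2"`) that groups the documents matching the user by `status`. A minimal sketch that builds the same payload for any username and turns the aggregation result into a status-to-count mapping might look like this; the function names are hypothetical, and the example response is illustrative rather than real data.

```python
from typing import Any, Dict


def status_count_query(username: str) -> Dict[str, Any]:
    """Build the terms-aggregation payload shown above for a given username."""
    return {
        "aggs": {
            "2": {"terms": {"field": "status", "order": {"_count": "desc"}}}
        },
        # Keeps the leading wildcard used in the original payload.
        "query": {"query_string": {"query": "*" + username}},
    }


def counts_by_status(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each status bucket key (e.g. "MERGED") to its document count."""
    buckets = response["aggregations"]["2"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Illustrative response shape for a terms aggregation (numbers are made up).
example_response = {
    "aggregations": {
        "2": {
            "buckets": [
                {"key": "MERGED", "doc_count": 5},
                {"key": "ABANDONED", "doc_count": 2},
            ]
        }
    }
}
print(counts_by_status(example_response))
```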

Phabricator API

Request URL: https://phabricator.wikimedia.org/conduit/method/maniphest.search

Request payload
{
  "constraints": {
    "authorPHIDs": [
      "PHID-USER-utkozuokiv4qi3otfgny"
    ]
  }
}

Notes:

  • A single request that sets both constraints only returns tasks that are authored by and assigned to the user ("authored and assigned"), not tasks that are "authored or assigned." For development purposes, only the authored count will be considered. For production use, tasks authored by the user and tasks assigned to the user will be fetched with separate API requests, and the two responses will be merged, counting a task that appears in both only once.
  • The output of the above API search is paginated, so API requests need to be repeated until all the results have been fetched.
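The two notes above can be combined into one loop. The sketch below assumes a hypothetical `call(method, params)` transport that performs the HTTP request and returns the `result` object of a Conduit response, which carries a `data` list and a `cursor` whose `after` value is `None` on the last page. It pages through `maniphest.search` once for authored tasks and once for assigned tasks, then merges the two lists, keeping each task PHID once. The `authorPHIDs` and `assigned` constraint names follow the Conduit `maniphest.search` documentation. The fake transport and its PHIDs are made up for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: call(method, params) performs the HTTP request and
# returns the Conduit "result" object: {"data": [...], "cursor": {"after": ...}}.
ConduitCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def search_all_tasks(call: ConduitCall, constraints: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Page through maniphest.search until the cursor reports no further pages."""
    tasks: List[Dict[str, Any]] = []
    after: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"constraints": constraints}
        if after is not None:
            params["after"] = after  # continue where the previous page ended
        result = call("maniphest.search", params)
        tasks.extend(result["data"])
        after = result["cursor"].get("after")
        if after is None:
            return tasks


def authored_or_assigned(call: ConduitCall, user_phid: str) -> List[Dict[str, Any]]:
    """Union of tasks authored by or assigned to a user, each task counted once."""
    authored = search_all_tasks(call, {"authorPHIDs": [user_phid]})
    assigned = search_all_tasks(call, {"assigned": [user_phid]})
    unique: Dict[str, Dict[str, Any]] = {}
    for task in authored + assigned:
        unique.setdefault(task["phid"], task)
    return list(unique.values())


# Usage with a fake transport (made-up PHIDs): T2 is both authored and assigned.
FAKE_PAGES = {
    ("authorPHIDs", None): {"data": [{"phid": "T1"}, {"phid": "T2"}], "cursor": {"after": "2"}},
    ("authorPHIDs", "2"): {"data": [{"phid": "T3"}], "cursor": {"after": None}},
    ("assigned", None): {"data": [{"phid": "T2"}, {"phid": "T4"}], "cursor": {"after": None}},
}


def fake_call(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    constraint = next(iter(params["constraints"]))
    return FAKE_PAGES[(constraint, params.get("after"))]


print(len(authored_or_assigned(fake_call, "PHID-USER-example")))  # 4
```

De-duplicating by PHID is what makes the merged count an "authored or assigned" count rather than the sum of the two queries.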

Gerrit API

Request URL: https://gerrit.wikimedia.org/r/changes/?q=owner:rammanoj&o=DETAILED_ACCOUNTS

Request Payload: None

Response
[
  {
    "id": "mediawiki%2Fextensions%2FParserFunctions~master~I5695a4cce0bfc92a047e611353c10640a299d2f0",
    "project": "mediawiki/extensions/ParserFunctions",
    "branch": "master",
    "topic": "point",
    "hashtags": [],
    "change_id": "I5695a4cce0bfc92a047e611353c10640a299d2f0",
    "subject": "Fix incorrect handling of strings with multiple decimal points",
    "status": "ABANDONED",
    "created": "2018-01-28 10:10:08.000000000",
    "updated": "2018-06-12 18:03:59.000000000",
    "insertions": 31,
    "deletions": 2,
    "unresolved_comment_count": 0,
    "has_review_started": true,
    "_number": 406485,
    "owner": {
      "_account_id": 4632,
      "name": "Rammanojpotla",
      "email": "rammanojpotla1608@gmail.com",
      "username": "rammanoj"
    }
  },
  {
    "id": "mediawiki%2Fextensions%2FParserFunctions~master~Ida573d94d0df8862f3189bb9e9735decaa12eecf",
    "project": "mediawiki/extensions/ParserFunctions",
    "branch": "master",
    "topic": "complex",
    "hashtags": [],
    "change_id": "Ida573d94d0df8862f3189bb9e9735decaa12eecf",
    "subject": "Fix give errors on using complex number in {{#expr:}}",
    "status": "ABANDONED",
    "created": "2018-01-27 05:25:44.000000000",
    "updated": "2018-06-12 17:59:18.000000000",
    "insertions": 11,
    "deletions": 3,
    "unresolved_comment_count": 0,
    "has_review_started": true,
    "_number": 406391,
    "owner": {
      "_account_id": 4632,
      "name": "Rammanojpotla",
      "email": "rammanojpotla1608@gmail.com",
      "username": "rammanoj"
    }
  }
  ...
]

Note: All the objects in the response need to be counted to get the total number of the user's contributions.
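A minimal sketch of the counting step follows. It relies on two documented behaviours of the Gerrit REST API: JSON responses start with the `)]}'` anti-XSSI line, and when a result list is truncated the last change carries `"_more_changes": true`, so the next page is requested with the `S` (skip) parameter. The `fetch` argument and function names are hypothetical; calling `count_by_status(owned_changes("rammanoj"))` would query the live server.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Any, Callable, Dict, List

GERRIT_BASE = "https://gerrit.wikimedia.org/r"
XSSI_PREFIX = ")]}'"  # Gerrit prepends this line to every JSON response


def fetch_text(url: str) -> str:
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_gerrit_json(body: str) -> Any:
    """Strip Gerrit's anti-XSSI prefix, then decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def owned_changes(username: str, fetch: Callable[[str], str] = fetch_text) -> List[Dict[str, Any]]:
    """All changes owned by `username`, following Gerrit's result pagination."""
    changes: List[Dict[str, Any]] = []
    while True:
        query = urllib.parse.urlencode(
            {"q": f"owner:{username}", "o": "DETAILED_ACCOUNTS", "S": len(changes)}
        )
        page = parse_gerrit_json(fetch(f"{GERRIT_BASE}/changes/?{query}"))
        changes.extend(page)
        # The last entry carries "_more_changes": true when results were truncated.
        if not page or not page[-1].get("_more_changes"):
            return changes


def count_by_status(changes: List[Dict[str, Any]]) -> Counter:
    """Number of changes per status, e.g. Counter({'MERGED': 12, 'ABANDONED': 2})."""
    return Counter(change["status"] for change in changes)
```

Injecting `fetch` keeps the pagination logic testable without network access.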

Retrieve user contributions for different dates and times

The visualization will look as shown in mockup b).

ElasticSearch

Request URL for fetching contributions from Gerrit: GET gerrit/_search?filter_path=took,hits.total,aggregations

Request payload
{
  "aggs":{
    "2":{
      "date_histogram":{
        "field":"grimoire_creation_date",
        "interval":"1D",
        "time_zone":"Asia/Kolkata",
        "min_doc_count":1
      },
      "aggs":{
        "3":{
          "terms":{
            "field":"status",
            "size":4,
            "order":{
              "_count":"desc"
            }
          }
        }
      }
    }
  },
  "query":{
    "query_string":{
      "query":"*Rammanojpotla"
    }
  }
}
Response
{
   "took":468,
   "hits":{
      "total":198
   },
   "aggregations":{
      "2":{
         "buckets":[
            {
               "3":{
                  "doc_count_error_upper_bound":0,
                  "sum_other_doc_count":0,
                  "buckets":[
                     {
                        "key":"MERGED",
                        "doc_count":1
                     }
                  ]
               },
               "key_as_string":"2017-03-27T00:00:00.000+05:30",
               "key":1490553000000,
               "doc_count":2
            },
            {
               "3":{
                  "doc_count_error_upper_bound":0,
                  "sum_other_doc_count":0,
                  "buckets":[

                  ]
               },
               "key_as_string":"2017-03-28T00:00:00.000+05:30",
               "key":1490639400000,
               "doc_count":3
            },
            {
               "3":{
                  "doc_count_error_upper_bound":0,
                  "sum_other_doc_count":0,
                  "buckets":[
                     {
                        "key":"MERGED",
                        "doc_count":2
                     }
                  ]
               },
               "key_as_string":"2017-04-14T00:00:00.000+05:30",
               "key":1492108200000,
               "doc_count":13
            }
         ]
      }
   }
}
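For a timeline chart, the nested buckets above can be flattened into one row per active day. This is a minimal sketch with a hypothetical function name; the test data is an excerpt of the response above.

```python
from typing import Any, Dict, List


def daily_timeline(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the date_histogram buckets into one row per active day."""
    rows = []
    for day in response["aggregations"]["2"]["buckets"]:
        rows.append(
            {
                # key_as_string is ISO 8601 in the requested time zone; keep the date part.
                "date": day["key_as_string"][:10],
                "total": day["doc_count"],
                "by_status": {b["key"]: b["doc_count"] for b in day["3"]["buckets"]},
            }
        )
    return rows
```

In the response above, 2017-03-28 has three documents but no status bucket, so a day's total and its per-status counts are best plotted separately rather than assuming the statuses add up to the total.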

Request URL for Phabricator: GET maniphest/_search?filter_path=took,hits.total,aggregations. The same API calls will be performed for both Phabricator and Gerrit.
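Because the request body for the daily histogram does not depend on the index, one helper can produce it for both `gerrit/_search` and `maniphest/_search`. This is a sketch with hypothetical function names that reproduces the payload shown earlier in this section. The `interval` key is kept as in that payload; ElasticSearch 7.2 and later deprecate it in favour of `calendar_interval` or `fixed_interval`, so it may need renaming depending on the server version.

```python
from typing import Any, Dict, Tuple


def daily_activity_query(username: str, time_zone: str = "Asia/Kolkata") -> Dict[str, Any]:
    """Per-day histogram of a user's documents, with a status breakdown per day."""
    return {
        "aggs": {
            "2": {
                "date_histogram": {
                    "field": "grimoire_creation_date",
                    "interval": "1D",
                    "time_zone": time_zone,
                    "min_doc_count": 1,  # omit days without any activity
                },
                "aggs": {
                    "3": {
                        "terms": {
                            "field": "status",
                            "size": 4,
                            "order": {"_count": "desc"},
                        }
                    }
                },
            }
        },
        "query": {"query_string": {"query": "*" + username}},
    }


def search_request(index: str, username: str) -> Tuple[str, Dict[str, Any]]:
    """Path and body for GET <index>/_search, e.g. index "gerrit" or "maniphest"."""
    path = f"{index}/_search?filter_path=took,hits.total,aggregations"
    return path, daily_activity_query(username)


for index in ("gerrit", "maniphest"):
    path, _body = search_request(index, "Rammanojpotla")
    print("GET", path)
```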

Display user activity in a tabular format

The visualization will look as shown in wireframe e).

ElasticSearch

Request URL: GET gerrit/_search

Request payload
{
  "query":{
    "bool":{
      "must":[
        {
          "query_string":{
            "query":"*rammanoj"
          }
        },
        {
          "match_phrase":{
            "created_on":{
              "query":"2017-04-14"
            }
          }
        }
      ]
    }
  }
}
Response
{
  "took": 396,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2,
    "hits": [
      {
        "_index": "gerrit_wikimedia_180406b_enriched_190527",
        "_type": "items",
        "_id": "383a235373831a130a06686ddca0a0a28306491a_changeset_348225",
        "_score": 2,
        "_source": {
          "changeset_author_org_name": "Independent",
          "author_name": "Rammanoj",
          "timeopen": "21.00",
          "grimoire_creation_date": "2017-04-14T14:21:12+00:00",
          "patchsets": 5,
          "closed": "2017-05-05T14:24:07+00:00",
          "owner_bot": false,
          "owner_uuid": "1a76fb77b4bd3fcbda44de685cf4a0739dfe37fe",
          "changeset_author_uuid": "1a76fb77b4bd3fcbda44de685cf4a0739dfe37fe",
          "type": "changeset",
          "demography_min_date": "2017-03-27T18:02:55.000Z",
          "name": "Rammanojpotla",
          "author_bot": false,
          "changeset_author_gender": "Unknown",
          "project": "Wikimedia",
          "githash": "Icc487bc6932027e4652dc24743c664c245e0222b",
          "owner_domain": "gmail.com",
          "opened": "2017-04-14T14:21:12+00:00",
          "owner_id": "bdc986f25cd05e52add2651b214c3f7a22ac5d3a",
          "last_updated": "2017-05-05T14:24:07+00:00",
          "metadata__filter_raw": null,
          "cm_title": "wikimedia",
          "author_id": "bdc986f25cd05e52add2651b214c3f7a22ac5d3a",
          "changeset_author_user_name": "",
          "is_gerrit_review": 1,
          "owner_org_name": "Independent",
          "author_gender_acc": 0,
          "author_user_name": "",
          "author_uuid": "1a76fb77b4bd3fcbda44de685cf4a0739dfe37fe",
          "origin": "gerrit.wikimedia.org",
          "metadata__timestamp": "2017-05-05T14:26:51.023552+00:00",
          "changeset_author_id": "bdc986f25cd05e52add2651b214c3f7a22ac5d3a",
          "uuid": "383a235373831a130a06686ddca0a0a28306491a",
          "changeset_author_bot": false,
          "summary_analyzed": "Ruby gem documentation should state license",
          "created_on": "2017-04-14T14:21:12+00:00",
          "owner_gender_acc": 0,
          "changeset_author_gender_acc": 0,
          "is_gerrit_changeset": 1,
          "changeset_author_domain": "gmail.com",
          "domain": "gmail.com",
          "url": "https://gerrit.wikimedia.org/r/348225",
          "changeset_author_name": "Rammanoj",
          "summary": "Ruby gem documentation should state license",
          "status": "MERGED",
          "changeset_number": "348225",
          "metadata__enriched_on": "2019-05-28T03:58:21.723693+00:00",
          "offset": null,
          "metadata__gelk_backend_name": "GerritEnrich",
          "demography_max_date": "2019-03-19T05:29:56.000Z",
          "author_gender": "Unknown",
          "metadata__gelk_version": "0.54.0",
          "owner_gender": "Unknown",
          "author_org_name": "Independent",
          "repository": "mediawiki/ruby/api",
          "owner_user_name": "rammanoj",
          "owner_name": "Rammanoj",
          "metadata__updated_on": "2017-05-05T14:24:07+00:00",
          "project_1": "Wikimedia",
          "tag": "gerrit.wikimedia.org",
          "branch": "master",
          "id": "383a235373831a130a06686ddca0a0a28306491a_changeset_348225",
          "author_domain": "gmail.com"
        }
      }
      ...
    ]
  }
}
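The hits in this response can be turned into table rows as sketched below. The choice of displayed fields (`COLUMNS`) and the function names are assumptions for illustration, not a fixed schema; sorting by `created_on` assumes, as in the sample, that all timestamps share the same `+00:00` offset, so the ISO strings sort chronologically.

```python
from typing import Any, Dict, List

# Fields picked for the table are an illustrative choice, not a fixed schema.
COLUMNS = ("created_on", "repository", "summary", "status", "url")


def activity_rows(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn ElasticSearch hits into table rows, oldest contribution first."""
    rows = [
        {column: hit["_source"].get(column) for column in COLUMNS}
        for hit in response["hits"]["hits"]
    ]
    # ISO-8601 strings with the same UTC offset sort chronologically as text.
    rows.sort(key=lambda row: row["created_on"] or "")
    return rows


def print_table(rows: List[Dict[str, Any]]) -> None:
    """Plain-text rendering of the rows, one contribution per line."""
    print(" | ".join(COLUMNS))
    for row in rows:
        print(" | ".join(str(row[column]) for column in COLUMNS))


# Usage with the hit from the response above, trimmed to the displayed fields.
example = {"hits": {"hits": [{"_source": {
    "created_on": "2017-04-14T14:21:12+00:00",
    "repository": "mediawiki/ruby/api",
    "summary": "Ruby gem documentation should state license",
    "status": "MERGED",
    "url": "https://gerrit.wikimedia.org/r/348225",
}}]}}
print_table(activity_rows(example))
```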

Request URL for Phabricator: GET maniphest/_search. The same API calls will be performed for both Phabricator and Gerrit.
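Since the same bool query is sent for Gerrit and Phabricator, it can be generated per user and day. This sketch uses a hypothetical function name and rejects a malformed day before it reaches the `match_phrase` clause.

```python
from datetime import date
from typing import Any, Dict


def activity_on_day_query(username: str, day: str) -> Dict[str, Any]:
    """Build the bool query used in this section: the user's documents created on `day`."""
    date.fromisoformat(day)  # raises ValueError unless day is an ISO 8601 date
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": "*" + username}},
                    {"match_phrase": {"created_on": {"query": day}}},
                ]
            }
        }
    }


print(activity_on_day_query("rammanoj", "2017-04-14"))
```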

Benefits of using ElasticSearch over Gerrit and Phabricator APIs

With the Phabricator API, a call returns at most 100 objects, and because each page is requested with the cursor returned by the previous one, the calls cannot be made in parallel. With Bitergia's ElasticSearch, the contribution count can be fetched easily and much more quickly, with a single API request. This would also allow displaying data to the user in real time.
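The difference can be made concrete with a little arithmetic: with at most 100 objects per call, fetching n matching tasks takes ⌈n/100⌉ cursor-chained requests per query (at least one). The task counts in the usage line are hypothetical.

```python
import math

PAGE_LIMIT = 100  # maximum objects per Phabricator API call, as stated above


def sequential_requests(total_objects: int, page_limit: int = PAGE_LIMIT) -> int:
    """Cursor-chained requests needed to fetch every object (at least one call)."""
    if total_objects < 0:
        raise ValueError("total_objects must be non-negative")
    return max(1, math.ceil(total_objects / page_limit))


# Hypothetical user: 250 authored tasks and 120 assigned tasks.
phabricator_calls = sequential_requests(250) + sequential_requests(120)
print(phabricator_calls)  # 5 sequential requests, versus 1 aggregation request
```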