
Research:Revision scoring as a service/Revscoring library

From Meta, a Wikimedia project coordination wiki

Key features


Scorer abstraction


...todo...

Feature extraction garden


When supporting an ecosystem of multiple models that use similar features, it's important that features (1) are well defined and (2) don't duplicate work. The "Feature dependencies" figure depicts a set of example features and their dependencies on datasources and on other features. By using a dependency injection strategy to specify and resolve the relationships between features and datasources, we make it easy to develop new features on top of existing features and datasources. We also minimize the work that the system needs to perform when building feature sets for many different models, because a dependency shared by several features only needs to be computed once.

Feature dependencies. Dependencies for features and datasources are presented. Datasources can depend on other datasources. Features can depend on both datasources and other features.
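The dependency injection idea can be sketched in a few lines of Python. This is a minimal illustration, not the Revscoring library's actual API: the `RULES` registry, the `solve` function, the toy `DICTIONARY`, and the stand-in `revision_diff` datasource are all hypothetical, and only the names RevisionDiff, Dictionary, WordsAdded, and MisspellingsAdded are borrowed from this section's example. Each entry declares what it depends on, and a shared cache ensures that a datasource used by several features is built once.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: name -> (dependency names, function of the resolved dependencies).
Rule = Tuple[Tuple[str, ...], Callable[..., Any]]

calls = {"RevisionDiff": 0}  # counts how often the shared datasource is built


def revision_diff() -> list:
    """Stand-in datasource: the words added by an imaginary revision."""
    calls["RevisionDiff"] += 1
    return ["teh", "quick", "brown", "fox"]


DICTIONARY = {"quick", "brown", "fox"}  # toy dictionary, assumed for illustration

RULES: Dict[str, Rule] = {
    "RevisionDiff": ((), revision_diff),
    "Dictionary": ((), lambda: DICTIONARY),
    "WordsAdded": (("RevisionDiff",), lambda diff: len(diff)),
    "MisspellingsAdded": (
        ("RevisionDiff", "Dictionary"),
        lambda diff, words: sum(1 for w in diff if w not in words),
    ),
}


def solve(name: str, cache: Dict[str, Any]) -> Any:
    """Resolve `name` by first resolving its dependencies, reusing cached values."""
    if name not in cache:
        depends_on, process = RULES[name]
        args = [solve(dep, cache) for dep in depends_on]
        cache[name] = process(*args)
    return cache[name]


cache: Dict[str, Any] = {}
features = {name: solve(name, cache) for name in ("WordsAdded", "MisspellingsAdded")}
print(features)               # {'WordsAdded': 4, 'MisspellingsAdded': 1}
print(calls["RevisionDiff"])  # 1: both features shared one RevisionDiff
```

Passing the same `cache` to every `solve` call is what avoids duplicated work. A fresh cache per revision would keep values from leaking between revisions. The sketch does no cycle detection, which a real implementation would likely need.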

Example Makefile-style dependency expression for MisspellingRatioDifferential

WordsAdded: RevisionDiff
	<parse revision diff> \
	return count

MisspellingsAdded: RevisionDiff Dictionary
	<parse revision diff and use Dictionary to find misspellings> \
	return count

PreviousWords: ParsedPreviousRevisionText
	<parse non-markup content> \
	return count

PreviousMisspellings: ParsedPreviousRevisionText Dictionary
	<parse non-markup content and use Dictionary to find misspellings> \
	return count

MisspellingRatioDifferential: WordsAdded MisspellingsAdded PreviousWords PreviousMisspellings
	return (MisspellingsAdded/WordsAdded) / \
	       ((MisspellingsAdded/WordsAdded)+(PreviousMisspellings/PreviousWords))
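To make the arithmetic of the final rule concrete, the following Python sketch evaluates the same expression on made-up counts. The function name, the sample numbers, and the zero-division guards are assumptions for illustration only. The expression above does not say what should happen when no words were added or the previous revision was empty.

```python
def misspelling_ratio_differential(
    words_added: int,
    misspellings_added: int,
    previous_words: int,
    previous_misspellings: int,
) -> float:
    """Share of the combined misspelling rate that comes from the added text."""
    # Assumption: treat a ratio with a zero denominator as 0.0 instead of failing.
    added_ratio = misspellings_added / words_added if words_added else 0.0
    previous_ratio = previous_misspellings / previous_words if previous_words else 0.0
    total = added_ratio + previous_ratio
    return added_ratio / total if total else 0.0


# Made-up counts: 2 of 20 added words are misspelled (rate 0.1),
# while 10 of 1000 previous words were misspelled (rate 0.01).
worse = misspelling_ratio_differential(20, 2, 1000, 10)

# Same misspelling rate before and after: 1 of 10 versus 10 of 100.
same = misspelling_ratio_differential(10, 1, 100, 10)

print(round(worse, 3))  # 0.909
print(round(same, 3))   # 0.5
```

A value near 1 means the added text is misspelled at a much higher rate than the previous revision, a value near 0 means a much lower rate, and 0.5 means the two rates are equal.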

Model files


...todo...