Research:New editor

New editor is a proposed standardized user class used to measure the number of first-time editors in a wiki project over time. It is used as a proxy for editor activation and, to a lesser extent, editor productivity. A "new editor" is a newly registered user who makes a minimum number of contributions (n edits) within a given activation period (t) since registration.

Discussion

The majority of new user accounts registered on Wikipedia either never attempt an edit or fail to save one. So, when discussing the rate at which new editors are entering Wikipedia, it is more relevant to measure the subset of new users who end up editing.

The n edits threshold

What amount of activity is necessary? This choice is arbitrary to a large extent. The higher the threshold, the fewer newly registered editors will cross it.

The t time cutoff

A newly registered user could, in theory, take years to make their first edit, and an observation made at any given time would miss such future edits.[1] We therefore artificially censor all observations at a time bound t, measured from the moment the user registered the account. Specifying a t cutoff holds all new editors to the same standard, regardless of when they registered or when they made their first contribution.
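The n threshold and t cutoff can be combined into a single check. The following is a minimal sketch, not the implementation used for the analysis: the function name `is_new_editor`, its parameters, and the choice to treat the cutoff as exclusive are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_new_editor(registered, edit_times, n=1, t=timedelta(days=1)):
    """Return True if at least n edits were saved within t of registration.

    Assumption: the window is [registered, registered + t), i.e. an edit
    saved exactly at the cutoff does not count.
    """
    cutoff = registered + t
    early_edits = [ts for ts in edit_times if registered <= ts < cutoff]
    return len(early_edits) >= n


# Hypothetical user: one edit an hour after registering, another two days later.
registered = datetime(2013, 1, 1, 12, 0)
edits = [datetime(2013, 1, 1, 13, 0), datetime(2013, 1, 3, 9, 0)]

print(is_new_editor(registered, edits, n=1))                        # True
print(is_new_editor(registered, edits, n=2))                        # False
print(is_new_editor(registered, edits, n=2, t=timedelta(days=7)))   # True
```

Because edits after the cutoff are ignored, the result for a given user never changes once t has elapsed, which is what makes cohorts registered at different times comparable.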

Newly registered users only

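An attached user (a user with an existing record of contributions on their home project who starts editing another project for the first time) is not considered a newly registered user and is therefore not counted as a new editor, however many edits they complete.[2]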

Newly registered users may include accounts created for bots, provided those accounts were not registered by proxy. Such accounts are also included in the new editor definition.

Edits across all namespaces

We propose that the definition of a new editor include edits made to any namespace. When only edits to pages in a project's content namespace(s) are counted, we refer instead to a new content editor. In English Wikipedia, the only content namespace is the "article namespace", also known as namespace 0. Under the proposed "new editor" definition, contributions made to talk or user pages are considered edits that qualify towards "new editor" status.
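The difference between a new editor and a new content editor comes down to a namespace filter. The sketch below assumes each edit is available as a hypothetical `(timestamp, namespace)` pair; `CONTENT_NAMESPACES`, `count_qualifying_edits` and `classify` are illustrative names, not part of any existing tool.

```python
from datetime import datetime, timedelta

# Assumption: English Wikipedia, where namespace 0 is the only content namespace.
CONTENT_NAMESPACES = {0}


def count_qualifying_edits(registered, edits, t, namespaces=None):
    """Count edits saved within t of registration.

    edits is an iterable of (timestamp, namespace) pairs.
    namespaces=None means edits in every namespace count.
    """
    cutoff = registered + t
    return sum(
        1
        for ts, ns in edits
        if registered <= ts < cutoff and (namespaces is None or ns in namespaces)
    )


def classify(registered, edits, n=1, t=timedelta(days=1)):
    """Report both statuses for one newly registered user."""
    all_edits = count_qualifying_edits(registered, edits, t)
    content_edits = count_qualifying_edits(registered, edits, t, CONTENT_NAMESPACES)
    return {
        "new_editor": all_edits >= n,
        "new_content_editor": content_edits >= n,
    }


# Hypothetical user who only edits a talk page (ns 1) and a user page (ns 2).
registered = datetime(2013, 1, 1)
talk_only = [
    (registered + timedelta(hours=2), 1),
    (registered + timedelta(hours=3), 2),
]
result = classify(registered, talk_only)
print(result)  # {'new_editor': True, 'new_content_editor': False}
```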

Edits to deleted pages

The proposed definition counts activity on pages that are later deleted (including page creation edits) towards "new editor" status. This ensures that we provide a quantitative measurement of activation that is independent of the productivity or quality of contributions by a newly registered user (which we aim to measure using different metrics). Including activity on deleted pages also ensures that this measurement is not subject to censorship: historical data doesn't change as a function of a future deletion event. See this related discussion on the implications of counting or discounting activity on deleted pages.

Time lag

This metric can be generated t days after user registration. In the case of the WMF standardized parameterization, this is 1 day.

Analysis

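There are three variables that need to be chosen in order to apply this metric:

  • n: the minimum number of edits a newly registered user must make
  • t: the time allowed since registration to reach that threshold
  • the namespaces in which edits are counted (content namespaces only, or all namespaces)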

To check how decisions about each of these parameters affect counts of the number of new editors over time, several variations of these parameters were tested on a sample of projects.
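A parameter sweep of this kind can be sketched as follows. This is an illustration rather than the code behind the figures: the input format (a list of `(registration_time, edits)` pairs, with each edit a `(timestamp, namespace)` pair), the function names, and the tiny two-user cohort are all assumptions. The parameter values tried (n of 1, 5 and 10; t of a day or a week; content vs. all namespaces) are the ones discussed in this page.

```python
from datetime import datetime, timedelta
from itertools import product


def qualifies(registered, edits, n, t, namespaces):
    """True if at least n counted edits fall within t of registration."""
    cutoff = registered + t
    count = sum(
        1
        for ts, ns in edits
        if registered <= ts < cutoff and (namespaces is None or ns in namespaces)
    )
    return count >= n


def sweep(users,
          n_values=(1, 5, 10),
          t_values=(timedelta(days=1), timedelta(days=7)),
          namespace_sets=(None, frozenset({0}))):
    """Count new editors for every combination of n, t and namespace filter."""
    results = {}
    for n, t, namespaces in product(n_values, t_values, namespace_sets):
        label = (n, t.days, "all" if namespaces is None else "content")
        results[label] = sum(
            qualifies(registered, edits, n, t, namespaces)
            for registered, edits in users
        )
    return results


# Hypothetical cohort of two users registered at the same moment.
start = datetime(2013, 1, 1)
users = [
    # One article edit an hour after registering.
    (start, [(start + timedelta(hours=1), 0)]),
    # Five user-page edits, all on the third day.
    (start, [(start + timedelta(days=2, hours=i), 2) for i in range(5)]),
]
counts = sweep(users)
print(counts[(1, 1, "all")])      # 1
print(counts[(1, 7, "all")])      # 2
print(counts[(1, 7, "content")])  # 1
```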

English Wikipedia

[Three figures for English Wikipedia; images not included]

Portuguese Wikipedia

[Three figures for Portuguese Wikipedia; images not included]

German Wikipedia

[Three figures for German Wikipedia; images not included]

Comparison

[Three comparison figures; images not included]

The figures above help us visualize the effects of differences between parameters. When a proportion remains constant over time, that suggests that one metric is proportional to another. That means that both versions of the metric capture the exact same trend information at different scales.
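The proportionality argument can be checked numerically: divide one version of the metric by another, period by period, and see whether the ratio stays flat. The sketch below uses made-up monthly counts purely for illustration (they are not measurements from any wiki), and `proportions` and `relative_spread` are hypothetical helper names. It assumes no period has a zero count in the denominator.

```python
def proportions(numerators, denominators):
    """Period-by-period ratio of two count series of equal length."""
    return [num / den for num, den in zip(numerators, denominators)]


def relative_spread(values):
    """(max - min) / mean: close to 0 when the series is roughly flat."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


# Made-up monthly counts, for illustration only.
all_namespaces = [1000, 1200, 900, 1100]
content_only = [800, 960, 720, 880]   # always 80% of all_namespaces
ten_edits = [100, 96, 54, 44]         # a shrinking share of all_namespaces

flat = proportions(content_only, all_namespaces)
trending = proportions(ten_edits, all_namespaces)

print(relative_spread(flat))      # near zero: same trend, different scale
print(relative_spread(trending))  # large: the two series tell different stories
```

A flat ratio means either series can stand in for the other when only the trend matters. A ratio that drifts means the choice of parameter changes what is being measured.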

The proportions in #Content vs. all and #t = day vs. week are mostly horizontal. This suggests that neither the type of edits counted nor the timescale t used when generating stats for new editors affects overall trends.

However, #n = 1 vs. 10 edits shows a strong trend over time in the proportion of editors who make it to each threshold. This result suggests that different values for the n threshold can change what this metric measures.

Discussion

The number of new editors drops about an order of magnitude for each step: 1, 5, and 10. While n=1 appears to be largely flat after 2008, n=5 and n=10 tell a different story: one of a steady decline since 2008 for Portuguese and since 2007 for German (see #n = 1 vs. n = 10). The metric seems to be less sensitive to the value of t and to whether edits outside of content namespaces are counted (see #t = day vs. week and #Content vs. all).

Historical definition

Wikistats, the Wikimedia reportcard and the editor trends study define a "New Editor" or "New Wikipedian" as:

A registered and logged-in person (not known as a bot) who has made their 10th edit during the time-period under consideration. Number of edits is a cumulative count across all of time on one wiki.

The canonical restrictions apply to this definition: only edits to countable pages in content namespaces are considered.
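For contrast with the proposed definition, the historical rule can be sketched as "the period containing a user's 10th cumulative edit". The code below is an illustrative reading of the definition above, not Wikistats code: `new_wikipedian_month` is a hypothetical name, and the input is assumed to be already restricted to non-bot edits on countable content-namespace pages.

```python
from datetime import datetime


def new_wikipedian_month(edit_times, threshold=10):
    """Return the (year, month) in which the threshold-th edit was made.

    Returns None if the user has not reached the threshold yet.
    Registration time plays no role in this definition.
    """
    ordered = sorted(edit_times)
    if len(ordered) < threshold:
        return None
    milestone = ordered[threshold - 1]
    return (milestone.year, milestone.month)


# Hypothetical user: nine edits in January 2010, the tenth in March 2012.
early = [datetime(2010, 1, day) for day in range(1, 10)]
tenth = [datetime(2012, 3, 5)]

print(new_wikipedian_month(early))          # None
print(new_wikipedian_month(early + tenth))  # (2012, 3)
```

In this example the user is counted as "new" in March 2012, more than two years after their first edit.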

Issues

  • Because this metric considers a user a "new editor" when the 10th edit milestone is reached, regardless of when the user registered, it doesn't inform us about the behavior of newly registered users. The historical definition of a new editor is a hybrid metric, driven partly by new user activation and partly by existing user retention.
  • The canonical definition doesn't distinguish between genuine new users and attached users, i.e., users with an existing record of contributions to their home project who start editing another project for the first time.
  • The definition doesn't include activity on pages that are later deleted as counting towards "new editor" status. See this related discussion on the implications of discounting activity on deleted pages.
  • The definition applies a conventional 10-edit threshold and doesn't allow measuring how many users hit different thresholds that may be equally or more informative.

Comparison with New Wikipedians

The monthly count of New Wikipedians and New editors (ns=0 & t=24 hours) is plotted below for several wikis.

[Five figures: monthly counts of New Wikipedians and New editors; images not included]

The factor of difference between New Wikipedians and New editors is plotted below to help visualize deviations. The following equation shows how the plotted factor relates to New Wikipedians and New editors.

New Wikipedians × factor = New editors
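Rearranging the relation gives the factor as New editors divided by New Wikipedians for each period. A minimal sketch follows; the function name `difference_factor` and the counts in the example are made up for illustration, and returning `None` for a period with no New Wikipedians is an assumption about how to handle division by zero.

```python
def difference_factor(new_wikipedians, new_editors):
    """Solve new_wikipedians * factor = new_editors for factor."""
    if new_wikipedians == 0:
        return None  # factor is undefined when there are no New Wikipedians
    return new_editors / new_wikipedians


# Made-up monthly counts, for illustration only.
print(difference_factor(200, 1500))  # 7.5
print(difference_factor(0, 10))      # None
```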

[Two figures: factor of difference between New Wikipedians and New editors; images not included]

Notes

  1. See en:Censoring_(statistics), specifically "right censoring".
  2. Analysis of Wikipedia editor activation should be limited to users registered after 2006 because of inconsistencies in how the logging table recorded new registrations before 2006.