Research:Onboarding new Wikipedians/Rollout

Template:Research Project

On February 11th, Extension:GettingStarted was deployed to 29 wikis. The deployment was later extended to the current 30 wikis, which include all of the top 10 Wikipedias by pageviews.

The purpose of this study is to measure the scale at which GettingStarted operates (e.g. how many newcomers on Wikimedia Projects received a GettingStarted intervention?) and to get a sense for the impact that the new features have on newcomer behavior.

Research questions

Template:RQ

  1. How is GettingStarted being used?
    1. How many newly registered users saw each CTA?
    2. How many of those editors edit -- through GS or otherwise?
  2. Are GettingStarted edits reverted more often than non-GettingStarted edits?

Template:RQ

  1. How did the proportion of new editors (editor activation) change after GettingStarted was deployed?
  2. How did the proportion of productive new editors (editor productivity) change after GettingStarted was deployed?

Methods

Code repository: https://github.com/halfak/Measuring-the-impact-of-GettingStarted

Deployment wikis

Based on the configuration and the Server admin log, we can determine when GettingStarted was deployed on each wiki.

Template:Hidden

Measuring usage

In order to measure the usage of GettingStarted, we observe and compare the number of newly registered users across Wikimedia projects with the number of users with a recorded impression of GettingStarted (see Schema:GettingStartedRedirectImpression). We also observe the number of edits made via GettingStarted through the application of a change tag: "gettingstarted edit".

Assuming a natural experiment

In order to address Template:RQ, we'll be assuming that a natural experiment took place immediately after GettingStarted was deployed. We take advantage of this by comparing metrics of new user activation and productivity before and after deployment. Since the only way to take advantage of GettingStarted's functionality is to be served a CTA immediately after registering an account, there shouldn't be substantial concern about measuring those editors who registered immediately before GettingStarted's deployment.

Unlike controlled experiments, natural experiments leave room for confounds to affect inferences about causation. A trend that was already under way on a wiki, independent of the GettingStarted deployment, will look like an effect of GettingStarted in the analysis. It is therefore important to keep this potential issue in mind when viewing the results.

Sample periods

Template:Inline figure

In order to compare new editor fitness before and after deployment, we sampled newly registered users from the two weeks immediately before and after the deployment dates. Figure #Natural experiment sample periods depicts these sample periods visually.

Template:Inline figure

In order to determine how many observations would need to be sampled, we performed a power analysis for several baseline rates and expected changes. Figure #Power analysis plots the p-value of a chi-squared test for various levels of baselines and changes. We chose a minimum of 500 observations, since that was the smallest number that would still let us identify significance for large effects. We define "large effects" as roughly twice the observed effect of GettingStarted in English Wikipedia (which ranged from 1.5-3% depending on the metric[1], so we settled on 5%). 16 wikis had at least 500 newly registered users in the sample periods (es, fr, zh, ru, de, pt, it, fa, nl, pl, vi, sv, uk, ko, hu, he, el). We set the maximum number of observations at 2000, since most changes would appear significant at that sample size and an upper bound reduces the processing time required.
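The kind of calculation behind such a power analysis can be sketched as follows. This is a minimal illustration, not the code from the project repository: it assumes a Pearson chi-squared test without continuity correction, equal sample sizes before and after deployment, and observed counts that exactly match the expected rates. The function names and the grid of baselines are hypothetical.

```python
import math


def chi2_p_value(success_a, n_a, success_b, n_b):
    """p-value of a Pearson chi-squared test on a 2x2 table (1 degree of freedom)."""
    table = [[success_a, n_a - success_a], [success_b, n_b - success_b]]
    row_totals = [n_a, n_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                return 1.0  # degenerate table: nothing to test
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 d.o.f.
    return math.erfc(math.sqrt(stat / 2))


def expected_p_value(baseline, change, n):
    """p-value if both periods have n users and the rate moves from baseline to baseline + change."""
    before = round(baseline * n)
    after = round((baseline + change) * n)
    return chi2_p_value(before, n, after, n)


# Assumed grid: a 5% change at several baseline rates and sample sizes.
for n in (250, 500, 1000, 2000):
    cells = [f"{expected_p_value(b, 0.05, n):.4f}" for b in (0.1, 0.2, 0.3)]
    print(n, cells)
```

Because the p-value shrinks as n grows for a fixed change, a table like this shows where a given effect size first becomes detectable.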

Comparison

Boolean measures

Differences in proportions between the before and after periods are identified using a chi-squared test.

Scale measures

Differences in expected values between the before and after periods are identified using a t-test on log-transformed values.
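A comparison of this kind can be sketched as below. The text does not specify the exact transform or test variant, so several choices here are assumptions: the transform is log(x + 1) so that zero counts are defined, the test is Welch's unequal-variance form, and the p-value uses a normal approximation to the t distribution, which is only reasonable for samples of several hundred users. The sample data is invented for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def logged_welch_test(before, after):
    """Welch's t statistic on log(x + 1) values, with a normal-approximation p-value."""
    log_before = [math.log1p(x) for x in before]
    log_after = [math.log1p(x) for x in after]
    diff = mean(log_after) - mean(log_before)
    se = math.sqrt(
        variance(log_before) / len(log_before)
        + variance(log_after) / len(log_after)
    )
    if se == 0:
        raise ValueError("both samples are constant; the test is undefined")
    t = diff / se
    # Two-sided p-value; the normal approximation is an assumption of this sketch.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical per-user edit counts, 800 users in each period.
before = [0, 1, 0, 2, 1, 0, 3, 1] * 100
after = [1, 2, 0, 4, 2, 1, 5, 2] * 100
t, p = logged_welch_test(before, after)
print(f"t = {t:.2f}, p = {p:.4g}")
```

Log-transforming first keeps a handful of very prolific editors from dominating the comparison of means.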

Results

What proportion of users saw/used a GettingStarted CTA?

Template:Inline figure

In order to get a sense of what proportion of newly registered users were affected by the deployment of GettingStarted, we ran a set of queries to count the newly registered users across all Wikimedia projects and tracked their activities as they navigated the various funnels that GettingStarted provides. Figure #Group funnel proportions displays the proportion and raw counts of users who made it to each step in the funnel.

Who saw GettingStarted's CTA? The GettingStarted experience is currently only available to desktop users (TODO: link to design docs for a GS-like experience on mobile). Of the 336,310 newly registered users who registered during our 30-day period after deployment, 273,169 (81.23%) registered through the desktop interface. 218,968 of these desktop users registered on one of the 30 wikis where GettingStarted was deployed, and 143,627 of those saw a GettingStarted CTA. In other words:

42.7% of newly registered users across all projects had the opportunity to take advantage of GettingStarted.
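The funnel arithmetic can be checked with a short script. The counts are taken from the paragraph above; the step labels and variable names are illustrative only.

```python
# Funnel counts from the 30-day period after deployment.
funnel = [
    ("registered (all interfaces)", 336310),
    ("registered via desktop", 273169),
    ("desktop, on a GettingStarted wiki", 218968),
    ("saw a GettingStarted CTA", 143627),
]

total = funnel[0][1]
previous = total
for step, count in funnel:
    # Share of all registrations, and share of the preceding funnel step.
    print(
        f"{step:<36} {count:>7,d}  "
        f"{count / total:7.2%} of all  "
        f"{count / previous:8.2%} of previous step"
    )
    previous = count

cta_share = funnel[3][1] / total
desktop_share = funnel[1][1] / total
```

The last step divided by the first gives the headline share of newly registered users who saw a CTA.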

Which CTAs did they see? Of the users who saw a change to their post-registration experience, a plurality (46.49%) saw the CTA that only asked whether they would like to see suggested tasks to perform (see Suggest only CTA). Most often, the "Edit this page" option was not available because the redirect page was a protected article (54.55%) or a page in the Project namespace. The next most common CTA was the combined "Edit this page or Find easy tasks" (see Edit & Suggest CTA), which 39.6% of users who saw any CTA received. Finally, 13.91% saw the CTA with only the option to "Edit this page" (see Edit only CTA). These users were predominantly (98.9%) on wikis that lacked suggested tasks.


Reverts of GettingStarted edits

Template:Inline figure

One of our concerns with tagging edits "via Getting Started edit suggestions" was that the tag might draw additional attention from Wikipedians and encourage extra scrutiny of edits made through GettingStarted. If GS-tagged edits are receiving extra scrutiny, then we would expect the revert rate for these edits to be higher. To check this hypothesis, we gathered all of the first edits performed by newcomers who registered during our 30-day period and detected which revisions were reverted within 48 hours.

Figure #Comparison of revert rates plots the difference between the revert rate of first edits not made through GettingStarted and the revert rate of first edits made through GettingStarted. Note that in all but three cases, the error bars of the 95% confidence interval cross the zero line. This means that there is no significant difference between the revert rates of GettingStarted and non-GettingStarted edits on those wikis. However, three wikis did see significant differences: viwiki and cawiki saw higher revert rates for GS edits, and enwiki saw lower revert rates for GS edits.
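An interval of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the method used for the figure: it uses a simple Wald interval for a difference of two proportions, the sign convention (GS rate minus non-GS rate) is chosen here for readability, and the counts in the example are invented.

```python
import math


def revert_rate_difference(reverted_gs, total_gs, reverted_other, total_other, z=1.96):
    """Difference in revert rates (GS minus non-GS) with a Wald confidence interval."""
    p_gs = reverted_gs / total_gs
    p_other = reverted_other / total_other
    diff = p_gs - p_other
    se = math.sqrt(
        p_gs * (1 - p_gs) / total_gs + p_other * (1 - p_other) / total_other
    )
    return diff, diff - z * se, diff + z * se


def is_significant(low, high):
    """The difference is significant when the interval does not cross zero."""
    return low > 0 or high < 0


# Hypothetical counts of reverted first edits on a single wiki.
diff, low, high = revert_rate_difference(50, 1000, 100, 1000)
print(f"difference = {diff:+.3f}, 95% CI = [{low:+.3f}, {high:+.3f}]")
print("significant:", is_significant(low, high))
```

With small per-wiki samples the interval is wide, which is why most wikis in the figure cross the zero line.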

It's important to note that, with such a high number of tests at a 95% confidence level, we should expect 1-2 wikis to report a Type I error. With this in mind, the significant differences observed for viwiki and cawiki should be taken with a grain of salt. For English Wikipedia, however, we had such a large number of observations that the result is clearly significant. It appears that GettingStarted edits there are reverted significantly less often than non-GettingStarted edits.

In order to look for evidence of changes in the activation and productivity due to the introduction of GettingStarted, we used an array of metrics to measure newcomer performance before and after the deployment of GettingStarted.

The figures below plot the difference between metrics before and after the deployment. When the plotted value is above zero, that means an increase in the metric was observed. Overall, the results fail to demonstrate a clear difference in the before and after state of these Wikis.

While some wikis show significant differences under some metrics, false positives of this kind are expected in about 1 in 20 tests when using 95% confidence intervals, or roughly 5 to 6 of 112 tests by chance alone. Here, we see 10 instances of significant results out of 112 tests:

  • Dewiki showed a significant drop in the rate of new editors
  • Plwiki showed a significant increase in the rate of returning new editors
  • Eswiki, Itwiki and Plwiki showed a significant increase in the number of productive edits newcomers performed in their first day
  • Plwiki saw a significant increase in the number of newcomer edit sessions while Frwiki saw a significant decrease
  • Plwiki and Ukwiki saw a significant increase in the amount of time spent editing while Frwiki saw a significant decrease
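To put the count of significant results in context, the number expected by chance can be computed directly. The sketch below assumes that the 112 tests are independent and each has a 5% false-positive rate. Independence is a simplification, since several metrics are measured on the same users of the same wiki, so the tail probability should be read as a rough guide only.

```python
import math


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


tests = 112      # number of tests reported above
alpha = 0.05     # false-positive rate of each test
observed = 10    # significant results listed above

expected_false_positives = tests * alpha
tail = binomial_tail(tests, alpha, observed)

print(f"expected by chance: {expected_false_positives:.1f}")
print(f"P(at least {observed} significant results by chance) = {tail:.3f}")
```

A tail probability that is not small would mean the observed number of significant results is compatible with chance alone under these assumptions.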

Given the lack of a clear trend across wikis and the lack of an obvious correlation between the availability of suggested tasks in the user experience and performance outcomes, it is not clear from these results that GettingStarted is having a measurable effect in the short term. Future work may reduce noise and potential confounds by running a controlled experiment on these wikis.

Boolean measures

Template:Inline figure Template:Inline figure Template:Inline figure

Scalar measures

Template:Inline figure Template:Inline figure Template:Inline figure Template:Inline figure

References