Birdwatch, Twitter’s recently introduced community-driven approach to providing context to potentially misleading tweets, is the most ambitious and transparent effort by a social media company to combat misinformation on its platform. The tool is under active development, and the team recently introduced a new method for rating notes and contributors using a PageRank-style algorithm. This note provides an external perspective on that feature, highlights some of the remaining risks with such a system, and provides empirical and technical tools for monitoring and addressing those risks. All results are preliminary and subject to change.
Birdwatch has done an unparalleled job in building out its product with openness and transparency and cultivating trust. Despite this, it still faces some… hostility?
When this hesitancy is widespread, no matter how shallow, it only takes a few errors to really erode whatever trust has been built up. That fragility could be viewed as an opportunity by individuals or groups of people who want to undermine such efforts. If actions could be taken to ‘break’ Birdwatch (i.e. rate otherwise helpful notes as not helpful and demonstrably false notes as helpful), groups could simultaneously spread misinformation and diminish trust in the institutions designed to combat it.
Importantly, we have precedent for similar coordinated actions on other platforms in other contexts.
“Cyberstriking” is a common tool used by politically motivated groups to trip the reporting systems of social media sites to get political opponents removed or temporarily banned. Identity Evropa, a white supremacist group now operating as the American Identity Movement, had a dedicated channel on their (leaked) Discord server to coordinate such actions with varying degrees of success:
The challenge with this type of coordination is that there is often limited spatial clustering or publicly shared social graphs of coordinated actors (i.e. people within the group often do not follow each other on Twitter or Facebook to maintain separation between their public profile and group affiliations). This makes it difficult to use standard community detection tools to identify these groups.
Birdwatch is actively addressing the challenge of coordinated adversarial actors and currently has many safeguards in place to prevent coordinated action, many of which I am sure are not in the public eye. Access to the pilot is limited to accounts that have verified contacts, use two factor authentication, and have not recently been found to violate Twitter’s terms and conditions (among others). Birdwatch is also intentionally recruiting a diverse contributor base to get broader and more varied perspectives.
Most notably for the purposes of the issue discussed here is that Birdwatch limits the degree to which any single author can improve the helpfulness score of another:
Each rater’s ratings of any particular author are only counted once in this weighted average, using the average helpfulness rating from that particular rater of the particular author. For example: if a rater rated 10 notes from the same author, those will count as 1 author-level rating instead of 10 ratings. - Birdwatch
However, as the program scales and more users are permitted to join the platform, the issue of coordinated abuse may still come to the fore. The goal of this analysis is to ask whether small, coordinated groups could overcome these safeguards to manipulate the platform or otherwise undermine public trust in Birdwatch as a whole.
I have done my best to match Birdwatch’s methodology for calculating author and note scores as precisely as possible; please shoot me an email if you spot any mistakes. All errors are my own.
The new PageRank-based system recently introduced by Birdwatch uses rankings of submitted notes to generate contributor helpfulness scores. If other Birdwatch contributors rate your notes as helpful, your score goes up. If other contributors with high helpfulness ratings themselves rate you as helpful, your score will go up even more.
PageRank was, of course, phenomenally successful and instrumental in Google’s rise to prominence as the default search engine for the World Wide Web. However, it was not without its vulnerabilities to abuse. In the early days of the blogosphere, link farms allowed websites to artificially increase their PageRank by creating many new websites and linking them to one another. This allowed ill-intentioned people with access to a hosting server to boost content-poor, ad-rich websites to the top of search results, ruining the internet for the rest of us.
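To make the link-farm mechanic concrete, here is a toy sketch using igraph (nothing from Birdwatch’s or Google’s actual code; the graph and page names are invented): a few pages that link only to each other and to a target page are enough to lift that target up the PageRank ordering.

library(igraph)

# A small "organic" web in which page D has a single genuine inbound link.
organic <- graph_from_literal(A --+ B, B --+ C, C --+ A, C --+ D)

# The same web plus a three-page link farm that links within itself and at D.
with_farm <- graph_from_literal(
  A --+ B, B --+ C, C --+ A, C --+ D,
  F1 --+ F2, F2 --+ F3, F3 --+ F1,
  F1 --+ D, F2 --+ D, F3 --+ D
)

sort(page_rank(organic)$vector, decreasing = TRUE)    # D sits near the bottom
sort(page_rank(with_farm)$vector, decreasing = TRUE)  # D now outranks the organic pages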
Birdwatch notes are anonymously contributed, meaning that contributors and Twitter users do not see the account names of those who write notes. This anonymity, combined with the account verification requirements, makes it much more difficult for disconnected users to batch produce ratings and preferentially support members of their own in-group (as a link farm might). Nevertheless, it may struggle to fully mitigate coordination now or in the future.
To illustrate how it might be feasible, in principle, to game such a system, we can look at a (charmingly?) simple agent-based model.
Agent-based modeling can be really helpful in understanding emergent behavior in complex systems. By design, Birdwatch (transparently) complicates its community ratings system to make it exceedingly difficult for a stand-alone user to game the system. This also makes it quite hard to model from a pure game-theoretic perspective. By recreating the system, modeling user behavior at their level, and observing outcomes of interest, we can take a brute force approach to understanding the expected efficacy of different strategies and, therefore, potential vulnerabilities.
I try to replicate the data generating process of Birdwatch in as simple a framework as possible using only R and the tidyverse. The flow of the simulation is straightforward:

1. Create the universe of posts and the population of contributors.
2. Have contributors write notes flagging posts as misleading.
3. Have contributors rate one another’s notes.
4. Calculate author, rater, and combined helpfulness scores.
5. Calculate final note scores and classify each note.
Once we have this system established, I’ll define a set of strategies that coordinated users could employ to try to game the system and perform a grid search over that parameter space to identify potential vulnerabilities in how ratings are calculated. The benefit of codifying the system is that we can quickly and easily add in additional strategy parameters at a later date or update note rating protocols as the system evolves.
We’ll first create a universe of 1,000 posts that are split into five topics. All of these topics are super cool but purely illustrative. In a real-world scenario, we should think of these topics as highly specific claims (e.g. vaccines make you magnetic). One of these illustrative topics (“Politics”) is the target topic for any potential bad actors, who will try to undermine the ratings system with regard to that topic. Within this universe, each individual post has a 10% probability of being a pure fabrication, which we code as `blatant_lie`. Clearly, this is a dramatic over-simplification of what Birdwatch is trying to accomplish, but it provides a useful and clear starting point for analysis.
👇 Code to generate posts.
# Setting seed for replicability ----
set.seed(1355)
#' @name create_posts
#' @description This function creates the universe of posts
#' split into five topics. Each post has a ten-percent
#' chance of being a blatant_lie.
#' @param n_posts The number of posts to create.
create_posts <- function(n_posts) {
fabricatr::fabricate(
ID_label = "post_id",
N = n_posts,
topic = fabricatr::draw_categorical(
N = N,
prob = c(.2, .05, .3, .2, .25),
category_labels =
c("Formula One",
"Coffee",
"Data Science",
"Gardening",
"Politics")
),
blatant_lie = fabricatr::draw_binary(N = N, p = .1)
)
}
posts <- create_posts(n_posts = 1000)
| post_id | topic | blatant_lie |
|---|---|---|
| 0132 | Data Science | 1 |
| 0844 | Politics | 1 |
| 0581 | Coffee | 0 |
| 0506 | Coffee | 0 |
| 0072 | Politics | 0 |
| 0188 | Data Science | 0 |
We’ll now create a population of 1,000 contributors. Most of them will be birders and a small proportion \(\rho\) will be twitchers. Birders follow the rules and faithfully try to report misleading information. Twitchers, on the other hand, try to seek out posts related to a specific topic and report misleading information as truthful and vice versa.
👇 Code to generate contributors.
#' @name create_contributors
#' @description This function creates a population of contributors
#' who are of two types: birders (good) and twitchers (bad).
#' @param n_contributors The size of the population of contributors.
#' @param rho The probability an account is a twitcher.
create_contributors <- function(n_contributors, rho) {
fabricatr::fabricate(
ID_label = "contributor_id",
N = n_contributors,
type = fabricatr::draw_categorical(
N = N,
prob = c(1-rho, rho),
category_labels = c("birder","twitcher"),
)
)
}
contributors_data <- create_contributors(n_contributors = 1000, rho = .02)
With the current random seed, our population of contributors has 977 birders and 23 twitchers. This is a really small number of inauthentic accounts, but we will see if they can have an outsized effect on individual cases (i.e. can they flag a true post as potentially misleading?). To do this, we need to model actor search and rating behavior. A birder acts in the intended way: randomly sorting through posts and identifying (with some error) posts that they think are misleading. If the birder rates a post as misleading, it will be flagged and added to a notes database.

The twitcher acts in much the same way, with the exception of a topic of interest. Here, we’ll say that the twitchers are interested in the “Politics” topic. These contributors will go out of their way to identify posts in this topic and only flag them if they are true. So, blatant lies will not be added to the notes database, but truthful posts will be flagged as potentially misleading and added as notes. How frequently the twitchers consider the targeted topic is encoded in the parameter \(\gamma\), which we can think of as defining how discreet the twitchers are: a \(\gamma\) of 1 would mean they only target the Politics topic, and 0 would mean they never look at posts in the Politics topic. Twitchers can also be more or less active than birders when considering posts to tag. This is encoded in the `multiplier` parameter of the `create_notes_dataset()` function.

We also need to provide a means for twitchers to coordinate with one another. I add a new variable named `whistle` for this purpose. When a twitcher adds a note to a post, they also set this variable to 1. We can think about this signalling as occurring either through a content-based code word that twitchers know and birders do not (see 💬) or through some other off-platform means of communication (e.g. a Discord server in the case of “Cyberstriking”). With a signal in place, twitchers can identify one another and rate one another’s notes as helpful when they come across them.
👇 Code to generate a simulated notes dataset.
#' @name create_notes_dataset
#' @description creates the data frame of notes (posts that are flagged).
#' @param attention_span How many posts the contributor considers in total.
#' @param param_gamma The degree to which twitchers focus on target topic.
#' @param contributors_data_frame The data frame produced by `create_contributors`.
#' @param posts_data The data frame created by `create_posts`.
#' @param multiplier scalar to change twitcher attention relative to birder.
create_notes_dataset <- function (
contributors_data_frame = contributors_data,
posts_data = posts,
param_gamma = .1,
attention_span = 10,
multiplier = 1
) {
# randomly sample sum of attention spans with replacement and randomly assign
# attention_span posts to each contributor.
n_birders <- contributors_data_frame %>%
filter(type == "birder") %>%
count() %>%
pull()
n_twitchers <- contributors_data_frame %>%
filter(type == "twitcher") %>%
count() %>%
pull()
# Get birder Notes ----
birder_notes <- posts_data %>%
# get full list
slice_sample(
n = (attention_span * n_birders),
replace = TRUE
) %>%
# assign to contributors
mutate(contributor_id =
contributors_data_frame %>%
filter(type == "birder") %>%
select(contributor_id) %>%
slice(
rep(1:n(), each = attention_span)
) %>%
pull()
) %>%
# decide whether to flag or not
mutate(
error = rbinom((attention_span * n_birders), 1, .05),
flag = if_else(
error == 0,
as.integer(blatant_lie),
as.integer(1-blatant_lie)
),
whistle = 0
) %>%
filter(flag == 1)
# Get twitcher notes ----
non_target_notes <- posts_data %>%
filter(topic != "Politics") %>%
slice_sample(
n = round(multiplier * attention_span * (1 - param_gamma)) * n_twitchers,
replace = TRUE
) %>%
# assign to contributors
mutate(contributor_id =
contributors_data_frame %>%
filter(type == "twitcher") %>%
select(contributor_id) %>%
slice(
rep(1:n(), each = round(multiplier * attention_span * (1 - param_gamma)))
) %>%
pull()
) %>%
# decide whether to flag or not
mutate(
error = rbinom(round(multiplier * attention_span * (1 - param_gamma)) * n_twitchers, 1, .05),
flag = if_else(
error == 0,
as.integer(blatant_lie),
as.integer(1-blatant_lie)
),
whistle = 1
) %>%
filter(flag == 1)
target_notes <- posts_data %>%
filter(topic == "Politics") %>%
slice_sample(
n = round(multiplier * attention_span * param_gamma) * n_twitchers,
replace = TRUE
) %>%
# assign to contributors
mutate(contributor_id =
contributors_data_frame %>%
filter(type == "twitcher") %>%
select(contributor_id) %>%
slice(
rep(1:n(), each = round(multiplier * attention_span * param_gamma))
) %>%
pull()
) %>%
# decide whether to flag or not
mutate(
flag = if_else(blatant_lie == 0, 1, 0),
whistle = 1
) %>%
filter(flag == 1)
notes <- bind_rows(
birder_notes, non_target_notes
) %>%
bind_rows(target_notes)
return(notes)
}
notes <- create_notes_dataset(
contributors_data_frame = contributors_data,
posts_data = posts,
param_gamma = .1,
attention_span = 10,
multiplier = 1
)
We can look at the distribution of true (blatant) lies across topics. If everything is working as expected, flagged posts should be mostly lies across topics (though some true posts are reported due to honest errors), and many more “true” posts should be flagged by the twitchers, leading to an overall lower proportion of lies in the “Politics” topic. Looking at the plot below, we see that this is the case.
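A sketch of the kind of summary behind that plot (my own plotting code, assuming the tidyverse, including ggplot2, is loaded as elsewhere in this note):

# Share of flagged notes that point at actual blatant lies, by topic.
notes %>%
  group_by(topic) %>%
  summarise(prop_blatant_lie = mean(blatant_lie), n_notes = n()) %>%
  ggplot(aes(x = reorder(topic, prop_blatant_lie), y = prop_blatant_lie)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Share of notes flagging an actual blatant lie")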
Now that we have this set of notes, we can allow contributors to rate one another’s notes. Again, this requires explicitly modeling contributor behavior. The birders will act like the good citizens they are and will give positive ratings to correctly flagged posts (those that are blatant lies). Twitchers, on the other hand, will actively search out posts with a `whistle` and will always rate them as helpful. Doing so, under certain conditions, could allow them to artificially increase author and rater scores among members of their cabal.
For the purposes of constructing helpfulness scores, we now have contributors rate the helpfulness of a sample of notes and then use these values to build author helpfulness scores, rater helpfulness scores, and combined helpfulness scores as faithfully as possible to the methods described by Birdwatch.
👇 Code to create ratings dataset.
#' @name create_ratings_dataset
#' @description This function creates a set of ratings. Each contributor
#' looks at a set of tweets and gives it a rating based on their rate_ function.
#' @param attention_span The number of posts each contributor considers.
#' @param contributors_data_frame The data frame produced by `create_contributors`.
#' @param notes_data The dataframe produced by `create_notes_dataset`.
#' @param multiplier scalar to change twitcher attention relative to birder.
create_ratings_dataset <- function (
contributors_data_frame = contributors_data,
notes_data = notes,
attention_span = 30,
multiplier = 1
) {
n_birders <- contributors_data_frame %>%
filter(type == "birder") %>%
count() %>%
pull()
n_twitchers <- contributors_data_frame %>%
filter(type == "twitcher") %>%
count() %>%
pull()
# Get birder ratings ----
birder_ratings <- notes_data %>%
# get full list
slice_sample(
n = (attention_span * n_birders),
replace = TRUE
) %>%
# assign to contributors
mutate(rater_id =
contributors_data_frame %>%
filter(type == "birder") %>%
select(contributor_id) %>%
slice(
rep(1:n(), each = attention_span)
) %>%
pull()
) %>%
# decide whether to rate as helpful or not
mutate(
error = rbinom((attention_span * n_birders), 1, .05),
rate_helpful = if_else(
error == 0,
as.integer(blatant_lie),
as.integer(1-blatant_lie)
)
) %>%
select(-error, -flag, - whistle)
# Get twitcher ratings ----
twitcher_ratings <- notes_data %>%
filter(whistle == 1) %>%
slice_sample(
n = (multiplier * attention_span * n_twitchers),
replace = TRUE
) %>%
# assign to contributors
mutate(rater_id =
contributors_data_frame %>%
filter(type == "twitcher") %>%
select(contributor_id) %>%
slice(
rep(1:n(), each = multiplier * attention_span)
) %>%
pull()
) %>%
# decide whether to flag or not
mutate(
rate_helpful = 1
) %>%
select(-error, -flag, - whistle)
ratings <- bind_rows(birder_ratings, twitcher_ratings)
return(ratings)
}
ratings <- create_ratings_dataset(
contributors_data_frame = contributors_data,
notes_data = notes,
attention_span = 20,
multiplier = 1
)
Each contributor starts out with a default helpfulness score of 1 following the procedure outlined in Birdwatch’s ranking methodology notes. We then iterate over the following calculation until these scores are stable. Note that only contributors with at least one note that has been rated will receive an author helpfulness score. In this simulation, that means there are only 773 authors out of the original set of 1000 contributors that receive a score.
\[a_i(u) = \max\left(0, \frac{3}{2} \times \frac{2 + \sum_{\text{rater}\in R(u)} a_{i-1}(\text{rater})\times \text{Rating}(\text{rater}, u)}{6 + \sum_{\text{rater}\in R(u)} a_{i-1}(\text{rater})} - \frac{1}{2}\right) \]
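To get a feel for the smoothing in this update rule, consider a toy case (purely illustrative, not part of the pipeline) in which every rater currently has a score of 1 and rates the author’s notes as helpful: the score starts at 0 with no rated notes and only creeps toward 1 as ratings accumulate.

# Toy check of the update rule under the assumption that all raters have a
# current score of 1 and all rate the author's notes as helpful.
author_update <- function(n_raters) {
  pmax(0, (3 / 2) * (2 + n_raters) / (6 + n_raters) - (1 / 2))
}
author_update(c(0, 3, 10, 100))
#> 0.000 0.333 0.625 0.943 (approximately)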
👇 Code to calculate author helpfulness ratings.
#' @name calculate_author_helpfulness
#' @description calculates author helpfulness scores as
#' outlined by Birdwatch methodology.
#' @param contributors_data_frame data frame made by `create_contributors`
#' @param ratings_data_frame data frame made by `create_ratings_dataset`
#' @param iterations number of iterations to calculate author scores.
calculate_author_helpfulness <- function(
contributors_data_frame = contributors_data,
ratings_data_frame = ratings,
iterations = 10
) {
# Initialize author_rating data frame
author_ratings <- contributors_data_frame %>%
mutate(author_helpfulness_score = 1,
iteration = 0) %>%
select(-type) %>%
filter(
contributor_id %in% unique(ratings_data_frame$contributor_id)
)
# Iterate until convergence
for (i in 1:iterations) {
# Get new author ratings
new_author_ratings <- ratings_data_frame %>%
# get average helpfulness rating by id pair
group_by(contributor_id, rater_id) %>%
summarise(rate_helpful = mean(rate_helpful)) %>%
# get current author ratings
left_join(., author_ratings %>%
filter(
iteration == max(author_ratings$iteration)
) %>%
rename(rater_id = contributor_id),
by = "rater_id"
) %>%
na.omit() %>%
mutate(
numerator_sum_item = author_helpfulness_score * rate_helpful
) %>%
summarise(
numerator_sum = sum(numerator_sum_item),
denominator_sum = sum(author_helpfulness_score)
) %>%
mutate(
author_helpfulness_score =
(3 / 2) * (2 + numerator_sum) / (6 + denominator_sum) - (1 / 2)
) %>%
mutate(
author_helpfulness_score =
if_else(
author_helpfulness_score < 0,
0,
author_helpfulness_score
)
) %>%
mutate(iteration = i) %>%
select(contributor_id, author_helpfulness_score, iteration)
# Append them to author ratings dataset
author_ratings <-
bind_rows(author_ratings, new_author_ratings)
}
# get only the most recent iteration
# --------------------------------------------
# NOTE: This is commented out *only* for the purposes of this document,
# see the full source code for deets.
# --------------------------------------------
#author_ratings <- author_ratings %>%
# filter(iteration == 10) %>%
# select(-iteration) %>%
# right_join(contributors_data_frame, by = "contributor_id") %>%
# mutate(author_helpfulness_score = replace_na(author_helpfulness_score, 0))
# --------------------------------------------
return(author_ratings)
}
author_ratings <- calculate_author_helpfulness(
iterations = 10,
contributors_data_frame = contributors_data,
ratings_data_frame = ratings
)
author_helpfulness_scores <- author_ratings %>%
filter(iteration == 10) %>%
select(-iteration) %>%
right_join(contributors_data, by = "contributor_id") %>%
mutate(author_helpfulness_score = replace_na(author_helpfulness_score, 0))
Looking across a random sample of twitchers and birders, we can see that scores do indeed converge over ten iterations.
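A sketch of that convergence check (my own plotting code, not necessarily the source of the figure):

# Trace author scores over iterations for a handful of contributors of each
# type (sampling 5 per type; adjust if a type has fewer scored authors).
author_ratings %>%
  left_join(contributors_data, by = "contributor_id") %>%
  group_by(type) %>%
  filter(contributor_id %in% sample(unique(contributor_id), 5)) %>%
  ungroup() %>%
  ggplot(aes(x = iteration, y = author_helpfulness_score,
             group = contributor_id, colour = type)) +
  geom_line(alpha = .6) +
  labs(x = "Iteration", y = "Author helpfulness score")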
Birdwatch calculates a preliminary note score using the following calculation: \[\text{preliminary_note_score}(n) = \frac{\sum_{\text{rater}\in R(n)}a(\text{rater})\times \text{Rating}(\text{rater}, n)}{\sum_{\text{rater}\in R(n)}a(\text{rater})} \]
👇 I replicate this in R using the code outlined here.
#' @name calculate_preliminary_note_scores
#' @description calculates preliminary note scores for the purposes of
#' constructing rater helpfulness scores.
#' @param ratings_data_frame data frame made by `create_ratings_dataset`
#' @param author_scores_data data frame made by `calculate_author_helpfulness`
calculate_preliminary_note_scores <- function(
ratings_data_frame = ratings,
author_scores_data = author_helpfulness_scores
) {
prelim_note_scores <- ratings_data_frame %>%
left_join(
.,
author_scores_data %>%
rename(rater_id = contributor_id),
by = "rater_id"
) %>%
na.omit() %>%
group_by(post_id, contributor_id) %>%
summarise(
numerator =
sum(author_helpfulness_score * rate_helpful),
denominator = sum(author_helpfulness_score)
) %>%
mutate(
preliminary_note_score = numerator/denominator
) %>%
mutate(
prelim_rating = case_when(
preliminary_note_score >= .84 ~ "Currently Rated Helpful",
preliminary_note_score <= .29 ~ "Currently Rated Not Helpful",
TRUE ~ "Needs More Ratings"
)
) %>%
select(-numerator, -denominator)
return(prelim_note_scores)
}
prelim_note_scores <- calculate_preliminary_note_scores(
ratings_data_frame = ratings,
author_scores_data = author_helpfulness_scores
)
At this point, it may be helpful to recall that notes are uniquely identified by the combination of `post_id` and `contributor_id`. Using these preliminary scores, we can identify the set of notes which we will use to construct the rater helpfulness scores (i.e. those that are rated as helpful or not helpful).
I now recreate the Rater Helpfulness Score. The actual construction here is slightly different from that used by Birdwatch. Whereas Birdwatch takes the first 5 ratings in a given time period, here we randomly sample 5 ratings (we don’t have a time dimension in this very rudimentary framework). The ratings are filtered to only notes that have a definitive rating using the preliminary note score.
Currently only the first 5 ratings on each note that were made within 48 hours of the note’s creation are used when evaluating a Rater Helpfulness Score (hereafter called “valid ratings”). This is done to both reward quick rating, and also so that retroactively rating old notes with clear labels doesn’t boost Rater Helpfulness Score. - Birdwatch
If there is coordination, however, this may open up additional vulnerabilities to abuse. One can imagine, for instance, that if a twitcher is forewarned that a note will be generated, they may be able to rate it before any birders get the opportunity. To model this, we can add a parameter `twitcher_speed_param` which changes the probability that a twitcher rating is randomly selected as a valid rating.
👇 Rater helpfulness function.
#' @name calculate_rater_helpfulness
#' @description Calculates the rater helpfulness score using
#' the ratings data set and the preliminary note scores.
#' @param ratings_data_frame data created by `create_ratings_dataset`
#' @param prelim_scores_data data created by `calculate_preliminary_notes`
#' @param contributors_data_frame data created by `create_contributors`
#' @param twitcher_speed_param Increases the probability that ratings from
#' twitchers will be selected as 'valid ratings'.
calculate_rater_helpfulness <- function(
ratings_data_frame = ratings,
prelim_scores_data = prelim_note_scores,
contributors_data_frame = contributors_data,
twitcher_speed_param = twitcher_speed
) {
# Rater Helpfulness Scores
rater_helpfulness_scores <- ratings_data_frame %>%
group_by(post_id, contributor_id) %>%
# Get preliminary note scores
left_join(.,
prelim_scores_data %>%
select(post_id, contributor_id, prelim_rating),
by = c("post_id", "contributor_id")
) %>%
# subset to only those with ratings
filter(prelim_rating %in% c(
"Currently Rated Helpful",
"Currently Rated Not Helpful"
)) %>%
# subset to notes with at least 5 ratings
mutate(count = 1) %>%
group_by(post_id, contributor_id) %>%
mutate(count = sum(count)) %>%
filter(count >= 5) %>%
select(-count) %>%
# Weight probability by twitcher speed
left_join(.,
contributors_data_frame %>%
rename(rater_id = contributor_id),
by = "rater_id") %>%
mutate(speed = if_else(type == "twitcher", twitcher_speed_param, 1)) %>%
# Randomly select 5
slice_sample(n = 5, weight_by = speed) %>%
select(-speed, -type) %>%
# Calculate consensus without current rating
mutate(
consensus = (5/4) * (mean(rate_helpful) - rate_helpful/5)
) %>%
mutate(
consensus = case_when(
consensus >= .75 ~ 1,
consensus == .5 ~ consensus,
consensus <= .25 ~ 0
)
) %>%
# Subset to notes with consensus
filter(consensus %in% c(0,1)) %>%
group_by(rater_id) %>%
# Calculate rater scores
mutate(
valid_rating = 1
) %>%
summarise(
num_valid_ratings_match = sum(rate_helpful == consensus),
valid_ratings = sum(valid_rating)
) %>%
mutate(
rater_helpfulness_score =
(3/2) * (2 + num_valid_ratings_match) / (6 + valid_ratings) - 1/2
) %>%
mutate(
rater_helpfulness_score = if_else(
rater_helpfulness_score < 0,
0,
rater_helpfulness_score
)
) %>%
select(rater_id, rater_helpfulness_score) %>%
rename(contributor_id = rater_id) %>%
right_join(
contributors_data_frame,
by = "contributor_id"
) %>%
mutate(
rater_helpfulness_score = replace_na(rater_helpfulness_score, 0)
)
return(rater_helpfulness_scores)
}
rater_helpfulness_scores <- calculate_rater_helpfulness(
ratings_data_frame = ratings,
prelim_scores_data = prelim_note_scores,
contributors_data_frame = contributors_data,
twitcher_speed_param = 1
)
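One detail in the function above worth spelling out: with exactly five valid ratings, the “consensus without the current rating” is the leave-one-out mean of the other four ratings, which is what the expression \((5/4)(\bar{x} - x_i/5)\) computes. A quick toy check:

# Leave-one-out identity used in calculate_rater_helpfulness(): with 5 ratings,
# the mean of the other four equals (5/4) * (mean(x) - x_i / 5).
x <- c(1, 1, 1, 0, 1)
sapply(seq_along(x), function(i) mean(x[-i]))
#> 0.75 0.75 0.75 1.00 0.75
(5 / 4) * (mean(x) - x / 5)
#> 0.75 0.75 0.75 1.00 0.75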
To get the final note scores we first need to calculate the combined helpfulness score which is simply an average of the author helpfulness score and the rater helpfulness score.
👇 Combined helpfulness score function.
#' @name calculate_combined_helpfulness_score
#' @description Averages together the author and
#' rater helpfulness scores.
#' @param author_helpfulness_data data created by `calculate_author_helpfulness`
#' @param rater_helpfulness_data data created by `calculate_rater_helpfulness`.
calculate_combined_helpfulness_score <- function(
author_helpfulness_data = author_helpfulness_scores,
rater_helpfulness_data = rater_helpfulness_scores
) {
combined_helpfulness_scores <-
left_join(
author_helpfulness_data,
rater_helpfulness_data,
by = c("contributor_id", "type")
) %>%
mutate(
combined_helpfulness_score =
((author_helpfulness_score + rater_helpfulness_score) / 2)
)
return(combined_helpfulness_scores)
}
combined_helpfulness_scores <- calculate_combined_helpfulness_score(
author_helpfulness_data = author_helpfulness_scores,
rater_helpfulness_data = rater_helpfulness_scores
)
Finally, we are able to calculate the final note scores, where the numeric score is calculated following:
\[\text{note_score}(n) = \frac{\sum_{\text{rater}\in R(n)}c(\text{rater})\times rating(\text{rater}, n)}{\sum_{\text{rater}\in R(n)}c(\text{rater})} \]
Where this numeric score is greater than or equal to .84
, the note is classified as “Currently Rated Helpful”. When it is less than or equal to .29
it receives a “Currently Rated Not Helpful” rating. Otherwise, it’s tagged “Needs More Ratings”.
👇 Code to classify notes.
#' @name calculate_final_note_scores
#' @description This function uses the contributor helpfulness scores to
#' calculate the final note score.
#' @param ratings_data data created by `create_ratings_dataset`
#' @param combined_scores_data data created by `calculated_combined_helpfulness_score`
calculate_final_note_scores <- function(
ratings_data = ratings,
combined_scores_data = combined_helpfulness_scores
){
final_note_scores <- ratings_data %>%
left_join(
.,
combined_scores_data %>%
rename(rater_id = contributor_id,
rater_type = type),
by = "rater_id"
) %>%
group_by(post_id, contributor_id) %>%
summarise(
numerator = sum(combined_helpfulness_score * rate_helpful),
denominator = sum(combined_helpfulness_score)
) %>%
mutate(
note_score = numerator / denominator
) %>%
select(
post_id, contributor_id, note_score
) %>%
mutate(
note_rating = case_when(
note_score >= .84 ~ "Currently Rated Helpful",
note_score <= .29 ~ "Currently Rated Not Helpful",
TRUE ~ "Needs More Ratings"
)
)
return(final_note_scores)
}
note_scores <- calculate_final_note_scores(
ratings_data = ratings,
combined_scores_data = combined_helpfulness_scores
)
Now that we have our scores, we can take a look and see whether twitchers were able to accomplish anything with the strategy they pursued in this simulation. We’re primarily interested in two outcomes: 1) contributor ratings, and 2) note ratings.
Let’s take a look at the contributor scores. While twitchers were able to get their author scores to be comparable to those of birders, their ratings were sufficiently different from those of birders (who on average made up the consensus of valid ratings) that their Rater Helpfulness Scores, and thus their Combined Helpfulness Scores, suffered. I’ll call that a win for Birdwatch.
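A quick way to eyeball that claim from the objects we already have (essentially the same summary that `return_observables()` computes later):

# Average helpfulness scores by contributor type.
combined_helpfulness_scores %>%
  group_by(type) %>%
  summarise(
    author = mean(author_helpfulness_score),
    rater = mean(rater_helpfulness_score),
    combined = mean(combined_helpfulness_score)
  )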
But what about the notes themselves? The most important indicator, from my perspective, is whether or not the system is able to sort out fact from fiction. If everything is working properly, notes that flag a `blatant_lie` should be rated as helpful and those that flag non-misleading posts should be rated as not helpful, despite collusion by the twitchers.
In the figure below, we can see that this is the case. Under these settings and the random seed assigned above, we achieve near-perfect separation. Some notes are classified as “Needs More Ratings”, but not a single note flagging a lie is classified as “not helpful”, and no notes flagging non-misleading posts are classified as “helpful”.
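A simple cross-tabulation (a sketch, not the source of the figure) is one way to check this; notes that never received a rating show up with an NA classification:

# Final classification against whether the flagged post was actually a lie.
notes %>%
  left_join(note_scores, by = c("post_id", "contributor_id")) %>%
  count(blatant_lie, note_rating)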
All things considered, Birdwatch works well against a naive strategy. So, now let’s try to break it 💪.
Now that we’ve run through one example, we can re-conduct this exercise varying certain dimensions of the twitchers’ strategy or the environment. Namely, we’ll think about \(\rho\) (the size of the twitcher population), \(\gamma\) (how discreet twitchers are in targeting a particular topic), and twitcher activity multipliers (i.e. how much more active twitchers are compared to birders in note-making and note-rating). Then, we’ll look at two outcomes: 1) how often posts in the targeted topic are mis-classified compared to the others and 2) the average `author_helpfulness_score` of twitchers. I write a simple helper function below to return these values.
👇 function to return measures we care about.
#' @name return_observables
#' @description This function returns the relevant observable
#' outcomes in a list of two objects. The first element in
#' the list is average scores by contributor type. The second
#' is the classification breakdown by blatant_lie and whether
#' or not the topic is the target topic.
return_observables <- function(
combined_scores_data = combined_helpfulness_scores,
note_scores_data = note_scores,
notes_data = notes
) {
# return average author ratings by type
scores <- combined_scores_data %>%
group_by(type) %>%
summarise(
rater_score = mean(rater_helpfulness_score),
author_score = mean(author_helpfulness_score),
combined_score = mean(combined_helpfulness_score)
)
# return accuracy by topic
classifications <- notes_data %>%
left_join(
.,
note_scores_data,
by = c("post_id","contributor_id")
) %>%
mutate(topic = if_else(topic == "Politics", "Politics", "Other")) %>%
group_by(
blatant_lie, topic
) %>%
summarise(
perc_helpful = mean(
note_rating == "Currently Rated Helpful",
na.rm = TRUE
),
perc_not_helpful = mean(
note_rating == "Currently Rated Not Helpful",
na.rm = TRUE
),
perc_needs_more = mean(
note_rating == "Needs More Ratings",
na.rm = TRUE
),
average_note_score = mean(
note_score,
na.rm = TRUE
)
)
return(
list(scores, classifications)
)
}
Then, we can cobble together a function that runs the whole system of functions, with a set of arguments that allow us to change the value of these parameters on the fly. These parameters can be broken down into those nominally under the control of the twitchers (i.e. their strategy) and those that are features of the environment:
| Type | Parameter | Description |
|---|---|---|
| Strategy | `rho` | Proportion of the population that are twitchers. |
| Strategy | `gamma` | Degree to which twitchers focus on the target topic. |
| Strategy | `notes_attention_multiplier` | Ratio of twitcher attention to birder attention when writing notes. |
| Strategy | `ratings_attention_multiplier` | Ratio of twitcher attention to birder attention when rating notes. |
| Strategy | `twitcher_speed` | Relative likelihood of sampling twitchers to get valid ratings. |
| Environment | `number_posts` | Total posts in the ‘universe’. |
| Environment | `number_contributors` | Total contributors in the universe. |
| Environment | `notes_attention` | Number of posts birders look at to add notes. |
| Environment | `ratings_attention` | Number of notes birders rate. |
👇 `birdwatchr()`.
#' @name birdwatchr
#' @description This function runs the whole system of functions we've
#' created and allows us to experiment with different strategy profiles
#' for the twitchers and see the resulting outcomes.
#' @param rho Proportion of the population that are twitchers.
#' @param gamma Degree to which twitchers focus on target topic.
#' @param notes_attention Number of posts birders look at to add notes.
#' @param notes_attention_multiplier Ratio of twitcher attention to birder attention when writing notes.
#' @param ratings_attention Number of notes birders rate.
#' @param ratings_attention_multiplier Ratio of twitcher attention to birder attention when rating notes.
#' @param number_posts Total posts in the 'universe'.
#' @param number_contributors Total contributors in the universe.
#' @param twitcher_speed Relative likelihood of sampling twitchers to get valid ratings.
birdwatchr <- function (
rho = .01,
gamma = .1,
notes_attention = 10,
notes_attention_multiplier = 1,
ratings_attention = 30,
ratings_attention_multiplier = 1,
number_posts = 1000,
number_contributors = 1000,
twitcher_speed = 1
) {
# create the environment and the players
#print("1. create the environment and the players")
posts <- create_posts(n_posts = number_posts)
contributors_data <- create_contributors(
n_contributors = number_contributors,
rho = rho
)
# create notes period
#print("2. create notes")
notes <- create_notes_dataset(
contributors_data_frame = contributors_data,
posts_data = posts,
param_gamma = gamma,
attention_span = notes_attention,
multiplier = notes_attention_multiplier
)
# create ratings period
#print("3. create ratings")
ratings <- create_ratings_dataset(
contributors_data_frame = contributors_data,
notes_data = notes,
attention_span = ratings_attention,
multiplier = ratings_attention_multiplier
)
# calculating scores
#print("4. calculate author helpfulness")
author_helpfulness_scores <- calculate_author_helpfulness(
iterations = 10,
contributors_data_frame = contributors_data,
ratings_data_frame = ratings
)
#print("5. calculate preliminary scores")
prelim_note_scores <- calculate_preliminary_note_scores(
ratings_data_frame = ratings,
author_scores_data = author_helpfulness_scores
)
#print("6. calculate rater helpfulness")
rater_helpfulness_scores <- calculate_rater_helpfulness(
ratings_data_frame = ratings,
prelim_scores_data = prelim_note_scores,
contributors_data_frame = contributors_data,
twitcher_speed_param = twitcher_speed
)
#print("7. calculate combined helpfulness")
combined_helpfulness_scores <- calculate_combined_helpfulness_score(
author_helpfulness_data = author_helpfulness_scores,
rater_helpfulness_data = rater_helpfulness_scores
)
#print("8. calculate note scores")
note_scores <- calculate_final_note_scores(
ratings_data = ratings,
combined_scores_data = combined_helpfulness_scores
)
# get observables
#print("9. return observables")
observables <- return_observables(
combined_scores_data = combined_helpfulness_scores,
note_scores_data = note_scores,
notes_data = notes
)
return(observables)
}
First, we’ll conduct a preliminary exercise to identify whether there are any strategies that may allow twitchers to overcome Birdwatch’s safeguards. To do this, we’ll compare every potential strategy parameter at a baseline level (as close as possible to birder behavior) and with each parameter dialed up to extreme, probably ridiculous, levels. We’ll then (later) use the findings from these simulations to conduct more targeted simulation experiments where we run each trial many times.
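As a sketch of what that exercise can look like in code (the parameter values and number of runs here are illustrative, not the ones behind the figures below), we can lay out a grid of strategy profiles and map `birdwatchr()` over it:

# Illustrative grid of strategy profiles: baseline vs. dialed-up values.
strategy_grid <- tidyr::expand_grid(
  rho = c(.01, .05),
  gamma = c(.2, .9),
  ratings_attention_multiplier = c(1, 5),
  twitcher_speed = c(1, 10)
)

# Run the full pipeline once per profile; the returned observables (a list of
# two tibbles) are stored in a list-column.
grid_results <- strategy_grid %>%
  mutate(observables = purrr::pmap(
    list(
      rho = rho,
      gamma = gamma,
      ratings_attention_multiplier = ratings_attention_multiplier,
      twitcher_speed = twitcher_speed
    ),
    birdwatchr
  ))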
From the following figure, we can see that yes, there are certain strategies that users can pursue to manipulate the current system. The more accounts they have in the system, the easier it is to manipulate. However, even in trials with as few as 30 users, twitchers were able to get more than half of erroneous notes rated helpful. Greediness (\(\gamma\)) was also positively correlated with targeted misclassifications. Twitcher speed did not appear to have too big an impact.
The largest differentiator in twitcher success was the level of effort devoted to spamming ratings. The more notes they could rate compared to birders, the better. Doing so, and targeting members of their group, allows them to artificially increase their contributor scores, which in turn allows them to boost the ratings of inaccurate notes.
This analysis serves as a useful proof of concept for the approach but is fairly limited in what we can actually discern about strategies we should anticipate. It seems that the strategy profiles that are most successful in getting more than 80% of the target topic mislabeled are also those that are very detectable, as we will discuss later.
The next step in this analysis is to conduct more experiments across the parameters that appear to have a demonstrable effect on twitcher success. Rather than comparing only outcomes when the dial is at 1 or 11, we want to understand the relative tradeoffs at 1, 2, 3, and so on. That is, what are the least detectable strategies twitchers could use to still consistently mislabel notes?
It is also worth noting that the model here is not inclusive of all potential twitcher strategies. There are many schemes that are not included in the parameter space we have built in here. However, the benefit of using this simulation approach is that it is relatively straightforward to update models of user behavior and re-run experiments.
We’ve demonstrated that, in theory, a coordinated group of participants could undermine the current system and flag true posts as misleading. However, it looks as though there are likely clear markers of these strategies in observable features of contributor behavior. Namely, we can examine the connectedness of users in a nominally anonymous system and, where group status is known (i.e. whether an account is a twitcher or a birder), their rating behavior toward other participants.
The strategy we’ve identified relies on twitchers being able to build up each other’s contributor scores through targeted rating of notes. Identifying such behavior should be relatively straightforward from a community detection standpoint. Conditional on exposure to notes, which I can’t observe, the directional network of ratings should approximate a random graph if there is no coordination. If certain participants are coordinating, their in-group network will be much denser.

We can identify communities using standard community detection algorithms and then use measures of in-group network density to identify groups which are outliers in terms of the frequency of connections in local neighborhoods.
Based on the strategies we’ve identified above, we should also consider indicators based on rating behaviors: the share of a contributor’s ratings that go to notes written by members of their own group, and the ratio of in-group to out-group “rated helpful” rates. For both of these measures, a higher value would be more suggestive of coordinated abusive behavior.
There are many approaches to identifying communities within larger networks. We take a standard, off-the-shelf approach and use the infomap algorithm presented in Rosvall and Bergstrom (2007) and implemented by Csardi, Nepusz, and others (2006). The advantage of this approach is that it is relatively good at detecting even small groups. Given that we are searching for potentially small or non-existent needles in a large haystack of helpful contributors, this is a decided advantage.
We can take our ratings data and convert it into a directional network on which we run infomap and calculate some basic network statistics.
👇 creating a directed network from ratings data.
#' @name create_network_data
#' @description Reads in contributors data and the ratings data to produce a
#' tbl_graph object which we use to manipulate and calculate network measures.
#' requires tidygraph.
#' @param ratings_data Ratings data frame.
#' @param contributors_data_frame data object with contributors.
#' TODO: allow users to identify participant IDs and don't
#' hard code to and from info.
create_network_data <- function(
contributors_data_frame = contributors_data,
ratings_data = ratings
) {
# create edges
edges_data <- ratings_data %>%
group_by(contributor_id, rater_id) %>%
slice(1) %>%
select(contributor_id, rater_id) %>%
rename(from = rater_id, to = contributor_id)
# convert to network data
network_graph <- tbl_graph(
nodes = contributors_data_frame,
edges = edges_data,
node_key = "contributor_id",
directed = TRUE
) %>%
# group with infomap
mutate(group_im = group_infomap()) %>%
# Calculate neighborhood graph size
mutate(neighborhood_edges =
map_local_dbl(
.f = function(neighborhood, ...) {
igraph::gsize(neighborhood)
}
))
return(network_graph)
}
network_graph <- create_network_data(
contributors_data,
ratings
)
Infomap identifies potential groups and we can then calculate some group-level statistics to identify communities that may be engaging in coordinated behavior. For now, we simply take the group that has the highest average connections in a local neighborhood to identify over-connected groups. However, in a real world setting, as we will see with the real Birdwatch data, we will want to also look at in-group/out-group rating behaviors and the substantive topics of notes.
When we run this simple set-up in our simulated data, it works:
Repeating this detection system 100 times on randomly generated data finds that, 100% of the time, it works every time (in simulated data 🐁). However, we need to conduct additional sensitivity analysis and think about recursive strategies (i.e. what strategies might twitchers take to avoid detection).
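A sketch of that repetition, assuming the functions defined earlier are in scope (the wrapper and its settings are my own and may differ from those behind the reported number):

# Regenerate the simulated world, run detection, and return the share of
# twitchers in the group flagged as most over-connected.
run_detection_once <- function(seed, rho = .02) {
  set.seed(seed)
  contributors <- create_contributors(n_contributors = 1000, rho = rho)
  posts <- create_posts(n_posts = 1000)
  notes <- create_notes_dataset(contributors, posts)
  ratings <- create_ratings_dataset(contributors, notes)
  create_network_data(contributors, ratings) %>%
    detection_summary() %>%
    filter(twitch_grp == 1) %>%
    pull(perc_twitcher)
}

flagged_share <- purrr::map_dbl(1:100, run_detection_once)
summary(flagged_share)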
For starters, we want to examine at what point a detection system falls apart. Does it fail with groups that are smaller than 20 participants? 10? 5? Can twitchers balance their ratings behavior to avoid detection while still undermining the ratings system?
We would also want to formally incorporate rating behavior into detecting outlier groups and potentially look for more than one group. Most importantly, we need to build up our simulation to allow our synthetic data to be as similar as possible to the real world Birdwatch data.
#' @name detection_summary
#' @description Simple helper function to calculate the proportion of the
#' identified group which is actually twitchers.
#' @param network_graph_object Data object created by `create_network_data`.
detection_summary <- function(network_graph_object) {
network_summary <-
network_graph_object %>%
activate(nodes) %>%
as_tibble() %>%
group_by(group_im) %>%
mutate(
grp_neighborhood_size = mean(neighborhood_edges)
) %>%
ungroup() %>%
mutate(
twitch_grp = if_else(
grp_neighborhood_size == max(grp_neighborhood_size),
1,
0
)
) %>%
mutate(twitcher = if_else(type == "twitcher", 1, 0)) %>%
group_by(twitch_grp) %>%
summarise(
perc_twitcher = mean(twitcher),
num_twitcher = sum(twitcher)
)
return(network_summary)
}
With these caveats in mind, we can walk through the same community detection process on real world data to see whether there is any evidence of coordinated activity. With innumerable thanks to the @birdwatch team for making their data publicly available, we’ll use “ratings-00000.tsv” to construct a directed network of contributors. We’ll then use community detection approaches to identify groups of participants, and then we’ll look for groups that are outliers in terms of local network density and in-group favoritism.
When we do this, we see that there are at least five identified groups that are outliers in terms of the density of local connections. Is it possible that these are just active users who are exposed to similar posts and, therefore, similar notes? Totally. But they definitely merit a closer look.
If these groups are indeed coordinating, there should be some additional digital breadcrumbs beyond connectivity alone. Most importantly, twitchers should be rating members of their in-group more positively than members of out-groups.
For each group, we can calculate the percentage of notes members rate as helpful for both in-group members and out-group members. Then, dividing the former by the latter gives us a simple ratio of how much more likely a group is to rate its own members as helpful compared to others. A value greater than one would suggest preferential treatment.
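A minimal sketch of that calculation on the simulated objects (column names such as author_group and rater_group are my own; the same logic applies to the real data once infomap groups are attached):

# Attach each contributor's infomap group to the ratings table, once for the
# note author and once for the rater, then compare in-group vs. out-group
# rating behaviour by group.
group_lookup <- network_graph %>%
  activate(nodes) %>%
  as_tibble() %>%
  select(contributor_id, group_im)

helpfulness_ratios <- ratings %>%
  left_join(group_lookup, by = "contributor_id") %>%
  rename(author_group = group_im) %>%
  left_join(group_lookup, by = c("rater_id" = "contributor_id")) %>%
  rename(rater_group = group_im) %>%
  mutate(in_group = if_else(rater_group == author_group, "in", "out")) %>%
  group_by(rater_group, in_group) %>%
  summarise(perc_helpful = mean(rate_helpful), n = n(), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = in_group, values_from = c(perc_helpful, n)) %>%
  mutate(helpfulness_ratio = perc_helpful_in / perc_helpful_out)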
Of the groups we identified as outliers, none look particularly suspicious. While group 80 appears to preferentially rate members of its own group as helpful, this is based on only 7 in-group ratings, which isn’t what we would expect from a group trying to boost its ratings. Group 28 was not identified as an outlier in terms of local connectivity; it has a very high helpfulness ratio and more in-group ratings, but this is still far from what we would expect from coordinated activity.
More to the point, these groupings are entirely consistent with similar users engaging with similar content. For example, if I only follow Formula 1 accounts and another Birdwatch contributor only follows Formula 1 accounts, I am more likely to see their notes. Furthermore, if we are both good and right-thinking @McLarenF1 supporters, I may also be more likely to rate their notes as helpful. I do not have the data to account for these dynamics, but it would likely be straightforward for the folks over at @birdwatch to do so based on the topics and accounts participants follow.
In conjunction with additional research, Agent Based Modeling can be a useful tool for identifying vulnerabilities in community ratings systems. By reconstructing the Birdwatch rating systems and providing a simple tool to simulate user behavior, I’ve demonstrated that there are certain strategies that coordinated groups could take to abuse the Birdwatch ratings system, labeling erroneous notes as helpful and vice versa. Fortunately, these strategies leave clear fingerprints in observable user behavior data which can be used for detection. I find no evidence for coordinated abuse in the existing Birdwatch data, but there is still a lot to do to improve detection and understand the limits of potential approaches and how they fit into a broader strategy of building trust on the platform and limiting the spread of misinformation.
There are many limitations to this approach in regards to answering the narrow question of whether or not coordinated behavior can undermine the Birdwatch ratings system. Chief among these is the degree to which the simulated environment is an abstraction of real world user behavior. However, while there are lots of small things that can be improved from a modeling perspective, the findings here are best understood in conjunction with traditional UX research aiming to understand how Twitter contributors perceive note ratings, the degree to which they attribute errors to the system as a whole or individual users, and trust in Birdwatch/Twitter as a whole.
With this in mind, it is not clear whether the best approach to remedying potential vulnerabilities is one of improving detection procedures or design procedures. In other words, is it better to have a more easily understood system for contributors and focus on detecting potential abuse, or is it better to build a more robust system that makes it more difficult to undermine the ratings system?
These aren’t questions that simulations can answer. Understanding how and why Birdwatch participants and Twitter users writ large might come to trust or distrust ratings is a whole research program in its own right. Key to this program, I believe, is understanding how potential participants may react to seeing clearly false notes rated as helpful.
Nevertheless, there are many limitations of these simulations that shape how we should think about the scope and urgency of those questions. Namely, we currently run the simulation with a fixed topic size (i.e. the proportion of the universe of tweets devoted to a specific topic). It is likely easier to mess with the rating systems of smaller, less frequent topics.
Moreover, the over-connectedness of coordinated actors is somewhat exaggerated in the model, which likely overestimates the success of the very preliminary community detection approach outlined above. Making these connections as close as possible to what we would observe in the real world will be key to improving detection, understanding its limitations, and, most importantly, understanding what strategies twitchers might take to limit detection while still mislabelling notes on a particular topic.
👇 Next steps:

- Allow twitchers to vary the degree to which they rate other twitchers versus birders.
- Vary topic sizes (i.e. what size topics can twitchers undermine?).
- Model birders that preferentially follow different topics.

Thanks to the amazing open source software developers who have built the suite of packages used to build this analysis. This note uses the following packages (all available on CRAN): tidyverse, showtext, gt, DT, ggiraph, fabricatr, knitr, tidygraph, and distill. Many thanks also to Birdwatch and Twitter for making their data publicly available.
Csardi, Gabor, Tamas Nepusz, and others. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal, Complex Systems 1695 (5): 1–9.
Rosvall, Martin, and Carl T Bergstrom. 2007. “An Information-Theoretic Framework for Resolving Community Structure in Complex Networks.” Proceedings of the National Academy of Sciences 104 (18): 7327–31.