Monthly Archives: October 2017

Sick echo chambers

Suppressing the Search Engine Manipulation Effect (SEME)

1 Reply

Suppressing the Search Engine Manipulation Effect (SEME)

Authors
- Robert Epstein, (American Institute for Behavioral Research and Technology) Epstein and Robertson have found in multiple studies that search rankings that favor a political candidate drive the votes of undecided voters toward that candidate, an effect they call SEME (“seem”), the Search Engine Manipulation Effect.
- Ronald Robertson (Northeastern University) I design experiments and technologies to explore the ways in which online platforms can influence the attitudes, beliefs, and behavior of individuals and groups. Currently, I am a PhD student in the world’s first Network Science PhD program at Northeastern University and am advised by Christo Wilson and David Lazer.
- David Lazer (Northeastern University) professor of political science and computer and information science and the co-director of the NULab for Texts, Maps, and Networks
  - Who Wants to Deliberate – and Why?
- Christo Wilson (Northeastern University) Assistant Professor in the College of Computer and Information Science atNortheastern University. I am a member of the Cybersecurity and Privacy Institute and the Director of the BS in Cybersecurity Program in the College.

Abstract: A recent series of experiments demonstrated that introducing ranking bias to election-related search engine results can have a strong and undetectable influence on the preferences of undecided voters. This phenomenon, called the Search Engine Manipulation Effect (SEME), exerts influence largely through order effects that are enhanced in a digital context. We present data from three new experiments involving 3,600 subjects in 39 countries in which we replicate SEME and test design interventions for suppressing the effect. In the replication, voting preferences shifted by 39.0%, a number almost identical to the shift found in a previously published experiment (37.1%). Alerting users to the ranking bias reduced the shift to 22.1%, and more detailed alerts reduced it to 13.8%. Users’ browsing behaviors were also significantly altered by the alerts, with more clicks and time going to lower-ranked search results. Although bias alerts were effective in suppressing SEME, we found that SEME could be completely eliminated only by alternating search results – in effect, with an equal-time rule. We propose a browser extension capable of deploying bias alerts in real-time and speculate that SEME might be impacting a wide range of decision-making, not just voting, in which case search engines might need to be strictly regulated.

Introduction
- Recent research has shown that society’s growing dependence on ranking algorithms leaves our psychological heuristics and vulnerabilities susceptible to their influence on an unprecedented scale and in unexpected ways
- Experiments conducted on Facebook’s Newsfeed have demonstrated that subtle ranking manipulations can influence the emotional language people use
- Similarly, experiments on web search have shown that manipulating election-related search engine rankings can shift the voting preferences of undecided voters by 20% or more after a single search
- While “bias” can be ambiguous, our focus is on the ranking bias recently quantified by Kulshrestha et al. with Twitter rankings
- Our results provide support for the robustness of SEME and create a foundation for future efforts to mitigate ranking bias. More broadly, our work adds to the growing literature that provides an empirical basis to calls for algorithm accountability and transparency [24, 25, 90, 91] and contributes a quantitative approach that complements the qualitative literature on designing interventions for ranking algorithms
- Our results also suggest that proactive strategies that prevent ranking bias (e.g., alternating rankings) are more effective than reactive strategies that suppress the effect through design interventions like bias alerts. Given the accumulating evidence, we speculate that SEME may be impacting a wide range of decision-making, not just voting
Related Work
- Order effects are among the strongest and most reliable effects ever discovered in the psychological sciences [29, 88]. These effects favorably affect the recall and evaluation of items at the beginning of a list (primacy) and at the end of a list (recency).
  - There does not seem to be an equivalent primacy effect in maps that I can find
- online systems can: (1) provide a platform for constant, large-scale, rapid experimentation, (2) tailor their persuasive strategies by mining detailed demographic and behavioral profiles of users [1, 6, 9, 18, 121], and (3) provide users with a sense of control over the system that enhances their susceptibility to influence
  - Is this flocking from the flock’s perspective? Sort of an Ur-flock?
  - This is that Trust/Awareness equation again
- A recent report involving 33,000 people found that search engines were the most trusted source of news, with 64% of people reporting that they trust search engines, compared to 57% for traditional media, 51% for online media, and 41% for social media [10]. Similarly, a 2012 survey by Pew found that 73% of search engine users report that “all or most of the information they find is accurate and trustworthy,” and 66% report that “search engines are a fair and unbiased source of information” [105].
- Suggestions for fostering resistance can be broken down into two primary strategies: (1) providing forewarnings [43, 49] and (2) training and motivating people to resist [79, 120].
  - Interesting that alternate, non-ordered design approaches aren’t even mentioned
- Part of the reason that forewarnings work is explained by psychological reactance theory [12], which posits that when people believe their intellectual freedom is threatened – by exposing an attempt to persuade, for example – they react in the direction opposite that of the intended one
- In the context of online media bias, researchers have primarily explored methods for curbing the effects of algorithmic filtering and selective exposure [87, 96] rather than ranking bias [71]. In this vein, researchers have developed services that encourage users to explore multiple perspectives [97, 98] and browser extensions that gamify and encourage balanced political news consumption [19, 20, 86]. However, these solutions are somewhat impractical because they require users to adopt new services or exert additional effort.
Methods – Experiment Design
- To construct biased search rankings we asked four independent raters to provide bias ratings of the webpages we collected on an 11-point Likert scale ranging from -5 “favors Cameron” to +5 “favors Miliband”. We then selected the 15 webpages that most strongly favored Cameron and the 15 that most strongly favored Miliband to create three bias groups
- The query in the search engine was fixed as “UK Politics ‘David Cameron’ OR ‘Ed Miliband’”, and subjects could not reformulate it.
- On top of assignment to a bias group, subjects were randomly assigned to one of three alert experiments.We drew from the literature on decision-making and design intervention to implement so-called debiasing strategies for improving decision-making in the presence of biased information [39, 78, 82]. Specifically, we constructed and placed alerts in the search results produced by our mock search engine that provided forewarnings with salient graphics, autonomony-supportive language, and details on the persuasive threat
Methods – Procedure
- After providing informed consent and answering basic demographic questions
  - Do this and use this phrase!
- Subjects then rated the two candidates on 10-point Likert scales with respect to their overall impression of each candidate, how much they trusted each candidate, and how much they liked each candidate. Subjects also indicated their likelihood of voting for one candidate or the other on an 11-point Likert scale where the candidates’ names appeared at opposite ends of the scale and 0 indicated no preference, as well as on a binary choice question where subjects indicated who they would vote for if the election were held today.
  - This is a good way to set up the game. People read the dilemma, formulate an initial solution and their level of commitment to it. They can choose to make it “public” as their first statement or to keep it private and display a “no opinion” initial statement
- We asked: “While you were doing your online research on the candidates, did you notice anything about the search results that bothered you in any way?” and prompted subjects to explain what had bothered them in a free response format: “If you answered “yes,” please tell us what bothered you.” We did not directly ask subjects whether they had “noticed bias” to avoid the inflation of false positive rates that leading questions can cause
Methods – Participants
- We recruited 3,883 subjects between April 28, 2015 and May 6, 2015 on Amazon’s Mechanical Turk (AMT; https://mturk.com), a subject pool frequently used by behavioral, economic, and social science researchers [8, 13, 102]. We excluded from our analysis subjects who reported an English fluency level of 5 or less (on a scale of 1 to 10) (n=26)
  - MTurk would be a good source of participants as well
Analysys – Search metrics
- Utilizing Kolmogorov-Smirnov (K-S) tests of differences in distributions, we found significant differences in the patterns of time spent on the 30 webpages between subjects in the no alert experiment (correlation with ranking: Spearman’s ρ = -0.836, P <0.001) and the high alert experiment (ρ = -0.654, P <0.001) (K-S D = 0.467, P <0.01), and between subjects in the low alert experiment (ρ = -0.719, P <0.001) and the high alert experiment (K-S D = 0.400, P <0.01)
  - A way of looking for explore/exploit populations? And how fast can it be determined? Google uses a mechanism to stop an experiment once a confidence level is reached. Also, bootstrap would be good here
- Similarly, we also found significant differences in the patterns of clicks that subjects made on the 30 webpages between subjects in the no alert experiment (ρ = -0.865, P <0.001) and the high alert experiment (ρ = -0.795, P <0.001) (K-S D = 0.500, P <0.001), and between subjects in the low alert experiment (ρ = -0.876, P <0.001) and the high alert experiment (K-S D = 0.367, P <0.05)
- Among all conditions,we found that differences in the patterns of time and clicks on the individual rankings primarily emerged on the first SERP, but less so on the second, fourth, and fifth SERPs
Analysys – Attitude Shifts
- we found that the mean shifts in candidate ratings for the bias groups significantly converged on the mean shift found in the neutral group as the level of detail in the alerts increased, with high alerts creating higher convergence than low alerts
  - As more diverse information is injected, populations compromise
Analysys – Vote Shifts
- Vote Manipulation Power (VMP)is the percent change in the number of subjects, in the two bias groups combined, who indicated that they would vote for the candidate who was favored by their search rankings. That is, if x and x ′ subjects in the bias groups said they would vote for the favored candidate before and after conducting the search, respectively, then VMP = (x ′ − x)/x.
  - This could also be applied to the game to watch how votes for an outcome change over time. In the case of the game, new candidates can come into existence, so we need to watch for that.
Analysys – Bias Awareness
- We found 8.1% of subjects that showed awareness of the bias in the no alert experiment, a figure identical to the 8.1% awareness rate found by Eslami et al. in their audit of Booking.com [37], and similar to the 8.6% of subjects who showed awareness in the original study [30]. The percentage of subjects showing bias awareness increased to 21.5% in the low alert experiment, and 23.4% in the high alert experiment.
Discussion
- However, despite the additional suppression of the high alerts, the lowest VMP was found among the neutral group subjects: rankings alternating between favoring the two candidates prevented SEME.
  - This configuration forces users to “explore” more, within the context of a list affordance.
- As with previous research on SEME [30], and with research on attitude change and influence more generally [3, 72, 120], we found that subjects vary in their susceptibility to SEME, as well as in their responsiveness to the alerts, based on their personal characteristics (Figure 6 and Figure 7 in the Appendix).
  - Explorer and exploiter populations?
- As more people turn to the internet for political news [85, 115], designing systems that can monitor and suppress the effects of algorithm biases, like ranking bias, will play an increasingly important role in protecting the public’s psychological vulnerabilities.
  - And one of the big issues is finding bias at scale with domain independence
- Real-time automated bias detection could potentially be achieved by utilizing a Natural Language Processing (NLP) approach. One could utilize opinions [75], sentiment [99], linguistic patterns [109], word associations [14], or recursive neural networks [59] with human-coded data to classify biased language.
  - Scale and domain problems.
Discussion – Awareness of bias
- Awareness of ranking bias appears to suppress SEME only when it occurs in conjunction with a bias alert, perhaps because an alert is a kind of warning–inherently negative in nature.
  - According to Moscovici, an inherently negative construct should reduce polarization movement.
- Awareness of ranking bias in the absence of bias alerts might increase VMP because people perceive the bias as a kind of social proof [111, 112], made all the more powerful because of the disproportionate trust people have in search rankings [10, 95, 105]. The user’s interpretation might be, “This candidate MUST be good, because even the search results say so.”

Some thoughts about awareness and trust

1 Reply

I had some more thoughts about how behavior patterns emerge from the interplay between trust and awareness. I think the following may be true:

Awareness refers to how complete the knowledge of an information domain is. Completely aware indicates complete information. Unaware indicates not only absent information but no knowledge of the domain at all.
Trust is a social construct to deal with incomplete information. It’s a shortcut that essentially states “based on some set of past experiences, I will assume that this (now trusted) entity will behave in a predictable, reliable, and beneficial way for me”
Healthy behaviors emerge when trust and awareness are equivalent.
Low trust and low awareness is reasonable. It’s like walking through a dark, unknown space. You go slow, bump into things, and adjust.
Low trust and high awareness is paralytic.
High trust and low awareness is reckless. Runaway conditions like echo chambers. The quandary here is that high trust is efficient. Consider the prisoner’s dilemma:
1. 2. In the normal case, the two criminals have to evaluate what the best action is based on all the actions the other individual could choose, ideally resulting in a Nash Equilibrium. For two players (p), there are 4 choices (c). However, if each player believes that the other player will make the same choice, then only the two diagonal choices remain. For two players, this reduces the complexity by half. But for multiple dissimilar players, the options go up by c^p, so that if this were The Usual Suspects, there would be 32 possibilities to be worked out by each player. But for 5 identical prisoners, the number of choices remains 2, which is basically “what should we all do?”. The more we believe that the others in our social group see the world the same way, the less work we all have to do.
Diversity is a mechanism for extending awareness, but it depends on trusting those who are different. That may be the essence of the explore/exploit dilemma.
Attention is a form of focused awareness, can reduce general awareness. This is one to the reasons that Tufekci’s thoughts on the attention economy matter so much. As technology increases attention on proportionally more “marketable” items, the population’s social awareness is distorted.
In a healthy group context, trust falls off as a function of awareness. That’s why we get flocking. That is the pattern that emerges when you trust more those who are close, while they in turn do the same, building a web of interaction. It’s kind of like interacting ripples?
This may work for any collection of entities that have varied states that undergo change in some predictable way. If they were completely random, then awareness of the state is impossible, and trust should be zero.
1. Human agent trust chains might proceed from self to family to friends to community, etc.
2. Machine agent trust chains might proceed from self to direct connections (thumb drives, etc) to LAN/WAN to WAN
3. Genetic agent trust chain is short – self to species. Contact is only for reproduction. Interaction would reflect the very long sampling times.
4. Note that (1) is evolved and is based on incremental and repeated interactions, while (2) is designed and is based on arbitrary rules that can change rapidly. Genetics are maybe dealing with different incentives? The only issue is persisting and spreading (which helps in the persisting)
Computer-mediated-communication disturbs this process (as does probably every form of mass communication) because the trust in the system is applied to the trust of the content. This can work in both ways. For example, lowering trust in the press allows for claims of Fake News. Raising the trust of social networks that channel anonymous online sources allows for conspiracy thinking.
An emerging risk is how this affects artificial intelligence, given that currently high trust in the algorithms and training sets is assumed by the builders
1. Low numbers of training sets mean low diversity/awareness,
2. Low numbers of algorithms (DNNs) also mean low diversity/awareness
3. Since training/learning is spread by update, the installed base is essentially multiple instances of the same individual. So no diversity and very high trust. That’s a recipe for a stampede of 10,000 self driving cars.

Since I wrote this, I’ve had some additional thoughts. I think that our understanding of Awareness and Trust is getting confused with Faith and Doubt. Much of what we believe to be true is no longer based on direct evidence, or even an understandable chain of reasoning. Particularly as more and more of our understanding comes from statistical analysis of large sets of fuzzy data, the line between Awareness and Faith becomes blurred, I think.

Doubt is an important part of faith, and it has to do with the mind coming up against the unknowable. The question does God exist? contains the basics of the tension between faith and doubt. Proving the existence of God can even be thought of as distraction from the attempt to come to terms with the mysteries of life. Within every one of us is the ability to reject all prior religious thought and start our own journey that aligns with our personal understandings.

Conversely, it is impossible to increase awareness without trusting the prior work. Isaac Newton had to trust in large part, the shoulders of the giants he stood on, even if he was refining notions of what gravity was. So too with Albert Einstein, Rosalind Franklin and others in their fields. The scientific method is a framework for building a large, broad-based, interlocking tapestry awareness.

When science is approached from a perspective of Faith and Doubt, communities like the Flat Earth Society emerge. It’s based on the faith that the since the world appears flat here, it must be flat everywhere, and doubt of a history of esoteric measurements and math that disprove this personally reasonable assumption. From this perspective, the Flat Earthers are a protestant movement, much in the way that the community that emerged around Martin Luther, when he rejected the organized, carefully constructed orthodoxy of the Catholic Church, based on his personally reasonable interpretation of scripture.

Confusing Awareness and Trust with Faith and Doubt is toxic to both. Ongoing, systemic doubt in trustworthy information will destroy progress, ultimately unraveling the tapestry of awareness. Trust that mysteries can be proven is toxic in its own way, since it gives rise to confusion between reality and fantasy like we see in doomsday cults.

My sense is that as our ability to manipulate and present information is handed over to machines, that we will need to educate them in these differences, and make sure that they do not become as confused as we are. Because we are rapidly heading for a time where these machines will be co complex and capable that our trust in them will be based on faith.

Phlog

nearly decomposable

Monthly Archives: October 2017

Sick echo chambers

Suppressing the Search Engine Manipulation Effect (SEME)

Some thoughts about awareness and trust