Trustworthy Anonymous Citizen Journalism From Really Scary Places

Tentative findings/implications for design

Pervasive serendipity
Civilizational vs. tribal affordances

The current applications:

“The Friction of Fear versus the Currency of Trust”

Would you like to see reporting from places like Syria or North Korea made safer and more trustworthy? Are you familiar with apps like Secret and Whisper but wish they were better? Me too.

People have a need to speak out about things even when that may land them in trouble. So far, the internet has provided many ways for people to speak out: blogs, Facebook, Twitter, etc. Unfortunately, it’s all too easy to get in trouble for speaking your mind in these media, because we are very traceable unless we take extraordinary precautions.

I believe that technology is failing to provide these people with a usable means of expressing themselves without fear of discovery. People will always need to speak from the safety of true (or at least good enough) anonymity.

But anonymity isn’t enough. We, the public, need to be able to know how trustworthy information from such anonymous sources is. Without some way of determining trust, true news can easily get lost in a wash of misinformation, trolling and rumour.

I’m researching ways to anonymously report trustworthy news, aiming to produce technologies that will allow people to report the news from scary places without fear of retaliation against them, their families or their friends.

This work is in support of my PhD thesis, which has the working title “Trustworthy Anonymous Citizen Journalism from Really Scary Places”. The goal of this effort is to provide a means of producing useful, reliable information/news in areas where gathering information is particularly difficult (e.g. the Syrian Civil War, Mexican drug cartels, etc.). Information should be produced anonymously, in such a way that a user cannot be counterfeited and users are never identified well enough to be placed at risk. It will initially be a web-interactive platform where individuals can post news items anonymously, but in such a way that the information in the post can be cross-referenced to determine the likely veracity or ‘trustworthiness’ of the post.

In such a system anonymity is critical since other actors may desperately wish to determine the identity of the poster. The goal for this research is to determine ways that anonymity can be so thoroughly “baked in” to the design that even if the servers were completely hacked, no information that could reliably point to a particular individual could be recovered.

To determine trustworthiness, it helps greatly if the software can “recognize” a user from their interaction, without ever knowing any other identifying information (such as a login). Prior work can identify users [1][2][3] and evaluate a user’s level of cognitive stress [4] from typing patterns. Using this work in a browser context, I intend to determine the viability of using instrumented measures of user behavior (typing patterns, word use, etc.) to recognize returning users of websites or internet-capable applications without using any specific identifying information.
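As a rough illustration of the typing-pattern measures cited above, here is a minimal sketch of two classic keystroke-dynamics features: dwell time (how long a key is held) and flight time (from one key’s release to the next key’s press). The event format, the function names and the distance threshold are all hypothetical; published systems use per-key and per-digraph timings and far more data.

```python
from statistics import mean

# Hypothetical input: (key, press_ms, release_ms) tuples in typing order,
# e.g. as a browser page might record them from keydown/keyup events.
def keystroke_features(events):
    """Return (mean dwell time, mean flight time) in ms. Needs at least one event."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return (mean(dwell), mean(flight) if flight else 0.0)

def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_match(sample, profiles, threshold):
    """Anonymous label of the closest stored profile, or None if nothing is close enough."""
    label = min(profiles, key=lambda k: distance(sample, profiles[k]), default=None)
    if label is None or distance(sample, profiles[label]) > threshold:
        return None
    return label
```

The stored profiles are keyed by labels the system invents, not by logins, so a match says “this looks like the same typist as before” without saying who that typist is.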

In this way, the system need never know the user’s offline identity. Only the information provided by the user becomes valuable, and the connections or correlations between information from various users provide the basis for determining relative trustworthiness.

In other words, if a number of users assert over a period of time that “the sky is blue”, then that element gains in trustworthiness. On the other hand, even if one user “trolls” the system with repeated statements that the sky is “yellow with purple polka-dots”, that information can be correlated with a particular (still anonymous) user and classified accordingly.
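The “sky is blue” example can be sketched by counting distinct anonymous users per claim, so that repeating a claim adds no support, and by flagging users who repeat the same claim many times. The user labels, claim strings and the `min_repeats` threshold below are hypothetical; this is a toy illustration, not a complete trust model.

```python
from collections import defaultdict

def claim_support(assertions):
    """assertions: iterable of (anonymous_user_id, claim) pairs.
    Support = number of distinct users asserting a claim; repeats don't count."""
    users_per_claim = defaultdict(set)
    for user, claim in assertions:
        users_per_claim[claim].add(user)
    return {claim: len(users) for claim, users in users_per_claim.items()}

def repetitive_users(assertions, min_repeats=3):
    """Users who assert the same claim at least min_repeats times (possible trolling)."""
    counts = defaultdict(int)
    for user, claim in assertions:
        counts[(user, claim)] += 1
    return {user for (user, _), n in counts.items() if n >= min_repeats}
```

For example, three different users saying “sky is blue” gives that claim a support of 3, while one user posting “sky is polka-dot” three times gives it a support of only 1 and puts that user in the flagged set.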

Currently I’m working on a pilot study aimed at producing software that can reliably (though not perfectly) distinguish among multiple submissions by multiple users, so that one user’s corpus can be told apart from another’s while maintaining absolute anonymity. Once users can be recognized, the topics and tags associated with those users can be grouped using clustering and other statistical means.
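A common stylometric baseline for telling one user’s corpus from another’s is to compare the relative frequencies of function words (“the”, “of”, “and”…), which writers use habitually. A minimal sketch with cosine similarity follows; the 20-word list is a small illustrative assumption, and real authorship-attribution work uses hundreds of features.

```python
import math
import re
from collections import Counter

# Small illustrative list; real stylometry uses far more function words.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "is", "that", "it", "was",
                  "for", "on", "with", "as", "but", "not", "this", "so", "very", "just"]

def word_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity of two profiles; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Two submissions whose profiles score close to 1.0 are candidates for the same (still anonymous) author; in practice the profiles would feed a clustering or classification step rather than a single pairwise threshold.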

100% recognition is not required here, so this is different from efforts to replace passwords, for example. If the system is only 50% confident that a novel news item is coming from a “trustworthy” source, then the reliability weight of the news item is proportionately reduced. With luck, the system should prove to be reasonably robust. Further integration with external fact-checking sites may also be used to determine the veracity of items posted by users.
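The proportional reduction described above can be written as an expected value: if the recognizer gives probability p that an item came from a known source with trust t, the item inherits p × t, plus whatever trust is assigned to an unrecognized source for the remaining 1 − p. The function name, the dictionary shapes and the default `unknown_trust` of 0 are assumptions made for this sketch.

```python
def item_reliability(candidates, source_trust, unknown_trust=0.0):
    """Expected trustworthiness of a news item.

    candidates:   {anonymous_user_id: probability the item came from that user}
    source_trust: {anonymous_user_id: trust score in [0, 1]}
    Probability mass not assigned to any candidate is treated as an
    unrecognized source with trust `unknown_trust`."""
    assigned = sum(candidates.values())
    if assigned > 1.0 + 1e-9:
        raise ValueError("source probabilities must sum to at most 1")
    known = sum(p * source_trust.get(user, unknown_trust) for user, p in candidates.items())
    return known + max(0.0, 1.0 - assigned) * unknown_trust
```

With the 50% example, a source trusted at 0.9 yields an item weight of 0.45; if unrecognized sources are given a baseline of 0.2 instead of 0, the weight becomes 0.55.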

Subsequent studies will attempt to extend the work of the pilot study across progressively larger populations of users so that any limitations of the initial studies can be uncovered.

[1] Chang, M., et al. “Capturing Cognitive Fingerprints from Keystroke Dynamics for Active Authentication.” (2013): 1-1.
[2] Monrose, Fabian, and Aviel D. Rubin. “Keystroke dynamics as a biometric for authentication.” Future Generation Computer Systems 16.4 (2000): 351-359.
[3] Haider, Sajjad, Ahmed Abbas, and Abbas K. Zaidi. “A multi-technique approach for user identification through keystroke dynamics.” Systems, Man, and Cybernetics, 2000 IEEE International Conference on. Vol. 2. IEEE, 2000.
[4] Vizer, Lisa Michele. “Detecting cognitive stress and impairment using keystroke and linguistic features of typed text: Toward a method for continuous monitoring of cognitive status.” Diss. University of Maryland, Baltimore County, 2013.

What follows are my notes and thinking on the topic

This came up in an online discussion and I thought it was worth sharing.

http://infolab.stanford.edu/~widom/paper-writing.html

If you don't have time to read this, this is probably the most useful part: the Introduction section covers the major points that any research project needs to be able to answer quickly and clearly:

1. What is the problem?
2. Why is it interesting and important?
3. Why is it hard? (E.g., why do naive approaches fail?)
4. Why hasn't it been solved before? (Or, what's wrong with previous proposed solutions? How does mine differ?)
5. What are the key components of my approach and results? Also include any specific limitations.

Shaun

Current trustworthiness project

Pull stories from the Google News RSS feeds (top-level feeds: World, National, Entertainment, etc.), parse them, and put them in the database
Using Alchemy NLP, pull out the authors, subjects, links, etc. from the stories. Search for them in the Alchemy News API, and use the results to populate the relevant tables (author, etc.) that point back to the main article
We’ll need some ratings tables as well. The information should include the rating and the links that support the statement. If there is freeform text, we could run some NLP on it for sentiment, etc.
Provide the list to the browser as the navigator, with the trustworthiness annotations
When a story is clicked, show the associated network with that story. Each item can be clicked to bring up information about that attribute (as a pop-up?)
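A minimal sketch of the first and third steps, assuming an RSS 2.0 feed and an SQLite database. The table and column names are hypothetical, the feed is an inline sample rather than a live Google News URL, and the Alchemy enrichment step is left out because it depends on that service’s own API.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Inline sample feed (hypothetical content) so the sketch runs without a network.
SAMPLE_RSS = """<rss version="2.0"><channel><title>World</title>
<item><title>Example headline</title><link>https://example.com/a</link>
<pubDate>Mon, 01 Dec 2014 10:00:00 GMT</pubDate></item>
</channel></rss>"""

def create_schema(conn):
    """Articles plus a ratings table holding a rating and the links that support it."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS article (
            id INTEGER PRIMARY KEY, feed TEXT, title TEXT,
            link TEXT UNIQUE, published TEXT);
        CREATE TABLE IF NOT EXISTS rating (
            article_id INTEGER REFERENCES article(id),
            rating REAL, supporting_link TEXT, comment TEXT);
    """)

def load_feed(conn, rss_text):
    """Parse an RSS 2.0 document and store its items; returns the number of new rows."""
    channel = ET.fromstring(rss_text).find("channel")
    feed = channel.findtext("title")
    before = conn.total_changes
    for item in channel.iter("item"):
        # The UNIQUE link column plus INSERT OR IGNORE skips stories already stored.
        conn.execute(
            "INSERT OR IGNORE INTO article (feed, title, link, published) VALUES (?, ?, ?, ?)",
            (feed, item.findtext("title"), item.findtext("link"), item.findtext("pubDate")),
        )
    conn.commit()
    return conn.total_changes - before
```

Loading the same feed twice adds nothing the second time, which matters when feeds are polled repeatedly.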

This is why the world needs this

An example of a really scary place – NPR
Someone Is Spilling ISIS’s Secrets on Twitter – Daily Beast
Afghans flock to Kabul Taxi, a satirical Facebook page that spares no one
Twitter Was So-So at Debunking That False Rumor About Female Genital Mutilation and ISIS – New York Magazine
Israel, Gaza, War & Data: social networks and the art of personalizing propaganda – Medium.com
She Tweeted Against the Mexican Cartels. They Tweeted Her Murder.
- McGahan, Jason. “She Tweeted Against the Mexican Cartels. They Tweeted Her Murder.” The Daily Beast. Newsweek/Daily Beast, 21 Oct. 2014. Web. 21 Oct. 2014.
- Important quote: “But when the kidnappers went through the doctor’s cellphone, according to the Zócalo story, they saw her Twitter account, realized she was Felina, and executed her. With her cellphone, they were able to terrorize her followers with the photos and messages.“
- This is why there is no app. The webpage should also redirect to something bland like Google when accessed through history.
GamerGate (trolling, fake facts, etc)
Storyful
- Behind The Scenes, Storyful Exposes Viral Hoaxes For News Outlets (NPR – 10.29.14)
The World Cracks Down on the Internet – Vara, Vauhini. “The World Cracks Down on the Internet – The New Yorker.” The New Yorker. Conde Nast, 4 Dec. 2014. Web. 04 Dec. 2014.
Why newsrooms should train their communities in verification, news literacy, and eyewitness media
News as collaborative intelligence: Correcting the myths about news in the digital age (overview on Brookings)
Mukto-Mona (www.mukto-mona.com) is an Internet congregation of freethinkers, rationalists, skeptics, atheists & humanists of mainly Bengali and South Asian descent who are scattered across the globe. Our mission is to promote science, rationalism, secularism, freethinking, human rights, religious tolerance, and harmony amongst all people in the globe.

What is the one thing you know and how do your research questions relate to that?

Model-based systems vs. Pattern-based (Markerless-registration). See Peter Norvig’s page. Probability and spell check.
Microsoft Azure Machine Learning service article from Computer World

And things to consider…

The research questions that pertain to this effort are:

How to determine identity (uniqueness?) in an anonymous way? Just to make things harder, we need ways that can’t be used indirectly to identify someone, like GPS movement patterns. The assumption to be tested is that identity can be recognized by detecting patterns of action, rather than a login, for example. Also, the anonymization needs to happen on the client. Websites can be hijacked (although for initial testing and gaming, raw data on the server makes sense).
How to determine the trustworthiness of information gathered anonymously, using crowdsourcing techniques?
How do people behave when they know they are anonymous? And how do they respond to the tradeoffs used to hide identity, such as aggregation?
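The requirement in the first question that anonymization happen on the client could be approached with locality-sensitive hashing: the device reduces its raw behavioral features to a short bit signature using random hyperplanes (SimHash-style), and only the signature leaves the device. Similar feature vectors tend to produce signatures that differ in few bits. This is a sketch under assumptions: features are already extracted and centered (e.g. z-scored), every client shares the same seed, and 64 bits is an arbitrary choice. A stable signature is still a linkable pseudonym, so this keeps raw measurements off the server but does not by itself guarantee anonymity.

```python
import random

def make_hyperplanes(dim, n_bits, seed=2014):
    """Random Gaussian hyperplanes; all clients must use the same seed (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(features, planes):
    """One bit per hyperplane: which side of the plane the feature vector lies on."""
    return [1 if sum(f * w for f, w in zip(features, plane)) >= 0 else 0 for plane in planes]

def hamming(a, b):
    """Number of differing bits; small values suggest the same behavior pattern."""
    return sum(x != y for x, y in zip(a, b))
```

The expected fraction of differing bits is proportional to the angle between the two feature vectors, so the server can compare signatures probabilistically without ever seeing the timings themselves.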

The initial approaches for answering these questions will be to look at the following:

Examine the biometrics that can be detected using mobile technology, and see if this information can be used to reliably and uniquely detect whether a single user is interacting with the mobile device. Examples of work in this field are:
1. Unique in the Crowd: The privacy bounds of human mobility
2. Identifying User Traits by Mining Smart Phone Accelerometer Data
3. Human Identification via Gait Recognition Using Accelerometer Gyro Forces
4. A Password So Secret, You Don’t Consciously Know It (similar)
5. Accelerometer-Based Transportation Mode Detection on Smartphones
6. Capturing Cognitive Fingerprints from Keystroke Dynamics (overview article)
7. Cell Phone-Based Biometric Identification
8. LatentGesture
9. Using hidden markov models for accelerometer-based biometric gait recognition
10. Biometric Gait Authentication Using Accelerometer Sensor
11. A data source with approximately 60 million unique samples of accelerometer data collected from 387 different devices, apparently with usable code.
12. Researchers develop ‘narrative authentication’ system – not sure if this is directly relevant, but think about other possible but nonobvious ways that identity could be built up by observing patterns of usage, activity patterns, etc.
13. Extracting insights from the shape of complex data using topology – This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find.
14. Using Machine Learning and NodeJS to detect the gender of Instagram Users
15. Typing patterns could be good for a proof-of-concept, or as part of a final system. It looks like all the calculation could be done on the device to produce an ID. My guess is that the value would tend to drift, so users would be recognized using probabilities.
  1. Typing Patterns: A Key to User Identification
  2. Privacy: Gone with the Typing! Identifying Web Users by Their Typing Patterns
  3. Keystroke dynamics as a biometric for authentication
  4. A multi-technique approach for user identification through keystroke dynamics
  5. How the way you type can shatter anonymity—even on Tor
16. Once we can recognize people, can we tell when they are under stress or unreliable? (a related article on normalcy profiles in computers)
17. Avoiding Crowdsourcing problems
  1. How Mechanical Turk is Broken | MIT Technology Review
  2. Creating speech and language data with Amazon’s Mechanical Turk
  3. Soylent: a word processor with a crowd inside
  4. The DARPA balloon challenge?
18. Via Andrea Choiniere
  1. Thesis on this topic, including code used for walking detection and step cycle calculation:
  2. Research paper implementing accelerometer biometric test using controlled phone position and controlled walkway:
  3. Example using controlled phone position and quasi-controlled activities:
  4. Example using various activities but under quasi-controlled conditions:
19. Meeting with Andrea on 12.30.13
  1. Looks like uniqueness can be determined by word choice in a corpus as small as 500. That does make things easier. It also allows for triangulation against other metrics, which would allow for looking at accelerometer data from several body positions for example. Though I’m not sure that’s needed.
  2. An issue to consider is that people who might use the site would have other examples of their writing. This means that an anonymous source could be identified. As a way around this, a vector-based translation algorithm could be trained using identification code to remove/modify the parts of the user’s language that are identifiable.
  3. And actually, this means that I could build a simple website that you train once, then write to. The site then transpiles the user’s words and publishes to twitter or Facebook for example. This just addresses the anonymous part of the problem, not the trustworthiness part, but it’s nice low-hanging fruit.
20. Fika – 10.17.14
  1. Flame War Detection using Naive Bayes – Amy
  2. Typing Patterns: A Key to User Identification – Amy
  3. Keystroke data – Amy
  4. Andrés Monroy-Hernández – Helena
  5. Jeanine Finn – Alyson
Examine how groups of people interact with “newsworthy” events. Are there ways in which multiple visual and audio perspectives can make it unlikely that an event has been counterfeited? Some work has already been done on event authenticity, as seen in this Poynter article. The technology behind the acoustic analysis of shootings could be useful here, as discussed by the Washington Post. And of course, there’s Storify, Storyful and Project EPIC.
1. Cornell Social Lab – CityBeat automated news gathering from social networks: http://www.news.cornell.edu/stories/2013/12/apps-make-sense-social-media-noise
2. Social Physics
A third element combines the previous two, by attempting to determine the “trustworthiness” of an individual based on prior interactions with “newsworthy” events.
1. Belief Dynamics and Decision Making “..models must consider the roles of beliefs, attitudes, and sacred values within a culture, and how they interact with institutional constraints and perceived external pressures. They must address behaviors within a culture at the levels of the individual, the group, and the governing body. The most important objective of our MURI is to bring together models of beliefs and behaviors at each of the three levels, showing how the levels interact and influence one another.”
2. Understanding Support Vector Machines (a collection of nice blog postings and tutorials)
3. RELEVANCE: A Review of and a Framework for the Thinking on the Notion in Information Science. Basic thinking about how connections determine relevance from 1976. OCR copy in project folder.
4. Proximity of multiple users reporting aspects of the same story might be helpful. One way to determine whether multiple observers were colocated at the same time could be to use a secondary “acoustical” network. Ultrasonic (timestamped?) signals from one device could then be incorporated into another device’s feed, which could be used to help validate both signals. A paper that touches on this for other reasons is here: http://www.jocm.us/uploadfile/2013/1125/20131125103803901.pdf
5. Evidentiality for text trustworthiness detection: Evidentiality is an important clue for text trustworthiness detection. With the binarized vector setting, the evidential-based text representation model performed considerably better than both the bag-of-words model and the content-word-based model. Most crucially, we show that the best trustworthiness detection result is achieved when evidentiality is incorporated in a linguistically sophisticated model where their meanings are interpreted in both semantic and pragmatic terms.
6. Analyzing collective behavior from blogs using swarm intelligence: We introduce a nature-inspired theory to model collective behavior from the observed data on blogs using swarm intelligence, where the goal is to accurately model and predict the future behavior of a large population after observing their interactions during a training phase. Specifically, an ant colony optimization model is trained with behavioral trend from the blog data and is tested over real-world blogs. Promising results were obtained in trend prediction using an ant-colony-based pheromone classifier and the CHI statistical measure.
7. No, Torture Doesn’t Make Terrorists Tell The Truth — But Here’s What Actually Works “This approach can also separate liars from truth-tellers. When recalling their experiences in a cognitive interview, people who are telling the truth give longer and more detailed answers. Their recollections also tend to grow as more details come back into focus. Liars, on the other hand, typically tell a bare-bones story that doesn’t develop with retelling.”“Credibility is all in the words people use,” Meissner told BuzzFeed News. “It’s in the way they tell their story.” And crucially, it seems hard to game the system. Telling a lie is more mentally demanding than telling the truth, and hiding this cognitive effort is harder than concealing signs of stress.”
A possible fourth element is a way of determining the anonymous identity of a device. As this article shows, it is possible to uniquely identify a portable device from the characteristics of its sensors. This means that it may be possible to identify both the person and the device. Should trust be reduced if a known person is using a new device? Is a device that was used by a trusted person an indicator that the next person is more or less trustworthy than normal?
Since “following” an individual creates a social network that can be used to identify that individual, subscribers to the repositories can instead track “themes” or “ideas”. These themes might be generated automatically, crowdsourced using tags, or categorized by some other means. A question to be addressed is whether a category can consist of reports from only one reporter.
1. Creating a repository or portal could be something like Zooniverse. (Wikipedia entry)
2. Gaining Wisdom From Crowds (Communications of the ACM)
Education. How to help potential reporters become good ones, while not providing clues to their identity? Is it possible to add a level of awareness so that the system can look at a report that someone wants to upload and identify potential ways that they could be identified from the information?
Trolling and/or teamwork
1. Teamwork OP: Riot on making ‘good’ the easy choice
videogame studies
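For the third element in the list above (judging a reporter’s trustworthiness from prior interactions with newsworthy events), one simple and well-known option is a beta reputation model: count how many of a reporter’s earlier reports were later confirmed or refuted, and use the mean of Beta(confirmed + 1, refuted + 1) as the trust score. The class name and the idea of passing the recognizer’s confidence as `weight` are assumptions for this sketch, not part of the notes.

```python
from dataclasses import dataclass

@dataclass
class BetaTrust:
    """Trust in one anonymous reporter, based on how earlier reports turned out."""
    confirmed: float = 0.0
    refuted: float = 0.0

    def update(self, was_confirmed, weight=1.0):
        # weight < 1 could reflect uncertainty that this report came from this reporter.
        if was_confirmed:
            self.confirmed += weight
        else:
            self.refuted += weight

    def expected(self):
        # Mean of Beta(confirmed + 1, refuted + 1); a newcomer starts at 0.5.
        return (self.confirmed + 1) / (self.confirmed + self.refuted + 2)
```

A reporter with 8 confirmed and 2 refuted reports scores 9/12 = 0.75; a brand-new reporter sits at the neutral 0.5 until evidence accumulates.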
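The Education item in the list above asks whether the system could spot ways a report might identify its author before it is uploaded. A crude first pass is pattern matching for obvious identifiers. The pattern names and regular expressions below are illustrative assumptions and deliberately incomplete; they would miss names, places, image metadata and writing style entirely.

```python
import re

# Hypothetical, deliberately incomplete patterns for obvious identifiers.
IDENTIFYING_PATTERNS = {
    "email address": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "phone number": r"\+?\d[\d\s().-]{7,}\d",
    "GPS coordinates": r"-?\d{1,3}\.\d{3,},\s*-?\d{1,3}\.\d{3,}",
    "URL": r"https?://\S+",
}

def identifying_details(report):
    """Return (kind, matched_text) pairs for anything that looks identifying."""
    found = []
    for kind, pattern in IDENTIFYING_PATTERNS.items():
        found.extend((kind, m.group()) for m in re.finditer(pattern, report))
    return found
```

A warning listing the matches, shown on the reporter’s own device before upload, would let them edit the report without the server ever seeing the original text.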

Software:

Wickr – The Most Trusted Messenger in the World Trusted by world leaders, executives, journalists, human rights activists and your friends.
Google NewsLab – “collaborate with journalists and entrepreneurs to help build the future of media.”
AlchemyData News API: (video) (API details) Simple query provides news and blog content enriched with NLP and highly targeted search, trend analysis and historical access to news and blog content.
Alchemy Natural Language Processing (demo) (features) Offers 12 API functions as part of its text analysis service, each of which uses sophisticated natural language processing techniques to analyze your content and add high-level semantic information.
Watson Developer Cloud offers a variety of services for developing cognitive applications. Each Watson service provides a Representational State Transfer (REST) Application Programming Interface (API) for interacting with the service. IBM Bluemix™ is the cloud platform in which you deploy applications that are developed using Watson services.
Microsoft Search API (News) – Lots of oddities to use. Here’s my example.
Newswhip Spike – Know exactly what stories, writers and events are getting engagement in real time. Focus on niche topics, places and specialist publications
Infobitt – Manifesto(s) long and short. Larry Sanger is a co-founder of Wikipedia. To contact (another committee member?)
Jstacs – A Java framework for statistical analysis and classification of biological sequences
Mining Twitter with R
MentionMap
WEKA
JGAAP (Java Graphical Authorship Attribution Program)
SigmaJS is a JavaScript library dedicated to graph drawing. It makes it easy to publish networks on Web pages, and allows developers to integrate network exploration in rich Web applications.
dev.twitter.com
breakingnews.com
Media Cloud – Media Cloud is a project that seeks to track news content comprehensively – providing open, free, and flexible tools for quantitative analysis of media trends. (The Berkman Center for Internet & Society at Harvard University)
Truthy – Information diffusion research at Indiana University (paper and article)
Let’s Encrypt – The objective of Let’s Encrypt and the ACME protocol is to make it possible to set up an HTTPS server and have it automatically obtain a browser-trusted certificate, without any human intervention. This is accomplished by running a certificate management agent on the web server.

ipInfo – service to get location information from ip address

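<?php
// Look up the visitor's city with the ipinfo.io JSON endpoint.
// Adapted from stackoverflow.com/questions/409999/getting-the-location-from-an-ip-address
$ip = $_SERVER['REMOTE_ADDR'];
$response = file_get_contents("https://ipinfo.io/{$ip}/json");
$details = ($response !== false) ? json_decode($response) : null;
echo $details->city ?? 'unknown'; // e.g. "Mountain View"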

~~YUI~~ AngularJS
PHP
Apache
Quora API – Quora is an interesting ‘news’ site that has its own ways of determining veracity by tracking who posts what and how it’s upvoted. It turns out they have an unofficial REST API. It might be possible to tie into it using a sort of calculated identity, and to have an exchange with the users of the site.
Reddit-related
- Handling ‘doxxing’ on Reddit.
- Automatically-generated documentation for the reddit API.
- Communiviz
- Navigating the massive world of reddit: Using backbone networks to map user interests in social media
  - We report on a method using the backbone of a network to create a map of the primary topics of interest in any social network. To demonstrate the method, we build an interest map for the social news web site reddit and show how such a map could be used to navigate a social media world. Moreover, we analyze the network properties of the reddit social network and find that it has a scale-free, small-world, and modular community structure, much like other online social networks such as Facebook and Twitter. We suggest that the integration of interest maps into popular social media platforms will assist users in organizing themselves into more specific interest groups, which will help alleviate the overcrowding effect often observed in large online communities.

Conferences

Computation + Journalism: The Computation+Journalism Symposium is a celebration and synthesis of new ways to find and tell news stories with, by, and about data and algorithms. It is a venue to seed new collaborations between journalists and computer and data scientists: a bazaar for the exchange of ideas between industry/practice and academia/research.

Relevant literature (newest on top):

Computational Fact Checking from Knowledge Networks (different version?) : Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
- Articles citing from Google Scholar: https://scholar.google.com/scholar?cites=7908403792444066662&as_sdt=20000005&sciodt=0,21&hl=en
Stanford Computational Journalism Lab
Computational Journalism: from Answering Question to Questioning Answers and Raising Good Questions: This paper proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate and tackle practical fact-checking tasks — reverse-engineering vague claims, and countering questionable claims — as computational problems. Within the QRS based framework, we take one step further, and propose a problem along with efficient algorithms for finding high-quality claims of a given form from data, i.e. raising good questions, in the first place.
A Unifying Framework for Behavior-based Trust Models and who’s cited this
A definitive guide to verifying digital content for emergency coverage: Authored by leading journalists from the BBC, Storyful, ABC, Digital First Media and other verification experts, the Verification Handbook is a groundbreaking new resource for journalists and aid providers. It provides the tools, techniques and step-by-step guidelines for how to deal with user-generated content (UGC) during emergencies.
Human Rights Citizen Video Assessment Tool: Assists human rights researchers to systematically assess citizen videos that depict potential human rights violations. It integrates best practices of citizen video authentication and brings the myriad of required verification steps into one, linear format. At the end of the guide, users will be able to download the collected information as a pdf or word document, which can be saved together with the assessed video, or shared with other researchers or experts to aid with further investigations.
Google wants to rank websites based on facts not links
- Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources
  - Citations of the above. Will probably need these for the CSCW paper.
Emergent (real-time rumor tracking). From the Tow Center for Digital Journalism.
Blogs & Bullets
Five technologies that betrayed Silk Road’s anonymity
Facebook now allows you to report links to fake news stories
Reading News with Maps by Exploiting Spatial Synonyms – Goes into depth describing clustering as ways to find topics and locations. A UMCP project, so it may be very good to talk to these people. Hanan Samet is the lead professor. He also wrote Foundations of Multidimensional and Metric Data Structures (discount code 85511), which looks useful.
A Dataset for Active Linguistic Authentication
Practical attacks against authorship recognition techniques – The major contribution of this research is that it demonstrates that attacks work very well. The obfuscation attack reduces the effectiveness of the techniques to the level of random guessing and the imitation attack succeeds with 68-91% probability depending on the stylometric technique used.
A comparative study of machine learning methods for authorship attribution.
Internet Trends 2014
Thai police: We’ll ‘get you’ for junta criticism (CBC)
Cyberactivism in the Egyptian Revolution: How Civic Engagement and Citizen Journalism Tilted the Balance
Role of the New Media in the Arab Spring
Twitter and Microblogging: Instant Communication with 140 Characters Or Less
Who’s reporting the protests?: converging practices of citizen journalists and two BBC World Service newsrooms, from Iran’s election protests to the Arab uprisings
TWITTERING THE NEWS The emergence of ambient journalism
The Evolution of the Twitter Revolution
Social media’s ‘law’ of short messages
The Evolution of Automated Breaking News Stories (MIT Press)
- A Google engineer has developed an algorithm that spots breaking news stories on the Web and illustrates them with pictures. Currently filing its stories on Twitter. (Source paper: Telling Breaking News Stories from Wikipedia with Social Multimedia: A Case Study of the 2014 Winter Olympics)
New Map of Twitterverse Finds 6 Types of Networks
Citizen Sensing, Social Signals, and Enriching Human Experience
RE-MEDIATION, INTER-MEDIATION, TRANS-MEDIATION The cosmopolitan trajectories of convergent journalism
- This article draws on performativity theory in order to analyse convergent journalism as a form of journalism that privileges the civil disposition of “I have a voice”, or citizen-driven acts of deliberating and witnessing, over the professional act of informing.
Trust and Matching Algorithms for Selecting Suitable Agents
On Taxis and Rainbows – Lessons from NYC’s improperly anonymized taxi logs.
How Edward Snowden Changed Journalism – The New Yorker
- The author, Steve Coll, is the dean of the Graduate School of Journalism at Columbia University, and reports on issues of intelligence and national security in the United States and abroad. Might be a useful contact
The Tow Center for Digital Journalism
- Source Protection Blog
- Research Papers
Bitcoin
- Out in the Open: An NSA-Proof Twitter, Built With Code From Bitcoin and BitTorrent.
  - Uses the bitcoin protocol, though not the network that actually drives the digital currency. Basically, the protocol handles user registration and logins. Just as machines — called miners — verify transactions over the bitcoin network to ensure no one double-spends bitcoins and everyone spends only their own coins, a network of Twister computers verifies that user names aren’t registered twice, and that posts attached to a particular user name are really coming from that user. Posts are handled through the BitTorrent protocol. This lets the system distribute a large number of posts through the network quickly and efficiently, and it lets users receive near-instant notifications about new posts and messages — all without the need for central servers.
- Bitcoin and the Byzantine Generals problem (NY Times)
- The Byzantine Generals Problem ACM paper
- LiteCoin – peer-to-peer Internet currency that enables instant payments to anyone in the world. It is based on the Bitcoin protocol but differs from Bitcoin in that it can be efficiently mined with consumer-grade hardware.
Anonymous networks
- NPR – Apps That Allow You To Post Anonymously Gain Popularity
- BuzzFeed – The Return Of The Anonymous Internet
- Wired – Secrecy Is the Key to the Next Phase of Social Networking
- BBC – Iraqis use Firechat messaging app to overcome net block
“Anonymized” data really isn’t—and here’s why not
BROKEN PROMISES OF PRIVACY: RESPONDING TO THE SURPRISING FAILURE OF ANONYMIZATION
The Complete Guide to Anonymous Apps – New York Magazine
How to Anonymize Everything You Do Online – Wired (mostly about Tor; relevant as an overview of what is currently being done)
Whisper
- Article in NY Magazine
- The background images for Whispers can be uploaded by the user, but are more often automatically generated by the app from an internet search.
- Revealed: how Whisper app tracks ‘anonymous’ users
  
  Lewis, Paul, and Dominic Rushe. “Revealed: How Whisper App Tracks ‘anonymous’ Users.” The Guardian. Guardian News and Media Limited, 16 Oct. 2014. Web. 16 Oct. 2014.
- Secret-sharing app Whisper to the Guardian: You published a ‘pack of vicious lies’ about us – Washington Post, Oct 16, 2014.
- Whisper: the facts – The Guardian, Oct 17, 2014.
Secret
- Valleywag article “Silicon Valley Can’t Stop Shit-Talking Itself on This New App”
- New York Magazine uses Secret as a basis for a story: The Secret Shame of an Unacquired Tech Worker
Yik Yak – Yik Yak acts like a local bulletin board for your area by showing the most recent posts from other users around you. It allows anyone to connect and share information with others without having to know them.
- A Gossip App Brought My High School to a Halt
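A location-scoped feed like Yik Yak's can be sketched as a great-circle (haversine) distance filter followed by a sort on recency. The 2 km default radius, the Post fields and the function names below are hypothetical choices for illustration, not Yik Yak's actual parameters or API.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Post:
    text: str
    lat: float
    lon: float
    timestamp: float  # seconds since the epoch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def local_feed(posts, lat: float, lon: float, radius_km: float = 2.0, limit: int = 20):
    """Posts within radius_km of (lat, lon), newest first (hypothetical defaults)."""
    nearby = [p for p in posts if haversine_km(lat, lon, p.lat, p.lon) <= radius_km]
    nearby.sort(key=lambda p: p.timestamp, reverse=True)
    return nearby[:limit]
```

For this project the same filter cuts both ways: precise coordinates attached to posts are exactly the kind of data that can de-anonymize a reporter, so a real deployment would likely coarsen locations before storing them.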
Ushahidi
- ‘Peace Technologies’ Enable Eyewitness Reporting When Disasters Strike
Hollywood Stock Exchange (Wikipedia)
N2Sky – Neural Networks as Services in the Clouds – might be needed to train systems to recognise individuals from data streams. Open question: is it better to process the raw data on the device, or later on a server?
“Honey Encryption” Will Bamboozle Attackers with Fake Secrets
American Press Institute announces major project to improve fact-checking journalism
Society for Professional Journalists
- Code of Ethics
- Position paper on anonymous sources
NPR’s guidance on anonymity
Designing for Virality
Your own Deep Web site will soon be just a point-and-click away
SecureDrop is an open-source whistleblower submission system managed by Freedom of the Press Foundation that media organizations use to securely accept documents from anonymous sources. It was originally coded by the late Aaron Swartz. (Includes GitHub links)
End-To-End is a Chrome extension that helps you encrypt, decrypt, digitally sign, and verify signed messages within the browser using OpenPGP. It also comes with a JavaScript crypto library.

Presentation of the information

Wikis?
Automated
- Automated Insights
- The AP is automating its earnings stories

People who might be useful to involve

Jonathan Grudin
Roy Rada – agreed to be on committee
Lina Zhou
Bin Zhou
Kevin Crowston
Leysia Palen – this research hews closely to her Project EPIC work; agreed to be on committee.
Jon Callas
James Graves – Keyboard pattern recognition.
Delip Rao – Lead author on several papers during his time at the Human Language Technology Center of Excellence at Johns Hopkins University that dealt with algorithmic methods of identification. Many of his co-authored papers talk about “latent attributes,” those implicit specific details about people that can be surfaced, including ethnicity and gender.

Committee

Fellow Students

Amir Karami <amir3@umbc.edu> https://sites.google.com/site/karamihomepage/
Ali Azari <azari2@umbc.edu> http://umbc.academia.edu/AliAzari

The experiment to determine the validity of these approaches will have two parts.

The first part will be the analysis of crowdsourced mobile data to determine the “identity” of users, their trustworthiness, and the likelihood that a documented event is authentic.
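One simple way to combine reporter trustworthiness into an event-authenticity estimate is a noisy-OR: treat each distinct reporter's trust as the probability that their report is truthful, assume reports are independent, and compute the probability that at least one is truthful. This is a sketch under those assumptions with hypothetical names; it is not necessarily the method the analysis will use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    reporter_id: str
    event_id: str
    reporter_trust: float  # hypothetical per-reporter trust in [0, 1]


def event_authenticity(reports, event_id: str) -> float:
    """Noisy-OR: probability that at least one independent report is truthful."""
    p_all_false = 1.0
    seen = set()
    for r in reports:
        # Ignore reports about other events and repeat reports from the same reporter.
        if r.event_id != event_id or r.reporter_id in seen:
            continue
        seen.add(r.reporter_id)
        p_all_false *= 1.0 - r.reporter_trust
    return 1.0 - p_all_false
```

The independence assumption is the weak point: a single actor operating many sock-puppet accounts would inflate the score, which is why the identity analysis matters as much as the trust estimate.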

The second part, which will produce the data for the first part to analyze, will be the development of a multiplayer online game (MOG) that tracks users as they play a game of “space invaders” loosely based on the television series “V”. In this game, registered users will interact with an Augmented Reality scenario where events such as UFO appearances and alien artifact discoveries are presented as gameplay elements. Participants will have to discover and document these events, which will become more varied and complex as the story unfolds. Some research to bear in mind.

Using this framework, we will know exactly which registered user did what in the context of the game, and what information has been created about each game event. As such, the analysis systems will have both a “real world” dataset produced by the biometric and recording game components and clean meta information about how the data originated. This means that every analysis result can be tested against the actual events.
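Because the game records which events really happened, analysis output can be scored directly against that ground truth. A minimal sketch, assuming the analysis emits a per-event authenticity score and that a fixed threshold (a hypothetical 0.5 here) turns scores into accept/reject decisions; the function names are illustrative:

```python
def predicted_authentic(scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Events whose authenticity score meets the (hypothetical) threshold."""
    return {event for event, score in scores.items() if score >= threshold}


def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision: share of accepted events that were real.
    Recall: share of real events that were accepted."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    scores = {"e1": 0.9, "e2": 0.6, "e3": 0.7, "e4": 0.2}  # illustrative values
    ground_truth = {"e1", "e2", "e4", "e5"}                # from the game's event log
    print(precision_recall(predicted_authentic(scores), ground_truth))
```

Sweeping the threshold and plotting precision against recall would show the trade-off between missing real events and accepting fabricated ones.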

The primary experiment will have users interact with the game scenario in the roles of insurgents or occupiers. The game will start with a surreptitious alien invasion and, depending on the number of players, build toward larger-scale conflicts. All “weapons” in the game use indirect fire, where a target is designated using a mobile device. Initially, players will only be able to track aliens (to join with or oppose them). Weapons will become available over time as the story arc progresses. Players have to develop sufficient trust to be granted control of weapons, and other players may try to interfere with those trust relationships.
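The rule that players must earn enough trust before being granted weapons could be modeled with a beta reputation score, a common technique in which trust is the expected value of a Beta distribution over a player's cooperative and uncooperative interactions. The 0.8 threshold and the 10-interaction minimum below are hypothetical parameters, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BetaTrust:
    positive: int = 0  # cooperative interactions observed
    negative: int = 0  # uncooperative interactions observed

    def record(self, cooperative: bool) -> None:
        if cooperative:
            self.positive += 1
        else:
            self.negative += 1

    @property
    def score(self) -> float:
        """Expected value of Beta(positive + 1, negative + 1); 0.5 with no history."""
        return (self.positive + 1) / (self.positive + self.negative + 2)


def may_control_weapons(trust: BetaTrust, threshold: float = 0.8,
                        min_interactions: int = 10) -> bool:
    """Grant weapons only with enough history AND a high enough score."""
    enough_history = trust.positive + trust.negative >= min_interactions
    return enough_history and trust.score >= threshold
```

The minimum-interaction requirement keeps a newcomer with a couple of lucky interactions from qualifying. Players trying to interfere with a trust relationship could be modeled as sources of negative feedback, although this simple score does not defend against coordinated false feedback.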

The result of all this activity will be a large amount of clean data for analysis and for testing candidate biometric solutions and information validation techniques. Since popular MMOGs can have millions of users, this mechanism could plausibly yield anywhere from thousands to millions of high-quality datasets.

Once the analytic components are sufficiently robust and accurate within the game context, there is good reason to expect the software to perform well in actual events.

Gates (Bold items are major milestones)

Determine cross-platform mobile development environment that will support biometrics and AR game concept
If needed, choose an AR library
Develop proof of concept AR game “scene” that allows for the recording, commenting and saving of an event.
- http://www.gamesforchange.org/
Develop proof of concept biometric “recognizer”
Story development and production of game “bible”
Initial front-end game framework design (app and server)
- User registration and roles
  - Resistance Forces
  - Occupation Forces
- Push notifications
- AR event presentation
- AR documenting
  - Video
  - Stills
  - Text
  - Other? (shooting down UFOs, etc)
- Server GIS-based game engine.
  - Event production, management and recording of meta information
    - Who saw what and where
    - What they did
- Recording and storage of data for later analysis
Initial back-end game framework design (Server and webpage)
- Biometric integration (automatic login later?)
- Event recognition and synchronization
- Communication between users
  - Since the game is based on anonymous users, how do they establish communication channels? F2F meetings? (Hold two phones together and shake to guarantee proximity?)
  - C3 (command, control and communications) could be based on groups that have met F2F reaching out to other users based on their behavior
  - User behavior tracking – needs to be useful enough to see if someone is worth recruiting but not enough to trap them.
- Recording and storage of data for later analysis
  - Development of interfaces for various customers, such as the news media
Integration engine for correlating front-end and back-end data
Front-end coding
Back-end coding
Generation of production game assets
Closed alpha release (invitation)
Debugging and initial data analysis
Initial paper(s)
Open game alpha release
Debugging and data analysis
Beta release
Production release
Algorithm refinement until accuracy goals are achieved
Decoupling of back-end from game engine for use “in the wild”
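The “hold two phones together and shake” idea for guaranteeing proximity could be checked by correlating the two phones' accelerometer traces, since devices shaken together should record nearly identical motion. This sketch assumes the traces have already been time-aligned and resampled to the same length (a non-trivial step in practice); the 0.9 threshold and the function names are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("traces must have the same length (at least 2 samples)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a motionless trace carries no evidence of co-shaking
    return cov / math.sqrt(vx * vy)


def shaken_together(trace_a: list[float], trace_b: list[float],
                    threshold: float = 0.9) -> bool:
    """Treat strongly correlated acceleration magnitudes as proof of proximity."""
    return pearson(trace_a, trace_b) >= threshold
```

A correlation test alone would not stop an attacker who can observe and mimic the motion, so a real pairing protocol would likely combine it with a cryptographic key exchange.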

