Highlight

Classifying the Political Leaning of News Articles and Users from User Votes

Achievement/Results

In U.S. politics, opinions on a variety of issues involving taxes, the role of government, domestic policy, and international relations are substantially, though imperfectly, correlated with one another, with party affiliation, and with overall self-identification as liberal or conservative. Thus, classifying people, media outlets, and the opinions expressed in individual articles as liberal or conservative conveys meaning to most people.

We applied three semi-supervised learning methods that propagate classifications of political news articles and users as conservative or liberal, based on the assumption that liberal users will vote for liberal articles more often than for conservative ones, and similarly for conservative users and articles. We use data from the social news aggregator Digg, where readers vote for articles they think should be elevated to the front page. Starting from a few labeled articles and users, the algorithms propagate political leaning labels to the entire graph of users and articles. In cross-validation, the best algorithm achieved 99.6% accuracy on held-out users and 96.3% accuracy on held-out articles. Adding social data, such as friendship links among users, or text features, such as cosine similarity, did not improve accuracy. The propagation algorithms, which use only the subjective liking data from users, also performed better than an SVM-based text classifier, which achieved 92.0% accuracy on articles. The automatic classifier will be useful for a variety of scientific purposes, as well as in the development of news services that try to prevent political polarization by nudging people to read a mixture of both liberal and conservative items. (Zhou, Resnick and Mei 2011)
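The general idea of propagating labels over a user-article vote graph can be illustrated with a minimal sketch. It is not necessarily one of the three methods used in the study; it is a generic averaging scheme in which each unlabeled node repeatedly takes the mean score of its neighbors while the seed labels stay fixed. The function name `propagate_labels`, the score convention (-1 for liberal, +1 for conservative), and the toy users and articles are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(votes, seeds, iterations=200):
    """Illustrative label propagation on a bipartite vote graph.

    votes: iterable of (user, article) pairs, one per vote.
    seeds: dict mapping ("user", name) or ("article", name) to a fixed
           score: -1.0 (liberal) or +1.0 (conservative). Hypothetical
           convention, not taken from the original study.
    Returns a dict mapping every node to a score in [-1, 1].
    """
    # Build an undirected bipartite graph: users on one side, articles on the other.
    neighbors = defaultdict(set)
    for user, article in votes:
        u, a = ("user", user), ("article", article)
        neighbors[u].add(a)
        neighbors[a].add(u)

    # Unlabeled nodes start neutral; seed nodes start at their known label.
    scores = {node: seeds.get(node, 0.0) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # seeds are clamped
            else:
                # Each unlabeled node takes the mean score of its neighbors.
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Toy data (invented): two groups of voters with one cross-group vote.
votes = [
    ("alice", "a1"), ("alice", "a2"),
    ("bob", "a1"), ("bob", "a2"),
    ("carol", "a3"), ("carol", "a4"),
    ("dave", "a3"), ("dave", "a4"), ("dave", "a2"),
]
seeds = {("article", "a1"): -1.0, ("article", "a3"): +1.0}

scores = propagate_labels(votes, seeds)
labels = {node: ("liberal" if s < 0 else "conservative")
          for node, s in scores.items()}
```

In this toy graph only two articles are labeled, yet every user and the remaining articles receive a leaning from the vote structure alone, with no use of article text.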

Address Goals

The automatic classifier developed in this project will be useful for a variety of scientific purposes, in particular for classifying text data.

Doctoral student Xiaodan Zhou and several REU students worked on this project, gaining valuable knowledge of text mining.