As Officials Crack Down on Web Chatter, New Technology Finds Out Why
July 2, 2012
By PAUL MOZUR
The Wall Street Journal
China's government isn't the only one paying close attention to what the country's citizens are saying on social media sites.
As China's 500 million Internet users rapidly adopt social media, academics and entrepreneurs are figuring out ways to track online messages and blog posts to better understand what the government censors—and even how to predict its intent.
China's government employs software and an army of thousands to police the Internet, but it leaves much of the censoring to social-media sites like Sina Corp., which take down posts that violate local and national rules issued each week. While it is generally known that certain words or phrases, such as the Tiananmen Square massacre, will trip the censors, the scope isn't fully understood.
These sites usually offer clues that they deleted a post due to censorship—rather than by the user or due to a technical problem—leaving special messages or images such as an Internet police cartoon character. That is helping researchers figure out how China's opaque power structures work to control its citizens.
"We have a degree of translucence now about censorship we never had," said David Bandurski, a researcher at Hong Kong University's China Media Project.
Hong Kong University developed software called WeiboScope, which scans posts from Sina Corp.'s Weibo, a popular Twitter-like Web messaging service. The software was designed to understand how Chinese respond to different news events, but it is also useful for analyzing censorship trends, said King Wa Fu, a research assistant professor at HKU, who helped develop WeiboScope.
WeiboScope operates somewhat like a search engine and scans about 300,000 user accounts, emphasizing those with influence. Besides collecting posts on various topics and allowing researchers to search the data in English, WeiboScope can also check the same post several times to see whether it becomes inaccessible. If it does, an error message indicates it was blocked, and WeiboScope adds it to the list of censored posts.
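The re-checking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WeiboScope's actual code: the status labels, the PostTracker class and the fetch_status function are all hypothetical, since the article does not describe how the real software queries Weibo or what the error messages look like.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical status labels (assumed for illustration only).
OK = "ok"                      # post is still visible
BLOCKED = "blocked"            # site reports the post was removed by censors
USER_DELETED = "user_deleted"  # author removed the post

@dataclass
class PostTracker:
    # Hypothetical callable: takes a post id, returns one of the labels above.
    fetch_status: Callable[[str], str]
    pending: List[str] = field(default_factory=list)
    censored: List[str] = field(default_factory=list)

    def track(self, post_id: str) -> None:
        """Start watching a post that is currently visible."""
        self.pending.append(post_id)

    def recheck(self) -> None:
        """Visit every watched post again and record the ones now blocked."""
        still_visible = []
        for post_id in self.pending:
            status = self.fetch_status(post_id)
            if status == BLOCKED:
                self.censored.append(post_id)
            elif status == OK:
                still_visible.append(post_id)
            # Anything else (e.g. deleted by its author) is simply dropped.
        self.pending = still_visible
```

In practice recheck would run on a schedule, so that each post is visited several times after it first appears.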
Researchers at HKU's China Media Project frequently use WeiboScope to flag and explain topics targeted by censors on their widely read blog.
One removed post showed that Chinese censors had begun cracking down on references to planned protests in Hong Kong on the anniversary of the city's hand-over to China, three days before the actual milestone.
The post also showed why humans are needed to comb through posts. Instead of directly referring to the protests, the post used an approaching typhoon as a metaphor for what would happen on the anniversary.
Once the data is collected, the challenge is to make sense of it all. Harvard University professor Gary King recently found that social-media analytics technology he developed could be applied to China's censorship patterns. It might even be able to predict major news events before they happen.
Mr. King in 2007 co-founded Boston start-up Crimson Hexagon Inc., which measures consumer sentiment on social media sites for large companies like Microsoft Corp. and Starbucks Coffee Co. Instead of simply scanning for keywords, Crimson's software uses an algorithm to analyze data based on a set of categories and themes determined by the user.
Last year, Mr. King used Crimson's social-media databank to begin analyzing a large set of China social-media data that encompass more than 11 million posts made on 1,382 Chinese forums.
Mr. King chose 85 topics that range in political sensitivity, from protests in Inner Mongolia to a popular videogame, and classified posts based on whether their content related to news, government policies, pornography, censorship or "collective action," meaning posts that could lead to public assembly. Mr. King then used the Crimson software to examine how many posts in each category had later been censored.
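The final counting step, working out what share of posts in each category was later removed, can be illustrated with a short Python sketch. It is not Crimson Hexagon's software, and the function name and input format are assumptions made for the example: each post is represented as a pair of its category label and a flag saying whether it was later censored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def censorship_rates(posts: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of posts censored in each category.

    `posts` is an assumed input format: (category, was_censored) pairs.
    """
    totals = defaultdict(int)   # posts seen per category
    removed = defaultdict(int)  # censored posts per category
    for category, was_censored in posts:
        totals[category] += 1
        if was_censored:
            removed[category] += 1
    return {category: removed[category] / totals[category] for category in totals}
```

Comparing the resulting rates across categories is what lets an analyst say which kinds of posts the censors target most heavily.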
In a newly issued report, Mr. King and other researchers found that 13% of social media posts were censored.
Notably, the government generally left alone scathing criticism of national policies and government leaders. Instead, it homed in on posts that threatened protests during major events. The most censored topics included discussion about Inner Mongolia and Zengcheng, the arrest of political dissident Ai Weiwei and the bombings over land claims in Fuzhou.
So, the censors might generally ignore nationalistic comments about China's claims to the South China Sea, but during the country's dispute with Vietnam last year they might wipe out posts on the topic for fear of people marching in protest.
Pornography and comments about censorship were also almost universally censored.
Much of the censorship activity occurs within 24 hours of the original post. "This is a stunning organizational accomplishment," wrote Mr. King, "requiring large scale militarylike precision." The government must decide what to censor and communicate it to tens of thousands of individuals, who then execute the censorship within 24 hours.
But figuring out why posts disappear is only half the battle. Mr. King said Crimson Hexagon is looking at whether the software could predict China's policy moves.
As part of his analysis, Mr. King found that censorship rates soared during China's dispute with Vietnam, but censored posts on the topic plummeted five days before a surprise peace agreement in June 2011. In another example, posts discussing Chinese artist and political activist Ai Weiwei began falling off several days before his arrest.
"Hundreds of thousands of people are involved to help the government keep secrets…and the interesting paradox is an enormous program like that, designed to keep people from seeing things, actually exposes itself," Mr. King said. "An elephant leaves big footprints."
Crimson Hexagon has already been using the software—for which Mr. King received a patent in May—to help companies understand the nuances of brand recognition via subscription and consulting services.
But the company says it has high hopes for the China market. Given the software is based around a person's categorization of social media posts, it can easily jump linguistic and cultural barriers, and that means it could help more companies understand how the elusive Chinese customer perceives them, or at least talks about them, online.
Mr. King said this is just the start. He hopes to break down the censorship data based on geography to examine differences between local and national censorship policies, and further look at whether post deletions can be harbingers of policy change.
He's not alone. Others, like researchers at Carnegie Mellon University's School of Computer Science who recently conducted a massive study of Weibo censorship, are trying to find patterns buried in all that Internet chatter that is disappearing from China's Web.