On the evening of July 4, Russian Internet users realized that the search engine Yandex had been indexing a surprising array of information stored on Google Docs, including files containing passwords, credit card numbers, and corporate documents. Yandex’s press service says the company’s actions were perfectly legitimate, but within hours of Wednesday’s discovery the search engine stopped returning any hyperlinks to Google Docs. Meduza takes a closer look at what information leaked to the public because of this unintended loophole.
In 2009, Google announced that public documents on Google Docs would start appearing in results on Internet search engines, but only when public hyperlinks exist for those documents. In other words, a document that hasn’t been shared with anyone (or a document that has only been shared with specific people by email or messenger) should not appear in search results (even if the author hasn’t restricted public access).
For the past decade, Google, Mail.ru, Yahoo, Bing, and others have all indexed tens of thousands of public documents on Google Docs. The situation with Yandex is special because of how thoroughly the company indexed this information, returning search results with hyperlinks to files containing highly sensitive data.
On other search engines, you can also find documents with private information, but simple search terms like “password” won’t turn up hyperlinks to spreadsheets with passwords. Instead, you’ll get links to a bunch of trap documents. Using Google, even when typing in people’s full names, Meduza was unable to find the same private documents that appeared on the first page of results on Yandex.
On July 5, on its Russian-language company blog, Google posted a statement reaffirming its indexing rules. The text does not mention Yandex.
Technically speaking, responsibility for the information leaking in the first place falls on all the people who selected the wrong privacy settings for their Google Docs. (If you don’t want your data available to the world, don’t set the access to “Public on the Web”!) But it’s still unclear how all this information ended up in Yandex’s search results.
So far, Yandex isn’t offering any answers. The only comment the company has made since the start of the scandal has been this: “Yandex indexes only the open part of the Internet, meaning the pages that are accessible by hyperlink without a login and password. Yandex does not index websites whose administrators have added the robots exclusion standard, even if these sites are openly accessible. On Wednesday evening, Internet users filed complaints with our help desk about a file accessibility issue at docs.google.com. Our security team is now in contact with our colleagues at Google, to draw their attention to the fact that these files may contain private information.”
The robots exclusion standard is a convention implemented through a file named “robots.txt,” stored at the root of a website’s hierarchy, that tells Web crawlers and other Web robots which files and paths on the server they may scan, and which to ignore. Google Docs uses the robots exclusion standard, and its rules permit Web crawlers to index hyperlinks that begin “docs.google.com/document.” In other words, Yandex had the right to index the documents that Google Docs users made openly accessible, and it would have been impossible for the company to access private Google Docs, in any event. Finding those public documents wouldn’t have been easy, however, as their hyperlinks are made up of long strings of random characters. To index these files, Yandex would have needed to find the hyperlinks somewhere.
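To make the mechanics concrete, here is a minimal sketch in Python of how a well-behaved crawler consults robots.txt rules before fetching a page. The rules below are hypothetical and written only for illustration; they are not the actual contents of the robots.txt file at docs.google.com. The crawler name “ExampleBot,” the helper functions, and the sample URLs are likewise invented, and nothing here describes how Yandex’s crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (NOT Google's real robots.txt):
# allow paths under /document/, disallow everything else.
# Python's parser applies the first matching rule, so Allow comes first.
HYPOTHETICAL_RULES = """\
User-agent: *
Allow: /document/
Disallow: /
"""


def build_parser(rules_text):
    """Parse robots.txt text into a rule checker."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def may_crawl(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(HYPOTHETICAL_RULES)
    # "ExampleBot" and both URLs are made-up placeholders.
    for url in (
        "https://docs.google.com/document/d/EXAMPLE_ID/edit",
        "https://docs.google.com/spreadsheets/d/EXAMPLE_ID/edit",
    ):
        print(url, may_crawl(parser, "ExampleBot", url))
```

Note that these rules are only instructions to crawlers. They do not restrict access to a page: whether a document can be opened at all depends on its sharing settings, and a crawler still has to learn the document’s address from somewhere before it can check the rules.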
Russia’s federal media regulator, Roskomnadzor, has sent an official request to Yandex in response to the data leak, but the agency has not clarified what exactly it has asked Yandex to do.
We can safely assume that people who create files with passwords or financial records aren’t going around sharing public hyperlinks to that information. Some Internet users have speculated that Yandex has been indexing the hyperlinks opened through Yandex Browser, or emailed over Yandex Mail.
There’s a history behind the concerns about Yandex Browser. In 2015, Internet users discovered that Yandex Browser had been transmitting users’ Web history data to Yandex’s servers (one of the application’s documented features). These hyperlinks then landed in a database that was indexed by the company’s Web crawlers. As a result, Yandex indexed even the hyperlinks that users had visited privately, and its crawlers were able to visit those pages without the users’ passwords. Theoretically, these private pages (accessible only by direct hyperlink) could have turned up in Yandex’s search results, but it didn’t happen then. The company put out a statement claiming that its Web robot had gained access to Yandex Browser histories by mistake, and that it wouldn’t happen again.
Yandex did not respond to Meduza’s questions about whether or not the files indexed by its search engine could have come from Yandex Browser or Yandex Mail users opening or sharing direct hyperlinks to content on Google Docs.
Several Internet users have called attention to a document that appears to be instructions for Tinkoff Bank’s human resources department. According to the text, the bank has a policy against hiring men “of North Caucasian nationalities” and people “with non-Slavic names,” except for Armenians and Dagestanis who have previously worked in the banking industry. There is apparently a strict prohibition on hiring members of the LGBT community and “members of the Negroid race.”
On the morning of July 5, spokespeople for Tinkoff Bank told the website TJournal that the bank doesn’t have any hiring restrictions and denied that it ever circulated such instructions. The Google Doc in question was quickly deleted. Several hours later, the bank issued a new statement, saying, “An employee produced this text for reasons unknown to us, and shared it on the Internet.” According to Tinkoff Bank’s spokespeople, the author is still employed at the bank and there won’t be any immediate disciplinary action. For now, the employee is reportedly receiving “additional instructions” about the bank’s corporate values.
Members of the “Sonar” election monitoring movement discovered files on Google Docs containing the personal information of Moscow voters. In a post on Facebook, the group claims that Svetlana Istomina, the head of the Social Security Administration in Moscow’s Northern Administrative District, created a Google Doc on June 27, titled “Polling Station Resource Map,” that contains information about voters who cast their ballots from home (disabled persons and others who depend on state welfare). The document also designates people responsible for “measures to mobilize resources.” The activists speculate that this is how city officials propose to mobilize the electorate for this September’s mayoral election.
“Sonar” members also say they found a list of 23,000 voters registered in the Vostochnoye Degunino District. On the spreadsheet, some names are marked “Call,” while other voters are marked “Lives in the countryside,” “Refuses to vote,” “Refuses to participate in election,” and so on.
Moscow’s City Election Commission told Meduza that it is unaware of any voters’ personal data leaking online, and it “thinks nothing” about the information uncovered by “Sonar.” Spokespeople for City Hall were unavailable for comment.
Ekho Moskvy columnist Alexander Plushev discovered a Google spreadsheet tracking negative comments made about speeches by Galina Panina, the public relations director for the Russian branch of the home improvement and gardening retailer “Leroy Merlin.”
Plushev guessed that the document is part of a PR campaign against Panina. Spokespeople for Leroy Merlin later confirmed the spreadsheet’s authenticity, explaining that the company was tracking negative comments for its own analytical purposes.
Ilya Varlamov says he found a report compiled by officials in Yekaterinburg detailing a retaliatory campaign against him in response to his blog posts criticizing the city’s landscape and urban development. The document apparently calls for planted stories published under the name of “a local resident and blogger” in praise of the city’s parks and streets. Conversely, city officials also allegedly wanted to plant social media posts pinning Yekaterinburg’s problems on former Mayor Evgeny Roizman, amplifying the message that “populists” elected to office are ineffective managers who “can’t sustain the city economy or major federal projects.”
A spokesperson for Yekaterinburg City Hall told Meduza that the document shared by Varlamov is a fake, insisting that “the city doesn’t work in Google Docs due to security requirements.”
The website Znak.com speculates that this document could be part of a manual distributed to professional trolls who are paid to write comments on social media in support of the Russian government.