Big Tech has become notorious for hoarding its users’ personal data, collected with great breadth and down to minute details. Online platforms have paid billions to settle legal charges over their invasive and reckless privacy follies. Facebook in particular is associated with this, especially after a series of major scandals involving leaks or hacks of personal data. But Google is inarguably the greediest of these companies in its data collection, to an extent that can surprise even jaded users. This makes sense economically, since data collection is central to online search: more searches and click data mean algorithms that deliver more accurate results, attracting more users and searches, in the familiar positive feedback cycle that economists call “network effects.”
From its early days, Google held onto all the data it could get its hands on: who searched for what, what kind of results were likely desired, where searches came from, and so on. A major step in this was the release of Google’s email service, Gmail. It caused a large stir as users learned that the free, high-storage email service served on-screen ads targeted to the user by scanning the text of their emails. The scanning was conducted automatically by software algorithms similar to those used to filter spam out of inboxes, but the company was completely unprepared for the backlash, not realizing that its huge scale and power made such moves feel creepier. However, the service had a crucial ancillary benefit for the company: it required a login. With that, Google could cross-reference people’s email data with their search history on Google and its YouTube platform (which also required a login to post video), along with precise location data from Maps and GPS data from phones running Android. This was the beginning of its program to synthesize its data into comprehensive individual profiles.
But the real turning point was the acquisition of the major display ad agency DoubleClick, which brought pivotal changes to the company’s “cookies.” Cookies are small pieces of data placed on your computer or phone by sites as you browse the Web, recording where you’ve been for the purpose of presenting ads you’re likely to be interested in. Cookies are now stupendously widespread: visiting a typical website like CNN or dictionary.com can put dozens of them on your PC or phone.
Google’s AdSense system had always used these cookies, but the escalation was dramatic, as Wired’s pro-industry reporter Steven Levy covered in his book In the Plex. He reported that the company gained “an omniscient cookie that no other company could match.” As a user browses, the cookie:
develops into a rather lengthy log that provides a fully fleshed out profile of the user’s interests…virtually all of it compiled by stealth. Though savvy and motivated consumers could block or delete the cookies, very few knew about this possibility and even fewer took advantage of it. The information in the DoubleClick cookie was limited, however. It logged visits only to sites that ran DoubleClick’s display ads, typically large commercial websites. Many sites on the Internet were smaller ones that didn’t use big ad networks…Millions of those smaller sites, however, did use an advertising network: Google’s AdSense. AdSense had its own cookie, but it was not as snoopy as DoubleClick’s. Only when the user actually clicked on an ad would the AdSense cookie log the presence of the user on the site. This ‘cookie on click’ process was lauded by privacy experts…Google now owned an ad network whose business hinged on a cookie that peered over the shoulder of users as it viewed their ads and logged their travels on much of the web. This was no longer a third-party cookie; DoubleClick was Google. Google became the only company with the ability to pull together user data on both the fat head and the long tail of the Internet. The question was, would Google aggregate that data to track the complete activity of Internet users? The answer was yes…after FTC regulators approved the DoubleClick purchase, Google quietly made the change that created the most powerful cookie on the Internet. It did away with the AdSense cookie entirely and instead arranged to drop the DoubleClick cookie when someone visited a site with an AdSense ad…Now Google would record users’ presence when they visited those sites. And it would combine that information with all the other data in the DoubleClick cookie. That single cookie, unique to Google, could track a user to every corner of the Internet.
Amazingly, Google co-founder Sergey Brin dismissed fears about this mega-cookie as “more of the Big Brother type,” meaning exaggerated. But even that might be putting a positive gloss on today’s data hoarding. Lawrence Lessig, who has defended the company in areas like its book scanning, noted that in Orwell’s novel 1984, where Big Brother was introduced, at least the characters “knew where the telescreen was…In the Internet, you have no idea who is being watched by whom. In a world where everything is surveilled, how to protect privacy?”
And in 2016, Google went even further by changing its terms of service, asking users to activate new functions that would give them more control over their data and let Google serve more relevant ads. But what the change actually did was merge its tracking data with your search history and the personal information in your Gmail, YouTube, and Google+ accounts into “super-profiles.” And Google wasn’t done. Besides using the mega-cookie to record our browsing history, combined with our search logs and Gmail contents, Google “Now Tracks Your Credit Card Purchases and Connects Them to Its Online Profile of You,” as a recent MIT Technology Review headline indicates. By contracting with third-party data firms that track 70% of all credit and debit card purchases, Google can now offer advertisers further confirmation of which ads are working, not just to the point of clicking but to the point of sale.
With its new TOS, Google does let users view some of the data it holds on them, but it takes “an esoteric process of clicks,” as Ken Auletta put it in his book Googled. Again, most users are unaware of these issues in the first place, and since we’re opted in, most never view their data files. Additionally, each Google service has its own privacy terms and settings, and they change without warning, so we have to be constantly vigilant for their changes and subtleties. Google also joins the rest of the tech industry in its use of “dark patterns,” repetitive tactics that wear users down into allowing data access. And finally, even opting out of customization doesn’t end the data collection, just the use of it to target ads to you: your movements, browsing, searching, emailing and credit card buying are all still compiled. In time Google announced it would soon stop the unpopular scanning of Gmail text to place ads. The catch was that the company had enough data on users from its super-profiles that it could personalize ads without the scanning.
And for all its hoarding, the pile isn’t secure. Google had allowed software developers to design applications like games for Google+, the company’s unsuccessful attempt to compete with Facebook in social media. But a glitch in the software gave developers access to private portions of Google+ user profiles for three years before its discovery, including full names, email addresses, gender, pictures, locations, occupation and marital status. An internal memo indicates that, as with Facebook’s own developer data leaks, there’s no way to know whether the data was misused. But most important, Google learned of the issue in spring 2018 and refused to disclose it, fearing “reputational damage” to itself.
Whatever this company is, it rhymes with “shmevil.”
Rob Larson is Professor of Economics at Tacoma Community College and author of Bit Tyrants: The Political Economy of Silicon Valley, out now from Haymarket Books.