
What the Internet Really Knows About You

A digital media expert argues that your private life is exposed online, no matter how cautious you think you’re being

People concerned about privacy often try to be “careful” online. They stay off social media, or if they’re on it, they post cautiously. They don’t share information about their religious beliefs, personal life, health status, or political views. They think they’re protecting themselves.

But they are wrong. Because of technological advances and the sheer amount of data now available about billions of other people, discretion no longer suffices to protect your privacy. Computer algorithms and network analyses can now infer, with a sufficiently high degree of accuracy, a wide range of things about you that you may have never disclosed, including your moods, your political beliefs, and your health.

There is no longer such a thing as individually opting out of our privacy-compromised world.

Tracking Minds and Emotions

The idea of data inference is not new. Magazine subscriber lists have long been purchased by retailers, charities, and politicians because they provide useful hints about people’s views. For example, a subscriber to The Wall Street Journal is more likely to be a Republican voter than is a subscriber to Rolling Stone.

But today’s technology works at a far higher level. In 2017, for example, the newspaper The Australian published an article revealing that Facebook had told advertisers that it could predict when younger users were feeling “insecure,” “worthless,” or otherwise in need of a “confidence boost.” Facebook was apparently able to draw these inferences by monitoring photos, posts, and other social media data. (The company denied letting advertisers target people based on those characteristics, but it’s almost certainly true that it has that capacity.)

Today’s computational inference does not merely check to see if Facebook users posted phrases like “I’m depressed” or “I feel terrible.” The technology is more sophisticated: Machine-learning algorithms are fed huge amounts of data, and the computer program categorizes who is most likely to become depressed.
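
To see concretely what that kind of pipeline involves, here is a minimal sketch in Python using the scikit-learn library. The posts, the labels, and the model choice are invented for illustration; this is not Facebook's actual system.

# A minimal sketch of inferring mental state from posts.
# The training data and labels are invented for illustration;
# this is not any platform's actual system. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training set: users' posts, plus a label saying whether
# each user later screened positive for depression.
posts = [
    "had a great day at the beach with friends",
    "so excited for the concert this weekend",
    "new job starts Monday, feeling hopeful",
    "can't sleep again, everything feels heavy",
    "skipped class, no energy to see anyone",
    "another week of staring at the ceiling",
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = later depressed (invented labels)

# Convert raw text into word-weight features, then fit a classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# The model now assigns a risk score to any new post; note that none
# of the training phrases say "I'm depressed" outright.
print(model.predict_proba(["tired of pretending everything is fine"])[:, 1])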

Consider another example. In 2017, researchers, armed with data from more than 40,000 Instagram photos, used machine-learning tools to identify signs of depression in a group of 166 Instagram users. Their computer models turned out to be better predictors of depression than humans who rated whether photos were happy or sad.
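
The study's signals came from the images themselves rather than from words. As a rough illustration, the Python sketch below computes the kind of simple color statistics (average hue, saturation, and brightness) the researchers reported as predictive; the file path is a placeholder, and the Pillow and NumPy libraries are assumed to be installed.

# A rough sketch of the color statistics used in the Instagram study:
# photos from depressed users tended to be bluer, darker, and grayer.
# The file path is a placeholder; requires Pillow and NumPy.
from PIL import Image
import numpy as np

def color_features(path):
    """Return the mean hue, saturation, and brightness of an image."""
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float)
    return hsv[..., 0].mean(), hsv[..., 1].mean(), hsv[..., 2].mean()

hue, sat, bright = color_features("photo.jpg")  # hypothetical file
print(f"hue={hue:.1f}, saturation={sat:.1f}, brightness={bright:.1f}")
# In the study, features like these (plus posting frequency and face
# counts) were fed to a machine-learning model to score each user.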

Used honorably, computational inference can be a wonderful thing. Predicting depression before the onset of clinical symptoms would be a boon for public health, which is why academics are researching these tools; they dream of early screening and prevention.

Discretion no longer suffices to protect your privacy.

But these tools are worrisome too. Few people posting photos on Instagram are aware that they may be revealing their mental health to anyone with the right computational power.

Computational inference can also be a tool of social control. The Chinese government is trying to use big data and artificial intelligence to single out “threats” to Communist rule, including the country’s Uighurs, a mostly Muslim ethnic group.

Such tools are already being marketed for use in hiring employees, for detecting shoppers’ moods, and for predicting criminal behavior. Unless these tools are properly regulated, in the near future we could be hired, fired, granted or denied insurance, accepted to or rejected from college, granted or denied housing, and extended or denied credit based on facts that are inferred about us.

This is unsettling enough when it involves correct inferences. But because computational inference is a statistical technique, it also often gets things wrong. What happens when someone is denied a job on the basis of an inference that we aren’t even sure is correct?
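
A little arithmetic shows why. Suppose, purely for illustration, that a screening model is 90 percent accurate and that the trait it looks for is rare; the Python sketch below works out how many of the people it flags are flagged in error.

# Why an "accurate" inference is often wrong: base rates.
# All numbers here are illustrative assumptions.
base_rate = 0.05     # assume 5% of applicants actually have the trait
sensitivity = 0.90   # the model flags 90% of true cases
specificity = 0.90   # and correctly clears 90% of everyone else
population = 10_000

true_pos = base_rate * population * sensitivity               # 450 flagged correctly
false_pos = (1 - base_rate) * population * (1 - specificity)  # 950 flagged in error

precision = true_pos / (true_pos + false_pos)
print(f"Of everyone flagged, only {precision:.0%} actually have the trait.")
# Prints about 32%: roughly two out of three people flagged, and perhaps
# denied a job as a result, are flagged in error.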

Selling Your Location

Another troubling example of inference involves your phone number. Even if you have stayed off Facebook and other social media, your phone number is almost certainly in many other people’s contact lists on their phones. If they use Facebook (or Instagram or WhatsApp), they have been prompted to upload their contacts to help find their “friends,” which many people do.

Once your number surfaces in a few uploads, Facebook can put you in a social network, which helps it infer things about you since we tend to resemble the people in our social set. (Facebook even keeps “shadow” profiles of nonusers and deploys “tracking pixels” all over the web—not just on Facebook—that transmit information to the company about your behavior.)
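
The principle at work is homophily: if most of your contacts share a trait, the odds are good that you do too. Here is a minimal Python sketch of that kind of neighbor-based inference; the contact graph, the names, and the labels are all invented.

# A minimal sketch of homophily-based inference on a contact graph.
# The graph, names, and labels are invented for illustration.
from collections import Counter

# Edges reconstructed from uploaded address books (hypothetical).
contacts = {"you": ["ana", "ben", "cho", "dev"]}

# Traits the platform already knows for users who disclosed them.
known_politics = {"ana": "left", "ben": "left", "cho": "left", "dev": "right"}

def infer_trait(person, graph, known):
    """Guess a person's trait as the majority vote among their contacts."""
    votes = Counter(known[c] for c in graph[person] if c in known)
    guess, count = votes.most_common(1)[0]
    return guess, count / sum(votes.values())

trait, agreement = infer_trait("you", contacts, known_politics)
print(f"Inferred politics: {trait} ({agreement:.0%} of labeled contacts agree)")
# You never stated a political view; the guess comes entirely from the
# people whose phones contain your number.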

In 2018, an investigation revealed that Verizon, T-Mobile, Sprint, and AT&T were selling people’s real-time location data. And other recent inquiries showed that weather apps, including the Weather Channel, AccuWeather, and WeatherBug, were selling their users’ location data. This kind of data is useful not just for tracking you but also for inferring things about you, like why you were at a doctor’s office.
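
Turning raw coordinates into a sensitive inference takes little more than a distance check against a list of known places. The Python sketch below does exactly that; the coordinates, the places, and the 50-meter threshold are fabricated for illustration.

# A sketch of inferring a place visit from a sold location ping.
# Coordinates, places, and the threshold are fabricated for illustration.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

# A data broker's "points of interest" (invented examples).
places = [("oncology clinic", 40.7410, -73.9897),
          ("bankruptcy attorney", 40.7389, -73.9871)]

ping_lat, ping_lon = 40.7411, -73.9899  # one timestamped ping from a weather app

for name, lat, lon in places:
    if haversine_m(ping_lat, ping_lon, lat, lon) < 50:  # within about 50 meters
        print(f"User was likely at: {name}")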

What’s to be done? Designing phones and devices to be more privacy-protected would be a start, and government regulation of the collection and flow of data would slow things down. But we also need laws that directly regulate computational inference: What will we allow to be inferred, under what conditions, and subject to what kinds of accountability, disclosure, controls, and penalties for misuse?

Until we have good answers to these questions, you can expect others to continue to know more and more about you—no matter how discreet you may have been.

Zeynep Tufekci is a professor of information science at the University of North Carolina. She writes about the social effects of technology for The New York Times.
