
On this blog, I usually review papers that are around 10 pages long. This time, I am going to write about a book that is several hundred pages long and discusses important issues that I believe people working in AI should think about from time to time.

The book is called The Cost of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism. It promotes the idea that the economic model of gathering data from people (users, employees, customers, etc.) and using them for the profit of those who gather them is a new form of colonialism and poses a potential threat to the future of democracy and to overall political stability. The authors of the book are professors of sociology Nick Couldry from the London School of Economics and Ulises Mejias from the State University of New York. The book was published by Stanford University Press in 2019.

Because I make my living developing new methods for using language data and teaching university students the existing ones, this topic is something I need to think about. The book takes an interesting left-wing view of issues related to data collection and processing. Even though I do not agree with everything in it, I still find its ideas very inspiring.

A cliché says that data is the new oil. It metaphorically puts data on a par with the most important economic resources. The purpose of colonialism has always been the appropriation of new resources for greater profit and power. In the past, these resources were land, raw materials, and human labor. Now, the new resource is human life itself. For most of the book, I thought the authors meant this metaphorically; by the end, I was pretty sure they meant it literally.

They provide the following definition of data colonialism:

Emerging new order for the appropriation of human life so that data can be continuously extracted for profit.

They do not view issues connected with data collection and processing as particularities within the established economic and political system that could be solved by improving particular policies or by individual actions. They view it as a new social order that is inherently neither good nor bad (although they criticize it most of the time), but that needs to be well understood in order to make adequate political decisions.

The liberal view and liberal counterarguments

First, I will start with the mainstream (or at least what I consider mainstream) view of how businesses based on data gathering work, a view the authors of the book would probably consider naive.

Everyone uses data-collecting products voluntarily. Legislation requires that the way the collected data will be used is explained in an end-user agreement that clearly states what service the user gets, what data the service provider can keep, and for what purposes it can use the data (improving the service, showing advertisements, marketing, etc.). The users/clients can read the agreement and (rationally, ha ha) decide whether it is a good deal for them or not.

The first objection is that the choice is not really voluntary, because opting out of commonly used technologies and services is virtually impossible when participating in mainstream social life. Alternatives that do not collect data for profit do not exist. You might be able to live without social network accounts, but it is virtually impossible to live without a bank account (and every bank does some form of data mining) or a mobile phone, or to have an email address that is not hosted in the cloud.

The other problem is that you never know the real value of the data. Without knowing it, it is impossible to tell whether the terms and conditions are a good deal or not. An end user (of a social network or a sharing-economy platform, a bank client, a mobile operator's customer) can estimate neither the costs nor the profits of using their data, unlike with services such as a car repair or a haircut.

Another important issue is that one cannot anticipate the development of technology. Many people who would not upload their photos to Facebook now might have thought it was a good idea back in 2009. At that time, the only way Facebook could profit from the photos was by showing advertisements next to them. Now, ten years later, with deep learning methods, the company can automatically analyze the photos, extract information about individuals' lives that the users would otherwise never agree to provide, and use it for marketing.

At first sight, it seems these issues could be fixed by thorough regulation and an insistence on business ethics, which would allow everyone to be better informed and thus create pressure on companies not to exploit their users or clients. Eventually, competitors offering what the clients want might appear.

The book argues that this is not possible. These problems are not singularities, but inherent properties of today's capitalism, which in this sense is functionally similar to historical colonialism. Data is appropriated (i.e., treated as something free for anyone to use) in the same way natural and human resources were appropriated by colonialists. Colonizers and the colonized can never be in an equal position to make a fair deal.

Data Colonialism

The central idea of the book is a functional analogy between today's social and economic situation and historical colonialism: cheap and fast exploitation of human lives for corporate profit, without the exploited people actually knowing what is going on or being able to take action. The form is, of course, incomparable: historical colonialism used brutal violence, and nothing like that is happening with data collection. Only the function is analogous.

The companies that take the data act like colonizers who called newly found territories "terra nullius", the land of no one, which is free to appropriate. Historical colonialism also provided a legal framework for the appropriation of land. For example, Spanish colonizers were supposed to read aloud a document called the Requerimiento to the people who were about to become subjects of the Spanish kingdom. If they did not object, which they did not because they did not speak Spanish, they were considered to be under Spanish rule. The endless texts of EULAs (End-User Licence Agreements) for services that collect our data are the data-colonial analogue of the Requerimiento. In many cases, our understanding of them is very similar to how indigenous villagers in South America understood the Spanish Requerimiento.

Monopolies and monopsonies

The early capitalism of the 19th century led to a highly unequal distribution of the profit generated by labor, which threw industrial workers into extreme poverty, with a quality of life below the standards of the pre-industrial era. One of the reasons was monopsony (the opposite of monopoly) in the labor market.

In a monopoly, there is a single seller that can raise the price as much as it wants; in a monopsony, there is a single buyer that can lower the price as much as it wants. Both monopsony and monopoly prevent the market from setting a fair price for whatever is traded. In early capitalism, industrial enterprises became monopsonies buying labor (or effectively formed cartels to become labor monopsonies) and pushed the price of labor down to a minimum. This resulted in the extreme poverty of a working class that had very limited means to move elsewhere.

When we look at businesses based on user data (and products based on data, such as advertising), the five big tech companies are not far from being a monopoly-monopsony combination: the single collector of a certain type of data and the single seller of products based on that data. The antitrust legislation of nation states tries to prevent such a situation, without much success. Anyone who wanted to become Google's competitor would need to get user data and use it to optimize their services to a comparable quality, but it is impossible to get the data when everyone uses only Google. Because the barrier is the data (money does not really matter), there is no way to enter the market later and compete.

Unrestrained data capitalism can have consequences similar to those of unrestrained capitalism in the 19th century. It depends only on the big companies and how they use their position. The fact that I worked at Google and saw the company culture from the inside makes me optimistic about this. Profit at any cost does not seem to be the policy of any of these companies. The problem is that society has to trust them without having any mechanism to control them.

This is a more serious issue in countries where these monopoly-monopsony data-driven companies are owned by an authoritarian state or by people connected to the government. But the political situation can change anywhere at any time. Can you imagine a world in which, after a worldwide crisis, Google, Facebook, and Amazon, with all their economic power, serve an authoritarian government? It seems like a dystopian nightmare.

Real colonialism in data colonialism

Some aspects of "real" colonialism are also repeated in data colonialism. Economic inequality between the colonies and their metropolis continues without any prospect of change. Most of the global tech companies are based in the USA, but they collect data and make profit globally. People in underdeveloped countries gladly adopt the cool "free" services, but this also means that local businesses have to pay foreign companies to reach their own local customers. Colonizers in the "metropolis" still get rich from the local economic life of the colonies.

Myths that legitimize data colonialism

Companies need to configure people so that they generate the data the companies need. Historical colonialism justified itself with several myths that still dangerously resonate in today's society. The most important one is racism: it was considered natural that white colonizers ruled over those who were believed to be of an inferior race. And of course, all the brutal violence was considered just a small price for bringing scientific progress to underdeveloped nations, which were often so underdeveloped that they did not even know they wanted Western progress.

Data colonialism creates myths that reinforce the data relations, and as a byproduct, these myths change society.

  1. The data are valueless for individuals. Individuals do not lose anything when they provide data to companies.

  2. Self-presentation is an important part of our culture. The more self-presentation is online, the more data is generated, regardless of the impact that the self-presentation culture might have on the mental health of the entire population.

  3. Data-driven technologies represent scientific and technological progress. By providing the data, we help the companies to build a better world where artificial intelligence can free us from boring routines. No one wants to slow down progress.

AI as a humble servant of data colonialism

In the 19th century, many people believed that railways and mass-printed newspapers would bring a much freer circulation of ideas about equality and human rights to the colonized countries. There were also hopes for the economic growth of the colonized countries thanks to better access to Western markets. None of this happened. Paradoxically, the railway became a symbol of colonial power in many countries and ultimately only led to more efficient exploitation of the colonized.

These hopes are very similar to those that people (including me; have a look at the motto of this blog) put into AI. Technologies that we now call AI have the potential to liberate people from boring and routine, yet cognitively non-trivial, jobs and thus allow them to do something more human (whatever that means).

Nevertheless, when we look into Amazon warehouses, we see the exact opposite. Workers receive detailed instructions through their headsets from a computer program that simultaneously optimizes all their movements and squeezes the last drop of productivity out of them. We can hardly say such a job is more human.

Sharing-economy platforms (such as Uber or Lyft) take advantage of having access to data from both the clients and the providers of the services they mediate, which they use to train machine learning models. Owning the data allows them to shift almost all risks to the customers and the service providers. They advertise to service providers that they offer freedom and flexible working hours. What people really get is a full-time job without traditional labor protection.

What are the takeaways?

According to the book, data colonialism attacks the nature of human freedom, and the authors try to explain this within the framework of Hegel's philosophy. It is likely that I did not understand it fully, but these arguments did not really persuade me. (But they use cool, passionate wording for it: "make the totality of human life available to capitalism at the cost of breaking the very basic personal integrity".)

However, I agree with the book that this situation is dangerous. Public discourse lacks a precise conceptualization of the sort of power data colonialism creates. The liberal conceptualization is clearly not capable of providing one, which makes this power hard to oppose. Every great power eventually leads to the emergence of a counter-power. In modern history, we have witnessed many times that counter-power movements were not broad-minded liberals spreading love and peace, but very often movements that preferred making war to making love.

  • The power and greed of early capitalism, which exploited the labor of 19th-century workers, gave rise to radical left-wing movements that turned into the governments of some of the most brutal totalitarian states.

  • The misuse of power by the countries that won World War I gave rise to National Socialism in Germany.

  • The continued post-colonial exploitation of former colonies feeds the hatred that drives current global terrorism.

At the start of each of these situations, there was a widely accepted (perhaps we can say liberal) explanation of why the unjust situation was just, one that ignored its inherent unfairness. Data colonialism can be seen in this way too. Data colonizers are today's power that can give rise to a counter-power movement. We have no clue what that counter-power will be, but history teaches us that it might be something very unpleasant.