Toward a Complete Data Valuation Process. Challenges of Personal Data

Data should be treated as a new kind of asset, one that requires its own valuation rules, distinct from those used for traditional commodities or for intangible assets (patents, intellectual property).

In a paper published in the ACM Journal of Data and Information Quality, written with Mihnea Tufiş, we discuss challenges related to the data valuation process, in connection with our work on the Horizon 2020 Safe-DEED project.

From data-as-a-commodity to a data valuation process

When discussing data valuation, we need to consider two paradigm shifts, both consequences of the big data revolution: the changes in the data production-consumption cycle and the structure of the data-transactions ecosystem.

Changes in the data production-consumption cycle. Much of the data produced every instant is a by-product of activities, behaviors, or processes that are not always the primary intended focus of data collection. Businesses switched from allocating resources for the identification and processing of data that supported a subset of activities to generating data about nearly every aspect of our lives.

Structure of the data-transactions ecosystem. The established model of Internet and data-driven companies was to offer a seemingly free service in exchange for their users’ personal data. However, the data deluge of the past decade and the gradual shift of businesses toward data-driven decision-making have created fertile ground for data brokers.

A first important challenge is to build frameworks that go beyond views such as data-as-a-commodity, data-as-an-asset, or data-as-a-product and consider the valuation process in all its complexity.

Challenge 1. Define a data valuation process, grounded in the following three aspects: a formal description of the valuation context, data quality assessment in the given context, and an assessment of the extent to which the data is useful for achieving the intended goals as they have been stated in the context. Finally, the challenge amounts to finding a method that maps all these aspects to a measure of value for data.
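To make the three aspects of Challenge 1 more concrete, here is a minimal sketch in Python. It is not the method proposed in the paper: the names ValuationContext, quality_score, utility_score, and data_value are hypothetical, quality is reduced to completeness, usefulness is reduced to field coverage, and the final mapping to a value is assumed to be a plain product of the two scores. A real valuation process would need far richer definitions of each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuationContext:
    """Hypothetical formal description of the valuation context."""
    goal: str
    required_fields: tuple  # fields the buyer needs to reach the goal


def available_fields(records, context):
    """Required fields that appear in at least one record."""
    present = set().union(*(r.keys() for r in records))
    return [f for f in context.required_fields if f in present]


def quality_score(records, context):
    """Assumed quality proxy: completeness of the available required fields."""
    fields = available_fields(records, context)
    if not records or not fields:
        return 0.0
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / (len(records) * len(fields))


def utility_score(records, context):
    """Assumed usefulness proxy: share of required fields the data covers."""
    if not context.required_fields:
        return 0.0
    return len(available_fields(records, context)) / len(context.required_fields)


def data_value(records, context):
    """Assumed mapping to a single value in [0, 1]: quality times utility."""
    return quality_score(records, context) * utility_score(records, context)


# Usage with made-up records and a made-up context.
context = ValuationContext(
    goal="regional marketing campaign",
    required_fields=("age", "city", "income", "email"),
)
records = [
    {"age": 34, "city": "Vienna"},
    {"age": None, "city": "Graz"},
]
print(data_value(records, context))  # 0.75 completeness * 0.5 coverage = 0.375
```

The point of the sketch is only the shape of the computation: the same records would score differently under a different context, which is why the context has to be described formally before quality and usefulness can be assessed.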

Pricing personal data: The gap between expectation and reality

Missing information about the value of data is one of the barriers to establishing pricing models for data-as-a-commodity. We reviewed a number of online platforms that are monetizing personal data (the analysis can be found in the paper) and were thus able to distill two additional challenges, resulting from the practices surrounding the collection, packaging, and monetizing of personal data.

Data brokers collect a wide range of personal data: identification, demographic, location, behavioral, online activity, psychological, product, and political preferences. Most of the time, this data is sold in bundles, which prompts several questions: are all these categories equally important to a buyer, are they equally sensitive for a seller, and how does each of these stakeholders value them?

A study by the telecom company Orange suggests that three factors influence the perceived value of personal data: (1) the usefulness of the data to the beneficiary organization, (2) the type of data, and (3) the risk associated with sharing it. For the third factor, the study refers only to the “perceived” risk associated with sharing personal data and does not go into detail about how such risk might be quantified.

Challenge 2. Develop methods to quantify the risk of sharing different types of data.

In a 2016 survey, a UK credit report company asked 1,000 UK consumers to estimate the economic value of different categories of personal data and compared these estimates with how much third-party companies would pay to acquire the same data for use in marketing campaigns. The results revealed interesting attitudes and different data sharing practices across demographic groups and types of data alike. The magnitude of the gap between the two valuations points to a third challenge of pricing personal data.

Challenge 3. Build digital literacy together with legal frameworks to bridge the gap in understanding (1) the permeability of our digital traces and (2) the ease with which data companies are able to collect and monetize them.