The Business of Personal Data Sales and Analytics: Governance
Source: S. Herman/F. Ritcher from Pixabay
Data fuels AI. The business of selling personal data can be overt or covert, and privacy is always at stake. Thanks to advances in technology, globalization, and lagging regulation, firms can gather and merge data from many different sources around the world. Data is increasingly considered a valuable resource and forms the backbone of many competitive business operations. There are various business models to capture, collect, store, move, retrieve, analyze, facilitate, network, secure, and broker data. As of 2021, there were 1,327 exabytes of data stored in data centers worldwide. This figure is growing as more people gain internet access, as we use more devices that capture data or leave a digital footprint, and as we spend more time online. When we look at data business models from the point of view of transparency and legitimacy, there are (for simplicity's sake) four quadrants a business can fall into.
Figure: The data model governance typology
Data is often collected for research, service delivery, or advertising. It is supposed to be anonymized and used to improve existing technologies and services, to customize offerings for customers, or to gather new insights. A company with a good data governance structure should invest considerably in anonymizing data and in state-of-the-art data protection. It would not “sell” data from which its individual clients or customers could be identified. Sadly, we know this is not always the case: companies we trust have often broken that trust as they look for new ways to boost their revenues. (How many times has a telemarketer contacted you out of the blue? How often does an ad for something you just searched for pop up on another website?)
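Anonymizing data takes more than deleting names. One common yardstick is k-anonymity: every combination of quasi-identifiers (fields such as ZIP code, birth year, and gender that can jointly single someone out) must be shared by at least k records. The minimal Python sketch below, assuming records stored as dictionaries, with hypothetical column names and illustrative generalization rules, shows why stripping names alone leaves k at 1 and how coarsening the quasi-identifiers raises it.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values (the dataset's k); 0 for an empty dataset."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def generalize(record):
    """Coarsen quasi-identifiers (illustrative rules only): keep the first
    three ZIP digits and round the birth year down to the decade."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    out["birth_year"] = record["birth_year"] // 10 * 10
    return out

# Hypothetical sample data.
records = [
    {"name": "Ann", "zip": "02139", "birth_year": 1985, "gender": "F"},
    {"name": "Bea", "zip": "02138", "birth_year": 1987, "gender": "F"},
    {"name": "Cal", "zip": "02141", "birth_year": 1972, "gender": "M"},
    {"name": "Dan", "zip": "02142", "birth_year": 1979, "gender": "M"},
]
QI = ["zip", "birth_year", "gender"]

# Dropping the name column is not enough: each person is still unique.
stripped = [{k: v for k, v in r.items() if k != "name"} for r in records]
print(k_anonymity(stripped, QI))  # 1

# After generalization, every quasi-identifier combination covers 2 records.
print(k_anonymity([generalize(r) for r in stripped], QI))  # 2
```

k-anonymity is only one of several techniques (others include differential privacy and data aggregation), and it can still leak information when everyone in a group shares the same sensitive value.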
As mentioned in this blog post, a new business of data brokering has emerged:
“Data brokers oil the wheels of surveillance capitalism too. Data broking is a US $200 billion, unregulated industry of legitimate third-party businesses that don’t hold the data like Facebook and Google do, but rather collect (buy) it from various sources (such as stores with loyalty card programs), correlate and package it, and on-sell it to companies for as little as $79 so they can tailor advertising to those from whom the data was taken in the first place.” For example, according to this article, Acxiom, part of the Interpublic Group (IPG), has 23,000 servers that collate 3,000 data points per person for 500 million consumers worldwide.
There are an estimated 4,000 data brokering companies operating globally. A growing challenge for privacy advocates is that the data gathered by these services may be collected and analyzed using various software platforms, stored on multiple devices, and handled by many people and organizations. For example, footage from a speed camera or a building's CCTV can be cross-referenced with another data source (a driver's license database or a public social media profile) to identify an individual at any time. Data can be collected in many ways: see these sources of face images used for research, or the example of NarxCare in healthcare.
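The cross-referencing described above is essentially a join on a shared field. In the minimal Python sketch below, a camera log contains no names and might look harmless on its own, but joining it to a second source keyed on the same licence plate attaches a name and address to every sighting. Both datasets and the `link` helper are hypothetical, made up for illustration.

```python
# Hypothetical camera log: no names, so it may look "anonymous" on its own.
camera_log = [
    {"plate": "ABC123", "time": "2023-05-01T08:14", "location": "Main St & 5th"},
    {"plate": "XYZ789", "time": "2023-05-01T08:20", "location": "Harbour Rd"},
]

# Hypothetical second source keyed on the same field (e.g. a vehicle registry).
registry = {
    "ABC123": {"owner": "J. Doe", "address": "12 Elm St"},
    "XYZ789": {"owner": "A. Smith", "address": "9 Oak Ave"},
}

def link(log, lookup, key):
    """Join each log entry to the second source on a shared key field,
    keeping only entries that find a match."""
    linked = []
    for entry in log:
        match = lookup.get(entry[key])
        if match is not None:
            linked.append({**entry, **match})
    return linked

for row in link(camera_log, registry, "plate"):
    print(f'{row["owner"]} ({row["address"]}) was at {row["location"]} at {row["time"]}')
```

Each extra dataset that shares a key (a plate number, an email address, a device ID) extends the chain, which is why data that is "anonymous" in isolation can still identify someone once it is combined with other sources.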
Data can also be collected covertly by third-party vendors, for example through the APIs (application programming interfaces) that web-based applications rely on. There are four types of API: public (open-source), partner, private, and composite. While APIs store no data themselves, they can be used to link or exchange data (using approaches such as REST and SOAP, which are often considered more secure than RPC), and this can create data security issues. Most APIs (69%) use REST, and in 2020, 22,000 APIs were open to the public.
Public: open-source, often with minimal authentication or moderation.
Partner: customized or developed for clients (for example, to help partners connect to a platform). Because they are owned and make a clear, if indirect, contribution to revenue, they often have stronger authentication, authorization, and security mechanisms.
Private: used only within an organization, and hence may not have high levels of security or authentication.
Composite: a combination of two or more APIs.
Take the example of Amazon Web Services, which has 400 different, discrete services in its product portfolio, each with its own API. In a microservices architecture, discrete services like these communicate with one another via APIs. When Elon Musk took over Twitter in 2022, he complained about microservice bloat, which reduces a system's efficiency and exposes it to security vulnerabilities.
A survey of 37,000 developers and API professionals found that 51% of respondents felt that more than half of their development effort was spent on APIs, and that 20% had experienced a security incident or breach at least once a month. In addition, a Gartner report stated, “many API breaches have one thing in common: the breached organization didn’t know about their unsecured API until it was too late. This is why the first step in API security is to discover the APIs which your organization is delivering, or which it consumes from third parties.”
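API discovery, the first step Gartner recommends, can start as simply as comparing the endpoints that actually receive traffic against the inventory an organization believes it runs. The minimal Python sketch below assumes a hypothetical access log in a simple "METHOD PATH STATUS" format and a hypothetical documented inventory. It normalizes numeric IDs in paths so that `/users/42` and `/users/7` count as one endpoint, then reports anything undocumented.

```python
import re

# Hypothetical inventory of APIs the organization knows about and has secured.
DOCUMENTED = {
    ("GET", "/api/v1/users/{id}"),
    ("POST", "/api/v1/orders"),
}

# Hypothetical access-log lines in a simple "METHOD PATH STATUS" format.
ACCESS_LOG = """\
GET /api/v1/users/42 200
POST /api/v1/orders 201
GET /api/v1/users/7 200
GET /api/v0/export/customers 200
DELETE /internal/debug/cache 204
"""

# A path segment made only of digits, e.g. the "/42" in "/users/42".
NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")

def normalize(path):
    """Replace numeric path segments with a placeholder."""
    return NUMERIC_SEGMENT.sub("/{id}", path)

def observed_endpoints(log_text):
    """Collect the distinct (method, normalized path) pairs seen in the log."""
    seen = set()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            method, path = parts[0], parts[1]
            seen.add((method, normalize(path)))
    return seen

def undocumented(log_text, inventory):
    """Endpoints that receive traffic but are missing from the inventory."""
    return sorted(observed_endpoints(log_text) - inventory)

for method, path in undocumented(ACCESS_LOG, DOCUMENTED):
    print("UNDOCUMENTED:", method, path)
```

In this made-up log, an old `v0` export endpoint and an internal debug endpoint are receiving traffic without appearing in the inventory: exactly the kind of unsecured API an organization might not know about "until it was too late". Real discovery tools draw on gateways, network traffic, and code repositories rather than a single log file.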
In other cases, agencies may try to monetize their databases (even without explicit permission) by changing their Terms of Service (TOS). This is harder to track from a privacy standpoint, because users rarely discontinue a service when its terms change, and the implications of such changes are often difficult for an individual to assess. Companies change their TOS very regularly, and this is a growing challenge.
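One way to make changes to terms of service less opaque is to compare successive versions line by line. The minimal Python sketch below uses the standard library's `difflib.ndiff` to list the clauses that were removed and added between two versions. Both policy texts are hypothetical, and real documents would need some clean-up (for example, splitting paragraphs into sentences) before a line-level comparison is meaningful.

```python
import difflib

# Hypothetical old and new versions of a privacy clause.
old_terms = """\
We collect your email address to provide the service.
We do not sell your personal data.
You may delete your account at any time.
"""

new_terms = """\
We collect your email address to provide the service.
We may share your personal data with selected partners.
You may delete your account at any time.
"""

def changed_clauses(old, new):
    """Return (removed, added) lines between two versions of a policy text.
    ndiff prefixes lines with '- ', '+ ', '  ' or '? ' (hint lines, ignored)."""
    removed, added = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added

removed, added = changed_clauses(old_terms, new_terms)
print("Removed:", removed)
print("Added:", added)
```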
Here are some examples of data consent forms. Which company do you think is most transparent: A, B, C, D, E, or none of them?
(A) I find this confusing for the user: for example, I am assuming that grey means NO and blue means YES.
(B) Here there are only two choices (from a Fortune 500 company).
(C) In this third example, I cannot turn off essential cookies, so I am assuming that grey means OFF.
(D) Another example where the default setting is ON, so I can most likely work out which colour means OFF (but I had to click the default toggle to see that it does not move).
(E) This was surprising: while I was thrilled to have more choices, there were some things I could not change. I am not on TikTok, so why would they need my data?
Of course, there is also an unfairness in that terms of service differ from place to place; this is obvious if you are based in the EU (thanks to the GDPR), the UK, or California, where users get stronger protections. Some countries also regulate the movement of local data across borders, and this may offer more protection for the individual.
Last but not least, data can leak via hardware. A recent study shows that VR headsets are not safe from a data privacy point of view: sensitive voice biometrics can be captured by built-in motion sensors such as the accelerometer and gyroscope, which are zero-permission sensors (apps can read them without asking the user for permission).
What is your company's data governance strategy? It cannot merely reflect compliance with regulations (that seems like an afterthought). Nor can it be only about the availability, usability, integrity, security, accountability, and stewardship of the data. You need to think about your business model and your methods for obtaining data insights. Ask yourself, as an investor, a board member, a manager, or an employee: are you comfortable with your own data being used the same way? If not, why would you assume your customers would agree? Which quadrant do you fall into in the data model governance typology (above)? What type of consent forms do you, your suppliers, and your third-party providers use to collect data?