Big Data Governance - Data, Algorithms and Auditors

In a previous post I introduced the topic of big data governance and its three aspects: the data, the algorithms and the data auditors. I used accountancy as a metaphor: just as accountancy provides checks and balances within organizations, data governance should ensure that the data and the algorithms are correct. Once data and algorithms appear on the balance sheet as assets, which is just a matter of time, organizations had better be sure that both are correct, and that is where the data auditors come in.

The data

Organizations that collect, store and analyze data are responsible for ensuring that the data consumers provide through different applications is correct and up to date. Instead of placing that liability on the consumer, organizations should take responsibility and guarantee that the data they use is correct as well as reliable. In addition, organizations should make sure that their customers understand what data is collected, when, and for what purpose. They should tell users how the data will be used in the first instance, and also inform them of any secondary usage as soon as it becomes clear.

In a previous post I discussed that this should be done via clear and understandable privacy policies and terms and conditions, which can also be understood by the digital immigrant. Users should also be kept up to date via email when the usage of their data changes over time. Making these documents difficult to understand and/or changing them frequently does not fit an information-centric organization that wants to make the most of big data. Organizations should also make it easy for customers to amend their privacy settings and to delete or edit their data whenever they see fit. Organizations should not be allowed to push the responsibility onto the users.

Taking responsibility for how the data is used is only one aspect of data governance. Another important issue is that organizations, as well as governments, should do everything they can to guarantee that the data they collect is correct. When they fail to do so, things can go wrong, as shown by the example of American Senator Edward Kennedy, who was repeatedly stopped at airports in 2004 because his name somehow appeared on a terrorist watch list. Quite often users do not know what information is collected, and they do not have access to their own data to adjust it once it is stored in the cloud. Users should be able to correct their data if they find out that it is incorrect or has been misread. This principle is incorporated into the Fair Credit Reporting Act (FCRA), which compels credit-reporting agencies to offer consumers access to their credit reports so they can have inaccuracies corrected. But it applies only to credit-reporting agencies, and the law was originally passed in 1970, long before other industries started collecting massive amounts of data. Nowadays, the general public has no clue which entities are collecting what information and what they are doing, or will be doing, with it. Of course, giving consumers the ability to adjust incorrect data requires security measures to prevent misuse.
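
To make the idea of a secured correction right a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical `CustomerRecord` type and an `apply_correction` function; neither comes from an existing system, and the identity check is reduced to a simple flag. The point is only that a correction is refused without verification and that every change leaves a trace an auditor could inspect later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CustomerRecord:
    """Hypothetical customer record with an audit trail of corrections."""
    customer_id: str
    data: dict
    history: list = field(default_factory=list)


def apply_correction(record, field_name, new_value, identity_verified):
    """Apply a consumer's correction, but only after identity verification."""
    if not identity_verified:
        raise PermissionError("identity must be verified before data is changed")
    if field_name not in record.data:
        raise KeyError(f"unknown field: {field_name}")
    # Keep the old value so an auditor can reconstruct what changed and when.
    record.history.append({
        "field": field_name,
        "old": record.data[field_name],
        "new": new_value,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    record.data[field_name] = new_value
    return record


# Illustrative data only: a misspelled city that the consumer corrects.
record = CustomerRecord("c-001", {"surname": "Jansen", "city": "Amsterdm"})
apply_correction(record, "city", "Amsterdam", identity_verified=True)
print(record.data["city"], len(record.history))
```

In practice the verification step would be far more involved than a boolean, but the separation between "who may change data" and "what was changed" is the part that matters for governance.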


Finally, organizations that collect and store data should take the necessary security measures to ensure that the data is stored securely and cannot be hacked or stolen by criminals. Banks do everything they can to protect the money they have received from consumers and organizations, and they indemnify those consumers and organizations when the bank is robbed. In the same way, organizations should protect the data they have collected and indemnify users when their data is hacked.

The algorithms


Algorithms are capable of incredible analyses and can turn vast amounts of raw data into information. The first step is to certify that the data used by the algorithms is correct, as we just discussed. The second step is to ensure that the algorithm that turns that data into the information used by information-centric organizations in their decision-making is correct. How do managers, but also consumers, know that an algorithm works correctly? How do they know that green is really green? In other words, if an algorithm decides whether data can be used (green) or cannot be used (red), does it make the right decision, so that the data marked 'green' can really be used? If major business decisions are based on such an algorithm and the algorithm turns out to be making the wrong decision, the consequences can be major for the organization as well as for consumers. Consumers who apply for a loan have to trust that organizations using algorithms to determine their risk profile establish it correctly, so that they are not refused a loan incorrectly and do not have to pay more than strictly necessary.
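
One way to picture what "checking that green is really green" could involve is to compare the algorithm's decisions with a sample that human reviewers have labelled independently. The Python sketch below does exactly that. The `classify` function, the consent rule and the sample records are all made up for illustration; a real audit would need a much larger and properly drawn sample.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    total: int
    false_green: int  # algorithm said usable, reviewer said not usable
    false_red: int    # algorithm said not usable, reviewer said usable

    @property
    def agreement(self) -> float:
        """Share of records where algorithm and reviewer agree."""
        return (self.total - self.false_green - self.false_red) / self.total


def audit_classifier(classify, labelled_sample):
    """Compare the algorithm's green/red decisions with reviewer labels."""
    if not labelled_sample:
        raise ValueError("the audit sample must not be empty")
    false_green = false_red = 0
    for record, expected in labelled_sample:
        decision = classify(record)
        if decision == "green" and expected == "red":
            false_green += 1
        elif decision == "red" and expected == "green":
            false_red += 1
    return AuditResult(len(labelled_sample), false_green, false_red)


# Hypothetical algorithm under audit: data is usable whenever consent is given.
def classify(record):
    return "green" if record.get("consent") else "red"


# Hypothetical reviewer-labelled sample.
sample = [
    ({"consent": True, "age": 34}, "green"),
    ({"consent": False, "age": 51}, "red"),
    ({"consent": True, "age": 15}, "red"),  # reviewer: a minor's data is not usable
    ({"consent": True, "age": 42}, "green"),
]

result = audit_classifier(classify, sample)
print(result.false_green, result.false_red, result.agreement)  # 1 0 0.75
```

The two error types are counted separately on purpose: data wrongly marked green is the riskier mistake, because it is the one that ends up being used.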

Perhaps big data startups that have created algorithms should receive a quality label certifying that their algorithm works correctly and appropriately, serving the purposes it was designed for. Organizations that use an algorithm developed by a big data startup or any other big data technology vendor that holds such a quality label can be trusted more and are more likely to receive a positive assessment from the big data auditors.

Organizations that develop their algorithms in-house should also have those algorithms checked for correctness and for compliance with (local) regulations. The authors of the book "Big Data", Viktor Mayer-Schönberger and Kenneth Cukier, therefore propose the rise of "Algorithmists", who are capable of and allowed to check any algorithm that is created by organizations. These "Algorithmists" have to be very experienced with the different big data techniques that are available in the market and should specialize in different sectors to be able to read and assess the algorithms that are developed. As algorithms are highly confidential company information, these "Algorithmists" should be bound by confidentiality regulations.


Organizations that have had their algorithms checked and approved give customers more confidence that they can be trusted and that customer data is used and analyzed correctly.

The data auditors



The data auditors are those who audit the organizations. They can be external, just like the big four auditing firms, or internal, just as organizations have internal accountants who perform the required checks and balances. Data auditors should have three tasks:


  1. Confirm the correctness of the data and that the data is secured correctly;
  2. Confirm the accuracy of the algorithms performing the analyses on the data;
  3. Confirm that the organization sticks to the four ethical guidelines, including giving consumers the ability to easily understand what data is stored as well as giving them the ability to edit or delete the data.



There should be different levels of examination that the data auditors can perform. Organizations that deal with highly private personal information such as health records or financial data should undergo the strictest assessment, while organizations that use data for innocent mobile applications can be held to less strict regulations. What these assessments or regulations should look like exactly will vary per country, but in the end a global set of data accounting standards should be developed, similar to the International Financial Reporting Standards (IFRS) or the Generally Accepted Accounting Principles (GAAP).
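
As a rough illustration of how such tiered examination could be expressed, the Python sketch below maps the categories of data an organization holds to a required audit level. The level names, the category names and the mapping itself are assumptions made for this example; no existing standard defines them.

```python
from enum import IntEnum


class AuditLevel(IntEnum):
    """Hypothetical examination levels, from lightest to strictest."""
    BASIC = 1
    STANDARD = 2
    STRICT = 3


# Hypothetical mapping from data category to the level it requires.
CATEGORY_LEVEL = {
    "health": AuditLevel.STRICT,
    "financial": AuditLevel.STRICT,
    "contact": AuditLevel.STANDARD,
    "app_usage": AuditLevel.BASIC,
}


def required_audit_level(categories):
    """Return the strictest level demanded by any category the organization holds.

    Categories missing from the mapping are treated as STRICT, so that
    unclassified data is never examined too lightly.
    """
    levels = [CATEGORY_LEVEL.get(c, AuditLevel.STRICT) for c in categories]
    return max(levels, default=AuditLevel.BASIC)


print(required_audit_level(["app_usage"]).name)            # BASIC
print(required_audit_level(["app_usage", "health"]).name)  # STRICT
```

Taking the maximum over all categories reflects the idea in the text: one sensitive data set is enough to put the whole organization under the strictest assessment.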


As this is a new part of big data, what do you think about big data governance? Join the discussion in the comments below or via the social networks.

Image courtesy of jscreationzs / FreeDigitalPhotos.net

Image Credit: SWEviL/Shutterstock