In Information Systems design and theory, as instantiated at the Enterprise Level, Single Source Of Truth (SSOT) refers to the practice of structuring information models and associated schemata, such that every data element is stored exactly once (e.g. in no more than a single row of a single table).
Any possible linkages to this data element (possibly in other areas of the relational schema or even in distant federated databases) are by reference only. Thus, when any such data element is updated, this update propagates to the enterprise at large, without the possibility of a duplicate value somewhere in the distant enterprise not being updated (because there would be no duplicate values that needed updating).
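To make the "stored once, linked by reference" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`customer`, `sales_order`), the column names, and the sample addresses are assumptions made up for illustration; they do not come from any real system. The shipping address lives in exactly one row, and orders hold only a reference to it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database where the address is stored exactly once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id      INTEGER PRIMARY KEY,
            shipping_address TEXT NOT NULL
        );
        -- Orders do not copy the address; they only point at the customer row.
        CREATE TABLE sales_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
        );
        INSERT INTO customer VALUES (1, '12 Old Road');
        INSERT INTO sales_order VALUES (100, 1), (101, 1);
        """
    )
    return conn


def order_addresses(conn):
    """Resolve each order's address through the reference, never from a copy."""
    return conn.execute(
        """
        SELECT o.order_id, c.shipping_address
        FROM sales_order AS o
        JOIN customer AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()


conn = build_demo_db()
# One update, in one place...
conn.execute(
    "UPDATE customer SET shipping_address = ? WHERE customer_id = ?",
    ("99 New Street", 1),
)
# ...and every order sees the new value, because no duplicate exists.
print(order_addresses(conn))
```

Because the orders never held their own copy of the address, there is nothing left behind to go stale after the update.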
For someone looking on from the outside (someone who has never seen multiple versions or sources of the truth in an organization), this may not seem all that important or relevant. Let me tell you, this is extremely important, and “Single Source of the Truth” should be the standard for any organization, big or small. In smaller companies, single-source data can still be a real challenge, but it is much easier to fix there than in larger organizations.
What does single source of the truth really mean?
It’s actually quite simple. Wherever you get your data, there is one and only one place to get that exact data from. Let’s say you have a set of finance data that has a data element of “RevenueUSD”. When you need that data (with that specific data element in it), you go to one place. The opposite situation could look like this: John goes to the finance department and asks, “Can you send me the weekly finance report?” The finance department promptly fulfills John’s request and sends him the weekly finance report. Then John needs a report that shows sales vs. revenue, so he goes to the Sales department and asks for that report. The Sales department also fulfills the request.
Now John opens both reports and immediately sees that the Sales report shows revenue for “Product A” in 2011 at $1.7 million, while the weekly finance report shows $2.5 million for the same product and the same year. A difference of $800,000! The problem: multiple sources of the same data. There are two different locations and processes, where data managers chose two different ways to load, manipulate, and report on the data.
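John's comparison can be sketched in a few lines of Python. This is only an illustration: the function name `find_discrepancies` and the dictionary layout (keyed by product and year) are assumptions, and the only real inputs are the two figures from the example above.

```python
def find_discrepancies(report_a, report_b):
    """Compare two reports keyed by (product, year).

    Returns {key: (value_a, value_b, absolute_difference)} for every key
    that appears in both reports with different values.
    """
    diffs = {}
    for key in report_a.keys() & report_b.keys():
        a, b = report_a[key], report_b[key]
        if a != b:
            diffs[key] = (a, b, abs(a - b))
    return diffs


# Figures from the example: revenue in whole US dollars.
sales_report = {("Product A", 2011): 1_700_000}
finance_report = {("Product A", 2011): 2_500_000}

print(find_discrepancies(sales_report, finance_report))
# {('Product A', 2011): (1700000, 2500000, 800000)}
```

A check like this only tells you that the two sources disagree. It cannot tell you which one is right, and that is exactly why a single source matters.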
How do you fix it?
Short Answer: Good luck Chuck and Nice Tryin Brian.
Long Answer: The “ideal” implementation of SSOT as described above is rarely possible in most enterprises. This is because most organizations have multiple information systems, each of which needs access to data relating to the same entities (e.g. customer). Usually these systems are purchased “off the shelf” from vendors and cannot be modified in non-trivial ways. Each of these systems therefore needs to store its own version of common data or entities, and so must retain its own copy of a record (immediately violating the SSOT approach defined above).
For example, an ERP (Enterprise Resource Planning) system (such as SAP or Oracle e-Business Suite) may store a customer record; the CRM (Customer Relationship Management) system also needs a copy of the customer record (or part of it) and the warehouse dispatch system may also need a copy of some or all of the customer data (e.g. shipping address). It may not be possible to replace these records with pointers to the SSOT as the vendor(s) may not support such modifications.
For organizations (with more than one information system) wishing to implement a Single Source of Truth (without modifying all but one master system to store pointers to other systems for all entities), three supporting technologies are commonly used. These are an Enterprise Service Bus, Master Data Management (or MDM), and a Data Warehouse. (Source: Wikipedia)
Stay tuned, as we will cover both the Enterprise Service Bus and Master Data Management in more depth shortly.
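As a very loose illustration of the idea behind keeping vendor systems in sync from one master record, here is a toy sketch in Python. It is not how any real Enterprise Service Bus or MDM product works; the class `MasterCustomerHub`, its methods, and the plain dictionaries standing in for the ERP, CRM, and warehouse systems are all hypothetical. The point is only that each system keeps its own copy, but changes are made in one place and pushed outward.

```python
class MasterCustomerHub:
    """Toy master record store that pushes every change to registered copies."""

    def __init__(self):
        self._master = {}        # the one authoritative record per customer
        self._subscribers = []   # local copies held by other systems

    def register(self, system_copy):
        """Register a system's local copy (a dict here) to receive updates."""
        self._subscribers.append(system_copy)

    def update(self, customer_id, shipping_address):
        """Change the master record, then propagate it to every copy."""
        self._master[customer_id] = shipping_address
        for copy in self._subscribers:
            copy[customer_id] = shipping_address


# Hypothetical stand-ins for three off-the-shelf systems.
erp_copy, crm_copy, warehouse_copy = {}, {}, {}

hub = MasterCustomerHub()
for system_copy in (erp_copy, crm_copy, warehouse_copy):
    hub.register(system_copy)

hub.update(42, "99 New Street")
print(erp_copy == crm_copy == warehouse_copy)
```

The copies still exist, so this is not the "ideal" SSOT described earlier, but nobody edits them directly, which keeps them from drifting apart.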
Joshua is working to become a Data Scientist with a focus on Analytics, Big Data, Machine Learning, and Statistics. His passion for Data and Information is second to none. He is a certified IBM Cognos Expert with more than 10 years of experience in Business Intelligence & Data Warehousing, Analytics, IT Management, Software Engineering, and Supply Chain Performance Management with Fortune 500 companies. He has specializations in Analytics, Mobile Reporting, Performance Management, and Business Analysis.