Quality, Structure & Freshness

Data Quality

Data Quality is a subjective measure of the value of information based on the degree to which it possesses the following three characteristics:

  • Accuracy – Data is accurate if it is free from error or defect. Accurate data is a precise and exact representation of events
  • Relevance – Data is relevant if it has bearing upon or is related to the question, issue, concept, operation or strategy at hand. Relevant data is pertinent and has a material relationship to a topic of discussion, decision to be made or strategy to be defined and executed
  • Timeliness – Data is timely if it is delivered at such a point that it can be taken into consideration during the course of discussion, evaluation or execution. Timely data is delivered to the appropriate personnel such that it can be used to effect change and/or impact decision making

For data to be judged “High Quality” it must possess a high degree of all three of the aforementioned characteristics. Data that is completely lacking in any one of these qualities is considered to be of low quality. Organizations must define acceptable levels of data quality and should have fail-over procedures in place in case requisite data does not meet minimum levels of acceptability.

Given that Data is the basis and building block of all Information Assets within the Information Delivery and Business Intelligence system, Data Quality is critically important and demands routine evaluation and management.
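The "high degree of all three characteristics" rule above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed implementation; the 0–10 scales and the minimum threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class QualityScore:
    """Illustrative 0-10 ratings of the three Data Quality characteristics."""
    accuracy: int    # freedom from error or defect
    relevance: int   # bearing on the question, issue or strategy at hand
    timeliness: int  # delivered in time to impact decision making

def is_high_quality(score: QualityScore, minimum: int = 7) -> bool:
    # High quality requires a high degree of ALL three characteristics;
    # data completely lacking any one of them is low quality, so we gate
    # on the weakest of the three ratings.
    return min(score.accuracy, score.relevance, score.timeliness) >= minimum

print(is_high_quality(QualityScore(9, 8, 8)))    # all three high -> True
print(is_high_quality(QualityScore(10, 10, 2)))  # timeliness lacking -> False
```

Because the gate is taken on the minimum rating, excellent accuracy cannot compensate for stale or irrelevant data, which mirrors the rule stated above.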

Data Storage Structure & Strategy

Data Warehouses, Stores and Marts

Security, system processing speed and ease of use are all significant drivers in the configuration of any data storage design strategy.

The most common approach to defining the data storage structure is to create:

  • Warehouses that correspond to the different functions of the Enterprise
  • Stores that correspond to the different roles within each function
  • Marts that support the most complex and technologically demanding information assets required within each role
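The function → Warehouse, role → Store, deliverable → Mart mapping above can be sketched as a simple nested structure. The function, role and mart names here are hypothetical examples, not part of any prescribed design.

```python
# Hypothetical sketch of the common storage hierarchy: Warehouses per
# Enterprise function, Stores per role, Marts per complex deliverable.
storage_structure = {
    "Finance": {                       # Warehouse per Enterprise function
        "Controller": [                # Store per role within the function
            "cash_flow_forecast_mart"  # Mart per complex deliverable
        ],
        "Accounts Payable": [],        # simple reports pull straight from
    },                                 # the warehouse, so no mart is needed
    "Sales": {
        "Regional Manager": ["pipeline_trend_mart"],
    },
}

def marts_for(function: str, role: str) -> list[str]:
    """Look up the cached marts supporting a given role, if any."""
    return storage_structure.get(function, {}).get(role, [])

print(marts_for("Sales", "Regional Manager"))  # ['pipeline_trend_mart']
```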

As the Enterprise considers the data storage structure it will use to support the Information Delivery and Business Intelligence system, it is important to keep the following in mind:

  • Multiple warehouses do not necessarily mean multiple servers. It is common for entities to store multiple warehouses on a single server and use the security / user profile configuration options within the server appliance itself to keep users out of warehouses where they don’t belong
  • Not all delivery assets require their own Mart. For example, a simple operations report with basic calculations can be quickly pulled from the physical databases within the Warehouse. Where multiple queries, joins and algorithms are required to produce an information asset, however, it is often advisable to cache a mart of data for that particular deliverable.
  • While warehouses are always physical technological assets, Stores and Marts may be temporary assets consisting of sub-sets of data extracted from the warehouse and cached in memory only long enough to support the user and/or deliverable for which they were intended.
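A temporary, in-memory Mart of the kind described above can be illustrated with an in-memory SQLite database standing in for the warehouse. The table, columns and query are hypothetical; the point is that the multi-step query is materialized once and the cached result serves the deliverable until the connection is discarded.

```python
import sqlite3

# Stand-in for the physical warehouse: an in-memory database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)",
                      [("East", 100.0), ("West", 250.0), ("East", 50.0)])

# "Cache a mart": run the expensive aggregation once and hold the result
# only as long as the deliverable it supports is being produced.
mart = list(warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"))
print(mart)  # [('East', 150.0), ('West', 250.0)]

warehouse.close()  # the temporary mart disappears with the connection
```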

The most important issues to keep in mind when defining the data storage structure and strategy are how and how often data elements will be utilized, which data elements are going to be used routinely by power users, which deliverables will be the most processor-intensive, the security requirements of the Enterprise and, ultimately, the capabilities of the hardware appliance that is to be used to host the warehouse(s).

Data Updates & Freshness

Because Accuracy is a measure of data quality, the frequency with which the sources of data feeding the Information Delivery and Business Intelligence system are updated is a critical issue to be addressed during the design of the system.

As noted previously herein, accurate data is a precise and exact representation of events. It is the precision component that gives an Enterprise the latitude to define “acceptable” levels of accuracy. It is also important to note that not all data is held to the same standard. For example, Balance Sheet data such as the depreciated value of capital assets may be considered accurate if it has been updated within the last 90 days, while Income Statement data such as gross sales to date may be considered accurate only if it has been updated within the last 24 hours. Similarly, a stock trader may require data that is updated every 3 seconds, while a bank officer may not need updates any more often than once every 30 days.

Ultimately, individual power users will often dictate the level of “freshness” required for data to be accurate enough to be judged high quality.
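The per-domain standards discussed above can be expressed as a small freshness check. The domain names and limits below simply echo the Balance Sheet / Income Statement examples; they are illustrative assumptions, not recommended thresholds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness limits per data domain, echoing the examples above.
FRESHNESS_LIMITS = {
    "balance_sheet": timedelta(days=90),    # e.g. depreciated asset values
    "income_statement": timedelta(hours=24),  # e.g. gross sales to date
}

def is_fresh(domain: str, last_updated: datetime, now: datetime) -> bool:
    """Data is accurate enough only if updated within its domain's limit."""
    return now - last_updated <= FRESHNESS_LIMITS[domain]

now = datetime(2024, 1, 31, tzinfo=timezone.utc)
updated = datetime(2024, 1, 1, tzinfo=timezone.utc)   # 30 days ago

print(is_fresh("balance_sheet", updated, now))     # True  (within 90 days)
print(is_fresh("income_statement", updated, now))  # False (older than 24h)
```

In practice the limits table would be negotiated with the power users who consume each domain, since they dictate the required freshness.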

Data Freshness

There is an inverse relationship between the level of detail in an Information Asset and the frequency at which the underlying data supporting that content must be refreshed / updated. The table below demonstrates this relationship.

Appendix A – Glossary

Term – Definition
– A –
Accuracy – A measure of Data Quality. An objective evaluation of the level of defect or degree of error. Precision and Accuracy are synonyms when discussing Data Quality
Agile Approach – A project management methodology based on constant interaction between the project team and user community, where progress is reviewed and priorities updated constantly based on changing project forces and user demand
Alert – High-level message from the Information Delivery and Business Intelligence system that a predicted future state, based on current operations, may not be within acceptable thresholds
Algorithms – Collection of specifically defined rules, steps and processes, to be executed in a specified cadence in order to solve a problem
Asset – Any end-user facing output of an Information Delivery and Business Intelligence system. Assets consist of Reports, Scorecards, Dashboards, Alerts and Warnings
– B –
Business Intelligence – “BI”. The third level of maturity in the Data Maturity Model. Business Intelligence is the comparative result of information to internal and external metrics
– C –
Calculations – Arithmetic operations applied to numeric Data in order to convert it into Information
Concatenation – Method of textual data manipulation where multiple strings and/or sub-strings from multiple alpha-numeric data fields are combined to create a new text string. Concatenation is the opposite of Parsing.
Current State – Systems, operations, processes and work flows as defined and in force at the current time
– D –
Dashboard – Graphic, dynamic representation of current operations, updated continuously
Data – A specific characteristic and/or result of an action, work flow, process or operation
Data Architecture – The underlying structural composition of a data storage unit. Data Architecture documentation represents the “blue print” used to build and support data structures including databases, data warehouses, data stores and data marts. A Data Architecture defines the characteristics and requirements of all data elements within a data storage unit.
Data Freshness – A measure of how recently data has been updated with regard to the most recent operations cycle. The more time that elapses between updates, the less Fresh the data is considered and the lower the overall data quality measure, as less Fresh data is considered less accurate
Data Mart – A collection of data elements grouped according to a common reporting asset
Data Profile – Specific details regarding the characteristics (Type, Format, Picture, Default Value) of each field within a database Architecture
Data Steward – Individual empowered to enforce and oversee the administration of a Governance Model
Data Store – A collection of data elements grouped according to a common job function
Data Structure – The definition and configuration of, and relationship between, Data Warehouses, Data Stores and Data Marts
Data Update – A revision of existing data and addition of new data to a data storage unit based on all operations and work flows that have been executed since the last Data Update
Data Warehouse – A collection of data elements grouped according to function
Deliverable – See “Asset”
Delivery – Movement of an Asset from the source system to the end user
Downstream Flow – Flow of data from its native source system to a central storage facility (often a Data Warehouse) and the flow of assets from the central storage facility to end users. Data that goes from its native source to a warehouse is said to flow Downstream.
– E –
Effectiveness – A measure of one’s ability to complete a work stream, operation, process or function such that the end result is the anticipated / expected current state and/or Asset. An operation can be Effective irrespective of whether or not it is Efficient.
Efficiency – A measure of the human and financial costs required for a work stream, operation, process or function to be Effective. An operation may be Effective without being Efficient, but it cannot be Efficient without being Effective.
End-User – Consumer of assets generated by and delivered from the Information Delivery and Business Intelligence System.
End State – Systems, operations, processes and work flows projected to be in force as of the completion of a specified project or work stream.
ETL – Extraction, Transformation and Loading of data as it flows Downstream from its native source to a centralized data storage facility
Event – A state or occurrence inside the Enterprise that has significance within the Information Delivery and Business Intelligence System. Also see “Trigger”
– F –
Firewall – A system of related security measures and access devices within a technological environment, in place to control and monitor access to data elements
Format – Physical presentation design of assets; placement and presentation details of Reports, Scorecards and Dashboards. Also see “Layout”.
Future State – Systems, operations, processes and work flows projected to be in force as of a point in time that has not yet occurred
– G –
Gated Approach – A project management methodology based on interim interactions between the project team and user community. The occurrence of these interactions is tied to the accomplishment of certain goals or the completion of some phase of design of an end-state asset
Governance – A system of security measures, data quality measures, data management processes, user password protocols and technology standards administered and overseen by Data Stewards
– I –
Information – Numeric, Date and Time Data elements that have been manipulated according to arithmetic and/or statistical operations; Textual Data elements that have been manipulated according to alphanumeric manipulations including Concatenation and Parsing
Information Asset – See “Asset”
– L –
Layout – The physical configuration and design of an Information Asset. Also see “Format”
– M –

Metadata – A component of the Information Delivery and Business Intelligence system that translates user requests for information and BI into executable queries. The Metadata component translates end-user speak into computer-speak
Metric – A unit of measure defined by an Enterprise as an indication of process, operations or work stream results and/or impact
Mining – Collecting data elements from the results of operations, procedures or work streams
– P –
Parameters – Boundaries set to control the Data, Information and Business Intelligence content presented in Reporting Assets. Parameters may be numeric, textual or date/time in format and are often provided in pairs, with one upper and one lower boundary. Exceptions include single parameters provided when a standard deviation is allowed as a qualification for content inclusion
Parsing – Separating a single textual string into multiple textual strings. Parsing is the opposite of Concatenation.
Portal – A gateway through which end users can access both the Presentation Layer and certain Information Assets. Intranet Portals are contained within a private internal network, while Extranet and Internet Portals are available for use by authorized users outside the Enterprise.
Pull Delivery – Method of presenting Information Assets according to a specific user request
Push Delivery – Method of presenting Information Assets automatically to end users irrespective of a specific request
– Q –
Query – Commands and parameters passed to a data source resulting in the extraction of a sub-set of data that is valid given the limits of acceptability defined by the parameters
– R –
Relevance – A measure of Data Quality. A subjective evaluation of the degree of impact Data, Information or Business Intelligence can have at a point in time
Report – A type of Information Asset. Detail Reports contain specified data elements for all records that fall within the defined selection range. Summary Reports contain statistically and arithmetically calculated Information that presents a macro view of the content of a related Detail Report. Exception Reports display specific data elements for records that fall outside the defined selection range.
– S –
Scorecard – A type of Information Asset. A static, numeric and textual summarized representation of the results of actions, work streams, operations or processes executed during a previous time period.
Security – Protection of the Intellectual Assets that are Data, Information and Business Intelligence, and of the calculations, methods and processes by which they are created.
Steward – See “Data Steward”
Strategy – Collection of actions, work streams and processes designed to accomplish a defined end state
Subject Matter Expert – Individual with a deep level of understanding and extensive experience in a specific area of business operations or function
Subscription Delivery – Method of presenting Information Assets to end users where the asset is automatically Pushed to the end user based on the specific delivery details included in a Subscription. A Subscription is a user request for information that is issued once by the end user and then retained within the system and repeated according to a prescribed cadence
– T –
Timeliness – A measure of Data Quality. A subjective comparative evaluation between when an Information Asset is received by an end user and when that Asset may impact the end user’s actions, decisions and/or work streams.
Trigger – A state that, once achieved, will cause an action or series of events to occur. Please also see “Event”
– U –
Upstream Flow – Flow of data from end users to a central data storage facility. Data that is manually entered into the system is said to flow Upstream.
– W –
Warning – High-level message from the Information Delivery and Business Intelligence system that the current state is not within acceptable thresholds