While the media debates whether data is the new oil, one thing is clear: like oil, data needs a lot of processing. From Facebook to growing startups, any successful organization that handles a growing volume of data must be able to organize, access, secure, and process that data to convert it into insights and decisions.
There are many tools and vendors to consider, particularly in terms of the needs of the business and the task at hand. However, regardless of the task, the goal is to ultimately find a way to make data as useful as possible while minimizing cost, risk, and resource consumption.
Data Management Software
Data management is a broad discipline with many different focus areas and tools to address them. Some vendors and software packages combine multiple functionalities and can eliminate the need for a dedicated tool. If you’re in search of a bit more background on data management, be sure to check out our blog post on the topic.
We can structure data management software around these topics:
- First, data architecture and data model design software enable companies to model their data structures
- These data structures are created in databases provided by database management companies
- An especially important topic is managing master and reference data
- Documents and other unstructured content pose challenges, especially for traditional databases, and various solutions facilitate their collection and analysis
- Metadata about all this data is valuable: even the simplest metadata fields, such as update and creation times, allow companies to identify issues in their data and analyze the data creation and update process
- Once data federation (collection) begins, data quality needs to be monitored and there are numerous solutions to measure and increase data quality
- Finally, numerous solutions of differing complexity enable companies to analyze this data
Data Architecture, Analysis, and Design
Data architecture is the set of models, policies, and rules that govern which data is collected, how it is stored, and how it is used. It is further split into enterprise architecture and solution architecture.
Data analysis is the process of inspecting, cleansing, transforming, and modeling data in order to find useful information. Data analysis also includes data mining, statistical applications (descriptive statistics, exploratory data analysis), and a wide range of techniques for analyzing statistical data, such as hypothesis testing or regression analysis.
Data modeling defines and analyzes the data requirements necessary for business processes within information systems. Three different types of data models are produced, progressing from the conceptual model, to the logical data model, and finally to the physical data model.
All of these categories help to organize and map data, improving its reliability and transparency within an organization.
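To make the data analysis step concrete, here is a minimal sketch of descriptive statistics in Python using only the standard library (the monthly revenue figures are invented sample values):

```python
import statistics

# Hypothetical monthly revenue figures (sample data for illustration)
revenue = [12_500, 13_100, 11_800, 14_200, 13_750, 12_900]

# Descriptive statistics: a typical first step in exploratory data analysis
mean = statistics.mean(revenue)
median = statistics.median(revenue)
stdev = statistics.stdev(revenue)

print(f"mean={mean:.1f} median={median:.1f} stdev={stdev:.1f}")
```

In practice, the same summaries would be computed over database columns by an analysis tool, but the idea is identical: reduce raw values to statistics that reveal the shape of the data.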
Some useful tools related to these products include:

- Database management to reduce redundancy

| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| Teradata | 1979 | Public | Big Data architecture that can be built from multiple data platforms |
| Looker | 2011 | Private | Data analysis without SQL |
| Tableau | 2003 | Public | Rapid ad hoc analysis without programming; automatic updates or live connection |
Database management has a variety of objectives ranging from performance, to storage, to security and more. Tools aim to control data throughout its entire lifecycle, leading to better business intelligence and better decision making.
Some general tasks that the right database management software should handle include:
- Application tuning
- Response time testing
- Throughput testing
- Performance management
It is important to keep in mind the difference between DBMS and RDBMS. DBMS is a general term for the different types of database management technologies developed over the last 50 years. In the 1970s, the relational database management system (RDBMS) was born and quickly became the dominant technology in the field. The most important feature of an RDBMS is its row-based table structure that can connect related data elements, which is achieved via database normalization. Since the 2000s, non-relational or NoSQL databases such as MongoDB have gained popularity, but relational databases remain important for storing structured data.
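The row-based, normalized structure described above can be illustrated with a small in-memory SQLite example (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalization: customer details live in exactly one table;
# orders reference them by key instead of repeating them.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL)""")

cur.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 99.0), (2, 1, 45.5)])

# A join reconnects the related data elements at query time
rows = cur.execute("""SELECT c.name, SUM(o.amount)
                      FROM orders o JOIN customers c ON o.customer_id = c.id
                      GROUP BY c.name""").fetchall()
print(rows)  # [('Acme Corp', 144.5)]
```

If the customer's name changes, it is updated in one place only; that is the redundancy reduction normalization buys.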
Some vendors that work within this discipline include:

| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| Oracle Enterprise Manager | 1977 | Public | Self-management capabilities built into the database kernel; for Linux, Windows, Solaris, IBM AIX, HP-UX |
| IBM DB2 | 1983 | Public | For Linux, Unix, and Windows |
| MongoDB | 2007 | Public | Works with AWS, Azure, and Google Cloud; several versions: Enterprise Advanced, Stitch, Atlas, Cloud Manager |
Reference and Master Data Management
Reference data is a subset of master data that can be used for classification throughout an organization. Some of the most common examples include postal codes, currency codes, and other classifications – but reference data can also be values ‘agreed upon’ within an organization. Managing this type of data is important, as it often serves as a reference for a number of systems.
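Because reference data serves as the lookup standard for other systems, a common task is validating incoming values against it. A minimal sketch (the codes shown are illustrative ISO 4217 currency codes; the records are made up):

```python
# Reference data: an agreed-upon set of valid currency codes (ISO 4217 examples)
CURRENCY_CODES = {"USD", "EUR", "GBP", "JPY"}

def validate(records):
    """Split records into valid and invalid based on the reference set."""
    valid, invalid = [], []
    for r in records:
        (valid if r["currency"] in CURRENCY_CODES else invalid).append(r)
    return valid, invalid

valid, invalid = validate([{"id": 1, "currency": "USD"},
                           {"id": 2, "currency": "US$"}])
print(len(valid), len(invalid))  # 1 1
```

A dedicated reference data management tool does the same job at scale, with versioning, stewardship workflows, and distribution of the agreed code sets to downstream systems.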
There are a number of tools available to assist with reference data management; here are a few:

| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| ASG metaRDM | 1986 | Private | Focus on compliance support |
| Collibra Reference Data Accelerator | 2008 | Private | Easy deployment and implementation |
| Informatica Cloud - MDM Reference 360 | 1993 | Public | Utilizes INFA Cloud MDM foundation |
| Kalido by Magnitude Reference Data Management | 2014 | Private | Embedded workflow engine for stewardship and governance |
Master Data Management (MDM) is a comprehensive method for defining and managing the essential data of an organization in order to provide a point of reference. Software for this field supports the identification, linking, and synchronization of customer information across disparate data sources. This information is used in support of a number of initiatives related to data stewardship and governance.
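As a toy illustration of the identification-and-linking step, the sketch below matches near-duplicate customer names across two hypothetical source systems with Python's difflib; the similarity threshold and the records are arbitrary assumptions:

```python
from difflib import SequenceMatcher

# Hypothetical customer records from two disparate source systems
crm = ["Acme Corporation", "Globex Inc"]
billing = ["ACME Corp.", "Initech LLC"]

def similarity(a: str, b: str) -> float:
    """Normalized edit similarity, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Link records whose names are similar enough (0.6 is an assumed threshold)
links = [(a, b) for a in crm for b in billing if similarity(a, b) > 0.6]
print(links)
```

Real MDM software adds far more (survivorship rules, golden-record creation, synchronization back to sources), but fuzzy matching of this kind sits at the core of record linking.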
Some popular MDM tools and vendors include:

| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| Orchestra Networks EBX | 2000 | Private | Includes functionality for master, meta, and reference data |
| Dell Boomi | 1984 | Public | Features such as ‘Boomi Suggest’ and ‘Boomi Assure’ to help with development and testing |
| Stibo Systems | 1976 | Private | Emphasis on multidomain MDM |
| Profisee | 2007 | Private | Solutions built by industry |
Document, Record, and Content Management
Enterprise content management, sometimes called document management, is the process of storing, managing, and monitoring documents from daily business activities.
Some general functionalities that any solution should include are:
- Document scanner for making digital copies of paper texts
- Optical character recognition (OCR) to convert scanned documents
- User based access
- Document assembly to organize files using a cabinet-and-folder structure
- PDF converter
- Document storage and backup
- Integration options
- Collaboration tools and version control
| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| Alfresco | 2005 | Private | Range of workflow and collaboration options |
| Dokmee/Office Gemini | 2006 | Private | A lower cost option than some |
| eFileCabinet | 2001 | Private | A strong option for remote teams |
Metadata Management

Metadata management is the administration of data describing other data. It also entails processes for ensuring that data can be integrated and utilized throughout the organization. It is important for maintaining the consistency of definitions, clarity of relationships, and data lineage.
Some common tasks associated with metadata management that should be fulfilled with any software or tool include:
- Metadata repositories for documentation, management, and analysis
- Data lineage to specify the data’s origin and where it has moved over time
- Business glossary to communicate and govern key terms
- Rules management to automate the enforcement of business rules
- Impact analysis detailing any information dependencies
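A metadata repository can start as simply as structured records about each dataset. The sketch below (the field names and datasets are invented) uses the creation and update timestamps mentioned earlier to flag datasets that have not been refreshed recently:

```python
from datetime import datetime, timedelta

# Minimal metadata records for a few hypothetical datasets
catalog = [
    {"name": "orders",    "created": datetime(2024, 1, 5),
     "updated": datetime(2024, 6, 1),  "source": "crm_db"},
    {"name": "customers", "created": datetime(2023, 11, 2),
     "updated": datetime(2024, 1, 10), "source": "crm_db"},
]

def stale(entry, as_of, max_age_days=90):
    """Flag datasets whose last update is older than max_age_days."""
    return as_of - entry["updated"] > timedelta(days=max_age_days)

as_of = datetime(2024, 6, 15)
stale_names = [e["name"] for e in catalog if stale(e, as_of)]
print(stale_names)  # ['customers']
```

The `source` field is the seed of data lineage: following it from dataset to dataset reconstructs where data originated and how it moved over time.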
| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| Adaptive Metadata Manager | 1997 | Private | Over 20 years of experience with a number of partnerships |
| Data Advantage Group | 1999 | Private | Known for ease of implementation |
| Informatica Metadata Manager | 1993 | Public | Concentration on information governance and analytics |
| Smartlogic Semaphore | 2005 | Private | Captures inconsistent and incomplete metadata related to information assets |
Data Quality

When we talk about the condition and usability of the data for its intended function, we’re talking about data quality. Some major processes associated with ensuring high data quality include:
- Parsing and standardization: Breaking down text fields into their components and formatting their values into consistent layouts based on the chosen criteria. Some common layouts are defined by industry standards, user-defined business rules, or knowledge bases of values and patterns.
- General “cleansing”: Updating data values to fall within domain restrictions, integrity constraints, or other business rules that determine minimum data quality for the organization.
- Profiling: Data analysis to capture statistics (metadata) to obtain insight into the quality of the data and locate data quality issues.
- Monitoring: Process to ensure conformance of data to set quality rules for the organization.
- Enrichment: Increasing the value of internally held data by adding related attributes from external sources.
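The parsing, standardization, and profiling steps above can be sketched in a few lines of Python (the phone-number layout and the sample records are assumptions for illustration):

```python
import re

# Raw records with inconsistent phone formats and a missing value
records = [{"phone": "(555) 123-4567"}, {"phone": "555.123.4567"}, {"phone": None}]

def standardize_phone(raw):
    """Parse digits out of a free-form phone field and emit one consistent layout."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)   # parsing: keep only the digit components
    if len(digits) != 10:             # fails the chosen quality rule
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

cleaned = [standardize_phone(r["phone"]) for r in records]

# Profiling: capture a simple completeness statistic about the field
missing = sum(v is None for v in cleaned)
print(cleaned)                        # ['555-123-4567', '555-123-4567', None]
print(f"missing: {missing}/{len(cleaned)}")
```

Monitoring is then just running checks like this on a schedule and alerting when the completeness statistic drifts below an agreed threshold.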
Any data quality tool you consider should include functionality for all of the above and more. Some major vendors include:
| Vendor | Founded | Ownership | Key features |
|---|---|---|---|
| Talend Open Studio for Data Quality | 2005 | Public | Open source with over 400 built-in data connectors |
| Ataccama | 2007 | Private | Machine learning, self-service data preparation, data catalog |
| BackOffice Associates (BOA) | 1996 | Private | Range of prepackaged reports available |
| Innovative Systems: Enlighten | 1968 | Private | Address validation and geocoding feature |
Data Warehousing and BI Management
A data warehouse is the consolidation of data from a wide range of sources and sets the foundation for Business Intelligence (BI). All data in the warehouse is stored in the same format, and techniques such as indexing enable efficient analysis.
Business Intelligence is the set of methods and tools organizations use to turn data into better-informed decisions. BI platforms describe either what is happening in your business right now or what has happened – preferably in real time.
To better understand the tools for each of these, the following table compares the major differences:
| | What it is | Source | Output | Audience |
|---|---|---|---|---|
| Business Intelligence | System to derive business insights | Data from data warehouse | Reports, charts, graphs | Executives, management |
| Data Warehouse | Data storage, historical and current | Data from different sources | Data in consistent format for BI tools | Data engineers, data and business analysts |
Some examples of tools for these processes:

| Tool | Type | Founded* | Ownership | Key features |
|---|---|---|---|---|
| Microsoft Power BI | BI | 2013* | Public | Similar interface to Excel |
| QlikView | BI | 1993 | Private | Includes data mining and analytics |
| Cognos | BI | 1969 | Private | Multidimensional and relational data sources |
| Tableau | BI | 2003 | Public | Widely regarded as one of the best options in terms of visualizations |
| Teradata Data Warehouse | DW | 1979 | Public | Uses AMPs (Access Module Processors) to increase data processing speeds |
| Amazon Redshift | DW | 2012* | Public | Completely managed tool – no need for a DBA |
| Oracle Data Warehouse | DW | 1977 | Public | Includes some BI functionality |

DW = data warehousing
*Year of product founding, not company founding
Data warehouses often exist in close conjunction with an ETL (Extract, Transform, Load) solution that takes data from many different sources and ‘transforms’ it into a single, usable format for the data warehouse. To learn more, see our blog posts about ETL and ETL tools.
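A minimal ETL pass might look like the following sketch (the schema and values are hypothetical, and an in-memory SQLite table stands in for the warehouse): extract rows from CSV text, transform them into one consistent format, and load them into the warehouse table.

```python
import csv
import io
import sqlite3

# Extract: read rows from a CSV source (inlined here so the example is self-contained)
source = io.StringIO("id,amount,currency\n1,10.50,usd\n2,7.25,USD\n")
rows = list(csv.DictReader(source))

# Transform: cast types and standardize the currency code to one format
transformed = [(int(r["id"]), float(r["amount"]), r["currency"].upper())
               for r in rows]

# Load: insert into the warehouse table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, currency TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 17.75
```

Production ETL tools add scheduling, incremental loads, and error handling, but the extract-transform-load shape is the same.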
Interested in learning more about the technologies and vendors that are changing the way organizations get things done? Check out our blog for posts on a wide range of AI/tech-related topics.