By Ryan Wade, Principal Consultant at Diesel Analytics
The volume of data collected is increasing at an extremely fast rate. Globally, this amount is expected to surpass 44 zettabytes by 2020.
To put this quantity in perspective, one zettabyte is roughly 1 trillion gigabytes. Data is considered to be the new gold because of the level of insight you can derive from it. With the amount of data collection increasing, all signs point to a virtual gold rush for data miners who can process the information.
Yet only about 0.5% of all data collected will ever be analyzed, regardless of the value the data could have provided. This disparity is mostly due to a lack of knowledge of the various methods that can be used to analyze data. In this article, we will discuss the main categories of data analytics and give examples of how they apply to business.
Data analytics can be broken down into four major categories:
- Descriptive Analytics
- Diagnostic Analytics
- Predictive Analytics
- Prescriptive Analytics
Let's review these four types of data analytics in greater detail.
Descriptive Analytics
The simplest and most commonly used type of data analytics is descriptive analytics. It tells you what happened. Examples are periodic sales reports, inventory reports, and operational dashboards.
The data sources of descriptive analytics are often online transaction databases, enterprise reporting systems, and data warehouses. MS Excel, Power BI, Tableau, and QlikView are the favorites used to deliver descriptive analytics.
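To make the idea concrete, here is a minimal sketch of a periodic sales report in Python. The transaction records, store names, and the `monthly_sales_report` function are hypothetical and exist only for illustration; in practice the rows would come from a transaction database or data warehouse.

```python
from collections import defaultdict

# Hypothetical transaction records: (month, store, sale amount).
transactions = [
    ("2019-01", "Store A", 1200.0),
    ("2019-01", "Store B", 950.0),
    ("2019-02", "Store A", 1100.0),
    ("2019-02", "Store B", 1300.0),
    ("2019-03", "Store A", 1250.0),
]

def monthly_sales_report(rows):
    """Return total sales per month, sorted chronologically."""
    totals = defaultdict(float)
    for month, _store, amount in rows:
        totals[month] += amount
    return dict(sorted(totals.items()))

# The report only describes what happened; it does not explain or predict.
report = monthly_sales_report(transactions)
for month, total in report.items():
    print(f"{month}: {total:,.2f}")
```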
Diagnostic Analytics
The next type of data analytics is diagnostic analytics. Diagnostic analytics helps answer why things happened.
Example – Retail Sales Report
Retail sales reports with drill-through capabilities are a basic example of diagnostic analytics. This capability allows management to zero in on the stores that have trending performance issues and focus their attention more accurately.
Example – Bad Debt
Another example is a public hospital that has a report that monitors patient revenue. One of the line items in the report may be patient bad debt. The hospital can incorporate diagnostic analytics in the report by implementing a drill-through that gives the demographic makeup of the patients who are in the bad debt category. If the analyst finds that many of the patients in this category have incomes that may qualify them for charity care, that insight will enable the hospital to offset some of the bad debt.
The data source used in this type of analysis is usually an enterprise data warehouse. The data must be trusted, and data in data warehouses are typically scrubbed and tested for accuracy. Prominent front-end tools used in diagnostic analytics are MS Excel, QlikView, Tableau, and Power BI.
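A drill-through can be sketched in a few lines of Python: start from a summary, then open up the detail rows behind one line of it. The sales figures and the function names below are hypothetical, and the "declining" rule (sales falling in every consecutive month) is just one simple assumption about what a trending performance issue might look like.

```python
from collections import defaultdict

# Hypothetical sales rows: (store, month, sales).
sales = [
    ("Store A", "2019-01", 1000.0),
    ("Store A", "2019-02", 1050.0),
    ("Store A", "2019-03", 1100.0),
    ("Store B", "2019-01", 1000.0),
    ("Store B", "2019-02", 900.0),
    ("Store B", "2019-03", 800.0),
]

def summarize_by_store(rows):
    """Summary level: total sales per store."""
    totals = defaultdict(float)
    for store, _month, amount in rows:
        totals[store] += amount
    return dict(totals)

def drill_through(rows, store):
    """Detail level: month-by-month sales for one store, in chronological order."""
    return sorted((month, amount) for s, month, amount in rows if s == store)

def is_declining(detail):
    """True when there are at least two months and sales fall in every consecutive month."""
    amounts = [amount for _month, amount in detail]
    return len(amounts) > 1 and all(
        later < earlier for earlier, later in zip(amounts, amounts[1:])
    )

summary = summarize_by_store(sales)
declining_stores = [
    store for store in sorted(summary) if is_declining(drill_through(sales, store))
]
print(declining_stores)
```

The summary alone shows that Store B sold less than Store A; the drill-through is what reveals that its sales are falling month after month.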
Predictive Analytics
The next level up from diagnostic analytics is predictive analytics. Predictive analytics attempts to answer the question of what will happen. The level of complexity and sophistication of predictive models can vary greatly, but the ones that are most useful in business are parsimonious models. Such models achieve their predictions with as few variables as possible, which makes them easier to interpret and analyze.
Two popular types of predictive models used in the enterprise are linear regression and logistic regression models. Linear regression models use a linear equation to describe the relationship between independent variables (your predictors) and your dependent variable (your response).
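As a rough illustration, the following Python sketch fits a linear regression with a single predictor using ordinary least squares. The data is made up and deliberately noise-free so the fitted line is easy to verify by hand; the function name is hypothetical, and real work would more likely rely on a statistics package in R or Python.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor. Returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: one predictor and one response, exactly on the line y = 2x + 10.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_simple_linear_regression(predictor, response)

# Use the fitted line to predict the response for a new predictor value.
forecast = intercept + slope * 6.0
print(slope, intercept, forecast)
```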
Example – Sales Prediction
A simple example is a store estimating future sales based on GDP. The other model, logistic regression, predicts the likelihood of an event with a binary outcome (your response) based on one or more independent variables (your predictors). A simple example is estimating how likely a person is to purchase a bike based on specific personal characteristics (attained education level, income, proximity to work, etc.).
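The bike example can be sketched as a small logistic regression fitted by gradient descent. Everything here is an assumption made for illustration: the two features (income and distance to work), their scaling, the ten made-up people, and the learning rate and number of passes. A real project would normally use an established library rather than hand-written fitting code.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Predicted probability that the binary outcome is 1."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic_regression(rows, labels, learning_rate=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_probability(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Hypothetical people: (income in units of $100k, distance to work in units of 10 km).
people = [
    (0.3, 2.5), (0.4, 2.0), (0.5, 0.4), (0.6, 1.8), (0.6, 0.3),
    (0.8, 0.5), (0.9, 0.2), (1.0, 1.5), (0.7, 2.2), (0.4, 0.6),
]
# 1 = purchased a bike, 0 = did not (made-up labels).
purchased = [0, 0, 1, 0, 1, 1, 1, 1, 0, 0]

weights, bias = fit_logistic_regression(people, purchased)

# Compare two people with the same income but different commutes.
near_commuter = predict_probability(weights, bias, (0.8, 0.3))
far_commuter = predict_probability(weights, bias, (0.8, 2.5))
print(f"near: {near_commuter:.2f}, far: {far_commuter:.2f}")
```

In this made-up data most buyers live close to work, so the fitted model assigns a higher purchase probability to the person with the shorter commute.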
Common data sources for predictive models are enterprise data warehouses. Again, this is because the data is clean and vetted. On occasion, data scientists find that the data in the data warehouse is not sufficient, so they supplement the enterprise data with external data sets such as those from the US Census. Popular tools used in predictive analytics are R, Python, SAS, SPSS, and RapidMiner.
Prescriptive Analytics
The top level of analytics is prescriptive analytics. Prescriptive analytics tells you what to do based on the situation. It involves a multi-step process in which a recommendation is based on the prediction of a model. In addition to the recommendation, the actual outcome is recorded so that the effectiveness of the model can be determined.
Example – Health Insurer
One scenario of prescriptive analytics is a situation in which a health insurer finds that sedentary individuals have a higher healthcare utilization rate. A logistic regression model can predict which individuals are likely to be inactive. To lower the utilization rate, the insurer can provide those individuals with incentives to become more active (and healthier). The insurer will also track the outcomes of those individuals to test the effectiveness of the recommendations.
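The health insurer scenario follows a predict, recommend, and record loop, which might look like the Python sketch below. The member records, the inactivity probabilities (standing in for the output of a predictive model), the 0.7 threshold, and the function names are all hypothetical.

```python
# Hypothetical members with a predicted probability of being inactive.
members = [
    {"id": "M1", "inactivity_probability": 0.90},
    {"id": "M2", "inactivity_probability": 0.80},
    {"id": "M3", "inactivity_probability": 0.75},
    {"id": "M4", "inactivity_probability": 0.20},
]

def recommend_incentives(rows, threshold=0.7):
    """Step 1: turn predictions into a recommendation (offer an incentive or not)."""
    for row in rows:
        row["offered_incentive"] = row["inactivity_probability"] >= threshold
    return [row for row in rows if row["offered_incentive"]]

def record_outcome(row, became_active):
    """Step 2: record what actually happened after the recommendation."""
    row["became_active"] = became_active

def success_rate(rows):
    """Step 3: share of targeted members with a recorded outcome who became active."""
    tracked = [r for r in rows if r.get("offered_incentive") and "became_active" in r]
    if not tracked:
        return None
    return sum(1 for r in tracked if r["became_active"]) / len(tracked)

targeted = recommend_incentives(members)

# Made-up observed outcomes for the targeted members.
for row, outcome in zip(targeted, [True, False, True]):
    record_outcome(row, outcome)

rate = success_rate(members)
print([row["id"] for row in targeted], rate)
```

Recording the outcome next to the recommendation is what allows the effectiveness of the model and the incentive to be measured later.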
Example – Sales Analysis
Another scenario could be a meal delivery company noticing that its monthly subscriptions are trending downward. Through analysis, the company finds that the majority of the subscribers who are leaving have switched to plant-based diets. Its data scientists develop a model to predict which subscribers are likely to adopt a plant-based diet. The company then markets plant-based diet promotions to those individuals and records the outcome to determine whether it was able to retain those subscribers.
The data source for prescriptive analytics will typically be the enterprise data warehouse, which supplies clean and well-vetted data. The tool used for prescriptive analytics is usually custom software developed using predictive analytics tools along with programming languages such as Java, JavaScript, Python, and/or C#.
Data analytics is a relatively new and vast discipline, and it may be overwhelming to those who are new to it. Many who have an interest don't know where to start. A book that does a great job of giving an extensive tour of data analytics as it relates to business is Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking by Foster Provost and Tom Fawcett. This book explains the basic concepts of data analytics and effectively describes how advanced analytics apply to the enterprise.
In addition to learning the concepts of data analytics, you will also need to learn the tools that will help you be effective. R and Python are popular programming languages for this work, and many data analytics professionals are versed in one or both. A recommended resource for those who want to learn R is the book R for Data Science by Hadley Wickham and Garrett Grolemund. Wickham is one of the most prolific developers of R packages. I have been programming in R for close to four years, and I have not seen a better introduction to R than this one. The Python counterpart to R for Data Science is Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney. Python has surpassed R in popularity, so it may be the preferred first language for many, and this book is a good option for those who want to go the Python route.
Ryan Wade is a data analytics professional with close to 20 years of experience. You can view his full LinkedIn profile here.