Data mining is the extraction of interesting (potentially useful) patterns or knowledge from massive amounts of data.
Interest in it is driven by the wide availability of vast amounts of data and the imminent need to turn such data into useful information and knowledge.
Data mining can also be defined as a process for extracting usable information from a larger set of raw data, which involves analysing patterns in large batches of data using one or more software tools.
Real-life Examples of Data Mining
The following are some real-life examples of data mining:
1. Shopping Market Analysis
There is a huge amount of data in the shopping market, and it has to be managed by identifying patterns in it. Market basket analysis is a modelling technique used for this analysis. It is based on the theory that if you buy a certain group of items, you are more likely to buy another group of items. This technique can help the retailer understand the purchase behaviour of a buyer. Using differential analysis, results can be compared between different stores and between customers in different demographic groups.
2. Stock Market Analysis
There is a vast amount of data to be analysed in the stock market, so data mining techniques are used to model that data for analysis.
3. Weather Forecasting Analysis
A weather forecasting system uses an enormous amount of historical data for prediction. Because such a large amount of data has to be processed, a suitable data mining technique must be used.
4. Fraud Detection
Due to the size of the data, traditional methods of fraud detection are time-consuming and complicated. Data mining helps in providing meaningful patterns and turning data into information.
5. Intrusion Detection
Data mining can help to improve intrusion detection by adding a level of focus to anomaly detection. It helps an analyst distinguish unusual activity from common everyday network activity.
6. Financial Banking
A tremendous amount of data is generated by new transactions in computerised banking. Data mining can contribute to solving business problems in banking and finance by finding patterns, causalities, and correlations in business information.
7. Video Surveillance
Video surveillance is used in day-to-day life almost everywhere for security purposes. Data mining is used in video surveillance because a large amount of collected data has to be dealt with.
8. Online Shopping
In online shopping, e-commerce companies such as Amazon, Flipkart, Snapdeal, Myntra, and many more use data mining and business intelligence to offer cross-sells and up-sells through their websites. They use sophisticated mining techniques to drive their ‘People who viewed that product also liked this’ functionality. Data mining is also used to identify customer loyalty by analysing data on customers’ purchasing activities, such as the frequency of purchases in a period, the total monetary value of all purchases, and when the last purchase was made.
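The three loyalty signals mentioned above (how recently, how often, and how much a customer buys) can be computed directly from a purchase log. The following is a minimal sketch, not a description of how any of the named companies do it; the `purchases` data, the `rfm_summary` function, and the reference date are all hypothetical.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount).
purchases = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 25.0),
    ("bob", date(2024, 2, 10), 120.0),
    ("alice", date(2024, 3, 20), 15.0),
]

def rfm_summary(purchases, today):
    """Return {customer: (days_since_last_purchase, purchase_count, total_spent)}."""
    summary = {}
    for customer, day, amount in purchases:
        # Start from this purchase's date with zero count and zero total.
        last, count, total = summary.get(customer, (day, 0, 0.0))
        summary[customer] = (max(last, day), count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }

rfm = rfm_summary(purchases, today=date(2024, 3, 31))
for customer, (recency, frequency, monetary) in rfm.items():
    print(customer, recency, frequency, monetary)
```

A customer with a low number of days since the last purchase, a high purchase count, and a high total would typically be treated as more loyal; how the three values are weighted is a business decision outside this sketch.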
9. Criminal Investigation
Criminal investigation is a process that aims to identify crime characteristics. Crime analysis includes discovering and detecting crimes and their relationships with criminals. The large volume of crime datasets and the complexity of the relationships between them have made criminology a suitable field for applying data mining techniques.
10. Bioinformatics
Data mining approaches are well suited to bioinformatics, as the field involves a massive amount of data. Mining biological data helps to extract useful knowledge from massive datasets gathered in biology and in related life-science areas such as medicine and neuroscience.
11. Health Care and Insurance
The growth of the insurance industry depends largely on the ability to convert data into knowledge, information, or intelligence about customers, competitors, and markets. Data mining has been applied in the insurance industry only recently, but it has brought tremendous competitive advantages to the companies that have implemented it successfully.
Data Mining Techniques
The main data mining techniques are as follows:
1. Classification Analysis Technique
- The classification technique assigns items to target categories or classes, so that the class of a new item can be predicted accurately.
- It classifies each item in a set of data into one of a predefined set of classes or groups.
- We use it to sort different data into different classes.
- The process resembles clustering in that it segments data records into groups, here called classes.
- An example is an email client such as Outlook, which uses specific algorithms to characterise an email as legitimate or spam.
Figure: A classification model can be represented in various forms, such as (a) IF-THEN rules, (b) a decision tree, or (c) a neural network.
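As a rough illustration of the IF-THEN form, the spam example can be sketched with a single hand-written rule. The keyword set `SPAM_WORDS`, the threshold of two matches, and the `classify_email` function are assumptions made for this sketch; a real filter would learn its rules from labelled mail.

```python
# Hypothetical keyword list; not taken from any real mail filter.
SPAM_WORDS = {"winner", "free", "prize"}

def classify_email(subject):
    """Assign one of two predefined classes using a simple IF-THEN rule."""
    words = set(subject.lower().split())
    # IF the subject contains at least two spam words THEN class = spam.
    if len(words & SPAM_WORDS) >= 2:
        return "spam"
    # ELSE class = legitimate.
    return "legitimate"

labels = [classify_email(s) for s in ["Free prize inside", "Meeting at noon"]]
print(labels)
```

The point of the sketch is only that every item ends up in exactly one of the predefined classes.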
2. Association Rule Learning Technique
- It is also known as the relation technique.
- A pattern is recognised based on the relationship between items in the same transaction.
- The association technique is used in market basket analysis to identify a set of products that customers frequently purchase together.
- Retailers use the association technique to research customers’ buying habits. Based on historical sales data, retailers might find that customers always buy crisps when they buy beer, and can therefore place beer and crisps next to each other to save time for the customer and increase sales.
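Two standard measures for an association rule such as "beer -> crisps" are support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both on a small, made-up list of baskets; the `transactions` data and the function names are hypothetical.

```python
# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"bread", "milk"},
    {"beer", "crisps", "milk"},
    {"beer", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"beer", "crisps"}, transactions)
rule_confidence = confidence({"beer"}, {"crisps"}, transactions)
print(rule_support, rule_confidence)
```

In this toy data, beer and crisps appear together in 3 of 5 baskets, and 3 of the 4 baskets containing beer also contain crisps. Full algorithms such as Apriori search for all itemsets above a support threshold rather than checking one rule by hand.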
3. Anomaly or Outlier Detection Technique
- Outliers are data objects that do not comply with the general behaviour or model of the data available.
- Outlier detection refers to identifying data items in a dataset that do not match an expected pattern.
- Anomalies are also known as outliers, novelties, noise, deviations, and exceptions. They often provide critical and actionable information.
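One simple way to flag values that do not match the expected pattern is a z-score test: measure how many standard deviations each value lies from the mean. This is only one of many possible approaches; the `amounts` data, the `find_outliers` function, and the threshold of 2.0 are assumptions chosen for the sketch.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction amounts with one unusual value.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 50, 250]
outliers = find_outliers(amounts)
print(outliers)
```

Because a large outlier also inflates the mean and standard deviation, a z-score test can miss anomalies in small samples; more robust methods are often preferred in practice.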
4. Clustering Analysis Technique
- Cluster analysis is a data mining technique by which related records are grouped. As a result, objects within the same group are similar to one another, while they are dissimilar to objects in other clusters.
- The objects are clustered based on the principle of maximising the intraclass similarity and minimising the interclass similarity.
- In clustering, class labels are not present in the training data because they are not known to begin with. This is called unsupervised learning.
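The idea of grouping unlabelled records can be sketched with a basic k-means loop on one-dimensional data. k-means is used here as one common clustering algorithm, not as the only one; the `ages` data, the starting centroids, and the `kmeans_1d` function are hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Basic k-means on numbers: assign each point to its nearest centroid, then recompute."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical customer ages; no class labels are supplied.
ages = [18, 21, 22, 19, 61, 64, 58, 66]
clusters, centroids = kmeans_1d(ages, centroids=[18, 66])
print(clusters, centroids)
```

The algorithm is never told which group a record belongs to; the groups emerge only from the distances between records, which is what makes this unsupervised.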
5. Regression Analysis Technique
- This technique models the dependency between variables so that the relationship can be used to predict an outcome.
- In statistical terms, it is used to identify and analyse the relationship between variables.
- It helps you estimate the characteristic value of the dependent variable.
- It is generally used for prediction and forecasting.
6. Prediction Technique
- Prediction is made by finding the relationship between independent and dependent variables.
- Suppose sales is the independent variable and profit is the dependent variable. We can then draw a fitted regression curve and use it to predict profit.
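A fitted regression line of this kind can be sketched with ordinary least squares in a few lines. The figures below are made up, the independent variable is named `sales` purely for illustration, and `fit_line` is a hypothetical helper rather than a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly figures (independent variable first, dependent second).
sales = [10, 20, 30, 40, 50]
profit = [3, 5, 7, 9, 11]

slope, intercept = fit_line(sales, profit)
# Use the fitted line to predict profit for an unseen sales value.
predicted_profit = slope * 60 + intercept
print(slope, intercept, predicted_profit)
```

Here the data lie exactly on a line, so the fit recovers it; with real data the line only approximates the relationship, and a good fit does not by itself show that one variable causes the other.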
7. Sequential Patterns Technique
- This is an important data mining technique.
- It identifies regular occurrences of similar events.
- It is used to understand user buying behaviour with the help of historical data.
- It is used in shopping basket applications.
- In online shopping, businesses can use historical transaction data to identify sets of items that customers buy together at different times of the year. Companies can then use this information to recommend those items to customers with better deals, based on their past purchasing frequency.
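What distinguishes sequential patterns from plain association is that order matters: the question is which items tend to be bought after which. A minimal sketch, assuming made-up purchase histories, counts ordered pairs across customers; the `histories` data, the `frequent_sequences` function, and the `min_support` value are hypothetical, and real algorithms handle longer sequences.

```python
from collections import Counter

# Hypothetical purchase histories, each ordered by time.
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

def frequent_sequences(histories, min_support=2):
    """Count ordered pairs (a bought before b), at most once per customer."""
    counts = Counter()
    for history in histories:
        seen_pairs = set()
        for i, first in enumerate(history):
            for later in history[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)
    # Keep only pairs seen in at least min_support histories.
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_sequences(histories)
print(patterns)
```

A shop could use a pair such as ("phone", "case") to recommend a case to customers who have just bought a phone.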
8. Decision Trees Technique
- A decision tree is one of the analytical techniques of data mining.
- This technique is easy for users to understand.
- This technique is used for categorising or predicting data.
- In this technique, the root of a decision tree is a simple question that has multiple answers.
- The figure above shows an example in which an incoming error condition is classified.
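The structure described above, a root question whose answers lead either to further questions or to a final class, can be sketched as nested dictionaries. The tree below is hand-written and entirely hypothetical (its questions, answers, and labels are not taken from the figure); in practice a tree is usually learned from data.

```python
# Hypothetical tree: each internal node asks one question; leaves are class labels.
tree = {
    "question": "severity",
    "answers": {
        "low": "log only",
        "high": {
            "question": "recoverable",
            "answers": {"yes": "retry", "no": "alert operator"},
        },
    },
}

def classify(node, record):
    """Walk from the root, following the answer that matches the record."""
    while isinstance(node, dict):
        node = node["answers"][record[node["question"]]]
    return node

decision = classify(tree, {"severity": "high", "recoverable": "no"})
print(decision)
```

Each path from the root to a leaf corresponds to one IF-THEN rule, which is part of why decision trees are considered easy to read.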