Data Mining processes are much more effective when they are carried out in a planned and systematic way. This is where the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology comes in. It is an analytical methodology for developing and implementing solutions, with the aim of making analytical and predictive projects succeed.
This methodology is generic and can be applied to different sectors of activity. In this sense, PSE uses the CRISP-DM methodology to support projects, to transform data into knowledge and provide the best services to customers.
The CRISP-DM methodology includes six phases of development that follow a cyclical process. What are the crucial steps in a Data Mining project?
-
Business Understanding
The first step is to understand the business. To carry out a project, it is necessary to know the business problem to be solved and to define the project objective and the company’s needs. This step is extremely important and must be worked on together with the client.
-
Data Understanding
At this stage, the available data are gathered and the data needed to meet the project objective are identified. This includes collecting, describing, exploring and verifying the quality of the data. It is very important to check the nature and quality of each data source and to obtain the data required to fulfill the objectives defined in the first step.
-
Data Preparation
As the name implies, this is the stage where the data are prepared. Some requirements must be defined, such as how the data will be organized and how data from different sources will be combined. In short, it includes selecting, cleaning, building, integrating and formatting the data.
-
Modeling
At this stage, the analysis methods that will address the objective defined in the first stage are selected and applied. With these methods, the model is built using modeling techniques that extract information from the available data and, in turn, answer the project objective.
-
Evaluation
Having chosen and applied the model, we reach a crucial stage: evaluation. We have to test the model to see whether the results meet the objective of the project. In this phase, we evaluate the results, review the data mining process and determine the next steps.
-
Deployment
We finally have the answer to our business problem, and the objective has been fulfilled. Now we have to integrate the acquired knowledge into the company’s business in order to solve the initial problem. The final report makes it possible to introduce changes in the company that are grounded in this knowledge.
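The phases above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the customer records, the `monthly_spend` and `churned` fields, the 0.8 accuracy target and the simple threshold rule are hypothetical, and a real project would use the data, tools and modeling techniques agreed with the client.

```python
from statistics import mean

# Business Understanding: a hypothetical success criterion agreed with the client.
TARGET_ACCURACY = 0.8

# Hypothetical customer records; None marks a missing value.
raw = [
    {"id": 1, "monthly_spend": 20.0, "churned": 0},
    {"id": 2, "monthly_spend": 35.0, "churned": 0},
    {"id": 3, "monthly_spend": None, "churned": 1},
    {"id": 4, "monthly_spend": 80.0, "churned": 1},
    {"id": 5, "monthly_spend": 75.0, "churned": 1},
    {"id": 6, "monthly_spend": 25.0, "churned": 0},
    {"id": 7, "monthly_spend": 90.0, "churned": 1},
    {"id": 8, "monthly_spend": 30.0, "churned": 0},
]

def describe_quality(rows, field):
    """Data Understanding: count rows and missing values for one field."""
    missing = sum(1 for r in rows if r[field] is None)
    return {"rows": len(rows), "missing": missing}

def prepare(rows, field):
    """Data Preparation: keep only rows where the field is present."""
    return [r for r in rows if r[field] is not None]

def fit_threshold(rows, field, label):
    """Modeling: use the midpoint between the two class means as a cut-off.

    Assumes the positive class tends to have higher values of the field.
    """
    pos = mean(r[field] for r in rows if r[label] == 1)
    neg = mean(r[field] for r in rows if r[label] == 0)
    return (pos + neg) / 2

def accuracy(rows, field, label, threshold):
    """Evaluation: share of rows the threshold rule classifies correctly."""
    hits = sum(1 for r in rows if int(r[field] > threshold) == r[label])
    return hits / len(rows)

quality = describe_quality(raw, "monthly_spend")
clean = prepare(raw, "monthly_spend")
train, held_out = clean[:5], clean[5:]  # simple hold-out split
threshold = fit_threshold(train, "monthly_spend", "churned")
score = accuracy(held_out, "monthly_spend", "churned", threshold)

# Deployment would only proceed if the business goal is met.
meets_goal = score >= TARGET_ACCURACY
print(quality, round(threshold, 2), score, meets_goal)
```

Checking the score against a target that was fixed during Business Understanding keeps the evaluation tied to the original business problem rather than to the model alone.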
These are the steps of the CRISP-DM methodology. However, the process does not always unfold in the same way: it can be linear and follow the phases in the order described above, or it can be non-linear and move back to earlier phases.
For example, decisions made and information gathered in the modeling phase may force the analyst to rethink the data preparation process, which in turn can raise new problems in the modeling phase and, consequently, in the remaining phases.
Likewise, the evaluation phase can lead the analyst to revisit the business understanding phase and ask whether the project is trying to answer the wrong question. At that point, the analyst can review the business understanding phase and continue through the rest of the process with a better goal in mind.
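The linear and non-linear flows described above can be sketched as a small table of allowed transitions. This is only an illustration: the `BACKTRACK` table contains just the two backward moves mentioned in the text, not every loop a real project may need, and the function names are hypothetical.

```python
# The six CRISP-DM phases in their linear order.
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Backward moves mentioned in the text (an assumption, not the full process diagram).
BACKTRACK = {
    "Modeling": {"Data Preparation"},
    "Evaluation": {"Business Understanding"},
}

def next_phases(current):
    """Return the phases that may follow `current`: the next one plus any backtracks."""
    idx = PHASES.index(current)
    forward = {PHASES[idx + 1]} if idx + 1 < len(PHASES) else set()
    return forward | BACKTRACK.get(current, set())

def is_valid_path(path):
    """Check that every consecutive pair of phases is an allowed transition."""
    return all(b in next_phases(a) for a, b in zip(path, path[1:]))

# A project that returns to data preparation once during modeling.
project = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]
print(is_valid_path(project))
```

Writing the allowed moves down explicitly makes it easier to see when a project is following the planned order and when it has looped back to an earlier phase.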
The knowledge obtained from a Data Mining cycle can generate new questions, new problems and new opportunities, making it possible to identify and satisfy new needs.
PSE uses this methodology in projects with the help of IBM SPSS software (Statistics and Modeler). Get to know IBM SPSS software and learn how it can help your business.