CONSTRUCT DATA (E.G. DURATION BETWEEN “TRIAGE” AND “SEEN BY CLINICIAN”) – JUSTIFY WHY YOU NEED THIS DATA, AND DESCRIBE IN DETAIL (IN DATA DICTIONARY) HOW YOU ARE GOING TO CONSTRUCT THE DATA POINT (FORMULAS, …)
Data is created as a by-product of practice or as a result of targeted activity. While this data serves the primary business of the organisation/practice, it can also be used to extract business intelligence. One framework used in industry is the Cross-Industry Standard Process for Data Mining (CRISP-DM). This process has several phases. The first three (business understanding, data understanding, data preparation) are outlined below.
Before you attempt to collect or analyse data, you need a clear idea of why you are doing the exercise: understand the purpose. The main components are:
• Determine business objectives
– Initial situation/problem etc. (…we have crowded emergency departments (ED)…)
• Assess situation
– Inventory of resources (personnel, data, software)
– Requirements (e.g. deadline), constraints (e.g. legal issues), risks
Understanding your business helps determine the scope of the project, the timeframe, the budget, etc.
The next step is to look at what data is needed (and available) and to write data definitions, so that we know exactly what we are talking about. This is very important when aggregating apparently identical data: the definitions may not be the same!
• Collect initial data
– Acquire data listed in project resources
– Report locations of data, methods used to acquire them, …
• Describe data
– Examine "surface" properties
– Report, for example, format, quantity of data, … → Data dictionary
• Explore data
– Examine central tendencies, distributions, …
– Report insights suggesting examination of particular data subsets (data selection)
• Verify data quality
– Is the data complete? (missing values)
– Is the data correct? (integrity constraints)
– Is the data noisy or are there outliers?
NB: this is an initial exploration, scouting the problem space. It helps you understand what data is available and align your approach with the business objectives. At the same time, this phase can help verify whether the project is viable (feasibility) and refine the project scope, budget, resources, etc.
Typically, any data you get is not in the right format for analysis (it was collected for other purposes, such as providing care or managing the practice) and needs to be pre-processed.
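The exploration and quality checks listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, the field names (visit_id, triage, seen) and the timestamp format are hypothetical and do not come from any real ED system. The sketch reports missing values, one integrity constraint ("seen" must not precede "triage"), and the median waiting time as a central tendency.

```python
from datetime import datetime
from statistics import median

# Hypothetical ED records; field names and values are invented for illustration.
records = [
    {"visit_id": 1, "triage": "2024-03-01 08:00", "seen": "2024-03-01 08:25"},
    {"visit_id": 2, "triage": "2024-03-01 09:10", "seen": None},
    {"visit_id": 3, "triage": "2024-03-01 10:00", "seen": "2024-03-01 09:50"},
    {"visit_id": 4, "triage": "2024-03-01 11:00", "seen": "2024-03-01 11:40"},
]

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def parse(ts):
    """Parse a timestamp string, or return None for a missing value."""
    return datetime.strptime(ts, FMT) if ts else None


def quality_report(rows):
    """Summarise completeness, one integrity constraint, and the median wait."""
    missing, invalid_order, waits = [], [], []
    for row in rows:
        triage, seen = parse(row["triage"]), parse(row["seen"])
        if triage is None or seen is None:
            missing.append(row["visit_id"])        # completeness check
        elif seen < triage:
            invalid_order.append(row["visit_id"])  # integrity constraint violated
        else:
            waits.append((seen - triage).total_seconds() / 60)
    return {
        "missing": missing,
        "invalid_order": invalid_order,
        "median_wait_min": median(waits) if waits else None,
    }


report = quality_report(records)
print(report)
```

Records flagged here are not deleted automatically; whether to exclude or repair them is a decision for the data preparation phase.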
• Select data
– Relevance to the data mining goals
– Quality of data
– Technical constraints, e.g. limits on data volume
• Clean data
– Raise data quality if possible
– Selection of clean subsets
– Insertion of defaults
• Construct data
– Derived attributes (e.g. age = NOW – DOB)
• Integrate data
– Merge data in different sources
– Merge data within source (tuple merging)
• Format data
– Data must conform to the requirements of the initially selected mining tools (e.g. Weka expects a different input format from Disco).
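The "construct data" step, including the duration between "triage" and "seen by clinician" named in the title, might look like the following minimal sketch. The field names (dob, triage_time, seen_time), the derived attribute names and the data dictionary layout are assumptions made for illustration, not a prescribed format. Age is computed against an explicit reference date rather than the current time, so the result is reproducible.

```python
from datetime import date, datetime

# Hypothetical data dictionary entries documenting how each derived attribute is built.
DATA_DICTIONARY = {
    "age_years": {
        "source_fields": ["dob"],
        "formula": "reference_date - dob, in completed years",
        "unit": "years",
    },
    "triage_to_seen_min": {
        "source_fields": ["triage_time", "seen_time"],
        "formula": "(seen_time - triage_time) in minutes",
        "unit": "minutes",
    },
}


def age_years(dob, reference_date=None):
    """Completed years between dob and reference_date (defaults to today)."""
    ref = reference_date or date.today()
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)


def triage_to_seen_min(triage_time, seen_time):
    """Minutes from triage to first clinician contact; None if not computable."""
    if triage_time is None or seen_time is None:
        return None
    minutes = (seen_time - triage_time).total_seconds() / 60
    # A negative duration violates the expected event order; treat it as unusable.
    return minutes if minutes >= 0 else None


# Invented example visit.
visit = {
    "dob": date(1980, 6, 15),
    "triage_time": datetime(2024, 3, 1, 8, 0),
    "seen_time": datetime(2024, 3, 1, 8, 25),
}
visit["age_years"] = age_years(visit["dob"], reference_date=date(2024, 3, 1))
visit["triage_to_seen_min"] = triage_to_seen_min(visit["triage_time"], visit["seen_time"])
print(visit["age_years"], visit["triage_to_seen_min"])
```

Recording the formula and unit next to each derived attribute keeps the construction auditable, which matters when the same duration is later aggregated across sources with different definitions.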