With a sample data set in Excel (only 2000 records), we have to explore and derive some meaning out of the data set. Please quote a price and I will provide the data set (it is very simple to understand).

1A. Initial data exploration

1. Identify the attributes and their types (nominal, ordinal, interval or ratio).
2. Identify the values or value ranges of the attributes, the frequency of values, distributions, medians, means, and variance where applicable.
3. Using Miner3D, explore the data set and identify any outliers, clusters of similar instances, "interesting" attributes and specific values of those attributes. In the presentation of the results, include clear snapshots that illustrate your findings and explain exactly what each snapshot supports.

1B. Data preprocessing

Perform each of the following data preparation tasks (each task applies to the original data):

a. Use the following binning techniques to smooth the values of the Age and Education Years attributes:
- equi-width binning
- equi-depth binning
- distance-based binning
- smoothing by bin means (you need to decide on a bin depth; explain your preferences).
For each of these techniques you need to illustrate your steps. The results should be placed in separate columns in your Excel workbook file.

b. Use the following techniques to normalise the attribute Age:
- min-max normalization to transform the values onto the range [0.0-1.0].
- z-score normalization to transform the values.

c. Discretise the Age attribute into the following categories: Teenager = 1-20; Young = 21-30; Mid_Age = 31-45; Mature = 46-65; Old = 66+. Provide the frequency of each category in the data set.

d. Convert all variables into binary ones [with values "0" or "1"] [Note: you may need to discretise some of them initially].

Note: Each of the results of parts (a) through (d) should be presented in a separate spreadsheet (and in a corresponding table in the assignment paper).

1C. Data exploration using association rules (goal-driven exploration)

A couple of weeks have passed and your CEO would like to get some preliminary answers to the following questions:

Question 1C-a: Is there any association between the age, gender of the person, the education level and the salary the person gets?
Question 1C-b: Is there any association between the sex, race, the educational level and the marital status?

Answers to these questions will help with further focussing of the investigation. You are the data miner who has the magic tools and the "know-how" to find the answers. Don't forget that each answer needs to be supported by evidence and an explanation of the results. You can use different association rule miners (Weka, WizWhy, Magnum Opus) and Weka Explorer when looking for insights in the data.

Assignment paper

In the assignment paper, include a section for each of the tasks in this assignment. At the end of the paper, include a summary section in which you describe what you have understood about the data set in a way that can assist you and your boss to decide what you should do next. This may include specific characteristics (or values) of some attributes, some clusters identified visually that you propose to examine, associations found that should be investigated more rigorously, etc. As the CEO will definitely read the summary, you need to be confident, but don't forget to base your confidence on facts!

Assessment

This assignment is assessed as individual work. The assessment criteria include:
- Correctness of the preprocessing results and explanation of the steps;
- Feasibility of the results and insights from the preliminary data exploration;
- How comprehensive the explanations of these results are: depth and clarity of the analysis, appropriateness of illustrations;
- Quality of the summary section.
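
For orientation only, here is a rough sketch of the calculations that tasks 1B(a) to 1B(c) ask for. It is written in Python rather than Excel and uses a short, made-up list of ages, not the real data set. The function names, the choice of 4 bins and the bin depth of 4 are assumptions for illustration. Distance-based binning is left out because it depends on the clustering approach chosen. The z-score here uses the population standard deviation; the sample standard deviation would be an equally defensible choice.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical ages, made up for illustration; NOT taken from the real data set.
ages = [17, 19, 22, 25, 28, 31, 38, 44, 52, 60, 67, 73]


def equi_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width.

    Assumes the values are not all identical (otherwise the width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equi_depth_bins(values, depth):
    """Split the sorted values into consecutive bins of `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_bin_means(bins):
    """Replace every value in a bin by the mean of that bin."""
    return [[round(mean(b), 2)] * len(b) for b in bins]


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalisation onto the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalisation using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def age_category(age):
    """Map an age onto the categories listed in task 1B(c)."""
    if age <= 20:
        return "Teenager"
    if age <= 30:
        return "Young"
    if age <= 45:
        return "Mid_Age"
    if age <= 65:
        return "Mature"
    return "Old"


width_bins = equi_width_bins(ages, 4)
depth_bins = equi_depth_bins(ages, 4)
smoothed = smooth_by_bin_means(depth_bins)
scaled = min_max(ages)
standardised = z_score(ages)
frequencies = Counter(age_category(a) for a in ages)

print(width_bins)
print(smoothed)
print(dict(frequencies))
```

Each result (`width_bins`, `smoothed`, `scaled`, `standardised`, `frequencies`) corresponds to one of the separate columns or tables requested in 1B. The deliverable is still expected as an Excel workbook with the steps illustrated.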
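
As a pointer to the kind of evidence expected for the 1C questions, an association rule is usually backed by its support, confidence and lift. Here is a minimal Python sketch of those three measures, assuming a handful of made-up records. The attribute names and values are hypothetical and are not taken from the real data set, and this does not replace the listed rule miners.

```python
# Hypothetical records, made up for illustration; NOT taken from the real data set.
records = [
    {"sex": "Male", "education": "Bachelors", "salary": ">50K"},
    {"sex": "Male", "education": "Bachelors", "salary": ">50K"},
    {"sex": "Female", "education": "Bachelors", "salary": ">50K"},
    {"sex": "Female", "education": "Bachelors", "salary": "<=50K"},
    {"sex": "Male", "education": "HS-grad", "salary": "<=50K"},
    {"sex": "Female", "education": "HS-grad", "salary": "<=50K"},
    {"sex": "Male", "education": "HS-grad", "salary": "<=50K"},
    {"sex": "Female", "education": "HS-grad", "salary": ">50K"},
]


def matches(record, items):
    """True if the record has every attribute=value pair in `items`."""
    return all(record.get(key) == value for key, value in items.items())


def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(rows)
    n_a = sum(matches(r, antecedent) for r in rows)
    n_c = sum(matches(r, consequent) for r in rows)
    n_ac = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    support = n_ac / n                            # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0       # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


support, confidence, lift = rule_metrics(
    records,
    antecedent={"education": "Bachelors"},
    consequent={"salary": ">50K"},
)
print(support, confidence, lift)
```

A lift above 1 suggests that the antecedent and consequent occur together more often than they would if they were independent. That is the kind of fact the summary section should rest on.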