Wednesday, 28 March 2018

MIS772 | Data Exploration and Classification in RapidMiner | Predictive Analytics


Executive summary (one page)

This report is unique and is the result of individual effort by the author listed above. Any part of this report that bears resemblance to another student’s report will be treated as plagiarism.
Ensure that all content is readable; the font should be no smaller than Arial 10 point.
Include here only those results that are most significant for your analysis and recommendations.

Avoid indiscriminate “dumping” of tables, charts or code into this report – all content must have some purpose.
Each chart, table or code snippet has to be described or used in the discussion.
Make sure that all charts, tables and important results in the following pages are labelled for cross-referencing, e.g. “Figure 1 – Histogram of Overall Rating” or “Table 4 – Comparison of model performance”. Then refer to them as “… (see Figure 1)” or “As shown in Table 4…”.

Business Problem

Aim 1: Succinctly state a business problem (or question) and specify requirements for its solution in terms of insights to be generated.

Solution to Business Problem

Aim 2: Succinctly describe the results (answer or solution) and justify them. Provide references to the supporting evidence, e.g. charts and plots.

Extension (above 80%)

Clearly identify what kinds of decisions the analytic solution is to support and what types of actions the system can recommend.
Do not attempt this extension unless the main objective has been achieved.
If you are not attempting this section, delete it.
Before entering your report text, delete all such instructions and clarifications, as they take up space unnecessarily.

Data exploration and preparation in RapidMiner (one page)

Include here the text of your analysis with tables and plots and, if needed, small parts of the RM process. If the analysis or results can only be determined by inspecting and running the process, marks will be reduced. All comments such as this one, which are not part of your submission, can be deleted to save space.

Expectation

Understand what data is needed to solve the problem; select at least 5 attributes to be used as candidate predictors; explore and understand their characteristics, e.g. using scatter plots or line charts, histograms or density curves. Deal with missing values. Annotate the included visuals (e.g. with text and little arrows). Clearly report your insights.
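For illustration only (outside RapidMiner, which provides its own operators for this), the sketch below shows one common way of dealing with missing values: replacing a missing numeric value with the attribute mean and a missing categorical value with the most frequent value. It is a minimal Python sketch under stated assumptions: the attribute names `age` and `plan`, the helper `impute_missing` and the data are all hypothetical, and each row is assumed to be a dictionary in which `None` marks a missing value.

```python
from statistics import mean, mode


def impute_missing(rows, numeric_cols, categorical_cols):
    """Return a copy of rows in which None is replaced by the column mean
    (numeric attributes) or the most frequent value (categorical attributes)."""
    fills = {}
    for col in numeric_cols:
        present = [r[col] for r in rows if r[col] is not None]
        fills[col] = mean(present)
    for col in categorical_cols:
        present = [r[col] for r in rows if r[col] is not None]
        fills[col] = mode(present)
    # Build new dictionaries so the original rows are left unchanged.
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical example data; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 46, "plan": None},
    {"age": 40, "plan": "basic"},
]
clean = impute_missing(rows, numeric_cols=["age"], categorical_cols=["plan"])
print(clean[1]["age"], clean[2]["plan"])  # 40 basic
```

Mean/mode imputation is only one option; removing incomplete examples may be preferable when very few values are missing, and whichever choice is made should be justified in the report.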

Extension (above 80%)

Identify more than 5 candidate predictors, including some categorical attributes (e.g. you may need to use dummy variables for some models). Be selective in your data visualisation, especially when you report a lot of different insights, in which case you may wish to tabulate some of the results.
Do not attempt the extension unless the main objective has been achieved.
If you are not attempting this section, delete it.

Discovering Relationships and Data Transformation in RapidMiner (one page)

Include here the text of your analysis with tables and charts, and the RM process.
If the analysis or results can only be determined by inspecting or running the process, marks will be reduced. All comments such as this one, which are not part of your submission, can be deleted to save space.

Expectation

Explore, visualise and understand relationships (such as correlation) between candidate attributes; based on these relationships, recommend and justify the selection of the most appropriate label attribute and a subset of predictors to build an analytic solution. Annotate the included visuals (e.g. with text and little arrows). Clearly report your insights.

Extension (above 80%)

Use an appropriate technique to determine relationships between predictors and the label attribute, thus assessing their worth for the modelling task (e.g. use weights from the correlation analysis or investigate other methods of doing so). Transform the selected attributes as appropriate, but only if needed. It is likely that some attributes will be eliminated in the process of your analysis.
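One simple way to gauge the worth of each numeric predictor is to rank predictors by the absolute Pearson correlation between the predictor and a label coded 0/1 (the point-biserial correlation). The Python sketch below assumes hypothetical attributes `tenure` and `calls` and made-up data; the helpers are illustrative, and this is not necessarily how RapidMiner computes or normalises its correlation weights.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant attribute carries no linear information
    return cov / (sx * sy)


def correlation_weights(columns, label):
    """Rank predictors by |r| with a label coded 0/1, strongest first."""
    weights = {name: abs(pearson(values, label)) for name, values in columns.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: label 1 = "yes", 0 = "no".
label = [0, 0, 1, 1, 1, 0]
columns = {
    "tenure": [1, 2, 8, 9, 7, 3],
    "calls": [5, 4, 5, 4, 5, 4],
}
ranking = correlation_weights(columns, label)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Here `tenure` (about 0.96) ranks well above `calls` (about 0.33), so `calls` would be a candidate for elimination. Correlation only captures linear relationships, so a low weight does not prove an attribute is useless.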
Do not attempt this extension unless the main objective has been achieved.
If you are not attempting this section, delete it.

Create Model(s) in RapidMiner (one page limit)

Include here the text of your analysis with tables and charts, and the RM process.
If the analysis or results can only be determined by inspecting or running the process, marks will be reduced. All comments such as this one, which are not part of your submission, can be deleted to save space.

Expectation

Build one classification model, e.g. k-NN or Decision Tree. Briefly report the steps taken to construct the model and justify the selection of model parameters.
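The core idea of k-NN can be sketched in a few lines: find the k training examples closest to a new example and take a majority vote over their labels. This minimal Python sketch assumes numeric, already normalised attributes, Euclidean distance and hypothetical data; `knn_predict` is an illustrative helper, not a RapidMiner or library function.

```python
from collections import Counter
from math import dist  # Python 3.8+


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training examples nearest to
    query, using Euclidean distance."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already normalised two-attribute data.
train_X = [(0.10, 0.10), (0.12, 0.08), (0.09, 0.11),
           (0.50, 0.50), (0.52, 0.49), (0.48, 0.51)]
train_y = ["no", "no", "no", "yes", "yes", "yes"]

print(knn_predict(train_X, train_y, (0.11, 0.10), k=3))  # no
print(knn_predict(train_X, train_y, (0.49, 0.50), k=3))  # yes
```

Because Euclidean distance is sensitive to attribute scale, attributes usually need to be normalised before k-NN, and an odd k avoids tied votes in a two-class problem; both points are relevant when justifying parameter choices.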

Extension (above 80%)

Create at least two models, e.g. k-NN and Decision Tree.
Do not attempt this extension unless the main objective has been achieved.
If you are not attempting this section, delete it.

Evaluate and Improve the Model(s) in RapidMiner (one page)

Include here the text of your analysis with tables and charts, and the RM process.
If the analysis or results can only be determined by inspecting or running the process, marks will be reduced. All comments such as this one, which are not part of your submission, can be deleted to save space.

Expectation

Validate and test the model for its ability to predict the values of the label attribute; evaluate the model performance, e.g. in terms of the correlation between expected and obtained results, accuracy or kappa. Interpret and report the results. Justify why the model performance can be trusted.
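Accuracy and Cohen's kappa can both be computed from actual and predicted labels: accuracy is the observed agreement p_o, and kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the class frequencies. The Python sketch below uses hypothetical labels; the helper name `accuracy_and_kappa` is illustrative.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(actual)
    # Observed agreement, i.e. accuracy.
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement from the marginal class frequencies.
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    labels = set(actual) | set(predicted)
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in labels) / (n * n)
    # Kappa is undefined when p_e == 1 (a single class everywhere).
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


# Hypothetical test-set labels.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "yes", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes", "no", "no", "yes", "no"]
acc, kappa = accuracy_and_kappa(actual, predicted)
print(f"accuracy={acc:.2f} kappa={kappa:.2f}")  # accuracy=0.80 kappa=0.58
```

Kappa (about 0.58) is well below accuracy (0.80) because part of the agreement would be expected by chance alone, which is why reporting kappa alongside accuracy helps justify trust in the model.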

Extension (above 80%)

Use cross-validation. Validate, test and compare the performance of all developed models. Show which models perform best, in what situations, and when they fail; refer to the sensitivity and specificity of each model, and include and interpret an ROC chart.
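Sensitivity (true positive rate) and specificity (true negative rate) follow from the confusion matrix, and an ROC curve plots the true positive rate against the false positive rate as the decision threshold on the positive-class confidence is lowered. The Python sketch below assumes a binary label with `yes` as the positive class and made-up confidence scores; the helpers are illustrative and are not RapidMiner's implementation.

```python
def sensitivity_specificity(actual, predicted, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(actual, scores, positive):
    """(false positive rate, true positive rate) pairs, lowering the threshold
    on the positive-class confidence one distinct score at a time.
    Assumes both classes are present in actual."""
    pos = sum(a == positive for a in actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(a == positive and s >= t for a, s in zip(actual, scores))
        fp = sum(a != positive and s >= t for a, s in zip(actual, scores))
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical labels and model confidences for the positive class "yes".
actual = ["yes", "yes", "no", "yes", "no", "no"]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
predicted = ["yes" if s >= 0.5 else "no" for s in scores]  # threshold 0.5

sens, spec = sensitivity_specificity(actual, predicted, "yes")
points = roc_points(actual, scores, "yes")
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(points):.2f}")
# sensitivity=1.00 specificity=0.67 AUC=0.89
```

At the 0.5 threshold this hypothetical model finds every positive but misclassifies one negative; the area under the ROC curve is 8/9, whereas 0.5 would correspond to random guessing.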
Do not attempt this extension unless the main objective has been achieved.
If you are not attempting this section, delete it.

Provide an Integrated Solution in RapidMiner (one page)

Include here the text of your analysis with tables and charts, and the RM process.
If the analysis or results can only be determined by inspecting or running the process, marks will be reduced. All comments such as this one, which are not part of your submission, can be deleted to save space.

Expectation

Create a separate process which illustrates how your developed, validated and tested model(s) will be deployed, i.e. applied to newly acquired data. Justify the selection of model parameters. Create a small data set consisting of new examples and apply the model to this data set. Explain the results. In point form, describe the process of using your model, i.e. acquiring new data, exploring and (if needed) transforming it, predicting, and presenting the results.

Extension (above 80%)

Provide recommendations.
Do not attempt this extension unless the main objective has been achieved.
If you are not attempting this section, delete it.

Further Research and Extensions in RM (one page limit)

Include here the text of your analysis with tables and charts, and the RM process.
If the analysis or results can only be determined by inspecting or running the process, marks will be reduced. All comments such as this one, which are not part of your submission, can be deleted to save space.

Expectation (all of this section is above 80% overall)

When working in analytics, you always need to enhance your skills through self-study.
Extend your work with RM features beyond those covered in class, to improve the model and to present its results in the best way. For example, find new and useful data visualisations or apply new predictive models for additional insights and better model performance. Always justify the selection of your approach and compare it with the base expectation.
Alternatively, perform some additional data analysis using R or Python extensions or other analytics/data mining tools. However, ensure that all the expected work is done in RapidMiner: all the non-research sections above must be completed with RapidMiner. If you decide to use other tools for this section only, your submission should also include their project data (e.g. Python or R scripts). Please do not use Excel, which is not a data mining tool.
You may wish to report new and surprising insights.
Alternatively, conduct independent research in the area related to the analysed data set to determine whether your predictions confirm or extend previously published results.
Do not attempt this section unless the main objective has been achieved.
If you are not attempting this section, delete it.
