Thursday, 8 March 2018

ITC516 | WEKA DATA MINING SOFTWARE INSTALLATION AND PROCESSING ASSESSMENT 1 | DATA MINING AND VISUALIZATION

Task

Weka Data Mining Software Installation and Processing

There are two steps in this assessment, which prepares you for the identification and analysis work you are expected to complete throughout this subject. The first task will build the practical and technical skills that enable you to compare and evaluate output patterns for visualization.
For this first assignment, you are required to install the Weka software, which you will then use for the duration of this subject. You will also complete some data preprocessing and visualization tasks with Weka.

Task 1: Data Preprocessing

Load a dataset by clicking the Open file button in the top left corner of the panel. Inside the data folder, which is supplied when Weka is installed, you will find a file named
As shown in the Weka interface, the weather data has 14 instances, and 5 attributes called outlook, temperature, humidity, windy, and play. Click on the name of an attribute in the left subpanel to see information about the selected attribute on the right, such as its values and other details. This information is also shown in the form of a histogram. All attributes in this dataset are “nominal” — that is, they have a predefined finite set of values. The last attribute, play, is the “class” attribute; its value can be yes or no. Answer the following:
  • What values do the outlook and humidity attributes have? [2 marks]
  • What is the class value of instance number 3 in the weather data? [1 mark]
  • Load the  dataset and open it in the editor by clicking the Edit button in the row of buttons at the top of the Preprocess panel in the Weka interface. How many numeric and how many nominal attributes does this dataset have? [2 marks]
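
The attribute counts that the Preprocess panel reports can also be cross-checked outside the Weka GUI. The following is a minimal sketch, assuming a simplified ARFF layout (one `@attribute` declaration per line, attribute names without spaces). The function name `count_attribute_types` and the contents of `SAMPLE_ARFF` are hypothetical and are not taken from any file supplied with Weka.

```python
def count_attribute_types(arff_text):
    """Count nominal and numeric attributes in a simplified ARFF header."""
    counts = {"nominal": 0, "numeric": 0, "other": 0}
    for raw in arff_text.splitlines():
        line = raw.strip()
        # The header ends where the data section begins.
        if line.lower().startswith("@data"):
            break
        if not line.lower().startswith("@attribute"):
            continue
        # Expected shape: "@attribute <name> <type>".
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        spec = parts[2].strip()
        if spec.startswith("{"):
            # A braced value list declares a nominal attribute.
            counts["nominal"] += 1
        elif spec.lower() in ("numeric", "real", "integer"):
            counts["numeric"] += 1
        else:
            # Strings, dates and anything this sketch does not handle.
            counts["other"] += 1
    return counts


# Hypothetical toy header, made up for illustration only.
SAMPLE_ARFF = """\
@relation toy
@attribute colour {red, green, blue}
@attribute size numeric
@attribute weight real
@attribute label {yes, no}
@data
red,1.5,10,yes
"""

print(count_attribute_types(SAMPLE_ARFF))
```

To try it on a real file, the text could be read with `open(path).read()` and passed to the function; the result should agree with the attribute types Weka shows for the same file.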

                                                                                                   

Task 2: Visualization and Analysis

Load the dataset in Weka and answer the following questions.
  • How many instances and how many attributes does this dataset have? [1 mark]
  • What is the range of possible values for each of the 4 attributes that can be observed in the dataset? [2 marks]
  • Present a scatter plot visualization of this dataset. From the attribute-pair plots, identify which two classes overlap the most and which one is likely to be a separate class. Alternatively, you may use the 3D visualization feature provided in Weka to make the same identification, using different combinations of any three of the four attributes in the dataset. [2 marks]
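
As a rough numerical companion to the questions above, the sketch below computes the observed range of each attribute and checks whether the per-class ranges of one attribute overlap. It is an illustration under stated assumptions and not how Weka produces its plots: comparing intervals on a single attribute is a crude stand-in for inspecting a scatter plot. The rows in `TOY_ROWS`, the class labels and all function names are hypothetical.

```python
from collections import defaultdict


def attribute_ranges(rows, n_attrs):
    """Return a (min, max) pair for each of the first n_attrs columns."""
    return [
        (min(r[i] for r in rows), max(r[i] for r in rows))
        for i in range(n_attrs)
    ]


def class_ranges(rows, attr_index):
    """Return {class_label: (min, max)} for one attribute.

    The class label is assumed to be the last element of each row.
    """
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[-1]].append(r[attr_index])
    return {label: (min(v), max(v)) for label, v in by_class.items()}


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical toy data: two numeric attributes followed by a class label.
TOY_ROWS = [
    (1.0, 0.2, "a"),
    (1.4, 0.3, "a"),
    (4.0, 1.2, "b"),
    (4.8, 1.6, "b"),
    (4.5, 1.5, "c"),
    (6.0, 2.3, "c"),
]

print(attribute_ranges(TOY_ROWS, 2))

ranges = class_ranges(TOY_ROWS, 0)
print(intervals_overlap(ranges["a"], ranges["b"]))
print(intervals_overlap(ranges["b"], ranges["c"]))
```

In this toy data, class "a" is disjoint from "b" on the first attribute while "b" and "c" overlap, which is the kind of pattern the scatter plots are meant to reveal visually.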

Rationale

This assessment builds your skills and knowledge towards being able to:
  • identify and analyse business requirements for the identification of patterns and trends in data sets;
  • compare and evaluate output patterns.

Marking criteria

The grade you receive for this assessment as a whole is determined by the cumulative marks gained for each question. The tasks in this assessment involve a sequence of several steps, so you will be marked on the correctness of your answers as well as on the clear and neat presentation of your diagrams, where required.

Presentation

Assignments are required to be submitted in either Word format (.doc, or .docx), Open Office format (.odf), Rich Text File format (.rtf) or .pdf format. Each task must be submitted as a single document.
All diagrams that are required should be inserted into the document in the appropriate position.
