UNIVERSITY EXAMINATIONS: 2019/2020
EXAMINATION FOR THE DEGREE OF MASTER OF SCIENCE IN DATA
ANALYTICS/MASTER OF SCIENCE IN INFORMATION SYSTEMS
MDA 5304/MISM 5402: DATA MINING AND WAREHOUSING
DATE: MAY 2020 TIME: 14 DAYS
INSTRUCTIONS: Answer ALL Questions
Take-Home Examination (THE)
Every organization holds a great deal of data, and more is being collected every day. In addition
to managing the large data sets that already exist, many organizations are looking for ways to construct
a classification model that can be used to classify future objects and to develop a better understanding of
the classes of the objects in the database. The aim of classification techniques is to generate
accurate classification results. However, some classification techniques give poorer classification
accuracy than others on the same data. Therefore, there is a need to compare different classifiers and
to use the most accurate one.
1. Download data about Kenya from open online sources and use Python libraries to carry out data
preprocessing. (5 Marks)
2. Use appropriate data mining tools such as Python libraries to apply any two different
classification techniques and develop classification models (5 Marks)
3. Use Python libraries to evaluate the accuracy of the two developed models to determine which
model is more accurate given the same data. (5 Marks)
4. Write a research paper to communicate which among the studied techniques provides more
accurate results. The paper should be structured as follows:
i. Front Pages (3 Marks)
– Title page, glossary, acronyms, abstract and table of contents
ii. Introduction (5 Marks)
– Background of classification as a data mining task
– Problem statement related to accuracy of classification
– Significance of the study
iii. Literature Review (5 Marks)
– Review of at least three studies that have used classification techniques to perform
machine learning tasks for decision making support.
– Limitations on each of the reviewed studies.
iv. Methodology (5 Marks)
– Description of the process used to develop classification models. Draw a well labelled
diagram to illustrate the process and cite the source of the process
– Screen shots of Python code used to implement various steps of the process and
v. Results (10 Marks)
– Screen shots of the two developed and visualized models.
– Description of evaluation results obtained after testing both models with test data.
vi. Conclusion and Recommendations (5 Marks)
vii. References (2 Marks)
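
Illustrative note on Questions 1 to 3. The sketch below is one possible outline of the workflow those questions describe, not a model answer. It assumes scikit-learn is installed. The data are synthetic and generated in the code purely as a stand-in for the downloaded Kenya dataset. The two classifiers (a decision tree and k-nearest neighbours), the median imputation, the 70/30 split and the helper name build_pipeline are arbitrary illustrative choices. Any two classification techniques and any suitable preprocessing steps may be used instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data; replace with the dataset you downloaded.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Blank out about 5% of the values so the preprocessing step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Both models are trained and tested on exactly the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def build_pipeline(classifier):
    """Hypothetical helper: impute missing values, scale, then classify."""
    return make_pipeline(
        SimpleImputer(strategy="median"), StandardScaler(), classifier
    )


models = {
    "decision_tree": build_pipeline(
        DecisionTreeClassifier(max_depth=5, random_state=0)
    ),
    "k_nearest_neighbours": build_pipeline(KNeighborsClassifier(n_neighbors=5)),
}

# Fit each model on the training data and score it on the held-out test data.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

best = max(accuracies, key=accuracies.get)
for name, accuracy in accuracies.items():
    print(f"{name}: test accuracy = {accuracy:.3f}")
print("More accurate on this split:", best)
```

Placing the imputer and scaler inside each pipeline means they are fitted on the training data only, so no information from the test data leaks into preprocessing. Accuracy from a single split can vary with the random seed, so the ranking of the two models is specific to this split.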