MISM5402  MDA5304  DATA MINING AND DATA WAREHOUSING KCA Past Paper

UNIVERSITY EXAMINATIONS: 2018/2019
EXAMINATION FOR THE DEGREE OF MASTER OF SCIENCE IN
INFORMATION SYSTEMS MANAGEMENT/ MASTER OF SCIENCE IN
DATA ANALYTICS
MISM 5402/ MDA 5304: DATA MINING AND DATA WAREHOUSING
DATE: AUGUST 2019 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.

QUESTION ONE
(a) Describe four data mining tasks. Use one example to illustrate each task (4 Marks)
(b) Calculate the Hamming distance between the following two animals (2 Marks)

(c) State and explain four data transformation techniques that can be carried out before data
mining (4 Marks)
(d) Consider the following confusion matrix:
 a  b  c   <-- classified as
 9  6  5 | a = MSc in Data Analytics
 1  3  4 | b = MSc in Information Systems Management
 2  8  7 | c = MSc in Data Communications
Use the above confusion matrix to determine the following: (3 Marks)
(i) Precision for the MSc in Data Analytics category
(ii) Recall for the MSc in Information Systems Management category
(iii) True negatives for the MSc in Data Communications category
(e) Describe three types of clustering approaches that are used in data mining. Give one
example for each case (3 Marks)
(f) Discuss any two potential applications of clustering in business enterprises (2 Marks)
(g) Discuss three data cleaning tasks that can be carried out on raw data after collection (2 Marks)
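
The quantities asked for in part (d) of Question One can be checked with a short sketch like the one below. It assumes the usual reading of a "classified as" header, where each row is an actual class and each column is the predicted class; that layout is an assumption about the paper's matrix, and the function names are illustrative only.

```python
# Confusion matrix from Question One (d).
# Assumption: rows are actual classes, columns are predicted classes.
matrix = [
    [9, 6, 5],  # actual a = MSc in Data Analytics
    [1, 3, 4],  # actual b = MSc in Information Systems Management
    [2, 8, 7],  # actual c = MSc in Data Communications
]


def precision(m, k):
    """True positives of class k divided by everything predicted as k."""
    predicted_as_k = sum(row[k] for row in m)
    return m[k][k] / predicted_as_k


def recall(m, k):
    """True positives of class k divided by everything that is actually k."""
    return m[k][k] / sum(m[k])


def true_negatives(m, k):
    """Cells that are neither in the row nor in the column of class k."""
    size = len(m)
    return sum(m[i][j] for i in range(size) for j in range(size)
               if i != k and j != k)


precision_a = precision(matrix, 0)   # 9 / (9 + 1 + 2)
recall_b = recall(matrix, 1)         # 3 / (1 + 3 + 4)
tn_c = true_negatives(matrix, 2)     # 9 + 6 + 1 + 3

print(precision_a, recall_b, tn_c)
```

Under that row and column assumption the sketch gives a precision of 0.75 for class a, a recall of 0.375 for class b, and 19 true negatives for class c.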
QUESTION TWO
(a) Consider the following data set


Compute the information gain for selecting the Wind attribute as the root of the decision tree
when learning a decision tree with the ID3 algorithm (5 Marks)
(b) Describe any two techniques for smoothing noisy data during the pre-processing phase
(2 Marks)
(c) Describe any two steps of the classification task as used in data mining and warehousing
(2 Marks)
(d) State and explain any two applications of classification tasks in modern organizations
(2 Marks)
(e) Describe three main phases of the knowledge discovery process (3 Marks)
(f) Describe the importance of “Confusion Matrix” in the context of data mining and
warehousing. (1 Mark)
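
The information gain calculation in part (a) of Question Two can be sketched as follows. The data set for that question is not reproduced here, so the four rows below are hypothetical placeholders and not the exam data; only the entropy and information gain formulas used by ID3 are standard.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical placeholder rows (NOT the exam data set).
wind = ["weak", "strong", "weak", "strong"]
play = ["yes", "no", "yes", "yes"]

gain_wind = information_gain(wind, play)
print(round(gain_wind, 4))
```

ID3 would compute this gain for every candidate attribute and place the one with the largest value at the root.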

QUESTION THREE
(a) The company provides you with the following examples from the company’s hiring record.


Use the above data set to answer the following questions
(i) Identify the independent and dependent attributes (2 Marks)

(ii) Given that the split point = 3, write sample Python code to split the data set into test and
training data sets (2 Marks)
(iii) Write sample Python code to split the data into independent and dependent attributes
(2 Marks)
(iv) Calculate entropy of the data set (3 Marks)
(b) Describe the term overfitting, its symptoms, its causes and how it can be avoided (3 Marks)
(c) Describe three functions for measuring similarity during clustering (3 Marks)
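
Parts (a)(ii) to (a)(iv) of Question Three can be approached along the lines of the sketch below. The hiring records are not reproduced here, so the rows and column meanings are hypothetical placeholders. The sketch also assumes that the split point is a row index (rows before it are used for training, the rest for testing) and that the last column is the dependent attribute.

```python
from collections import Counter
from math import log2

# Hypothetical placeholder rows (NOT the exam data set):
# [years_experience, has_degree, hired]
records = [
    [5, "yes", "yes"],
    [1, "no", "no"],
    [3, "yes", "yes"],
    [2, "no", "no"],
    [4, "yes", "no"],
]

# (ii) Split into training and test sets at row index 3.
split_point = 3
train = records[:split_point]
test = records[split_point:]

# (iii) Separate independent attributes (all but the last column)
# from the dependent attribute (assumed to be the last column).
X = [row[:-1] for row in records]
y = [row[-1] for row in records]


# (iv) Entropy of the data set, computed over the dependent attribute.
def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


dataset_entropy = entropy(y)
print(len(train), len(test), round(dataset_entropy, 4))
```

With a library such as pandas the same split is usually written with row and column slicing on a DataFrame; plain lists are used here only to keep the sketch dependency-free.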
QUESTION FOUR
(a) Discuss three main approaches to accessing a data warehouse (3 Marks)
(b) Describe the meaning of the following terms in the context of data warehousing:
(i) Dimension (1 Mark)
(ii) Schema (1 Mark)
(iii) Fact (1 Mark)
(c) Describe three differences between a data warehouse and an operational data store
(4 Marks)
(d) Describe the meaning of the initials ETL as used in data warehousing (3 Marks)
(e) Briefly describe two properties of a data mart (2 Marks)
