UNIVERSITY EXAMINATIONS: 2017/2018
EXAMINATION FOR THE DEGREE OF MASTER OF SCIENCE IN
INFORMATION SYSTEMS MANAGEMENT
MISM 5203: DATA MINING AND WAREHOUSING
DATE: NOVEMBER, 2017 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.
QUESTION ONE
(a) State and explain three characteristics of an operational data store (3 Marks)
(b) Describe four data mining tasks. Use one example to illustrate each task (4 Marks)
(c) Calculate the Hamming distance between the following two animals
(d) State and explain four techniques for filling missing values after collecting raw data
(4 Marks)
(f) Consider the following confusion matrix:
  a  b  c   <-- classified as
  9  6  5 | a = Bsc IT
  1  3  4 | b = BCOM
  2  8  7 | c = Diploma in IT
Use the above confusion matrix to determine the following: (3 Marks)
(i) Precision for BCOM category
(iii) Recall for Bsc IT category
(iv) True negatives for Diploma in IT category
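The quantities asked for in part (f) can be checked with a short sketch. It assumes the usual "classified as" convention, in which each row is an actual class and each column is a predicted class; the function names are illustrative only and do not come from the paper.

```python
# Confusion matrix from part (f): rows = actual class, columns = predicted class
# (assumed convention). Class order: a = Bsc IT, b = BCOM, c = Diploma in IT.
MATRIX = [
    [9, 6, 5],  # actual a = Bsc IT
    [1, 3, 4],  # actual b = BCOM
    [2, 8, 7],  # actual c = Diploma in IT
]


def precision(matrix, k):
    """TP / (TP + FP): diagonal cell divided by the sum of column k."""
    column_total = sum(row[k] for row in matrix)
    return matrix[k][k] / column_total


def recall(matrix, k):
    """TP / (TP + FN): diagonal cell divided by the sum of row k."""
    return matrix[k][k] / sum(matrix[k])


def true_negatives(matrix, k):
    """Sum of every cell that lies outside both row k and column k."""
    n = len(matrix)
    return sum(
        matrix[i][j]
        for i in range(n)
        for j in range(n)
        if i != k and j != k
    )


if __name__ == "__main__":
    print("Precision (BCOM):", precision(MATRIX, 1))                  # 3 / 17
    print("Recall (Bsc IT):", recall(MATRIX, 0))                      # 9 / 20
    print("True negatives (Diploma in IT):", true_negatives(MATRIX, 2))  # 9 + 6 + 1 + 3
```

Under the assumed convention, precision for BCOM is 3/17, recall for Bsc IT is 9/20, and the true negatives for Diploma in IT are the four cells outside row c and column c.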
(g) Consider the following data set
Compute the information gain obtained by selecting the humidity attribute as the root of the
decision tree when learning the tree with the ID3 algorithm (4 Marks)
QUESTION TWO
(a) Describe the meaning of the term ‘association rule’ in the context of association rule
mining. Discuss one example to illustrate your answer. (2 Marks)
(b) Describe the following two approaches to association rule mining:
(i) Brute force (2 Marks)
(ii) Two-step approach (2 Marks)
(c) State and explain two subjective measures for evaluating association rules. (2 Marks)
(d) Consider the following data
i. Given the above training data, calculate the support and confidence of the following
association rule (2 Marks)
{C, B} → {A}
ii. Use the Apriori algorithm to find the frequent 3-itemsets, where the minimum support
count = 2 (3 Marks)
iii. Given that the minimum confidence = 1, generate the strong association rules using the
Apriori algorithm (2 Marks)
QUESTION THREE
(a) Distinguish between ‘data mart’ and ‘data warehouse’ (1 Mark)
(b) Discuss the interplay between the operational data store, the data warehouse and the
data mart. Use a diagram to illustrate your answer (4 Marks)
(c) State and explain four characteristics of a data warehouse (4 Marks)
(d) Discuss three types of data warehousing architectures (3 Marks)
(e) Describe the meaning of the initials ETL in the context of data warehousing (3 Marks)
QUESTION FOUR
(a) Briefly explain the meaning of the following data warehousing terms:
(i) Schema (1 Mark)
(ii) OLAP (1 Mark)
(iii) Fact (1 Mark)
(iv) Dimension (1 Mark)
(b) State and explain three types of schemas that are used to model data in a data warehouse.
Draw a diagram to illustrate each schema (6 Marks)
(c) Describe the meaning of the following types of clustering approaches as used in data
mining and warehousing. Give one example for each case
(i) Partitioning approach (1 Mark)
(ii) Hierarchical clustering approach (1 Mark)
(iii) Model-based clustering approach (1 Mark)
(d) Discuss any two potential applications of cluster mining in business enterprises (2 Marks)