UNIVERSITY EXAMINATIONS: 2017/2018
EXAMINATION FOR THE DEGREES OF MASTER OF SCIENCE IN
DATA ANALYTICS / INFORMATION SYSTEMS MANAGEMENT
MISM5302 MDA5304 DATA MINING AND DATA WAREHOUSING
ORDINARY EXAMINATIONS
DATE: APRIL, 2018 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.
QUESTION ONE
(a) Describe four pre-processing tasks in the context of the knowledge discovery process (4 Marks)
(b) Briefly explain the meaning of the term ‘Entropy’ (2 Marks)
(c) State and explain four techniques for filling in missing values after collecting raw data (4 Marks)
(d) Consider the following confusion matrix:
 a  b  c   <-- classified as
 9  6  5 | a = MSc in data analytics
 1  3  4 | b = MSc in information systems
 2  8  7 | c = PhD in information systems
Use the above confusion matrix to determine the following: (3 Marks)
(i) Precision for MSc in information systems
(ii) Recall for MSc in data analytics
(iii) True negatives for PhD in information systems
(e) Describe four types of data mining tasks (4 Marks)
(f) State and explain three functions for measuring similarity in the context of clustering. Use an example to illustrate the application of each function (3 Marks)
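The quantities asked for in part (d) can be read off a confusion matrix mechanically. The following is only a minimal sketch, assuming the common convention that rows are actual classes and columns are predicted classes; the helper names `precision`, `recall` and `true_negatives` are illustrative and do not come from the paper.

```python
# Confusion matrix from part (d).
# Assumption: rows = actual class, columns = predicted class.
labels = ["a", "b", "c"]
matrix = [
    [9, 6, 5],  # actual a = MSc in data analytics
    [1, 3, 4],  # actual b = MSc in information systems
    [2, 8, 7],  # actual c = PhD in information systems
]


def precision(m, k):
    """Correct predictions of class k divided by all predictions of class k."""
    predicted_k = sum(row[k] for row in m)
    return m[k][k] / predicted_k


def recall(m, k):
    """Correct predictions of class k divided by all actual instances of class k."""
    return m[k][k] / sum(m[k])


def true_negatives(m, k):
    """Instances that are neither actually k nor predicted as k."""
    n = len(m)
    return sum(m[i][j] for i in range(n) for j in range(n) if i != k and j != k)


if __name__ == "__main__":
    print("Precision (b):", precision(matrix, labels.index("b")))
    print("Recall (a):", recall(matrix, labels.index("a")))
    print("True negatives (c):", true_negatives(matrix, labels.index("c")))
```

If the matrix were laid out the other way round (columns as actual classes), the roles of `precision` and `recall` would swap.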
QUESTION TWO
(a) Describe any three situations when decision tree learning methods can be considered (3 Marks)
(b) Describe the criteria for stopping the construction of a decision tree (2 Marks)
(c) Describe the meaning of the term ‘overfitting’ as used in decision tree learning (1 Mark)
(d) Briefly explain two symptoms of overfitting and two approaches by which it can be avoided (4 Marks)
(e) Consider the following data set
Compute the information gain for selecting the humidity attribute as the root of the decision tree during decision tree learning using the ID3 algorithm (5 Marks)
QUESTION THREE
(a) Describe any two potential applications of association mining in information systems and management (2 Marks)
(b) Describe two subjective measures and two objective measures of association rules in the context of association mining (4 Marks)
(c) Describe the following two association mining approaches:
(i) Brute-force approach (1 Mark)
(ii) Two-step approach (1 Mark)
(d) Consider the following data
i. Given the above training data, calculate the support and confidence of the following association rule (2 Marks)
{pasta, orange} → {lemon}
ii. Use the Apriori algorithm to find the frequent 3-itemsets where the minimum support count = 2 (3 Marks)
iii. Given that the minimum confidence = 1, generate strong association rules using the Apriori algorithm (2 Marks)
QUESTION FOUR
(a) Describe the meaning of the following terms in the context of data warehousing:
(i) Fact (1 Mark)
(ii) Dimension (1 Mark)
(iii) Schema (1 Mark)
(b) There are three types of schemas that can be used to design and develop a data warehouse (3 Marks)
(c) State and explain four properties of a data warehouse (4 Marks)
(d) Describe any two characteristics of a data mart (2 Marks)
(e) Describe the meaning of the initials ETL in the context of data warehousing (3 Marks)