# MISM5203 DATA MINING AND WAREHOUSING KCA Past Paper

UNIVERSITY EXAMINATIONS: 2017/2018
EXAMINATION FOR THE DEGREE OF MASTER OF SCIENCE IN
INFORMATION SYSTEMS MANAGEMENT
MISM5203 DATA MINING AND WAREHOUSING
DATE: AUGUST, 2018 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.

QUESTION ONE [20 MARKS]
(a) Briefly explain the meaning of the following terms. Use practical examples to illustrate
(i) Data cleaning (2 Marks)
(ii) Binning (2 Marks)
(iii) Association rule (2 Marks)
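As an illustration of (i) and (ii), here is a minimal pandas sketch. The student records are hypothetical. The price list is just an illustrative example of equal-frequency binning followed by smoothing by bin means, which is one common form of binning. Cleaning here means dropping duplicate rows and mean-imputing missing values. Other cleaning strategies are equally valid.

```python
import pandas as pd

# --- Data cleaning (hypothetical raw records, for illustration only) ---
raw = pd.DataFrame({
    "student_ID": [1, 2, 2, 3, 4],
    "age": [23, None, None, 31, 27],
})
# Remove exact duplicate rows, then fill missing ages with the column mean
clean = raw.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].mean())

# --- Binning: equal-frequency (equal-depth) bins, then smoothing by bin means ---
prices = pd.Series([4, 8, 15, 21, 21, 24, 25, 28, 34])
bin_ids = pd.qcut(prices, q=3, labels=False)  # bin number 0, 1 or 2 for each value
smoothed = prices.groupby(bin_ids).transform("mean")  # replace each value by its bin mean
```

With three bins of three values each, the smoothed prices become 9, 22 and 29 for the first, second and third bins.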
(b) Consider the following confusion matrix:

| Actual class | Classified as a (yes) | Classified as b (no) |
|---|---|---|
| a = yes | 5 | 4 |
| b = no | 6 | 9 |

Use the above confusion matrix to calculate the following:
(i) False positive rate for class yes (1 Mark)
(ii) True positive rate for class yes (1 Mark)
(iii) Precision for class no (1 Mark)
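One way to work part (b) is sketched below. It assumes the Weka-style layout, where rows are the actual classes and columns are the classes assigned by the classifier. Exact fractions are used so that the answers can be read off directly.

```python
from fractions import Fraction

# Rows = actual class, inner keys = class it was classified as
matrix = {"yes": {"yes": 5, "no": 4},
          "no":  {"yes": 6, "no": 9}}

tp = matrix["yes"]["yes"]  # actual yes, classified yes
fn = matrix["yes"]["no"]   # actual yes, classified no
fp = matrix["no"]["yes"]   # actual no, classified yes
tn = matrix["no"]["no"]    # actual no, classified no

fp_rate_yes = Fraction(fp, fp + tn)   # share of true "no" instances wrongly labelled "yes"
tp_rate_yes = Fraction(tp, tp + fn)   # share of true "yes" instances correctly labelled "yes"
precision_no = Fraction(tn, tn + fn)  # of everything labelled "no", the share that really is "no"
```

This gives an FP rate of 6/15 = 0.4, a TP rate of 5/9 ≈ 0.556 and a precision for class no of 9/13 ≈ 0.692.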
(c) Describe three data structures used by the pandas package to store and manipulate data
in the context of Python programming tools (3 Marks)
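Series and DataFrame are pandas' two core containers. Older answers often give Panel as the third structure, but Panel has since been removed from pandas. For that reason, this sketch shows Index, the label object that both containers rely on. The marks are hypothetical.

```python
import pandas as pd

# Series: one-dimensional labelled array (hypothetical maths marks)
maths = pd.Series([65, 72, 58], index=["S1", "S2", "S3"], name="maths")

# DataFrame: two-dimensional table whose columns are Series sharing one index
marks = pd.DataFrame({"maths": [65, 72, 58], "physics": [70, 60, 80]},
                     index=["S1", "S2", "S3"])

# Index: the immutable row labels used to align Series and DataFrames
labels = marks.index
```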
(d) Consider the following data set:

(i) Write sample Python code for merging the mathematics and physics data sets on the student_ID
attribute to form a data set named Marks (2 Marks)
(ii) Write Python code that splits the Marks data set into a training data set and a test data set.
The splitting point = 3 (2 Marks)
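The original data set is not reproduced here, so the two tables below are hypothetical stand-ins with the column names `maths` and `physics`. The sketch uses an inner join on `student_ID`. It also assumes that "splitting point = 3" means the first three rows go to training and the remaining rows go to testing.

```python
import pandas as pd

# Hypothetical stand-ins for the mathematics and physics data sets
mathematics = pd.DataFrame({"student_ID": [1, 2, 3, 4, 5],
                            "maths": [65, 72, 58, 80, 49]})
physics = pd.DataFrame({"student_ID": [1, 2, 3, 4, 5],
                        "physics": [70, 60, 75, 68, 55]})

# (i) Merge the two tables on the shared key (inner join by default)
Marks = pd.merge(mathematics, physics, on="student_ID")

# (ii) Positional split: rows before the split point train, the rest test
split_point = 3
train = Marks.iloc[:split_point]
test = Marks.iloc[split_point:]
```

A random split, for example scikit-learn's `train_test_split`, is more usual in practice. The positional split simply follows the question's "splitting point" wording.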
(e) Describe three differences between a data mart and a data warehouse (3 Marks)
(f) Explain the meaning of the term “Knowledge discovery process” in the context of data
mining and warehousing (1 Mark)
QUESTION TWO [15 MARKS]

(a) Describe three methods of assigning probabilities and their potential applications in
bioinformatics (3 Marks)
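The three methods usually taught are classical, relative frequency and subjective. The sketch below contrasts them on a DNA-base example. The sequence and the subjective value are hypothetical. The subjective method has no formula: the number is simply an expert's assigned belief.

```python
from fractions import Fraction

# Classical method: all outcomes assumed equally likely (the four DNA bases)
bases = ["A", "C", "G", "T"]
p_a_classical = Fraction(1, len(bases))

# Relative-frequency method: proportion observed in data (hypothetical sequence)
sequence = "ACGTAACG"
p_a_relative = Fraction(sequence.count("A"), len(sequence))

# Subjective method: an expert's stated belief, e.g. that a candidate gene
# is linked to a disease (hypothetical value, not derived from data)
p_gene_linked = Fraction(3, 10)
```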
(b) Consider the following contingency table of Gender and categories of students

(i) Use the above contingency table to develop a probability matrix (2 Marks)
(ii) Calculate the joint probability of being female and an undergraduate student (1 Mark)
(iii) Calculate the union probability of being male or a diploma student (1 Mark)
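The contingency table itself is not reproduced here, so the counts below are hypothetical. The sketch shows the method only: divide every cell by the grand total to get the probability matrix, read the joint probability from a single cell, and apply the addition rule for the union, P(Male ∪ Diploma) = P(Male) + P(Diploma) − P(Male ∩ Diploma).

```python
import pandas as pd

# Hypothetical counts: rows = gender, columns = student category
counts = pd.DataFrame(
    {"Diploma": [10, 20], "Undergraduate": [30, 20], "Postgraduate": [10, 10]},
    index=["Male", "Female"],
)

# (i) Probability matrix: each cell divided by the grand total
total = counts.to_numpy().sum()
prob = counts / total

# (ii) Joint probability: a single cell of the probability matrix
p_female_ug = prob.loc["Female", "Undergraduate"]

# (iii) Union probability via the addition rule (marginals minus the overlap)
p_male = prob.loc["Male"].sum()
p_diploma = prob["Diploma"].sum()
p_male_or_diploma = p_male + p_diploma - prob.loc["Male", "Diploma"]
```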
(c) A doctor knows that typhoid causes stomach pains 50% of the time. Typhoid occurs in 1 out of
50,000 people, and stomach pains occur in 1 out of 20. If a patient has stomach pains, what is the
probability that he/she has typhoid? (4 Marks)
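Here is a worked check with Bayes' theorem, P(T | S) = P(S | T) P(T) / P(S). It assumes that the 1/20 figure is the prior probability of stomach pains, P(S). Fractions keep the arithmetic exact.

```python
from fractions import Fraction

p_pain_given_typhoid = Fraction(1, 2)   # P(S | T): typhoid causes pains 50% of the time
p_typhoid = Fraction(1, 50_000)         # prior P(T)
p_pain = Fraction(1, 20)                # prior P(S), assumed as stated above

# Bayes' theorem
p_typhoid_given_pain = p_pain_given_typhoid * p_typhoid / p_pain
```

The result is 1/5000 = 0.0002, so typhoid remains very unlikely even given stomach pains.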
(d) Consider the following data set

Given the above training data, use Bayesian learning to predict the class of the following new
instance:
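The training table and the new instance are not reproduced here. This sketch therefore runs naive Bayes on a hypothetical six-row data set with two categorical attributes. For each class it computes the unnormalised score P(class) × ∏ P(attribute = value | class). It applies no Laplace smoothing, so an unseen attribute value drives a class score to zero.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical training rows: (attribute values, class); NOT the exam's data set
training = [
    ({"outlook": "sunny", "wind": "weak"}, "no"),
    ({"outlook": "sunny", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "weak"}, "yes"),
    ({"outlook": "rainy", "wind": "weak"}, "yes"),
    ({"outlook": "rainy", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "strong"}, "yes"),
]

def naive_bayes_scores(training, instance):
    """Unnormalised P(class) * product of P(attr=value | class), no smoothing."""
    class_counts = Counter(label for _, label in training)
    n = len(training)
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, n)  # prior P(class)
        rows = [attrs for attrs, lab in training if lab == label]
        for attr, value in instance.items():
            matches = sum(1 for attrs in rows if attrs[attr] == value)
            score *= Fraction(matches, count)  # likelihood P(attr=value | class)
        scores[label] = score
    return scores

new_instance = {"outlook": "overcast", "wind": "weak"}
scores = naive_bayes_scores(training, new_instance)
prediction = max(scores, key=scores.get)  # class with the highest score
```

For this toy data, "yes" scores 1/2 × 2/3 × 2/3 = 2/9 and "no" scores 0, so the predicted class is "yes".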

QUESTION THREE [15 MARKS]
(a) Explain the difference between ‘input attributes’ and ‘output attributes’. Use an example to illustrate.
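In a predictive task, the input attributes are the columns the model learns from, and the output attribute is the column it predicts. The loan table below is hypothetical and used only to illustrate this split.

```python
import pandas as pd

# Hypothetical loan records: three input attributes and one output (class) attribute
loans = pd.DataFrame({
    "income": [35_000, 82_000, 51_000],
    "age": [24, 45, 33],
    "employed": ["no", "yes", "yes"],
    "default": ["yes", "no", "no"],
})

X = loans.drop(columns="default")  # input attributes (predictors)
y = loans["default"]               # output attribute (target to be predicted)
```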
(b) State and explain four common data mining tasks (4 Marks)
(c) Describe one potential application of association mining in the retail industry (2 Marks)
(d) Describe the meaning of the term ‘association rule’ as used in data mining and
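An association rule X → Y is usually judged by two measures. Support is the share of all transactions that contain both X and Y. Confidence is the share of transactions containing X that also contain Y. The baskets below are hypothetical retail data.

```python
# Hypothetical shopping baskets, used only to illustrate the two rule measures
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def rule_metrics(lhs, rhs, baskets):
    """Return (support, confidence) of the rule lhs -> rhs."""
    both = sum(1 for b in baskets if (lhs | rhs) <= b)  # baskets with X and Y
    lhs_count = sum(1 for b in baskets if lhs <= b)     # baskets with X
    support = both / len(baskets)
    confidence = both / lhs_count
    return support, confidence

support, confidence = rule_metrics({"bread"}, {"milk"}, baskets)
```

For {bread} → {milk}, two of the four baskets contain both items (support 0.5), and two of the three bread baskets contain milk (confidence ≈ 0.67).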
(e) Consider the following data

(i) Use the apriori algorithm to find the frequent 3-itemsets where minimum support count = 2
(3 Marks)
(ii) Given that minimum confidence = 1, generate strong association rules using the apriori
algorithm (2 Marks)
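The transaction table is not reproduced here. The sketch below therefore runs Apriori on a hypothetical nine-transaction data set. Each level is built in three steps:

- join frequent (k−1)-itemsets into k-item candidates,
- prune any candidate with an infrequent (k−1)-subset,
- keep the candidates whose support count reaches the minimum.

Rules are then generated from the frequent 3-itemsets and kept only when their confidence meets the threshold.

```python
from itertools import combinations

# Hypothetical transactions, NOT the exam's data table
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_support):
    """Return {k: set of frequent k-itemsets} using level-wise search."""
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support_count(frozenset([i]), transactions) >= min_support}
    levels, k = {}, 1
    while current:
        levels[k] = current
        k += 1
        # Join step: combine frequent (k-1)-itemsets that differ by one item
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count step: keep candidates meeting the minimum support count
        current = {c for c in candidates
                   if support_count(c, transactions) >= min_support}
    return levels

def strong_rules(itemsets, transactions, min_conf):
    """Rules X -> (itemset - X) with confidence >= min_conf."""
    rules = []
    for itemset in itemsets:
        total = support_count(itemset, transactions)
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = total / support_count(lhs, transactions)
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

levels = apriori(transactions, min_support=2)
frequent_3 = levels.get(3, set())
rules = strong_rules(frequent_3, transactions, min_conf=1.0)
```

On this toy data, the frequent 3-itemsets are {I1, I2, I3} and {I1, I2, I5}. Only three rules reach confidence 1: {I1, I5} → {I2}, {I2, I5} → {I1} and {I5} → {I1, I2}.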
QUESTION FOUR [15 MARKS]
(a) Briefly explain the following terminology in the context of data warehousing (3 Marks)
(i) Dimension
(ii) Facts
(iii) Cube
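To relate the three terms, here is a small pandas sketch on a hypothetical sales fact table. Each row is a fact carrying a numeric measure (`amount`) and the dimensions that describe it (`year`, `region`, `product`). Aggregating the measure over the dimensions gives a cube, and a pivot over two dimensions gives a 2-D view of that cube. pandas is only a stand-in for an OLAP engine here.

```python
import pandas as pd

# Hypothetical fact table: each row is one sale (a fact) with dimension values
sales = pd.DataFrame({
    "year":    [2017, 2017, 2018, 2018, 2018],
    "region":  ["North", "South", "North", "North", "South"],
    "product": ["laptop", "phone", "laptop", "phone", "phone"],
    "amount":  [1200, 300, 1100, 350, 320],  # the measure
})

# Full cube: the measure aggregated over every combination of three dimensions
cube = sales.groupby(["year", "region", "product"])["amount"].sum()

# Two-dimensional view: region x year, summing over product
cube_slice = sales.pivot_table(index="region", columns="year", values="amount",
                               aggfunc="sum", fill_value=0)
```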

(b) State and explain three data warehousing schemas (3 Marks)
(c) Describe three differences between an operational data store and a data warehouse
(3 Marks)
(d) Discuss ETL processes in the context of data warehousing (3 Marks)
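The three ETL stages can be sketched in a few lines. The source records are hypothetical, and an in-memory SQLite database stands in for the warehouse. Real ETL tools add scheduling, logging and incremental loads on top of this basic extract, transform and load flow.

```python
import sqlite3

import pandas as pd

# Extract: hypothetical records pulled from an operational source
source = pd.DataFrame({
    "sale_date": ["2018-08-01", "2018-08-01", "2018-08-02"],
    "amount": ["100", "250", None],
})

# Transform: drop incomplete rows, fix data types, aggregate to daily totals
clean = source.dropna(subset=["amount"]).copy()
clean["amount"] = pd.to_numeric(clean["amount"])
daily = clean.groupby("sale_date", as_index=False)["amount"].sum()

# Load: write into a warehouse fact table (in-memory SQLite as a stand-in)
conn = sqlite3.connect(":memory:")
daily.to_sql("fact_daily_sales", conn, index=False)
rows = conn.execute(
    "SELECT sale_date, amount FROM fact_daily_sales ORDER BY sale_date"
).fetchall()
conn.close()
```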
(e) Discuss three types of data warehousing architectures
