UNIVERSITY EXAMINATIONS: 2016/2017
ORDINARY EXAMINATION FOR THE DEGREES OF BACHELOR OF
SCIENCE IN INFORMATION TECHNOLOGY/ BACHELOR OF
BUSINESS IN INFORMATION TECHNOLOGY
BIT3201A & BBIT300 DATA WAREHOUSING AND DATA MINING
DATE: AUGUST, 2017 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.
QUESTION ONE: 30 MARKS (COMPULSORY)
a) Define the following terms (3 Marks)
i. Data Normalization
ii. Data Binning
b) Discuss five differences between OLTP and OLAP (6 Marks)
c) Giving examples, discuss four reasons why the data cleaning stage in the KDD process is
necessary (4 Marks)
d) A grocery shop sells six items which are Bread, Cheese, Eggs, Juice, Milk and Yogurt. The
shopkeeper also keeps a record of the transactions as follows.
Using the improved Apriori algorithm, find the association rules with 50% and 75% confidence. (10 Marks)
e) Differentiate between classification and clustering. (4 Marks)
f) In building a decision tree, three possible attributes are considered as split attributes, the
information gain for the attributes A, B, and C are 0.97, 0.029, and 0.15 respectively. Which
attribute should be selected for the split and why? (3 Marks)
QUESTION TWO: 20 MARKS
a) Discuss the causes of noisy data in the database (5 Marks)
b) Before data warehousing, it is apparent that data preprocessing must be carried out. Describe
the five major tasks that constitute data preprocessing (5 Marks)
c) Discuss any five desired features of a cluster analysis algorithm. (5 Marks)
d) Discuss five ways in which the data that has been mined can be visually presented. (5 Marks)
QUESTION THREE: 20 MARKS
a) Define the term data warehousing (1 Mark)
b) With the help of a diagram describe the architecture of a data warehouse (4 Marks)
c) Discuss four benefits of data mining (4 Marks)
d) Discuss any four challenges facing data mining. (4 Marks)
e) Justifying your answer, discuss briefly whether the following activities constitute data
mining. (7 Marks)
i. Dividing the customers of a company according to their profitability.
ii. Computing the total sales of a company.
iii. Sorting a student database based on student identification numbers.
iv. Predicting the outcomes of tossing a (fair) pair of dice.
v. Predicting the future stock price of a company using historical records.
vi. Monitoring the heart rate of a patient for abnormalities.
vii. Monitoring seismic waves for earthquake activities.
viii. Extracting the frequencies of a sound wave.
QUESTION FOUR: 20 MARKS
a) Discuss five characteristics of OLAP (5 Marks)
b) Discuss any four factors that lead to the growth and popularity of data mining. (4 Marks)
c) Describe the various classifications of data mining systems (6 Marks)
d) Discuss any five factors that you would consider when selecting and acquiring data mining
software. (5 Marks)
QUESTION FIVE: 20 MARKS
a) Using two items A and B, define the following terms and give their equations.
b) In the context of association rules mining; describe the following terms (3 Marks)
i. Frequent item-sets
ii. Confident rules
c) With the help of a diagram illustrate the Knowledge Discovery Process (8 Marks)
d) Partition-based clustering, hierarchical clustering and density-based clustering are three
popular clustering methods. Describe them, briefly explaining how each works and citing a
well-known algorithm for each. (6 Marks)