BIT 4204 DATA WAREHOUSING AND DATA MINING KCA Past Paper

UNIVERSITY EXAMINATIONS: 2012/2013
THIRD YEAR EXAMINATION FOR THE BACHELOR OF
SCIENCE IN INFORMATION TECHNOLOGY
BIT 4204 DATA WAREHOUSING AND DATA MINING
DATE: DECEMBER, 2012 TIME: 2 HOURS
INSTRUCTIONS: Answer Question ONE and any other TWO

QUESTION ONE
a) Define the following terms (4 Marks)
i. Data normalization
ii. Data binning
b) With the help of a diagram, illustrate the Knowledge Discovery Process (8 Marks)
c) Giving examples, discuss the reasons why the data cleaning stage in the KDD process is
necessary. (3 Marks)
d) Discuss any five desired features of a cluster analysis algorithm. (5 Marks)
e) A grocery shop sells six items which are Bread, Cheese, Eggs, Juice, Milk and
Yogurt. The shopkeeper also keeps a record of the transactions as follows.

Using the improved naïve algorithm find the association rules with 50% and 75%
confidence. (10 Marks)
QUESTION TWO
a) Discuss four issues that arise as a result of data mining and explain how to overcome
them. (4 Marks)
b) Discuss five characteristics of OLAP (5 Marks)
c) Discuss six differences between OLAP and OLTP systems (6 Marks)
d) Discuss any five factors that you would consider when selecting and acquiring data
mining software. (5 Marks)
QUESTION THREE
a) Define the following terms (2 Marks)
i. Data warehousing
ii. Data mining
b) Using two items X and Y, define the following terms. (2 Marks)
i. Support
ii. Confidence
c) In the context of association rules mining, describe the following terms (2 Marks)
i. Frequent item-sets
ii. Confident rules
d) Consider a retail shop with the following set of transactions

Using the improved Apriori algorithm find the association rules with minimum
support of 22% and 70% confidence. (14 Marks)
QUESTION FOUR
a) Discuss four types of distances in clustering (4 Marks)
b) Using appropriate examples, describe the following types of data (4 Marks)
i. Ordinal data
ii. Nominal data
c) Discuss four ways in which the data that has been mined can be visually presented to
the user. (4 Marks)
d) Describe four categories of data mining systems showing the basis for the
categorization for each. (4 Marks)
e) Discuss four applications of data mining in real life (4 Marks)
QUESTION FIVE
The table below contains the training data used to classify animals. Read it and answer
the questions that follow.

a) Using the split algorithm, find the attribute that has the highest information gain.
(16 Marks)
b) Draw the decision tree for the table above (4 Marks)
