UNIVERSITY EXAMINATIONS: 2017/2018
EXAMINATION FOR THE DEGREE OF BACHELOR OF SCIENCE/BUSINESS
INFORMATION TECHNOLOGY
BBIT 300 & BIT3201 – DATA WAREHOUSING AND DATA
MINING/MANAGEMENT
MODE: FULL TIME/PART TIME/DISTANCE LEARNING
ORDINARY EXAMINATIONS
DATE: JULY 2018 DURATION: 2 HOURS
INSTRUCTIONS: Answer Question ONE and any other TWO questions
QUESTION ONE [30 MARKS]
a) Define clustering as used in data mining.
2 Marks
b) Define the following terms as used in data warehousing and data mining:
i. Data
ii. Data mart
iii. Data dredging
iv. Data scrubbing
4 Marks
c) Discuss the following distance measures in cluster analysis.
i. Euclidean distance
ii. Manhattan distance
iii. Chebyshev distance
6 Marks
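For reference on part (c), the three measures can be sketched for two points of equal dimension. This is a minimal illustration rather than a model answer; the function names and the sample points (1, 2) and (4, 6) are assumptions chosen for the example.

```python
def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    """Maximum absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical sample points; the coordinate differences are 3 and 4.
p, q = (1, 2), (4, 6)
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7
print(chebyshev(p, q))   # 4
```

For the same pair of points the Chebyshev distance is never larger than the Euclidean distance, which in turn is never larger than the Manhattan distance.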
d) Using two examples in each case, distinguish between predictive and descriptive data mining
techniques.
4 Marks
e) For each of the following tasks, state whether or not it is a data mining task. Give a
reason in each case.
i. Dividing the customers of a company according to their profitability.
ii. Monitoring the heart rate of a patient for abnormalities.
iii. Computing the total sales of a company.
iv. Sorting a student database based on student registration numbers.
v. Predicting the outcomes of tossing a fair coin.
vi. Predicting the future stock price of a company using historical records.
12 Marks
f) Differentiate between a data warehouse and a database.
2 Marks
QUESTION TWO [20 MARKS]
a) The following data was extracted from a certain retail shop in town.
TransactionID   Items
T1              {Bread, Milk, Margarine, Cake}
T2              {Bread, Sugar, Cake, Eggs}
T3              {Milk, Sugar, Cake, Margarine}
T4              {Bread, Milk, Sugar, Cake}
T5              {Bread, Milk, Sugar, Margarine}
i. Calculate the support for the rule {Bread, Milk} -> {Cake}.
2 Marks
ii. Calculate the confidence for the rule in [i] above.
2 Marks
iii. Explain the use of confidence and support of a rule.
2 Marks
iv. Calculate the lift of the rule in [i] above.
2 Marks
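The measures asked for in part (a) can be sketched directly from their usual definitions: support is the fraction of transactions containing every item in the rule, confidence is the support of the whole rule divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent. The sketch below applies these to the five transactions in the table; the function and variable names are assumptions made for illustration.

```python
# The five transactions from the table in part (a).
transactions = [
    {"Bread", "Milk", "Margarine", "Cake"},
    {"Bread", "Sugar", "Cake", "Eggs"},
    {"Milk", "Sugar", "Cake", "Margarine"},
    {"Bread", "Milk", "Sugar", "Cake"},
    {"Bread", "Milk", "Sugar", "Margarine"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Support of the full rule divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence divided by support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


antecedent, consequent = {"Bread", "Milk"}, {"Cake"}
print(round(support(antecedent | consequent), 3))     # 0.4
print(round(confidence(antecedent, consequent), 3))   # 0.667
print(round(lift(antecedent, consequent), 3))         # 0.833
```

A lift below 1 indicates that the antecedent and the consequent occur together less often than would be expected if they were independent.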
b) Explain the use of the Apriori algorithm in data mining.
2 Marks
c) Discuss three ways of representing the database design in an OLAP system.
6 Marks
d) Discuss any four dimensions of data quality.
4 Marks
QUESTION TWO [20 MARKS]
A multidimensional database (MDB) is defined as a type of database that is optimized for data
warehouse and online analytical processing (OLAP) applications. Multidimensional databases
are frequently created using input from existing relational databases. In this respect:
a) Define the following terms:
i. Data cube
ii. Dimension
iii. Dimension table
iv. Schema
4 Marks
b) Describe any three benefits of multidimensional databases.
6 Marks
c) Describe the following categories of OLAP tools.
i. ROLAP
ii. HOLAP
iii. MOLAP
6 Marks
d) Describe the following analytical operations supported by multidimensional OLAP
databases.
i. Drill-down
ii. Slicing and dicing
iii. Roll-up
6 Marks
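The operations named in part (d) can be illustrated on a very small fact table held in memory. This is only a sketch under stated assumptions: the rows, the dimension order (year, quarter, region) and the helper `aggregate` are hypothetical and do not represent any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales).
facts = [
    (2017, "Q1", "East", 100),
    (2017, "Q2", "East", 150),
    (2017, "Q1", "West", 80),
    (2018, "Q1", "East", 120),
    (2018, "Q1", "West", 90),
]


def aggregate(rows, key):
    """Sum the sales measure over rows grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Roll-up: move up the time hierarchy from quarter to year.
by_year = aggregate(facts, lambda r: r[0])

# Drill-down: return to the finer (year, quarter) level.
by_quarter = aggregate(facts, lambda r: (r[0], r[1]))

# Slice: fix a single dimension value (region = "East").
east_slice = [r for r in facts if r[2] == "East"]

# Dice: restrict two or more dimensions to pick out a sub-cube.
dice = [r for r in facts if r[0] == 2017 and r[1] == "Q1"]

print(by_year)      # {2017: 330, 2018: 210}
print(by_quarter)   # {(2017, 'Q1'): 180, (2017, 'Q2'): 150, (2018, 'Q1'): 210}
print(len(east_slice), len(dice))   # 3 2
```

Roll-up and drill-down differ only in the level of the hierarchy used as the grouping key, while slicing and dicing filter rows before any aggregation takes place.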
QUESTION THREE [20 MARKS]
The main purpose of a data warehouse is to provide aggregate data, such as totals, averages,
variances and trends, in a format suitable for decision making. From this point of view:
a) Define the term data warehouse.
2 Marks
b) Discuss the four key characteristics of a data warehouse.
8 Marks
c) Discuss the following components of a data warehouse:
i. Operational databases
ii. Load manager
iii. Warehouse manager
iv. Query manager
v. End user access tools
5 Marks
d) Describe the activities involved in designing and implementing a data warehouse.
5 Marks
QUESTION FOUR [20 MARKS]
Data mining can be defined as the process of uncovering potentially useful information in a
data warehouse. From this standpoint:
a) Discuss the challenges that face data mining with regard to:
i. Data mining methodology and user interaction issues
ii. Performance issues
6 Marks
b) Describe the various classifications of data mining techniques.
4 Marks
c) Discuss the architecture of a typical data mining system.
6 Marks
d) Discuss how data mining can be applied in the following fields.
i. Medicine
ii. Marketing
iii. Crime management
6 Marks