Data mining and Warehousing Examination  KCA Past Paper

KCA University
Data mining and Warehousing Examination
Dec 2018
Answer Question One and any other two questions.

Question One
(a) Describe the meaning of the following terms in the context of data mining and warehousing.
(i) Noisy data (1 mark)
(ii) Data mining (1 mark)
(iv) Entropy (1 mark)

(b) Briefly describe the knowledge discovery process. Use a diagram to illustrate your answer.
(4 marks)
(c) State and explain two techniques of data reduction (2 marks)
(d) Consider the following confusion matrix:
 a  b  c   <-- classified as
 7  6  9 | a = part time
 1  8  4 | b = full time
 5  3  7 | c = distance learning
Use the above confusion matrix to determine the following: (3 marks)
(i) Precision for full time class
(ii) Recall for part time class
(iii) True negatives for Distance learning class
(e) Describe four types of data mining tasks (4 marks)
(f) Describe two motivations of data mining (2 marks)
(g) Describe two techniques of filling missing values during pre-processing phase (2 marks)
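
The metrics asked for on the confusion matrix in Question One can be checked with a short script. This is a minimal sketch, not a model answer: it assumes the common convention that rows are actual classes and columns are predicted classes, and the function names are illustrative only.

```python
# Confusion matrix from Question One.
# Assumption: rows = actual class, columns = predicted class.
LABELS = ["part time", "full time", "distance learning"]
MATRIX = [
    [7, 6, 9],  # actual: part time
    [1, 8, 4],  # actual: full time
    [5, 3, 7],  # actual: distance learning
]


def precision(matrix, k):
    """Correct predictions of class k divided by all predictions of class k."""
    predicted_as_k = sum(row[k] for row in matrix)  # column total
    return matrix[k][k] / predicted_as_k


def recall(matrix, k):
    """Correct predictions of class k divided by all actual members of class k."""
    return matrix[k][k] / sum(matrix[k])  # row total


def true_negatives(matrix, k):
    """Instances that neither belong to class k nor were predicted as class k."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(n) if i != k and j != k)


print("Precision (full time):", precision(MATRIX, 1))
print("Recall (part time):", recall(MATRIX, 0))
print("True negatives (distance learning):", true_negatives(MATRIX, 2))
```

Under the stated convention, precision divides the diagonal cell by its column total, recall divides it by its row total, and true negatives sum every cell outside the class's row and column.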
Question Two
(a) Briefly explain the meaning of the following terms in the context of data mining and
warehousing
(i) Clustering (1 mark)
(ii) Dendrogram (1 mark)
(iii) Category utility (1 mark)
(b) State and explain three types of clustering approaches (3 marks)
(c) Briefly explain three metrics (functions) for measuring the similarity of data items during
clustering (3 marks)
(d) Explain any two applications of clustering in business enterprises (2 marks)
(e) Describe four operations that are used by the COBWEB algorithm when building the
classification tree (4 marks)
Question Three
(a) Describe any three situations when decision tree learning methods can be considered
(3 marks)
(b) Consider the following data set
Compute the information gain for selecting the outlook attribute as the root of the decision tree
using the ID3 algorithm (5 marks)
(c) Briefly explain two symptoms of overfitting and two approaches to avoiding it
(4 marks)
(d) Describe the criteria for stopping the building of a decision tree (2 marks)
(e) Describe the meaning of the term ‘overfitting’ as used in decision tree learning (1 mark)
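
The information-gain computation used by ID3 in Question Three can be sketched in a few lines. The rows below are a made-up toy table, NOT the exam's data set, and the helper names are hypothetical; the sketch only shows the mechanics of entropy and gain for one attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    labels = [row[target] for row in rows]
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy rows for illustration only (not the exam's data set).
TOY_ROWS = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "rain", "play": "no"},
]

gain = information_gain(TOY_ROWS, "outlook", "play")
print("Information gain for outlook on the toy rows:", gain)
```

ID3 would compute this gain for every candidate attribute and place the attribute with the highest gain at the root.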
Question Four
(a) Describe the meaning of the following terms in the context of warehousing
(i) Dimension (1 mark)
(ii) Schema (1 mark)
(iii) Fact (1 mark)
(b) There are three types of schemas that can be used to design and develop a data warehouse
(3 marks)
(c) Describe any two properties of a data mart (2 marks)
(d) Describe the meaning of the initials ETL in the context of data warehousing (3 marks)
(e) State and explain four characteristics of a data warehouse (4 marks)
