# MISM5404 & MDA5404 DATA ANALYTICS AND KNOWLEDGE ENGINEERING (KCA Past Paper)
UNIVERSITY EXAMINATIONS: 2018/2019
EXAMINATION FOR THE DEGREE OF MASTER OF SCIENCE IN INFORMATION
SYSTEMS MANAGEMENT / MASTER OF SCIENCE IN DATA ANALYTICS
MISM5404 & MDA5404 DATA ANALYTICS AND KNOWLEDGE ENGINEERING
DATE: APRIL 2019 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.

QUESTION ONE
(a) Briefly describe the meaning of the following terms as used in data analytics and knowledge engineering:
(i) Data analytics (1 Mark)
(ii) Big Data (1 Mark)
(iii) knowledge engineering (1 Mark)
(b) Briefly describe three main goals of data analytics (3 Marks)
(c) A small study involving infants is conducted to investigate the association between gestational age at birth,
measured in weeks, and birth weight, measured in kilograms.
Use the above data set to compute the following measures for both Gestational Age (wks) and Birth Weight (kg).
Interpret the results in each case.
(ii) Standard deviation (2 Marks)
(iii) Covariance (2 Marks)
(iv) Correlation (2 Marks)
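The measures listed in this part can be sketched in a few lines of Python. This is a minimal illustration that assumes the sample (n - 1) form of the standard deviation and covariance; the function names are hypothetical, and the two short lists are placeholder values, not the study's data, which is not reproduced in this paper.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_cov(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(xs, ys) / (sample_std(xs) * sample_std(ys))


# Placeholder values only (assumption), NOT the gestational-age data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(sample_std(xs), sample_cov(xs, ys), correlation(xs, ys))
```

A positive covariance says the two variables tend to rise together; the correlation rescales it to the range -1 to 1 so that the strength of the linear association can be interpreted independently of units.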
(d) Discuss three limitations of descriptive analytics (3 Marks)
QUESTION TWO
(a) Briefly describe the basic algorithm of agglomerative clustering (3 Marks)
(b) Describe one potential application of agglomerative clustering in business enterprises (1 Mark)
(c) State and explain five types of agglomerative clustering methods. Use a diagram to illustrate each type
(5 Marks)
(d) Suppose we have three shapes (A, B, C) and each shape has two features (length and width). The following
figure shows the three data items and associated feature values. (6 Marks)
[Figure: shapes A, B and C with columns "length" and "width"; values not reproduced]
Given the above data set, use one of the existing agglomerative clustering methods to perform clustering.
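One way to picture the merging procedure is the short Python sketch below. It assumes single linkage with Euclidean distance, which is only one of the available methods; the function name `single_linkage` is hypothetical, and the (length, width) values are placeholders, not the figure's data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[name] for name in points]  # start with one cluster per item
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Placeholder (length, width) values (assumption), NOT the exam figure's data
shapes = {"A": (1.0, 1.0), "B": (1.0, 2.0), "C": (5.0, 5.0)}
merges = single_linkage(shapes)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

The merge history, read top to bottom, is what a dendrogram draws; swapping `min` for `max` in the inner distance would give complete linkage instead.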
QUESTION THREE
(d) Describe Predictive Analytics Process Cycle. Use a diagram to illustrate your answer (6 Marks)
(e) State and explain any three applications of Predictive Analytics (3 Marks)
(d) Use a three-week moving average (k=3) of the department store sales to forecast sales for weeks 24 and 26.
(2 Marks)
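A k-period moving average forecasts the next period as the mean of the k most recent observations. The sketch below illustrates this under that assumption; the function name is hypothetical, and the sales figures are placeholders, not the department store data, which is not reproduced in this paper.

```python
def moving_average_forecast(history, k=3):
    """Forecast the next period as the mean of the last k observations."""
    if len(history) < k:
        raise ValueError("need at least k observations")
    return sum(history[-k:]) / k


# Placeholder weekly sales (assumption), NOT the exam's data set
sales = [10, 12, 14, 16]
forecast = moving_average_forecast(sales, k=3)
print(forecast)  # mean of the last three values: 12, 14, 16
```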
QUESTION THREE
(a) Briefly explain the meaning of the following terms as used in sequence analytics.
i) Sequence (1 Mark)
ii) Sequential pattern (1 Mark)
(b) Describe two applications of sequence mining in business enterprises (2 Marks)
(c) Consider the following sequence database
SID Sequence
10 <a(abc)(ac)d(cf)>
30 <(ef)(ab)(df)cb>
40 <eg(af)cbc>
Given minimum support count = 3, determine whether <(ab)c> is a sequential pattern. Justify your answer
(2 Marks)
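The support of a candidate pattern is the number of sequences that contain it, where each pattern element must be a subset of a distinct sequence element, in order. The sketch below is a minimal illustration; the helper name `contains` is hypothetical, and the database holds only the rows reproduced in this paper, transcribed by hand into lists of sets. The count it prints therefore covers the listed rows only.

```python
def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence (both are lists of sets).

    Each pattern element must be a subset of a distinct sequence element,
    and the matched elements must appear in the same order.
    """
    pos = 0
    for element in pattern:
        # Greedily advance to the earliest element that covers this itemset
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


# Only the rows listed in the paper, transcribed by hand
database = {
    10: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    30: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    40: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}
pattern = [{"a", "b"}, {"c"}]  # <(ab)c>
support = sum(contains(seq, pattern) for seq in database.values())
print(support)
```

In sequence 40 the items a and b never occur in the same element, so (ab) cannot be matched there; the pattern qualifies only if its support over the full database reaches the stated minimum.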
(b) Describe the two main steps of the Generalized Sequential Pattern (GSP) algorithm (2 Marks)
(c) Consider the following transaction data set
(i) Convert the above transaction database into a sequence database (2 Marks)
(ii) Given a minimum support count of 2, use the GSP algorithm to find the 2-item sequences (3 Marks)
(d) Determine whether the following subsequences are contained in the corresponding sequences (2 Marks)
Sequence                   Subsequence       Contained?
< {2,4} {3,5,6} {8} >      < {2} {3,5} >
< {2,4} {2,4} {2,5} >      < {2} {4} >
QUESTION FOUR
(a) Discuss six parts of a knowledge-based system (6 Marks)
(b) Discuss five knowledge engineering activities (5 Marks)
(c) Consider the following knowledge that describes properties of specific animals:
Sheep, cats, bears and whales are mammals. Bears and cats have fur, while whales and fish live in water.
Both mammals and fish are animals.
(i) Use the above scenario to write a collection of facts and rules using predicate logic that can be stored in
a knowledge base. (2 Marks)
(ii) Write a query that can be typed at the Prolog prompt to test whether there is such an animal as a cat among the
collection of facts and rules written in (i). (1 Mark)
(iii) Write a query that searches the collection of facts and rules written in (i) and outputs all the animals that
have fur (1 Mark)
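The split between facts and rules in this scenario can be illustrated with a small Python analogue. This is not Prolog and not a model answer: sets stand in for facts, a function stands in for the rule, and all names are hypothetical. Because the scenario names no individual fish, the class name "fish" is used as its only member, which is an assumption.

```python
# Facts, transcribed from the scenario
mammals = {"sheep", "cat", "bear", "whale"}
fish = {"fish"}  # assumption: the scenario names no individual fish
has_fur = {"bear", "cat"}
lives_in_water = {"whale", "fish"}


def is_animal(x):
    """Rule: x is an animal if x is a mammal or x is a fish."""
    return x in mammals or x in fish


# Analogue of testing whether a cat is an animal
cat_is_animal = is_animal("cat")
# Analogue of listing every animal that has fur
furry_animals = sorted(x for x in has_fur if is_animal(x))
print(cat_is_animal, furry_animals)
```

A fact is stated once for a named individual, while a rule derives new conclusions from existing facts; the two queries differ in the same way, one checking a single named individual and the other enumerating every individual that satisfies a condition.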

