Data collection refers to the gathering of information to establish or verify facts. Data collection is vital in everyday living. For example, without up-to-date and comprehensive data about the characteristics of the population, no government can plan and build the facilities and resources that effectively serve its citizens. Commercial organizations collect data to improve their economic prospects. By collecting people’s views on their products, they are able to offer goods or services that potential customers seem to want. In research, data is collected to further a researcher’s understanding of a puzzling issue. Data collection helps to clarify the facts. This chapter identifies what data collection is, the purposes of collecting data, effective data collection techniques, sources of data, steps in data collection, characteristics of different data collection methodologies, challenges faced by researchers in data collection and ethical issues related to data collection.


Meaning of Data Collection

In research, the term “data collection” refers to gathering specific information aimed at proving or refuting some facts. The researcher must have a clear understanding of what they hope to obtain and how they hope to obtain it, including a clear vision of the instruments to be used, the respondents and the selected area. Data collection is important in research as it allows for the dissemination of accurate information and the development of meaningful programmes.


Purpose of Collecting Data

In research, data is collected for various purposes. These include the following:

  1. To stimulate new ideas. This is because data collection helps in identifying areas related to the research topic that need improvement or further evaluation.
  2. To highlight a situation and therefore create awareness and improvement.
  3. To influence legislative policies and regulations.
  4. To provide justification for an existing programme or illustrate a need for a new programme.
  5. To evaluate the responsiveness and effectiveness of the study. Data collection is the only reliable way to do so.
  6. To promote decision-making and resource allocation that are based on solid evidence rather than on isolated occurrences, assumption, emotion, politics, and so on.


Sources of Data

There are two major sources of data used by researchers. These are the primary and secondary sources.


Primary sources: Primary data is information gathered directly from respondents, through questionnaires, interviews, focus group discussions, observation and experimental studies. It involves creating “new” data rather than drawing on data from existing sources. In an experimental study, the variable of interest is identified.

Secondary sources: Secondary data is data neither collected directly by the user nor specifically for the user. It involves gathering data that has already been collected by someone else, through the collection and analysis of published material and information from internal sources. Secondary data collection may be conducted by collecting information from a diverse range of documents or electronically stored information. This is often referred to as desk research.



The main advantages of using secondary data are as follows:

  1. It is usually available more cheaply. The collection of secondary data is generally significantly quicker and easier (and hence less costly) than collecting the same data “from scratch.”
  2. Existing data are likely to be available in a more convenient form; using secondary data can give the researcher access to otherwise unavailable organizations, individuals or locations.
  3. Secondary data allows the researcher to extend the “time base” of their study by providing data about the earlier state of the system being studied.
  4. The fact that secondary data are likely to be pre-processed eliminates the time-consuming (and hence costly) analysis stage.



The main disadvantages of using secondary data are as follows:

  1. The method by which secondary data was collected is often unknown to the user of the data (apart from major sources like the Census). This means that the researcher is forced to rely on the skills and propriety of the collectors — usually, but not always, a safe proposition.
  2. With secondary data the researcher may have little or no direct knowledge of the processing methods employed, and the researcher may rarely have access to the original raw data to check the validity of the findings.
  3. The researcher is forced to rely on the skills and integrity of the people who collected and analyzed the data.


Steps in Data Collection

The following are essential steps that a researcher should use in data collection:

  1. Define the sample: Before gathering data, the researcher should define the target population. This involves identifying the respondents and their accessibility.
  2. Reflect on the research design: The researcher should be clear about the research design to be used, whether it is a survey, a case study or an experiment. This is critical as it enables the researcher to be sure of the format in which data will be collected. The researcher needs to design and select the sample in such a way that he/she obtains results that have acceptable precision and accuracy.
  3. Ensure research instruments are ready: The key data collection instruments to be used in the study, for example questionnaires, interviews, observations, focus group discussions and experimental treatments, should be in order. This includes finding out if they are ready and available. For example, if the researcher is using questionnaires, the correct number of questionnaires should be available. If using tape recorders, they should be in working condition. If any computer software is to be used, the researcher should consider his/her own and the assistant researchers’ expertise, the skills that exist and the cost of operating the system.
  4. Define the data to be collected: The researcher should make sure that he/she and the assistant researchers are clear on the information that is being sought. Researchers should be clear about the sample, for instance, the male/female ratio.
  5. Request permission to collect data from the relevant authorities: Before collecting any information, the researcher should ensure he/she has been granted permission to carry out the study. The researcher should also send an advance letter to the sample respondents, explaining the purpose of the study. Information must be given to the respondents regarding the voluntary or mandatory nature of the study and how the answers are to be used. After reflecting on all these components, the researcher should carry out a pre-test.
  6. Pre-testing: Before collecting data, the researcher should pre-test the research instruments. A pre-test is a pilot study. The researcher should pilot the questionnaire with a small representative sample. A pre-test of the questionnaire and field procedures is the only way the researcher can find out if everything “works”, particularly the research instruments. This is because it is rarely possible for the researcher to foresee all the potential misunderstandings or biasing effects of different questions and procedures. A pilot study helps to test the feasibility of the study techniques and to perfect the questionnaire concepts and wording. The importance of pre-testing before data collection includes the following:
  • It enables the researcher to find out if the selected questions are measuring what they are supposed to measure.
  • It enables the researcher to find out if the wording is clear and all questions will be interpreted in the same way by respondents.
  • It helps the researcher to detect what response is provoked and find out if there is any research bias.
  • It enables the researcher to monitor the context in which the data will be collected and the topic areas addressed.

The researcher should not use the pre-test sample in the actual study.
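
The rule that the pre-test sample should not be reused in the actual study can be illustrated with a short sketch. This is only one possible way of drawing the two samples, written in Python; the function name, the identifiers in the sampling frame and the sample sizes are assumptions made for illustration and do not come from any particular study.

```python
import random


def split_pilot_and_main(sampling_frame, pilot_size, main_size, seed=0):
    """Draw a pilot sample and a separate main sample from one sampling frame."""
    if pilot_size + main_size > len(sampling_frame):
        raise ValueError("sampling frame is too small for both samples")
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    shuffled = list(sampling_frame)
    rng.shuffle(shuffled)
    pilot = shuffled[:pilot_size]
    # The main sample starts after the pilot slice, so the two never overlap.
    main = shuffled[pilot_size:pilot_size + main_size]
    return pilot, main


# Hypothetical sampling frame of 200 respondent identifiers.
frame = [f"R{n:03d}" for n in range(1, 201)]
pilot_sample, main_sample = split_pilot_and_main(frame, pilot_size=10, main_size=50)
```

Because both samples are cut from a single shuffled list, no respondent who took part in the pre-test can appear in the main study.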


Collection of Data

The procedure used to collect data will be influenced by the research instruments used. For example, if questionnaires or interviews are used, the researcher should carry out the following:


Use of Questionnaires

In questionnaires respondents fill in answers in written form and the researcher collects the forms with the completed information. There are various methods used to collect the questionnaires, such as the following:

  • The instruments are distributed to the respondents by the researcher and research assistants. Respondents are given time to complete answering questionnaires. All the questionnaires are gathered after the given response time is over.
  • Questionnaires may be distributed to respondents by the researcher and research assistants. They are later collected on an agreed-upon date.
  • Questionnaires are mailed to the respondents. After they have answered them, they are mailed back.

If questionnaires are administered, respondents should be given sufficient time to complete them. The questionnaires should then be collected by the researcher or research assistants or mailed to the researcher. Today, the manner in which data is collected from questionnaires has begun to move away from the traditional distribution and mail-out/mail-back approach. The use of fax machines and the Internet is on the rise.


Use of Interviews

Collecting data using the interview method requires the researcher to identify respondents and request them to answer certain questions. The researcher and research assistants note down the answers given. In some interviews the responses are recorded. Some interviews are carried out by telephone and the information received is recorded by the researcher. The main requirement for good interviewers during data collection is the ability to approach identified respondents in person or by telephone and persuade them to participate in the study. Once a respondent’s cooperation is acquired, the interviewers must maintain it while collecting the needed data. This data must be obtained in exact accordance with instructions.


Focus Group Discussions

In focus group discussions, the researcher should have specific topics to be discussed. A written record of the discussion should be made. A tape recorder should also be used to keep the records.


Observation

In observations, the researcher should have a checklist to provide information about the actual behaviour to be observed. The researcher should note down the observation. In experiments, the observer should also note down what has been observed. In experimental studies, where the researcher wants to obtain information under controlled conditions, subjects may be randomly assigned to various tests and experiences and then assessed via observation or standardized scales.


Each data collection method has its strengths and weaknesses. When designing a research study it is important for the researcher to decide what outcome (data) the study will produce then select the best methodology to produce that desired information.

Factors to Consider during Data Collection

During data collection, researchers should adhere to the following:

  1. Collect only the data needed for the purpose of the study: researchers should avoid digressing and getting involved in issues that are not relevant to the study.
  2. Inform each potential respondent about the general nature of the study and the intended uses of the data.
  3. Protect the confidentiality of information collected from respondents and ensure that the means used in data collection are adequate to protect confidentiality to the extent pledged or intended.
  4. Ensure that processing and use of data conforms with the pledges made and that appropriate care is taken with directly identifying information (using such steps as destroying a certain type of information or removing it from the file when it is no longer needed for the inquiry).
  5. Apply appropriate techniques to control statistical disclosure. The researcher should ensure that, whenever data are transferred to other persons or organizations, this transfer conforms to the established confidentiality pledges, and should require written assurance from the recipients of the data that the measures employed to protect confidentiality will be at least equal to those originally pledged.

While in the field, the researcher should ensure the following:

  • Punctuality in appointments.
  • Use of clear and simple language.
  • Careful question construction: The manner in which a question is formulated can result in inaccurate responses.
  • Various ways of probing: It is important for the researcher and research assistants to be aware that some individuals tend to provide false answers to particular questions. If this is noted, the researcher should devise other ways of probing.
  • Awareness of psychological factors: It is important for the researcher to acknowledge that certain psychological factors, such as fear or low self-esteem, can induce incorrect responses. Great care must be taken to design a study that minimizes this effect.
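
The advice on confidentiality, in particular the removal of directly identifying information from the file, can be sketched in code. The following Python example is a minimal illustration and not a complete disclosure-control procedure; the field names (name, phone, address), the sample records and the use of a random study identifier are all assumptions made for the example.

```python
import secrets

# Assumed names of the fields that directly identify a respondent.
DIRECT_IDENTIFIERS = ("name", "phone", "address")


def deidentify(records):
    """Split records into an analysis file without direct identifiers
    and a separate key table that links study IDs to those identifiers."""
    analysis_records = []
    key_table = {}
    for record in records:
        study_id = secrets.token_hex(4)
        while study_id in key_table:  # very unlikely, but keep IDs unique
            study_id = secrets.token_hex(4)
        key_table[study_id] = {
            field: record[field] for field in DIRECT_IDENTIFIERS if field in record
        }
        cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        cleaned["study_id"] = study_id
        analysis_records.append(cleaned)
    return analysis_records, key_table


# Hypothetical collected records.
collected = [
    {"name": "A. Respondent", "phone": "555-0101", "age": 34, "q1": 1},
    {"name": "B. Respondent", "phone": "555-0102", "age": 51, "q1": 2},
]
analysis_file, identity_key = deidentify(collected)
```

The key table would be stored separately from the analysis file and destroyed once it is no longer needed for the inquiry, in line with the pledges made to respondents.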

Importance of Data Analysis

Importance of data analysis includes the following:

  • Findings/results are clearly shown.
  • Areas / gaps for further research are pointed out.
  • Researchers can learn the results without spending time on the primary and secondary data.
  • One can identify the statistical methods used for analyzing the data.


Pitfalls in Data Analysis and Interpretation

There are three pitfalls in data analysis and interpretation, which are shown below:

  1. The first involves sources of bias. These are conditions or circumstances which affect the external validity of statistical results.
  2. The second is errors in methodology, which can lead to the third point.
  3. The third class of problems concerns interpretation of results, or how statistical results are applied (or misapplied) to real world issues.


Ethical Issues in Data Collection

Researchers whose subjects are people or animals must consider the conduct of their research, and give attention to the ethical issues associated with carrying it out. Sometimes a research project may involve changing the subjects’ behaviours or, in some cases, causing the subjects pain or distress, for example in experiments where the researcher analyzes blood samples. Most research organizations have complex rules on human and animal experimentation. Some of the rules applicable to data collection are as follows:

  1. The researcher must justify the research via an analysis of the balance of costs and benefits. The researcher’s interest alone is not sufficient justification to carry out research and collect data. In order to carry out a survey or experiment, there have to be benefits from the study that outweigh the costs. Researchers are expected to justify, beyond any reasonable doubt, the need for data collection.
  2. The researcher must maintain confidentiality at all times. Only certain people conducting the survey/experiment should know the identity of the participants. Any subject should generally not know the identity of other subjects.
  3. Researchers are responsible for their own work and for their contribution to the whole study. Researchers must accept individual responsibility for the conduct of the research and, as far as foreseeable, the consequences of that research.
  4. The researcher must obtain informed consent from any subjects used in the study and must ensure that all subjects participate voluntarily.
  5. The researcher must be open and honest in dealing with other researchers and research subjects. The researcher must not exploit subjects by changing agreements made with them. For example, a researcher might discover that his/her survey/experiment shows something that he/she would like to investigate further. If the researcher carries out the investigation secretly but pretends to be still carrying out the previous study that had been agreed to in the first place, this is a form of exploitation, and would breach the principles of informed consent and voluntary participation.
  6. The researcher must take all reasonable measures to protect subjects physically and psychologically. Even voluntary participants can “get carried away” to the point where they have to be protected from themselves and each other. The researcher must be prepared to intervene, even at the cost of the study/experiment itself, to protect the subjects.
  7. The researcher must fully explain the research in advance, and debrief subjects afterward. Whilst full explanations before the survey/experiment are essential to gaining informed consent, it is, unfortunately, a common practice for researchers to complete their research without telling the participants anything about the results.


Challenges Faced by Researchers in Data Collection

Collecting data entails scores of activities, each of which must be carefully planned and controlled. Lack of proper strategies can invalidate the results and badly mislead the users of the information gathered. Some of the challenges faced by researchers in data collection are:

  1. The researcher failing to carry out a pilot study: Failure to pilot the study may contribute to haphazard work in the field. This is mainly because a pre-test helps to identify some of the shortcomings likely to be experienced during the actual study. A pre-test of the questionnaire and field procedures is the only way of finding out if everything will “work” during the actual study.
  2. Lack of sufficient follow-up on non-respondents: A researcher’s failure to follow up non-respondents can ruin an otherwise well-designed study. It is not uncommon for the initial response rate in many survey studies to be under 50 percent. A low response rate does more damage in rendering a survey’s results questionable than a small sample. This is because there may be no valid way of scientifically inferring the characteristics of the population represented by the non-respondents. To deal with this possibility, the researcher may have to return to sample households where no one was home (perhaps at a different time or on a weekend) or attempt to persuade persons who are inclined to refuse to participate. In the case of questionnaire response, it is usually necessary to conduct several follow-ups spaced, possibly, about three weeks apart.
  3. Inadequate quality controls: In some field work the researcher allocates all work to research assistants with minimum supervision. This can result in guessing the results. Controlling the quality of the fieldwork is done in several ways. The researcher can control the quality of field work through observation. The researcher can also carry out a small sample of interviews. There should be at least some questionnaire-by-questionnaire checking by the researcher while the survey is being carried out. This is essential if omissions or other obvious mistakes in the data are to be uncovered before it is too late to rectify them. The researcher should, during field work, re-examine the sample selection, carry out some ……………………………………………..coding of the responses. Without proper checking, errors may go undetected. The researcher should insist on high standards in recruiting and training of interviewers. This is crucial to conducting a quality field study.
  4. Poor targeting: Errors in defining and selecting the sample during data collection will bias the results by making the sample less representative of the target population. This can be due to non-inclusion errors, where people who should be in the sample are not included.
  5. Poor implementation: In data collection some errors are caused by the way data collection is implemented. Some of the errors include the following:
  • Question errors – the question is wrongly worded or misleading.
  • Interviewer error – the interviewer makes an error whilst asking the question.
  • Recording error – the interviewer records incorrectly the answer given by the respondent.
  • Coding error – the responses are wrongly coded.
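
The follow-up of non-respondents described in the list above depends on knowing the response rate and who has not yet replied. A minimal sketch in Python is given here; the household identifiers and the function names are hypothetical and serve only to illustrate the bookkeeping.

```python
def response_rate(returned_ids, sample_ids):
    """Share of the selected sample that has returned a questionnaire."""
    if not sample_ids:
        raise ValueError("the sample is empty")
    sample = set(sample_ids)
    returned = set(returned_ids) & sample  # ignore returns from outside the sample
    return len(returned) / len(sample)


def follow_up_list(sample_ids, returned_ids):
    """Sample members who have not responded, in their original order."""
    returned = set(returned_ids)
    return [sid for sid in sample_ids if sid not in returned]


# Hypothetical sample of eight households and a first wave of three returns.
sample = ["H01", "H02", "H03", "H04", "H05", "H06", "H07", "H08"]
returned_first_wave = ["H02", "H05", "H07"]

rate = response_rate(returned_first_wave, sample)
to_revisit = follow_up_list(sample, returned_first_wave)
```

In this illustration the initial response rate is 3 out of 8 (37.5 percent), which is under 50 percent, so the five households in the follow-up list would be contacted again.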


In data collection, the researcher must play an active role. He/she must ensure that data collection is accurate. It is essential that at the end of every session of data collection, a brief meeting is held with research assistants to analyze the work covered and any challenges faced. This should also be the time to map out the next session. The researcher should collect and keep all the collected data after every session. In data collection, the researcher should ensure that the objectives of the field study are clearly spelt out and understood by all participants. If respondents are to be interviewed, the researcher should ensure that they are aware of the time the researcher is arriving. The researcher should avoid inconveniencing respondents. He/she should always thank respondents after data collection.



The collection of information is a vital component in research. This is because it is through the collected information that major research findings are made, recommendations offered and the way forward formulated. A researcher should therefore ensure that relevant steps are adhered to in data collection. Efforts should also be made to …………………………………….







The term “data organization” in research refers to orderliness in research data. This is putting the data into some systematic form. The “raw” data collected, particularly in surveys, needs to be processed before it can be subjected to any useful analysis. This organization includes identifying (and correcting) errors in the data, coding the data, and storing it in an appropriate form. On the other hand, analysis refers to examining the coded data critically and making inferences.


The presentation of data refers to ways of arranging data to make it clearly understood. This chapter discusses the organization, analysis and presentation of data.


Data Organization

Collected data is known to be “raw” information and not knowledge by itself. It therefore has to be well organized in various stages. The organization from raw data to knowledge is as follows:

  • From raw data to information: Data becomes information when it becomes relevant to the problem identified by the researcher.
  • From information to facts: Information becomes facts, when the data can support it. Facts are what the data reveals.
  • From facts to knowledge: Facts therefore lead to new information, new experiences and views.
  • Knowledge is expressed together with some statistical degree of confidence.

Before analyzing the collected data, the researcher has to ensure the data is well organized. The procedure in data organization involves the following:


a) Pre-processing

After collecting data, the researcher has to ensure it is processed in some manner before carrying out the analysis. The primary purpose of pre-processing is to correct problems that are identified in the raw data. These might include differences between the results obtained by multiple interviewers. In experiments, calibrations are carried out where significant and consistent differences between the measured result and the “correct” result are found. The pre-processing stages are as follows:

  • The elimination of unusable data: The researcher may find two or more questions that really provide the same data. The researcher must therefore decide which one of the questions is worth coding and storing, and which one should be discarded.
  • Interpretation of ambiguous answers: The more subtle problems in data analysis are associated with the researcher trying to interpret ambiguous answers. It could be argued that any complex study is likely to produce at least some answers of this type. The researcher needs to develop a strategy for dealing with them.
  • Contradictory data from related questions: The researcher may also receive contradictory data from related questions. For example, respondents in one religious denomination may give different answers as to who is the church elder. Contradictions may be due to church wrangles. The researcher may have to verify and reject wrong responses.

Many of these problems, if not detected and corrected at the organization stage, will reflect adversely on the study findings.
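
Detecting contradictory data from related questions, one of the pre-processing stages listed above, can be partly automated. The sketch below, in Python, flags respondents whose answers to two related questions disagree. The two questions (whether the respondent owns a phone, and how many phones he/she owns) and the field names are assumptions chosen for illustration; the researcher still has to verify each flagged response.

```python
def find_contradictions(responses):
    """Return the IDs of respondents whose answers to two related
    questions cannot both be true."""
    flagged = []
    for response in responses:
        owns = response.get("owns_phone")
        count = response.get("phones_owned")
        if owns is None or count is None:
            continue  # a missing answer is not a contradiction
        if (owns == "no" and count > 0) or (owns == "yes" and count == 0):
            flagged.append(response["id"])
    return flagged


# Hypothetical raw responses.
raw_responses = [
    {"id": "R1", "owns_phone": "yes", "phones_owned": 2},
    {"id": "R2", "owns_phone": "no", "phones_owned": 1},
    {"id": "R3", "owns_phone": "yes", "phones_owned": 0},
    {"id": "R4", "owns_phone": "no", "phones_owned": 0},
    {"id": "R5", "owns_phone": "yes", "phones_owned": None},
]
to_verify = find_contradictions(raw_responses)
```

Here respondents R2 and R3 are flagged for verification, while R5 is left for the missing-data procedure rather than treated as contradictory.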


b) The Development of a Coding Scheme

After correcting any errors that may influence data analysis, the researcher should formulate a coding scheme. The core function of the coding process is to create codes and scales from the responses, which can then be summarized and analyzed in various ways. A coding scheme is an unambiguous set of prescriptions of how all possible answers are to be treated, and what (if any) numerical codes are to be assigned to particular responses. In the coding scheme the researcher assigns codes to each likely answer, and specifies how other responses are to be handled. For example, the researcher might allocate 1 to yes, 2 to no and 0 to do not know. Although these numerical codes are arbitrary, in some cases their organization will have implications for how the resulting data can be processed statistically. A coding scheme is reliable if the researcher who created it can give it to another person and that person’s coding of the raw data matches exactly what the researcher would have produced by applying the scheme to the same answers.
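
The coding scheme and the reliability check described above can be sketched as follows. The codes 1, 2 and 0 are those given in the text; the extra code 9 for answers the scheme does not cover, the function names and the use of simple percentage agreement between two coders are assumptions made for this illustration.

```python
# Codes from the example in the text: 1 = yes, 2 = no, 0 = do not know.
CODING_SCHEME = {"yes": 1, "no": 2, "do not know": 0}
UNCODABLE = 9  # assumed code for answers the scheme does not cover


def code_answer(raw_answer):
    """Apply the coding scheme to one raw answer."""
    return CODING_SCHEME.get(raw_answer.strip().lower(), UNCODABLE)


def percent_agreement(codes_a, codes_b):
    """Share of answers that two coders coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of answers")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


raw_answers = ["Yes", "no", " Do not know ", "perhaps"]
coded = [code_answer(answer) for answer in raw_answers]

# Hypothetical codes produced by the researcher and by a second coder.
researcher_codes = [1, 2, 0, 1]
second_coder_codes = [1, 2, 1, 1]
agreement = percent_agreement(researcher_codes, second_coder_codes)
```

Under the definition of reliability given above, the scheme would be considered reliable only when the agreement is complete; in this example the two coders agree on three of the four answers.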


There are various challenges faced by researchers in the development of a coding scheme. The major challenge associated with coding is the treatment of missing data. It is difficult for the researcher to decide on what action should be taken when the coding cannot be applied, such as when a question is unanswered. Do they ignore the question, or change and interpret it? Decisions are usually needed on how to handle missing items, or cases in which the respondent did not know the answer or refused to provide one. While providing codes it may also occur to a researcher that an essential question was not asked. There are several possible approaches that a researcher can apply to address these challenges. These include:

  • Cross-reference the missing answer with the answers to related questions (this option, ironically, is less available if the researcher has carefully minimized duplication between questions).
  • Interpolate from other answers to create a “pattern” for the respondent, and look to see how other respondents of the same “type” answered this question.
  • Look at the distribution of answers and interpolate from that; some computer programs will supply distributions of answers to the question and suggest what the missing value ought to be in order to maintain the distribution.
  • Give missing data its own code, such as “Did not answer”; this is the most common (and safest) approach.
  • Exclude the respondent from the analysis (if the respondent failed to answer a number of questions, or the responses appear unreliable).
  • Exclude the question from the analysis (if a significant number of respondents failed to answer it).


However, in research, the preferred practice for missing items is to provide special codes indicating why the data was not included. When resources are available, the “filling in” or imputation of these missing data items should be undertaken by the researcher to reduce any biases arising from their absence. This involves going back to the field and filling in the missing information.
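
Three of the approaches to missing data listed above (giving missing data its own code, excluding a respondent, and excluding a question) can be illustrated in a few lines of Python. The special code -1, the threshold of one missing answer per respondent and the sample records are assumptions made for the example, not recommended values.

```python
MISSING = -1  # assumed special code meaning "did not answer"


def fill_missing(record, questions):
    """Give every unanswered question the special MISSING code."""
    coded = {"id": record["id"]}
    for question in questions:
        answer = record.get(question)
        coded[question] = MISSING if answer is None else answer
    return coded


def missing_share_by_question(records, questions):
    """Proportion of respondents who did not answer each question."""
    return {
        q: sum(r[q] == MISSING for r in records) / len(records) for q in questions
    }


def drop_sparse_respondents(records, questions, max_missing):
    """Exclude respondents with more than max_missing unanswered questions."""
    return [
        r for r in records
        if sum(r[q] == MISSING for q in questions) <= max_missing
    ]


questions = ["q1", "q2", "q3"]
raw_records = [
    {"id": "R1", "q1": 1, "q2": 2, "q3": 1},
    {"id": "R2", "q1": 1, "q2": None, "q3": 2},
    {"id": "R3", "q2": None},  # q1 and q3 were never filled in
]
coded_records = [fill_missing(r, questions) for r in raw_records]
shares = missing_share_by_question(coded_records, questions)
kept = drop_sparse_respondents(coded_records, questions, max_missing=1)
```

In this illustration respondent R3 is excluded because all three answers are missing, and the share of missing answers for question q2 (two out of three) could prompt the researcher to consider excluding that question.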


c) Deciding on Data Storage

After coding the data, the researcher will have to make a decision about the short-term and long-term storage of the information generated. Short-term storage is necessary before data analysis. The system in which the researcher stores the data will determine (at least in the early stages) what forms of analysis the researcher will carry out and how easy it will be to transfer the data into systems which will do more complicated forms of analysis. There are two major storage forms, the electronic form and the non-electronic (paper) form.


Paper storage: This is where the coded data is written on paper before the analysis. Paper storage has the following advantages:

  • It has a low cost.
  • It allows for speedy retrieval.
  • It is easy to distribute.
  • It is comprehensible.

However, its disadvantages include the following:

  • It is not extensible.
  • It is fragile.
  • It is bulky.

Electronic storage: The advantages of electronic storage include the following:

  • It is extensible.
  • It is easy to distribute.
  • It is easy to interchange options.
  • It has low volume.


The disadvantages of electronic storage are:

  • Equipment costs are high.
  • It has limited access.
  • It is fragile.


Today, selecting electronic storage is an increasingly significant decision for a researcher. In electronic storage, the researcher can transfer the data (or information derived from it) into another system.
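
The ease with which electronically stored data can be transferred into another system can be illustrated with a plain-text interchange format. The following Python sketch writes coded responses in comma-separated (CSV) form and reads them back. The column names and values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io


def save_coded_data(rows, fieldnames, handle):
    """Write coded responses as CSV, a format most analysis tools can read."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def load_coded_data(handle):
    """Read the CSV back, converting every code to an integer."""
    return [{key: int(value) for key, value in row.items()}
            for row in csv.DictReader(handle)]


# Hypothetical coded responses: respondent number and two coded questions.
coded_rows = [
    {"respondent": 1, "q1": 1, "q2": 2},
    {"respondent": 2, "q1": 0, "q2": 1},
]

buffer = io.StringIO(newline="")  # stands in for a file opened on disk
save_coded_data(coded_rows, ["respondent", "q1", "q2"], buffer)
buffer.seek(0)
restored_rows = load_coded_data(buffer)
```

The same text could be opened by a spreadsheet, a database or a statistical package, which is the kind of transfer between systems that the paragraph above refers to.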


d) Choosing a Statistical Software Package

After deciding on how data will be stored, the researcher has to reflect on the statistical software package that will be relevant in data analysis. When choosing a statistical software package, there are several things a researcher has to consider. These include the following:

  • Characteristics of the data to be used; for example, is it descriptive or does it analyze relationships?
  • Analyses that will be performed.
  • Technical and financial constraints.


There are various types of statistical software packages that a researcher can select from. The software the researcher selects depends on the overall plan that the researcher has for analyzing and presenting the data. The following are some of the types of computer software available:



Word processors: The researcher may decide to enter the data in text form straight into a word processor, such as Microsoft Word.



The advantages of using a word processor include the following:

  • The obvious advantage of using a word processor is that the researcher does not waste time on unnecessary processing. This is because data in text form is entered directly into the word processor.
  • If the researcher is creating a report from this data to explain and present it, then he/she can use the data directly.
  • The researcher might choose to take the data (from survey or experiment



The major problem with using a word processor is the lack of analytical tools. Only the most advanced word processors have spreadsheet-like functions. This means that in most cases, if the researcher puts data into a table, he/she cannot carry out simple calculations (such as sums and standard deviations) on the columns of the table.



Spreadsheets: The spreadsheet is one of the most versatile analysis and storage combination tools. Many of the formulae that spreadsheets have built in are applicable to the data summarization process.



The advantages of spreadsheets include the following:

  • Spreadsheets allow a large range of conventional summary statistics.
  • Some also incorporate elements of Exploratory Data Analysis (EDA).
  • It is possible with some spreadsheets to form cross-tabulations.
  • Most spreadsheets offer graphical presentation of the results of an analysis.
  • Spreadsheets are also able to interchange data with other systems. By using spreadsheets, a researcher can take information straight from a spreadsheet and place it into a word processor. Relevant information from the spreadsheet can be copied directly across to a report.
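
The kinds of summary statistics and cross-tabulations mentioned in the list above are not tied to any one product. As an illustration outside a spreadsheet, the following Python sketch computes a sum, mean and sample standard deviation for one column and a simple cross-tabulation of two columns. The column names and values are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values):
    """Conventional summary statistics for one numeric column."""
    return {
        "n": len(values),
        "sum": sum(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
    }


def cross_tabulate(records, row_key, col_key):
    """Count how often each combination of two answers occurs."""
    return Counter((record[row_key], record[col_key]) for record in records)


# Hypothetical survey data.
ages = [20, 30, 40]
answers = [
    {"sex": "F", "answer": "yes"},
    {"sex": "F", "answer": "no"},
    {"sex": "M", "answer": "yes"},
    {"sex": "F", "answer": "yes"},
]

age_summary = summarize(ages)
table = cross_tabulate(answers, "sex", "answer")
```

For the three ages shown, the sum is 90, the mean is 30 and the sample standard deviation is 10; the cross-tabulation records two female respondents who answered yes.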





The disadvantages of spreadsheets include the following:

  • The statistical functions supported by spreadsheets are mostly restricted to descriptive statistics and basic inferential statistics. A researcher is unlikely to find a wide range of advanced statistical operations, such as multivariate statistics.
  • Whilst the graphics in most spreadsheets are visually impressive, they are usually restricted to a certain number of fairly fundamental graphic structures (bar charts, pie charts, and so on). If a researcher wants to use some of the more esoteric systems, he/she has to transfer the data either via a statistical package or directly to a graphics package.



In research analysis, databases are vital in record keeping. A researcher may use a database programme where he/she wants to take advantage of the record manipulation options of database management systems. For example, if a researcher wants to find all survey responses where the respondent said yes to one question and no to another, the researcher keys in formulated codes and gets the answers. As well as basic record manipulation (sorting and searching), the database also provides other basic data processing functions, such as cross-tabulations.
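The yes/no search described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `responses`, the columns `q1` and `q2`, and the four rows are hypothetical and stand in for real survey data.

```python
import sqlite3

# Hypothetical survey table: one row per respondent, two yes/no questions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (respondent_id INTEGER, q1 TEXT, q2 TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, "yes", "no"), (2, "yes", "yes"), (3, "no", "no"), (4, "yes", "no")],
)

# Record manipulation: respondents who said yes to question 1 and no to question 2.
rows = conn.execute(
    "SELECT respondent_id FROM responses "
    "WHERE q1 = 'yes' AND q2 = 'no' ORDER BY respondent_id"
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)

# A simple cross-tabulation of the two questions.
crosstab = conn.execute(
    "SELECT q1, q2, COUNT(*) FROM responses GROUP BY q1, q2 ORDER BY q1, q2"
).fetchall()
print(crosstab)
conn.close()
```

In a real study the data would be loaded from the stored survey records rather than typed in, but the query pattern stays the same.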



  • Databases have high levels of interchangeability with other systems, such as word processors, spreadsheets, graphic packages and statistical packages.
  • The database is often a good starting point for storing raw data because if a researcher needs to manipulate the data (beyond the capability of the database), he/she can do so by transferring the information into an alternative system.



These are application systems that carry out a wide range of statistical techniques. The simplest statistical packages support data summarization and basic inferential statistics. The more complex statistical packages support advanced inferential techniques, including multivariate methods. What they offer is advanced data manipulation. This includes sophisticated data description, and a range of various statistical tests. Statistical systems interchange particularly strongly with graphic systems.



These are not software packages. Generally, the researcher is not going to actually store data in a graphical system for future analysis. The assumption is that after the researcher has carried out the analysis, he/she generates graphical displays of the results. Graphical systems emphasize:

  • Advanced display options, including a large range of chart types.
  • Interchange with word processors and other graphic systems such as presentation graphics and visualization systems.


Before purchasing any statistical software package, it is crucial for the researcher to reflect on the data that will be analyzed, particularly on the effectiveness of the statistical software package identified in analyzing the collected data. Many statistical packages are unable to handle a large amount of data, or various types of data structures. The researcher should brainstorm on the following:

  • How will the data collected be stored?
  • How will the data be accessed by the software package?
  • Will the statistical package be able to create new variables as well as query the data?
  • What amount of data will be used for the analysis? Will the statistical package be able to handle the database size?
  • Does the current staff have the knowledge to operate the statistical package? What is the financial implication of the statistical package?

While all statistical packages are able to generate descriptive statistics and basic tests, the breadth and depth of complex analyses that a statistical package is able to perform varies greatly among packages. Several statistical packages require the purchasing of additional modules or programmes in order to perform more advanced analyses. These packages may be expensive. The researcher should purchase only needed programmes and expand the package when additional analyses are needed. In selecting a statistical package, the researcher should also consider its display of the results and graphs.


Data Analysis

Data analysis refers to examining what has been collected in a survey or experiment and making deductions and inferences. It involves uncovering underlying structures, extracting important variables, detecting any anomalies and testing any underlying assumptions.


Statistical data analysis divides the methods for analyzing data into two categories: exploratory methods and confirmatory methods. Exploratory methods are used to discover what the data seems to be saying by using simple arithmetic and easy-to-draw pictures to summarize data. They are used mainly in qualitative research. Confirmatory methods use ideas from probability theory in the attempt to answer specific questions. These methods are mainly applicable in quantitative research. The methods used in data analysis are influenced by whether the research is qualitative or quantitative.


a) Data Analysis in Qualitative Research

Qualitative research involves intensive data collection (of several variables), over an extended period of time in a natural setting (variables are studied when and where they naturally occur). Qualitative data, such as the views of respondents on a certain issue (for example, abortion), is not always computable by arithmetic relations. The responses can be categorized into various classes, which are called categorical variables. The analysis of qualitative data varies from simple descriptive analysis to more elaborate reduction and multivariate associate techniques. The analysis will vary with the purposes of the research, the complexity of the research design and the extent to which conclusions can be reached easily (Orodho and Kombo, 2002:116). In qualitative research designs, the researcher should decide, before going to the field, how he/she will analyze the data. The analytical technique will determine the recording style that will be used during the data collection exercise. The analytic techniques used in qualitative research are as follows:



In qualitative research, data can be analyzed by a quick impressionist summary. This involves the following:

  • Summarizing key findings. For example in focus group discussions the researcher notes down the frequent responses of the participants on various issues.
  • Interpretation and conclusion.

This rapid data analysis technique is mainly used in situations that require urgent information to make decisions for a programme, for example in places where there is an outbreak such as cholera and vital information is needed for intervention. This technique can also be used when the results already generated are obvious, making further analysis of data unwarranted. For example, if a researcher finds out that 80% of respondents give similar answers about what caused a fire outbreak, doing further analysis may be unwarranted. This form of analysis does not require data transcription. The researcher records key issues of the discussion with respondents. A narrative report is then written, enriched with quotations from key informants and other respondents.



In qualitative research, data can also be analyzed thematically. Themes refer to topics or major subjects that come up in discussions. This form of analysis categorizes related topics. In using this form of analysis major concepts or themes are identified. In this form of data analysis, the researcher does the following:

  • Peruses the collected data and identifies information that is relevant to the research questions and objectives.
  • Develops a coding system based on samples of collected data.
  • Classifies major issues or topics covered.
  • Rereads the text and highlights key quotations/insights and interpretations.
  • Indicates the major themes in the margins.
  • Places the coded materials under the major themes or topics identified. All materials relevant to a certain topic are placed together.
  • Develops a summary report identifying major themes and the associations between them.
  • Uses graphics and direct quotations to present the findings.
  • Reports the intensity, which refers to the number of times certain words, phrases or descriptions are used in the discussion. The frequency with which an idea, word or description appears is used to interpret the importance, attention or emphasis.


Weaknesses: The thematic method tends to rely heavily on the judgment of a single analyst. This may lead to high levels of subjectivity and bias. It may be necessary to have two or more analysts code the transcript independently and compare notes.



Content analysis examines the intensity with which certain words have been used. Content analysis systematically describes the form or content of written and/or spoken material. In content analysis a classification system is developed to record the information. In interpreting results, the frequency with which a symbol or idea appears may be interpreted as a measure of importance, attention or emphasis. The relative balance of favourable attributes regarding a symbol or an idea may be interpreted as a measure of direction or bias. In content analysis, a researcher can be assisted by trained researchers or a computer programme can be used to sort the data to increase the reliability of the process. Content analysis is a tedious process due to the requirement that each data source be analyzed along a number of dimensions. It may also be inductive (identifies themes and patterns) or deductive (quantifies frequencies of data). The results are descriptive, but will also indicate trends or issues of interest. In content analysis, the first step is to select the data source to be studied, then develop a classification system to record the information.


There are various forms of content analysis. These are as follows:

  • Pragmatic Content Analysis: Classifies signs according to their probable causes and effects. The emphasis is on why something is said. This could be used to understand people’s perceptions and beliefs.
  • Semantic Content Analysis: Classifies signs according to meaning.
  • Designation analysis determines the frequency with which certain objects or persons, institutions or concepts are mentioned. This is a simple counting exercise.
  • Attribution analysis examines the frequency with which certain characterization or descriptors are used. The emphasis is on the adjectives, verbs, and descriptive phrases and qualifiers. This is a simple counting exercise.
  • Assertion analysis provides the frequency with which certain objects (persons, institutions) are characterized in a particular way. Such an analysis often takes the form of a matrix with objects as columns and descriptors as rows (Orodho and Kombo, 2002: 119).
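The counting exercise behind designation analysis can be sketched in a few lines of Python. The two excerpts and the set of objects to count are hypothetical; in practice the classification system would come from the researcher's own coding scheme and the text from real transcripts.

```python
import re
from collections import Counter

# Hypothetical focus-group excerpts; real transcripts would be read from file.
transcripts = [
    "The clinic is far and the clinic fees are high.",
    "Fees are high, and the road to the clinic is bad.",
]

# Hypothetical classification system: the objects the researcher chose to count.
objects_of_interest = {"clinic", "fees", "road"}

def designation_counts(texts, targets):
    """Count how often each target word is mentioned across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word in targets)
    return counts

counts = designation_counts(transcripts, objects_of_interest)
for word, frequency in counts.most_common():
    print(word, frequency)
```

A higher count is then read as a sign of greater importance or emphasis, which is an interpretation the researcher still has to justify.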


In historical research there are various forms of data analysis. These include:

  • Analysis of concepts: Concepts are clarified by describing the essential and core concepts beginning from the early developmental stages.
  • Interpretive analysis relates one event to another. The event is studied and described within a broader context to add meaning and credibility to the data.
  • Comparative analysis examines similarities and differences in events during different time periods.
  • Theoretical and philosophical analysis utilizes historical parallels, past trends, and sequences of events to suggest the past, present, and future of the topic being researched. Findings would be used to develop a theory or philosophy of leisure. For example, an analysis of public recreation agency goals and objectives of previous eras can be used to describe the future in the context of social, political, economic, technological, and cultural changes in society.


b) Data Analysis in Quantitative Research

Quantitative data analysis consists of measuring numerical values from which descriptions such as the mean and standard deviation are made. These data can be put into an order and further divided into two groups: discrete data and continuous data. Discrete data are countable data, for example, the number of defective items produced during a day’s production. Continuous data are parameters (variables) that are measurable and are expressed on a continuous scale, for example, the height of a person. The analysis of quantitative data varies from simple to more elaborate analysis techniques. The analysis varies with the objective of the experiment, its complexity and the extent to which conclusions can be easily reached. Data analysis in quantitative research depends on the type of study. This is as follows:



In correlational research studies, data is mainly analyzed using the correlation coefficient. By using this tool the researcher indicates the degree of relationship between two variables. The correlation coefficient is a number ranging from +1 (a perfect positive correlation) through 0 (no relationship between the variables) to -1 (a perfect negative correlation). In analyzing the correlation coefficient, a researcher attempts to indicate the proportion of sameness between two variables. One of the correlation tools is the Pearson Product Moment Correlation. This tool is used to analyze the relationship between isolated independent and dependent variables.

Another type of correlation analysis is reliability studies (analyses conducted to provide information about the validity and reliability of tests). In reliability studies the same group of subjects is given a test and then, at a somewhat later date, is given the test again. The researcher analyzes the two scores for each subject (the test score and the retest score) and the correlation coefficient between the two sets of scores can be calculated. This kind of correlation coefficient is referred to as a reliability coefficient. Many tests used in education, for example, standardized achievement tests, have more than one form. To determine the reliability coefficient, a group of subjects is given both forms of a test; thus two scores are obtained for each subject and the correlation coefficient is calculated for the two sets of scores. To conduct a validity correlational analysis, a researcher obtains scores for students on a test and also records their scores on the criterion measure. Thus he/she has two scores for each subject and can calculate the correlation coefficient of the sets of scores. This correlation coefficient is referred to as a validity coefficient.
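As a minimal sketch, the Pearson Product Moment Correlation for a set of test and retest scores can be computed as follows. The function name `pearson_r` and the five pairs of scores are made up for illustration; the same calculation applies to a validity coefficient, with the criterion scores in place of the retest scores.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up test and retest scores for five subjects.
test_scores = [10, 12, 15, 18, 20]
retest_scores = [11, 13, 14, 19, 21]

reliability = pearson_r(test_scores, retest_scores)
print(round(reliability, 3))
```

A value close to +1 would suggest that subjects keep roughly the same relative position on both occasions.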


The important thing to remember is that in correlational research, while carrying out analysis, the researcher is only looking at the degree of relationship between the variables and not the effect of one variable on another variable.



In predictive correlational studies, while carrying out the analysis, the researcher uses the degree of relationship that exists between two variables to predict one variable from the other. For example, if reading and spelling are correlated, then the researcher can use the information to predict a student’s score on the spelling test if the student has only taken the reading test. Conversely, the researcher can predict the student’s score on the reading test given the student’s score on the spelling test. Prediction studies are widely used to predict student academic success in college, based on such measures as secondary school grades in mathematics and aptitude test scores.
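The text does not name a prediction method; one common choice, assumed here, is a least-squares straight line fitted to the paired scores. The function `fit_line` and the reading and spelling scores below are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up reading and spelling scores for five students.
reading = [40, 50, 60, 70, 80]
spelling = [45, 52, 61, 68, 79]

slope, intercept = fit_line(reading, spelling)

# Predict the spelling score of a student who scored 65 on the reading test.
predicted_spelling = slope * 65 + intercept
print(round(predicted_spelling, 1))
```

Swapping the two lists gives the line for predicting reading from spelling. The weaker the correlation, the less trustworthy either prediction becomes.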



Causal-comparative educational research attempts to identify a causative relationship between an independent variable and a dependent variable. However, this relationship is more suggestive than proven, as the researcher does not have complete control over the independent variable. If the researcher had control over the independent variable, then the research would be classified as true experimental research. In carrying out analysis based on this design, the researcher compares two selected groups on the dependent variable. For example, if in form two some of the students in mathematics classes use calculators while others do not, a researcher may be interested in finding out the effect of calculator use on mathematics grades at the end of the year. The researcher therefore selects a group of students from the class that use calculators and then selects another group of the same size from the class that do not use calculators, and compares the two groups at the end of the year on their final mathematics grades. Another variant of this study would be to take the students from a class that uses calculators and compare them with another class that does not use calculators. Both these studies would be causal-comparative research studies, but they would differ in how the results of the study can be generalized. One of the problems faced in analyzing data in causal-comparative research is that, since the respondents are not randomly placed in the groups, the groups can differ on other variables that may have an effect on the dependent variable.


An inferential statistic used to analyze data in both causal-comparative and experimental research designs is the t-test. Where the subjects in the two groups are independent of one another, that is, no matching of subjects or other control procedures were used, the independent t-test is used to test the significance of a difference between the means of the experimental and control groups in the study. In research designs where the influence of an extraneous variable has been controlled, or in designs utilizing a pre-test-post-test procedure, the appropriate t-test to use to compare the two groups would be the dependent t-test. When a researcher has three or more groups to compare, the appropriate inferential statistic to use in data analysis would be one-way analysis of variance. This statistic shows the significance of differences in the means of three or more groups of subjects.
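The two t statistics can be sketched with the Python standard library. The independent version below assumes equal variances in the two groups (the pooled-variance form), which is an assumption not stated in the text, and all scores are made up. The resulting t value would still need to be compared with a critical value from a t table at the given degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group_a, group_b):
    """Student's t for two independent groups, using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

def dependent_t(pre, post):
    """Paired t: tests whether the mean of the pairwise differences is zero."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up final mathematics grades for two unmatched groups.
with_calculators = [72, 75, 78, 80, 85]
without_calculators = [65, 70, 72, 74, 79]
t_ind, df_ind = independent_t(with_calculators, without_calculators)

# Made-up pre-test and post-test scores for the same four subjects.
pre_test = [50, 55, 60, 65]
post_test = [54, 58, 66, 67]
t_dep, df_dep = dependent_t(pre_test, post_test)

print(round(t_ind, 2), df_ind)
print(round(t_dep, 2), df_dep)
```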


In cases where the researcher uses frequency counts for the dependent variable, the appropriate inferential statistic to use in data analysis would be the chi-square test. This statistic tests the significance of differences between two or more groups (independent variable) in frequencies for the dependent variable.
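A minimal sketch of the chi-square calculation for a table of frequency counts follows. The function name `chi_square` and the counts are made up; rows stand for the groups and columns for the categories of the dependent variable.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if group and outcome were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Made-up counts: rows are two groups, columns are pass / fail.
observed = [
    [30, 20],
    [20, 30],
]
statistic, df = chi_square(observed)
print(statistic, df)
```

The statistic is then compared with the critical chi-square value for the given degrees of freedom to judge significance.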



The major difference in data analysis between causal-comparative and experimental research is that the researcher has control over the independent variable in experimental research and can manipulate this variable at will. In the case of causal-comparative research, the independent variable is established by the identity of the groups chosen and is not under experimental control. In experimental designs, the researcher should decide on the analytical process before carrying out the experiment. The analytical process in experimental studies mainly involves the calculation of effect size. Effect size is the mean of the experimental group minus the mean of the control group, divided by the standard deviation of the control group. The idea is to calculate the effect size across a number of studies to determine the relevance of the test, treatment, or method.
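The effect size defined above can be computed directly. The sketch below uses the sample standard deviation (dividing by n - 1) for the control group, which is an assumption, since the text does not say which form to use. The scores are made up.

```python
from statistics import mean, stdev

def effect_size(experimental, control):
    """(mean of experimental - mean of control) / standard deviation of control."""
    return (mean(experimental) - mean(control)) / stdev(control)

# Made-up scores for an experimental group and a control group.
experimental_scores = [78, 82, 85, 88, 92]
control_scores = [70, 74, 78, 82, 86]

es = effect_size(experimental_scores, control_scores)
print(round(es, 2))
```

An effect size of about 1 means the experimental mean lies roughly one control-group standard deviation above the control mean.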


Data Presentation

There are three ways in which researchers can present data after analysis. These are as follows:

  • Using statistical techniques.
  • Using graphical techniques.
  • Using a combination of both.


Statistical Techniques

Statistics are a set of mathematical methods used to extract and clarify information from observed data. Statistics generate simple numbers to describe distributions, either grouped or ungrouped. Statistics have two major functions in data presentation. They can add to our understanding of the data that make up the distribution, and they can substitute for (be used instead of) the distribution. With descriptive statistics it is important to define whether the researcher is calculating values for a population or for a sample: the results will be different. A sample statistic is any numerical value describing a characteristic of a sample. The following are some of the statistical techniques used to present analyzed data.



The values in a set of ungrouped data constitute a distribution. The values that we have in a set of ordinal data, and the values we generate by converting ungrouped data into grouped form, constitute a frequency distribution. For example, imagine a survey in which we measure the weight of a sample of pieces of wood loaded onto a lorry. The values for all the pieces of wood measured make up a distribution. A researcher can calculate sample statistics from that distribution, such as a sample mean (for example, 14.56 kg). A frequency distribution of grouped data can also be created as shown in the table below.


Weight (kg)    Number
7-9            2
10-12          8
13-15          12
16-18          19
19-21          7


Table 1. Frequency distribution of a wood load


Class Limits: The frequency distribution is made up of the values (counts) for a set of classes; each class has a frequency (f) associated with it. The class limits are the upper and lower values for each class. They should be defined in such a way that no value is excluded, but no value can fall into two classes. The researcher can achieve this by using class boundaries with a precision (meaning in this case number of significant figures) one order below that of any of the actual data values. In the wood example, if the researcher weighs the pieces to the nearest tenth of a kilogramme, he/she would set the class boundaries to 7.05, 9.05, and so on. The class interval is the difference between the upper class boundary and the lower class boundary; in most frequency distributions it will be constant across the classes. The point halfway between the upper and lower class limits is the class midpoint. These values are used to calculate the mean of a set of grouped data.
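As an illustration, the class midpoints and the mean of the grouped data in Table 1 can be worked out as follows. The variable names are arbitrary; the class limits and frequencies are those given in the table.

```python
# Table 1 classes as (lower limit, upper limit, frequency).
classes = [(7, 9, 2), (10, 12, 8), (13, 15, 12), (16, 18, 19), (19, 21, 7)]

# Class midpoint: the point halfway between the class limits.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
total = sum(frequency for _, _, frequency in classes)

# Grouped mean: each midpoint weighted by its class frequency.
grouped_mean = sum(m * c[2] for m, c in zip(midpoints, classes)) / total

# Modal class: the class with the highest frequency.
modal_class = max(classes, key=lambda c: c[2])

print(midpoints)
print(total, grouped_mean)
print(modal_class)
```

Because every piece in a class is treated as if it weighed the class midpoint, the grouped mean is an approximation of the mean of the original measurements.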


Statistics can be divided into two groups: measures of central tendency and measures of dispersion.



Measures of central tendency are numbers that define the location of a distribution’s centre. For example, if we regard all measurements as being attempts to give us the “true” value of a particular phenomenon, we can regard the centre of the distribution of a set of measurements as an estimate of that “true” value. The various sources of error in the measurement process will produce variability in the measurements, so they will not all have the same value. Measures of dispersion attempt to quantify the extent of this variability. When dealing with ungrouped data, the researcher can use several measures of central tendency. These include the mean, the median and the mode. When dealing with grouped data, the researcher cannot use the arithmetic mean; instead he/she can use the group mean. Using grouped data the researcher cannot use the median, but can define the modal class.


MEAN: This is the average. It is found by dividing the sum of the values by the number of values.


MEDIAN: The median can be defined in a set of ungrouped data. If the data are arranged in ascending or descending order, the median is, in general, the value that has half of the data values less than it and half greater than it. If the sample size (n) is an odd number, the median is the middle value of the entire distribution. If n is an even number, the median is the mean of the two “middle” values. For example, in the following ungrouped data 12, 14, 16, 18, 19, 22, 24 the median is 18, whereas for 12, 14, 16, 18, 19, 22, 24, 27 the median is 18.5. The median is also the value that minimizes the sum of the absolute distances to the data points.


MODE: The mode of a set of data is the value that occurs most often, with certain provisos. It is possible to have no mode (that is, no value occurs more than once). It is possible to have more than one mode (a distribution may be bimodal, trimodal or multi-modal). For grouped data the class with the highest frequency value is the modal class. There may be two modal classes (bimodal), or more. For example, for the following frequencies: 12, 18, 13, 13, 22, 12, 14, 13 the mode is 13.
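The three measures can be checked against the examples above with a short Python sketch. The helper `modes` is hypothetical; it returns an empty list when no value repeats, matching the "no mode" proviso.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return every most frequent value, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once, so there is no mode
    return [value for value, count in counts.items() if count == highest]

odd_sample = [12, 14, 16, 18, 19, 22, 24]
even_sample = [12, 14, 16, 18, 19, 22, 24, 27]
frequencies = [12, 18, 13, 13, 22, 12, 14, 13]

print(mean(odd_sample))      # sum of the values divided by their number
print(median(odd_sample))    # middle value when n is odd
print(median(even_sample))   # mean of the two middle values when n is even
print(modes(frequencies))    # most frequent value(s)
print(modes([1, 1, 2, 2]))   # a bimodal case
```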



This type of statistic describes how much the distribution varies around the central point. The various ways we can describe this spread are called measures of dispersion. These measures quantify the variability of the distribution. As they are attempting to quantify the general shape of a distribution rather than a single value for its centre, most measures of dispersion are numerically more complex. These measures consist of the following:


RANGE: The simplest measure of dispersion is the range of the data: the difference between the highest and the lowest values in the data (maximum minus minimum).


VARIANCE: This is a measure of how widely the data are spread around the mean. It is based upon the idea that each observation differs from the mean by some amount. The difference between each value and the population mean is called its deviation. If the absolute values of all the deviations are summed, dividing the result by the population size (N) gives the mean deviation. Unfortunately, this measure does not give sufficient “weight” to the values on the margins of the distribution. To do so, the sum of the squares of the deviations from the mean has to be taken. Dividing this value (the sum of squared deviations) by the population size gives the variance of the distribution.

Standard Deviation: The standard deviation is the square root of the variance. For example, in the wood weight example, if the mean weight was 13.78 kilogrammes and the variance was 3.56 square kilogrammes, the standard deviation would be 1.89 kilogrammes. Both measures carry the units of the data (the variance in squared units). Consequently, we cannot compare the variances of two distributions unless they happen to have the same units, and we cannot use the variance (or the standard deviation) to indicate which of two or more distributions measured in different units or on different scales exhibits greater variability. For this latter purpose we need a “dimensionless” measure of dispersion, for which we usually employ the coefficient of variability.


Coefficient of Variability (or Variation): The coefficient of variability is calculated by expressing the standard deviation as a percentage of the mean (standard deviation divided by the mean, multiplied by 100).
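The measures of dispersion described above can be sketched for a small set of made-up weights. The population forms (dividing by N) are used, following the definitions in the text.

```python
from statistics import mean, pstdev, pvariance

# Made-up weights in kilogrammes, treated as a whole population.
weights = [12.0, 13.5, 14.0, 15.5, 16.0]

data_range = max(weights) - min(weights)
centre = mean(weights)

# Mean deviation: average absolute difference from the mean.
mean_deviation = sum(abs(w - centre) for w in weights) / len(weights)

# Variance: average squared difference from the mean (population form).
var = pvariance(weights)

# Standard deviation: square root of the variance.
sd = pstdev(weights)

# Coefficient of variability: standard deviation as a percentage of the mean.
cv_percent = sd / centre * 100

print(data_range, round(mean_deviation, 2), round(var, 2))
print(round(sd, 2), round(cv_percent, 1))
```

Because the coefficient of variability is a percentage, it can be compared across data sets measured in different units.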


The basic shape of a frequency curve can be described quantitatively by several measures.


These are measures that explicitly quantify the “balance” of the distribution (see Figure 2). This balance has two components:

  • Are the values arranged symmetrically on either side of the centre?
  • Is the distribution highly “peaked” (most values lie close to the centre, and the tails are short) or is the distribution “flat” (long tails and a low central concentration)?

The measure used to describe the overall symmetry of a distribution, that is, whether the two tails of the distribution are equal, is called the skewness. The distribution can be described as left (negatively) or right (positively) skewed. The coefficient of skewness can be used to quantify the extent of the asymmetry. We also define whether the distribution is “peaked” or not; the measure for this is called the kurtosis. Distributions that are strongly peaked (that is, most of the values lie close to the centre of the distribution, with relatively short tails) are termed leptokurtic, whereas those where the values are broadly spread (the tails are long) are termed platykurtic.
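One common way to quantify these two components, assumed here because the text does not give a formula, is through the moments of the distribution: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. The function name and data are made up.

```python
def shape_measures(values):
    """Moment-based skewness and excess kurtosis (population forms)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    m4 = sum((v - centre) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 2, 2, 10]

skew_sym, kurt_sym = shape_measures(symmetric)
skew_right, kurt_right = shape_measures(long_right_tail)

print(skew_sym, round(kurt_sym, 2))
print(round(skew_right, 2), round(kurt_right, 2))
```

A skewness of zero indicates symmetry, a positive value indicates a longer right tail and a negative value a longer left tail.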


Figure 2: The major components of distribution shape


PERCENTILES: Percentiles are values that divide a set of observations into 100 equal parts (P1, P2, P3, ..., P99) such that 1% of all the data points fall below P1, 2% fall below P2, and so on.


DECILES: Deciles are values that divide a set of observations into ten equal parts (D1, D2, D3, ..., D9) such that 10% of all the data points fall below D1, 20% fall below D2, and so on.


QUARTILES: Quartiles are values that divide a set of observations into four equal parts (Q1, Q2, Q3) such that 25% of all the data points fall below Q1, 50% fall below Q2, and 75% fall below Q3.
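Percentiles, deciles and quartiles can all be obtained from the `quantiles` function in Python's standard `statistics` module by changing the number of parts `n`. The observations below are made up. Different software interpolates between data points in slightly different ways, so cut points for small samples may not agree exactly across packages.

```python
from statistics import quantiles

# Made-up observations: the whole numbers 1 to 99.
observations = list(range(1, 100))

quartiles = quantiles(observations, n=4)      # Q1, Q2, Q3
deciles = quantiles(observations, n=10)       # D1 to D9
percentiles = quantiles(observations, n=100)  # P1 to P99

print(quartiles)
print(len(deciles), len(percentiles))
```

Note that Q2, D5 and P50 are all the same cut point, which is the median.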



Whilst the most obvious way of representing grouped data is as a table, the information can also be represented diagrammatically. Data can be graphically presented by a histogram or polygon.


a) Histogram: A basic representation of the shape of a frequency distribution (see Figure 3). This can be shown as a series of vertical (or horizontal) bars, their length indicating the frequency of the particular class.

b) Polygon: Data can also be presented as polygons. The polygon is closed by connecting the midpoints of the end classes to the midpoints of “imaginary” classes on each side, which have a notional frequency of zero.



Figure 3: Sample Histogram


c) Bars: The cumulative frequency distribution can also be plotted as a series of bars (see Figure 4), or as a series of lines joining the midpoints of the classes; this is termed an ogive (Figure 5).

Figure 5: Cumulative Frequency Curve (Ogive)
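As a rough sketch, the frequencies from Table 1 can be turned into a text histogram and a cumulative frequency column, which are the values an ogive would plot. A plotting library would normally be used for the actual figures; plain text output is used here to keep the example self-contained.

```python
from itertools import accumulate

# Classes and frequencies from Table 1.
labels = ["7-9", "10-12", "13-15", "16-18", "19-21"]
frequencies = [2, 8, 12, 19, 7]

# Running total of the frequencies: the heights plotted in an ogive.
cumulative = list(accumulate(frequencies))

# Horizontal bars whose length equals the class frequency.
for label, f, cf in zip(labels, frequencies, cumulative):
    print(f"{label:>6} | {'#' * f:<19} | f={f:<2} cumulative={cf}")
```

The last cumulative value always equals the total number of observations, which is a quick check on the table.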

Pie chart: A pie chart can also be used for the purpose of presenting results (Figure 6).

At a glance, one can see that the upper class group dominates the purchasing of goods and services.


Challenges Faced in Data Analysis

In data analysis, the researcher should ensure the following:

  • Understand the assumptions of the statistical procedures used and be sure they are satisfied. In particular, the researcher should be aware of hierarchically organized (non-independent) data and use techniques designed to deal with them.
  • Be sure to use the best measurement tools available. If measures have errors, then that fact should be considered.
  • Beware of multiple comparisons. If one has to do many tests, then he/she should try to replicate the results or use cross-validation to verify them.
  • Keep in mind what one is trying to discover. One should look at magnitudes rather than p-values.
  • Use numerical notation in a rational way. One should not confuse precision with accuracy.
  • Be sure to understand the conditions for causal inference. If one needs to make a causal inference, then he/she should try to use random assignment. If that is not possible, then one should devote a lot of effort to unearthing causal relationships with a variety of approaches to the question.
  • Be sure that the graphs are accurate and reflect the data variation clearly.

Ethical Issues

In data analysis and presentation, a researcher should maintain integrity. This is particularly important in the application of statistical skills to problems where private interests may inappropriately affect the development or application of statistical knowledge. For these reasons, researchers should:

  • Present their findings and interpretations honestly and objectively.
  • Avoid untrue, deceptive, or doctored results.
  • Disclose any financial or other interests that may affect, or appear to affect their analysis.
  • Delineate the boundaries of the inquiry as well as the boundaries of the statistical inferences which can be derived from it.
  • Make the data available for analysis by other responsible parties with appropriate safeguards for privacy concerns.
  • Recognize that the selection of a statistical procedure may to some extent be a matter of judgment and that other statisticians may select alternative procedures.
  • Direct any criticism of a statistical inquiry to the inquiry itself and not to the individuals conducting it.
  • Apply statistical procedures without concern for a favourable outcome.



In data analysis and presentation, a researcher should, according to Cohen (1993), do the following:

  • Be sure the analysis sample is representative of the population in which the researcher is interested.
  • Be sure to understand the assumptions of the statistical procedures used, and be sure they are clearly defined. Beware of hierarchically organized (non-independent) data and use techniques designed to deal with them.
  • Be sure to use the best measurement tools available. If the measures have errors, take that fact into account.
  • Be clear about what the research is trying to discover.
  • Be sure the graphs are accurate and reflect the data variation clearly.