The term “data organization” in research refers to the orderly arrangement of research data, that is, putting the data into some systematic form. The “raw” data collected, particularly in surveys, needs to be processed before it can be subjected to any useful analysis. This organization includes identifying (and correcting) errors in the data, coding the data, and storing it in an appropriate form. Analysis, on the other hand, refers to examining the coded data critically and making inferences.
The presentation of data refers to ways of arranging data to make it clearly understood. This chapter discusses the organization, analysis and presentation of data.
Collected data is “raw” information, not knowledge by itself. It therefore has to be organized in various stages. The progression from raw data to knowledge is as follows:
- From raw data to information: Data becomes information when it becomes relevant to the problem identified by the researcher.
- From information to facts: Information becomes facts when the data can support it. Facts are what the data reveals.
- From facts to knowledge: Facts, combined with new information, experiences and views, lead to knowledge.
- Knowledge is expressed together with some statistical degree of confidence.

Before analyzing the collected data, the researcher has to ensure the data is well organized. The procedure in data organization involves the following:
a) Pre-processing the Data

After collecting data, the researcher has to ensure it is processed in some manner before carrying out the analysis. The primary purpose of pre-processing is to correct problems identified in the raw data. These might include differences between the results obtained by multiple interviewers. In experiments, calibrations are carried out where significant and consistent differences between the measured result and the “correct” result are found. The pre-processing stages are as follows:
- The elimination of unusable data: The researcher may find two or more questions that really provide the same data. The researcher must therefore decide which one of the questions is worth coding and storing, and which one should be discarded.
- Interpretation of ambiguous answers: The more subtle problems in data analysis are associated with the researcher trying to interpret ambiguous answers. It could be argued that any complex study is likely to produce at least some answers of this type. The researcher needs to develop a strategy for dealing with them.
- Contradictory data from related questions: The researcher may also receive contradictory data from related questions. For example, respondents in one religious denomination may give different answers as to who is the church elder. Contradictions may be due to church wrangles. The researcher may have to verify and reject wrong responses.
Many of these problems, if not detected and corrected at the organization stage, will reflect adversely on the study findings.
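The two checks above — eliminating duplicate questions and flagging contradictory answers — can be sketched in code. The following is a minimal illustration in plain Python; the question names, responses and consistency rule are hypothetical, and a real survey would use purpose-built software:

```python
def find_duplicate_questions(responses):
    """Return pairs of questions whose answers are identical across
    all respondents (candidates for elimination as unusable data)."""
    questions = list(responses[0].keys())
    duplicates = []
    for i, q1 in enumerate(questions):
        for q2 in questions[i + 1:]:
            if all(r[q1] == r[q2] for r in responses):
                duplicates.append((q1, q2))
    return duplicates

def find_contradictions(responses, q1, q2, consistent):
    """Flag respondents whose answers to two related questions
    fail a researcher-supplied consistency test."""
    return [i for i, r in enumerate(responses)
            if not consistent(r[q1], r[q2])]

# Hypothetical coded responses from two respondents.
respondents = [
    {"attends_church": "yes", "member_of_congregation": "yes",
     "baptized": "yes", "elder": "A"},
    {"attends_church": "no", "member_of_congregation": "yes",
     "baptized": "yes", "elder": "B"},
]

# Two questions that always agree provide the same data.
print(find_duplicate_questions(respondents))
# Respondent 1 claims membership but not attendance - worth verifying.
print(find_contradictions(respondents, "attends_church",
                          "member_of_congregation",
                          lambda a, b: not (a == "no" and b == "yes")))
```

In this sketch the researcher still decides which duplicate question to discard and whether a flagged contradiction is a genuine error; the code only surfaces the cases worth checking.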
b) Formulating a Coding Scheme

After correcting any errors that may influence data analysis, the researcher should formulate a coding scheme. The core function of the coding process is to create codes and scales from the responses, which can then be summarized and analyzed in various ways. A coding scheme is an unambiguous set of prescriptions for how all possible answers are to be treated, and what (if any) numerical codes are to be assigned to particular responses. In the coding scheme the researcher assigns codes to each likely answer and specifies how other responses are to be handled. For example, the researcher might allocate 1 to “yes”, 2 to “no” and 0 to “do not know”. Although these numerical codes are arbitrary, in some cases their organization will have implications for how the resulting data can be processed statistically. A coding scheme is reliable if another person, applying it to the same raw data, produces exactly the codes that the researcher who created the scheme would have produced.
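The yes/no/do-not-know scheme from the example above can be made unambiguous by spelling out, in code, how every possible answer is treated. The sketch below is a minimal illustration; the catch-all code 9 for unanswered or unrecognized responses is an assumption, not part of the scheme described in the text:

```python
# Coding scheme from the text: yes -> 1, no -> 2, do not know -> 0.
CODES = {"yes": 1, "no": 2, "do not know": 0}
MISSING = 9  # assumed code for unanswered or unrecognized responses

def code_response(answer):
    """Apply the coding scheme to one raw answer, normalizing case
    and whitespace, and routing everything else to the MISSING code."""
    if answer is None:
        return MISSING
    return CODES.get(answer.strip().lower(), MISSING)

raw = ["Yes", "no", "Do not know", None, "maybe"]
print([code_response(a) for a in raw])  # -> [1, 2, 0, 9, 9]
```

Because the rule is written down once and applied mechanically, a second person running the same scheme over the same raw answers will produce identical codes, which is exactly the reliability property described above.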
There are various challenges faced by researchers in developing a coding scheme. The major challenge is the treatment of missing data. It is difficult for the researcher to decide what action should be taken when the coding cannot be applied, such as when a question is unanswered. Should the question be ignored, or reinterpreted? Decisions are usually needed on how to handle missing items, or cases in which the respondent did not know the answer or refused to provide one. While assigning codes, it may also occur to a researcher that an essential question was not asked. There are several possible approaches a researcher can apply to address these challenges. These include:
- Cross-reference the missing answer with the answers to related questions (this option, ironically, is less available if the researcher has carefully minimized duplication between questions).
- Interpolate from other answers to create a “pattern” for the respondent, and look to see how other respondents of the same “type” answered this question.
- Look at the distribution of answers and interpolate from that; some computer programs will supply distributions of answers to the question and suggest what the missing value ought to be in order to maintain the distribution.
- Give missing data its own code, such as “Did not answer”; this is the most common (and safest) approach.
- Exclude the respondent from the analysis (if the respondent failed to answer a number of questions, or the responses appear unreliable).
- Exclude the question from the analysis (if a significant number of respondents failed to answer it).
In research, however, the preferred practice for missing items is to provide special codes indicating why the data was not included. When resources are available, the “filling in” or imputation of these missing data items should be undertaken by the researcher to reduce any biases arising from their absence; this may involve going back to the field to collect the missing information.
c) Deciding on Data Storage
After coding the data, the researcher will have to make a decision about the short- and long-term storage of the information generated. Short-term storage is necessary before data analysis. The system in which the researcher stores the data will determine (at least in the early stages) what forms of analysis the researcher can carry out, and how easily the data can be transferred into systems that support more complicated forms of analysis. There are two major storage forms: the electronic form and the non-electronic (paper) form.
Paper storage: This is where the coded data is written on paper before the analysis. Paper storage has the following advantages:
- It has a low cost.
- It allows for speedy retrieval.
- It is easy to distribute.
- It is comprehensible.
However, its disadvantages include the following:
- It is not extensible.
- It is fragile.
- It is bulky.
Electronic storage: The advantages of electronic storage include the following:
- It is extensible.
- It is easy to distribute.
- It allows easy data interchange.
- It has low volume.
The disadvantages of electronic storage are:
- Equipment costs are high.
- It has limited access.
- It is fragile.
Today, selecting electronic storage is an increasingly significant decision for a researcher. In electronic storage, the researcher can transfer the data (or information derived from it) into another system.
After deciding on how data will be stored, the researcher has to reflect on the statistical software package that will be relevant in data analysis. When choosing a statistical software package, there are several things a researcher has to consider. These include the following:
- Characteristics of the data to be used; for example, is it descriptive, or does it capture relationships between variables?
- Analyses that will be performed.
- Technical and financial constraints.