Primary and Secondary Data

Primary data is data that you collect yourself using such methods as:

  • direct observation – lets you focus on the details that matter to you, and lets you see a system in real rather than theoretical use (some faults are unlikely or trivial in theory but quite real and annoying in practice).
  • surveys – written surveys let you collect considerable quantities of detailed data. You have to either trust the honesty of the people surveyed or build in self-verifying questions (e.g. questions 9 and 24 ask basically the same thing but using different words – different answers may indicate the surveyed person is being inconsistent, dishonest or inattentive).
  • interviews – slow, expensive, and they take people away from their regular jobs, but they allow in-depth questioning and follow-up questions. They also reveal non-verbal communication, such as face-pulling, fidgeting, shrugging, hand gestures and sarcastic expressions, that adds further meaning to spoken words. For example, “I think it’s a GREAT system” could mean vastly different things depending on whether the person was sneering at the time! A problem with interviews is that people might say what they think the interviewer wants to hear; they might avoid being honestly critical in case their jobs or reputations suffer.

  • logs (e.g. fault logs, error logs, complaint logs, transaction logs) – good, empirical, objective data sources (usually, if they are used well). They can yield lots of valuable data about system performance over time under different conditions.

Primary data can be relied on because you know where it came from and what was done to it. It’s like cooking something yourself: you know what went into it.

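The self-verifying survey questions mentioned above can be checked mechanically. The following is only a minimal sketch: the respondent names, the 1 to 5 answer scale, the tolerance value and the function name are all made up for illustration, and it assumes questions 9 and 24 are the paired questions.

```python
# Hypothetical survey answers on a 1-5 agreement scale (made-up data).
# Questions 9 and 24 are assumed to ask the same thing in different words.
responses = {
    "respondent_a": {9: 4, 24: 5},
    "respondent_b": {9: 1, 24: 5},
    "respondent_c": {9: 3, 24: 3},
}


def inconsistent_respondents(responses, q1, q2, tolerance=1):
    """Return respondents whose answers to the paired questions
    differ by more than `tolerance` points."""
    flagged = []
    for name, answers in responses.items():
        if abs(answers[q1] - answers[q2]) > tolerance:
            flagged.append(name)
    return flagged


print(inconsistent_respondents(responses, 9, 24))  # ['respondent_b']
```

A flagged respondent is not necessarily dishonest; the mismatch only suggests that their answers deserve a closer look.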
Secondary data is collected from external sources such as:

  • TV, radio, internet
  • magazines, newspapers
  • research articles
  • stories told by people you know

There’s a lot more secondary data than primary data, and secondary data is much cheaper and easier to acquire than primary data. The problem is that the reliability, accuracy and integrity of the data are often uncertain. Who collected it? Can they be trusted? Did they do any preprocessing of the data? Is it biased? How old is it? Where was it collected? Can the data be verified, or does it have to be taken on faith?

Often secondary data has been preprocessed to give totals or averages, and the original details are lost, so you can’t verify it by replicating the methods used by the original data collectors.

In short, primary data is expensive and difficult to acquire, but it’s trustworthy. Secondary data is cheap and easy to collect, but must be treated with caution.
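The loss of detail in preprocessed totals and averages can be seen in a small sketch. The fault counts below are invented purely for illustration: two systems behave very differently day to day, yet their published summaries would be identical.

```python
from statistics import mean

# Hypothetical daily fault counts for two systems over one week (made-up numbers).
system_a = [5, 5, 5, 5, 5, 5, 5]    # steady, minor trouble every day
system_b = [0, 0, 0, 0, 0, 0, 35]   # flawless until one very bad day

# The kind of summary a secondary source might publish.
summary_a = {"total": sum(system_a), "average": mean(system_a)}
summary_b = {"total": sum(system_b), "average": mean(system_b)}

print(summary_a == summary_b)         # True: the summaries cannot be told apart
print(max(system_a), max(system_b))   # 5 35: the raw data tells a different story
```

Someone who receives only the summaries has no way to recover the daily figures, which is why such data has to be treated with caution.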

