Questionnaire Design and Surveys Sampling
Author: Professor Hossein Arsham
The contents of this paper are aimed at those who need to perform basic statistical analyses on data from sample surveys, especially those in marketing science. A basic knowledge of statistics, such as descriptive statistics and the concept of hypothesis testing, is useful.
Questionnaire Design and Surveys Management
When the sampling units are human beings, the main methods of collecting information are:
- face-to-face interviewing
- postal surveys
- telephone surveys
- direct observation
- Internet
Objectives:
- To enable readers to understand the integrated processes of designing and conducting quantitative survey research projects.
- To give readers experience of grappling with problems in the design of survey samples, the construction of data collection instruments and the management of survey projects.
- To make readers aware of main sources of error in the survey process and ways of detecting, controlling and minimizing such error.
White paper Outline:
- The quantitative survey process from project formulation, statistical design and sampling, through instrument design and question formulation, to data processing.
- Basic principles and practice of probability sample design for field surveys.
- How to operationalise concepts, word questions and design, develop and test survey instruments, taking account of intended uses of the data collected.
- Principles of manual coding and editing of survey data, computer editing and preparing data for analysis.
- Sources of error in survey data, ways of assessing them and ways of minimizing error.
- Planning and management of large scale surveys, piloting and pre testing, relations with stakeholders in the sponsored survey process, issues in survey ethics.
The main questions are:
- What is the purpose of the survey?
- What kinds of questions is the survey being developed to answer?
- What sorts of actions is the company considering based on the results of the survey?
Step 1:
Planning Questionnaire Research
- Consider the advantages and disadvantages of using questionnaires.
- Prepare written objectives for the research.
- Have your objectives reviewed by others.
- Review the literature related to the objectives.
- Determine the feasibility of administering your questionnaire to the population of interest.
- Prepare a time-line.
Step 2:
Conducting Item Try-Outs and an Item Analysis
- Have your items reviewed by others.
- Conduct "think-aloud" with several people.
- Carefully select individuals for think-aloud.
- Consider asking about 10 individuals to write detailed responses on a draft of your questionnaire.
- Ask some respondents to complete the questionnaire for an item analysis.
- In the first stage of an item analysis, tally the number of respondents who selected each choice.
- In the second stage of an item analysis, compare the responses of high and low groups on individual items.
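The two stages of an item analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tally_choices` and `high_low_comparison`, the made-up responses, and the decision to form the high and low groups by splitting respondents on their total score are not part of the original text.

```python
from collections import Counter


def tally_choices(responses):
    """Stage 1: count how many respondents selected each choice of one item."""
    return Counter(responses)


def high_low_comparison(item_scores, total_scores, fraction=0.5):
    """Stage 2: mean item score in the high group versus the low group.

    Respondents are ranked by total score (an assumed grouping rule); the
    bottom and top `fraction` of respondents form the low and high groups.
    """
    n = len(total_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    mean_low = sum(item_scores[i] for i in low) / k
    mean_high = sum(item_scores[i] for i in high) / k
    return mean_high, mean_low


# Hypothetical data for six respondents.
choices_item_1 = ["a", "b", "a", "c", "a", "b"]
scores_item_1 = [1, 0, 1, 0, 1, 0]   # 1 = keyed answer, 0 = otherwise
total_scores = [9, 3, 8, 2, 7, 4]    # total questionnaire score

print(tally_choices(choices_item_1))
print(high_low_comparison(scores_item_1, total_scores))
```

An item on which the high and low groups respond very differently discriminates well; an item on which they respond alike deserves a second look.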
Step 3:
Preparing a Questionnaire for Administration
- Write a descriptive title for the questionnaire.
- Write an introduction to the questionnaire.
- Group the items by content, and provide a subtitle for each group.
- Within each group of items, place items with the same format together.
- At the end of the questionnaire, indicate what respondents should do next.
- Prepare an informed consent form, if needed.
- If the questionnaire will be mailed to respondents, avoid having your correspondence look like junk mail.
- If the questionnaire will be mailed, consider including a token reward.
- If the questionnaire will be mailed, write a follow-up letter.
- If the questionnaire will be administered in person, consider preparing written instructions for the administrator.
Step 4:
Selecting a Sample of Respondents
- Identify the accessible population.
- Avoid using samples of convenience.
- Simple random sampling is a desirable method of sampling.
- Systematic sampling is an acceptable method of sampling.
- Stratification may reduce sampling errors.
- Consider using random cluster sampling when every member of a population belongs to a group.
- Consider using multistage sampling to select respondents from large populations.
- Consider the importance of getting precise results when determining sample size. Remember that using a large sample does not compensate for a bias in sampling.
- Consider sampling nonrespondents to get information on the nature of a bias.
- The bias in the mean is the difference between the population means for respondents and nonrespondents, multiplied by the population nonresponse rate.
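The nonresponse bias rule in the last item can be checked with a small numerical example. The sketch below assumes hypothetical means and a hypothetical nonresponse rate; the function name `nonresponse_bias` is invented for the illustration.

```python
def nonresponse_bias(mean_respondents, mean_nonrespondents, nonresponse_rate):
    """Bias of the respondent mean as an estimate of the overall population mean."""
    return (mean_respondents - mean_nonrespondents) * nonresponse_rate


# Hypothetical population: 75% would respond (mean 4.0), 25% would not (mean 3.0).
mean_r, mean_nr, rate_nr = 4.0, 3.0, 0.25

bias = nonresponse_bias(mean_r, mean_nr, rate_nr)

# Cross-check: the overall mean is the weighted mean of the two groups, and the
# respondent mean overshoots it by exactly the bias.
overall_mean = (1 - rate_nr) * mean_r + rate_nr * mean_nr
print(bias, mean_r - overall_mean)
```

The example also shows why a larger sample does not help: the bias depends only on the two group means and the nonresponse rate, not on the number of respondents.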
Step 5:
Preparing Statistical Tables and Figures
- Prepare a table of frequencies.
- Consider calculating percentages and arranging them in a table with the frequencies.
- For nominal data, consider constructing a bar graph.
- Consider preparing a histogram to display a distribution of scores.
- Consider preparing polygons if distributions of scores are to be compared.
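A table of frequencies and percentages can be produced directly from the raw answers. The following is a minimal sketch using invented responses; the function name `frequency_table` and the text-only bar display are assumptions made for the example.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, frequency, percentage) rows, sorted by category."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100.0 * n / total) for cat, n in sorted(counts.items())]


# Hypothetical answers to one nominal item.
answers = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]

for category, freq, pct in frequency_table(answers):
    # A crude text bar graph: one '#' per respondent.
    print(f"{category:<10} {freq:>3} {pct:>6.1f}%  {'#' * freq}")
```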
Step 6:
Describing Averages and Variability
- Use the median as the average for ordinal data.
- Consider using the mean as the average for equal interval data.
- Use the median as the average for highly skewed, equal interval data.
- Use the range very sparingly as the measure of variability.
- If the median has been selected as the average, use the interquartile range as the measure of variability.
- If the mean has been selected as the average, use the standard deviation as the measure of variability.
- Keep in mind that the standard deviation has a special relationship to the normal curve that helps in its interpretation.
- For moderately asymmetrical distributions, the mode, median and mean approximately satisfy the empirical relationship:
$\text{mode} \approx 3 \times \text{median} - 2 \times \text{mean}$.
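The measures recommended in this step, and the empirical relation between mode, median and mean, can be computed with Python's standard `statistics` module. This is a sketch with invented scores; the function names `summarize` and `approximate_mode` are assumptions, and quartile conventions differ between packages, so the interquartile range may not match other software exactly.

```python
import statistics


def summarize(scores):
    """Mean with standard deviation, and median with interquartile range."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),            # sample standard deviation
        "median": statistics.median(scores),
        "iqr": q3 - q1,
    }


def approximate_mode(median, mean):
    """Empirical approximation for moderately asymmetrical distributions."""
    return 3 * median - 2 * mean


# Hypothetical, mildly right-skewed scores.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 11]
summary = summarize(scores)
print(summary)
print(approximate_mode(summary["median"], summary["mean"]))
```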
Step 7:
Describing Relationships
- For the relationship between two nominal variables, prepare a contingency table.
- When groups have unequal numbers of respondents, include percentages in contingency tables.
- For the relationship between two equal interval variables, compute a correlation coefficient.
- Interpret a Pearson r using the coefficient of determination.
- For the relationship between a nominal variable and an equal interval variable, examine differences among averages.
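For two equal interval variables, the Pearson r and the coefficient of determination (r squared, the proportion of variance in one variable accounted for by the other) can be computed as follows. This is a minimal sketch with invented data; the function name `pearson_r` is an assumption, and the formula is written out by hand so that the example does not depend on a particular Python version.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: years as a customer and a satisfaction score.
years = [1, 2, 3, 4, 5, 6]
satisfaction = [2.0, 2.5, 3.5, 3.0, 4.5, 4.0]

r = pearson_r(years, satisfaction)
print(f"r = {r:.3f}, coefficient of determination = {r * r:.3f}")
```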
Step 8:
Estimating Margins of Error
- It is extremely difficult, and often impossible, to evaluate the effects of a bias in sampling.
- When evaluating a percentage, consider the standard error of a percentage.
- When evaluating a mean, consider the standard error of the mean.
- When evaluating a median, consider the standard error of the median.
- Consider building confidence intervals, especially when comparing two or more groups.
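The standard errors listed in this step can be sketched as below. Several assumptions are built in and are not taken from the text: simple random sampling, a normal approximation for the confidence interval (z = 1.96 for roughly 95%), and, for the median, the large-sample approximation that holds for normally distributed data (about 1.2533 times the standard error of the mean). All function names are invented for the example.

```python
import math
import statistics


def se_proportion(p, n):
    """Standard error of a sample proportion p based on n respondents."""
    return math.sqrt(p * (1 - p) / n)


def se_mean(scores):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def se_median_normal_approx(scores):
    """Approximate SE of the median, assuming roughly normal data."""
    return math.sqrt(math.pi / 2) * se_mean(scores)


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation confidence interval around an estimate."""
    return estimate - z * se, estimate + z * se


# Hypothetical result: 50% of 100 respondents agree.
se = se_proportion(0.5, 100)
print(se, confidence_interval(0.5, se))
```

When two groups are compared, intervals that overlap heavily suggest the observed difference may be due to sampling error alone.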
Step 9:
Writing Reports of Questionnaire Research
- In an informal report, variations in the organization of the report are permitted.
- Academic reports should begin with a formal introduction that cites literature.
- The second section of academic reports should describe the research methods.
- The third section of academic reports should describe the results.
- The last section of academic reports should be a discussion. Acknowledge any weakness in your research methodology.
Missing Values on a Sensitive Topic
A natural way to get answers is to assure people, as far as possible, that the survey is anonymous, and to find a way to make the respondent at least minimally comfortable. According to the U.S. General Accounting Office book "Developing and Using Questionnaires" (Oct 1983), chapter 9, you should do the following:
- Explain to the respondent the reasons for asking the questions.
- Make response categories as broad as possible.
- Word the question in a nonjudgmental style that avoids the appearance of censure, or, if possible, make the behavior in question appear to be socially acceptable.
- Present the request in as matter-of-fact a way as possible.
- Guarantee confidentiality or anonymity.
- Make sure the respondent knows the information will not be used in any threatening way.
- Explain how the information will be handled.
- Avoid cross classification that will allow for pinpointing responses.
Sources of Error
- The use of an inadequate frame.
- A poorly designed questionnaire.
- Recording and measurement errors.
- Non-response problems.
For example, consider the following question: "Over the last twelve months would you say your health has on the whole been: Good? / Fairly good? / Not good?" The respondent is required to tick one of three boxes labeled in this way. Several things are wrong with this question:
- It is the only question on the form that asks about a matter of opinion rather than fact, but this distinction is not represented in any way in its layout or wording.
- A question about opinion should offer a "Don't know" response option, but none is provided. In some cases, such as the Census form, the advisory staff are adamant that the question must be answered. A person with no opinion on the matter is thus in a quandary and threatened with possible legal action.
- The question is highly ambiguous about the qualitative nature of what is being asked about (your health). Is one to respond in terms of how one feels, how one can perform, comparisons with peer groups, comparisons with other periods of one's life, or what?
Sample Size in Surveys Sampling
People sometimes ask me, "What fraction of the population do you need?" I answer, "It's irrelevant; accuracy is determined by sample size alone." This answer has to be modified if the sample is a sizable fraction of the population.
For an item scored 0/1 for no/yes, the standard deviation of the item scores is given by $SD = (p(1-p))^{1/2}$, where p is the proportion obtaining a score of 1.
The standard error of estimate SE (the standard deviation of the sampling distribution of the estimated proportion) is given by $SE = SD / N^{1/2}$, where N is the sample size. For a fixed N, SE is at a maximum when p = 0.5, so the worst-case scenario occurs when 50% agree and 50% disagree.
In that worst case, the sample size N needed for a given SE is $0.25 / SE^2$, rounded to the nearest integer.
Thus, for SE to be 0.01 (i.e. 1%), a sample size of 2500 would be needed; for 2%, 625; for 3%, 278; for 4%, 156; for 5%, 100.
Note, incidentally, that as long as the sample is a small fraction of the total population, the actual size of the population is entirely irrelevant for the purposes of this calculation.
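The sample sizes quoted above follow from the worst case p = 0.5, where p(1-p) = 0.25. The sketch below reproduces them; the function name `sample_size_for_se` is an assumption, and rounding to the nearest integer is used because it matches the quoted figures (round up instead if the target SE must not be exceeded).

```python
def sample_size_for_se(target_se, p=0.5):
    """Sample size giving roughly the target standard error for a proportion.

    Assumes simple random sampling from a population much larger than the sample.
    """
    return round(p * (1 - p) / target_se ** 2)


for se in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(f"SE = {se:.0%}: N = {sample_size_for_se(se)}")
# SE = 1%: N = 2500
# SE = 2%: N = 625
# SE = 3%: N = 278
# SE = 4%: N = 156
# SE = 5%: N = 100
```

Halving the standard error requires four times as many respondents, which is why very small margins of error are expensive.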
Sample sizes with regard to binary data:
$n = \dfrac{t^2 N p(1-p)}{t^2 p(1-p) + a^2 (N-1)}$
with N being the total number of cases in the population, n the sample size, a the expected error, t the value taken from the t distribution corresponding to a certain confidence level, and p the probability of an event.
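A short sketch of this finite-population formula follows. The default values are assumptions chosen for illustration: t = 1.96 (the large-sample value for 95% confidence) and p = 0.5 (the worst case). The function name `finite_population_sample_size` is invented, and the result is rounded up so that the expected error is not exceeded.

```python
import math


def finite_population_sample_size(N, a, t=1.96, p=0.5):
    """Sample size n for a binary item in a population of N cases.

    N: population size, a: expected error (as a proportion),
    t: critical value for the chosen confidence level, p: event probability.
    """
    numerator = t ** 2 * N * p * (1 - p)
    denominator = t ** 2 * p * (1 - p) + a ** 2 * (N - 1)
    return math.ceil(numerator / denominator)


print(finite_population_sample_size(N=10_000, a=0.05))   # 370
print(finite_population_sample_size(N=10 ** 9, a=0.05))  # 385
```

The second call illustrates that once the population is very large, its size has almost no effect on the required sample size.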
There are several formulas for the sample size needed for a t-test. The simplest one is
$n = 2 (Z_a + Z_b)^2 s^2 / D^2$
where $Z_a$ and $Z_b$ are the standard normal values for the significance level and the power, s is the standard deviation, D is the difference to be detected, and n is the sample size per group. This formula underestimates the sample size, but is reasonable for large sample sizes. A more accurate formula replaces the Z values with t values, and requires iteration, since the df for the t distribution depends on the sample size. The accurate formula uses a non-central t distribution and it also requires iteration.
When several groups are to be compared, the simplest approximation is to replace the first Z value in the above formula with the value from the studentized range statistic that is used to derive Tukey's follow-up test. If you don't have sufficiently detailed tables of the studentized range, you can approximate the Tukey follow-up test using a Bonferroni correction. That is, change the first Z value to $Z_{a/k}$, where k is the number of comparisons.
Neither of these solutions is exact. I suspect that the exact solution is a bit messy. But either of the above approaches is probably close enough, especially if the resulting sample size is larger than (say) 30.
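The simplest (normal approximation) formula, with the optional Bonferroni adjustment, can be sketched as follows. The text does not say whether the test is one-sided or two-sided; this example assumes a two-sided test, so the first Z value is taken at 1 - alpha/2. The function name `n_per_group` and its default values (alpha = 0.05, power = 0.80) are also assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(s, D, alpha=0.05, power=0.80, comparisons=1):
    """Approximate sample size per group for comparing two means.

    s: standard deviation, D: difference to detect.
    comparisons: number of comparisons k; alpha is divided by k (Bonferroni).
    Uses Z values only, so it slightly underestimates the t-test requirement.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - (alpha / comparisons) / 2)  # two-sided (assumed)
    z_b = z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * s ** 2 / D ** 2)


print(n_per_group(s=1.0, D=0.5))                 # 63
print(n_per_group(s=1.0, D=0.5, comparisons=3))  # larger, because alpha is split
```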
A better stopping rule for conventional statistical tests is as follows:
- Test some minimum (pre-determined) number of subjects.
- Stop if the p-value is equal to or less than .01, or equal to or greater than .36; otherwise, run more subjects.
Obviously, another option is to stop if/when the number of subjects becomes too great for the effect to be of practical interest. This procedure maintains $\alpha$ at about 0.05.
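The stopping rule can be written as a small decision function. This sketch only encodes the thresholds given above (.01 and .36); the function name `stopping_decision`, the returned labels, and the optional `n_max` cap on the number of subjects are assumptions made for the illustration.

```python
def stopping_decision(p_value, n_tested, n_min, n_max=None):
    """Decide whether to stop testing or to run more subjects."""
    if n_tested < n_min:
        return "continue"              # minimum number of subjects not yet reached
    if p_value <= 0.01:
        return "stop: significant"
    if p_value >= 0.36:
        return "stop: not significant"
    if n_max is not None and n_tested >= n_max:
        return "stop: too many subjects for a practically interesting effect"
    return "continue"


print(stopping_decision(p_value=0.20, n_tested=30, n_min=20))   # continue
print(stopping_decision(p_value=0.005, n_tested=30, n_min=20))  # stop: significant
print(stopping_decision(p_value=0.50, n_tested=30, n_min=20))   # stop: not significant
```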
We may categorize probability proportional to size (PPS) sampling, stratification, and ratio estimation (or any other form of model-assisted estimation) as tools that protect one from the results of a very unlucky sample. The first two (PPS sampling and stratification) do this by manipulating the sampling plan, with PPS sampling conceptually a limiting case of stratification. Model-assisted estimation methods such as ratio estimation serve the same purpose by introducing ancillary information into the estimation procedure. Which tools are preferable depends on costs, on the availability of information that allows these tools to be used, and on the potential payoffs. None of them will help much if the stratification, PPS, or ratio estimation variable is not well correlated with the response variable of interest.
Therefore, you must use whatever tools are at your disposal that would improve your estimates at feasible costs.
There are also heuristic methods for determining sample size. For example, in healthcare behavior and process measurement, sampling criteria are designed for a 95% CI of 10 percentage points around a population mean of 0.50. There is also a heuristic rule: "If the number of individuals in the target population is smaller than 50 per month, systems do not use sampling procedures but attempt to collect data from all individuals in the target population."
Dr. Hossein Arsham, Professor in Decision Science, Simulation, and Statistics (University of Baltimore), is a Fellow of the Operational Research Society, a Fellow of The Royal Statistical Society, and a Fellow of The World Innovation Foundation: Scientific Discovery.



