Likert, and the Mathematical Basis of Scales
It may come as a surprise to some people that a considerable body of knowledge backs up the construction and administration of opinion questionnaires. They look so simple that anybody could do it! This field is called psychometrics, and it is a bit of a black art. Very often one hears of 'Likert' style in relation to questionnaires in which there are usually five response options. Likert is the name of a person. Rensis Likert (1903-1981) worked for a number of US government agencies in the business of collecting survey information, and he spent the early part of his life developing and perfecting a method for obtaining reliable information from respondents. This was published in 1932 under the title A Technique for the Measurement of Attitudes. His ideas were extremely influential in the 20th century and look like they will be as influential in the 21st.
What is a Likert-style questionnaire? One with five response choices to each statement, right?
No, indeed not. A Likert-style questionnaire is one in which you have been able to show that each item of the questionnaire has a similar psychological 'weight' in the respondent's mind, and that each item is making a statement about the same construct. Likert scaling is quite tricky to get right, but when you do have it right, you are able to sum the scores on the individual items to yield a questionnaire score that you can interpret as differentiating between shades of opinion from 'completely against' to 'completely for' the construct you are measuring.
It is possible to find questionnaires which seem to display Likert-style properties in which many of the items are simply re-wordings of other items. Such questionnaires may show some fantastic reliability data, but basically they're a cheat because you're just adding in extra items that bulk up the statistics without telling you anything really new.
And of course there are plenty of questionnaires around which are masquerading as Likert-style questionnaires but which have never had their items tested for any of the required Likert properties. Summing item scores of such questionnaires is just nonsense. You should treat such questionnaires as checklists: that is, look at the number of responses to each category for each question separately. You have no proof that the questions belong together, nor that 'agree' on one question has the same meaning to the respondent as 'agree' on another question.
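The checklist treatment described above can be sketched in a few lines of Python. This is only an illustration: the data, the 1-to-5 coding, and the function name tally_by_item are assumptions made for the example, not part of any published instrument.

```python
from collections import Counter

# Hypothetical data: each row is one respondent, each column one question.
# Responses are assumed to be coded 1 (strongly disagree) to 5 (strongly agree).
responses = [
    [4, 2, 5],
    [5, 2, 3],
    [4, 1, 3],
    [2, 2, 4],
]

def tally_by_item(rows, categories=(1, 2, 3, 4, 5)):
    """For each question separately, count the responses in each category."""
    n_items = len(rows[0])
    tallies = []
    for j in range(n_items):
        counts = Counter(row[j] for row in rows)
        # Include categories nobody chose, so every question has the same keys.
        tallies.append({c: counts.get(c, 0) for c in categories})
    return tallies

tallies = tally_by_item(responses)
for j, tally in enumerate(tallies, start=1):
    print(f"Q{j}: {tally}")
```

Note that nothing is summed across questions here: each question is reported on its own, which is all an untested questionnaire can support.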
How can I tell if a question belongs to a Likert scale or not?
The essence of a Likert scale is that the scale items, like a shoal of tropical fish, are all of approximately the same size, and are going in the same direction.
People who design Likert scales are concerned about developing a batch of items that all have approximately the same level of importance (size) to the respondent, and are all more or less talking about the same concept (direction), namely the concept the scale is trying to measure. Designers use various statistical criteria to quantify these two ideas.
To start with, we have to get a bunch of people to fill out the first draft of the questionnaire we are trying to design. We should ideally have about 100 respondents with varied views on the topic we are trying to measure, and certainly, more respondents than questions. We then compute various statistical summaries of this data.
Do the items all have the same level of importance to the respondent? To measure this we look at the reliability coefficient of the questionnaire. If the reliability coefficient is low (near to zero) this means that some of the items may be more important to the respondents than others. If the reliability coefficient is high (near to one) then the items are most probably all of the same psychological 'size.'
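The text above does not say which reliability coefficient is meant. One widely used choice is Cronbach's alpha, and the sketch below assumes that coefficient. The function name and the sample data are hypothetical, and sample variances (dividing by n - 1) are used throughout.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list of item scores per respondent, all the same length.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses from five people to three items (coded 1 to 5).
draft = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
print(round(cronbach_alpha(draft), 3))
```

A real analysis would use a far larger sample than five respondents; the tiny data set is only there to make the arithmetic easy to follow.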
Are the items all more or less talking about the same concept? To measure this we look at the statistical correlation between each item and the sum of the rest of the items. This is sometimes called the item-whole correlation. Items which don't correlate well are clearly not part of the scale (going in a different 'direction') and should be thrown out or amended.
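As a minimal sketch of this check, the code below correlates each item with the sum of the remaining items, using the ordinary Pearson correlation. The function names and the data are made up for illustration; an item that nobody varies on (zero variance) would need separate handling, since the correlation is undefined there.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_rest_correlations(rows):
    """Correlate each item with the sum of all the other items."""
    k = len(rows[0])
    correlations = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]  # total without item j
        correlations.append(pearson(item, rest))
    return correlations

# Hypothetical data: the first two items move together, the third does not.
sample = [
    [1, 1, 3],
    [2, 2, 5],
    [3, 3, 1],
    [4, 4, 4],
    [5, 5, 2],
]
for j, r in enumerate(item_rest_correlations(sample), start=1):
    print(f"item {j}: {r:+.2f}")
```

In this made-up sample the third item correlates negatively with the rest, which is the kind of result that would mark it as a candidate for removal or rewording.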
It's fascinating to use an interactive statistical package and to watch how reliabilities and item-whole correlations change as you take items in and out of the questionnaire.
A very real risk a developer runs when constructing a scale is that they start to 'model the data.' That is, they take items in and out and they compute their statistics, but their conclusions are only applicable to the sample that evaluated the questionnaire. What the developer must do next is to try the new questionnaire on a fresh sample, and re-compute all the above statistics again. If the statistics hold on the fresh sample, then well and good. If not, then it's back to the drawing board.
Warning: one sometimes sees some very good-looking statistics reported on the basis of analysis of the original sample, without any check on a fresh sample. Take these with a large pinch of salt. The statistics will most probably be a lot less impressive when re-sampled.
When the properties of a Likert scale are reported, it is extremely misleading to report the reliabilities and other statistics only on the basis of the original sample that was used to develop the scale. You must see how the scale performs on a fresh sample!
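The fresh-sample check described above can be sketched as a simple comparison: compute the same statistic on the development sample and on the fresh sample, and look at how much it drops. The sketch assumes Cronbach's alpha as the reliability coefficient (the text does not name one), and the function names and data are hypothetical. How large a drop is acceptable is a judgment the code does not make.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def compare_samples(development_rows, fresh_rows):
    """Report reliability on both samples and the drop between them."""
    dev_alpha = cronbach_alpha(development_rows)
    fresh_alpha = cronbach_alpha(fresh_rows)
    return {
        "development": dev_alpha,
        "fresh": fresh_alpha,
        "shrinkage": dev_alpha - fresh_alpha,  # positive means it got worse
    }

# Hypothetical two-item scale: tuned on one sample, retested on another.
development = [[1, 1], [2, 2], [3, 3], [4, 4]]
fresh = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]

report = compare_samples(development, fresh)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Reporting both figures side by side makes it hard to quote only the flattering one from the original sample.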
Conclusion: how to catch a questionnaire
In general, in answer to the question 'is this a real Likert scale or not?', the onus is on the person who created the scale to tell you to what extent the above criteria have been met. If you are not getting this level of information, you are entitled to ask:
- Have you done an advanced course in behavioral statistics and measurement?
- Are you an accredited member of a psychological or allied association?
- How many questionnaires have you developed already? Can I see them?
- Do you publish your results for peer scrutiny?
It is possible that someone who answers 'no' to all of the above may still produce brilliant questionnaires. But if this is the case, caveat emptor - let the buyer beware! Your own reputation may be at stake.
Dr Kirakowski is a Senior Lecturer in Applied Psychology and the Director of the Human Factors Research Group (hfrg.ucc.ie). He is the author of the Software Usability Measurement Inventory (sumi.ucc.ie) and the Website Analysis and Measurement Inventory (www.wammi.com).



