Scales are overused in survey research. They introduce noise and ambiguity. What do we lose if we get rid of them?
Poor surveys ask questions that cannot be answered and, worse still, provide unhelpful answers to those that can. Both are crimes of laziness, and the accomplice is the scale. If in doubt, stick a Likert alongside the vaguest of questions and it will appear half decent. This is our first clue.
Empirically validated scales are seen to good effect across the social sciences – large question banks can search out nuanced aspects of abstract concepts. Classical measurement theory at its finest. Like tentacles, they reach out into the gloom, working in concert to capture their prize. This comprehensive, scale-heavy approach, however, along with its associated statistical techniques, is inappropriate for commercial research.
There are four reasons why:
- Most of us struggle to faithfully articulate an idea on a lone scale
- Individuals and cultures engage with scales differently
- In practice, we do not make differentiated use of scales
- Similarly, scales invite straight-lining.
These weaknesses leave Likert and semantic differential scales particularly vulnerable to differing styles of response. These styles vary across demographics and cultures, masquerading as true differences in attitude and opinion.* Even on a good day, extreme response style (ERS) renders cross-national comparisons invalid. While there are a handful of approaches capable of salving its effects, sadly, there is no cure.
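For the curious, one of those salves is within-respondent standardisation: rescoring each person's answers relative to their own mean and spread, so that habitual extremity (or timidity) washes out. A minimal sketch in Python, with invented data and item names:

```python
import pandas as pd

# Hypothetical 5-point Likert answers: one row per respondent,
# one column per item (names invented for illustration).
df = pd.DataFrame({
    "item_1": [5, 3, 4, 1],
    "item_2": [5, 2, 2, 1],
    "item_3": [4, 3, 5, 2],
})

# Within-respondent standardisation: centre each person's answers on
# their own mean and scale by their own spread, so a habitual
# "everything is a 5" responder stops looking like an enthusiast.
# (Flat responders, with zero spread, would need separate handling.)
row_mean = df.mean(axis=1)
row_std = df.std(axis=1)
standardised = df.sub(row_mean, axis=0).div(row_std, axis=0)
print(standardised.round(2))
```

This blunts response styles within one dataset; it does not rescue a cross-national comparison on its own.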
Academics aside, rating higher or lower on a scale is still interesting. Why not give respondents more flexibility to express themselves? Intuitively, this is a good idea but, given the above, at what cost? Are we truly left with increased sensitivity? There is no forcing the issue either, lest we hazard asking respondents to distinguish between artificial increments which do not naturally occur to them or do not exist at all. Regardless, once aggregated, respondents’ inability to navigate arbitrary points on scale after scale, in the way we hope, translates to noise within the data.
A ‘Yes’, ‘No’ or ‘Neither’ is a crude fallback, a cop-out even, that risks coercing an answer when a respondent is on the cusp. However, while uncomfortable, arguably this too has real-world merit. Considered against the opportunity for the overwhelming majority to answer quickly, clearly and easily, together with stronger response rates and better representation, simplicity outstrips any potential refinement that scales promise. Less noise allows for a lower sample size, and shorter surveys are cheaper still.
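To put rough numbers on that claim: the textbook sample-size formula says required n grows with the variance of the responses, so any noise a scale adds feeds straight into fieldwork cost. A back-of-envelope sketch, with hypothetical figures:

```python
import math

Z = 1.96  # 95% confidence

# Sample size needed to estimate a mean to within +/- `margin`,
# given the standard deviation of responses: n = (Z * sd / margin)^2.
def n_for_mean(sd, margin):
    return math.ceil((Z * sd / margin) ** 2)

# Hypothetical figures: a 0-10 scale whose true spread (sd = 2.0) is
# inflated to sd = 2.5 by response-style noise.
print(n_for_mean(2.0, 0.25))  # 246 respondents
print(n_for_mean(2.5, 0.25))  # 385 -- the added noise costs ~56% more sample
```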
If your budget has limits, then sacrificing scales is hardly controversial. There are only upsides to dropping the obligatory nasal preamble and focusing minds on the lowest common denominator. Cast in this light, a scale is counterproductive, certainly if the same figures were sought retrospectively, say by adding up its top two boxes.
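The arithmetic of that retrospective collapse is trivial, which is rather the point. A sketch with made-up ratings shows the 0–10 scale reduced to a yes/no after the fact:

```python
import pandas as pd

# Made-up 0-10 ratings, collapsed into a top-two-box figure. All the
# scale's apparent precision is discarded at the reporting stage anyway.
ratings = pd.Series([9, 10, 7, 8, 4, 9, 6, 10])
top_two_box = (ratings >= 9).mean()
print(f"Top-two-box: {top_two_box:.0%}")  # 50% -- a yes/no in disguise
```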
Once collapsed like this, a rating of ‘9’ raises the question: “What shall we do with an 8?” Does this impact decision-making? Should it? Then what’s the point of it? Any combination of answers to these questions is our final clue. That said, rating scales such as these are indispensable when testing subjective perceptions like quality, taste or aesthetics, especially when relative scoring isn’t practical. However, when behaviourally anchored or matter-of-fact diagnostics are plausible alternatives, insistence on scalar evaluations ensures boring, nebulous results.
By way of example, what does the C-suite make of ‘Invest to move the queue evaluation from a 7.2 to an 8.5 out of 10 in order to improve loyalty’ or ‘Improve the top box score from 70% agreement to 80%...’? Instead, a description allows: ‘Reduce perceived waiting time from 15 mins to 10 mins in order to drive…’ An evaluation returns something merely directional, if not obvious, while a description hands us the performance target.
Similarly, in the area of product and service development, trade-offs provoke decisions, revealing hierarchies of appeal, sharper delineations and higher predictive power. Hard choices traverse culture and time. Likert scales and their ilk do not.
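As a flavour of how little machinery a trade-off exercise needs, here is a count-based best-worst (MaxDiff) tally with invented choice data. Real studies would use proper choice modelling, but even raw counts yield a hierarchy of appeal:

```python
from collections import Counter

# Invented best-worst (MaxDiff) choices: in each task the respondent
# picks the most and least appealing feature from a small set.
tasks = [
    {"best": "price", "worst": "packaging"},
    {"best": "quality", "worst": "price"},
    {"best": "price", "worst": "packaging"},
    {"best": "quality", "worst": "brand"},
]

best = Counter(t["best"] for t in tasks)
worst = Counter(t["worst"] for t in tasks)

# Simple count score: times chosen best minus times chosen worst.
features = set(best) | set(worst)
scores = {f: best[f] - worst[f] for f in features}
for feature, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(feature, score)
```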
Unless all respondents can effortlessly imagine their answer against a slide rule, a scale must be the instrument of last resort. Further, scales are a crutch to avoid divisive questioning, and the price we pay is too dear. Over-reliance on scales only increases the cost of doing research while reducing our ability to act upon it. So, on a scale from 1 to 5, where an answer of 5 means you agree strongly, and 4 means you agree slightly and… Spare me.
Reference:
*De Jong MG, Steenkamp JEM, Fox JP & Baumgartner H (2008) Using Item Response Theory to Measure Extreme Response Style in Marketing Research: A Global Investigation. Journal of Marketing Research 45(1): 104–115.
I am a marketing scientist with 24 years of experience working with sales, media spend, customer, web & survey data. I help brands and insight agencies around the world get the most out of data by combining traditional statistics with the latest innovations in data science. Follow me on LinkedIn for more of this sort of thing.