All products should be thoroughly tested prior to distribution. Products that are totally new to market, or those with revised formulations, should be properly tested. It can also pay dividends to periodically retest established products, particularly against new competitive introductions or revised/improved competitive formulas, to assure your product represents the highest quality in its category.
A ‘control’ product should be built into the design of every product test to provide relative measurement. This recognizes that consumers’ stated attitudes in a research setting are not an absolute indicator of actual marketplace behavior. A standard of comparison, or control, with known in-market performance is therefore selected, and the ratio of the control’s test score to its in-market performance becomes the benchmark for reading the test product’s score (a rough worked example follows). Start by thinking about an appropriate control or comparison product.
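As a minimal sketch of that calibration logic, with entirely hypothetical numbers and the simplifying assumption that the control’s test-score-to-market ratio transfers proportionally to the test product:

```python
# Hypothetical calibration of a test score via a control product.
# Assumption: the control's ratio of in-market share to test score
# transfers approximately to the test product; real models are richer.

control_test_score = 0.40    # e.g., top-box purchase intent for the control
control_market_share = 0.10  # the control's known in-market share
test_product_score = 0.48    # test product's top-box purchase intent

calibration = control_market_share / control_test_score  # 0.25
estimated_share = test_product_score * calibration

print(f"Estimated in-market share: {estimated_share:.1%}")  # 12.0%
```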
If you are testing a new product, the control should come from the anticipated competitive target, perhaps the leading competitive brand in the category. For products creating a totally new category, finding a viable control is a real challenge, so you might select several controls from categories with the closest characteristics you can find: similar channels of distribution, ballpark price levels, etc.
When testing a new formulation of an existing product, comparison against the current formulation is logical. You can include additional controls if they benefit the product evaluation.
Longitudinal control approaches are also possible. Here you compare product test results to prior tests in the same category. You can do this with monadic test results or with the monadic sections of paired comparisons. The problem is that markets change, competitive formulas change, and consumer attitudes change. So be careful to consider how recently the previous test was done and whether conditions are comparable in terms of new entries, pricing, and positioning shifts. It is best to simply build a control into every test and use the prior test findings as an additional backdrop. And do keep in mind the basics of statistical error ranges when comparing differences in findings (see the sketch below).
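For instance, a simple two-proportion z-test is one conventional way to judge whether two top-box scores differ by more than sampling error; the counts below are hypothetical:

```python
# Minimal sketch: two-proportion z-test for comparing top-box scores
# between a current test and a prior test (all counts hypothetical).
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Current test: 120 of 300 top-box; prior test: 100 of 300 top-box.
z = two_prop_z(120, 300, 100, 300)
print(f"z = {z:.2f}")  # ~1.69; |z| must exceed 1.96 for 95% significance
```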
The other thing to be aware of when comparing controls against in-market actual performance is the rest of the marketing mix beyond ‘product’. Especially important: one product may have substantially different media support, so simple awareness alone may drive wide differences in performance. If the product category under test has a relatively short purchase cycle, it may make more sense to relate your product test findings to repeat rates and to total sales performance.
Regardless of technical testing methodology, the basic assumption of product testing is that repeat purchase is directly related to the consumer’s reaction to the product under normal usage conditions. But keep in mind that a host of marketplace factors influence consumer choice and purchasing: media exposure, persuasive copy, ACV distribution, shelf locations, sampling, couponing, trial sizes, pack sizes, pricing promotions, etc. So you cannot use product test results alone as a direct predictor of sales volumes; see more in Volume Modeling. That said, consumer input via product testing provides valuable guidance for improving products. Think of product testing as a step on the path to producing products with greater consumer acceptance and, if the other elements of the marketing mix are working adequately, with greater sales and profits. Product testing itself is not precisely predictive of volume, but it is a key input to combine with the other marketing mix factors that impact sales performance. ARMTEC has helped organizations build simulated test market capabilities that allow scenario optimization in the context of their own organizational realities.
It is probably obvious, but you should test products against the segment or target most likely to buy them. You might do some preliminary concept testing against a broader audience with well-developed profiling questions (e.g., demographics, list source variables, etc.) to confirm who has interest, and use that information as part of your segmentation targeting profile. Product testing itself might be done against a slightly enlarged potential buyer group to provide better comparisons with other brands and to illustrate the product’s potential outside too narrow a niche market target.
Especially among inventors and founders it is common to lean heavily into ‘features’ rather than a whole product. But think of products as “bundles”: your goal with a product test is evaluating the whole. As we discussed under Value Laddering, the individual characteristics/features/attributes interact with each other to produce an overall product experience. In designing your product test, recognize that consumers often cannot tell you specifically what they want, but they can tell you what they like or dislike after trying a variation. So it is logical to include feature acknowledgement and reaction as a diagnostic. This parallels the fact that lower-level attributes can play a critical role in supporting a “reason to believe” in advertising even though they are not the main message that invokes interest and purchase intent. For attributes shown by a whole-product test to be principal drivers, it is fair not only to include diagnostics internal to the product test but even to develop separate tests to further optimize that attribute. An example might be “sensory testing” that asks consumers to scale how much stronger, sweeter, or softer a product should be. The “best” product on a blind basis may not be the best when branded, packaged, and labeled, so don’t let your engineering-level detail tests get interpreted as representing the ‘whole’. Brand name and positioning are part of the total product bundle. A final version should always be tested in the context of a whole product.
Product tests may be conducted in a “blind” (unidentified) or “branded” (identified) fashion. Blind tests are conducted when the objective is to obtain product diagnostics in the absence of a brand name’s influence (usually for R&D purposes). Branded product tests are more typically conducted in image-oriented categories (e.g., fragrances), where a brand name may drive consumer reaction to a greater extent than would some product attributes. As a general rule, products should be tested on both a blind and an identified basis.
Respondents are given one and only one product to try and are asked to use it for a predesignated period of time (for the ‘control’, separate matched samples evaluate each product). Following the testing period, the interviewer recontacts respondents to ask overall rating, purchase interest, likes, dislikes, etc.
Having a respondent evaluate just one product is “real world” in the sense that this is how they would logically use the product in practice.
You would have to be a deep-pockets company to afford the sample sizes needed for a monadic design to be sensitive to small differences. Check your statistics textbook on confidence intervals and sample size: you would need a very large spread between scores to be, say, 99% sure of the winner at typical sample sizes (see the sketch below). The good news is that major product differences do translate into significant mean-score differences, and you can likely use the diagnostics as well. But give some consideration to whether you are trying to do a one-and-done test; if so, use another method, but if this is part of a series of steps you will gain useful insights.
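To make the sample-size point concrete, here is a minimal sketch using the standard two-proportion sample-size formula; the target scores, confidence, and power levels are illustrative assumptions:

```python
# Minimal sketch: respondents needed per cell to detect a difference
# between two top-box proportions in a monadic design (two-sided test,
# 95% confidence, 80% power; all target values hypothetical).
import math

def n_per_cell(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per group for a two-proportion test."""
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

print(n_per_cell(0.40, 0.45))  # 5-point gap: ~1,532 respondents per cell
print(n_per_cell(0.40, 0.50))  # 10-point gap: ~387 respondents per cell
```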
As the name says, respondents try two products. The products can be placed for paired comparison in two different ways:
Note: In a ‘triangle test’ methodology, respondents sample three products, two of which are identical, and are then asked to select the one that is different and to state which they prefer.
Small differences tend to be magnified in a paired comparison, so less sample is required than in a monadic study for differences to reach statistical significance at a desired level (see the sketch below).
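As a minimal illustration of how a preference split is judged against the 50/50 “no preference” hypothesis (the counts are hypothetical):

```python
# Minimal sketch: testing a paired-comparison preference split against
# the 50/50 "no preference" null hypothesis (counts hypothetical).
import math

def pref_z(wins, n):
    """z statistic for an observed preference share versus 0.5."""
    p = wins / n
    se = math.sqrt(0.25 / n)  # binomial standard error under H0: p = 0.5
    return (p - 0.5) / se

# 115 of 200 respondents prefer product A (57.5%).
print(f"z = {pref_z(115, 200):.2f}")  # ~2.12, significant at the 95% level
```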
Less “real world,” because consumers don’t typically try two products side by side. And because the design is preference oriented, some feel the diagnostic information is less meaningful: a product may rate better than its competitor on a given feature, with no indication of whether either is actually acceptable in absolute terms.
Think of this as doing two monadic product tests, one after the other. The two products being compared are given in different order to different respondents. The questioning on ratings, likes, dislikes, etc. is done after the use of each individual product, but preference questions are not asked until the second product and its line of questioning are complete. Only then is the respondent asked which product they prefer overall and with regard to specific attributes. You can see that order (recency) effects could influence preference, which is why you rotate which product is given first across respondents; a minimal assignment sketch follows.
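One simple way to balance order effects, assuming two products and an even sample (the function and names are illustrative):

```python
# Minimal sketch: balanced order assignment for a sequential monadic
# test, so each product is tried first by half the sample.
import random

def assign_orders(respondent_ids, products=("A", "B"), seed=42):
    """Randomly split respondents so half try A first, half try B first."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {**{rid: products for rid in ids[:half]},
            **{rid: products[::-1] for rid in ids[half:]}}

orders = assign_orders(range(1, 9))
print(orders)  # half see (A, B), half see (B, A)
```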
The approach combines the “best” of both worlds by providing both monadic readings and preferences without jeopardizing either measure.
The second callback interview increases costs somewhat, but the design achieves the same statistical power as a paired comparison with the same sample size, so the sample itself adds no cost.
This approach is often considered the most real world. As in a concept test, respondents are exposed to a concept stimulus (usually in print ad format) that describes the product. Respondents are asked purchase interest in the concept and a series of diagnostic questions. They are then given one product to try over the test period. After using the test product in-home, respondents are recontacted and asked a series of diagnostic questions regarding product performance and concept fulfillment. Some researchers place product in the second phase only with those who answered in the top boxes of purchase intent during the concept test. Others also place product with respondents who were neutral or negative toward the concept, because among those who then express a favorable product reaction you can probe what the concept should have told them, but hadn’t, that would have made them interested. Note: Concepts should be successfully tested before the initiation of concept/product testing.
“Real world” in the sense that the product is evaluated by those favorably disposed toward the concept, who are believed most likely to be motivated to try the product after exposure to advertising.
Helps answer whether the product consumers would purchase is consistent with the expectations the concept established.
Provides good foundation inputs to be modeled, along with marketing assumptions, into a rough business value.
If you place the product among those who were not favorable toward the concept but liked the product itself, you can strengthen the concept to expand the potential sales base.
Doesn’t provide a “clean” evaluation of the product alone.
The absence of a competitive framework.
Same disadvantages as stated for monadic designs.