Calibration alone won’t guarantee quality measurements; even a perfectly accurate gage can be unacceptable for measuring a particular product or process. That’s where Gage R&R studies come in. Nearly every manufacturing company is required to conduct them, and the benefits range from product acceptance measurements and reliable control charts to objective equipment selection criteria and identifying the need for operator training. More importantly, this single study may be the only chance to find measurement quality problems. Today’s software makes Gage R&R data entry and calculation easy, but learning to interpret the results effectively is essential.


The Basics
A Gage R&R study measures the variation in a production measuring process. The primary sources of variation are repeatability and reproducibility (R&R). The purpose is to confirm that variation is not excessive, or to take action if it is. A study is required for each production measuring process, but not for each gage: you might have 3,000 gages and only 200 production measuring processes.


Who does it?
Choose three people who do the measurements in production. These might be production people, quality inspectors or lab technicians, depending on the situation. (During the try-out phase of a new part, you may have to use substitutes for the people who will actually do the measurements in the future.)

It doesn’t matter who collects the data. A calibration technician would often be a good choice. He or she would serve as a resource to answer questions and would have access to gage R&R software.


What do they have to know?
The people who do the measurement have to know how to measure, of course. Precautions must also ensure that they never know which part they are measuring: the parts are temporarily numbered in a way that keeps the operators from knowing which number is being measured. Operators can often influence a gage to a considerable extent, and just knowing what to expect tends to make the readings agree more closely, artificially reducing the observed variation. This can happen even when an operator is consciously trying to avoid it.


Setting up the Study
Normal sample sizes are 10 parts, 3 operators and 3 trials, for a total of 90 measurements. Smaller sample sizes can be used when there is a reason: for example, only 8 parts and 2 operators are available, or the trials are very expensive.


Collecting the Data
The person collecting the data should present the parts in random order, but record the measurements according to the temporary part number. In Figure 1, data for a typical gage R&R study has been entered into GAGEtrak software.
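To make the random presentation order concrete, here is a minimal Python sketch that builds a run sheet for the data collector. The function name `make_run_sheet` and its parameters are hypothetical; the sketch assumes a fully crossed study in which every operator measures every part in every trial, with a fresh random order for each operator and trial. The part number column is the collector’s key and would be hidden from the operator.

```python
import random

def make_run_sheet(n_parts=10, operators=("A", "B", "C"), n_trials=3, seed=None):
    """Return a randomized measurement order for a crossed gage R&R study.

    Each row is (trial, operator, part). The part order is reshuffled for
    every operator and trial so the operator cannot anticipate which part
    comes next. Only the data collector should see the part numbers.
    """
    rng = random.Random(seed)
    rows = []
    for trial in range(1, n_trials + 1):
        for operator in operators:
            order = list(range(1, n_parts + 1))
            rng.shuffle(order)
            rows.extend((trial, operator, part) for part in order)
    return rows

sheet = make_run_sheet(seed=42)
print(len(sheet))  # 90 (10 parts x 3 operators x 3 trials)
```

Passing smaller values (for example `n_parts=8, operators=("A", "B")`) covers the reduced designs mentioned above.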

Figure 1: Data for Gauge R&R Study


Choosing a Calculation Method
There are three ways to calculate gage R&R results, and the most familiar method is not the best. The familiar method is called “average and range,” or “long AIAG.” It is intended for spreadsheets or pocket calculators and is not recommended for professional software. The average and range method assumes that an error term called “appraiser × part interaction” equals zero. If this assumption is not true (and sometimes it isn’t), the calculations will not be reliable. A second method, called “range,” or “short AIAG,” is reserved for special situations. This article uses the workhorse method called “ANOVA,” which stands for analysis of variance. When using computer software, we should typically choose ANOVA.
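For readers who want to see what the ANOVA method computes, here is a minimal NumPy sketch of the usual variance-component calculation for a balanced, crossed study (every operator measures every part the same number of times). It is not GAGEtrak’s implementation. The function name is hypothetical, negative variance estimates are set to zero, and the interaction term is always kept rather than pooled into repeatability when it is insignificant, as some software does.

```python
import numpy as np

def anova_gage_rr(y):
    """Standard deviations of gage R&R components from a two-way crossed ANOVA.

    y has shape (parts, operators, trials). Negative variance estimates are
    clipped to zero; the interaction term is never pooled (an assumption).
    """
    y = np.asarray(y, dtype=float)
    p, o, r = y.shape
    grand = y.mean()
    part_means = y.mean(axis=(1, 2))      # shape (p,)
    op_means = y.mean(axis=(0, 2))        # shape (o,)
    cell_means = y.mean(axis=2)           # shape (p, o)

    # Sums of squares for parts, operators, interaction and error.
    ss_part = o * r * np.sum((part_means - grand) ** 2)
    ss_op = p * r * np.sum((op_means - grand) ** 2)
    inter = cell_means - part_means[:, None] - op_means[None, :] + grand
    ss_int = r * np.sum(inter ** 2)
    ss_err = np.sum((y - cell_means[:, :, None]) ** 2)

    # Mean squares (sum of squares / degrees of freedom).
    ms_part = ss_part / (p - 1)
    ms_op = ss_op / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))

    # Variance components from the expected mean squares.
    var_c = {
        "repeatability": ms_err,
        "reproducibility": max((ms_op - ms_int) / (p * r), 0.0),
        "interaction": max((ms_int - ms_err) / r, 0.0),
        "part": max((ms_part - ms_int) / (o * r), 0.0),
    }
    var_c["GRR"] = var_c["repeatability"] + var_c["reproducibility"] + var_c["interaction"]
    var_c["TV"] = var_c["GRR"] + var_c["part"]
    # Report standard deviations rather than variances.
    return {k: float(np.sqrt(v)) for k, v in var_c.items()}

parts = np.arange(10, dtype=float)        # hypothetical true part values
offsets = np.array([-1.0, 0.0, 1.0])      # hypothetical operator biases
y = parts[:, None, None] + offsets[None, :, None] + np.zeros((1, 1, 3))
components = anova_gage_rr(y)  # repeatability and interaction ~0, reproducibility ~1.0
```

The made-up data in the usage lines have a purely additive part and operator effect with no noise, so all of the GRR comes from reproducibility, which equals the spread of the operator offsets.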


Evaluating the Results
Figure 2 evaluates the results in two different ways. The % of Tol column evaluates the measurement process in terms of its capability to determine whether parts meet tolerance. GRR% of Tol = 13.5%, which is “fairly good.” GRR is the combined uncertainty (i.e., variation) from repeatability on production parts, reproducibility and appraiser × part interaction. These components are combined by a special method called RSS (root sum square): each component is squared, the squares are added, and the square root of the sum is taken. The individual variables are described following Figure 2.
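The RSS step can be sketched in a few lines of Python. The component values below are hypothetical standard deviations, not the Figure 2 results. Because the components add as squares (variances), GRR is never larger than their simple sum.

```python
import math

def rss(*components):
    """Root sum square: square each component, add the squares, take the root."""
    return math.sqrt(sum(c * c for c in components))

# Hypothetical standard deviations for the three GRR components.
repeatability, reproducibility, interaction = 0.030, 0.040, 0.0
grr = rss(repeatability, reproducibility, interaction)
print(grr)  # about 0.05, not the simple sum of 0.07
```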

The % of TV column evaluates the measurement process in terms of its capability to detect changes in total variation (TV, an estimate of process variation). GRR% of TV = 32.2%, which is not acceptable.

Therefore, if we need a gage for experiments to reduce process variation, we should choose a different gage for that purpose. If we need a gage only to determine whether parts meet tolerance, this gage will likely be adequate.
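The two columns differ only in the reference used as the denominator. Below is a minimal Python sketch, assuming the TV-or-tolerance/6 convention described later in this article and the commonly cited AIAG guideline bands (under 10% acceptable, 10% to 30% possibly acceptable depending on the application, over 30% not acceptable). The function names are hypothetical.

```python
def grr_percent(grr, reference):
    """GRR% = 100 * GRR / reference (reference = TV, or tolerance / 6)."""
    return 100.0 * grr / reference

def grr_percent_of_tol(grr, tolerance):
    # The tolerance is expressed as an equivalent standard deviation.
    return grr_percent(grr, tolerance / 6.0)

def rating(pct):
    """Assumed AIAG-style guideline bands for GRR%."""
    if pct < 10.0:
        return "acceptable"
    if pct <= 30.0:
        return "may be acceptable"
    return "not acceptable"

print(rating(13.5))  # may be acceptable (the % of Tol result above)
print(rating(32.2))  # not acceptable (the % of TV result above)
```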

Figure 2: ANOVA Calculation Results for Figure 1


Description of Variables
Repeatability: Variation that is observed when one or more operators repeat the same measurement on the same part and characteristic, using the same gage. This measure of variation does not distinguish between operators. Repeatability is sometimes, but not always, influenced by human (operator) variation. To see whether human variation may be a repeatability issue, view the software’s “repeatability range control chart.”

Reproducibility: Additional variation that is observed when multiple operators are unable to reproduce the same test-group average within limits predicted by repeatability.

Appraiser × Part Interaction: Additional variation that is observed when multiple operators are unable to reproduce the same pattern of part variation within limits predicted by repeatability.

Part-to-part: Either the actual variation (% TV column), or the allowable variation (% Tol column), in the test parts the gage is trying to measure.
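One way to build the repeatability range chart mentioned above: compute the range of the trials in each part/operator cell and compare it with the usual control limit UCL = D4 × R̄ (the lower limit is zero for subgroups of six or fewer). This sketch assumes the data are arranged as parts × operators × trials and uses a hypothetical function name; it is not the GAGEtrak chart. The D4 values are the standard control-chart constants.

```python
import numpy as np

# Standard control-chart constant D4 for subgroup sizes 2-5 (D3 = 0 for n <= 6).
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}

def range_chart(y):
    """Repeatability range chart for y with shape (parts, operators, trials).

    Returns the per-cell ranges, the average range, the upper control limit,
    and the (part, operator) index pairs whose range exceeds the UCL.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[2]
    ranges = y.max(axis=2) - y.min(axis=2)     # shape (parts, operators)
    r_bar = ranges.mean()
    ucl = D4[n] * r_bar
    flagged = [tuple(int(i) for i in idx) for idx in np.argwhere(ranges > ucl)]
    return ranges, r_bar, ucl, flagged

y = np.zeros((10, 3, 3))
y[:, :, 1] = 0.1      # hypothetical: every cell has a range of 0.1 ...
y[4, 2, 2] = 0.5      # ... except one cell (0-based part 4, operator 2)
ranges, r_bar, ucl, flagged = range_chart(y)
print(flagged)  # [(4, 2)]
```

If the flagged cells cluster under one operator, that operator’s technique is a likely contributor to repeatability.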

Figure 3: Gauge Capability Measures


Interpreting Gage Capability Measures
A few companies prefer to use “number of distinct categories” (ndc) instead of GRR%. We can visualize ndc as “categories” in an imaginary histogram: as GRR% gets smaller, the categories also get smaller and there is room for more of them. Using ndc will make no difference to acceptance decisions, with one potential exception: users of ndc may choose to define “not acceptable” as fewer than 5 categories. In that case, the corresponding rejection value would be GRR% greater than 27%.
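The 27% figure follows from the right-triangle relationship between GRR, PV and TV and from the ndc formula given later in the article (ndc = 1.41 PV / GRR). Writing g = GRR / TV gives PV / GRR = √(1 − g²) / g, which can be solved for g. A small Python check of that arithmetic (function names are hypothetical):

```python
import math

def ndc_from_grr_pct(grr_pct):
    """ndc = 1.41 * PV / GRR, with PV / GRR = sqrt(1 - g^2) / g and g = GRR / TV."""
    g = grr_pct / 100.0
    return 1.41 * math.sqrt(1.0 - g * g) / g

def grr_pct_from_ndc(ndc):
    """Inverse of the above: g = 1.41 / sqrt(1.41^2 + ndc^2)."""
    return 100.0 * 1.41 / math.sqrt(1.41 ** 2 + ndc ** 2)

print(round(grr_pct_from_ndc(5), 1))  # 27.1
print(round(grr_pct_from_ndc(9), 1))  # 15.5
```

The second line reproduces the 15.5% value quoted for 9 categories in the imaginary-histogram example.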

Figure 4 shows an example of visualizing gage capability measures with an imaginary histogram. In this example, there are 9 categories associated with the tolerance and 3 categories associated with TV (total variation). We can see that we have a pretty good idea whether the measurements do, or don’t, meet tolerance. We can also see that 3 categories give us only a very crude picture of the process variation. (However, the gage could not reliably support smaller categories.)

Figure 4: Example "Imaginary" Histogram for ndc and GRR%

We can also represent GRR% of tolerance on our imaginary histogram as two “zones of doubt,” each centered on a specification limit. If measurements drift into one of these zones, the measured value and the “true value” will sometimes fall on opposite sides of the limit, causing a wrong decision. For 9 categories, the corresponding value of GRR% of tolerance is 15.5% for each zone.
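One way to put the zones of doubt to work is to flag readings that land inside them. The sketch below assumes each zone is centered on a specification limit with a half-width of 3 × GRR, so the full zone width (6 × GRR) equals GRR% of tolerance times the tolerance when TV = tolerance / 6. The function name and the example limits are hypothetical.

```python
def in_zone_of_doubt(x, lsl, usl, grr, k=3.0):
    """True if reading x is within k * GRR of either specification limit.

    With k = 3 each zone is 6 * GRR wide, i.e. (GRR% of Tol / 100) * tolerance.
    """
    half_width = k * grr
    return abs(x - lsl) <= half_width or abs(x - usl) <= half_width

# Hypothetical limits 9.0 to 11.0 and GRR = 0.05, so GRR% of Tol = 15%.
print(in_zone_of_doubt(10.00, 9.0, 11.0, 0.05))  # False: well inside tolerance
print(in_zone_of_doubt(10.90, 9.0, 11.0, 0.05))  # True: within 0.15 of the USL
```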


How do Gage Capability Measures Work?
There are three variables. The first is GRR, which is itself a combination of variables. The other two are part variation (PV) and total variation (TV). TV is usually either an estimate of process variation or tolerance / 6, depending on the purpose of the gage.

These variables are related by the formula:

TV² = GRR² + PV²

This relationship can be modeled as three sides of a right triangle:

Figure 5: Model of Gauge Capability Relationship

Gage capability can be measured by the ratio of any two sides of the triangle, or the ratio of the squares of any two sides. The most popular ratio is a smaller-is-better ratio:
GRR% = 100 ( GRR / TV )

The second most popular ratio is a larger-is-better ratio:
number of distinct categories = 1.41 ( PV / GRR )

You could use either one, or both, according to preference.
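Because GRR and PV are the legs of the triangle and TV is the hypotenuse, all of these ratios can be computed from the two legs. Below is a minimal Python sketch with a hypothetical function name and made-up inputs. AIAG-style reports commonly truncate ndc to a whole number, which is left to the caller here, and some software labels the ratio of squares “% contribution.”

```python
import math

def capability_measures(grr, pv):
    """Gage capability ratios from the two legs of the GRR/PV/TV triangle."""
    tv = math.hypot(grr, pv)                  # TV^2 = GRR^2 + PV^2
    return {
        "TV": tv,
        "GRR%": 100.0 * grr / tv,             # smaller is better
        "ndc": 1.41 * pv / grr,               # larger is better
        "GRR% squared": 100.0 * (grr / tv) ** 2,  # ratio of squares (variances)
    }

m = capability_measures(grr=0.03, pv=0.04)   # hypothetical standard deviations
print(round(m["GRR%"], 1), round(m["ndc"], 2))  # 60.0 1.88
```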


Illustration of Reproducibility and Appraiser × Part Interaction
In Figure 2, appraiser × part interaction is 0.0%, meaning the pattern of part variation, as measured by each operator, is approximately the same.

Figure 6 shows what this looks like. Reproducibility is also 0.0%, meaning the three patterns have approximately the same average. How close the averages have to be is determined by repeatability.

Figure 6: Negligible Variation from Reproducibility or Interaction

By contrast, Figure 7 has both variation from interaction and variation from reproducibility. See Figure 8 for the corresponding calculated results.

Figure 7: Different Study with Variation from Reproducibility and Interaction


Figure 8: ANOVA Calculation Results for Figure 7
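The patterns in Figures 6 and 7 can be tabulated from the cell averages (one average per part and operator). Below is a minimal NumPy sketch, assuming a balanced parts × operators × trials array and a hypothetical function name. The operator effects reflect reproducibility (operators with different averages), and the interaction table reflects operators disagreeing about the part-to-part pattern; when both are near zero, the operator lines overlap as in Figure 6.

```python
import numpy as np

def operator_patterns(y):
    """Return operator effects and part x operator interaction effects.

    y has shape (parts, operators, trials); effects are computed from cell means.
    """
    cell = np.asarray(y, dtype=float).mean(axis=2)    # (parts, operators)
    grand = cell.mean()
    part_eff = cell.mean(axis=1) - grand              # part-to-part pattern
    op_eff = cell.mean(axis=0) - grand                # reproducibility
    interaction = cell - grand - part_eff[:, None] - op_eff[None, :]
    return op_eff, interaction

# Hypothetical data: operator index 2 reads part index 0 high by 0.9.
y = np.tile(np.arange(10.0)[:, None, None], (1, 3, 3))
y[0, 2, :] += 0.9
op_eff, interaction = operator_patterns(y)
```

In this made-up case the single disagreement shows up mostly in the interaction table (0.54 for that cell) and only slightly in the operator effects (0.06 for that operator).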


Software used for this article is GAGEtrak Calibration Management Software furnished by CyberMetrics, Phoenix, Arizona.


About The Author
Gary Phillips has been in the quality field for nearly 50 years. Previously with GM’s Cadillac division, Gary has now been a consultant for over 30 years and has trained well over 20,000 people worldwide, primarily in technical subjects related to quality and reliability engineering, such as designed experiments, engineering testing, statistical process control and measurement systems analysis.

