
Interobserver Variability

All information on Interobserver Variability

At a glance

Wherever people work, mistakes happen, whether through subjectivity, poor guidance or small mishaps. Urinalysis with rapid tests is also susceptible to these small but potentially influential errors.

Although it is estimated that more than two billion urine tests are performed each year, there has been relatively little scientific research into how well they are carried out and how strongly the human factor affects them. A major problem is that the results are predominantly assessed visually and therefore subjectively.

Further information

In their two-part observational study of urine rapid testing, Bell et al. found, for example, that nursing assistants had a relatively high false-positive rate compared with midwives when testing for proteinuria (47% vs. 17%). In a subsequent study, untrained laboratory staff also reached a false-positive rate of 35%. Training in the method and in interpreting the results then significantly improved the laboratory staff's accuracy (false-positive rate of 5%). Besides the lack of training, the authors identified an “up-rounding problem”: because the strips offer only a few coarsely graded test fields, readers tended to round results up. However, a relatively high rate of false-negative results persisted, which was probably attributable to the test strips themselves.

According to the authors, these residual inaccuracies cannot be remedied by training or automation, but only by checking the threshold sensitivities of the test strips available at the time.
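The false-positive and false-negative rates above compare each visual reading with a reference result. The minimal Python sketch below shows that arithmetic, assuming the conventional definitions (false-positive rate = FP / (FP + TN), false-negative rate = FN / (FN + TP)); Bell et al. may have defined their rates differently. The `Comparison` class and all counts are hypothetical, chosen only so that the false-positive rate falls from 35% to 5% while the false-negative rate stays the same, mirroring the pattern described above.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """2x2 counts of visual dipstick readings against a reference method (hypothetical)."""

    tp: int  # dipstick positive, reference positive
    fp: int  # dipstick positive, reference negative
    fn: int  # dipstick negative, reference positive
    tn: int  # dipstick negative, reference negative

    def false_positive_rate(self) -> float:
        """Share of reference-negative samples that were read as positive."""
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        """Share of reference-positive samples that were read as negative."""
        return self.fn / (self.fn + self.tp)


# Hypothetical counts for illustration only; they are not data from the cited studies.
before_training = Comparison(tp=30, fp=21, fn=10, tn=39)
after_training = Comparison(tp=30, fp=3, fn=10, tn=57)

print(f"FPR before training: {before_training.false_positive_rate():.0%}")
print(f"FPR after training:  {after_training.false_positive_rate():.0%}")
print(f"FNR after training:  {after_training.false_negative_rate():.0%}")
```

In this sketch, training only changes how reference-negative samples are classified, so the false-negative rate is untouched.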

A study on glucose testing clearly showed that measurement errors were caused mainly by the coarse division of the measuring range into only five fields. The deviation of 21.7% from the laboratory results occurred predominantly in the middle measuring range, where values were both over- and underestimated.
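To illustrate how a coarse scale produces errors in both directions, here is a minimal Python sketch. The five field values in `FIELDS` are hypothetical and do not correspond to any real strip; `read_nearest` models a reader who picks the closest colour field, and `read_round_up` models the “up-rounding” tendency described by Bell et al. Both functions are illustrative assumptions, not models used in the cited studies.

```python
from typing import Callable

# Hypothetical field values (mg/dL) for a five-field glucose pad; not from any real product.
FIELDS = [0, 50, 100, 250, 500]


def read_nearest(conc: float) -> int:
    """Reader who picks the colour field closest to the true concentration."""
    return min(FIELDS, key=lambda f: abs(f - conc))


def read_round_up(conc: float) -> int:
    """Reader who picks the lowest field at or above the true value ("up-rounding")."""
    for f in FIELDS:
        if f >= conc:
            return f
    return FIELDS[-1]  # above the scale: clamp to the top field


def relative_error(conc: float, reader: Callable[[float], int]) -> float:
    """Signed error of a reading relative to the true concentration."""
    return (reader(conc) - conc) / conc


for true_value in (40, 170, 180, 400, 600):
    reading = read_nearest(true_value)
    print(true_value, reading, f"{relative_error(true_value, read_nearest):+.0%}")
```

With these assumed fields, true values of 170 and 180 mg/dL, only 10 mg/dL apart, are read as 100 and 250 mg/dL: one is underestimated by about 41%, the other overestimated by about 39%.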

Rumley's study from 2000 likewise found that the coarse graduation of the test fields led to over- and underestimation, in some cases by two “blocks”. Saudan et al. found that automated urinalysis increased the proportion of true-positive urine analyses from 48% with visual reading to 74%.

In summary, the coarse graduation of test strips, combined with the human factor, frequently leads to errors. The often inexperienced examiners face very high demands, because their readings can form the basis of decisions of far-reaching importance.

Finer subdivisions or a continuous scale for documenting results could enable better quantitative analysis. Automated reading of the test results is another important factor.
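The benefit of finer subdivisions can be expressed as a simple bound: for a reader who picks the nearest field, the largest possible error inside the scale's range is half the widest gap between adjacent fields. The Python sketch below computes that bound for two hypothetical scales; the field values and the function `worst_case_error` are assumptions for illustration only. A continuous readout removes this particular bound, leaving the precision of the reading itself as the limiting factor.

```python
def worst_case_error(fields: list[float]) -> float:
    """Largest |reading - true value| for a nearest-field reader inside the scale's range.

    A value lying midway between two adjacent fields is misread by half the gap,
    so the bound is half of the widest gap. `fields` must be sorted ascending.
    """
    gaps = [upper - lower for lower, upper in zip(fields, fields[1:])]
    return max(gaps) / 2


# Hypothetical scales (mg/dL); neither is taken from a real test strip.
COARSE = [0, 50, 100, 250, 500]
FINE = [0, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500]

print(worst_case_error(COARSE))  # 125.0
print(worst_case_error(FINE))    # 50.0
```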


  • Bell et al. (1999): “The role of observer error in antenatal dipstick proteinuria analysis”, British Journal of Obstetrics and Gynaecology, Vol. 106, 1177-1180
  • Bekhof, J. et al. (2011): “Validity and interobserver agreement of reagent strips for measurement of glucosuria”, J. Clin. Lab. Invest., Vol. 71, No. 3, 248-252
  • Rumley (2000): “Urine dipstick testing: comparison of results obtained by visual reading and with the Bayer CLINITEK 50”, Clin. Biochem., Vol. 37, No. 2, 220-221
  • Saudan, P. J. et al. (1997): “Improved methods of assessing proteinuria in hypertensive pregnancy”, BJOG Int. J. Obstet., Vol. 104, No. 10, 1159-1164
  • European Confederation of Laboratory Medicine (2000): “European urinalysis guidelines”, J. Clin. Lab. Investig. Suppl., Vol. 231, 1-86
Status of information: 2022