Performance evaluation of human cough annotators: optimal metrics and sex differences


A study of the reliability of human cough annotators finds high agreement both within and between annotators. Counting cough seconds instead of individual coughs reduces discrepancies between annotators by 50%. The study also reveals significant differences between men and women in the duration of cough sounds and the number of coughs per bout.

Authors: Isabel Sanchez-Olivieri, Matthew Rudd, Juan Carlos Gabaldon-Figueira, Francisco Carmona-Torre, Jose Luis Del Pozo, Reid Moorsmith, Lola Jover, Mindaugas Galvosas, Peter Small, Simon Grandjean Lapierre, Carlos Chaccour

The paper evaluates the consistency and accuracy of human annotators in labeling cough sounds and examines sex differences in cough characteristics. This work supports the development of reliable methods for quantifying coughs, which are essential for improving cough-monitoring systems.

Key aspects of the study include:

  1. Annotation Agreement: The research assesses the level of agreement among human annotators in two ways:
  • Intralabeller Agreement: measures how consistently an individual annotator labels the same audio when annotating it more than once.
  • Interlabeller Agreement: evaluates the consistency between different annotators' labeling of the same cough sounds.
  2. Methodology: The study involved 24 participants who wore an audio-recording smartwatch that captured continuous audio for 6 to 24 hours. Around 400 hours of audio were collected, of which 40 hours were analyzed. A sample of this audio was labeled twice by an expert annotator and then by six trained annotators.
  3. Results:
  • The study found a high level of intralabeller agreement (Pearson's correlation 0.98) and interlabeller agreement (Pearson's correlation 0.96).
  • Using "cough seconds" (any one second of time containing at least one cough) as the unit of analysis reduced annotator discrepancies by 50% compared to counting individual coughs.
  • The length of cough sounds and the epoch size (number of coughs per bout) differed significantly between men and women, suggesting that cough quantification methods may need to be tailored by sex.
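To make the "cough seconds" unit concrete, here is a minimal Python sketch. It assumes, hypothetically, that each annotator's labels are a list of cough onset times in seconds and that a second "contains" a cough when a cough onset falls inside it; how the study handles coughs that straddle a second boundary is not described here. The helper name `cough_seconds` and the timestamps are invented for illustration and are not study data.

```python
import math


def cough_seconds(onsets):
    """Return the set of whole-second bins that contain at least one cough.

    Hypothetical input format: `onsets` holds cough start times in seconds
    from the beginning of the recording. Simplifying assumption: only the
    onset is binned, so a cough that runs past a second boundary is counted
    in the second where it starts.
    """
    return {math.floor(t) for t in onsets}


# Hypothetical labels for the same clip (illustrative, not study data).
# Annotator A hears a rapid burst at ~12 s as three coughs; annotator B hears two.
annotator_a = [12.10, 12.45, 12.80, 40.30]
annotator_b = [12.15, 12.70, 40.35]

# Discrepancy when counting individual coughs.
count_gap = abs(len(annotator_a) - len(annotator_b))
# Discrepancy when counting cough seconds instead.
second_gap = abs(len(cough_seconds(annotator_a)) - len(cough_seconds(annotator_b)))

print(f"cough-count gap: {count_gap}, cough-second gap: {second_gap}")
# → cough-count gap: 1, cough-second gap: 0
```

Two annotators who disagree about how many coughs make up a rapid burst often still agree that the burst happened within a given second, so second-level totals can match even when raw counts do not. The 50% reduction reported in the study comes from its data, not from this toy example. Comparing only totals is the simplest choice; checking which specific seconds both annotators marked (for example with a set intersection) would be a stricter alternative.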
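The agreement figures above are Pearson correlations. A minimal sketch of how such a figure could be computed, assuming (hypothetically) that each labeling pass has been reduced to one cough count per audio segment; the segment counts and variable names are made up for illustration and are not the study's data or code. The function is written out for clarity; on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Written out explicitly; on Python 3.10+ statistics.correlation(x, y)
    computes the same quantity.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical cough counts per audio segment (illustrative, not study data).
expert_pass_1 = [0, 3, 7, 2, 10, 1]
expert_pass_2 = [0, 3, 6, 2, 11, 1]   # same expert, second labeling pass
trained_annot = [1, 2, 7, 3, 9, 1]    # one trained annotator, same segments

intra_r = pearson(expert_pass_1, expert_pass_2)  # intralabeller agreement
inter_r = pearson(expert_pass_1, trained_annot)  # interlabeller agreement
print(f"intralabeller r = {intra_r:.2f}, interlabeller r = {inter_r:.2f}")

# Caveat: r measures linear association, not identical labels.
# An annotator who systematically reports double the counts still gets r = 1.
doubled = [2 * c for c in expert_pass_1]
print(f"r against doubled counts = {pearson(expert_pass_1, doubled):.2f}")
```

The last two lines illustrate why a high correlation alone does not guarantee that annotators report the same totals; pairing r with a direct measure of disagreement, such as the absolute difference in counts, gives a fuller picture.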