Reliability and responsiveness of endoscopic disease activity assessment in eosinophilic esophagitis.

Journal: Gastrointestinal Endoscopy

Background and Aims: Endoscopic outcomes have become important measures of eosinophilic esophagitis (EoE) disease activity, including as an endpoint in randomized controlled trials (RCTs). We evaluated the operating properties of endoscopic measures for use in EoE RCTs.

Methods: Modified Research and Development/University of California Los Angeles appropriateness methods and a panel of 15 international EoE experts identified endoscopic items and definitions with face validity that were used in a 2-round voting process to define simplified (all items graded as absent or present) and expanded versions (additional grades for edema, furrows, and/or exudates) of the EoE Endoscopic Reference Score (EREFS). Inter- and intrarater reliability of these instruments (expressed as intraclass correlation coefficients [ICC]) were evaluated using paired endoscopy video assessments of 2 blinded central readers in patients before and after 8 weeks of proton pump inhibitors, swallowed topical corticosteroids, or dietary elimination. Responsiveness was measured using the standardized effect size (SES).
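
As an illustration of the responsiveness measure, the following is a minimal sketch of a standardized effect size calculation. The abstract does not state which SES formula was used; the sketch assumes one common definition, the mean change from baseline divided by the standard deviation of the baseline scores. The function name and the scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean paired change divided by the SD of baseline scores.

    Assumption: improvement is a reduction in score, so change is
    computed as baseline minus follow-up. This is one common SES
    definition, not necessarily the one used in the study.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least two patients are required")
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical total endoscopic scores for five patients,
# before and after 8 weeks of treatment.
baseline_scores = [6, 5, 7, 4, 8]
week8_scores = [2, 3, 3, 1, 4]
ses = standardized_effect_size(baseline_scores, week8_scores)
print(f"SES = {ses:.3f}")
```

Larger positive values indicate a greater treatment-associated reduction relative to the baseline spread of scores.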

Results: The appropriateness of 41 statements relevant to EoE endoscopic activity (endoscopic items, item definitions and grading, and other considerations relevant for endoscopy) was considered. The original and expanded EREFS demonstrated moderate-to-substantial interrater reliability (ICCs of .472-.736 and .469-.763, respectively) and moderate-to-almost perfect intrarater reliability (ICCs of .580-.828 and .581-.828, respectively). Strictures were least reliably assessed (ICC, .072-.385). The original EREFS was highly responsive (SES, 1.126 [95% confidence interval {CI}, .757-1.534]), although both expanded versions of the EREFS, scored based on worst affected area, were numerically most responsive to treatment (expanded furrows: SES, 1.229 [95% CI, .858-1.643]; all items expanded: SES, 1.252 [95% CI, .880-1.667]). Scoring the EREFS and its modifications by segment did not improve reliability, and summing the proximal and distal EREFSs did not improve responsiveness.
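
The qualitative labels attached to the ICC ranges above are consistent with the widely used Landis and Koch agreement benchmarks, although the abstract does not name the scale it applied. The sketch below assumes those benchmarks; the function name is hypothetical.

```python
def agreement_band(icc: float) -> str:
    """Map a reliability coefficient to a Landis and Koch label.

    Assumption: the cut points below are the conventional Landis and
    Koch bands; the abstract does not confirm which scale was used.
    """
    if icc > 1.0:
        raise ValueError("coefficient cannot exceed 1")
    if icc < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if icc <= upper_bound:
            return label
    raise ValueError("unreachable for values in [0, 1]")


# ICC endpoints reported in the Results paragraph above.
for value in (0.472, 0.736, 0.580, 0.828, 0.072, 0.385):
    print(f"ICC {value:.3f}: {agreement_band(value)}")
```

Under this assumed scale, the stricture ICCs of .072-.385 fall in the slight-to-fair range.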

Conclusions: EREFS and its modifications were reliable and responsive, and the original or expanded versions of the EREFS may be preferred in RCTs. Disease activity scored based on the worst affected area optimizes reliability and responsiveness.