Head-to-Head Evaluation of ChatGPT 4o, GPT-5, and DeepSeek for Structured Extraction, Toric IOL Recommendation, and Refractive Prediction
We conducted a single-center, retrospective observational study to evaluate three large language models (ChatGPT 4o, GPT-5, and DeepSeek) for automated interpretation of de-identified IOLMaster 700 reports supplied as raster images. Each model performed structured biometric extraction, toric IOL recommendation, and refractive prediction (sphere, cylinder, and axis). Primary outcomes were parameter-level agreement for the extracted biometry and refractive prediction error metrics; secondary outcomes were decision-support performance for toric IOL selection and agreement on ordered T-codes. No clinical intervention was performed.
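The abstract does not specify which refractive error metrics or T-code agreement measures were computed. As a hedged illustration only, the sketch below shows common per-case choices: absolute spherical equivalent error, absolute cylinder error, and cylinder axis error with wrap-around at 180°, plus exact and within-one-step agreement for ordered toric labels. The names `Refraction`, `refraction_errors`, and `t_code_agreement`, and the `"T<n>"` label format, are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Refraction:
    sphere: float    # diopters
    cylinder: float  # diopters; model and reference must share one sign convention
    axis: float      # degrees, 0 to 180


def spherical_equivalent(r: Refraction) -> float:
    """Spherical equivalent: sphere plus half the cylinder."""
    return r.sphere + r.cylinder / 2.0


def axis_difference(a: float, b: float) -> float:
    """Smallest angle between two cylinder axes; axes repeat every 180 degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def refraction_errors(pred: Refraction, ref: Refraction) -> Dict[str, float]:
    """Per-case absolute errors of a predicted refraction against a reference."""
    return {
        "abs_se_error": abs(spherical_equivalent(pred) - spherical_equivalent(ref)),
        "abs_cyl_error": abs(pred.cylinder - ref.cylinder),
        "axis_error_deg": axis_difference(pred.axis, ref.axis),
    }


def t_code_rank(code: str) -> int:
    """Parse a label such as 'T3' into its integer rank (hypothetical label format)."""
    return int(code.strip().upper().lstrip("T"))


def t_code_agreement(pred: List[str], ref: List[str]) -> Dict[str, float]:
    """Exact and within-one-step agreement between ordered T-code labels."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and the same length")
    pairs = [(t_code_rank(p), t_code_rank(r)) for p, r in zip(pred, ref)]
    n = len(pairs)
    return {
        "exact": sum(p == r for p, r in pairs) / n,
        "within_one": sum(abs(p - r) <= 1 for p, r in pairs) / n,
    }


if __name__ == "__main__":
    errs = refraction_errors(Refraction(-0.25, -0.75, 178), Refraction(0.0, -0.50, 5))
    print(errs)  # {'abs_se_error': 0.375, 'abs_cyl_error': 0.25, 'axis_error_deg': 7.0}
    print(t_code_agreement(["T3", "T4", "T2"], ["T3", "T5", "T5"]))
```

The axis wrap matters because axes of 178° and 5° differ by 7°, not 173°. Because T-codes are ordered, within-one-step agreement separates near misses from gross errors; a fuller astigmatism analysis would typically use power-vector (J0, J45) decomposition instead of comparing cylinder and axis separately.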
• postoperative corrected distance visual acuity (CDVA) of 0.10 logMAR or better
• IOL rotational stability, defined as absolute rotation of less than 10° at the 1-month follow-up examination
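Both criteria can be checked per eye as sketched below. This is a minimal sketch under stated assumptions: the source does not say which reference axis rotation is measured from, so `reference_axis` is assumed to be the axis recorded at the end of surgery, and the function names are hypothetical. Because lower logMAR means better acuity, "0.10 logMAR or better" becomes `cdva <= 0.10`, and rotation uses the same 180° wrap-around as any axis comparison.

```python
def axis_difference(a: float, b: float) -> float:
    """Smallest angle between two IOL axes; an axis repeats every 180 degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def meets_outcome_criteria(cdva_logmar: float, reference_axis: float, axis_1_month: float) -> bool:
    """True if CDVA is 0.10 logMAR or better and IOL rotation is under 10 degrees.

    reference_axis is an assumption: the axis recorded at the end of surgery.
    """
    rotation = axis_difference(reference_axis, axis_1_month)
    return cdva_logmar <= 0.10 and rotation < 10.0


if __name__ == "__main__":
    print(meets_outcome_criteria(0.0, 175, 3))   # True: 8 degrees across the 0/180 boundary
    print(meets_outcome_criteria(0.1, 90, 100))  # False: 10 degrees is not less than 10
```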