FormalPara Key Summary Points

The recent COVID-19 pandemic underlined the need for high-quality remote care. This systematic review gives a current overview of digital tools for remotely assessing visual function and reports on their accuracy.

The mean differences between the included digital tools and conventional clinical charts are low, suggesting clinical equivalence.

The 95% limits of agreement (95% LoA) indicate lower precision of the digital self-assessments, though this effect diminishes in individuals with better visual acuities.

There is great potential for remote self-assessments to increase access to eye care, which is acutely relevant in the current COVID-19 pandemic.

Digital Features

This article is published with digital features, including a summary slide, to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.14731944.

Introduction

Digital tools are of great value for enhancing access to healthcare. In ophthalmology, numerous tools for self-assessing visual function have been developed over the last decade. These tools enable laypersons to self-measure aspects of visual acuity (VA) at home, or at school, using applications on smartphones, tablets and/or computers.

VA testing is the most commonly performed examination of visual function [1]. In clinical practice, it is most commonly carried out using a Snellen chart, which utilizes black letters or symbols (optotypes) of a range of sizes set on a white chart [2]. This chart was developed in 1862 and has been globally adopted as the standard VA test, despite some considerations regarding its design [3, 4]. The most important issue with this chart is the irregular progression of letter sizes. Alternative charts have been introduced: logMAR charts are the standard method in research as they are considered the most accurate [2]. The Snellen chart retains its popularity in clinical practice owing to familiarity, cost, smaller chart size and, most importantly, the short time taken to perform the test [5]. Nevertheless, traditional VA testing with Snellen or logMAR charts requires the patient to physically attend a clinic. Figure 1 shows a visualization of the Snellen chart (left) and the alternative ETDRS chart, a well-established logMAR chart (right).

Fig. 1
figure 1

Left: Snellen visual acuity chart; Right: logMAR visual acuity chart (ETDRS). Images are not to scale

An urgent need for high-quality remote care arose during the COVID-19 pandemic in 2020, when most hospital care was suspended or postponed worldwide. Digital tools for self-assessment of VA increase access to eye care and reduce the burden on patients who are in quarantine, have poor mobility or lack means of transportation.

A plethora of digital tools are available on the internet and in mobile app stores, which makes it difficult to determine which tools are the most effective and reliable [6]. Before a digital self-testing tool can successfully be used in hospital care, extensive validation research and certification are needed [7]. The aim of this literature review is to provide an overview of the available scientific evidence for remote testing of visual function, and to critically appraise and report on the validity, quality and effectiveness of the available tools.

Methods

Protocol and Registration

A review protocol was developed on the basis of the Preferred Reporting Items of Systematic Reviews and Meta-Analyses (PRISMA) statement [8]. The systematic review was registered in the international prospective register of systematic reviews (PROSPERO) on 28 August 2020 (ID: CRD42020201421). In accordance with Dutch law, no institutional ethical review board approval was required.

Search Strategy

A search was conducted in PubMed and Embase, including literature published up to 1 April 2021. The syntax included synonyms for the words ‘assessment’ and ‘digital’, as well as the keywords ‘refractive error’, ‘visual field’, ‘colour vision’ and ‘visual acuity’. We deliberately used more keywords than ‘visual acuity’ as we did not want to miss tools that were not primarily developed to assess VA, but had the ability to do so. The full syntax can be found in the supplementary file. Articles published before 2010 were excluded because of the rapidly changing environment of digital tools and smartphones. No further limitations or filters were applied.

Study Selection

Titles and abstracts were screened for relevance independently by two reviewers (JC and JG). The reviewers were blinded to each other's judgements, and initial title/abstract screening focused on the use of digital tools in the field of ophthalmology in the broadest sense. Articles on other topics were excluded. Subsequently, additional screening of titles/abstracts and full texts was performed to exclude papers about digital tools that did not include a VA assessment (i.e. a different outcome) or were not self-assessments. We defined self-assessment tools as applications on a smartphone, computer or tablet that can be used without the assistance of a healthcare professional. References of reviews and included studies were screened for additional citations. Articles were excluded if the full text was not available.

Our outcome of interest was the measurement accuracy of the digital tools compared to conventional clinical charts, expressed as the mean difference in VA and the 95% limits of agreement (95% LoA). The mean difference can be interpreted as the systematic difference between the measurements (i.e. bias) and the 95% LoA as the range within which 95% of the differences between one measurement and the other are expected to fall (i.e. random error). This methodology was first introduced by Bland and Altman and is commonly used in method comparison studies [9]. If the 95% LoA were not directly reported in the original article, they were derived from the plots or calculated manually from the mean difference and the SD using the following standard formula: 95% LoA = mean difference ± 1.96 × SD.
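To make this calculation concrete, the following minimal sketch computes the mean difference and 95% LoA from paired logMAR measurements; the values are hypothetical and do not originate from any included study.

```python
import numpy as np

# Hypothetical paired logMAR measurements (illustrative only, not study data)
digital_tool = np.array([0.10, 0.20, 0.00, 0.30, 0.10, 0.40])
reference_chart = np.array([0.05, 0.15, 0.05, 0.20, 0.10, 0.30])

differences = digital_tool - reference_chart   # index test minus reference test
mean_difference = differences.mean()           # systematic difference (bias)
sd = differences.std(ddof=1)                   # SD of the differences

# 95% limits of agreement: mean difference +/- 1.96 x SD (random error)
loa_lower = mean_difference - 1.96 * sd
loa_upper = mean_difference + 1.96 * sd

print(f"Mean difference: {mean_difference:.2f} logMAR")
print(f"95% LoA: {loa_lower:.2f} to {loa_upper:.2f} logMAR")
```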

In parallel with writing the current review, our group is investigating a specific remote eye exam in various patient populations: a sample of 100 keratoconus eyes and a sample of 98 uveitis eyes. To illustrate how the accuracy of one digital tool varies across different VA ranges, these preliminary study data were also included in this review. The preprints have been published on medRxiv.org [10, 11].

Quality Assessment

All included studies were assessed for methodological quality according to the QUADAS-2 tool by two reviewers independently [12]. Disagreements were resolved by consensus. ‘Bias’ is scored in terms of patient selection, blinding during assessment with the index test or reference test (i.e. performing one test without knowledge of the outcome of the other) and the timing of the assessments (i.e. whether both tests were done sequentially within a short interval). Applicability focuses on whether there are concerns that the patient population, index tests or reference tests do not match the review question.

Data Extraction and Synthesis

Two reviewers (JC and JG) independently extracted the relevant data. From each included study, we retrieved information on study and patient characteristics, type of the index test, reference test and relevant statistics including mean difference, 95% LoA and/or standard deviation (SD) of the mean difference.

If multiple comparisons were presented in one article, we only recorded the measurements that were obtained in controlled experimental settings, as these represent the agreement under the most optimal conditions. Two studies compared a near vision tool with both a near VA card and a distance chart [13, 14]. Another study used both types of reference charts for evaluating a distance tool [15]. We only present the comparisons using the reference chart with the closest resemblance to the original tool, as we consider these the best representation of the measurement accuracy. One study reported VA in ETDRS letters, which we manually converted to logMAR using standard conversion charts [13]. For the included study by our own research group, the mean difference and 95% LoA were calculated manually since we had access to the study database [16]. If the reported 95% LoA were inconsistent across different sections of the original article [17], these numerical data could not be reported.
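For readers unfamiliar with this conversion, a commonly used linear approximation maps an ETDRS letter score to logMAR as logMAR = −0.02 × (letters − 85); the sketch below is purely illustrative and is not necessarily the exact conversion chart applied in [13].

```python
def etdrs_letters_to_logmar(letters: int) -> float:
    """Convert an ETDRS letter score to logMAR using the widely cited
    linear approximation logMAR = -0.02 * (letters - 85), where a score
    of 85 letters corresponds to 0.0 logMAR. Illustrative only; published
    conversion charts may differ slightly."""
    return -0.02 * (letters - 85)

# Example: a score of 70 letters corresponds to approximately 0.3 logMAR
print(round(etdrs_letters_to_logmar(70), 2))
```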

The individual studies differed greatly with regard to the studied digital tools, reference standards and study populations. This high between-study heterogeneity precluded a meta-analysis of these outcomes, as the generalizability of a pooled estimate was expected to be low.

Additional Subgroup Analyses for Different Visual Acuity Ranges

Subgroup analyses illustrate how the measurement accuracy of the same tool can differ across VA ranges. Two of the included studies reported outcomes of subgroup analyses in the original article [10, 18]. We had access to the databases of the included studies by our own research group and used these data for additional subgroup analyses [10, 16]. Subgroups were based on an achieved VA of ≤ 0.5 logMAR (≥ 0.3 Snellen) and VA > 0.5 logMAR (< 0.3 Snellen). This is the cut-off value for low vision stipulated by the World Health Organization [19].
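For reference, decimal Snellen acuity and logMAR are related by decimal acuity = 10^(−logMAR); the 0.5 logMAR cut-off therefore corresponds to 10^(−0.5) ≈ 0.32, which is conventionally rounded to 0.3 decimal Snellen.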

Data on Test–Retest Variability of Visual Acuity Assessments

Variation between two assessments of VA is common, partly owing to the psychophysical nature of the test. This variation is seen even when an individual is assessed twice within a short time interval using the same chart, and it further confounds the outcomes when different charts are compared [20]. Therefore, to put the agreement between the digital tools and the clinical charts into perspective, we also report outcomes of studies on the repeatability (test–retest variability) of the conventional Snellen and ETDRS charts. A comprehensive literature search identified relevant papers [1, 21,22,23,24,25].

Statement of Ethics Compliance

This systematic review is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

Search Strategy and Study Selection

The study flow chart can be found in Fig. 2. Our search resulted in 679 Embase and 408 PubMed citations. After removal of duplicates, 903 citations were screened for relevance by title and abstract, followed by full-text screening of 40 potentially eligible articles. Subsequently, 32 articles were excluded on the basis of the criteria depicted in the flow chart. Two articles were added from the medRxiv preprint server and seven articles were added after manual screening of references.

Fig. 2
figure 2

Study flow diagram based on PRISMA guidelines [8]

Quality Assessment

The evaluation of the risk of bias and applicability of the included studies is depicted in Fig. 3. In terms of risk of bias, the most striking finding concerns the blinding of participants and/or researchers. Some studies explicitly reported on blinding, i.e. performing the index test (digital tool) without knowledge of the score of the reference test (conventional chart) and vice versa. For some studies it was unclear whether the researcher and/or subject had been blinded. Concerning patient selection, patients were mostly recruited consecutively. One study, Nik Azis et al. [26], used a convenience sampling method for recruitment, excluding children who were deemed uncooperative. Overall, there were no concerns regarding applicability for any of the studies: all studies matched the review question.

Fig. 3
figure 3

Risk of bias and applicability concerns summary. Scores are based on the QUADAS-2 tool [12]

Data Extraction

An overview of the study characteristics can be found in Table 1. The 17 identified studies examined 13 different tools. A total of 3591 eyes were included. There is large heterogeneity among the included studies; participants differed in age, nationality and medical history. The majority of studies examined a population with a wide age range. Four studies focused on testing VA in children. Some studies examined only healthy participants; others examined patients or a mixed group of both.

Table 1 Study characteristics of the 15 included peer-reviewed studies and two pre-print publications

Seven unique tools have been developed to test distance vision [15,16,17,18, 26,27,28,29,30], and four unique tools focus specifically on near vision testing [13, 14, 30,31,32]. One tool assesses both [33]. Most of the studies assessed corrected VA, meaning that study subjects wore their habitual correction (i.e. glasses/contact lenses), if applicable. The digital tools involve different devices, including smartphones, tablets and computers. Most digital tools are available as an application for smartphones and tablets. Two tools are web-based and do not require installation of an app. Most of the publications were validation studies in controlled experimental environments. One study evaluated a smartphone-based near vision test to assess VA at an emergency department [34]. In one study, the digital tool was used unsupervised by subjects in their own home environment [11]. Different reference standards were used to assess agreement.

Comparisons of Distance Visual Acuity Assessments

Overall Measurement Accuracy

An overview of the comparisons of the distance VA assessments can be found in Fig. 4 [10, 11, 15, 16, 18, 26,27,28,29,30, 33]. Most articles reported outcomes for different subgroups or per eye, resulting in 18 comparisons. The mean differences between the digital tools and the reference standards (i.e. bias) range from – 0.08 to 0.10 logMAR. Most of the digital tools provide a slightly worse VA score (i.e. higher logMAR score) than the actual VA as measured by the reference standard. The distribution of the differences between the two tests (i.e. random error), as expressed by the 95% LoA, varies greatly between the studies. It ranges from ± 0.08 logMAR (lowest variability) to ± 0.47 logMAR (highest variability) from the mean difference. Separate comparisons per eye were mostly comparable within studies.

Fig. 4
figure 4

Mean differences between distance visual acuity assessments (digital tool minus reference standard) and 95% limits of agreement in logMAR. Some articles reported separate comparisons per subgroup or per eye. The dashed lines represent ± 0.15 logMAR, a difference that has been suggested in literature to be clinically acceptable [20]. Abbreviations: N number of paired observations; 95% LoA 95% limits of agreement

The study by Bastawrous et al. demonstrates that the accuracy of the same digital tool (Peek Acuity) varies when compared to different reference charts. Han et al. reported on different study populations, illustrating how test accuracy slightly differs when various groups are assessed under similar conditions. The studies by our own study group (Wisse et al. [16], Muijzer et al. [10] and Claessens et al. [11]) all focus on the same digital tool (Easee). Interestingly, the random error is much higher for uncorrected VA assessments in healthy individuals (of whom some have refractive errors) and patients with keratoconus than for corrected VA assessments in patients with uveitis.

Subgroup Analyses for Different Visual Acuity Ranges

The subgroup analyses illustrate how the measurement accuracy of the same tool can differ across VA ranges (Fig. 4). In all of these comparisons, the measurement accuracy appears lower in the poorer VA subgroups, as shown by the larger mean differences and, most notably, the wider 95% LoA. In the better VA ranges, the 95% LoA are narrower.

Test–Retest Variability of Clinical Wall Charts

Test–retest variabilities of logMAR and Snellen charts have been added as a reference in Fig. 4 [1, 21,22,23,24,25]. Test–retest variability of logMAR charts ranges from ± 0.07 to ± 0.18 logMAR (from the mean difference). Snellen charts are less consistent, with reported ranges from ± 0.18 to ± 0.34 logMAR. Line assignment in particular, which is often used in clinical practice, shows great variation when measurements are repeated.

Comparisons of Near Vision Assessments

For the comparisons focusing on near VA assessments, the mean differences range from – 0.03 to 0.09 logMAR [13, 30, 32, 33]. The 95% LoA range from ± 0.17 to ± 0.35 logMAR from the mean difference. One study examined the Sightbook tool in clinical practice, at an emergency department [34]. This self-administered tool agreed more closely with the VA recorded by consulting ophthalmology residents (using Rosenbaum near cards) than did a distance Snellen chart assessed by non-ophthalmic personnel at the emergency department [mean difference 0.06 ± 0.40 logMAR (Sightbook app) vs. 0.21 ± 0.35 logMAR (Snellen chart)].

Discussion

Many digital tools are available to self-test VA, though clinical validation is often lacking [6]. This systematic review presents the 17 currently available publications on 13 different tools for the self-assessment of VA.

Our systematic review identified low mean differences between the digital tools and the reference standards for assessing distance VA, suggesting a low systematic bias. The mean differences ranged from – 0.08 to 0.10 logMAR. The digital tools only slightly underestimate the VA score of the patients and we consider these low values to be negligible in clinical practice. The 95% LoA vary between studies, ranging from ± 0.08 logMAR to ± 0.47 logMAR from the mean difference. Most of the 95% LoA are rather wide, suggesting considerable variability of the VA assessments of the digital tools. As stated before, there is always a certain variation in repeated measurements in the same person [1, 20]. A study on the variability of VA assessments in a large eye clinic reported a test–retest variability of ± 0.15 logMAR when different charts and different examiners assessed the same patient’s VA within a short time interval [20]. The authors conclude that, in general, differences of less than 0.15 logMAR (i.e. 1.5 lines) reflect measurement variation and should therefore not be interpreted as an actual clinical change. Surprisingly, despite the different chart designs and examiners compared in that study, this reported variability does not substantially differ from the test–retest variability of same-chart measurements. The test–retest variability of Snellen charts is actually even wider, as depicted in Fig. 4. This figure also illustrates that the 95% LoA of most digital tools exceed the 95% LoA reflecting the test–retest variability of traditional VA charts. On the basis of these findings, the digital tools appear less precise than traditional VA charts. Obviously, as with any medical technological device, quality differences between tools affect performance. Importantly, these technologies are developing continuously and an improvement in accuracy is attained with every design iteration.

The accuracy of the near vision tests seems better than that of the distance VA assessments, as expressed by the smaller 95% LoA. The near vision cards and the assessments using tablets are very comparable in nature, which might explain the good agreement. Although near vision testing has a considerably smaller role in clinical practice, these tools are very easy to use. Having patients perform a self-test on a tablet handed to them at an emergency unit has proven effective [34]. Especially in an emergency setting, convenience and time-effectiveness may outweigh accuracy.

To the best of our knowledge, this is the first systematic review that focuses on clinically validated digital tools for self-assessing VA. We conducted a comprehensive literature search and reviewed all reference lists of included studies using PRISMA guidelines. Two reviewers independently performed the literature screening, data extraction and risk of bias assessment. The bias analysis identified concerns regarding blinding of the subject and researcher to the outcome of the tests in nine studies. We do not consider this a major problem since VA tests report a quantifiable outcome which can be recorded without interpretation. In addition, the two compared tests were different, so learning effects are not expected. One study, Nik Azis et al. [26], used a convenience sampling method for recruitment, excluding children who were deemed uncooperative. This may have negatively affected the generalizability of the outcomes.

When comparing VA assessments, two factors are important to consider. First, the type of chart used significantly affects the obtained VA outcome [4]. Consequently, this affects the observed differences between the digital tools and the reference charts. There is always variation when two different VA charts are compared, and one should be careful when using charts interchangeably [5]. This also explains the difference in agreement of the Peek Acuity tool, as studied by Bastawrous et al., when different reference charts are used (Snellen vs. ETDRS, see Fig. 4) [28].

Second, the precision of the assessment depends strongly on the achieved VA of the tested subject. When examining the Bland–Altman plots of individual studies, we noticed that the accuracy of the tests improves for patients with better VA scores. This was also demonstrated in the subgroup analyses in Fig. 4. For the study by Zhang et al., the variability of the difference was remarkably smaller for subjects with an achieved VA < 1.0 logMAR than for the ≥ 1.0 logMAR subgroup (± 0.12 vs. ± 0.22). From our own study data, we also learned that the measurement accuracy of one specific tool (Easee) differed across VA ranges and study populations. For example, the additional subanalysis of the data of Wisse et al. revealed an evidently smaller 95% LoA for the healthy individuals with higher VA scores (e.g. 95% LoA of 0.00; 0.58 for visual acuities ≤ 0.5 logMAR, versus 95% LoA of – 0.75; – 0.25 for visual acuities > 0.5 logMAR) [16]. More importantly, the measurements of this specific digital tool were substantially more accurate when assessing corrected VA (in patients with uveitis) than when assessing uncorrected VA (in healthy individuals and patients with keratoconus, with refractive errors) [10, 11, 16]. The digital self-assessments provide less accurate results in poorer VA ranges, regardless of underlying ocular comorbidities. Thus, wide 95% LoA do not necessarily imply inadequate testing conditions or a low technical quality of the studied digital tools, but can be largely attributed to the poor vision of the tested population. Lower accuracy in poorer ranges is a common feature of VA assessments. A recurrent clinical reason is that the person administering the conventional test might terminate the assessment too early. The Snellen chart in particular shows poorer precision in lower VA ranges, and the differences between Snellen and ETDRS increase in this range [21]. This can be explained by the chart design: when testing Snellen acuity, the tester uses a line assignment method, and the poor vision lines usually contain only one or two letters, so missing a letter on these lines can make a large difference in this range [5]. In clinical practice, both time and convenience are essential, and therefore Snellen charts remain popular. We suggest that the accuracy of the digital VA self-assessments in poorer VA ranges could be improved by extending the initial assessment for individuals with poor scores (i.e. performing another assessment with different optotypes for internal validation and adjustment of the initial VA score).

Two studies show remarkably narrow 95% LoA: Ansell et al. [29] and Zhang et al. [18] in the subgroup with VA better than 0.1 Snellen. These papers show better agreement than well-executed test–retest studies of VA wall charts evaluated in controlled, experimental settings.

Future research should focus on the performance of the digital tools in unsupervised conditions. Some of the included studies reported specifically that tests were performed at fixed distances, for example with the head of the participant leaning against an ophthalmologic chin rest and the smartphone or chart fixed on a desk (Brucker et al. [14]). We expect the accuracy to differ in real-world, less controlled conditions. We encourage stratifying for different VA ranges when evaluating agreement between VA charts. We strongly recommend using logMAR charts as reference charts, as they are the most accurate and consistent. For follow-up purposes, good repeatability (test–retest reliability) of the tools is important. This creates excellent opportunities for follow-up and for signaling worsening of vision, as obtaining baseline measurements will allow future vision comparisons. Only two of the included validation studies reported on repeatability [13, 28].

Digital testing with mobile technology has many important advantages that may outweigh its lower accuracy. Traditional in-hospital VA testing requires patients to physically attend a clinic and consumes substantial hospital resources. The former might be particularly difficult for patients from rural areas, the elderly and immobilized patients. With increasing digitization and availability of mobile devices, digital tools have the potential to identify the most important cause of visual impairment worldwide: uncorrected refractive errors. Visual impairment has a negative impact on school performance, employability and quality of life in general. Strikingly, 53% of visually impaired people have poor vision that could be prevented or remedied with glasses or contact lenses [35]. For screening of visual impairment, one is mostly interested in identifying a low VA range rather than determining an exact value. When looking at the accuracy of the Peek Acuity tool (Bastawrous et al. [28]) in the forest plot, the 95% LoA are wider than those in the previously discussed test–retest studies of traditional charts. Notwithstanding, the tool has been successfully implemented in school screening programmes for identifying visual impairment. This has been investigated among various study populations around the globe [36,37,38]. Sensitivity, specificity and predictive value varied among the studies, but included some promising results. A screening study in the USA illustrated how sensitivity differed per age category and found the highest sensitivity for detecting decreased vision in 3–5-year-olds (93–100%) [38]. In a study in Kenya, the sensitivity (77%) was reported to be similar to that of the conventional screening method: standard E-card assessments by school teachers [37]. Importantly, when visual impairment was detected by the smartphone application, personalized SMS reminders to attend the hospital were sent to the children’s parents or guardians, increasing adherence to hospital referral. Specificity of the digital screening was slightly lower than that of the conventional method (91% vs. 97%). The authors of the original article suggested testing strategies to reduce the false positive rate, including retesting the children who initially tested positive and a local triage service to review all children who initially screened positive.

Our systematic review indicates that the low systematic bias makes digital tools for self-assessment of VA a promising avenue for delivering eye care remotely. The precision of most tools seems lower than that of traditional VA charts, though these differences diminish when assessing individuals with better VA. These self-assessments of visual function have great potential for screening purposes, particularly to increase access to eye care, which is acutely relevant in the current COVID-19 pandemic. The landscape of digital medicine has been changing rapidly, especially over the last few years: we expect the accuracy of the current tools to improve with every iteration and new tools to be introduced in the coming years.