The objective of this work was to establish and validate a glossary of usability attributes aimed at improving usability evaluation practices to support the user-centered design of WRD. The established glossary, the RUG, provides a shared and validated terminology that is easily accessible and implementable by developers. To this end, our glossary facilitates the search and selection of context-specific outcome measures and usability research methods within the online Interactive Usability Toolbox (IUT) of ETH Zurich [14]. The generalizability and validity of the UA definitions in our glossary were supported by the ratings of 70 developers of WRD from 17 countries around the world, who showed high agreement (≥ 4.0) on 32 of the 43 UA, and moderate agreement (4.0 > agreement ≥ 3.5) on another 10 UA. Likewise, developers agreed on the relevance of most of these attributes in the field of WRD, with 27 UA considered highly relevant (≥ 4.0) and another 12 moderately relevant (4.0 > relevance ≥ 3.5). Improved definitions for the attributes considered relevant but with moderate or low agreement ratings are also proposed based on the feedback provided by the respondents. All the comments provided by the respondents and the improved definitions are included in Additional file 2: Annex 2.
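The rating bands used throughout this discussion can be expressed as a simple threshold rule. The following is a minimal sketch in Python, not the analysis code of the study: the function name `rating_band`, the attribute names, and the example values are hypothetical, and it assumes that each UA is summarized by a single rating per question.

```python
def rating_band(summary_rating: float) -> str:
    """Map a summary rating to the band used in the text.

    high:     rating >= 4.0
    moderate: 4.0 > rating >= 3.5
    low:      rating < 3.5
    """
    if summary_rating >= 4.0:
        return "high"
    if summary_rating >= 3.5:
        return "moderate"
    return "low"


# Hypothetical summary ratings for illustration only, NOT values from the survey.
example_ratings = {"attribute_a": 4.3, "attribute_b": 3.7, "attribute_c": 3.2}

bands = {name: rating_band(value) for name, value in example_ratings.items()}
print(bands)
```

The same rule applies to both the agreement and the relevance ratings, since the text uses identical thresholds for both.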
The high agreement ratings for most of the UA included in our glossary underline that, despite the wide interpretation of UA in the literature [6,7,8,9], our definitions are in general adequate and could serve as a reference for future studies or for people interested in a comprehensive usability evaluation of WRD. It is interesting to highlight that most UA with moderate or high-to-moderate agreement ratings are terms usually found within the field of engineering, e.g. autonomy, complexity, robustness, technical requirements and wearability [11]. We hypothesize that most developers possess an engineering background, which may lead them to interpret these terms in alignment with engineering-based definitions. Consequently, when prompted to provide a perspective on these terms from a different field, such as usability, discrepancies may arise. Widening the perspective of research and development teams beyond the engineering requirements is fundamental to promote the development of WRD that are usable and effectively respond to users’ needs [2].
A special case is that of ergonomics, the only attribute with low agreement but with high relevance. Ergonomics is a very wide umbrella term used differently across different fields and, thus, can be understood in different ways. In fact, this was the attribute that received the most comments. Instead of considering it as part of usability, ergonomics has long been studied as a separate field of research interacting with usability [19] and there are longstanding international efforts such as the Ergonomics Research Society or the International Ergonomics Association [20], that have stated definitions of the term ergonomics that can be adapted to suit specific fields. Consequently, several of the aspects regarding ergonomics relate also to usability, including other UA of our glossary such as comfort or wearability, and therefore, some WRD developers might consider that the whole field of ergonomics cannot be synthesized as a single, specific UA. Due to its high relevance, we consider it crucial to integrate ergonomics into the IUT, enabling developers to access the available tools for assessing the ergonomics of WRD, even though simplifying the entire field as a UA may be an oversimplification. Based on the feedback provided by the respondents and the definitions stated by the aforementioned organizations, the improved definition for ergonomics in the RUG is “the degree to which the interactions among users and elements of a WRD are optimized to increase human well-being and overall system performance including anatomical, anthropometric, physiological and biomechanical characteristics that relate to the intended use of a WRD”.
Complementary to the high agreement ratings obtained, the high (27 out of 43) and moderate (12 out of 43) relevance ratings of most UA underscore the multifaceted nature of usability. This observation highlights that usability is not a singular, simplistic concept but rather a complex interplay of various dimensions and attributes [16]. Consequently, a comprehensive assessment of usability must take multiple attributes into consideration, which calls for a holistic evaluation approach that goes beyond the prevalent trend in the field. Currently, the field predominantly relies on three dimensions to describe usability (i.e. effectiveness, satisfaction, and efficiency), and usability evaluation is predominantly related to functional or performance-related outcomes [21, 22], followed by the evaluation of ease of use, safety and comfort [16, 23], which may overlook the richness of usability. As expected, in our survey, many of the most widespread attributes related to the usability of WRD received very high relevance ratings (≥ 4.5): safety, usefulness, comfort, reliability, wearability, effectiveness, functionality, meet user needs, and satisfaction. However, efficiency received a high but not very high rating, indicating that other attributes are more relevant to the developers than only the three stated by ISO 9241–11. The glossary provided within this study, which deems most UA as relevant, signifies that the UA summarized and validated therein serve as pivotal elements that effectively encapsulate and represent the entirety of usability. A detailed analysis of the individual ratings (see Additional file 2: Annex 2) raises the need to debate whether the four attributes with relevance scores below 3.5 should be included in the glossary. Aesthetics and embodiment have borderline low-to-moderate relevance. Since they have been previously found to be design criteria important for the primary users of WRD under comparable terms such as “appearance” and “avoid machine body disconnection” [2], respectively, we consider that they should be included in the list of UA of the IUT. The definitions stated for both UA have high agreement; therefore, they do not need improved descriptions but rather more awareness from developers so that they are included as design criteria, because both have poor scores in this regard. On the other hand, the UA technical requirements received a low relevance score and exhibited borderline moderate-to-low agreement among respondents. Comments associated with this attribute suggest that developers do not necessarily perceive it as an integral component of usability but rather believe that technical requirements and usability requirements are complementary in technology developments. Considering this valuable feedback, it is prudent to consider removing this attribute from the glossary. Pleasure, in turn, is the only UA that combines a low relevance score with high agreement on its definition. A detailed examination of the definition provided for this UA shows that it could be closely intertwined with the attribute of satisfaction, which holds very high relevance in the field. Hence, it may be reasonable to also consider omitting pleasure from the set of UA. Both UA are closely related to two psychology-related codes expressed by end-users of lower limb robotic devices for gait rehabilitation, including “positive feeling of being able to stand up and walk again” and “sense of wellness (physical and/or mental)” [2], underlining their relevance for end-users.
Of the remaining 41 attributes, improved definitions were proposed for eight UA considered highly relevant (≥ 4.0) but with moderate (adaptability, complexity, ease of use, helpfulness, meet user needs, robustness, and wearability) or low (ergonomics) agreement ratings. In fact, most of these UA were among those that received the most comments from respondents: ergonomics (10 comments), adaptability, helpfulness, wearability, and technical requirements with 4 comments each, and robustness and durability with 3 comments each. Three of these attributes (ease of use, meet user needs, and wearability) are also often included as design criteria (ratings ≥ 4.0), underpinning the importance of providing definitions that are agreed upon by developers in the field.
Moreover, a detailed analysis of the boxplots in Fig. 2 and the summary of the ratings in Table 3 shows that, while most of the attributes of the glossary are considered relevant in the field of WRD and there is high agreement with their proposed definitions, they have not often been included as design criteria in previous developments [16]. This can be confirmed by comparing the respondents’ years of experience in the field (mdn = 7) and the number of dedicated usability studies performed (mdn = 2). Therefore, our study underlines that usability is still poorly considered as part of the design criteria during device development, even if developers recognize its relevance. In fact, 10 respondents (17.14%) indicated that they had not performed any dedicated usability study in their career and two respondents (2.86%) reported they had never had contact with end-users of their devices. We consider there must be a paradigm shift in WRD development towards implementing user-centered design to properly address users’ needs during device developments [24,25,26], since it is unlikely that developments done without both involving users [27] and considering usability issues will be successful in reaching end-users [1, 28, 29].
It is worth noting that the highest correlation among all the studied combinations was found between the ratings of “relevance in the field” and “previously included as design criteria in technology developments” (moderate correlation, ρ = 0.62, p-value ≈ 0.00). This could be explained by the fact that developers may only include as design criteria the attributes that they consider relevant and overlook the ones that they do not consider important. In fact, the eight UA seldom included as design criteria (ratings < 3.00) are not considered highly relevant in the field (relevance < 4.0). These are accessibility, aesthetics, autonomy, desirability, embodiment, error recovery, frustration, and pleasure. All of these UA exhibit high or moderate (only in the case of autonomy) agreement in their respective definitions. Therefore, their infrequent inclusion as design criteria, despite their moderate relevance scores, cannot be attributed to ambiguous definitions. Instead, this pattern suggests that some UA are potentially less relevant in specific application cases of WRD, or it could arise from a lack of awareness regarding their significance from the perspective of end-users. It is important to note that all the listed UA originally emerged as design criteria demanded by primary or secondary end-users in a prior study on lower limb WRD [2].
A moderate correlation between the professional experience related to the “number of dedicated usability studies performed” and the “number of users personally interacted” was found (ρ = 0.55, p-value ≈ 0.00). This can be easily understood because the more usability studies performed, the more users are involved in these studies. Similarly, more users must be involved in usability evaluation as technology becomes more mature, which explains the positive correlation between higher TRLs and both the “number of usability studies performed” (ρ = 0.54, p-value ≈ 0.00) and “number of users personally interacted” (ρ = 0.52, p-value ≈ 0.00). In this regard, results show that the peak values for both user involvement and usability studies are in late TRLs (i.e. 6, 8 and 9), corresponding to the stages of prototypes validated and product. Similar results were found in a previous study [16], highlighting the relevance of user involvement to develop technologies that go beyond the prototype phase and successfully reach end-users [30].
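The correlations discussed above are reported as ρ. As an illustration of how such a coefficient can be obtained, the following is a minimal sketch of Spearman's rank correlation in pure Python. It assumes that ρ denotes Spearman's coefficient, which this section does not state explicitly; the function names and the example data are hypothetical and are not taken from the survey.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical responses for illustration only, NOT data from the survey.
usability_studies = [0, 1, 2, 2, 5, 10]
users_interacted = [0, 5, 3, 20, 30, 100]
print(round(spearman_rho(usability_studies, users_interacted), 2))
```

A rank-based coefficient is a natural choice for ordinal survey ratings and skewed counts because it depends only on the ordering of the responses, not on their magnitudes.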
Previous efforts to define usability in WRD [7, 8] contained 17 attributes each and agreed on seven of them. Nonetheless, some of them are related to services that must be provided by the distributors of the WRD or are entirely device-centered. Moreover, in contrast to our work, none of these models validated the attributes and their definitions within the local or global community of WRD developers, limiting the diffusion, impact, and generalizability of the proposed glossaries. Therefore, their selection of terms for what is considered usability was arbitrary, and some of the proposed definitions are not specifically related to usability. The RUG comprises all the UA included in previous efforts and provides definitions specifically related to usability, including the four UA included in the COST action dictionary and the factors and subfactors in the EXPERIENCE questionnaire from Eurobench [11, 12]. The detailed comparison between these previous works in the field and the attributes of our glossary that encompass their definitions is presented in Additional file 3: Annex 3.
Therefore, the RUG is the most comprehensive set of UA available in the field of WRD to evaluate usability and has been externally assessed and improved by developers from most of the active countries working in the field of WRD, thus enhancing its generalizability. It can be readily accessed through the IUT website (www.usabilitytoolbox.ch), enabling developers to have immediate open access to the definitions of each UA and to identify context-specific outcome measures and usability evaluation methods related to each attribute. Three examples are presented in Table 5. The results of this study do not aim to point to specific attributes as being more important than others, but rather underline that all attributes should ideally be considered for a holistic usability evaluation. Despite the glossary being built entirely in English, it was mostly agreed upon by both native and non-native English speakers. In fact, the definitions within our glossary are not aimed exclusively at the field of WRD but were rather built from a usability perspective. This means that they could potentially be applied in other fields related to wearables, robotics, and health technologies overall. In case such interest arises, we recommend engaging developers from each specialized field to evaluate the significance of the attributes included in our glossary and the appropriateness of the proposed definitions within their respective domains before directly implementing the current glossary.
Table 5 Examples of measurement tools selected using the IUT to evaluate specific usability attributes of three different WRD for different target users: an upper limb WRD for amputated children, an augmentation lower limb WRD for adults, and a lower limb WRD for gait rehabilitation of post-stroke adults

Limitations and future work

The estimated target sample size of the global validation stage was not fully met. Nevertheless, in line with the previous online survey experience of the research team [16], all measures to reach the largest possible sample were taken. The survey was widely shared through several channels (e.g. social media, conferences, email lists, research centers and companies, the IUT website, and Exoskeleton Report) to reach WRD developers from different countries and from both academia and industry. Additionally, the data collection period was extended until there was no increase in the responses gathered. To increase the completion rate, the survey was designed by dividing the glossary into batches of UA to guarantee a reasonable response time (below 10 min). Nevertheless, this introduces an additional limitation to the study, since not all respondents rated all UA, representing a possible confound. The authors gave priority to increasing the number of responses collected, since the main objective of the study was to obtain an external validation of the glossary with the participation of a wide sample of respondents.
Collecting the professional background information of the respondents in the global survey would have enabled us to explore potential correlations between each rating and the respondents' profiles. This is important because some respondents may have a technical development-oriented perspective, while others might have professional backgrounds more closely aligned with being end-users of the technologies (e.g. clinicians or people with neurological injuries), thereby reflecting perspectives from real-life scenarios. The current study purposely targeted only technology developers because they are mostly the ones designing and conducting usability evaluations of WRD. Therefore, we aimed to reach a consensus among them. Nevertheless, understanding that there might be differences between end-users and developers regarding the perception and relevance of the usability attributes, it would be interesting to perform another study targeting only end-users. The study would be aimed at comparing the understanding and relevance of the UA included in the RUG and at checking whether end-users identify additional usability attributes that ought to be added to the glossary. Such an effort would require a different survey and different distribution channels from the ones used in this work. We strongly suggest including a question to identify the background of the respondents in the survey and assessing possible differences in their responses. As indicated before, this is an important limitation of our study.
Another limitation of our effort is that the proposed methodology was aimed at reaching an external validation of the glossary but could instead be considered a participative assessment and improvement of the proposed definitions. Therefore, it remains a somewhat subjective methodology, because we did not implement our global validation stage as a truly iterative process with multiple rounds of evaluation where participants could reach a consensus. Ideally, the global validation could have taken the form of an e-Delphi study [31], but such an approach demands considerable resources and effort, which might have further limited the participation of developers. We consider that the participation of developers from several countries and with different native languages was fundamental to making the glossary generalizable, understandable, and representative to developers from all continents. For developers interested in translating the RUG to other languages, we strongly suggest that such translation is performed carefully by native speakers with knowledge of the field, to make sure the specificity of the terms is preserved. Lastly, it might be worth regularly updating the RUG based on the potential emergence of new disruptive technologies, because WRD is still a developing field. Doing so is important to assess whether new attributes are needed when such devices appear in the field. A new survey can be carried out to this end. If performed, we strongly suggest also considering the application(s) of the WRD with which respondents have experience. This is important because the relevance of certain usability attributes can depend on the application of a given WRD, as already discussed in our paper. Alternatively, any other type of globally coordinated effort between leading organizations in the field of WRD can lead to an updated version of the RUG when considered necessary by the people working in the field.