Internet-based cognitive–behavioural therapy (iCBT) is an effective intervention for subthreshold depression.
Self-help intervention apps such as iCBT have problems with treatment adherence.
There is insufficient knowledge on how to improve adherence.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
Encouraging messages from a messaging application with a chatbot character may improve adherence to an iCBT programme and augment iCBT’s effects.
Future studies should consider adding chatbots to iCBT to improve long-term treatment adherence; they should also check its impact on depression and anxiety.
Background
Subthreshold depression has a high prevalence and carries a risk of progressing to major depressive disorder (MDD).1 2 Depression leads to poor job performance and significant economic losses for employees.3 Therefore, treating subthreshold depression and primary prevention of depression in the workplace are important issues worldwide.
Evidence-based intervention, such as internet-based cognitive–behavioural therapy (iCBT), potentially improves the accessibility to the care of people with subthreshold depression that do not seek professional help. iCBT has been effective in reducing depressive symptoms and preventing MDD in adults including workers with subthreshold depression.4–6 Further, iCBT has the advantages of teleoperation, cost reduction and maintenance of treatment quality compared with face-to-face psychotherapy. These strengths will be particularly pertinent for the working population.
On the other hand, iCBT has its own challenges of low adherence and dropout, with adherence reported to be approximately 10% lower for iCBT than for face-to-face therapy.7 Approaches to increase adherence and retention need further study,8 not only because poor adherence leaves the treatment insufficiently effective but also because of the risk of relapse.9 Recent reports have suggested that a combination of human and automated encouragement in iCBT can reduce treatment dropouts and increase effectiveness.10 Such strategies could be implemented in various ways, such as periodic encouraging emails, user interfaces that change in response to user inputs and in-person or artificial intelligence (AI)/chatbot feedback.11
The number of digital mental health intervention studies using chatbots has increased in recent years,12 and iCBT with chatbots has the advantage of promoting self-learning.13 The immediate responsiveness and human-like nature of chatbots may offer the benefits of both human and automated encouragement, so we focused on the potential of chatbots to improve adherence. In a recent randomised controlled trial (RCT) of an iCBT with a chat-type AI agent called Woebot, the conversational agent group showed a dropout rate 20% lower than that of the control group.14 However, no studies have yet directly examined the impact of a chatbot used solely to promote engagement and reduce dropout in iCBT.13 Therefore, we developed an original, lovable chatbot character that interacts with users individually according to their progress in iCBT, and studied whether using the chatbot in conjunction with iCBT improved dropout and adherence.
Objective
The purpose of this study was to investigate whether a chatbot add-on to iCBT,15 16 which has already been shown to be effective in treating depression, could increase the completion rate of iCBT programmes for workers with subthreshold depression. Therefore, the primary outcome was defined as the completion rate of iCBT programmes.
This study takes two novel approaches. The first is that the chatbot did not provide a therapeutic intervention, but instead had a clear role as a supporter that encouraged users to continue with iCBT. This is the first study designed to observe directly the impact of a chatbot on improving engagement and reducing dropout, and it will contribute significantly to future research on the use of chatbots in iCBT. The second is that users do not converse with the chatbot within the iCBT application, but through a messaging service commonly used in Japan. This means that the chatbot can be added without modifying the iCBT application, and familiarity with the messaging service can lower the psychological hurdles to daily use.
Methods
Trial design
This study is an RCT investigating whether combining iCBT with a chatbot improves completion rates among workers with subthreshold depression. The study was open label with stratified block randomisation and two arms: a group using iCBT and a chatbot (iCBT+chatbot group) and a group not using a chatbot (iCBT group). This study reports completion rates at 8 weeks and effects on psychological measures such as depression, anxiety and well-being. We followed CONSORT (Consolidated Standards of Reporting Trials) guidelines17 and completed the CONSORT checklist (online supplemental file 1). The complete study design and procedures are listed in the study protocol (online supplemental file 2).
Participants
The inclusion criteria for participants were as follows: (1) full-time employees of Sony Group Corporation or Sony Corporation; (2) residents of Japan; (3) aged 20‒60 years; (4) owned a smartphone (iPhone or Android); (5) agreed to use the iCBT app; and (6) agreed to use a Fitbit device (Fitbit), Fitabase (Small Step Labs LLC, a Fitbit data collection service) and LINE (LINE Corporation, a messaging service).
The exclusion criteria for participants were as follows: (1) inability to read and write Japanese texts; (2) undergoing follow-up and treatment by a psychiatrist or other mental health professional; (3) a total Patient Health Questionnaire-9 (PHQ-9)18 19 score at the time of application of 15 or above, or 10‒14 with 2 or 3 on the 9th item (suicidal ideation); and (4) planning to leave the company (retiring or moving to another company) during the participation period.
We recruited participants in April 2022, screened 334 applicants and invited 149 who met the eligibility criteria to an information session. They comprised 15 employees who scored 4 or less and 134 participants who scored between 5 and 9, or between 10 and 14 with a score of 0 or 1 on the 9th item (suicidal ideation). The 149 applicants participated in an online information session and provided electronic consent after a detailed explanation by a clinical research coordinator (CRC). Participants who had given informed consent were asked to complete the psychoeducation for the application during the orientation session. Participants who did not complete the psychoeducation during the orientation were asked to complete the psychoeducation lesson within a specified period. We discontinued the intervention for safety reasons if participants met the following condition: a PHQ-9 score of 15 or higher, or a score of 10‒14 with a score of 2 or 3 on the 9th item (suicidal ideation), over 3 weeks.
Interventions
Internet-based cognitive–behavioural therapy
A smartphone app named ‘Resilience Training SE (Sony Edition)’ includes six iCBT components: psychoeducation (PE), behavioural activation (BA), self-monitoring (SM), cognitive restructuring (CR), assertiveness training (AT) and problem-solving (PS). The app was originally created for university students,15 16 so expressions related to school life and part-time work were modified to refer to social life and work. During the orientation session, all participants first received psychological training on the importance of resilience to stress, CBT and the weekly self-check (PHQ-9). After PE, the app presented the components in the order BA, SM, CR, AT and PS, each with an approximate completion time of 1 week. Each component consisted of a PE lesson describing a cognitive or a behavioural skill and a worksheet to practise what was learnt.15 16 Online supplemental figure 1 shows screenshots of the iCBT app. Participants were told that the test period would end after 8 weeks.
The app opened weekly and prompted participants to answer the self-check. If a participant did not respond for several days, an automated email was sent to the participant, asking them to respond to the self-check. If a participant scored 15 or higher, or between 10 and 14 with a score of 2 or 3 on the 9th item (suicidal ideation), the administration sent an email advising them to contact psychological services such as occupational health. If the condition persisted for 3 consecutive weeks, the administration advised them to contact health services and informed them that the intervention would be discontinued. Each of the nine PHQ-9 items is scored from 0 (not at all) to 3 (nearly every day), giving a total score of 0–27. Scores of 10–14 are classified as moderate, 15–19 as moderately severe and 20–27 as severe.
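The weekly safety rule described above can be sketched in a few lines. This is a minimal illustration, not the study's actual system: the function names are hypothetical, and the code assumes each self-check is available as a list of nine item scores.

```python
from typing import List, Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine PHQ-9 items, each scored 0-3 (total range 0-27)."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-9 needs nine item scores, each 0-3")
    return sum(items)


def severity_band(total: int) -> str:
    """Bands stated in the text; lower scores are not subdivided here."""
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    return "below moderate"


def meets_alert(items: Sequence[int]) -> bool:
    """Total >= 15, or total 10-14 with item 9 (suicidal ideation) at 2 or 3."""
    total = phq9_total(items)
    return total >= 15 or (10 <= total <= 14 and items[8] >= 2)


def should_discontinue(weekly_items: List[Sequence[int]], run: int = 3) -> bool:
    """True once the alert condition holds for `run` consecutive weeks."""
    streak = 0
    for items in weekly_items:
        streak = streak + 1 if meets_alert(items) else 0
        if streak >= run:
            return True
    return False
```

For example, a participant with a total of 10 and a score of 2 on the 9th item triggers the advisory email, and three such weeks in a row trigger discontinuation.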
The administration sent participants a web-based questionnaire during the information session and 4 and 8 weeks afterwards to collect their responses.
Chatbot
Figure 1 shows a conversation image and stickers of the chatbot named EPO, a cloud-like character designed for this study. The chatbot served as a human-like companion to participants in the iCBT+chatbot group, sending them personalised messages every morning and evening for 8 weeks to encourage them to continue using the iCBT programme. Messages were sent through LINE, which participants use for daily communication, to make the communication look more human-like. The chatbot system retrieved participants’ learning progress from the online server at specific times in the morning and evening. It then retrieved messages that matched the designed progress scenarios from the dialogue system developed by Sony Group Corporation, and sent the personalised messages to each participant’s messaging application. We developed a database of about 300 messages, including encouraging messages based on each participant’s learning progress, surveys to adjust message frequency and daily messages to deepen communication with the character. In addition, the chatbot asked semi-open-ended questions, such as which lesson was the participant’s favourite, to increase participants’ engagement and encourage them to continue using the app. Online supplemental table 1 shows some examples of messages.
Figure 1. Chatbot character stickers and conversational images.
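The progress-based message selection described above might look roughly like the following sketch. Everything here is hypothetical: the scenario names, the 3-day inactivity threshold and the message texts are illustrative assumptions, not the contents of the study's dialogue system or its database of about 300 messages.

```python
from dataclasses import dataclass

COMPONENTS = ("BA", "SM", "CR", "AT", "PS")  # lesson order after PE
INACTIVE_AFTER_DAYS = 3  # hypothetical threshold, not from the study


@dataclass(frozen=True)
class Progress:
    completed: int            # number of components finished (0-5)
    days_since_last_use: int  # days since the participant last opened the app


def scenario(progress: Progress) -> str:
    """Map learning progress to one of three illustrative scenarios."""
    if not 0 <= progress.completed <= len(COMPONENTS):
        raise ValueError("completed must be between 0 and 5")
    if progress.completed == len(COMPONENTS):
        return "finished"
    if progress.days_since_last_use >= INACTIVE_AFTER_DAYS:
        return "inactive"
    return "on_track"


# Placeholder texts keyed by (scenario, time slot); invented for this sketch.
MESSAGES = {
    ("on_track", "morning"): "Good morning! Your next lesson is {next}.",
    ("on_track", "evening"): "Nice work today. {next} is ready when you are.",
    ("inactive", "morning"): "It has been a few days. Shall we try {next}?",
    ("inactive", "evening"): "Even five minutes of {next} counts.",
    ("finished", "morning"): "You finished every lesson. Well done!",
    ("finished", "evening"): "You finished every lesson. Rest well tonight.",
}


def pick_message(progress: Progress, slot: str) -> str:
    """Choose the morning or evening message matching the current scenario."""
    template = MESSAGES[(scenario(progress), slot)]
    done = progress.completed
    next_component = COMPONENTS[done] if done < len(COMPONENTS) else ""
    return template.format(next=next_component)
```

A scheduler would call `pick_message` twice a day per participant and hand the result to the messaging service; that delivery step is omitted here.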
Fitbit
All participants received a Fitbit Charge 4 (Fitbit) in advance and were instructed to wear it for 8 weeks to collect life log data. Participants could view their own sleep, step count and other information using the Fitbit app. Because the conditions of Fitbit use were the same in both groups, Fitbit use should not have influenced the group comparisons.
Outcomes
The primary outcome was the completion rate of the ‘Resilience Training SE.’ The completion rate for the app was defined as the percentage of participants who completed all five lesson components within 8 weeks (56 days) from the day after the end of PE. Completing the lessons was defined as reading every lesson in full and completing the problem-solving component worksheet before the epilogue.
Secondary outcomes were changes from baseline to week 8 on the PHQ-9 measuring depression, Generalized Anxiety Disorder-7 (GAD-7)20 measuring anxiety, CBT skills,21 the Satisfaction with Life Scale (SWLS)22 measuring well-being, the 5-item WHO Well-Being Index (WHO-5)23 24 measuring well-being, Presenteeism Scale from WHO Health and Work Performance Questionnaire (Presenteeism)25 measuring presenteeism, Work and Social Adjustment Scale (WSAS)26 measuring social function, and Utrecht Work Engagement Scale (UWES)27 28 measuring work engagement.
Sample size
As dropout rates improved by 20% in prior studies using a conversational agent14 or using programmes with feedback features,29 we expected that participants who used chatbot support would improve their completion rates by 20%. Assuming a two-sided α-level of 0.05 and 80% power, a total sample size of 124 participants was required. We recruited 150 participants to ensure statistical power in case 20% of participants did not attend the orientation session or declined to participate after hearing the explanation.
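A sample size of this kind can be worked through with the standard normal-approximation formula for comparing two proportions. The sketch below is an illustration under stated assumptions: the text does not report the completion rates that were assumed, so the 70% and 90% used here are hypothetical inputs. They give a total of 124 under this particular formula, but that does not confirm they were the authors' inputs or method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided test of two proportions (pooled-variance form)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical inputs: 70% completion without and 90% with the chatbot.
total = 2 * n_per_group(0.70, 0.90)
print(total)  # 124
```

Because the required size depends strongly on the baseline rate, the same 20-point difference centred on 50% would need a noticeably larger sample.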
Randomisation
We used permuted block randomisation stratified by pre-assessment PHQ-9 scores (4 or less, 5 or more). For allocation, researchers who were not involved in participant recruitment created a random allocation sequence in advance using R V.4.1.1. Allocation was performed after participants had been informed about the study at an information session and had given informed consent. Participants were assigned to the two groups by an automated allocation system, in the order of the time stamps received by the consent acquisition system. The CRC was responsible for enrolling participants and assigning them to the intervention, and participant assignment was concealed from all researchers except the CRC.
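Stratified permuted block randomisation of this kind can be sketched as follows. The study generated its sequence in R; this Python version is only an illustration, and the block size of 4, the seed and all names are assumptions, since the text does not report them.

```python
import random
from typing import Dict, List, Sequence

ARMS = ("iCBT+chatbot", "iCBT")


def permuted_blocks(n_blocks: int, block_size: int, rng: random.Random) -> List[str]:
    """Concatenate shuffled blocks, each holding equal numbers of both arms."""
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    sequence: List[str] = []
    for _ in range(n_blocks):
        block = list(ARMS) * (block_size // len(ARMS))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


def build_sequences(strata: Sequence[str], n_blocks: int, block_size: int,
                    seed: int) -> Dict[str, List[str]]:
    """Prepare one allocation list per stratum before recruitment starts."""
    rng = random.Random(seed)
    return {s: permuted_blocks(n_blocks, block_size, rng) for s in strata}


def stratum_for(phq9_total: int) -> str:
    """Strata from the text: pre-assessment PHQ-9 of 4 or less versus 5 or more."""
    return "PHQ9<=4" if phq9_total <= 4 else "PHQ9>=5"


class Allocator:
    """Hands out the next unused slot in the stratum's list, in consent order."""

    def __init__(self, sequences: Dict[str, List[str]]) -> None:
        self._sequences = sequences
        self._next = {s: 0 for s in sequences}

    def assign(self, phq9_total: int) -> str:
        s = stratum_for(phq9_total)
        arm = self._sequences[s][self._next[s]]
        self._next[s] += 1
        return arm
```

Calling `assign` once per participant, sorted by consent time stamp, keeps the arms balanced within each stratum after every completed block.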
Masking
Participants and researchers were not blinded to the intervention. Secondary outcomes were self-reported by participants.
Statistical analyses
We used SAS Studio V.5.2 (SAS Institute) for the statistical analyses. Participants were analysed in the full analysis set (FAS) according to the intention-to-treat principle, regardless of the actual intervention received or study discontinuation. For the primary analysis, the completion rates for the iCBT+chatbot group and the iCBT group were compared using the χ2 test with a two-sided significance level of 5%.
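For a 2×2 table of completion by group, the χ2 comparison can be illustrated with the sketch below. It assumes the uncorrected Pearson statistic (the text does not say whether a continuity correction was applied), it is not the SAS code used in the study, and any counts passed to it are the caller's own rather than the trial's data.

```python
from math import erfc, sqrt
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table.

    a, b: completed / not completed in the first group
    c, d: completed / not completed in the second group
    Returns (chi-square statistic, two-sided p value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

The result would be judged against the two-sided 5% level stated above, that is, significant when the returned p value is below 0.05.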
As a secondary outcome, we analysed the PHQ-9 scores using the mixed-effects model for repeated measures (MMRM). We estimated mean differences between the two groups in the change scores from baseline at 1‒8 weeks. In this analysis, we restricted the analysis set to participants with baseline PHQ-9 scores of 5 or higher to assess the impact on depression in participants with subthreshold depression. The MMRM modelled the change scores at 1‒8 weeks from baseline as outcomes and included the intervention condition (with or without chatbot), time of assessment (as a nominal variable), age, baseline PHQ-9 scores and interaction terms between intervention and time of assessment as fixed effects. An unstructured covariance structure was used to model the correlations between repeated outcome measurements. We calculated the standardised mean difference (SMD) using the SD of the baseline PHQ-9 score.
We also analysed GAD-7, CBT skills, SWLS, WHO-5, Presenteeism, WSAS and UWES using MMRM for the FAS. For these outcomes, we used the same model as for the PHQ-9 scores. GAD-7, CBT skills, SWLS, WHO-5, WSAS and UWES were measured three times (at baseline, week 4 and week 8), and presenteeism was measured twice (at baseline and at week 8).
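One way to write down the PHQ-9 model described above is given below. This is a sketch of a typical MMRM specification, not the authors' exact parameterisation: the symbols, and the choice of week 1 as the reference level for time, are assumptions.

```latex
% i: participant, j: week (1..8), G_i = 1 with chatbot and 0 without,
% Y_{i0}: baseline PHQ-9, \Delta Y_{ij} = Y_{ij} - Y_{i0}: change from baseline
\begin{aligned}
\Delta Y_{ij} &= \beta_0 + \beta_1 G_i
  + \sum_{k=2}^{8} \gamma_k \,\mathbf{1}[j=k]
  + \sum_{k=2}^{8} \delta_k \, G_i \,\mathbf{1}[j=k]
  + \beta_2\,\mathrm{age}_i + \beta_3\, Y_{i0} + \varepsilon_{ij}, \\
(\varepsilon_{i1},\dots,\varepsilon_{i8})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma),
  \qquad \Sigma \text{ unstructured}, \\
\text{difference at week } j &= \beta_1 + \delta_j \quad (\delta_1 = 0), \qquad
\mathrm{SMD}_j = \frac{\beta_1 + \delta_j}{\mathrm{SD}(Y_{0})}.
\end{aligned}
```

Under this reading, the between-group difference at week 8 is the quantity β1 + δ8, and dividing it by the baseline SD gives the effect size reported alongside each estimate.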
Findings
Participant characteristics
Figure 2 shows the CONSORT diagram. We randomly assigned 149 applicants, 74 to the iCBT+chatbot group and 75 to the iCBT group. Of the 149 participants, 143 were included in the analysis as the FAS, excluding 4 participants who were unable to participate in the intervention due to system issues that prevented them from logging into the iCBT application and 2 participants who did not complete the psychoeducation within the specified time period. For the primary outcome (completion rate), we included 142 of the 143 participants (follow-up rate 99.3%, 142/143); the exception was 1 participant for whom we discontinued the intervention because they met the protocol-based discontinuation criteria.
Figure 2. CONSORT (Consolidated Standards of Reporting Trials) diagram. iCBT, internet-based cognitive–behavioural therapy.
Table 1 shows the baseline demographic and clinical characteristics for each group, which were balanced.
Table 1. Baseline characteristics of all participants (N=143) and by each component
Primary analyses
Table 2 shows the completion rates for each group. The iCBT+chatbot group showed a statistically significantly higher completion rate than the iCBT group (p<0.05).
Table 2. Completion rates of iCBT
Secondary analyses
Online supplemental table 2 shows the results of the analyses of the secondary outcomes: PHQ-9, GAD-7, CBT skills, SWLS, WHO-5, Presenteeism, WSAS and UWES. The change in PHQ-9 at week 8 in the iCBT+chatbot group was −2.21 (95% CI −3.21 to −1.22, ES=−0.75) and that in the iCBT group was −2.30 (95% CI −3.30 to −1.30, ES=−0.78); both groups showed significant improvements. The mean difference for the PHQ-9 at 8 weeks was 0.08 (95% CI −1.33 to 1.50, ES=0.03), which was not statistically significant. As with the PHQ-9, both groups improved on CBT skills other than PS and on GAD-7, WHO-5, SWLS and Presenteeism. No improvement was observed for PS in CBT skills in either group, or for WSAS and UWES in the iCBT+chatbot group. As with the PHQ-9, no significant difference was observed between the two groups for secondary outcomes except for UWES; contrary to expectations, the iCBT group improved significantly more than the iCBT+chatbot group on UWES (p<0.05).
Online supplemental table 3 shows the change from baseline in PHQ-9 at 1‒8 weeks (adjusted and unadjusted).
Adverse events
We informed the Sony Bioethics Committee that one of the participants had been hospitalised during the study because of a traffic accident (which was judged unlikely to have been caused by this study). Apart from this, none of the participants had serious adverse events.
Discussion
A messaging application with a lovable chatbot character significantly increased the completion rates of iCBT during the 8-week intervention period. The group that used the chatbot was 15.6 percentage points (95% CI 1.19 to 30.0) more likely to complete all lessons within 8 weeks than the group without the chatbot.
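A difference in completion rates with a 95% CI, like the one reported above, can be computed as in the sketch below. It assumes a Wald (normal approximation) interval, which the text does not confirm as the method used, and it takes generic counts as input because the per-group counts are not given in this section.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def risk_difference(x1: int, n1: int, x2: int, n2: int,
                    level: float = 0.95) -> Tuple[float, float, float]:
    """Difference in proportions (group 1 minus group 2) with a Wald interval.

    x1, x2: number of completers; n1, n2: group sizes.
    Returns (difference, lower bound, upper bound) as proportions.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se
```

Multiplying the three returned values by 100 expresses them in percentage points; an interval that excludes 0 corresponds to a significant difference at the matching level.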
The chatbot that we developed for this study sent fully automated encouraging messages according to the individual’s programme progress. We designed the chatbot character to be friendly and expressive of emotions and used 10 chatbot character stickers along with the messages to combine automation with human-like qualities. A study of a digital smoking cessation programme reported that the addition of a chatbot more than doubled user engagement compared with a traditional programme.30 Our study extended these findings to iCBT for subclinical depression.
The higher completion rate when using the chatbot could also be because the pace of the lessons was controlled by messages based on the user’s individual iCBT progress. Online supplemental table 4 shows iCBT completion rates after 10 weeks (8 weeks of intervention plus 2 weeks of allowance), suggesting that the chatbot helped control lesson pace. In the table, the risk difference at 8 weeks was 15.6 percentage points, whereas it decreased to 7.9 percentage points at 10 weeks, with no significant difference between the two groups. The lower risk difference at 10 weeks may simply reflect the fact that the messaging app was no longer active after 8 weeks. Alternatively, it may have been due to the email sent by the management office at 8 weeks to inform participants of the end of the study period. In the email, we wrote that participants could continue to use the application for 2 weeks after notification of the end of the study. The fact that fewer participants in the iCBT+chatbot group than in the iCBT group completed the app within these 2 weeks suggests that the chatbot was able to control the pace so that lessons were completed within 8 weeks. The usability survey also showed that more than half (51%) of those who used the chatbot indicated that a message from the chatbot was the trigger for their use of iCBT.
However, the average 10-week completion rate for both groups was almost 40 percentage points lower than we had expected. The most likely reason for the lower than expected completion rate was that the amount of learning in the iCBT programme was too much for the duration of the experience, in addition to the participants’ background of being busy with work and family duties. When asked in the usability survey why they were unable to complete the lessons, approximately 51% (38/75) indicated that they were too busy to take the time to use the application. We think 2 months is not enough time to learn the five components, and there are too many components when using a self-help app. The appropriate amount and duration of learning for the target audience is an important issue.
Secondary outcomes such as PHQ-9 showed improvement for employees with subthreshold depression, with or without chatbot use. Previous studies have shown that human or automated encouragement can reduce depression.10 Since the iCBT+chatbot group had a higher completion rate than the iCBT group, we expected the chatbot to improve PHQ-9 scores more, but there was no significant difference in the amount of change between the two groups. Of the secondary outcomes, UWES improved more in the iCBT group than in the iCBT+chatbot group, but the reasons are unknown. We attribute the lack of significant differences between the groups in changes on the PHQ-9 to the following. First, participation in the study and initial experience of PE and BA may have been sufficient to improve depression. Online supplemental table 3 shows that the PHQ-9 of both groups had already improved by more than 2 points at 2‒3 weeks. Second, the weekly self-check and the visualisation of the life log from wearing the Fitbit all the time, both performed independently of the lessons, might have contributed to the improvement of depression (68% of all respondents were satisfied with the use of the Fitbit). Online supplemental table 5 shows the survey response rate for each week for the secondary outcomes, which was high, ranging from 70% to 90%. The effect of self-check on improving depression has been demonstrated in previous studies.10 Further research is needed to examine whether there is a difference in depression when the completion rate is further increased and whether the contribution of chatbots to lesson progress control improves outcomes in the long term rather than the short term.
Conclusion and implications
This is the first RCT that attempts to examine whether a human-like automated guidance function enabled by a chatbot could increase adherence to an iCBT programme for subthreshold depression. The results suggest that the personalised messages sent by the chatbot helped participants control their pace in attending lessons and improve programme adherence without human guidance. Despite the improved completion rates, and contrary to expectations, PHQ-9 and GAD-7 scores at 8 weeks were similarly improved in both groups with and without the use of the chatbot.
This study has three limitations. First, some users felt that the iCBT programme, which involved PE and five components over 2 months, required too much learning and, thus, the completion rate was lower than expected. Second, because the study was conducted with Sony employees as an in-house study, our sample may have had higher digital literacy than the general population. Even among such people, a chatbot messaging app helped increase adherence. Third, this study was not designed to test the chatbot’s efficacy in improving subthreshold depression symptoms.
Future studies should review the iCBT programme’s structure and user experience and continuously improve the chatbot so that it can eventually improve clinical indicators, such as PHQ-9 scores. For example, we believe that in addition to progress-based messages, individualised messages that capture the user’s personality and characteristics could provide more detailed support.
Data availability statement
Data are available upon reasonable request. After the publication of the primary findings, the deidentified and completely anonymised individual participant-level dataset will be posted on the UMIN-ICDR website (https://www.umin.ac.jp/icdr/index-j.html) for access by qualified researchers.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
This study was approved by the Sony Bioethics Committee (#21-17-0001). Participants gave informed consent to participate in the study before taking part.