Making Composite Time Trade-Off Sensitive for Worse-than-Dead Health States

4.1 Results

In the paper, we tested three modifications of cTTO to verify whether they would produce a correlation between the cTTO-elicited utility and health state severity, measured with LSS, for WTD states. We replicated the lack of a significant correlation for the standard cTTO (arm A). Of the experimental arms, a statistically significant correlation emerged only in arm B.

In arm B, no comparison versus immediate death was used to sort the health states as WTD or BTD. Instead, a single task was used in which both alternatives offered at least 10 years of life, which might have made the sorting task less abhorrent. When designing the study, we hypothesized that the comparison versus immediate death may be so appalling to some respondents that only very severe states would be considered WTD and subject to LT-TTO. Subsequently, once lead time is used in LT-TTO, the respondents may avoid living in these severe states by trading off many years in full health, which would result in very negative elicited utility values. Our results in arm B as compared with arm A seem to confirm this hypothesis. First, more states were considered WTD. Second, in the WTD states, the mean utility was less negative. Third, the CDFs of the elicited utility values seem to diverge in the range \((-0.5,0.5)\). The increase in the number of utility values in the range \((-0.5,0)\) seems to drive the emergence of the correlation between LSS and negative utility.

Our results are in concordance with those reported previously in the literature. Jakubczyk et al. [8], in their arm B, used the same sorting question as ours. They reported an increase in the proportion of WTD states compared with the standard cTTO and that a correlation emerged in arm B between the negative utility and other measures of severity. However, in their arm B the TTO implementation for both BTD and WTD states differed substantially from the standard cTTO, and they only used ten health states in the design. Jakubczyk et al. [9] compared the proportion of WTD states for various sorting questions. Among others, they used the framings that match arm A and arm B in the present paper. Jakubczyk et al. [9] found that when the latter is used instead of the former, the propensity to consider a state WTD increases.

The increased proportion of WTD states in arm B as compared with arm A results in a decrease of the estimated value of state 55555 from \(-0.479\) to \(-0.588\). Conveniently for the comparability of value sets produced with these arms, the decrease is not too large, as it is offset in part by the increase of mean elicited utilities conditional on a state being WTD. An advantage of arm B over arm A was the reduction of the number of inconsistencies between logically ordered states for WTD states (see Table 3). The proportion of Pareto-ranked states whose utility values were correctly ordered in a strict sense amounted to 68.3% for arm B compared with 34.4% for arm A. Admittedly, this increase is driven by many utility values being censored at \(-1\) in arm A, which results in a lack of strict ordering. Nonetheless, such clustering of values at \(-1\) in arm A reduces the amount of information, so the increase in the proportion of strict ordering seems to be an advantage.
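To illustrate why censoring at \(-1\) lowers the share of strictly ordered pairs, the following minimal sketch may help. It assumes that one state Pareto-dominates another when it has no higher level on any dimension and a lower level on at least one, and that a pair counts as strictly ordered when the dominating state has the strictly higher utility. The function names and the utility values are hypothetical, and the exact computation behind Table 3 may differ (e.g., in whether pairs are formed within respondents).

```python
from itertools import combinations
from typing import Mapping


def dominates(better: str, worse: str) -> bool:
    """True if `better` has no higher level on any dimension and a lower level on one."""
    pairs = [(int(b), int(w)) for b, w in zip(better, worse)]
    return all(b <= w for b, w in pairs) and any(b < w for b, w in pairs)


def strict_ordering_share(utilities: Mapping[str, float]) -> float:
    """Share of Pareto-ranked pairs in which the dominating state has strictly higher utility."""
    ordered = 0
    total = 0
    for s, t in combinations(utilities, 2):
        if dominates(s, t):
            better, worse = s, t
        elif dominates(t, s):
            better, worse = t, s
        else:
            continue  # the pair is not Pareto-ranked
        total += 1
        if utilities[better] > utilities[worse]:
            ordered += 1
    if total == 0:
        raise ValueError("no Pareto-ranked pairs among the given states")
    return ordered / total


# Hypothetical utilities: all values piled up at the floor versus spread out above it.
censored = {"44444": -1.0, "45454": -1.0, "55555": -1.0}
spread = {"44444": -0.2, "45454": -0.4, "55555": -0.6}

share_censored = strict_ordering_share(censored)
share_spread = strict_ordering_share(spread)
```

In this toy example, ties at the floor leave no pair strictly ordered, whereas the spread-out values order every Pareto-ranked pair, which mirrors the direction of the difference between arms A and B.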

The percentage of responses at \(-1\) was substantially reduced in arm B compared with arm A, showing a large reduction in the floor effect. This is unexpected, since removing the comparison versus immediate death from the cTTO at first glance has no direct relation to the absolute number of responses at \(-1\), in which respondents trade all life years in the LT-TTO task. Our hypothesis is that once respondents choose immediate death over living in a given health state, they might be subsequently more inclined to trade off all the life years in LT-TTO, as they have already committed to considering a health state as WTD. Qualitative evidence is needed to test this hypothesis and to see whether there are alternative explanations.

Experimental arms C and D did not result in a statistically significant correlation between negative utility and LSS. For arm C, a non-significant negative slope was observed (larger than in arm B, in absolute terms), but there was a substantial variation of elicited utility values in the much enlarged range of possible values, which resulted in a large estimation error. A much larger sample would be needed to establish the impact of arm C more precisely. Looking beyond the analysis of correlation between LSS and utility, our results in arm C agree with those reported earlier. The proportion of \(<-1\) values among the \(\le -1\) values in arm C amounted to approximately 97%, which seems consistent with the 92% reported in [8]. In addition, the mean utility of 55555 elicited in arm C in the present study, \(-2.239\), is close to \(-2.15\) and \(-2.52\) reported in [8] in two of their study arms (different from our arm C, but also allowing for the elicitation of utility values \(<-1\)).

Arm D seems to offer no improvement in the distribution of the utilities obtained.

Finally, note that the study arms did not differ substantially in perceived difficulty (see Supplementary Materials).

4.2 Limitations

We see the following limitations of our study. First, we interviewed respondents from an online panel. Such samples may differ substantially from representative samples of the general population. For instance, in [9], a much larger proportion of WTD states was observed in an online sample than typically seen in the general population. We also used more health states per respondent and a larger proportion of severe states for each respondent than is common in valuation studies of EQ-5D instruments. In consequence, we would expect to observe substantially fewer WTD observations in a general population sample obtained using the EQ-VT protocol, and thus the assessment of the correlation between LSS and negative utility may require a larger number of respondents. Nevertheless, we see no reason to expect any other impact of using such a sample on the absence or presence of the correlation.

Another limitation of our study is that for simplicity, we used no feedback module in any of the arms, i.e., there was no possibility for the respondent to retrospectively indicate some of the utility values as elicited wrongly. In valuation studies using EQ-VT, such a module is used. For instance, in [5], \(8.3\%\) of the cTTO-derived values were flagged by the respondents in the module and removed from subsequent analysis. It would be interesting to see how the proportion would compare between our study arms and what the correlation would look like if only the non-flagged utility values were used. On the basis of previous studies, whether the flagged observations are used in the modeling seems to have little impact on the value set but a substantial impact on the number of inconsistencies in cTTO values [22].

Finally, LSS is only a rough measure of health state severity. Simply summing the level values across the dimensions does not account for the fact that respondents may attribute different importance to these dimensions and that the differences between neighboring levels may not be perceived as equally important.
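As a minimal sketch of this limitation, LSS can be computed by summing the level digits of a state. The code assumes five-dimension states with levels 1 to 5 written as five-digit strings such as 55555; the function name is hypothetical.

```python
def level_sum_score(state: str) -> int:
    """Sum the level digits of a five-dimension state written as, e.g., '55555'."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid five-digit state: {state!r}")
    return sum(int(ch) for ch in state)


# Two states with a severe problem on different dimensions get the same LSS,
# although respondents need not value them equally.
lss_first = level_sum_score("51111")
lss_last = level_sum_score("11115")
lss_worst = level_sum_score("55555")
```

Both 51111 and 11115 receive the same score, so any difference in how respondents weigh the first and the last dimension is invisible to LSS.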

4.3 Further research

With regard to the main goal of the paper, the following future research could be considered, as indicated above. First, it would be interesting to see the results for arm C in larger samples. In the literature, utility values \(<-1\) were observed when the elicitation allowed for it [8], thus studying the distribution of these values seems warranted. However, samples larger than ours seem needed to obtain results with satisfactory precision.

Second, data for the TTO variant used in arm B could be collected from samples of the general population. Using arm B in the context of valuing pediatric utility instruments such as EQ-5D-Y-3L may be particularly interesting, as the acceptance of immediate death for a child may be even more appalling to the respondents [2, 12].

Going beyond the goal of the present paper, we think that our results suggest the following possibly interesting research questions. First, as presented in Sect. 3.2, in arm C many observations were censored at \(-10\), which means that lowering the censoring threshold did not eradicate censoring but only changed the censoring point. It may indicate that some respondents focus on avoiding living in very severe states even for a relatively short time (e.g., a year) so much that they do not fully internalize the trade-offs (see also [11] for attempts to estimate the \(<-1\) utility values). Qualitative studies may help to understand the actual mechanisms and shed some light on how to interpret very low negative values.

Second, our results demonstrate that changing the sorting question may improve some characteristics of the distribution of elicited utility values. Sorting questions other than those used in our arms A and B are possible; for instance, Jakubczyk et al. [9] used six different framings. Perhaps some of these other sorting questions could be embedded in the cTTO and tested for their impact on the elicited values.
