In pursuit of new lupus therapies: The importance of trial endpoints

Author: Claire Barnard

The two phase 3 trials of anifrolumab for the treatment of systemic lupus erythematosus (SLE) reported different primary results. TULIP-1 found no significant difference in SLE Responder Index 4 (SRI-4) response rates between patients treated with anifrolumab and those given placebo, whereas TULIP-2 demonstrated significantly higher BILAG-based Composite Lupus Assessment (BICLA) response rates with the type I interferon receptor-targeted monoclonal antibody.

Why did these two trials report apparently contradictory results, which should we trust, and what do these findings mean for the future design of clinical trials and the search for new effective therapies for SLE? medwireNews speaks to Eric Morand, Professor of Medicine at Monash University in Melbourne, Australia, and an investigator on both TULIP-1 and TULIP-2, to find out.

Different results or not so different results?

Even though the primary outcomes in TULIP-1 and TULIP-2 “were discordant in terms of showing a response,” it is important to note that “the overall results trended strongly in favor of anifrolumab over placebo,” says Morand.

“In TULIP-2, BICLA response was the primary endpoint and it was robustly positive, an unambiguous and highly significant result,” and the majority of the “key secondary endpoints that were multiplicity adjusted were also positive, as well as several other non-multiplicity-adjusted endpoints,” he adds, noting that some of these endpoints “were also positive in TULIP-1.”

The primary and key multiplicity-adjusted secondary outcomes from both trials are summarized in the Figure. In TULIP-2, 47.8% of the 180 participants randomly assigned to receive intravenous anifrolumab 300 mg every 4 weeks achieved a BICLA response at 1 year, compared with 31.5% of the 182 patients given placebo, giving a significant between-group difference of 16.3 percentage points.

On the other hand, rates of the primary outcome of SRI-4 response at 1 year in TULIP-1 were comparable between the 180 patients given anifrolumab 300 mg (on the same dosing schedule as in TULIP-2) and the 184 participants given placebo, at 36% and 40%, respectively.

Figure: Summary of the TULIP-1 and TULIP-2 trials of anifrolumab in patients with lupus. Primary and key secondary outcome results for anifrolumab 300 mg every 4 weeks versus placebo in TULIP-1 and TULIP-2 (prespecified analysis). The key secondary outcomes were adjusted for multiple comparisons; results given with rounding as per published papers. *Statistically significant results (p≤0.05) versus placebo; †patients classified as having high interferon gene signature expression relative to healthy controls; ‡sustained from week 40, in patients taking at least 10 mg/day of prednisone or equivalent at baseline; §in patients with CLASI activity score of at least 10 at baseline.

Morand notes that “in TULIP-1, BICLA was nominally positive, but was not one of the key multiplicity-adjusted endpoints.” One-year BICLA response rates in this trial were numerically higher among patients treated with anifrolumab than among those given placebo, at 37.2% versus 26.6% in the prespecified analysis, but the between-group difference was not formally analyzed for statistical significance. In a post hoc analysis, however, participants in the anifrolumab group were significantly more likely to achieve a sustained BICLA response than those given placebo (hazard ratio=1.93).

A change of primary endpoint

Morand says that the two trials were originally designed with the same primary endpoint – SRI-4 response – but the primary endpoint for TULIP-2 was amended after the TULIP-1 results became available.

“The trials were not synchronous and TULIP-2 was scheduled to finish several months after TULIP-1,” he remarks.

He explains that when the TULIP-1 investigators observed a negative result, the sponsor “put a lock on the data for TULIP-2, which they didn’t open until analysis around the endpoints of TULIP-1 were completed,” and this was done “without any access to the TULIP-2 data.”

Based on a review of the TULIP-1 results and literature on SLE endpoints by expert advisors, the primary endpoint for TULIP-2 was altered “prior to unblinding and database unlocking, and after being notified to the [US] FDA,” after which time the TULIP-2 “data were unlocked, and the endpoints applied statistically,” he adds.

Discussing this change of primary endpoint, Kevin McConway, Emeritus Professor of Applied Statistics at The Open University in the UK, feels that the researchers’ decision was “very interesting, but not terribly unusual.”

He says: “I think the justification for looking at this new primary outcome on the second trial is clearly stated,” and the change was “well documented,” with details given in the supplementary appendix of the TULIP-2 publication.

McConway points out that the TULIP-2 investigators present “an analysis of what the results would have been for the outcome measure that was used in the first trial” as a secondary outcome, noting that “it is interesting to contemplate what would have happened, if they hadn’t made the choice to switch the primary.”

And Morand notes that “as it happened, the SRI [response rate] in TULIP-2 was nominally significant in favor of anifrolumab, even though it was not in TULIP-1.”

Indeed, in TULIP-2, the SRI-4 response rate at 1 year – an additional secondary endpoint that was not adjusted for multiple comparisons – was 55.5% in the anifrolumab group compared with 37.3% in the placebo arm, with a nominally significant between-group difference of 18.2 percentage points.

“So, the question really is, why would SRI-4 be discordant between the two studies of the same drug in similar populations?” asks Morand, noting that “the answer to that is still unknown, and there’s a lot of work that needs to be done on the data to try to work out exactly how that happened.”

McConway thinks that “it would be interesting to see a pooling of the results” of the TULIP-1 and TULIP-2 trials in a meta-analysis, which “may help clarify things.” He expects that such an analysis would reveal “quite strong evidence of an effect,” because both trials pointed to an improvement in BICLA response with anifrolumab versus placebo.
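As a rough illustration of what pooling could involve, the sketch below combines the unadjusted BICLA risk differences from the two trials using a fixed-effect, inverse-variance weighted average. It is not the analysis McConway has in mind, nor any published analysis: the function names are hypothetical, the inputs are the response rates and group sizes quoted in this article, and the simple variance formula ignores the stratified methods used in the trials themselves.

```python
import math


def risk_difference(p_trt, n_trt, p_ctl, n_ctl):
    """Return (difference, variance) for two independent proportions."""
    diff = p_trt - p_ctl
    var = p_trt * (1 - p_trt) / n_trt + p_ctl * (1 - p_ctl) / n_ctl
    return diff, var


def pool_fixed_effect(estimates):
    """Inverse-variance weighted average of (difference, variance) pairs."""
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * d for w, (d, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se


# BICLA response rates and group sizes as quoted in this article
# (anifrolumab 300 mg versus placebo, unadjusted).
tulip1 = risk_difference(0.372, 180, 0.266, 184)
tulip2 = risk_difference(0.478, 180, 0.315, 182)

pooled, se = pool_fixed_effect([tulip1, tulip2])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled difference: {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A formal meta-analysis would also need to consider heterogeneity between the trials and whether a random-effects model is more appropriate.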

How do the SRI and BICLA endpoints differ?

When considering TULIP-1 and TULIP-2, the key question is not only “do we trust the results?” but also “what have they told us about the fragility of endpoints in lupus trials?” stresses Morand.

He says that “important differences” between the SRI and BICLA endpoints are likely to explain why the trials had different primary results, and understanding these differences “is something that’s really going to help the field going forward.”

Morand explains that both endpoints are composite measures, and both include the same instruments – the SLE Disease Activity Index (SLEDAI), the British Isles Lupus Assessment Group (BILAG) index, and the physician’s global assessment (PGA) – but these instruments “are used differently in the SRI and BICLA scores.”

“In the SRI the first thing that has to be met is that the SLEDAI has to go down by at least 4 points, and to do that, at least one organ system effectively has to completely resolve,” whereas “the BILAG and PGA are used to ensure that no other organ system got worse, so they’re a sort of backup in the composite [score].”

On the other hand, “in the BICLA, the primary measure of improvement is the BILAG, and the SLEDAI and PGA are used as the backup,” he continues, and points out that “improvement can be substantial and clinically meaningful, although not complete” to meet the threshold for improvement using the BILAG. This means that BILAG “is more sensitive to change than the SLEDAI, which requires complete resolution,” but “is also stringent because it requires improvement in all active domains, whereas SLEDAI only requires improvement in one domain,” he explains.

Morand summarizes that “BICLA gains sensitivity but retains stringency” when compared with SRI.

“I suspect [this is] the reason for the difference in outcomes for SRI and BICLA, but that has not yet been formally evaluated,” he says.
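The contrast Morand draws between the two composites can be sketched in code. The example below is a simplified illustration rather than the trial definitions: the class and function names are hypothetical, the "no worsening" rules (no new BILAG A, at most one new BILAG B, PGA increase under 0.3 on an assumed 0–3 scale) are assumptions based on commonly cited thresholds, and criteria such as treatment discontinuation and restricted medication use are left out.

```python
from dataclasses import dataclass
from typing import Dict

# BILAG grades ranked by activity: A (severe) > B (moderate) > C (mild) > D/E (inactive).
GRADE_RANK = {"A": 3, "B": 2, "C": 1, "D": 0, "E": 0}
PGA_WORSENING = 0.3  # assumed threshold for a meaningful PGA increase


@dataclass
class Visit:
    sledai: int            # total SLEDAI score
    bilag: Dict[str, str]  # organ system -> BILAG grade
    pga: float             # physician's global assessment (assumed 0-3 scale)


def no_bilag_worsening(base: Visit, now: Visit) -> bool:
    """Assumed rule: no new grade A and at most one new grade B."""
    new_a = sum(1 for k, g in now.bilag.items()
                if g == "A" and base.bilag.get(k) != "A")
    new_b = sum(1 for k, g in now.bilag.items()
                if g == "B" and base.bilag.get(k) not in ("A", "B"))
    return new_a == 0 and new_b <= 1


def pga_stable(base: Visit, now: Visit) -> bool:
    return now.pga - base.pga < PGA_WORSENING


def sri4_response(base: Visit, now: Visit) -> bool:
    """SLEDAI drives improvement; BILAG and PGA act as the backup."""
    sledai_improved = base.sledai - now.sledai >= 4
    return sledai_improved and no_bilag_worsening(base, now) and pga_stable(base, now)


def bicla_response(base: Visit, now: Visit) -> bool:
    """BILAG drives improvement in every active domain; SLEDAI and PGA are the backup."""
    active = [k for k, g in base.bilag.items() if g in ("A", "B")]
    all_improved = bool(active) and all(
        GRADE_RANK[now.bilag[k]] < GRADE_RANK[base.bilag[k]] for k in active
    )
    sledai_not_worse = now.sledai <= base.sledai
    return (all_improved and no_bilag_worsening(base, now)
            and sledai_not_worse and pga_stable(base, now))


baseline = Visit(10, {"mucocutaneous": "A", "musculoskeletal": "B"}, 2.0)
# Partial improvement in both active domains, SLEDAI down by only 2 points.
partial = Visit(8, {"mucocutaneous": "B", "musculoskeletal": "C"}, 1.2)
# One domain resolves (SLEDAI down by 4), the other is unchanged.
one_domain = Visit(6, {"mucocutaneous": "D", "musculoskeletal": "B"}, 1.5)

print(sri4_response(baseline, partial), bicla_response(baseline, partial))        # False True
print(sri4_response(baseline, one_domain), bicla_response(baseline, one_domain))  # True False
```

The two hypothetical patients show both sides of the trade-off: partial but broad improvement counts for BICLA and not for SRI-4, while complete resolution in a single domain counts for SRI-4 and not for BICLA.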

Addressing the need for better trial endpoints

While the SRI-4 and BICLA endpoints vary in terms of sensitivity, it is important to note that both measures are “based on two disease activity measures that originated about 30 years ago and weren’t primarily designed for use in clinical trials […] and they’re widely acknowledged to be imperfect,” emphasizes Morand.

At present, “our endpoints are the weak link in the chain in the ability to robustly and repeatedly demonstrate efficacy” of new therapeutic agents, he adds, noting that consequently, “a lot of research activity” is focused on improving lupus trial endpoints.

For instance, Morand and colleagues developed a new measure called Lupus Low Disease Activity State (LLDAS), published in 2016, followed by a prospective multicenter validation published in 2019.

Morand says that the LLDAS “has been tested post hoc in several trial databases and works well,” and was used as a prespecified secondary endpoint in a phase 2 trial of the Janus kinase inhibitor baricitinib in SLE patients.

Does an ideal endpoint exist?

In the future, Morand anticipates “a much more robust empirical approach to endpoint design and testing,” but adds that “it is going to take time, and require the [regulatory agencies] to accept these new endpoints.

“They currently recommend the SRI and BICLA even though they have failed more often than they have succeeded.”

Furthermore, “I don’t think we’re going to get to a place where there’s a single endpoint that works for every drug,” he stresses.

He points out that there are “a lot of trials currently in phase 2 in lupus,” and while “such trials need to have an a priori declared primary endpoint,” he thinks that “we should study carefully the individual data of each trial for each compound and work out which data are the most informative about that therapeutic target.”

“We need to be creative and innovative using our phase 2 datasets to evolve phase 3 study endpoints that inform about particular compounds.”

McConway tends to agree. He highlights that it is important to follow good practice in clinical trial design, in terms of “registration, details, laying absolutely everything out, including the statistical analysis, in advance,” but cautions that “it’s not foolproof, and can go wrong because patients [with SLE] are extremely variable, and it is possible to show what’s going on in different ways.”

What does all this mean for anifrolumab?

In the meantime, for anifrolumab, “it is my understanding the sponsor is in the process of preparing regulatory submissions, which should happen this year,” says Morand.

“It is my belief from talking to colleagues that most are persuaded by the evidence and would like to see this drug approved [but] we await the conclusion of the regulators,” he adds.

And if anifrolumab is approved, Morand thinks “the whole field would also like to see further studies with the drug,” to address when to use it.

“Do you use it in early disease to prevent future complications? Do you use it to treat patients after everything else has failed? The trials don’t answer this question; we need strategy studies comparing treatment approaches,” he says.

“I hope that in the next couple of years, [anifrolumab] will be approved, we’ll do the strategy trials to work out how to use it, and we’ll start to use it.”

The need for new therapies

Morand is optimistic that if approved, anifrolumab will help to address a large unmet need for new SLE therapies, with only one agent – belimumab – approved for the condition in the past 50 years.

He believes that the problem with suboptimal trial endpoints is “the most important” reason why there has been so little progress with approvals for SLE, but another key issue “is that the disease that we clinically classify as lupus is probably a group of diseases with a group of different biological foundations.” For instance, he notes that “the interferon signature is detectable in 70–80% of patients with active lupus but not in 20–30%, at least in Caucasian populations,” which “straightaway tells you that patients are not the same.”

Therefore, “we are taking a laser-focused approach” in clinical trials by treating patients with targeted agents such as monoclonal antibodies, but “the patient group is actually heterogeneous,” he says.

“And if you don’t know the proportion of patients in your trial who have a biology amenable to your drug, then the risk of failure is greatly increased.”

Morand “can see a time when this will be better,” because the technology is now available to do gene expression analysis on every patient, but cautions that such testing is currently “very expensive” and limited to research settings. He also notes that the bioinformatics capacity for analyzing gene expression data “is probably a limiting factor,” but the increasing use of artificial intelligence should improve our ability to understand genetic data.

In the future, “I can see a time where we’ll do a test on a patient to determine the likelihood of response to a given drug,” he says.

Other promising agents in the pipeline

In addition to anifrolumab, two other agents have shown promise as new therapies for SLE: baricitinib and ustekinumab.

“They’re some way behind anifrolumab in timing, and they both had a positive phase 2 trial,” but this “is no guarantee of a positive phase 3 trial because of these measurement problems that we have,” says Morand.

If these agents do demonstrate favorable efficacy and safety profiles in phase 3 trials, “then 2–3 years from now we could have three new drugs for lupus after only one approved in the past 50 years,” he adds.

“And that would truly be magnificent.”