Invited speakers


You can download the slides of the talks below from this page.

 

I01 Innovative trial design

Monday, August 24

11.00 - 12.30

 

Gerard van Breukelen - Maastricht University, The Netherlands

Efficient design of trials with clustered data: how to deal with unknown and possibly heterogeneous variance parameters in the design stage?

 

The sample size needed for sufficient power to detect a clinically relevant treatment effect in a randomised trial depends on the outcome variance. Unfortunately, the outcome variance per treatment arm is unknown in the design stage, and misspecification may lead to an inefficient design. This problem is aggravated in trials with clustered data, such as cluster randomised trials (CRTs), and trials with repeated outcome measurements. In both cases we have at least two, and usually more, variance parameters on which the sampling variance of the treatment effect depends. Relatedly, the sample size must then be determined at two levels: the number of clusters and the number of individuals per cluster in a CRT, or the number of individuals and the number of measurements per individual in a repeated-measures trial. The optimal sample size per level of a CRT depends on the unknown outcome variance between and within clusters, and on the costs per included cluster and per individual. In a trial with repeated measures we have the same problem, now with individuals as clusters and measurements as individuals, and usually a more complicated (e.g. autoregressive) covariance structure. Well-known solutions to overcome the dependence of the optimal sample size on unknown parameters are sequential, adaptive and Bayesian design, which will be briefly discussed. Here, we focus on another approach, Maximin design, which maximizes the minimum efficiency or the minimum relative efficiency, rather than the expected efficiency as in Bayesian design. We briefly present some results on Maximin design of trials with repeated measures, considering covariance structures such as compound symmetry and first-order autoregression. We elaborate on Maximin design of cluster randomised trials, allowing for treatment-dependent variances and costs, and we compare this design with the popular balanced design.
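
As a rough illustration of the maximin idea for a cluster randomised trial (this is not the speaker's own code; the budget, costs and ICC range below are hypothetical), the following R sketch chooses the cluster size whose worst-case relative efficiency over a range of plausible intra-class correlations is largest, under a simple budget constraint:

    # Minimal sketch of a maximin design search for a two-arm cluster randomised
    # trial: pick the cluster size that maximizes the minimum relative efficiency
    # over a range of plausible intra-class correlations (ICCs).
    budget    <- 100000   # budget per treatment arm (hypothetical)
    c.cluster <- 500      # cost of recruiting one cluster (hypothetical)
    c.subject <- 50       # cost per individual within a cluster (hypothetical)
    sigma2    <- 1        # total outcome variance (standardised)
    rho.range <- seq(0.01, 0.10, by = 0.01)   # plausible ICCs at the design stage

    # Sampling variance of the treatment effect for cluster size n and ICC rho,
    # with the number of clusters per arm determined by the budget constraint
    var.effect <- function(n, rho) {
      k <- floor(budget / (c.cluster + n * c.subject))   # clusters per arm
      2 * sigma2 * (1 + (n - 1) * rho) / (k * n)
    }

    n.grid  <- 2:100
    # Relative efficiency of cluster size n at a given rho, compared with the
    # locally optimal cluster size for that rho
    rel.eff <- sapply(rho.range, function(rho) {
      v <- sapply(n.grid, var.effect, rho = rho)
      min(v) / v
    })
    # Maximin choice: largest worst-case relative efficiency across the ICC range
    min.re    <- apply(rel.eff, 1, min)
    n.maximin <- n.grid[which.max(min.re)]
    c(cluster.size = n.maximin, worst.case.RE = round(max(min.re), 3))

A Bayesian design would instead average the efficiency over a prior for the unknown variance parameters; the maximin design guards against the worst case over the assumed range.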


Dominic Magirr - Lancaster University, United Kingdom

Fixed and adaptive clinical trial designs with threshold selection for a continuous biomarker

 

Biomarker-driven designs are used in the development of targeted therapies that exhibit enhanced efficacy in a subset of the overall population. The predictive properties of candidate biomarkers to be used in a phase III trial are often unreliable. In particular, the threshold chosen to classify patients as either “biomarker positive” or “biomarker negative” may be based on insufficient data. In this talk, I will describe more flexible designs which enable the sponsor to determine the threshold using data collected during the trial.


Maryam Safarkhani - University of Utrecht, The Netherlands

Optimal design for discrete-time event history data

 

Randomized controlled trials (RCTs) in longitudinal studies are designed to compare the effectiveness of different treatments over time. A special type of outcome is the survival endpoint, which measures the length of time between becoming exposed to the risk of an event and event occurrence. Although the underlying event process usually operates in continuous time, data on event times are often collected in discrete time intervals, such as weeks, months or years, leading to interval-censored or discrete-time data. In this case, we know only the time interval during which an event occurs instead of the precise timing of the event. For this reason, it may be considered more natural to use a model for discrete event times.
A design for RCTs with event history data may depend on the number of subjects, the number of repeated measures per subject, and the treatment group sizes, among other factors. In practice, however, the number of subjects and the number of measurements per subject are usually restricted by budget constraints. Therefore, an important design issue is to find an optimal combination of these characteristics that maximizes the efficiency of the treatment effect estimator given a maximal budget. The aim of this talk is to present optimal designs for RCTs with discrete-time survival endpoints and also to discuss the effects of other design parameters, such as cost ratios or underlying survival parameters, on optimal designs. Furthermore, it has been shown that a strong (baseline) covariate increases the power to detect the treatment effect. We will also look at the effect of a predictive covariate on optimal designs for discrete-time survival endpoints.
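
The discrete-time survival model underlying such designs is commonly fitted as a binary regression with a complementary log-log link on person-period data. The R sketch below illustrates only this model formulation, not the optimal design calculations of the talk; the data are simulated and all names are hypothetical:

    # Minimal sketch of a discrete-time hazard model: each subject contributes one
    # record per time interval until the event or censoring, and the hazard is
    # modelled with a complementary log-log link (grouped-time proportional hazards).
    set.seed(1)
    n <- 200
    make.subject <- function(i) {
      treat <- rbinom(1, 1, 0.5)
      haz   <- ifelse(treat == 1, 0.10, 0.18)   # discrete-time hazard per interval
      rows  <- NULL
      event <- 0; t <- 0
      while (event == 0 && t < 12) {            # at most 12 intervals of follow-up
        t     <- t + 1
        event <- rbinom(1, 1, haz)
        rows  <- rbind(rows, data.frame(id = i, interval = t,
                                        treat = treat, event = event))
      }
      rows
    }
    long <- do.call(rbind, lapply(1:n, make.subject))

    # Interval-specific baseline hazard plus a treatment effect on the cloglog scale
    fit <- glm(event ~ factor(interval) + treat,
               family = binomial(link = "cloglog"), data = long)
    summary(fit)$coefficients["treat", ]

The design question in the abstract then amounts to choosing the number of subjects and the number of intervals per subject that minimise the variance of the treatment coefficient for a given total cost.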

 

I02 Statistical methodology for clinical research in rare diseases

Monday, August 24

14.00 - 15.30

 

Ralph-Dieter Hilgers - Aachen University, Germany

New developments in integrated design and analysis of small population group trials

 

The ability of conventional statistical methods to evaluate new therapeutic approaches for any given rare disease is limited due to the small number of patients concerned. Thus, there is an urgent need not only to develop new therapeutic approaches to treat these diseases, but also to adapt established statistical approaches and to develop new methods in order to overcome the constraints of conventional methods. This is the starting point of the IDeAl ("Integrated Design and Analysis of small population group trials") research project, which aims to utilize and connect all possible sources of information in order to optimize the complete process of a clinical trial.
The IDeAl project will explore new methods in the design, analysis, interpretation and assessment of clinical trials in small population groups. The aim is to merge and integrate these aspects, so that the efficiency of therapy evaluation in small population groups can be increased significantly.
The IDeAl project addresses the most important aspects of the design, analysis, interpretation and assessment of clinical trial results. The task schedule is divided into 10 work packages. The work packages focus on the assessment of randomization, the extrapolation of dose-response information, the study of adaptive trial designs, the development of optimal experimental designs in mixed models as well as pharmacokinetic and individualized designs, the simulation of clinical studies, the involvement and identification of genetic factors, decision-theoretic considerations, and the evaluation of biomarkers.
Obviously each of the work packages covers overlapping aspects of the design, analysis, interpretation and assessment of clinical trial results. In this presentation, an overview of the project will be given and recent results will be discussed.


Nigel Stallard - University of Warwick, United Kingdom

Recent advances in methodology for clinical trials in small populations: the InSPiRe project

 

The Innovative Methodology for Small Populations Research (InSPiRe) project is one of three projects funded under the EU Framework Programme 7 call for new methodologies for clinical trials in small population groups. The 40-month project brings together experts in innovative clinical trials methods from eight institutions to develop new approaches for the design, analysis and interpretation of trials in rare diseases or small populations. In such settings the large clinical trials generally used to evaluate new drugs and other healthcare interventions are often infeasible. New approaches to the design of such studies, or improved methods of data analysis and decision-making, are therefore needed. The InSPiRe project is developing methods that can enable reliable results to be obtained from trials more efficiently, ultimately leading to improved healthcare for these small population groups. With the aim of enabling rapid evaluation of treatments whilst maintaining scientific and statistical rigour, we are developing new methods that include the combination of trial data with information from other studies, adaptive trial designs that allow the most efficient use of current data, and optimal decision-making processes to reach conclusions as quickly as possible. This talk will briefly outline the work of the InSPiRe project, describing the four main work packages on early dose-finding trials, decision-theoretic designs, confirmatory trials and personalized medicines, and evidence synthesis in the planning of clinical trials in small populations. An overview of progress on the research work, which started in June 2014, will be presented.


Kit Roes - University Medical Center Utrecht, The Netherlands

Advances in Small Trials dEsign for Regulatory Innovation and eXcellence

 

Clinical research designs to study new drugs and treatments for rare diseases face a fundamental challenge: they are badly needed to evaluate treatments for often devastating diseases, but are severely limited in the number of patients that can be recruited within a reasonable timeframe. This may require more from statistical methodology and statisticians than “squeezing out” additional efficiency from conventional or existing improvements in trial designs for common diseases. The European Union has acknowledged this and has funded three projects that aim to provide new methodology for clinical trials in rare diseases and, relatedly, personalized medicine. The Advances in Small Trials dEsign for Regulatory Innovation and eXcellence (Asterix) project is one of them. In this presentation, the unique approaches of the Asterix project will be presented and include the following:

  • (Quantitative) methods to include patient level information and patient perspectives in design and decision making throughout the clinical trial process.
  • Statistical design innovations for rare diseases in individual trials and series of trials. 
  • Re-consideration of the scientific basis for levels of evidence to support decision making at the regulatory level.
  • A framework for rare diseases that allows rational trial design choices. 
  • Validation of new methods against real-life data and regulatory decisions, to improve regulatory decision making.

It thus serves, together with the presentations of the other two EU projects, as an introduction and context to the presentations on “Statistical methodology for clinical research in rare diseases” at ISCB 2015.

 

I03 Subgroup analyses

Monday, August 24

16.00 - 17.30

 

Matthias Briel - University Hospital Basel, Switzerland
Believe it or not – empirical research on credibility criteria for subgroup analyses

 

Subgroup analysis of randomised controlled trials (RCTs) seeks to determine whether a treatment effect varies across subgroups defined by patient characteristics. Findings of subgroup analyses offer the promise of individualizing patient care, and are common in RCTs: 40%-65% of trials report subgroup analyses. In particular, claims of subgroup effects, in which authors convey a conviction or belief that treatment effects differ between patient subgroups, can have a substantial impact on clinical practice and policy decisions. Up to 60% of RCTs reporting subgroup analyses also claim subgroup effects. However, many inferences from subgroup analyses have proved spurious. The credibility of a putative subgroup effect represents the likelihood that a difference in treatment effect between subgroups is real, and reflects a continuum ranging from extremely unlikely to highly plausible. The last two decades have seen considerable work in documenting the limitations of subgroup analysis, and in developing criteria to guide clinicians, scientists, and health policy makers in making appropriate inferences regarding their credibility. This presentation will summarize empirical evidence on (1) the extent to which claims of subgroup effects are consistent with existing credibility criteria, (2) the planning of subgroup analyses in RCT protocols and the agreement with corresponding full journal publications, and (3) the association of industry funding with the reporting of subgroup analyses in RCTs.

 

Armin Koch - Hannover Medical School, Hannover, Germany
The role of subgroups in confirmatory clinical trials for drug licensing

Frank Bretz - Novartis Pharma AG, Basel, Switzerland
Sample size calculations for confirmatory subgroup analyses

 

I04 High dimensional data: analysis of microbiota data

Tuesday, August 25

9.00 - 10.30

 

Jeanine J. Houwing-Duistermaat - Leiden University Medical Center, Leiden, The Netherlands

Statistical methods to analyze repeated measurements of overdispersed categorical data: an application in longitudinal microbiome data

 

Statistical modelling of clustered categorical data can be challenging. Our work is motivated by a longitudinal study on human gut microbiome measurements, where multivariate count data distributed over a number of bacterial categories are available. In the motivating study, bacterial data were collected at two time points for subjects from helminth-endemic areas, and the question of interest is to assess the effect of infection status on the bacterial composition at the phylum level. The multinomial regression model can be used for analysis. However, the presence of correlation within a study subject (overdispersion) and the clustering over time need to be modeled properly. To account for overdispersion, the Dirichlet-multinomial model is typically used. An alternative approach is the multinomial logistic mixed model (MLMM); however, fitting this model requires numerical integration over the random effects distribution. We propose to extend the MLMM and use the combined model (CM), in which we incorporate two sets of random effects: a Dirichlet-distributed random effect to accommodate the overdispersion and a set of normally distributed random effects to model the association between the repeated measurements. We evaluate the performance of the methods via simulations. Finally, we apply our method to the bacterial data and estimate the effect of infection status over time in a longitudinal setting.
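
To make the overdispersion issue concrete, the following R sketch simulates multinomial counts whose composition varies between subjects according to a Dirichlet distribution (the Dirichlet-multinomial setting mentioned above). It is illustrative only and does not implement the combined model of the talk; the number of categories, sequencing depth and Dirichlet parameters are hypothetical:

    # Minimal sketch of overdispersed compositional counts: category probabilities
    # vary between subjects following a Dirichlet distribution, giving the
    # Dirichlet-multinomial model. Illustrative values only.
    set.seed(42)
    K      <- 5         # bacterial phyla
    n.subj <- 100
    depth  <- 5000      # reads per sample
    alpha  <- c(20, 10, 5, 3, 2)   # Dirichlet parameters (hypothetical)

    rdirichlet1 <- function(alpha) { g <- rgamma(length(alpha), alpha); g / sum(g) }

    # Dirichlet-multinomial counts: subject-specific composition, then counts
    dm <- t(sapply(1:n.subj, function(i) rmultinom(1, depth, rdirichlet1(alpha))))
    # Plain multinomial counts with the same mean composition, for comparison
    mn <- t(sapply(1:n.subj, function(i) rmultinom(1, depth, alpha / sum(alpha))))

    # Between-subject variance of the first phylum's relative abundance is much
    # larger under the Dirichlet-multinomial model (overdispersion)
    c(dirichlet_multinomial = var(dm[, 1] / depth),
      multinomial           = var(mn[, 1] / depth))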


Pier Luigi Buttigieg - Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
Navigating the multidimensional space of microbial data in the meta'omic era

 

Multi'omic technologies associated with enhanced contextual data resources are permeating the microbial and biomedical sciences and have generated a suite of new challenges in informatics and data analysis. Multivariate methods from numerical ecology and graph analysis have become popular in navigating high-dimensional community, functional, and environmental data. However, the skillful application of these techniques is often inhibited by complex properties of 'omics data, combined with the relative nascence of deep statistical understanding in many of the relevant fields. Areas of particular concern include the principles of effective replication, the handling of compositionality, and the appreciation of key methodological assumptions. In an effort to render knowledge of multivariate statistics more accessible to the microbiomics community, we have created an online, open, and extensible Guide to Statistical Analysis for Microbial Ecology (GUSTA ME; http://mb3is.megx.net/gustame). A combination of user-friendly interfaces has been implemented to ease discovery of and access to content which has been tailored to audiences with limited mathematical training: wizards interactively guide users to method descriptions, graphical walkthroughs illustrate how these methods have been used in the literature, and warnings intervene when risks are present. To complement this content, a suite of web applications allows users to test several methods with either their own data or example data sets. GUSTA ME is an open and living project: contributions from the community are very welcome, particularly those introducing users to new or promising methods. Altogether, GUSTA ME aims to serve as a living repository which coherently gathers introductory analytical knowledge relevant to 'omics-enabled microbial research.


Leo M. Lahti - University of Helsinki, Helsinki, Finland
Temporal dynamics and population diversity of the human intestinal ecosystem

 

The diverse microbial communities of the human body have a profound impact on our physiology and health. Although their composition and function have been studied extensively, we have only a limited understanding of the temporal dynamics governing these complex ecosystems. Combining information across many individuals can help to uncover general characteristics of microbiome dynamics that extend beyond individual variation. To assess how individual variation is reflected at the population level, we integrate short time series from a hundred individuals with deep phylogenetic profiling of the intestinal microbiota from a thousand healthy western adults from the phylogenetic HITChip microarray atlas to monitor the temporal dynamics in a thousand species-like bacterial phylotypes that represent the majority of the known microbial diversity in the intestine. In parallel to the dominating gradual variation, we show that specific microbial taxa exhibit contrasting, stable configurations of low and high abundance. These bi-stable 'tipping elements' vary quite independently, are frequently observed in a host in various combinations, and exhibit notable differences in their robustness and contributions to the overall community composition. Host factors such as age can affect the resilience and move the system towards a tipping point of an abrupt switch. Establishing links between community stability and host health is a key challenge for microbiome studies. We discuss the emerging approaches and propose how targeted analysis of the dynamics and alternative states in specific taxonomic groups can simplify the characterization and possible manipulation of the intestinal microbiota.


Yalcin Yavuz - Danone Nutricia Research, Utrecht, The Netherlands

Challenges in microbiota community profiling analysis

 

The gut microbiota is a complex ecosystem with a huge number of different species that are distributed along the gastrointestinal tract. This ecosystem plays a critical role in nutrition, where it is responsible for degrading a number of dietary substances that are non-digestible for the host and for the production of vitamins. Thus, the composition of the intestinal microbiota, i.e. counts of bacterial species obtained using 16S rRNA gene pyrosequencing, is of interest in a large number of clinical trials in Danone Nutricia Research. Poisson regression models would be the first choice as a framework for the univariate analysis of microbiota count data. However, in real life, count data do not always meet the equal mean-variance assumption implied by the Poisson distribution, leading to over- (or under-)dispersion. One source of this dispersion may be a higher than expected occurrence of zero counts. The absence of a count for a microbe can be due to the fact that the microbe is not present in the sample (structural zeros) or that the microbe is present but happens not to be detected in the sample (sampling zeros). Hence, there is a distinction between structural zeros, which are inevitable, and sampling zeros, which occur by chance. This inflation of zero counts of bacterial species is one of the most challenging difficulties in microbiota community profiling. We demonstrate the use of two different models for over-dispersed count data with certain levels of zero inflation, the zero-inflated Poisson and zero-inflated negative binomial models, and discuss their performance using data from one of the clinical trials conducted in Danone Nutricia.
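
As a small illustration of the two model classes mentioned above, the following R sketch fits zero-inflated Poisson and zero-inflated negative binomial models to simulated counts for a single taxon, using the pscl package (the abstract does not name particular software, and all variable names and parameter values are hypothetical):

    # Minimal sketch of zero-inflated count models for a single bacterial taxon.
    # The data are simulated with both structural zeros and Poisson sampling.
    library(pscl)
    set.seed(7)
    n   <- 150
    dat <- data.frame(treatment = rbinom(n, 1, 0.5))
    mu  <- exp(2.5 + 0.5 * dat$treatment)            # mean count when the taxon is present
    dat$count <- ifelse(rbinom(n, 1, 0.3) == 1, 0,   # 30% structural zeros
                        rpois(n, mu))

    # Count part and zero-inflation part are separated by "|" in the formula
    zip  <- zeroinfl(count ~ treatment | treatment, data = dat, dist = "poisson")
    zinb <- zeroinfl(count ~ treatment | treatment, data = dat, dist = "negbin")

    # The negative binomial variant adds a dispersion parameter for any
    # over-dispersion remaining among the non-structural counts
    AIC(zip, zinb)

Differences in sequencing depth between samples would typically be handled with an offset, and model choice can be guided by information criteria or by comparing observed and predicted zero frequencies.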

 

I05 Dynamic prediction in oncology

Tuesday, August 25

11.00 - 12.30

 

Dimitris Rizopoulos - Erasmus University Medical Center, Rotterdam, The Netherlands

Personalized screening intervals for biomarkers using joint models for longitudinal and survival data

 

Screening and surveillance procedures are routinely used in medicine for the early detection of disease and the close monitoring of progression. Biomarkers are one of the primary tools used in these procedures, but their successful translation to clinical practice is closely linked to their ability to accurately predict clinical endpoints during follow-up. Motivated by a study on patients who received a human tissue valve in the aortic position, in this work we are interested in optimizing and personalizing screening intervals for longitudinal biomarker measurements. Our aim is twofold: first, to appropriately select the model to use at time t, the time point at which the patient is still event-free, and second, based on this model, to select the optimal time point u > t at which to plan the next measurement. To achieve these two goals we develop measures based on information-theory quantities that assess the information we gain for the conditional survival process given the history of the subject, which includes both baseline information and his/her accumulated longitudinal measurements.
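
The dynamic predictions on which such screening rules build can be obtained from a fitted joint model for the longitudinal biomarker and the event time. The R sketch below assumes the interface of the JM package and uses its AIDS example data; it illustrates the general workflow only and is not the speaker's own code:

    # Minimal sketch of a joint model for a longitudinal biomarker and a
    # time-to-event outcome, using the JM package's AIDS example data.
    library(JM)                 # also loads nlme and survival
    data(aids); data(aids.id)

    # Longitudinal submodel: CD4 counts over time
    lmeFit <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)
    # Survival submodel (one row per patient); x = TRUE is needed by jointModel
    coxFit <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
    # Joint model linking the current biomarker value to the hazard
    jmFit <- jointModel(lmeFit, coxFit, timeVar = "obstime")

    # Dynamic prediction: conditional survival probabilities for one subject,
    # given the biomarker history accumulated so far; these are updated each
    # time a new measurement becomes available
    nd <- aids[aids$patient == "2", ]
    survfitJM(jmFit, newdata = nd, idVar = "patient")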


Hein Putter - Leiden University Medical Center, The Netherlands

Pseudo-observations, landmarking and dynamic prediction

 

Landmarking was originally introduced as a way of dealing with the problem of immortal time bias in the context of time-dependent covariates in survival analysis. It has later been proposed as a method of obtaining dynamic predictions for a survival outcome without the need to construct comprehensive models for the stochastic behaviour of intermediate events or longitudinal measurements in relation to the survival outcome of interest. The aim of this talk is to show how landmarking can be used in conjunction with another popular method in survival analysis, pseudo-observations, to construct dynamic regression models for non-standard survival outcomes. Two illustrations of the use of these “dynamic pseudo-observations” will be given. The first is in a competing risks context, where interest is in estimating the conditional cumulative incidence of a given cause, given covariates. In the other application, interest is in direct regression models for expected residual healthy life in an illness-death multi-state model.
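
For readers unfamiliar with pseudo-observations, the R sketch below shows the generic jackknife construction for a survival probability at a fixed horizon and a simple regression of the resulting pseudo-values on covariates. It does not implement the landmark-specific “dynamic pseudo-observations” of the talk, and the simulated data and variable names are hypothetical:

    # Minimal sketch of pseudo-observations for S(t0): jackknife the Kaplan-Meier
    # estimate and regress the resulting pseudo-values on covariates.
    library(survival)
    set.seed(3)
    n   <- 200
    dat <- data.frame(age = rnorm(n, 60, 10), trt = rbinom(n, 1, 0.5))
    dat$time  <- rexp(n, rate = exp(-3 + 0.02 * dat$age - 0.4 * dat$trt))
    dat$cens  <- rexp(n, rate = 0.02)
    dat$obs   <- pmin(dat$time, dat$cens)
    dat$event <- as.numeric(dat$time <= dat$cens)

    t0 <- 24   # horizon of interest (e.g. months)
    km.surv <- function(d) summary(survfit(Surv(obs, event) ~ 1, data = d),
                                   times = t0, extend = TRUE)$surv
    S.all <- km.surv(dat)
    # Pseudo-observation for subject i: n * S_hat - (n - 1) * S_hat(-i)
    dat$pseudo <- sapply(1:n, function(i) n * S.all - (n - 1) * km.surv(dat[-i, ]))

    # Regression of the pseudo-observations on covariates (ordinary least squares
    # shown for simplicity; GEE-type standard errors and other links are common)
    summary(lm(pseudo ~ age + trt, data = dat))$coefficients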


Jeremy Taylor, Krithika Suresh & Norm Good - University of Michigan, United States & CSIRO Mathematical and Information Sciences, Australia

A dynamic prediction model for colon cancer surveillance data

 

Dynamic prediction models make use of patient-specific longitudinal data to update individualized survival probability predictions based on current and past information. Models are first fit to a training data set. The results from that fit are then combined with an individual's data to extrapolate into the future. The predictions can be updated as new longitudinal data for the individual are collected. In this talk I will present a model that is derived from surveillance studies on individuals characterized as high-risk for colorectal cancer. In this study colonoscopy (COL) and fecal occult blood test (FOBT) results are available. We first specify a Poisson process model for the development of advanced adenoma or colorectal cancer (AAC). The AAC events of interest are measured at the time of COLs, and are thus interval censored. This naturally leads to a generalized non-linear model with a complementary log-log link as a dynamic prediction tool that produces individualized probabilities for the risk of developing AAC in the near future. This model allows predicted risk to depend on a patient's baseline characteristics and time-dependent covariates. Information on the dates and results of COLs and FOBTs was incorporated using time-dependent covariates that contributed to patient risk of AAC for a specified period following the test result.

 

I06 STRengthening Analytical Thinking for Observational Studies (STRATOS)

Wednesday, August 26

9.00 - 11.00

 

Katherine Lee - University of Melbourne, Australia
Regression modelling with missing data: principles, methods, software and examples

 

Missing data are ubiquitous in research, and are often an unwelcome headache for hard-pressed analysts. Depending on the context, and the reasons for the missing data, the simple solution of analyzing the subset of complete records may or may not be adequate. Therefore, focusing on regression modelling, the Missing Data Topic Group has set out to review the issues raised by missing data and to outline when a complete records analysis is likely to be ‘good enough’. Then, for situations when a more sophisticated approach is needed, we review, illustrate and critique the established methodology and associated software. Using data from the Youth Cohort Study of England and Wales (a publicly available dataset) we illustrate how a masters-level analyst faced with a missing data problem should engage with our proposed materials, and in particular what they should learn and put into practice.
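
As one concrete example of the ‘more sophisticated’ approaches referred to above, the R sketch below runs multiple imputation with the mice package on its built-in nhanes example data; the abstract itself does not prescribe particular software, so this is purely illustrative:

    # Minimal sketch of multiple imputation followed by regression, compared with
    # a complete-records analysis, using the mice package's nhanes example data.
    library(mice)
    data(nhanes)                       # small data set with missing values

    # Complete-records analysis for comparison
    cc.fit <- lm(chl ~ age + bmi, data = nhanes)
    summary(cc.fit)$coefficients

    # Multiple imputation: impute m times, fit the model in each completed
    # data set, and pool the results with Rubin's rules
    imp    <- mice(nhanes, m = 20, seed = 2015, printFlag = FALSE)
    mi.fit <- with(imp, lm(chl ~ age + bmi))
    summary(pool(mi.fit))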

 

Stephen Evans - London School of Hygiene & Tropical Medicine, London, United Kingdom

STRengthening Analytical Thinking for Observational Studies: the STRATOS initiative: study design

 

Appropriate and valid study design is crucial for the valid conduct of observational studies (OS). These contribute to establishing causal effects, together with other evidence (e.g. mechanistic studies, clinical trials), though some OS do not attempt to infer causality, e.g. prognostic or estimation studies. The appropriateness of any design depends on the research question, in the context of current theory and knowledge, the availability of valid measurement tools, and the proposed uses of the results. In theory, some study designs are seen as less biased; in practice, validity is topic- and context-specific. Hierarchies of OS are often proposed, e.g. with cohort studies ranked highest, followed by case-control and cross-sectional studies. However, their relative validity represents a continuum, and ‘less valid’ study designs may yield equally valuable information. It is unusual for a single OS to deliver definitive results, so assessing epidemiologic evidence almost always involves combining information from different study populations, designs, investigators, and methods. None can be perfect; rather, the aim is to contribute to the pool of knowledge for a particular issue, in a particular population and risk period. Valid design involves a context-specific balance between competing considerations. For example, a blood test may yield better estimates of exposure (reducing information bias), but reluctance to give blood may increase missing data (potentially increasing information bias and random error) and lower response rates (increasing selection bias). Whatever design is used, it is important to be able to conduct sensitivity analyses and/or to include control exposures expected to have null effects.

 

Aris Perperoglou - University of Essex, United Kingdom
Multivariable regression modelling using splines. A review of available packages in R
 
Building explanatory models depends heavily on two interrelated aspects: the selection of variables and their functional form. To deal with complex functional forms of continuous variables, statisticians have developed a variety of spline methods, many of which are implemented in R packages. This project investigates the options available in R for building multivariable regression models with splines, compares different approaches and provides practical guidance on available software. Out of a total of more than 6200 packages on CRAN (May 2015) we identified a subset of approximately 100 packages that have some spline-related function. Packages were classified into two categories: those that create spline bases and those that fit regression models. For the first type of package we identified the types of splines available, whether a user can define the degrees of freedom and the number and position of knots, and whether methods are available for determining the smoothing parameters. For the regression packages we looked into the types of regression models offered, whether the package includes criteria for the significance of a non-linear effect or graphical tools to identify variables that should be transformed, and procedures for variable selection and multivariable modelling. In this presentation we will report our first findings. We will show which packages are most used, what their interdependencies are and what their basic features are. Furthermore, we will discuss the framework for further research, in which we will evaluate the quality and performance of packages with the aim of providing detailed guidelines for applied researchers.
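
As a flavour of the two package categories described above (spline-basis packages and regression packages), the R sketch below fits the same simulated non-linear relationship with a fixed-knot B-spline basis from the splines package and with a penalised smooth from mgcv; it is illustrative only and is not part of the review itself:

    # Minimal sketch of two common ways to model a non-linear continuous covariate.
    library(splines)   # spline basis functions
    library(mgcv)      # penalised regression splines

    set.seed(5)
    dat   <- data.frame(x = runif(300, 0, 10))
    dat$y <- sin(dat$x) + rnorm(300, sd = 0.3)

    # (1) Fixed degrees of freedom: a cubic B-spline basis inside a linear model
    fit.bs <- lm(y ~ bs(x, df = 5), data = dat)

    # (2) Penalised spline: the smoothing parameter is estimated from the data
    fit.gam <- gam(y ~ s(x), data = dat)

    AIC(fit.bs, fit.gam)
    plot(fit.gam)      # estimated smooth function of x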

 

Terry Therneau - Mayo Clinic, Rochester, Minnesota, United States

The STRATOS survival task group

 

The survival task group has only recently been organized. Since many of the important topics in modelling survival data overlap with those of other task groups (selection of variables and functional form, measurement error, causal effects, etc.), our initial work will focus on aspects that are particular to survival data. These include, in no particular order:

  • recurrent and competing events, and multi-state models
  • time-dependent effects
  • time-varying covariates and the common error of conditioning on the future, e.g. survival curves of responders vs non-responders
  • relative uses and merits of the proportional hazards, additive hazards, and accelerated failure time models
  • relative survival
  • interval censoring
  • random effects ("frailty")
  • joint modelling of survival and longitudinal markers

This talk will give an overview of these topics and our directions, along with a deeper look at issues within selected topics, and will hopefully engender wider discussion and/or participation in the project.
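
As a small illustration of one item in the list above, the error of conditioning on the future (e.g. comparing survival of responders with non-responders from time zero), the R sketch below codes response as a time-varying covariate in counting-process (start, stop] form with the survival package; the toy data and variable names are hypothetical:

    # Minimal sketch of a time-varying covariate in counting-process form, the
    # usual way to avoid conditioning on the future: each subject contributes
    # one row per period of constant covariate value.
    library(survival)
    tv <- data.frame(
      id     = c(1, 1, 2, 3, 3, 4),
      tstart = c(0, 3, 0, 0, 2, 0),
      tstop  = c(3, 9, 7, 2, 12, 10),
      status = c(0, 1, 1, 0, 0, 0),   # event indicator at the end of each interval
      resp   = c(0, 1, 0, 0, 1, 0))   # response status during the interval

    # 'resp' switches from 0 to 1 at the observed response time, so subjects count
    # as responders only from the moment they actually respond
    fit <- coxph(Surv(tstart, tstop, status) ~ resp, data = tv)
    summary(fit)

A naive analysis that classifies subjects by their eventual response status from time zero would instead credit the responders with the immortal time before response.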

 

I07 Statistics for pharmacoepidemiology

Wednesday, August 26

14.00 - 15.30

 

Michal Abrahamowicz - McGill University, Montreal, Canada
Flexible modelling of cumulative effects of time-varying treatments in longitudinal studies

 

An accurate assessment of the safety or effectiveness of drugs in longitudinal cohort studies requires defining an etiologically correct time-varying exposure model, which specifies how previous drug use affects the hazard of the event of interest. To account for the dosage, duration and timing of past exposures, we proposed a flexible weighted cumulative exposure (WCE) model: WCE(τ | x(t), t < τ) = Σ_{t<τ} w(τ − t) x(t), where τ is the current time at which the hazard is evaluated; x(t) represents the vector of past drug doses at times t < τ; and the function w(τ − t) assigns importance weights to past doses, depending on the time elapsed since the dose was taken (τ − t). The function w(τ − t) is modeled using cubic regression B-splines. The estimated WCE(τ) is then included as a time-dependent covariate in the Cox PH model. Likelihood ratio tests are used to compare w(τ − t) against the standard un-weighted cumulative dose model, and to test the null hypothesis of no association. Recently, the WCE model was extended to flexible Marginal Structural Modeling (MSM) with IPT weights [2], and to comparative effectiveness research (CER), to test for possible differences between the cumulative effects of two drugs. The accuracy of the estimates and tests is assessed in simulations. The applications are illustrated by re-assessing the associations of (a) glucocorticoids and infections, (b) benzodiazepines and fall-related injuries, (c) didanosine and cardiovascular risks in HIV (MSM analysis), and (d) insulins and heart attacks in diabetes (CER analysis).
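
As a small numerical illustration of the WCE quantity itself (not of the spline estimation of w(·) or of the speaker's software), the R sketch below computes WCE(τ) for one subject from a vector of hypothetical daily doses and an arbitrary, purely illustrative weight function:

    # Minimal sketch of the weighted cumulative exposure: WCE at current time tau
    # is the weighted sum of past doses, with weights depending on the time
    # elapsed since each dose. The bi-square weight below is purely illustrative;
    # in the talk, w(.) is estimated flexibly with cubic regression B-splines.
    doses <- c(rep(1, 30), rep(0, 20), rep(2, 10))   # hypothetical daily doses x(t)
    w <- function(u, window = 90) ifelse(u >= 0 & u < window,
                                         (1 - (u / window)^2)^2, 0)

    tau     <- length(doses)                         # current time, in days
    wce.tau <- sum(w(tau - seq_along(doses)) * doses)

    # WCE evaluated at every follow-up day, ready to enter a Cox model as a
    # time-dependent covariate
    wce.path <- sapply(seq_along(doses),
                       function(tau) sum(w(tau - seq_len(tau)) * doses[seq_len(tau)]))
    wce.tau; round(tail(wce.path), 2)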


Tjeerd van Staa - Manchester University, United Kingdom
Randomised trials using routine electronic health records

 

There is increasing interest in using routinely collected data, such as electronic health records (EHRs) or registry data, for pragmatic randomised trials. This presentation will describe experiences in the UK and Sweden, including a pilot trial that randomised patients at the point of care using a flagging system and followed patients using the EHRs. Different approaches to ensuring data quality will be described. It will also be discussed whether blinding to treatment allocation is always the gold-standard approach or whether it would introduce bias in pragmatic trials. Statistical challenges in pragmatic trials, including switching between treatments, will be explored. The presentation will conclude with an overview of the European GetReal project. The overall objective of GetReal is for pharmaceutical R&D and healthcare decision makers to better understand how real-world data and analytical techniques can be used to improve the relevance of knowledge generated during development, e.g. through innovation in clinical trial design. Examples will be given of possible statistical improvements in the design of pragmatic trials.


Stephen Evans - London School of Hygiene and Tropical Medicine, London, United Kingdom

Statistical challenges in pharmacoepidemiology

 

Knowledge about drugs used in the population is derived from basic science, randomised clinical trials (RCTs), observational studies and spontaneous reporting. Chronologically, basic science is followed by RCTs; spontaneous reporting may then provide “signals”, which in some instances are tested in observational studies. There are statistical challenges in each of these areas. Meta-analysis of RCTs has increased in importance. Exploratory data analysis for meta-analysis seems to be a neglected field; examples will be given from the meta-analyses of rosiglitazone showing that understanding of the data is as important as the statistical method chosen. Deciding on causality is a general problem in epidemiology, and there are special challenges in pharmacoepidemiology. Conventional analyses with standard confidence intervals do not reflect the real uncertainties. Examples in the field of diabetes and cancer will be given, showing the effect of using modern causal modelling. In the last 20 years, statistical analysis of spontaneous reports has shown great advances. Further advances can be made, in particular in the analysis of reports for drugs as they arrive on the market. The current methods effectively filter out most reports and do not provide much help when a drug is first introduced to the market. Some methods require a large number of reports for a drug before they can be regarded as useful. If the process of signal detection is to be effective in obtaining early warning of problems, then we need to utilise methods that are able to detect signals at an earlier stage. A new approach to doing this will be outlined.

 

I08 Unbiased reporting, integrity, and ethics

Wednesday, August 26

14.00 - 15.30

 

Doug Altman - University of Oxford, United Kingdom

Complete and accurate reporting of research – an ethical imperative

 

Clinical research has value only if the study methods have validity and the research findings are published in a usable form. Research publications should not mislead, should allow replication (in principle), and should be presented so that they can be included in a subsequent systematic review and meta-analysis.
Deficiencies in research methods have been observed since early in the last century, and major concerns about reporting – especially of clinical trials – have been expressed for over 50 years.
The World Medical Association’s Helsinki Declaration states that “Researchers have a duty to make publicly available the results of their research on human subjects and are accountable for the completeness and accuracy of their reports.” Many studies are never published and, for those that are, a wealth of evidence demonstrates widespread selective reporting of research findings, leading to bias, and inadequate reporting of research methods and findings that prevents readers using the information. The present unacceptable situation is shocking; so too is its wide passive acceptance.
The EQUATOR Network was created in 2006 as a concerted effort to bring together guidance on research conduct and reporting that existed in a fragmentary form across the literature, and promote its dissemination and adherence. Many reporting guidelines exist but their impact is as yet slight. I will consider what actions are needed by different stakeholders to help to raise standards more rapidly.


Patrick Bossuyt - University of Amsterdam, The Netherlands

Transparency and "spin" in reporting studies of the diagnostic accuracy of medical tests

 

Communication about completed research projects is an essential element of the everyday life of scientists. We communicate not just because we are happy with study findings - or disappointed by them - but also because we want to allow others to appreciate the validity of our methods, and to enable our colleagues to replicate what we did. Clinical research is special, because these “others” do not just encompass fellow scientists; our audience also includes clinicians, other health care professionals, and decision-makers. They all rely on the results of strong research to guide their actions.
Unfortunately, deficiencies in the reporting of research have been highlighted in several areas of clinical medicine. Essential elements of study methods are often not well described, or even completely omitted, making critical appraisal difficult. Reporting problems are not restricted to non-informative descriptions of methods and study design. Sometimes there is selective reporting of study results, failure to publish, and in some cases researchers cannot resist unwarranted optimism in their interpretation of study findings. All of these contribute to what is now referred to as “waste in research”.
The evaluation of medical tests is no exception. In this presentation we will illustrate the severity of the problem by providing examples of substandard reporting, selective reporting, generous interpretation of study findings (also referred to as “spin” in reporting), and failure to publish. We have estimated that “spin” may be present in about one in three published reports of diagnostic accuracy studies [3]. We will list possible causes and point to initiatives to curb this development and to improve the value of clinical research.


Lisa McShane - National Cancer Institute, Rockville, United States

Reproducibility of omics research: responsibilities and consequences

 

Irreproducible biomedical research is particularly concerning because flawed findings have the potential to make their way to clinical studies involving human participants. Many factors have been suggested as contributors to irreproducible biomedical research, including poor study design, analytic instability of measurement methods, sloppy data handling, inappropriate and misleading statistical analysis methods, improper reporting or interpretation of results, and on rare occasions, outright scientific misconduct.
Potential for these problems to occur is amplified when the research involves use of novel measurement technologies such as “omics assays” which generate large volumes of data requiring specialized expertise and computational approaches for proper analysis and interpretation.
Roles of statisticians and other computational scientists have often become confused in omics research, and sometimes basic principles of scientifically sound and ethical clinical research seem to be forgotten in translation.
It has been suggested that greater involvement of statisticians and requirements to make data and computer code publicly available are important first steps toward addressing problems of irreproducibility. Through a series of case studies I explore the many dimensions of reproducible omics research, discuss how various proposed remedies for irreproducibility may or may not be effective, and identify the many parties that have a responsibility to promote and support a culture of reproducible omics research.

 

I09 Inference in Infectious Disease Epidemics

Wednesday, August 26

16.00 - 17.30

 

Philippe Lemey - University of Leuven, Belgium
Inferring spatial evolutionary processes from viral genetic data

 

The influence of phylogeography is spreading throughout biology. Among the different methodologies to study the geographical history of genetic lineages, phylogenetic diffusion models have become particularly popular for reconstructing the origin and emergence of pathogen outbreaks and for studying their transmission dynamics. Here, I will review how Bayesian phylogenetic diffusion models explicitly connect sequence evolution to geographic processes and allow inferring pathogen spread through time and space. In particular, I will focus on recent advances that facilitate hypothesis testing, including an approach to reconstruct spatiotemporal history and simultaneously test the contribution of potential predictors of viral movement. Modeling studies have shown that the use of ‘effective’ distances, measured along the relevant transport networks, can reduce complex spatiotemporal patterns to surprisingly simple and regular wave-like patterns. Motivated by this, I will explore the use of Bayesian multidimensional scaling (BMDS) techniques to incorporate the concept of effective distances and study pathogen dispersal in ‘effective space’. The main applications will focus on seasonal influenza, for which most insights are generated by analyses of influenza A/H3N2. I will show how the global seasonal circulation patterns of the different seasonal influenza lineages can be modeled in ‘air transportation space’, but also that the rate at which they travel this network varies extensively and relates to their evolutionary dynamics and epidemic success.

 

Leonhard Held - University of Zurich, Switzerland

Combining social contact data with time series models for infectious diseases

 

The availability of geocoded health data and the inherent temporal structure of communicable diseases have led to an increased interest in statistical models for spatio-temporal epidemic data. The spread of directly transmitted pathogens such as influenza and noroviruses is largely driven by human travel and social contact patterns. To improve predictions of the future spread, we combine social contact matrices with a spatio-temporal endemic-epidemic model for infectious disease counts. We investigate the combined model for public health surveillance data on noroviral gastroenteritis.

 

Michael Höhle - Stockholm University, Sweden

Spatio-temporal epidemic models of communicable diseases

 

Incorporating the concept of 'space' is an important extension of temporal models for communicable diseases, because it allows spatial heterogeneities in the transmission to be addressed. This may be important in the discussion of trend analyses, fade-out probabilities or short-term prediction. In this work we provide an overview of available spatio-temporal models based on the susceptible-infectious-recovered (SIR) compartmental framework. In particular, we will discuss the meta-population model, the multivariate time-series SIR model and the so-called endemic-epidemic model, while showing the relationships between them. Altogether, we propose a more regression-based approach towards epidemic modelling and illustrate how functionality of the R package 'surveillance' can be used as a tool to perform such data-driven spatio-temporal analyses.
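
As an indication of what such a 'surveillance'-based analysis looks like, the R sketch below fits a simple endemic-epidemic (hhh4) model to one of the package's example data sets; it assumes the package's standard interface and is meant only to illustrate the workflow, not to reproduce the analyses of the talk:

    # Minimal sketch of an endemic-epidemic ("hhh4") model from the surveillance
    # package, fitted to the package's weekly measles counts for 17 districts.
    library(surveillance)
    data("measlesWeserEms")   # multivariate time series of counts ("sts" object)

    fit <- hhh4(measlesWeserEms, control = list(
      end = list(f = addSeason2formula(~1, period = 52)),         # endemic component with seasonality
      ar  = list(f = ~1),                                         # epidemic component: within-district
      ne  = list(f = ~1,                                          # epidemic component: between districts,
                 weights = neighbourhood(measlesWeserEms) == 1),  #   here simple first-order adjacency
      family = "NegBin1"))

    summary(fit)
    plot(fit, type = "fitted")   # decomposition into endemic and epidemic parts

Social contact matrices, as in the preceding talk, or human travel data would typically enter through the weights of the between-unit epidemic component.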

 

I10 Dynamic treatment regimes

Thursday, August 27

9.00 - 10.30

 

Bibhas Chakraborty - National University of Singapore, Singapore

Q-learning Residual Analysis with Application to A Schizophrenia Clinical Trial

 

Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been overlooked in the literature. In this work, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit, and hence present a modified Q-learning procedure that accommodates residual analyses using standard tools. We present simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple-assignment randomized trial of patients with schizophrenia.
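
For orientation, the R sketch below implements the textbook two-stage Q-learning procedure with linear working models on simulated data (hypothetical variable names); the talk's contribution is a modification of this standard procedure to make residual diagnostics meaningful, which is not shown here:

    # Minimal sketch of standard two-stage Q-learning with linear working models.
    set.seed(10)
    n  <- 500
    o1 <- rnorm(n)                         # baseline covariate
    a1 <- rbinom(n, 1, 0.5) * 2 - 1        # stage-1 treatment, coded -1/1
    o2 <- 0.5 * o1 + 0.3 * a1 + rnorm(n)   # intermediate outcome
    a2 <- rbinom(n, 1, 0.5) * 2 - 1        # stage-2 treatment
    y  <- o2 + a2 * (0.4 + 0.6 * o2) + rnorm(n)   # final outcome (larger is better)

    # Stage 2: regress the outcome on history and stage-2 treatment
    q2 <- lm(y ~ o1 + a1 + o2 * a2)
    # Pseudo-outcome: predicted outcome under the best stage-2 decision
    d2 <- data.frame(o1, a1, o2)
    y.tilde <- pmax(predict(q2, cbind(d2, a2 =  1)),
                    predict(q2, cbind(d2, a2 = -1)))

    # Stage 1: regress the pseudo-outcome on baseline history and stage-1 treatment
    q1 <- lm(y.tilde ~ o1 * a1)

    # The fitted interaction terms define the estimated optimal decision rules
    coef(q2); coef(q1)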


Vanessa Didelez - University of Bristol, United Kingdom

Lessons from Simulations of Marginal Models for Dynamic Treatments with Survival Outcomes

 

A dynamic treatment is a set of rules that assigns a decision value at each time point when a treatment decision needs to be made, based on the patient's history so far. An example in the context of HIV studies is "start treatment when CD4 count first drops below 600". An optimal dynamic treatment is a set of rules that optimises a criterion reflecting better health for the patient. Dynamic Marginal Structural Models (DMSMs) have become popular tools to evaluate and compare a set of dynamic strategies. The particular difficulty with survival outcomes is that such marginal models are not necessarily compatible with the set of conditional models that would typically be used to simulate survival data with time-varying confounding. While this problem can be solved for non-dynamic MSMs, only approximations can be obtained in the dynamic case. As an alternative we therefore investigate how similar a structural accelerated failure time model can be made to a DMSM. This reveals, among other things, that overly simple DMSMs can never appropriately model the kind of data that they are supposedly designed for. We show that our simulation proposal, despite being an approximation, can nevertheless be used to demonstrate how much and in what way common violations of assumptions or incorrect analyses lead to biased conclusions.

 

Susanne Rosthøj - University of Copenhagen, Denmark

Formulation and estimation of dosing strategies for treatment of children with acute lymphoblastic leukaemia

 

During oral maintenance therapy of childhood leukaemia the children receive two antileukaemic agents. The antileukaemic mechanisms of maintenance therapy are poorly understood and there is currently no international consensus on dose adjustment strategies. Based on observational data with detailed registration of doses and blood counts, we discuss how to formulate new dynamic treatment strategies. Such strategies take as input the observed treatment and medical history and output a recommendation for the dose the child should receive until the next examination. The formulation and estimation of optimal dynamic treatment regimes has received much attention in the statistical literature over the past 12 years. In general, focus has been on optimizing the mean of an outcome measured at the end of the treatment phase, the time points of examination are assumed to be coincident across patients, there are typically few time points for dose adjustments, and the decision is often binary, such as whether to start or stop treatment. However, in the treatment of childhood leukaemia, the focus is on maintaining blood counts within a target range 2-4 weeks ahead (a receding horizon), examination and treatment decision times are frequent but irregularly scheduled, and the decision involves the sizes of the two doses. We consider a multivariate structural nested mean model (Robins, 1997) to account for the receding horizon. The problem is similar to Model Predictive Control as used in control engineering. The issue of irregular sampling is handled by combining a regression approach with missing data techniques. Reference: Robins JM (1997). In: Berkane M (ed) Latent Variable Modeling and Applications to Causality. Springer, New York.