Removing Outliers and Overdispersion in Poisson Mixed-effects Models for Count Data Analysis

Understanding Poisson Mixed-effect Regression with glmmTMB: Interpreting Residual Plots and Removing Outliers

Introduction to Poisson Mixed-effects Models

Poisson mixed-effects models are generalized linear mixed models (GLMMs) for count data. Their random effects account for the dependence between observations that belong to the same group. In this context, groups are clusters or units such as participants or words; every observation from the same participant, for example, shares that participant’s random intercept. This makes the model well suited to count data that vary at several levels at once.
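As a rough sketch, a Poisson mixed-effects model with crossed random intercepts can be written as follows. The symbols are assumptions introduced here for illustration: i indexes participants, j indexes words, x is a condition indicator, and u and w are the participant and word random intercepts. Offsets and zero inflation are left out to keep the notation short.

```latex
\[
\begin{aligned}
y_{ij} \mid u_i, w_j &\sim \mathrm{Poisson}(\mu_{ij}) \\
\log \mu_{ij} &= \beta_0 + \beta_1 x_{ij} + u_i + w_j \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad w_j \sim \mathcal{N}(0, \sigma_w^2)
\end{aligned}
\]
```

Conditional on the random intercepts, the variance of each count equals its mean, which is the assumption that overdispersion checks probe.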

In the study design you mentioned, you built a Poisson mixed-effects model with glmmTMB (an R package that fits generalized linear mixed models using Template Model Builder) to examine eye fixations on words during reading. The response is the number of fixations, and the model estimates how that count differs between conditions while allowing for variation among participants and words.

Understanding Residual Plots

Residual plots are a crucial aspect of regression analysis, providing information about the relationship between observed and predicted values. In Poisson mixed-effects models, residual plots serve two primary purposes:

  1. Checking model fit: By examining the distribution of residuals, you can verify whether your model adequately captures the data’s underlying patterns.
  2. Detecting outliers and overdispersion: Residual plots help identify observations that lie far from the predicted line, suggesting potential issues such as outliers or overdispersion.

The Problem at Hand

The residual plots for your model show a significant KS (Kolmogorov-Smirnov) test and a significant outlier test. This indicates that the model does not fully reproduce the distribution of the data. We’ll look at what these results mean, discuss how to interpret the residual plots, and explore strategies for removing outliers.

Understanding Significant KS Tests

The KS test assesses whether there is a significant difference between the empirical distribution function of the residuals and a specified cumulative distribution function (CDF). In other words, it compares the observed data against a hypothesized distribution. If the p-value associated with the KS test is below your chosen significance level, you can reject the null hypothesis that the residuals follow the CDF of interest.

Example: Interpreting KS Test Results

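DHARMa, the package behind simulateResiduals(), builds scaled residuals that follow a uniform distribution on [0, 1] when the model is correctly specified. That uniform distribution, not the Poisson distribution itself, is the reference CDF for the KS test. If the p-value from the KS test is < 0.05, you can reject the null hypothesis that the scaled residuals are uniform, which indicates that the fitted model does not fully reproduce the distribution of your data. With many observations even small deviations become statistically significant, so judge the size of the deviation in the QQ plot as well as the p-value.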
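To make the uniformity idea concrete, here is a minimal sketch in base R. It does not use DHARMa. Instead it computes randomized quantile residuals by hand for a plain Poisson GLM without random effects, which play a role similar to DHARMa’s scaled residuals. The simulated data, the helper name pit_residuals, and all variable names are assumptions made for illustration.

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Hypothetical helper: randomized quantile residuals for a Poisson fit.
# Each residual is drawn uniformly between F(y - 1) and F(y), where F is the
# fitted Poisson CDF, so a correct model yields Uniform(0, 1) residuals.
pit_residuals <- function(y, mu) {
  runif(length(y), min = ppois(y - 1, mu), max = ppois(y, mu))
}

# Case 1: the data really are Poisson
y_pois <- rpois(n, lambda = exp(0.5 + 0.3 * x))
fit_pois <- glm(y_pois ~ x, family = poisson)
ks_pois <- ks.test(pit_residuals(y_pois, fitted(fit_pois)), "punif")

# Case 2: overdispersed (negative binomial) data fitted with a Poisson model
y_nb <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 0.5)
fit_nb <- glm(y_nb ~ x, family = poisson)
ks_nb <- ks.test(pit_residuals(y_nb, fitted(fit_nb)), "punif")

ks_pois$p.value  # model matches the data, so no systematic reason to be small
ks_nb$p.value    # expected to be very small: residuals are far from uniform
```

In the second case the Poisson model predicts far fewer zeros than the data contain, so the residuals pile up near 0 and the KS test rejects uniformity.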

Removing Outliers

Identifying outliers, and deciding whether removing them is justified, is an important step when the residual diagnostics show a significant KS test or outlier test. Here, we’ll explore how to identify outliers and discuss strategies for their removal.

Identifying Outliers Using Visual Inspection

Visual inspection is a fundamental technique in identifying outliers. By plotting your residuals against the fitted values (residuals vs. predicted), you can spot observations that deviate significantly from the pattern. These points are likely candidates for outlier removal.

Example: Residuals vs. Predicted Values Plot

Let’s take a closer look at an example residual plot:

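```r
library(glmmTMB)
library(DHARMa)

# Zero-inflated Poisson GLMM; the offset models fixations per unit of trial time
model <- glmmTMB(fixation_count ~ CONDITION + (1 | PARTICIPANT) + (1 | WORD) +
                   offset(log(TRIAL_TIME_1)),
                 data = my_data, ziformula = ~1, family = poisson)
summary(model)

# Simulation-based scaled residuals; plot = TRUE draws the QQ and residual plots
res <- simulateResiduals(model, plot = TRUE)
plot(fitted(model), residuals(res))  # fitted values on x, scaled residuals on y
```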

By examining this residual vs. predicted values plot, you can identify points that appear to be far from the overall trend.
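Visual inspection can be backed up with a simple rule. The sketch below flags observations that fall outside the range of everything the fitted model simulates, which is the idea behind DHARMa’s outlier test. It is a base R illustration on a plain Poisson GLM with simulated data, not DHARMa’s own implementation, and the planted outliers and variable names are assumptions.

```r
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rpois(n, lambda = exp(1 + 0.4 * x))
y[c(10, 200)] <- c(60, 75)  # plant two extreme counts for illustration

fit <- glm(y ~ x, family = poisson)

# Simulate 250 new response vectors from the fitted model (one per column)
sims <- as.matrix(simulate(fit, nsim = 250))

# Flag observations that lie outside the range of all simulated values
above <- y > apply(sims, 1, max)
below <- y < apply(sims, 1, min)
flagged <- which(above | below)

# Residuals vs. fitted values, with flagged observations in red
plot(fitted(fit), residuals(fit, type = "pearson"),
     col = ifelse(above | below, "red", "grey40"), pch = 19,
     xlab = "Fitted values", ylab = "Pearson residuals")

flagged  # row numbers of candidate outliers
```

The flagged rows are candidates only. With a finite number of simulations a few ordinary observations can land outside the simulated range by chance, so each one deserves a look before anything is removed.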

Understanding DHARMa Plots and Red Vertical Lines

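DHARMa is an R package that computes simulation-based scaled residuals for generalized linear (mixed) models, including Poisson mixed-effects models fitted with glmmTMB. Its plots and tests help assess whether your model adequately accounts for overdispersion, underdispersion, zero inflation, and outliers.

Several DHARMa tests, such as testDispersion() and testZeroInflation(), draw a histogram with a red vertical line. The histogram shows the values of a test statistic computed on data sets simulated from the fitted model, and the red line marks the value of the same statistic in the observed data. If the red line falls in a tail of the histogram or outside it, the observed data differ from what the model expects; for the dispersion test, a line far to the right indicates overdispersion.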

Example: Understanding DHARMa Plots

Here’s an example DHARMa plot:

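```r
library(glmmTMB)
library(DHARMa)

model <- glmmTMB(fixation_count ~ CONDITION + (1 | PARTICIPANT) + (1 | WORD) +
                   offset(log(TRIAL_TIME_1)),
                 data = my_data, ziformula = ~1, family = poisson)
res <- simulateResiduals(model)
testDispersion(res)     # histogram of simulated dispersion, red line = observed
testZeroInflation(res)  # same layout for the number of zeros
```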

When examining this plot, check that the red vertical line falls within the bulk of the histogram rather than far out in one of its tails.
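The logic behind such a histogram can be reproduced by hand. The following base R sketch simulates data sets from a fitted Poisson GLM, computes a dispersion statistic for each, and marks the observed value in red. The choice of statistic (variance of raw residuals), the simulated overdispersed data, and all names are assumptions for illustration; DHARMa’s own tests may differ in detail.

```r
set.seed(7)
n <- 400
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.3 * x), size = 1)  # overdispersed counts

fit <- glm(y ~ x, family = poisson)
mu <- fitted(fit)

# Data sets simulated from the fitted Poisson model (one per column)
sims <- as.matrix(simulate(fit, nsim = 500))

# Dispersion statistic: variance of raw residuals, observed vs. simulated
obs_stat <- var(y - mu)
sim_stat <- apply(sims, 2, function(s) var(s - mu))

# Histogram of simulated statistics with the observed value as a red line
hist(sim_stat, xlim = range(c(sim_stat, obs_stat)),
     main = "Simulated vs. observed dispersion", xlab = "Variance of residuals")
abline(v = obs_stat, col = "red", lwd = 2)

# Share of simulations at least as dispersed as the observed data
p_upper <- mean(sim_stat >= obs_stat)
p_upper
```

Here the red line sits far to the right of the histogram because the data were generated with much more variance than a Poisson model allows.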

Removing Overdispersion and Outliers

Overdispersion and outliers can both degrade model fit, and unmodeled overdispersion makes standard errors too small. The strategies in this section help you address both problems.

Strategies for Removing Outliers

When dealing with residual plots exhibiting significant KS test results or outlier tests, consider the following strategies:

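  • Transformations: Consider transforming skewed continuous predictors, for example with a log or square root transformation. Transforming the count response itself is generally not advisable with a Poisson family, which expects non-negative integers.
  • Weighting: Instead of deleting observations, down-weight those that lie far from the model’s predictions so that they have less influence on the fit. Any such scheme should be justified and reported, because it changes the likelihood.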
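Before committing to either strategy, or to outright removal, it helps to check how much the flagged observations actually change the estimates. The sketch below is a simple sensitivity analysis in base R on simulated data. The plain Poisson GLM, the planted outliers, and the variable names are assumptions for illustration; with a glmmTMB model the same comparison could be made by refitting on the reduced data.

```r
set.seed(3)
n <- 300
d <- data.frame(x = rnorm(n))
d$y <- rpois(n, lambda = exp(1 + 0.4 * d$x))
d$y[c(5, 150)] <- c(80, 90)  # plant two extreme counts for illustration

# Rows flagged by the diagnostics (hard-coded here; normally from a residual check)
flagged <- c(5, 150)

fit_all  <- glm(y ~ x, family = poisson, data = d)
fit_trim <- glm(y ~ x, family = poisson, data = d[-flagged, ])

# Side-by-side coefficients: large shifts mean the flagged rows drive the fit
comparison <- cbind(all = coef(fit_all), trimmed = coef(fit_trim))
comparison
```

If the coefficients barely move, the outliers are not driving the conclusions and can reasonably stay in the data. If they shift substantially, reporting both fits is more transparent than silently dropping rows.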

Strategies for Addressing Overdispersion

Overdispersion occurs when the variance of the observed counts is greater than the model expects given the mean; a Poisson model assumes that the variance equals the mean. The following steps help you capture that extra variation:

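  • Variance component estimation: Inspect the estimated random-effect variances with VarCorr(), which works on glmmTMB fits. Adding an observation-level random intercept gives the model an extra variance component that can absorb overdispersion.
  • Corrected standard errors: If you keep the Poisson model, inflate the standard errors to reflect the extra variance, for example by multiplying them by the square root of the estimated dispersion ratio (the quasi-Poisson approach).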
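The standard-error correction can be sketched in base R. The example estimates the dispersion ratio from Pearson residuals and scales the Poisson standard errors by its square root. It uses a plain Poisson GLM on simulated overdispersed data, and the data and variable names are assumptions for illustration.

```r
set.seed(11)
n <- 400
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.3 * x), size = 1)  # overdispersed counts

fit <- glm(y ~ x, family = poisson)

# Dispersion ratio: Pearson chi-square divided by residual degrees of freedom.
# Values well above 1 indicate overdispersion.
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Naive Poisson standard errors and their dispersion-corrected versions
se_naive  <- sqrt(diag(vcov(fit)))
se_scaled <- se_naive * sqrt(phi)

# A quasi-Poisson fit applies the same correction internally
fit_q <- glm(y ~ x, family = quasipoisson)
se_quasi <- sqrt(diag(vcov(fit_q)))

round(cbind(se_naive, se_scaled, se_quasi), 4)
phi
```

For a mixed model fitted with glmmTMB, an alternative to rescaling is to refit with a negative binomial family such as nbinom2, which estimates the extra variance as part of the model.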

Conclusion

In this article, we explored Poisson mixed-effects models for analyzing count data. We delved into residual plots and discussed strategies for interpreting KS test results and removing outliers. By examining DHARMa plots and addressing overdispersion, you’ll be better equipped to develop accurate models that effectively capture the underlying patterns in your data.

Further Reading

If you’d like more information on Poisson mixed-effects models or need further guidance on residual plot interpretation and outlier removal, please see:


Last modified on 2024-05-31