Resolving the `_check_google_client_version` Import Error in Airflow 1.10.9

Airflow 1.10.9 - cannot import name ‘_check_google_client_version’ from ‘pandas_gbq.gbq’

Problem Overview

In this post, we look at an issue that occurred on an Airflow cluster running version 1.10.9: once pandas-gbq 0.15.0 was released, Airflow could no longer import _check_google_client_version from pandas_gbq.gbq. We’ll explore how this issue can be resolved by looking into Airflow’s packaging and constraint files.

Background

Airflow is a popular open-source platform for programmatically authoring, scheduling, and monitoring workflows. When installing Airflow, users often rely on requirements.txt files to manage dependencies. However, unless every dependency is pinned, pip resolves each unpinned package to the newest release that satisfies the declared version range, so the same requirements.txt can produce a different environment from one install to the next.

In this specific case, we’re dealing with Airflow version 1.10.9 and pandas-gbq version 0.15.0, a release whose pandas_gbq.gbq module no longer provides a name that Airflow 1.10.9 imports.

The Issue

The error message displayed on the Web UI is:

cannot import name '_check_google_client_version' from 'pandas_gbq.gbq'

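This indicates that the pandas_gbq.gbq module itself was found, but it does not define the name _check_google_client_version that Airflow tries to import from it. The leading underscore marks the function as private to pandas-gbq, which means it can change or disappear between releases without notice.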
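To confirm the diagnosis without starting Airflow, a small check like the one below can be run in the same Python environment. It is only a sketch: has_attribute is a hypothetical helper name, not part of Airflow or pandas-gbq, and it relies on nothing beyond the standard library.

```python
import importlib


def has_attribute(module_name, attr_name):
    """Return True if module_name imports cleanly and exposes attr_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing, which is a different problem
        # from the one described in this post.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # On an affected environment this should print False.
    print(has_attribute("pandas_gbq.gbq", "_check_google_client_version"))
```

If the check prints True, the installed pandas-gbq still provides the function and the error likely has another cause.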

Troubleshooting Steps

1. Understand Airflow’s Packaging

Before we dive into solutions, it’s essential to understand how Airflow’s dependencies get resolved. A requirements.txt file lists what to install, but any dependency it leaves unpinned is resolved at install time. A fresh install can therefore pick up a release, such as pandas-gbq 0.15.0, that did not exist when the setup was last tested.

Installation Script Approach

One recommended approach to installing Airflow in a repeatable manner is to use the installation script provided in the official Airflow documentation [1]. This script uses constraints files to ensure that all required packages are installed with specific versions.

Since we’re running Airflow 1.10.9, which might be somewhat outdated, it’s worth considering upgrading to a newer version of Airflow to avoid compatibility issues like this one.

# Capture the currently installed versions as the starting point for a constraints file
pip freeze > requirements/constraints.txt

# Drop the apache-airflow and pandas-gbq entries, then pin pandas-gbq to 0.14.1
grep -v -i -E '^(apache-airflow|pandas-gbq)==' requirements/constraints.txt > modified-constraints.txt
echo 'pandas-gbq==0.14.1' >> modified-constraints.txt

# Use the modified constraints file during installation
# (the quotes keep the shell from interpreting the square brackets)
pip install --constraint=modified-constraints.txt 'apache-airflow[aws,celery,crypto,gcp,jdbc,mysql,password,postgres,slack,statsd]==1.10.9'
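The same filtering step can also be expressed in Python, which may be easier to reuse in a deployment script. The following is a minimal sketch under the assumption that the input lines are in the name==version form produced by pip freeze; pin_package is a hypothetical helper written for this post, not a pip API.

```python
def pin_package(freeze_lines, package, version, drop=()):
    """Return constraint lines with `package` pinned to `version`.

    Lines for `package` and for any name in `drop` are removed first,
    then a single `package==version` pin is appended.
    """
    skip = {package.lower()} | {name.lower() for name in drop}
    kept = []
    for line in freeze_lines:
        line = line.strip()
        if not line:
            continue
        # Compare on the part before '==', case-insensitively.
        name = line.split("==", 1)[0].strip().lower()
        if name in skip:
            continue
        kept.append(line)
    kept.append("{}=={}".format(package, version))
    return kept
```

For example, calling pin_package(["apache-airflow==1.10.9", "pandas==0.25.3", "pandas-gbq==0.15.0"], "pandas-gbq", "0.14.1", drop=["apache-airflow"]) returns ['pandas==0.25.3', 'pandas-gbq==0.14.1'], which can then be written to modified-constraints.txt one entry per line.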

2. Manually Modifying the requirements.txt File

Another approach is to manually modify the requirements.txt file to specify a particular version of pandas_gbq.

# Modify the requirements.txt file
apache-airflow[aws,celery,crypto,gcp,jdbc,mysql,password,postgres,slack,statsd]==1.10.9
pandas==0.25.3
pandas-gbq==0.14.1

This method pins pandas-gbq to version 0.14.1 (and pandas to 0.25.3) alongside Airflow 1.10.9, so pip cannot pull in the 0.15.0 release.
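After installing, it can be worth verifying that the pin took effect. The sketch below compares a version string against the 0.15.0 boundary; the function names are hypothetical, and the rule that every release below 0.15.0 is safe is an assumption drawn from this post rather than from the pandas-gbq changelog.

```python
import re


def version_tuple(version):
    """Turn '0.14.1' into (0, 14, 1), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '0.15' so tuple comparison behaves.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_safe_for_airflow_1_10_9(pandas_gbq_version):
    """Assumption from this post: releases below 0.15.0 still work."""
    return version_tuple(pandas_gbq_version) < (0, 15, 0)
```

The version string to pass in is the one reported by pip show pandas-gbq; with the pin above, is_safe_for_airflow_1_10_9("0.14.1") evaluates to True.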

Conclusion

Resolving the cannot import name '_check_google_client_version' from 'pandas_gbq.gbq' error in Airflow 1.10.9 requires understanding how Airflow’s dependencies are resolved and keeping the incompatible pandas-gbq 0.15.0 release out of the environment. By using a constraints file or pinning pandas-gbq to 0.14.1 in requirements.txt, users can install Airflow and its dependencies with specific, known-good versions.

Upgrading to a newer version of Airflow is also recommended to avoid potential issues in the future.

References

[1] Apache Airflow Documentation: Installation Script


Last modified on 2023-05-26