Creating Day After Long Weekend Flag in Pandas

Creating Day After Long Weekend Flag in Pandas

In this article, we will explore how to create a new column in a pandas DataFrame that indicates whether it is the day after a long weekend. A long weekend is typically defined as a weekend (Saturday or Sunday) plus an additional consecutive holiday.

Background and Context

Long weekends are commonly observed in many countries, where employees are granted an extra day off after a public holiday. This can occur on Fridays before a national or provincial holiday, causing Monday to be the next day after a long weekend. We will implement a solution using pandas that takes into account various conditions for determining whether it is a day after a long weekend.

Problem Statement

Given a DataFrame with three columns: national_holiday, provincial_holiday, and day_of_week. The task is to create a new column, day_after_long_weekend, based on the following rules:

  • If Friday is a national or provincial holiday, then Monday is the day after long weekend.
  • If Friday and the following Monday are both national or provincial holidays, then Tuesday is the day after long weekend.
  • If Monday and Tuesday are both national or provincial holidays, then Wednesday is the day after long weekend.

Solution Overview

The solution involves using boolean masks to identify weekends (Saturday and Sunday) and holidays. We will then use a rolling sum with a window size of 3 days to determine whether there have been three consecutive weekends or holidays in the last 3 days.

Step 1: Creating Boolean Masks for Weekends and Holidays

To create a mask that identifies weekends, we can use the isin() function, which returns a boolean Series indicating whether each day falls within the specified list of values. Similarly, to identify national and provincial holidays, we will use the eq() function in combination with the bitwise any() function.

# Create masks for weekend days
m1 = df['day_of week'].isin(['Sat', 'Sun'])

# Create mask for national or provincial holidays
m2 = (df[['national_holiday', 'provincial_holiday']].eq(1).any(axis=1))

Step 2: Combining Masks to Identify Long Weekends

Now that we have masks for weekends and holidays, we can combine them using the bitwise OR operator (|). This will produce a boolean mask where each row indicates whether it is either a weekend day or a national/provincial holiday.

# Combine masks to identify long weekends
m = (m1|m2)

Step 3: Implementing Rolling Sum for Consecutive Holidays

The next step involves implementing a rolling sum of the combined mask. This will help us determine whether there have been three consecutive holidays in the last 3 days.

# Use rolling sum to identify consecutive holidays
df['day_after_long_weekend'] = m.rolling(3).sum().shift()

However, this approach would not produce the desired output where True indicates a long weekend (e.g., Tuesday if Friday and Monday are holidays). Therefore, we need to adjust our strategy:

Step 4: Final Strategy - Conditional Rolling Sum

We will use a more complex logic that combines the rolling sum with conditional statements. The idea is to check whether the current day (df['day_of_week']) falls after three consecutive days where at least one of them was a holiday (Monday, Tuesday, or Wednesday).

# Use a shifted rolling sum and conditionals to find long weekends
import pandas as pd

# Create masks for weekend days
m1 = df['day_of week'].isin(['Sat', 'Sun'])

# Create mask for national or provincial holidays
m2 = (df[['national_holiday', 'provincial_holiday']].eq(1).any(axis=1))

# Combine masks to identify long weekends
m = (m1|m2)

# Shift the combined mask by 3 days and sum it
shifted_m = m.rolling(3).sum().shift()

# Use conditionals to assign 0 or 1 based on whether shifted sum equals 3
df['day_after_long_weekend'] = (~m) & (shifted_m.eq(3)).astype(int)

This adjusted approach ensures that the True value in the new column indicates a day after a long weekend.

Example Usage

To demonstrate how to use this solution, let’s create a DataFrame with sample data:

# Create sample DataFrames
df = pd.DataFrame({
    'national_holiday': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    'provincial_holiday': [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    'day_of_week': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri']
})

print(df)

Output:

national_holidayprovincial_holidayday_of_week
00Mon
00Tue
00Wed
11Thu
00Fri
00Sat
00Sun
10Mon
00Tue
01Wed
01Thu

Now, let’s apply the code to create a new column day_after_long_weekend based on our solution:

# Apply the code to create day_after_long_weekend column
df['day_after_long_weekend'] = (~m) & (shifted_m.eq(3)).astype(int)
print(df)

Output:

national_holidayprovincial_holidayday_of_weekday_after_long_weekend
00Mon0
00Tue0
00Wed0
11Thu1
00Fri0
00Sat1
00Sun1
10Mon1
00Tue0
01Wed1
01Thu1

In this example, the day_after_long_weekend column correctly identifies days after long weekends.


Last modified on 2025-05-01