Consecutive Days User Login: A Deep Dive into Gaps and Islands
In this article, we explore a SQL query that computes the day_in_row field of a table called FactDailyUsers. Each row of the table records a user who was active on a specific date, one action they performed that day, and the total number of times they performed it (total_actions). We break the problem down step by step and explain the technical terms, processes, and concepts used in the solution.
Problem Statement
The problem statement is as follows:
Given a table called FactDailyUsers, which contains the users who were active on a specific date together with the action they performed (aggregated to a total count per row), write a query that computes the day_in_row field. day_in_row counts consecutive days of activity. It is 1 on the first day of a streak and increases by one for each following day on which the user returns to the website.
Table Structure
Here is a sample of the table with the expected day_in_row values (dates are in DD/MM/YYYY format):
| user_id | date | Action | day_in_row | total_actions |
|---|---|---|---|---|
| 1123 | 21/06/2018 | Pageview | 1 | 1 |
| 3321 | 20/06/2018 | Pageview | 1 | 6 |
| 3321 | 20/06/2018 | Click | 1 | 2 |
| 2122 | 19/05/2018 | Pageview | 1 | 1 |
| 2212 | 21/06/2018 | Pageview | 2 | 1 |
| 2212 | 20/06/2018 | Pageview | 1 | 1 |
| 2212 | 24/06/2018 | Pageview | 1 | 4 |
| 2212 | 24/06/2018 | click | 1 | 2 |
| 3321 | 21/06/2018 | pageview | 2 | 4 |
| 3321 | 21/06/2018 | click | 2 | 3 |
| 3321 | 17/06/2018 | pageview | 1 | 1 |
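A small brute-force reference in Python pins down the definition and is handy for checking any SQL solution against the sample. For each row, it counts backwards one calendar day at a time for as long as the same user performed the same action. Like the SQL solution, it assumes streaks are tracked per user and action. It also lower-cases the action names, on the assumption that "Pageview" and "pageview" denote the same action.

```python
from datetime import date, timedelta

# Sample rows from the table: (user_id, date, action, expected day_in_row).
# Assumption: action names are lower-cased, treating "Pageview" == "pageview".
SAMPLE = [
    (1123, date(2018, 6, 21), "pageview", 1),
    (3321, date(2018, 6, 20), "pageview", 1),
    (3321, date(2018, 6, 20), "click", 1),
    (2122, date(2018, 5, 19), "pageview", 1),
    (2212, date(2018, 6, 21), "pageview", 2),
    (2212, date(2018, 6, 20), "pageview", 1),
    (2212, date(2018, 6, 24), "pageview", 1),
    (2212, date(2018, 6, 24), "click", 1),
    (3321, date(2018, 6, 21), "pageview", 2),
    (3321, date(2018, 6, 21), "click", 2),
    (3321, date(2018, 6, 17), "pageview", 1),
]


def day_in_row(rows):
    """Return day_in_row for each (user_id, date, action) row, in input order.

    A row's value is 1 plus the value for the same user and action on the
    previous calendar day, or 1 if there was no such activity.
    """
    active = {(user, action, day) for user, day, action in rows}
    result = []
    for user, day, action in rows:
        streak = 1
        # Step back one day at a time while the same user did the same action.
        while (user, action, day - timedelta(days=streak)) in active:
            streak += 1
        result.append(streak)
    return result


rows = [(user, day, action) for user, day, action, _ in SAMPLE]
print(day_in_row(rows))
```

Running it prints `[1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1]`, which matches the day_in_row column of the sample.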
Solution Overview
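This is a classic gaps-and-islands problem. Each unbroken run of consecutive dates for the same user and action forms an "island", and day_in_row is a row's position inside its island. The solution builds the answer from a chain of Common Table Expressions (CTEs), each solving one smaller sub-problem.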
Step 1: Create CTEs for Data Preparation
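```sql
-- T-SQL. These four CTEs and the SELECT in Step 2 form a single statement.
WITH d AS (
    -- First and last activity dates in the table
    SELECT MIN(f.date) AS mn, MAX(f.date) AS mx
    FROM FactDailyUsers AS f
),
dt_table AS (
    -- Calendar with one row per day from mn to mx; n is the day's position.
    -- dbo.Numbers is an auxiliary table holding the integers 1, 2, 3, ...
    -- in column n, covering at least the number of days in the range.
    SELECT n.n, DATEADD(DAY, n.n - 1, d.mn) AS d
    FROM d
    JOIN dbo.Numbers AS n
        ON n.n <= DATEDIFF(DAY, d.mn, d.mx) + 1
),
g AS (
    -- r numbers each user's rows for one action in date order
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY USER_ID, ACTION ORDER BY date) AS r
    FROM FactDailyUsers
),
i AS (
    -- r and n both grow by 1 per consecutive day, so r - n is constant within a streak
    SELECT g.*, dt.n, g.r - dt.n AS island
    FROM g
    JOIN dt_table AS dt
        ON dt.d = g.date
)
```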
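To see the islands form, the following sketch runs an SQLite translation of the four CTEs on four sample rows through Python's standard sqlite3 module. It rests on several assumptions:

- The in-memory schema is illustrative, and dates are stored as ISO-8601 text.
- A recursive CTE stands in for the dbo.Numbers join, because SQLite has no DATEADD, DATEDIFF, or built-in numbers table.
- Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Illustrative in-memory stand-in for FactDailyUsers (ISO-8601 text dates).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactDailyUsers (user_id INTEGER, date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO FactDailyUsers VALUES (?, ?, ?)",
    [
        (2122, "2018-05-19", "pageview"),
        (3321, "2018-06-17", "pageview"),
        (3321, "2018-06-20", "pageview"),
        (3321, "2018-06-21", "pageview"),
    ],
)

# SQLite translation of d, dt_table, g and i. The recursive dt_table walks
# from mn to mx one day at a time instead of joining a numbers table.
ISLANDS_SQL = """
WITH RECURSIVE
d AS (
    SELECT MIN(date) AS mn, MAX(date) AS mx FROM FactDailyUsers
),
dt_table(n, cal_date) AS (
    SELECT 1, mn FROM d
    UNION ALL
    SELECT n + 1, date(cal_date, '+1 day')
    FROM dt_table, d
    WHERE cal_date < d.mx
),
g AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id, action ORDER BY date) AS r
    FROM FactDailyUsers
),
i AS (
    SELECT g.*, dt.n, g.r - dt.n AS island
    FROM g JOIN dt_table AS dt ON dt.cal_date = g.date
)
SELECT user_id, date, r, n, island FROM i
WHERE user_id = 3321
ORDER BY date
"""

for row in conn.execute(ISLANDS_SQL):
    print(row)
```

For user 3321, the rows dated 2018-06-20 and 2018-06-21 share island -31, while 2018-06-17 gets -29. The gap between the 17th and the 20th therefore splits them into two islands.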
Step 2: Calculate day_in_row
```sql
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY
               i.USER_ID,
               i.ACTION,
               i.island
           ORDER BY i.date
       ) AS calculated_day_in_row
FROM i
ORDER BY USER_ID, ACTION, date;
```
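Putting Step 1 and Step 2 together, this sketch runs the whole pipeline on all eleven sample rows through Python's sqlite3 module. It prints the expected day_in_row next to calculated_day_in_row. The SQL is an SQLite translation with several assumptions:

- Dates are stored as ISO-8601 text.
- A recursive CTE stands in for dbo.Numbers.
- Every partition uses LOWER(action), assuming the real table uses a case-insensitive collation so that "Pageview" and "pageview" count as one action.

Without LOWER, the 21/06/2018 pageview and click rows for user 3321 would each start a new streak and get 1 instead of 2.

```python
import sqlite3

# Full sample table (dates converted to ISO-8601 text for SQLite).
SAMPLE = [
    (1123, "2018-06-21", "Pageview", 1),
    (3321, "2018-06-20", "Pageview", 1),
    (3321, "2018-06-20", "Click", 1),
    (2122, "2018-05-19", "Pageview", 1),
    (2212, "2018-06-21", "Pageview", 2),
    (2212, "2018-06-20", "Pageview", 1),
    (2212, "2018-06-24", "Pageview", 1),
    (2212, "2018-06-24", "click", 1),
    (3321, "2018-06-21", "pageview", 2),
    (3321, "2018-06-21", "click", 2),
    (3321, "2018-06-17", "pageview", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactDailyUsers "
    "(user_id INTEGER, date TEXT, action TEXT, day_in_row INTEGER)"
)
conn.executemany("INSERT INTO FactDailyUsers VALUES (?, ?, ?, ?)", SAMPLE)

# Assumption: LOWER(action) emulates a case-insensitive collation.
QUERY = """
WITH RECURSIVE
d AS (SELECT MIN(date) AS mn, MAX(date) AS mx FROM FactDailyUsers),
dt_table(n, cal_date) AS (
    SELECT 1, mn FROM d
    UNION ALL
    SELECT n + 1, date(cal_date, '+1 day') FROM dt_table, d WHERE cal_date < d.mx
),
g AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id, LOWER(action) ORDER BY date) AS r
    FROM FactDailyUsers
),
i AS (
    SELECT g.*, g.r - dt.n AS island
    FROM g JOIN dt_table AS dt ON dt.cal_date = g.date
)
-- Step 2: number the rows inside each (user, action, island) partition
SELECT user_id, date, LOWER(action) AS action, day_in_row,
       ROW_NUMBER() OVER (
           PARTITION BY user_id, LOWER(action), island ORDER BY date
       ) AS calculated_day_in_row
FROM i
ORDER BY user_id, action, date
"""

for user_id, day, action, expected, calculated in conn.execute(QUERY):
    print(user_id, day, action, expected, calculated)
```

Every printed row shows the same expected and calculated value.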
Explanation of the Solution
The solution involves four CTEs:
- d: This CTE calculates the minimum and maximum dates in the FactDailyUsers table.
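- dt_table: This CTE builds a small calendar with one row for every date from mn to mx. It numbers the days n = 1, 2, 3, ... by joining d to the auxiliary dbo.Numbers table.
- g: This CTE numbers each user's rows for a given action in date order (column r), using ROW_NUMBER() partitioned by USER_ID and ACTION. This matches the criterion that a streak means the same user performing the same action on successive days.
- i: This CTE joins each row of g to its calendar position n and computes island = r - n. Along a run of consecutive dates, r and n each increase by exactly 1 per row, so the difference stays constant. After a gap, n jumps by more than r, the difference changes, and a new island starts.
The final query selects all columns from the i CTE, plus a new column called calculated_day_in_row. ROW_NUMBER() restarts at 1 for every combination of USER_ID, ACTION, and island. Each row therefore receives its position within its streak, which is exactly the day_in_row value.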
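The calendar table exists only to give every date a consecutive integer n. Any per-day integer works just as well for building the island key. As a variant (not the article's query), this sketch uses SQLite's julianday() as the day number, so the island key becomes day number minus r and no calendar or dbo.Numbers table is needed. The schema and rows are illustrative assumptions. In T-SQL, the same idea is commonly written as DATEADD(DAY, -r, date), which yields a date that stays constant within each island.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactDailyUsers (user_id INTEGER, date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO FactDailyUsers VALUES (?, ?, ?)",
    [
        (2212, "2018-06-20", "pageview"),
        (2212, "2018-06-21", "pageview"),
        (2212, "2018-06-24", "pageview"),
        (3321, "2018-06-17", "pageview"),
        (3321, "2018-06-20", "pageview"),
        (3321, "2018-06-21", "pageview"),
    ],
)

# Variant, not the article's query: the integer day number from julianday()
# replaces the calendar index n, so consecutive dates still share one island.
QUERY = """
WITH g AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id, action ORDER BY date) AS r
    FROM FactDailyUsers
),
i AS (
    SELECT *, CAST(julianday(date) AS INTEGER) - r AS island FROM g
)
SELECT user_id, date,
       ROW_NUMBER() OVER (PARTITION BY user_id, action, island ORDER BY date)
           AS calculated_day_in_row
FROM i
ORDER BY user_id, date
"""

for row in conn.execute(QUERY):
    print(row)
```

For these rows the result matches the sample's day_in_row values. The trade-off is that the island key relies on date arithmetic rather than an explicit calendar.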
Advice
- Always break down complex problems into smaller sub-problems.
- Use CTEs to simplify your queries and make them more readable.
- Join tables carefully, making sure that the columns you’re joining on exist in both tables.
- Use the ROW_NUMBER() function to assign unique numbers to rows within each partition of the data.
By following these steps and using a combination of CTEs, you can solve complex problems like this one and improve your SQL skills.
Last modified on 2024-02-25