Merging Two Tables to Find Total Number of Books Sold for Each Day

SQL Query to Find Total Number of Books Sold for Each Day by Merging Two Tables

In this article, we will explore a common challenge faced by data analysts and developers: merging two tables based on one or more common columns. In this case, our goal is to find the total number of books sold for each product on each day.

Understanding the Data

We are given two tables: transactions and catalog. The transactions table contains information about each transaction, including the date of the transaction (txn_day) and the quantity of products sold (quantity). The catalog table contains information about each product, including its product_id and title_name. Both tables also carry a product_id and a market_id column, which is what makes it possible to join them.

A first attempt at this query joins the two tables on product_id alone. However, this approach leads to incorrect results.

Analysis of Incorrect Results

To understand why the initial query produces incorrect results, let’s analyze the data. We notice that each product_id appears in multiple rows, one for each market_id. A join on product_id alone therefore pairs every transaction row with every catalog row that shares its product_id, regardless of market. With three markets per product, each quantity is counted three times, so the sum of quantities for a specific product_id is tripled.
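The fan-out described above can be reproduced with a small sketch using Python's built-in sqlite3 module. The original query is not shown in this article, so the naive query below is an assumed reconstruction, and the sample rows (markets M1 to M3, the individual quantities) are hypothetical values chosen only to illustrate the effect.

```python
import sqlite3

# Hypothetical sample data: every product is listed once per market in catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (product_id TEXT, market_id TEXT, title_name TEXT)")
conn.execute(
    "CREATE TABLE transactions "
    "(txn_day TEXT, product_id TEXT, market_id TEXT, quantity INTEGER)"
)

titles = {"B0001": "Harry Potter 1", "B0002": "Harry Potter 2", "B0003": "Harry Potter 3"}
markets = ["M1", "M2", "M3"]  # assumed market identifiers
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [(pid, mid, title) for pid, title in titles.items() for mid in markets],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("3/1/2019", "B0001", "M1", 2),
        ("3/1/2019", "B0001", "M2", 2),
        ("3/1/2019", "B0002", "M1", 3),
        ("3/1/2019", "B0003", "M3", 1),
    ],
)

# Assumed form of the initial query: the join uses product_id only,
# so each transaction row matches all three catalog rows for its product.
naive_sql = """
SELECT t.txn_day, t.product_id, c.title_name, SUM(t.quantity) AS quantity
FROM catalog c
INNER JOIN transactions t ON c.product_id = t.product_id
GROUP BY t.txn_day, t.product_id, c.title_name
ORDER BY c.title_name
"""
naive_rows = conn.execute(naive_sql).fetchall()
for row in naive_rows:
    print(row)
```

With this sample data, every quantity comes out three times larger than what was actually sold, because each transaction row is matched once per market listed in catalog.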

Correct Approach: Merging Two Tables on Multiple Common Columns

To correctly find the total number of books sold for each product on each day, we need to merge the two tables based on multiple common columns: product_id and market_id. With both columns in the join condition, each transaction row matches only the catalog row for its own product and market, so no quantity is counted more than once.
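One way to decide which columns belong in the join condition is to check how many catalog rows share each candidate key. The sketch below is a minimal illustration with hypothetical data (the market identifiers M1 to M3 are assumptions): product_id alone repeats, while the pair (product_id, market_id) is unique.

```python
import sqlite3

# Hypothetical catalog: each product appears once per market.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (product_id TEXT, market_id TEXT, title_name TEXT)")
titles = {"B0001": "Harry Potter 1", "B0002": "Harry Potter 2", "B0003": "Harry Potter 3"}
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [(pid, mid, title) for pid, title in titles.items() for mid in ["M1", "M2", "M3"]],
)

# How many catalog rows does each product_id have?
rows_per_product = conn.execute(
    "SELECT product_id, COUNT(*) FROM catalog GROUP BY product_id ORDER BY product_id"
).fetchall()

# How many (product_id, market_id) pairs occur more than once?
duplicate_pairs = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT product_id, market_id
        FROM catalog
        GROUP BY product_id, market_id
        HAVING COUNT(*) > 1
    )
    """
).fetchone()[0]

print(rows_per_product)  # more than one row per product_id means a fan-out risk
print(duplicate_pairs)   # zero means the pair is safe to join on
```

If the second count were not zero, joining on both columns would still duplicate rows, and the catalog data itself would need a closer look.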

The Correct SQL Query

The corrected SQL query is as follows:

SELECT   t.txn_day
        ,t.product_id
        ,c.title_name
        ,SUM(t.quantity) AS quantity
FROM     catalog c
         INNER JOIN transactions t ON c.product_id = t.product_id AND c.market_id = t.market_id
GROUP BY t.txn_day, t.product_id, c.title_name
ORDER BY c.title_name;

Explanation of the Query

Here’s a breakdown of the corrected query:

  1. SELECT statement: We select txn_day and product_id from transactions, title_name from catalog, and the sum of quantity, aliased as quantity.
  2. FROM clause: We specify the two tables to join, catalog (alias c) and transactions (alias t).
  3. INNER JOIN clause: We perform an inner join on the common columns product_id and market_id. This ensures that only matching rows from both tables are included in the result set.
  4. GROUP BY clause: We group the results by the non-aggregated columns in the SELECT list (txn_day, product_id, and title_name), so that SUM(t.quantity) is calculated once for each day and product.
  5. ORDER BY clause: Finally, we order the results by the title_name column.
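The clauses above can be exercised end to end with Python's built-in sqlite3 module. This is a minimal sketch, not the author's setup: the table contents and the market identifiers M1 to M3 are hypothetical, chosen so that the per-product totals are 4, 3, and 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (product_id TEXT, market_id TEXT, title_name TEXT)")
conn.execute(
    "CREATE TABLE transactions "
    "(txn_day TEXT, product_id TEXT, market_id TEXT, quantity INTEGER)"
)

# Hypothetical sample data: one catalog row per product and market.
titles = {"B0001": "Harry Potter 1", "B0002": "Harry Potter 2", "B0003": "Harry Potter 3"}
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?)",
    [(pid, mid, title) for pid, title in titles.items() for mid in ["M1", "M2", "M3"]],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("3/1/2019", "B0001", "M1", 2),
        ("3/1/2019", "B0001", "M2", 2),
        ("3/1/2019", "B0002", "M1", 3),
        ("3/1/2019", "B0003", "M3", 1),
    ],
)

# The corrected query from the article: join on product_id AND market_id.
corrected_sql = """
SELECT   t.txn_day, t.product_id, c.title_name, SUM(t.quantity) AS quantity
FROM     catalog c
         INNER JOIN transactions t
           ON c.product_id = t.product_id AND c.market_id = t.market_id
GROUP BY t.txn_day, t.product_id, c.title_name
ORDER BY c.title_name
"""
corrected_rows = conn.execute(corrected_sql).fetchall()
for row in corrected_rows:
    print(row)
```

Because each transaction row now matches exactly one catalog row, the two B0001 sales of 2 add up to 4 rather than being multiplied by the number of markets.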

Example Output

Running this corrected query on the sample data will produce the following output:

txn_day     product_id  title_name      quantity
3/1/2019    B0001       Harry Potter 1  4
3/1/2019    B0002       Harry Potter 2  3
3/1/2019    B0003       Harry Potter 3  1

As expected, the results accurately reflect the total number of books sold for each day for each product.

Conclusion

In this article, we explored a common challenge in data analysis and developed a correct SQL query to merge two tables based on multiple common columns. A join condition that omits one of the shared columns can silently multiply rows and inflate aggregates. By specifying every common column in the join clause and grouping by the relevant fields, we can ensure accurate results when working with merged datasets.


Last modified on 2023-09-08