Optimizing a Complex SQL Query to Fetch Friends' Email Addresses by Input Email

SQL Query to Get the List of Users by Email

In this article, we will explore a complex SQL query that fetches the list of friends’ email addresses based on a provided input email. We will start with understanding the sample data and then move on to explaining the given solution, its shortcomings, and how to improve it.

Understanding the Sample Data

We have two tables: users and user_relations. The users table contains user information such as user_id and email. The user_relations table stores the relationship between users, including the requestor_user_id, receiver_user_id, and friend_status.

| users |

user_idemail
1[email protected]
2[email protected]
3[email protected]
4[email protected]
5[email protected]
6[email protected]
7[email protected]

| user_relations |

user_relation_idrequestor_user_idreceiver_user_idfriend_status
1241
2261
3271
452NULL
557NULL
672NULL
7741
8751
9761
10421
11431
12451
13461
14471

We are given an example input email: [email protected]. We need to find the email addresses of users who have a friend status of 1 with the user having this email.

The Given Solution

The provided solution uses two JOINs, which seems inefficient. Let’s break down the query and understand its shortcomings:

SELECT 
case when u.user_id = r.receiver_user_id then requestor.email else receiver.email end as friend_email
FROM users u
JOIN user_relations r 
  ON (r.requestor_user_id = u.user_id OR r.receiver_user_id = u.user_id)
AND r.friend_status = 1 
LEFT JOIN users requestor ON requestor.user_id = r.requestor_user_id
LEFT JOIN users receiver ON receiver.user_id = r.receiver_user_id
WHERE u.email in ('<a>[email protected]</a>') 
GROUP BY friend_email
HAVING COUNT(DISTINCT u.user_id) > 1

The query has several issues:

  • It uses two LEFT JOINs, which can lead to incorrect results if the table is not properly indexed.
  • The WHERE clause filters on individual user emails instead of using a single email as an input parameter. This makes it difficult to reuse the query with different inputs.

Improving the Solution

As suggested in the provided solution, we don’t care whether friends are receivers or requestors. Therefore, we can simplify the query by removing the two JOINs and focusing on finding users who have a friend status of 1:

SELECT DISTINCT
 u2.email as friend_email
FROM users u
INNER JOIN user_relations r 
  ON (r.requestor_user_id = u.user_id OR r.receiver_user_id = u.user_id) 
INNER JOIN users u2
 ON (r.requestor_user_id = u2.user_id OR r.receiver_user_id = u2.user_id)
     AND u2.user_id <> u.user_id
WHERE u.email = '<a>[email protected]</a>'
      AND r.friend_status = 1 

However, we can further improve this query by removing the inner JOIN with u and only joining with r. This reduces the number of rows being processed:

SELECT DISTINCT
 u2.email as friend_email
FROM user_relations r
INNER JOIN users u2
 ON (r.requestor_user_id = u2.user_id OR r.receiver_user_id = u2.user_id)
     AND u2.user_id <> r.requestor_user_id
WHERE r.friend_status = 1 
      AND u2.user_id IN (
        SELECT user_id
        FROM users
        WHERE email = '<a>[email protected]</a>'
      )

This query is more efficient and easier to understand.

Conclusion

In this article, we explored a complex SQL query that fetches the list of friends’ email addresses based on a provided input email. We analyzed the given solution, its shortcomings, and how to improve it. By removing unnecessary JOINs, using IN instead of filtering individual user emails, and reducing the number of rows being processed, we can create more efficient and easier-to-understand SQL queries.


Last modified on 2023-08-22