Saturday, January 25, 2025

SQL Intermediate Concepts (Joins, Aggregations, Grouping, Sub queries, Window functions)


ebook - Mastering SQL: A Comprehensive Guide to Database Mastery

SQL Intermediate Concepts

Joins

Joins are a fundamental concept in SQL that allows you to combine data from two or more tables based on a related column. This is incredibly useful when dealing with normalized databases where data is spread across multiple tables. In this chapter, we will explore the different types of joins and their use cases, along with practical examples.


What is a Join?

A join is a SQL operation that retrieves data from multiple tables by combining rows based on a related column between them. The related column usually contains the same data type and holds the relationship between the tables.

For example, consider two tables:

employees Table:

employee_id

name

department_id

1

John Doe

1

2

Jane Smith

2

3

Bob Brown

NULL

4

Alice White

3



departments Table:

department_id

department_name

1

Sales

2

HR

3

IT

4

Marketing

To retrieve the name of each employee along with their department name, you would use a join to combine these tables.


Types of Joins

SQL supports several types of joins, each with a specific purpose:

1. INNER JOIN

The INNER JOIN retrieves records that have matching values in both tables. Rows without matches in either table are excluded.

Syntax:

SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;

Example: Retrieve the names of employees along with their department names:

SELECT employees.name, departments.department_name
FROM employees
INNER JOIN departments
ON employees.department_id = departments.id;

Result:

name

department_name

John Doe

Sales

Jane Smith

HR

Alice White

IT


2. LEFT JOIN (or LEFT OUTER JOIN)

The LEFT JOIN retrieves all records from the left table, and the matching records from the right table. If no match is found, NULL values are returned for columns from the right table.

Syntax:

SELECT columns
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;

Example: Retrieve all employees and their department names, including those who are not assigned to a department:

SELECT employees.name, departments.department_name
FROM employees
LEFT JOIN departments
ON employees.department_id = departments.id;

Result:

name

department_name

John Doe

Sales

Jane Smith

HR

Bob Brown

NULL

Alice White

IT


3. RIGHT JOIN (or RIGHT OUTER JOIN)

The RIGHT JOIN retrieves all records from the right table, and the matching records from the left table. If no match is found, NULL values are returned for columns from the left table.

Syntax:

SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;

Example: Retrieve all departments and the employees assigned to them:

SELECT employees.name, departments.department_name
FROM employees
RIGHT JOIN departments
ON employees.department_id = departments.id;

Result:

name

department_name

John Doe

Sales

Jane Smith

HR

Alice White

IT

NULL

Marketing


4. FULL JOIN (or FULL OUTER JOIN)

The FULL JOIN retrieves all records when there is a match in either the left or right table. If there is no match, NULL values are returned for the unmatched columns.

Syntax:

SELECT columns
FROM table1
FULL JOIN table2
ON table1.column = table2.column;

Example: Retrieve all employees and all departments, including those without matches:

SELECT employees.name, departments.department_name
FROM employees
FULL JOIN departments
ON employees.department_id = departments.id;

Result:

name

department_name

John Doe

Sales

Jane Smith

HR

Bob Brown

NULL

Alice White

IT

NULL

Marketing


5. CROSS JOIN

The CROSS JOIN returns the Cartesian product of two tables, meaning every row in the first table is combined with every row in the second table.

Syntax:

SELECT columns
FROM table1
CROSS JOIN table2;

Example: Retrieve all possible pairs of employees and departments:

SELECT employees.name, departments.department_name
FROM employees
CROSS JOIN departments;

Result:


name

department_name

John Doe

Sales

John Doe

HR

John Doe

IT

John Doe

Marketing

Jane Smith

Sales

Jane Smith

HR

Jane Smith

IT

Jane Smith

Marketing

Bob Brown

Sales

Bob Brown

HR

Bob Brown

IT

Bob Brown

Marketing

Alice White

Sales

Alice White

HR

Alice White

IT

Alice White

Marketing



In this chapter, you learned about the various types of SQL joins, including INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, and CROSS JOIN. Each join type has its specific use cases and helps in combining data from multiple tables efficiently. In the next chapter, we will explore SQL Aggregations and Grouping to analyze and summarize data.

SQL Aggregations and Grouping

Aggregation functions and grouping are essential tools in SQL for summarizing and analyzing data. They allow you to calculate totals, averages, counts, and more, across rows of data. In this chapter, we will explore SQL aggregation functions and how to use the GROUP BY clause effectively.


Aggregation Functions

SQL provides a variety of aggregation functions to perform calculations on a set of values and return a single value. Here are some commonly used functions:

1. COUNT

The COUNT function returns the number of rows that match a specified condition.

Syntax:

SELECT COUNT(column)
FROM table;

Example: Count the number of employees in the employees table:

SELECT COUNT(*) AS total_employees
FROM employees;

Result:

total_employees

4


2. SUM

The SUM function calculates the total sum of a numeric column.

Syntax:

SELECT SUM(column)
FROM table;

Example: Calculate the total salaries of employees:

SELECT SUM(salary) AS total_salaries
FROM employees;

Result:

total_salaries

250000


3. AVG

The AVG function calculates the average value of a numeric column.

Syntax:

SELECT AVG(column)
FROM table;

Example: Calculate the average salary of employees:

SELECT AVG(salary) AS average_salary
FROM employees;

Result:

average_salary

62500


4. MIN

The MIN function retrieves the smallest value in a column.

Syntax:

SELECT MIN(column)
FROM table;

Example: Find the lowest salary in the employees table:

SELECT MIN(salary) AS lowest_salary
FROM employees;

Result:

lowest_salary

40000


5. MAX

The MAX function retrieves the largest value in a column.

Syntax:

SELECT MAX(column)
FROM table;

Example: Find the highest salary in the employees table:

SELECT MAX(salary) AS highest_salary
FROM employees;

Result:

highest_salary

90000


The GROUP BY Clause

The GROUP BY clause groups rows that have the same values in specified columns into summary rows, like "total salary by department." It is often used with aggregation functions.

Syntax:

SELECT column, aggregation_function(column)
FROM table
GROUP BY column;

Example:

Calculate the total salary for each department:

Tables:

employees Table:

employee_id

name

department_id

salary

1

John Doe

1

50000

2

Jane Smith

2

60000

3

Bob Brown

NULL

40000

4

Alice White

1

90000

departments Table:

department_id

department_name

1

Sales

2

HR

3

IT

Query:

SELECT departments.department_name, SUM(employees.salary) AS total_salary
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id
GROUP BY departments.department_name;

Result:

department_name

total_salary

Sales

140000

HR

60000


HAVING Clause

The HAVING clause is used to filter groups based on aggregate functions. It works similarly to the WHERE clause but is specifically for grouped data.

Syntax:

SELECT column, aggregation_function(column)
FROM table
GROUP BY column
HAVING condition;

Example:

Find departments with a total salary greater than 100,000:

Query:

SELECT departments.department_name, SUM(employees.salary) AS total_salary
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id
GROUP BY departments.department_name
HAVING SUM(employees.salary) > 100000;

Result:

department_name

total_salary

Sales

140000

 



Subqueries and Nested Queries in SQL

Subqueries, also known as nested queries, are queries embedded within other queries. They allow for more complex data retrieval in a simple and organized manner. The primary reason for using subqueries is to break down large, complex SQL operations into manageable, logically structured parts. Subqueries can simplify the query structure and make it more readable by avoiding redundant joins.

Why use subqueries?

  • Modular Approach: They allow breaking down complex logic into smaller, reusable parts.

  • Simplification: Subqueries can simplify the query syntax by embedding conditions directly in the query.

  • Performance: In certain cases, subqueries can optimize query performance, though there are caveats.

Types of Subqueries

Subqueries can be categorized based on their placement within the query: in the SELECT, WHERE, or FROM clauses. Let's explore each type with examples.

Subqueries in SELECT Clause

Subqueries in the SELECT clause are often used when you want to retrieve a calculated value that depends on another table. This allows you to create columns dynamically based on other query results.

Example:

SELECT employee_id,
      (SELECT department_name
        FROM departments
        WHERE department_id = employees.department_id) AS department_name
FROM employees;


In this example, the subquery fetches the department name for each employee. The result is included as a new column (department_name) in the outer query.

Subqueries in WHERE Clause

The most common use of subqueries is in the WHERE clause. This allows you to filter the results based on conditions derived from another query.

Example:

SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id IN
    (SELECT department_id
    FROM departments
    WHERE location_id = 1400);


Here, the subquery returns a list of department IDs where the location_id is 1400, and the outer query uses that list to filter employees working in those departments.

Subqueries in FROM Clause

A subquery in the FROM clause is treated as a derived table (a virtual table) and can be used to retrieve data just like any other table in the database.

Example:

SELECT dept_name, avg_salary
FROM (SELECT department_name AS dept_name, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_name) AS subquery
GROUP BY dept_name;

In this example, the subquery calculates the average salary for each department, and the outer query groups by the department name to return the results.

Correlated vs Non-Correlated Subqueries

Subqueries can either be correlated or non-correlated, and the key difference lies in whether the subquery references columns from the outer query.

Non-Correlated Subqueries

A non-correlated subquery is independent of the outer query. It can be executed on its own because it doesn’t rely on the outer query’s data.

Example:

SELECT first_name, last_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The subquery (SELECT AVG(salary) FROM employees) does not reference the outer query. It returns the average salary of all employees, and the outer query returns employees with a salary higher than that average.

Correlated Subqueries

A correlated subquery references one or more columns from the outer query, meaning it cannot be executed independently. For each row processed by the outer query, the subquery is executed.

Example:

SELECT e.first_name, e.last_name
FROM employees e
WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);

In this example, the subquery (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) depends on the department_id from the outer query. This means the subquery will run once for each row in the outer query.

Nested Queries

Nested queries involve placing one subquery inside another. These are also known as nested subqueries. This allows you to use the results of a subquery as the input for another subquery.

Example:

SELECT department_name
FROM departments
WHERE department_id IN
    (SELECT department_id
    FROM employees
    WHERE salary >
        (SELECT AVG(salary) FROM employees));


Here, the innermost subquery calculates the average salary, the middle subquery filters employees earning above the average salary, and the outer query retrieves departments where such employees work.

Performance Considerations

While subqueries can make queries more intuitive, they can also introduce performance challenges, especially when nested or used in large datasets.

  • Costly Execution: Correlated subqueries can be particularly expensive because the subquery is executed multiple times—once for each row in the outer query.

  • Optimization with JOINs: In many cases, subqueries can be replaced with JOIN operations, which may perform better because they allow the database engine to process data more efficiently.

Example (Using JOIN instead of Subquery):

SELECT e.first_name, e.last_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > (SELECT AVG(salary) FROM employees);

In this example, instead of using a subquery in the WHERE clause, we use a JOIN to combine the tables, and this can sometimes lead to better performance.

Best Practices for Using Subqueries

While subqueries offer flexibility, it’s important to follow best practices to ensure your queries are efficient and maintainable.

Avoid Redundant Subqueries: If the same subquery is used multiple times in a query, consider refactoring the query or using temporary tables to store intermediate results.

Consider EXISTS Instead of IN: When dealing with correlated subqueries, using EXISTS may be more efficient than IN, as EXISTS stops searching as soon as it finds a match.

Example using EXISTS:

SELECT first_name, last_name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE e.department_id = d.department_id AND d.location_id = 1400);

Use JOIN When Possible: Whenever appropriate, consider replacing subqueries with JOIN operations, as they tend to be more efficient for large datasets.

In this chapter, we explored subqueries and nested queries, which allow for more complex SQL operations. Subqueries can be used in the SELECT, WHERE, and FROM clauses, and they come in two types: correlated and non-correlated. We also discussed how to use nested queries for multiple levels of data retrieval and the importance of considering performance when working with subqueries.

Key Takeaways:

  • Subqueries help simplify complex SQL queries.

  • Correlated subqueries depend on the outer query, while non-correlated subqueries do not.

  • Performance can be impacted, so consider using JOINs or EXISTS when possible.


Set Operations in SQL

Set operations in SQL allow you to combine the results of two or more SELECT statements into a single result set. These operations are based on standard set theory concepts and are widely used to compare and manipulate data from multiple queries.

Key Set Operations in SQL:

  • UNION

  • UNION ALL

  • INTERSECT

  • EXCEPT (or MINUS in some databases)

Understanding Set Operation Rules

Before diving into the specifics of each operation, it is essential to understand the general rules that govern set operations:

  1. Column Count and Data Types:

    • The number of columns in all SELECT statements involved must be the same.

    • Corresponding columns must have compatible data types.

  2. Order of Rows:

    • The result of a set operation does not guarantee a specific row order unless an ORDER BY clause is explicitly used in the final result.

  3. Duplicates:

    • Some set operations remove duplicates by default (e.g., UNION), while others retain all rows (e.g., UNION ALL).


Set Operations in Detail

UNION

The UNION operation combines the result sets of two or more SELECT statements, removing duplicate rows by default.

Syntax:

SELECT column1, column2
FROM table1
UNION
SELECT column1, column2
FROM table2;

Example:

SELECT first_name
FROM employees
WHERE department_id = 10
UNION
SELECT first_name
FROM employees
WHERE job_id = 'IT_PROG';

This query retrieves the unique first names of employees who either belong to department 10 or have the job ID 'IT_PROG'.

Key Points:

  • Removes duplicates from the combined result.

  • All SELECT statements must return the same number of columns with compatible data types.

UNION ALL

The UNION ALL operation is similar to UNION, but it includes all rows from the result sets, including duplicates.

Syntax:

SELECT column1, column2
FROM table1
UNION ALL
SELECT column1, column2
FROM table2;

Example:

SELECT first_name
FROM employees
WHERE department_id = 10
UNION ALL
SELECT first_name
FROM employees
WHERE job_id = 'IT_PROG';

This query retrieves all first names from both conditions, including duplicates.

Key Points:

  • Retains duplicate rows.

  • Often faster than UNION since no duplicate elimination is performed.

INTERSECT

The INTERSECT operation returns only the rows that are common to the result sets of the SELECT statements.

Syntax:

SELECT column1, column2
FROM table1
INTERSECT
SELECT column1, column2
FROM table2;

Example:

SELECT first_name
FROM employees
WHERE department_id = 10
INTERSECT
SELECT first_name
FROM employees
WHERE job_id = 'IT_PROG';

This query retrieves the first names of employees who belong to department 10 and also have the job ID 'IT_PROG'.

Key Points:

  • Returns only the rows present in all result sets.

  • Not supported by all databases (e.g., MySQL does not support INTERSECT directly).

EXCEPT (or MINUS)

The EXCEPT operation returns the rows from the first SELECT statement that are not present in the second SELECT statement. In some databases like Oracle, this operation is called MINUS.

Syntax:

SELECT column1, column2
FROM table1
EXCEPT
SELECT column1, column2
FROM table2;

Example:

SELECT first_name
FROM employees
WHERE department_id = 10
EXCEPT
SELECT first_name
FROM employees
WHERE job_id = 'IT_PROG';

This query retrieves the first names of employees who belong to department 10 but do not have the job ID 'IT_PROG'.

Key Points:

  • Returns rows from the first query that are not in the second query.

  • Removes duplicates by default.

  • Not supported by all databases (e.g., MySQL does not support EXCEPT but can achieve similar results with NOT IN).

Window Functions in SQL

Window functions in SQL perform calculations across a set of rows related to the current row, without collapsing rows into a single result. These functions are useful for ranking, cumulative totals, and running averages.

Key Elements:

  • PARTITION BY: Divides rows into groups.

  • ORDER BY: Defines the order within each group.

  • OVER(): Specifies the window for the function.

Common Window Functions

Ranking Functions:

  • ROW_NUMBER(): Assigns a unique number to rows.

SELECT employee_id, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;

  • RANK(): Assigns ranks, skipping numbers for ties.

  • DENSE_RANK(): Similar to RANK(), but no gaps in ranks.

Aggregate Functions:

  • SUM(): Calculates cumulative totals.

SELECT employee_id, SUM(salary) OVER (PARTITION BY department_id ORDER BY employee_id) AS cumulative_salary
FROM employees;

  • AVG(): Computes running averages.

  • COUNT(): Counts rows in the frame.

First/Last Value Functions:

  • FIRST_VALUE(): Returns the first value in the frame.

  • LAST_VALUE(): Returns the last value in the frame.

LEAD/LAG Functions:

  • LEAD(): Accesses the next row.

  • LAG(): Accesses the previous row.

SELECT employee_id, LAG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS previous_salary
FROM employees;

Practical Example

Running Total of Sales:

SELECT sales_id, region, SUM(sales_amount) OVER (PARTITION BY region ORDER BY sales_id) AS running_total
FROM sales;

Window functions enable complex analytics while preserving row-level data. They are essential for advanced reporting and are widely supported across databases.

ebook - Mastering SQL: A Comprehensive Guide to Database Mastery


No comments:

Search This Blog