Share it!

Introduction

SQL, or Structured Query Language, is essential for data manipulation across various platforms, and SAS provides powerful SQL capabilities through PROC SQL. This procedure enables SAS professionals to manage and transform data efficiently using SQL within the SAS environment. In this guide, we’ll explore how to use PROC SQL for data manipulation in SAS, covering its syntax, common applications, and advanced techniques for effective data management.

Why Use PROC SQL for Data Manipulation in SAS?

PROC SQL is favored by many SAS professionals for several reasons:

  1. Flexibility: It allows you to perform data retrieval, transformation, and summarization in one step.
  2. Compatibility: SQL skills are highly transferable, allowing users to leverage their knowledge across platforms.
  3. Efficiency: PROC SQL handles complex data manipulation tasks with minimal coding.

Using PROC SQL in SAS can streamline tasks, improve code readability, and allow for complex data transformations within a single procedure.

Basic Syntax of PROC SQL

PROC SQL in SAS follows similar SQL syntax used in other database systems. The basic syntax for querying and manipulating data with PROC SQL is as follows:

SAS
PROC SQL;
    SELECT column1, column2
    FROM dataset_name
    WHERE conditions;
QUIT;

Key PROC SQL Statements for Data Manipulation

1. SELECT Statement

The SELECT statement is essential in SQL and enables you to specify the columns you want to retrieve. The statement includes options for filtering, sorting, and grouping data.

Example:

SAS
PROC SQL;
    SELECT name, age, salary
    FROM employees
    WHERE department = 'Sales';
QUIT;

This query retrieves the name, age, and salary columns from the employees table where department equals ‘Sales’.

2. WHERE Clause

The WHERE clause filters rows based on specified conditions. It is one of the most commonly used clauses in PROC SQL.

Example:

SAS
PROC SQL;
    SELECT *
    FROM employees
    WHERE age > 30;
QUIT;

This query fetches all columns from employees for rows where age is greater than 30.

3. ORDER BY Clause

The ORDER BY clause sorts the results by one or more columns, either in ascending or descending order.

Example:

SAS
PROC SQL;
    SELECT name, age
    FROM employees
    ORDER BY age DESC;
QUIT;

This query retrieves the name and age of employees and orders them by age in descending order.

4. GROUP BY and HAVING Clauses

The GROUP BY clause aggregates data by specified columns, often used with functions like COUNT, SUM, AVG, etc. The HAVING clause further filters these grouped results.

Example:

SAS
PROC SQL;
    SELECT department, AVG(salary) AS average_salary
    FROM employees
    GROUP BY department
    HAVING average_salary > 50000;
QUIT;

This query calculates the average salary for each department and only includes departments where the average salary exceeds 50,000.

5. JOINs in PROC SQL

PROC SQL supports INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN, enabling data combination from multiple datasets based on common columns.

Example:

SAS
PROC SQL;
    SELECT a.name, a.department, b.project_name
    FROM employees a
    LEFT JOIN projects b
    ON a.emp_id = b.emp_id;
QUIT;

This query performs a LEFT JOIN between the employees table and projects table on emp_id.

Common Data Manipulation Tasks Using PROC SQL

1. Creating New Columns with Calculated Values

PROC SQL allows you to create new columns using arithmetic operations and functions directly in the SELECT clause.

Example:

SAS
PROC SQL;
    SELECT name, salary, salary * 0.1 AS bonus
    FROM employees;
QUIT;

This code calculates a 10% bonus on the salary for each employee and displays it in a new column called bonus.

2. Removing Duplicate Rows with DISTINCT

The DISTINCT keyword eliminates duplicate rows in the output.

Example:

SAS
PROC SQL;
    SELECT DISTINCT department
    FROM employees;
QUIT;

This query returns a list of unique departments from the employees table.

3. Creating Tables and Views

PROC SQL can create tables and views, enabling you to save results or create reusable datasets.

Example of Creating a Table:

SAS
PROC SQL;
    CREATE TABLE high_salary AS
    SELECT name, salary
    FROM employees
    WHERE salary > 60000;
QUIT;

This code creates a new table high_salary containing employees with salaries over 60,000.

Example of Creating a View:

SAS
PROC SQL;
    CREATE VIEW sales_employees AS
    SELECT *
    FROM employees
    WHERE department = 'Sales';
QUIT;

This creates a view sales_employees that dynamically retrieves sales department data without duplicating storage.

Advanced Data Manipulation with PROC SQL

1. Using Subqueries

PROC SQL supports subqueries, allowing you to include a query within another query. Subqueries are useful for performing nested operations or filtering based on aggregated values.

Example:

SAS
PROC SQL;
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees);
QUIT;

This query selects employees whose salaries exceed the average salary.

2. Using Case Statements for Conditional Logic

The CASE statement allows for conditional calculations and data transformation within PROC SQL.

Example:

SAS
PROC SQL;
    SELECT name, 
           CASE 
               WHEN age < 30 THEN 'Junior'
               WHEN age BETWEEN 30 AND 50 THEN 'Mid-Level'
               ELSE 'Senior'
           END AS experience_level
    FROM employees;
QUIT;

This code categorizes employees based on age into Junior, Mid-Level, and Senior levels.

3. Aggregating Data with Statistical Functions

PROC SQL includes various statistical functions, such as SUM, AVG, MIN, MAX, and COUNT, to summarize data.

Example:

SAS
PROC SQL;
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department;
QUIT;

This query counts the number of employees in each department.

FAQs

  1. What is PROC SQL in SAS used for?
    PROC SQL is used to perform data manipulation, querying, and summarization in SAS using SQL.
  2. How is PROC SQL different from DATA step?
    PROC SQL uses SQL syntax for data manipulation, offering a more flexible and often concise alternative to the DATA step.
  3. Can I perform joins with PROC SQL in SAS?
    Yes, PROC SQL supports various join types (INNER, LEFT, RIGHT, FULL) for combining data from multiple tables.
  4. What is the use of GROUP BY in PROC SQL?
    GROUP BY aggregates data based on specified columns, useful for generating summary statistics.
  5. How do I create a new table with PROC SQL?
    Use the CREATE TABLE statement to save results in a new dataset.
  6. What is the difference between a table and a view in PROC SQL?
    A table is a physical dataset, whereas a view is a virtual table that fetches data dynamically.
  7. How do I filter data in PROC SQL?
    Use the WHERE clause to specify conditions and filter rows.
  8. Can I calculate new columns in PROC SQL?
    Yes, you can calculate new columns using arithmetic operations directly in the SELECT statement.
  9. What is the purpose of the DISTINCT keyword?
    DISTINCT removes duplicate rows in the output, showing unique values.
  10. How does PROC SQL handle missing values?
    PROC SQL treats missing values as NULLs, which can be managed using specific conditional statements.

External Resources

For further learning, explore these resources:


PROC SQL in SAS is a powerful tool for data manipulation, simplifying complex tasks and enhancing efficiency. From simple queries to advanced data transformations, understanding and using PROC SQL effectively can significantly improve your data analysis capabilities in SAS.


Share it!