Share it!

Introduction

As a SAS professional, mastering the art of querying and extracting data from datasets is an essential skill for any data analyst or programmer. One of the most fundamental and powerful techniques for data querying in SAS is through the PROC SQL procedure, which allows users to write SQL queries within the SAS environment. This capability enables you to harness the power of SQL for managing and manipulating data efficiently.

In this article, we will explore how to perform basic SELECT queries with PROC SQL in SAS. From simple column selection to filtering, sorting, and joining datasets, these basic SQL techniques will provide you with a solid foundation for querying data effectively.


1. What is PROC SQL in SAS?

PROC SQL is a procedure in SAS that allows you to execute SQL queries directly within the SAS environment. This powerful tool allows you to interact with your SAS datasets and relational databases using standard SQL syntax. It enables SAS users to access and manipulate data without needing to rely solely on SAS-specific procedures.

SQL (Structured Query Language) is a universal language for managing and querying relational databases, and PROC SQL allows you to use these standard SQL commands to work within SAS. For example, you can perform operations like selecting data, filtering records, sorting, joining tables, aggregating data, and much more.


2. The Basics of a SELECT Query

The SELECT statement is the cornerstone of any SQL query. It allows you to retrieve data from one or more tables. In SAS, you can use PROC SQL to write a SELECT query that extracts specific columns from your datasets.

Syntax of the SELECT Statement:

SAS
proc sql;
    select column1, column2, ... 
    from dataset;
quit;
  • select: Specifies the columns you want to retrieve.
  • from: Specifies the table or dataset from which to retrieve the data.

Example of a Basic SELECT Query:

SAS
proc sql;
    select Name, Age, Salary 
    from employees;
quit;

In this example, the query retrieves the Name, Age, and Salary columns from the employees dataset.


3. Filtering Data Using WHERE Clause

To extract a subset of data that meets specific conditions, you can use the WHERE clause in your SELECT query. The WHERE clause allows you to filter rows based on given criteria.

Syntax of WHERE Clause:

SAS
proc sql;
    select column1, column2
    from dataset
    where condition;
quit;

Example of Using the WHERE Clause:

SAS
proc sql;
    select Name, Age, Salary
    from employees
    where Age > 30;
quit;

This query selects employees who are older than 30, returning only the Name, Age, and Salary of employees who meet this condition.

Common WHERE Conditions:

  • Equality: WHERE column = value;
  • Inequality: WHERE column <> value;
  • Greater than/Less than: WHERE column > value;
  • IN: WHERE column IN (value1, value2, ...);
  • LIKE: WHERE column LIKE 'pattern';
  • BETWEEN: WHERE column BETWEEN value1 AND value2;

4. Sorting Data Using ORDER BY Clause

To sort the result set returned by a SELECT query, use the ORDER BY clause. By default, the ORDER BY clause sorts the data in ascending order (from smallest to largest), but you can change this by specifying DESC for descending order.

Syntax of ORDER BY Clause:

SAS
proc sql;
    select column1, column2
    from dataset
    order by column1 [ASC | DESC];
quit;

Example of Sorting Data:

SAS
proc sql;
    select Name, Salary
    from employees
    order by Salary desc;
quit;

In this example, the query retrieves Name and Salary from the employees dataset and sorts the results in descending order by Salary.


5. Using Aliases in SELECT Queries

An alias is a temporary name given to a column or table in the result set. Aliases can make your query more readable and easier to work with, especially when working with multiple columns or complex expressions.

Syntax for Aliases:

  • Column Alias: column AS alias_name
  • Table Alias: dataset AS alias_name

Example of Using Aliases:

SAS
proc sql;
    select Name as Employee_Name, Salary as Annual_Salary
    from employees;
quit;

In this example, the Name column is aliased as Employee_Name and the Salary column is aliased as Annual_Salary in the output.


6. Combining Multiple Conditions with AND/OR

You can combine multiple conditions in the WHERE clause using logical operators like AND and OR. The AND operator returns results that meet all conditions, while the OR operator returns results that meet at least one condition.

Example of Using AND/OR:

SAS
proc sql;
    select Name, Age, Salary
    from employees
    where Age > 30 and Salary > 50000;
quit;

This query retrieves employees who are older than 30 and have a salary greater than 50,000.


7. Using DISTINCT to Remove Duplicates

If you want to retrieve only unique values from a column, you can use the DISTINCT keyword. This is particularly useful when you want to find distinct values without duplicates.

Syntax of DISTINCT:

SAS
proc sql;
    select distinct column
    from dataset;
quit;

Example of Using DISTINCT:

SAS
proc sql;
    select distinct Department
    from employees;
quit;

This query retrieves the unique departments from the employees dataset.


8. Limiting the Number of Results with LIMIT

If you want to restrict the number of rows returned by your query, you can use the LIMIT clause. This is especially helpful when working with large datasets, and you only need to preview a small subset of the data.

Syntax of LIMIT:

SAS
proc sql;
    select column1, column2
    from dataset
    limit number_of_rows;
quit;

Example of Using LIMIT:

SAS
proc sql;
    select Name, Salary
    from employees
    limit 10;
quit;

This query retrieves only the first 10 rows from the employees dataset.


9. Best Practices for Writing SQL Queries in SAS

To optimize your SELECT queries and ensure efficient execution, here are some best practices to keep in mind:

  1. Avoid Using SELECT *: Always specify the columns you need instead of selecting all columns. This reduces the amount of data retrieved and improves performance.
  2. Use Indexes: If your dataset is large, indexing frequently queried columns can significantly speed up data retrieval.
  3. Limit Results Early: Use the LIMIT or WHERE clause to reduce the size of the result set early in the query process.
  4. Check for NULL Values: Always account for NULL values in your queries. Use the IS NULL or IS NOT NULL operators to handle these cases.

10. Conclusion

Understanding how to perform basic SELECT queries with PROC SQL in SAS is a foundational skill for any SAS professional. By mastering the key elements of SQL syntax—such as filtering, sorting, joining, and using aliases—you will be able to query and manipulate data effectively within the SAS environment.

As you progress in your career, SQL will remain an essential tool in your SAS toolkit, enabling you to handle more complex data tasks and ensuring that you can work efficiently with large datasets. With practice, these basic queries will become second nature, helping you streamline your data workflows and improve your productivity.


External Resources for Further Learning:

  1. SAS PROC SQL Documentation
  2. SAS SQL Syntax Guide
  3. SQL in SAS: A Beginner’s Guide

FAQs

  1. What is PROC SQL in SAS?
  • PROC SQL is a procedure in SAS that allows users to write SQL queries to interact with datasets within the SAS environment.
  1. How do I select specific columns in SAS SQL?
  • Use the SELECT statement followed by the column names: SELECT column1, column2 FROM dataset;.
  1. Can I filter data in a SELECT query?
  • Yes, use the WHERE clause to filter data based on specific conditions: WHERE column = value;.
  1. How do I sort the results in SAS SQL?
  • Use the ORDER BY clause to sort the data: ORDER BY column_name ASC|DESC;.
  1. What is the purpose of the DISTINCT keyword in SAS SQL?
  • DISTINCT eliminates duplicate values in the result set.
  1. Can I use SQL functions in SAS queries?
  • Yes, SAS SQL supports a variety of functions like AVG(), SUM(), COUNT(), and more.
  1. How do I limit the number of rows in a query?
  • Use the LIMIT clause to restrict the number of rows returned.
  1. Can I join tables in SAS SQL?
  • Yes, you can join tables using the JOIN keyword in SAS SQL.
  1. What is the use of aliases in SAS SQL?
  • Aliases provide temporary names for columns or tables, making queries more readable.
  1. How do I retrieve unique values in SAS SQL?
  • Use the DISTINCT keyword to retrieve unique values from a column.

Share it!