Introduction to Testing in SAS with `%check_rows`

Andrew Li & Mike Atkinson

Unit testing

Unit tests compare the expected output with the actual output.

{SASUnit} and {FUTS} are popular frameworks for writing unit tests in SAS.

“A unit test provides a strict, written contract that the piece of code must satisfy” - Wikipedia

Benefits of unit testing

As a SAS programmer/analyst, you should consider implementing tests in your workflow.

Problems are found early in the development cycle
Increased confidence when making changes - insures against unintentionally changing code behaviour
- changes can be tests any time easily
Can document the functionality of the program
Prevents re-introducing already fixed bugs
etc.

When do we need tests?

There are two main applications of SAS:

One off Programs

Tests are often informal and not reproducible
Checking the log
Comparing results with what we need
Tracking tables

Standardized and Reusable Programs

Macros or Programs used in different reports or studies
It needs to work with different data and parameters
In this case, tests should be included!

Introduction to `%check_rows();`

Perform a check on table(s) row counts using the expression provided.

%check_rows(expr, severity = error);

Expression

The expression {expr} parameter captures what you expect the rows to actually be.

expected	actual	tests

%check_rows compares the expected value with actual value of row counts.

description	expected	actual	tests
number of observations in `mylib.clinics`	20	20
number of observations in `mylib.clincs`	20	19

Expression rules

Valid expressions

The expression parameter must contain a comparison operator and only table names.
- Accepted comparison operators: =, <>, >=, <=, >, <
The “left hand side” of the operator allows for a table name with optional schema prefix followed by an optional valid arithmetic operator and another table name with optional schema prefix.
- Accepted arithmetic operators: +, -, *
The “right hand side” of the operator follows same regular expression rules.

%check_rows(%str(table_1 = table_2));
%check_rows(%str(table_1 + table_2 = table_3));
%check_rows(%str(table_1 + table_2 = table_3 + table_4));
%check_rows(%str(mylib.all = details + mylib2.summary));

Invalid Expressions

Is not a valid library name or table name.
Contains more then one or is missing a comparison operator.

%check_rows(%str(1_table_1 = table_2));     /* Invalid table name */
%check_rows(%str(table_1 <= 2mylib.sales)); /* Invalid library name */
%check_rows(%str(table_1 0));               /* Missing comparison operator */
%check_rows(%str(table_1 >= 50 // 3));      /* Invalid arithmetic operator */

Sending mixed signals

Types of messages/exceptions

Condition	Severity	Meaning
`note`	low	execution was successful but returns additional information
`warning`	medium	execution encountered some problem but the job was run
`error`	high	execution was stopped as there was no way to continue
`abend`	very high	execution was stopped and aborted

Notes are the most innocuous messages in SAS - they simply inform you of the status of the program.
Warnings are less dire than errors but goes ahead and runs the program anyways. Some warnings are harmless, but some indicate grave problems.
Errors bring programs to a halt and you must attend to them.
Abend is not actually an SAS message but it is a possible parameter in the macro. This will throw an error and abort the program.

Generate example tables

data one 
     two 
     bad_records
     good_records;
    do i = 1 to 89;
        if (i <= 20) then output one;
        if (i <= 20) then output two;
        if (i <= 2) then output bad_records;
        if (i <= 100) then output good_records;
    end;
    stop;
run;
     
/* table `one` has 20 rows */
/* table `two` has 20 rows */
/* table `bad_records` has 2 rows */
/* table `good_records` has 89 rows */

Severity parameter

The default value for severity param is to throw an error.

/* example of a passed test  */
/* severity set to default */
%check_rows(%str(one = two));

NOTE: check_rows: Success!
NOTE: check_rows: one = two
NOTE: check_rows: 20 = 20

The code can continue if you set severity = warning.

/* example of a failed test with a warning */
%check_rows(%str(bad_records = 0), 
            severity = warning);

WARNING: check_rows: Expression Failed!
WARNING: check_rows: Not True: bad_records = 0 
WARNING: check_rows: Not True: 2 = 0

The same test case will fail if the severity param is set to the default value.

/* example of a failed test with an error */
/* severity set to default */
%check_rows(%str(bad_records = 0));

ERROR: check_rows: Expression Failed!
ERROR: check_rows: Not True: bad_records = 0
ERROR: check_rows: Not True: 2 = 0

Severity set to abend

Although abend is not a real SAS message, it is an option for %check_rows();

Be careful when using abend

If severity is set to abend and the test fails, it will stop processing immediately and terminate your SAS Enterprise Guide/Studio.

Thus, it is crucial that you are careful when using abend.

Use abend selectively

It is worthwhile to use abend in automated SAS programs becuase you want the failure to grab your attention.

A simple example

In this example, we get excel data from clinics to update our database.

%let clinic_number = 123 456 789 888; 

%macro read_excel;
%do i = 1 %to %sysfunc(countw(&clinic_number)); 
    %let num = %scan(&clinic_number, &i); 
    proc import dbms = xlsx
        out = clinic_&num.
        file = "data/clinic_&num..xlsx"
        REPLACE;
    run;
%end;
%mend;
%read_excel; 

/* make sure the datasets imported are good */
%check_rows(%str(clinic_123 > 100), severity = warning);
%check_rows(%str(clinic_456 > 100), severity = warning);

/* concatenate the newly imported tables */
data all_clinics;
  set clinic_123 clinic_456 clinic_789 clinic_888;
run;

Now, we can include tests to make sure that all_clinics is what we think it is.

%check_rows(%str(clinic_123 + clinic_456 + clinic_789 + clinic_888 = all_clinics));
%check_rows(%str(all_clinics = 0), severity = abend);

Another simple example

We need to divide $25,000 between practitioners based on total complexity.

%let tot_funds = 25000; 
proc sql;
    create table prac_tot as
    select sum(total_complexity) as tot_total_complexity
    from prac_list;
quit;

/* Calculate total for each practitioner */
data prac_funding;
    drop tot_total_complexity;
    if (_n_ = 1) then set prac_tot;
    set prac_list;
    prac_funding = round(product(tot_funds, total_complexity / tot_total_complexity), 0.01);
run;

prac_list

  pracnum total_complexity
1   A5234           1317.7
2   B2519           1457.2
3   C6868           1145.2
4   D5261           1004.5
5   E1801           1258.7
6   F9182           1332.2
7   G8123            994.1
8   H6010           1011.0

prac_funding

  pracnum total_complexity prac_funding
1   A5234           1317.7      3460.13
2   B2519           1457.2      3826.44
3   C6868           1145.2      3007.16
4   D5261           1004.5      2637.70
5   E1801           1258.7      3305.20
6   F9182           1332.2      3498.20
7   G8123            994.1      2610.39
8   H6010           1011.0      2654.77

But, there is a not so obvious problem

The not so obvious problem

The previous workflow resulted in a very minor but important error. The total allocation does not equal to $25,000. Here is a rewrite of the previous code that includes a check.

%let tot_funds = 25000; 
data prac_funding   (drop=check_prac_tot)
     total_not_match  (keep=check_prac_tot);

    drop tot_total_complexity;
    retain check_prac_tot  0;

    if (_n_ = 1) then set prac_tot;
    set prac_list  end=eof;
    prac_funding = round(product(tot_funds, total_complexity / tot_total_complexity), 0.01);
    check_prac_tot + prac_funding;
    output prac_funding;

    if (eof) then do;
        if (check_prac_tot ne &tot_funds) then output total_not_match;
    end;
run;

%check_rows(%str(total_not_match = 0));

total_not_match

  check_prac_tot
1       24999.99

ERROR: check_rows: Expression Failed!
ERROR: check_rows: Not True: total_not_match = 0
ERROR: check_rows: Not True: 1 = 0

Missing rows for by variable in merge

Combining datasets is a very common step. Consider the following tables:

  pracnum complexity_score
1   A5234               78
2   B2519               90
3   C6868               85
4   D5261               40

  pracnum rate_score
1   A5234     1317.7
2   B2519     1457.2
3   C6868     1145.2
4   D5261     1004.5
5   E1801     1258.7
6   F9182     1332.2

When we merge the two datasets the resulting table will have missing values. This may cause unexpected results and SAS does not let you know this occured.

/* Merge the two data sets */
data final;
  merge complexity_table rate_table;
  by pracnum;
run;

  pracnum complexity_score rate_score
1   A5234               78     1317.7
2   B2519               90     1457.2
3   C6868               85     1145.2
4   D5261               40     1004.5
5   E1801                .     1258.7
6   F9182                .     1332.2

Consider adding a test for this common pitfall.

%check_rows(%str(final = complexity_table), severity = warning);

WARNING: check_rows: Expression Failed!
WARNING: check_rows: Not True: final = complexity_table 
WARNING: check_rows: Not True: 6 = 4

Duplicate by values

The complexity_table has duplicate values.

  pracnum complexity_score
1   A5234               78
2   B2519               90
3   B2519               85
4   B2519               40

  pracnum rate_score
1   A5234     1317.7
2   B2519     1457.2

When you merge these two datasets, we get additional observations. Again, this may cause unexpected results and SAS does not let you know this occured.

/* Merge the two data sets */
data combined_table;
  merge complexity_table rate_table;
  by pracnum;
run;

  pracnum complexity_score rate_score
1   A5234               78     1317.7
2   B2519               90     1457.2
3   B2519               85     1457.2
4   B2519               40     1457.2

Again, consider adding tests for this common pitfall.

%check_rows(%str(combined_table = rate_table));

ERROR: check_rows: Expression Failed!
ERROR: check_rows: Not True: combined_table = rate_table 
ERROR: check_rows: Not True: 4 = 3

Missing values

Getting the sum when a variable is missing leads to the resultant being missing as well. Again, SAS does not let you know this occured.

data total_num_prac; 
  interior_prac = .; fraser_health_prac = 8790; northern_health_prac = 9896;
  total_prac = interior_prac + fraser_health_prac + northern_health_prac;
run;

  interior_prac fraser_health_prac northern_health_prac total_prac
1             .               8790                 9896          .

In this case, we can create a new table that only contains missing values and then perform a test on this new table.

proc sql;
  create table missing_values_tbl as
  select total_prac
  from total_num_prac
  where total_prac is missing;
quit;

%check_rows(%str(missing_values_tbl = 0))

ERROR: check_rows: Expression Failed!
ERROR: check_rows: Not True: missing_values_tbl = 0 
ERROR: check_rows: Not True: 1 = 0

Installation

If you are with the Ministry of Health, you already have access to this macro via SAS without installation!

If you are not a Ministry of Health, you will need to install it via GitHub.

# clone the forked repo into your computer 
git clone https://github.com/andr3wli/testmacrolib.git

After you have downloaded the repo locally (somewhere SAS can access), update the sasautos to the correct path.

options insert=(sasautos="/Desktop/testmacrolib/check");

Source code for the slides can be found on GitHub

Source code for the macro can be found on Github

Find me here 👇

Twitter

GitHub

Website

Thank You!

Questions? 😊

Session information

quarto::quarto_version()

[1] '1.3.340'

sessioninfo::session_info(include_base = TRUE)

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Canada.utf8
 ctype    English_Canada.utf8
 tz       America/Los_Angeles
 date     2023-06-07
 pandoc   2.19.2 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 base        * 4.2.1   2022-06-23 [1] local
 cli           3.4.1   2022-09-23 [1] CRAN (R 4.2.1)
 compiler      4.2.1   2022-06-23 [1] local
 datasets    * 4.2.1   2022-06-23 [1] local
 digest        0.6.30  2022-10-18 [1] CRAN (R 4.2.1)
 evaluate      0.17    2022-10-07 [1] CRAN (R 4.2.1)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.1)
 graphics    * 4.2.1   2022-06-23 [1] local
 grDevices   * 4.2.1   2022-06-23 [1] local
 htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.2.1)
 jsonlite      1.8.3   2022-10-21 [1] CRAN (R 4.2.1)
 knitr         1.40    2022-08-24 [1] CRAN (R 4.2.1)
 later         1.3.0   2021-08-18 [1] CRAN (R 4.2.1)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.1)
 methods     * 4.2.1   2022-06-23 [1] local
 png           0.1-8   2022-11-29 [1] CRAN (R 4.2.2)
 processx      3.7.0   2022-07-07 [1] CRAN (R 4.2.1)
 ps            1.7.1   2022-06-18 [1] CRAN (R 4.2.1)
 quarto        1.2     2022-07-06 [1] CRAN (R 4.2.3)
 Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.2.1)
 rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.1)
 rmarkdown     2.17    2022-10-07 [1] CRAN (R 4.2.1)
 rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.1)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
 stats       * 4.2.1   2022-06-23 [1] local
 stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
 stringr       1.4.1   2022-08-20 [1] CRAN (R 4.2.1)
 tools         4.2.1   2022-06-23 [1] local
 utils       * 4.2.1   2022-06-23 [1] local
 xfun          0.34    2022-10-18 [1] CRAN (R 4.2.1)
 yaml          2.3.6   2022-10-18 [1] CRAN (R 4.2.1)

 [1] C:/Users/AndLi/AppData/Local/Programs/R/R-4.2.1/library

──────────────────────────────────────────────────────────────────────────────