%check_rows
Andrew Li & Mike Atkinson
Unit tests compare the expected output with the actual output.
{SASUnit}
and {FUTS}
are popular frameworks for writing unit tests in SAS.
“A unit test provides a strict, written contract that the piece of code must satisfy” - Wikipedia
As a SAS programmer/analyst, you should consider implementing tests in your workflow.
There are two main applications of SAS:
One off Programs
Standardized and Reusable Programs
%check_rows();
Perform a check on table(s) row counts using the expression provided.
The expression {expr}
parameter captures what you expect the rows to actually be.
expected | actual | tests |
---|---|---|
%check_rows
compares the expected value with actual value of row counts.
description | expected | actual | tests |
---|---|---|---|
number of observations in mylib.clinics
|
20 | 20 | |
number of observations in mylib.clincs
|
20 | 19 |
Valid expressions
=, <>, >=, <=, >, <
+, -, *
Invalid Expressions
Is not a valid library name or table name.
Contains more then one or is missing a comparison operator.
Types of messages/exceptions
Condition | Severity | Meaning |
---|---|---|
note |
low | execution was successful but returns additional information |
warning |
medium | execution encountered some problem but the job was run |
error |
high | execution was stopped as there was no way to continue |
abend |
very high | execution was stopped and aborted |
data one
two
bad_records
good_records;
do i = 1 to 89;
if (i <= 20) then output one;
if (i <= 20) then output two;
if (i <= 2) then output bad_records;
if (i <= 100) then output good_records;
end;
stop;
run;
/* table `one` has 20 rows */
/* table `two` has 20 rows */
/* table `bad_records` has 2 rows */
/* table `good_records` has 89 rows */
The default value for severity param is to throw an error.
The code can continue if you set severity = warning
.
The same test case will fail if the severity param is set to the default value.
Although abend
is not a real SAS message, it is an option for %check_rows();
Be careful when using abend
If severity is set to abend and the test fails, it will stop processing immediately and terminate your SAS Enterprise Guide/Studio.
Thus, it is crucial that you are careful when using abend.
Use abend selectively
It is worthwhile to use abend in automated SAS programs becuase you want the failure to grab your attention.
In this example, we get excel data from clinics to update our database.
%let clinic_number = 123 456 789 888;
%macro read_excel;
%do i = 1 %to %sysfunc(countw(&clinic_number));
%let num = %scan(&clinic_number, &i);
proc import dbms = xlsx
out = clinic_&num.
file = "data/clinic_&num..xlsx"
REPLACE;
run;
%end;
%mend;
%read_excel;
/* make sure the datasets imported are good */
%check_rows(%str(clinic_123 > 100), severity = warning);
%check_rows(%str(clinic_456 > 100), severity = warning);
We need to divide $25,000 between practitioners based on total complexity.
%let tot_funds = 25000;
proc sql;
create table prac_tot as
select sum(total_complexity) as tot_total_complexity
from prac_list;
quit;
/* Calculate total for each practitioner */
data prac_funding;
drop tot_total_complexity;
if (_n_ = 1) then set prac_tot;
set prac_list;
prac_funding = round(product(tot_funds, total_complexity / tot_total_complexity), 0.01);
run;
prac_list
pracnum total_complexity
1 A5234 1317.7
2 B2519 1457.2
3 C6868 1145.2
4 D5261 1004.5
5 E1801 1258.7
6 F9182 1332.2
7 G8123 994.1
8 H6010 1011.0
prac_funding
pracnum total_complexity prac_funding
1 A5234 1317.7 3460.13
2 B2519 1457.2 3826.44
3 C6868 1145.2 3007.16
4 D5261 1004.5 2637.70
5 E1801 1258.7 3305.20
6 F9182 1332.2 3498.20
7 G8123 994.1 2610.39
8 H6010 1011.0 2654.77
But, there is a not so obvious problem
The previous workflow resulted in a very minor but important error. The total allocation does not equal to $25,000. Here is a rewrite of the previous code that includes a check.
%let tot_funds = 25000;
data prac_funding (drop=check_prac_tot)
total_not_match (keep=check_prac_tot);
drop tot_total_complexity;
retain check_prac_tot 0;
if (_n_ = 1) then set prac_tot;
set prac_list end=eof;
prac_funding = round(product(tot_funds, total_complexity / tot_total_complexity), 0.01);
check_prac_tot + prac_funding;
output prac_funding;
if (eof) then do;
if (check_prac_tot ne &tot_funds) then output total_not_match;
end;
run;
%check_rows(%str(total_not_match = 0));
Combining datasets is a very common step. Consider the following tables:
pracnum complexity_score
1 A5234 78
2 B2519 90
3 C6868 85
4 D5261 40
pracnum rate_score
1 A5234 1317.7
2 B2519 1457.2
3 C6868 1145.2
4 D5261 1004.5
5 E1801 1258.7
6 F9182 1332.2
When we merge the two datasets the resulting table will have missing values. This may cause unexpected results and SAS does not let you know this occured.
The complexity_table
has duplicate values.
pracnum complexity_score
1 A5234 78
2 B2519 90
3 B2519 85
4 B2519 40
pracnum rate_score
1 A5234 1317.7
2 B2519 1457.2
When you merge these two datasets, we get additional observations. Again, this may cause unexpected results and SAS does not let you know this occured.
Getting the sum when a variable is missing leads to the resultant being missing as well. Again, SAS does not let you know this occured.
data total_num_prac;
interior_prac = .; fraser_health_prac = 8790; northern_health_prac = 9896;
total_prac = interior_prac + fraser_health_prac + northern_health_prac;
run;
interior_prac fraser_health_prac northern_health_prac total_prac
1 . 8790 9896 .
If you are with the Ministry of Health, you already have access to this macro via SAS without installation!
If you are not a Ministry of Health, you will need to install it via GitHub.
After you have downloaded the repo locally (somewhere SAS can access), update the sasautos to the correct path.
Source code for the slides can be found on GitHub
Source code for the macro can be found on Github
Questions? 😊
[1] '1.3.340'
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.1 (2022-06-23 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Canada.utf8
ctype English_Canada.utf8
tz America/Los_Angeles
date 2023-06-07
pandoc 2.19.2 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base * 4.2.1 2022-06-23 [1] local
cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.1)
compiler 4.2.1 2022-06-23 [1] local
datasets * 4.2.1 2022-06-23 [1] local
digest 0.6.30 2022-10-18 [1] CRAN (R 4.2.1)
evaluate 0.17 2022-10-07 [1] CRAN (R 4.2.1)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
graphics * 4.2.1 2022-06-23 [1] local
grDevices * 4.2.1 2022-06-23 [1] local
htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.1)
jsonlite 1.8.3 2022-10-21 [1] CRAN (R 4.2.1)
knitr 1.40 2022-08-24 [1] CRAN (R 4.2.1)
later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
methods * 4.2.1 2022-06-23 [1] local
png 0.1-8 2022-11-29 [1] CRAN (R 4.2.2)
processx 3.7.0 2022-07-07 [1] CRAN (R 4.2.1)
ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.1)
quarto 1.2 2022-07-06 [1] CRAN (R 4.2.3)
Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.1)
rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
rmarkdown 2.17 2022-10-07 [1] CRAN (R 4.2.1)
rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
stats * 4.2.1 2022-06-23 [1] local
stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
stringr 1.4.1 2022-08-20 [1] CRAN (R 4.2.1)
tools 4.2.1 2022-06-23 [1] local
utils * 4.2.1 2022-06-23 [1] local
xfun 0.34 2022-10-18 [1] CRAN (R 4.2.1)
yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.1)
[1] C:/Users/AndLi/AppData/Local/Programs/R/R-4.2.1/library
──────────────────────────────────────────────────────────────────────────────