Wednesday, July 1, 2009

Arrrrggghh! SAS is Evil!

Just another day (or two) of torturing data. Like I mentioned a couple of days ago, a week back I decided to update a data set to include the last year or so of data (the data sources I use were recently updated). Like most "simple" jobs, it's turned out to be much more of a hairball than I expected. Although the program I used was fairly simple to rewrite, I realized that I had to update not one, not two, but THREE data sets in order to bring everything up to the present.

Caution: SAS Geekspeak ahead

One of the data sets is pretty large (it was about 70 gigabytes, but with the updates and indexing I've done, it's almost 100 gig). So, adding the new data and checking it took quite a while (no matter how efficiently you code things, SAS simply takes a long time to read a 70 gigabyte file). I thought I had everything done except for the final step. Unfortunately, the program kept crashing due to "insufficient resources."

For the uninitiated: when manipulating data (sorting, intermediate steps on SQL select statements, etc.), SAS sets up temporary ("scratch") files. They're supposed to be released when SAS terminates, but unfortunately, my system wasn't doing that. So, I had over 180 gigabytes of temporary files clogging up my hard drive. That meant there wasn't enough disk space left on my 250-gigabyte drive for SAS to manipulate the large files I'm using.

Of course, I only realized this when my program crashed AFTER EIGHT HOURS OF RUNNING! TWICE!

I've now manually deleted all the temporary files, and I'm running the program overnight to see if this fixes the problem.
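For anyone who'd rather check for leftover scratch files before kicking off an eight-hour run, here's a rough sketch in Python (not SAS). It just adds up what's sitting in a scratch folder and reports the free space on that drive. The function names are made up for illustration, and you'd have to point it at wherever your own SAS install keeps its temporary files; I'm not assuming any particular location or naming scheme.

```python
import os
import shutil


def dir_size_bytes(path):
    """Add up the sizes of all regular files under path, skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is locked by a running job; ignore it.
                pass
    return total


def scratch_report(scratch_root):
    """Return (leftovers, free_bytes) for a scratch folder.

    leftovers is a list of (size_in_bytes, path) for each top-level item
    in scratch_root, largest first. scratch_root is whatever folder you
    pass in (hypothetical here; it is not looked up from SAS).
    """
    leftovers = []
    for entry in os.scandir(scratch_root):
        if entry.is_dir(follow_symlinks=False):
            leftovers.append((dir_size_bytes(entry.path), entry.path))
        elif entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            leftovers.append((size, entry.path))
    leftovers.sort(reverse=True)
    free_bytes = shutil.disk_usage(scratch_root).free
    return leftovers, free_bytes
```

Calling `scratch_report` on the scratch folder and printing the first few entries shows what is hogging the space. It only reports; it doesn't delete anything, since removing scratch files out from under a SAS session that's still running would be a bad idea.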

Ah well - if it was easy, anyone could do it.

Update (next morning): Phew! It ran. It seems the unreleased temporary files were the issue. On to the next problem.