Coding 'n stuff
Querying Census Data
I was building a tool to generate random test data, with the goal that the data be readable. I wanted actual city/state/ZIP code information, so I began looking at the census.gov website to see what kinds of data are available. I stumbled across several “names files” that contain first and last names along with their frequency. The last-name file was about 2MB in size – perfect for generating some random customer names.
I opened the file in Notepad and started scanning through it to see how common my last name is compared to others. Sure, Notepad's Find… feature works, but a SQL query would be much better at pulling out the data I wanted.
Since the file uses fixed-width records (not comma- or tab-delimited), I realized that the easiest solution would be to use NeoData to create a map for the fields in the file and import it directly into a SQL Server table.
Each record follows this COBOL description:
01  NAME-RECORD.
    05  NAME      PIC X(15).
    05  FREQ      PIC 9.999.
    05  FILLER    PIC X.
    05  CUM-FREQ  PIC ZZ.999.
    05  FILLER    PIC X(2).
    05  RANK      PIC 9(5).
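To see how the PIC clauses translate into column positions, here is a small Python sketch that parses one record by the fixed-width layout above (the sample line is hypothetical, not taken from the actual census file):

```python
# Column offsets derived from the PIC clauses above:
# NAME X(15) = 15 chars, FREQ 9.999 = 5, FILLER X = 1,
# CUM-FREQ ZZ.999 = 6, FILLER X(2) = 2, RANK 9(5) = 5
def parse_name_record(line):
    return {
        "name":     line[0:15].strip(),
        "freq":     float(line[15:20]),
        "cum_freq": float(line[21:27]),
        "rank":     int(line[29:34]),
    }

# Hypothetical sample record laid out in those columns:
sample = "SMITH          1.006  1.006  00001"
rec = parse_name_record(sample)
```

This is the same slicing NeoData performs for you once it has the COBOL description; doing it by hand just makes the field boundaries visible.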
So I fired up NeoData and created a sequential file mapping using the COBOL description above, and added a line sequential endpoint pointing to the dist.all.txt file.
Next, I created a SQL mapping with the NAME field set as the primary key (since it is unique in this file) and added a SQL endpoint pointing to a test database:
Now it is just a matter of running the converter to convert from the line sequential input to the SQL database:
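NeoData handles the actual conversion, but conceptually the import boils down to something like this sketch, using Python's built-in sqlite3 as a stand-in for SQL Server and a couple of hypothetical rows in place of the real census records:

```python
import sqlite3

# Sketch of what the conversion produces: a table keyed on NAME with one
# row per record in the names file. SQLite stands in for SQL Server, and
# the rows below are hypothetical samples, not actual census data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "NAME-RECORD" (
        "NAME"     TEXT PRIMARY KEY,  -- unique in this file
        "FREQ"     REAL,
        "CUM-FREQ" REAL,
        "RANK"     INTEGER
    )
""")
records = [("SMITH", 1.006, 1.006, 1), ("JOHNSON", 0.810, 1.816, 2)]
conn.executemany('INSERT INTO "NAME-RECORD" VALUES (?, ?, ?, ?)', records)

count = conn.execute('SELECT COUNT(*) FROM "NAME-RECORD"').fetchone()[0]
```

Note that the hyphenated column names (CUM-FREQ) have to be quoted, which is the same reason the T-SQL queries later use bracketed identifiers like [CUM-FREQ].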
I now have 88,799 records in my SQL table! Now for the fun part: fire up SQL Server Management Studio and run some queries. I wanted to see how the last names of some of my coworkers compare to each other, so I used the following query:
SELECT [NAME]
      ,[FREQ]
      ,[CUM-FREQ]
      ,[RANK]
  FROM [VidRent].[dbo].[NAME-RECORD]
 WHERE [NAME] IN (
       'HOLLIS', 'ANDERSON', 'COOK', 'PAULSON', 'ROBSON', 'LANGER'
       )
 ORDER BY [RANK] ASC
And it gave me the following results:
It turned out to be a trivial exercise to map this Census data into SQL Server using the data conversion feature in NeoData, and now the data is in a form my data generator can easily consume. So when you have a text file that can be described with a COBOL record layout, you can convert it into SQL Server using NeoData by following steps similar to these.
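As for the data generator itself, the FREQ column makes frequency-weighted name picking straightforward. Here is a sketch of the idea, assuming a few hypothetical (name, frequency) pairs in place of rows read back from the NAME-RECORD table:

```python
import random

# Hypothetical (name, FREQ) pairs as they might come back from the
# NAME-RECORD table; FREQ is a percentage of the population.
names = [("SMITH", 1.006), ("JOHNSON", 0.810), ("ANDERSON", 0.311)]

def random_last_name(rows, rng=random):
    # random.choices accepts relative weights, so the FREQ values can be
    # used directly without normalizing them to sum to 1.
    values, weights = zip(*rows)
    return rng.choices(values, weights=weights, k=1)[0]

picked = random_last_name(names)
```

Drawing names this way means the generated customers skew toward common surnames, which is exactly what makes the test data look readable and realistic.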