Kelly's Space

Coding 'n stuff

Querying Census Data

I was building a tool to generate some random data for testing with the goal that the data is readable.  I wanted to get actual city/state/zip code information so I began looking at the census.gov website for the type of data that is available.  I stumbled across several “names files” that contain first and last names and their frequency.  The last name file was about 2MB in size – perfect for generating some random customer names.

I opened the file in notepad and started looking through the file to find how common my last name is in comparison to others.  Sure, the Find… feature in notepad works – but a SQL query would be better at pulling out the data that I wanted.

Since the format of the file was fixed record size (not comma or tab delimited), I realized that the easiest solution would be to use NeoData to create a map for the fields in the file and import directly into a SQL server table.

Each record contains a record like this:

       01 NAME-RECORD.
          05 NAME     PIC X(15).
          05 FREQ     PIC 9.999.
          05 FILLER   PIC X.
          05 CUM-FREQ PIC ZZ.999.
          05 FILLER   PIC X(2).
          05 RANK     PIC 9(5).

 

So I fired up NeoData and created a sequential file mapping using the COBOL description above, and added a line sequential endpoint pointing to the dist.all.txt file.

image

Next, I created a SQL mapping with the name field set as the primary key (since it is unique in this file) and added a SQL Endpoint to point to a test database:

image

Now it is just a matter of running the converter, to convert from the line sequential input to the SQL database:

image

I now have 88,799 records in my SQL table!  Now for the fun part – fire up SQL Management studio and run some queries.  I wanted to see how some of the names of my coworkers compare to each other.  I used the following query:

SELECT [NAME]
      ,[FREQ]
      ,[CUM-FREQ]
      ,[RANK]
  FROM [VidRent].[dbo].[NAME-RECORD]
  WHERE [NAME] IN ( 'HOLLIS', 'ANDERSON', 'COOK', 'PAULSON', 'ROBSON', 'LANGER' )
  ORDER BY [RANK] ASC

And it gave me the following results:

image

It turned out to be a trivial exercise to map this Census data into SQL Server using the data conversion feature in NeoData, and now the data is in a form that is easy to consume for my data generator.  So when you have a text file that can easily be described with a COBOL record, you can easily convert it into SQL Server using NeoData following steps similar to these.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: