Kelly's Space

Coding 'n stuff

Category Archives: NeoData

Querying Census Data

I was building a tool to generate some random data for testing with the goal that the data is readable.  I wanted to get actual city/state/zip code information so I began looking at the website for the type of data that is available.  I stumbled across several “names files” that contain first and last names and their frequency.  The last name file was about 2MB in size – perfect for generating some random customer names.

I opened the file in notepad and started looking through the file to find how common my last name is in comparison to others.  Sure, the Find… feature in notepad works – but a SQL query would be better at pulling out the data that I wanted.

Since the format of the file was fixed record size (not comma or tab delimited), I realized that the easiest solution would be to use NeoData to create a map for the fields in the file and import directly into a SQL server table.

Each record contains a record like this:

       01 NAME-RECORD.
          05 NAME     PIC X(15).
          05 FREQ     PIC 9.999.
          05 FILLER   PIC X.
          05 CUM-FREQ PIC ZZ.999.
          05 FILLER   PIC X(2).
          05 RANK     PIC 9(5).


So I fired up NeoData and created a sequential file mapping using the COBOL description above, and added a line sequential endpoint pointing to the dist.all.txt file.


Next, I created a SQL mapping with the name field set as the primary key (since it is unique in this file) and added a SQL Endpoint to point to a test database:


Now it is just a matter of running the converter, to convert from the line sequential input to the SQL database:


I now have 88,799 records in my SQL table!  Now for the fun part – fire up SQL Management studio and run some queries.  I wanted to see how some of the names of my coworkers compare to each other.  I used the following query:

  FROM [VidRent].[dbo].[NAME-RECORD]

And it gave me the following results:


It turned out to be a trivial exercise to map this Census data into SQL Server using the data conversion feature in NeoData, and now the data is in a form that is easy to consume for my data generator.  So when you have a text file that can easily be described with a COBOL record, you can easily convert it into SQL Server using NeoData following steps similar to these.