I am looking for a data file that will let me append an urban/rural indicator and a US geographic region (e.g. NE, SW) to my dataset (non-GIS). My file has city, state and ZIP code. I do not need to map this data.
Can anyone point me in the right direction?
Wow, nobody's touched this one for two months now. OK, I'll give you my ideas on it, though they are slanted by the fact that I'm a GIS guy, not a database pro.
The first step is to "geocode" your data: basically, place each record on a map using the data you have (city/state/ZIP).
The problem I see here is that you say you only have city/state/ZIP, not the actual street address. This will limit the accuracy of your geocoding, and that's going to be a problem for your urban/rural classification. The most accurate you could do with this data is to put each record at the center of a ZIP code polygon. The problem is that a ZIP code polygon may be part rural, part urban, so how do you know whether a record is in the urban or the rural part? Geocoding is especially problematic in rural areas, where traditional street addresses don't really apply. I once tried to geocode dairies in the Central Valley in California. Every dairy had an address complete with a city, yet I'd bet only 1% of those dairies were actually inside the city limits. Amazing they ever get any post. If you're not familiar with geocoding, think about what MapQuest does when you ask it to map an address.
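To make the "center of a ZIP code polygon" idea concrete, here is a minimal Python sketch. It assumes you already have ZIP boundary polygons as lists of (longitude, latitude) vertices; ZIP_POLYGONS, geocode_by_zip and the ZIP "99999" are hypothetical placeholders, not a real dataset or library. The centroid is the area-weighted one from the shoelace formula, treating coordinates as planar.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Coordinates are treated as planar, which is only an approximation
    for lon/lat, but usually fine at the size of a ZIP code.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return (cx / (3 * area2), cy / (3 * area2))


# HYPOTHETICAL example data: a made-up ZIP with a rectangular boundary.
ZIP_POLYGONS: Dict[str, List[Point]] = {
    "99999": [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)],
}


def geocode_by_zip(zip_code: str) -> Optional[Point]:
    """Return the centroid of the record's ZIP polygon, or None if unknown."""
    poly = ZIP_POLYGONS.get(zip_code)
    return polygon_centroid(poly) if poly else None


print(geocode_by_zip("99999"))  # (2.0, 1.0)
```

Keep in mind that the centroid of an irregular (concave) polygon can even fall outside the polygon itself, one more reason a ZIP-level location says little about whether a particular record is urban or rural.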
Next step: overlay your geocoded data on a dataset broken down by region (NW, SW, etc.). Simply select all the geocoded records within each region polygon and update the database table with the appropriate value (NW, etc.). Oh, you'll need to create that column in your database table first.
Repeat using an urban/rural dataset.
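The overlay step boils down to a point-in-polygon test. Here is a minimal Python sketch using the classic ray-casting test; a real GIS would do this for you as a spatial join. The REGIONS and URBAN_AREAS polygons and the record fields are hypothetical toy data, not real boundaries, and the sketch assumes points and polygons are already in the same coordinate system.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray cast rightward from pt
    crosses an edge of the polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through pt
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def classify(pt: Point, polygons: Dict[str, List[Point]]) -> Optional[str]:
    """Return the label of the first polygon containing pt, else None."""
    for label, poly in polygons.items():
        if point_in_polygon(pt, poly):
            return label
    return None


# HYPOTHETICAL toy layers (made-up squares, NOT real boundaries).
REGIONS: Dict[str, List[Point]] = {
    "SW": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "SE": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "NW": [(0, 10), (10, 10), (10, 20), (0, 20)],
    "NE": [(10, 10), (20, 10), (20, 20), (10, 20)],
}
URBAN_AREAS: Dict[str, List[Point]] = {
    "urban": [(1, 0), (3, 0), (3, 2), (1, 2)],
}

# Geocoded records (hypothetical fields: lon, lat).
records = [
    {"id": 1, "lon": 2.0, "lat": 1.0},
    {"id": 2, "lon": 15.0, "lat": 18.0},
]

for rec in records:
    pt = (rec["lon"], rec["lat"])
    rec["region"] = classify(pt, REGIONS)
    # Repeat with the urban/rural layer; anything outside an urban polygon
    # is labeled rural.
    rec["urban_rural"] = classify(pt, URBAN_AREAS) or "rural"
```

Points that fall exactly on a shared boundary can be assigned to either neighbor by this test, and a point outside every region polygon gets None, so it is worth checking for those leftovers after the update.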
Keep in mind the factors limiting your accuracy: accuracy of geocoded locations, accuracy of rural/urban and region data sets, and any error you pick up if those three data sets are not in the same coordinate system.
Alternatively, you could obtain a list of all the ZIP codes in the US and classify each one as either rural or urban and as NW, SW, NE or SE. Then it's a pretty simple database issue: for each of your records, calculate the urban/rural and region columns from your national dataset.
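The ZIP-level approach is just a keyed lookup (in a database, a join on the ZIP column). A minimal Python sketch, assuming a hypothetical national table ZIP_CLASSES that maps each 5-digit ZIP to (urban/rural, region); the ZIPs and values below are made up for illustration.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL national table: ZIP -> (urban/rural, region).
# These ZIPs and values are illustrative only.
ZIP_CLASSES: Dict[str, Tuple[str, str]] = {
    "00001": ("urban", "NE"),
    "00002": ("rural", "SW"),
}


def normalize_zip(raw) -> str:
    """Reduce a ZIP or ZIP+4 (string or number) to a 5-digit string."""
    return str(raw).strip()[:5].zfill(5)


def append_classes(records: List[dict]) -> List[dict]:
    """Fill urban_rural and region on each record by looking up its ZIP.

    ZIPs missing from the table are left as None rather than guessed.
    """
    for rec in records:
        urban_rural, region = ZIP_CLASSES.get(normalize_zip(rec["zip"]), (None, None))
        rec["urban_rural"] = urban_rural
        rec["region"] = region
    return records


records = append_classes([
    {"city": "Anytown", "state": "XX", "zip": "00001"},
    {"city": "Otherville", "state": "XX", "zip": 2},  # leading zeros lost
    {"city": "Nowhere", "state": "XX", "zip": "00003-1234"},  # not in table
])
```

Normalizing to five digits with leading zeros matters because ZIPs stored as numbers lose their leading zeros (many New England ZIPs start with 0), and those records would otherwise silently fail to match.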
Accuracy here is limited by the scale of your data: the ZIP code level. This may well be accurate enough if you're analyzing patterns at a national level.
Check out this info from the US Census Bureau TIGER data FAQ:
Q16: Does the ZIP Code file you have available for downloading contain a
complete and up-to-date list of ZIP Codes?
The ZIP file doesn't include all ZIP Codes because it was based only on those
areas for which we had city-style addresses (which leaves out a lot of rural
areas). The file was created as a byproduct of another operation (a data
product based on the ZIP codes we collected with the 1990 Census data). For
this Census data product, we took advantage of the fact that we had some
ZIP Code data as a consequence of trying to collect addresses for the Census
questionnaire mailout so we published data for those ZIP Codes. This ZIP
internal point file is basically the lat/long for the ZIP Codes in that product.
We put it on the Web in case anyone found it useful. It was not intended to
be an authoritative source on ZIP Codes.
Note that the task of creating a lat/long or polygon file of all ZIP Codes is
not as easy as it seems since ZIP Codes are not designed to be polygons and
can't easily be forced into them - particularly in rural areas. To our knowledge
there are no official internal point or polygon files available. However, because
of the demand for ZIP Code data and maps, the Census Bureau, for Census 2000,
created a new statistical area called the ZIP Code Tabulation Area (ZCTA).
ZCTAs are close area approximations of U.S. Postal Service ZIP Codes and
were designed to overcome the difficulties of defining clear boundaries to which
data could be attached. For more information on ZCTAs go to:
For more information on ZIP Codes the U.S. Postal Service site may be of interest:
Here's the source for the quote: http://www.census.gov/cgi-bin/geo/tigerfaq?Q16