Friday, April 10, 2015

Exercise 6: Data Normalization, Geocoding, and Error Assessment

Goals and objectives

The goal of this lab was to gain skills in geocoding using sand mine locations in Wisconsin that we acquired from the DNR, which had to be normalized before they could be geocoded. This exercise is the first part of our class's multipart suitability and risk model project, whose study area is Trempealeau County and its surrounding counties. The objectives of this exercise were to normalize the mines Excel file we received, use the ESRI geocoding service to geocode our mines, add the PLSS feature class to help locate mines, manually locate any remaining mines with the PLSS grid, compare our geocoded results with those of other students and with the actual mine locations, and then write a technical report.

Methods

Before any geocoding could be done, we had to normalize the mine location data table given to us (our professor marked each student's mines in the DNR file by student ID). I created a new Excel document and copied over many of the existing fields, but split several of them, such as the addresses, into separate fields so that ArcMap could read them. After geocoding as many of my 18 mines as I could from the available addresses, I used the Address Inspector to check for inconsistencies in the locational data and for suitable nearby candidate locations. For any mines that did not have a suitable street address, I used the PLSS grid along with a base map and Google Earth to home in on the likely mine sites, and then manually used the address locator to give them an address value.
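The field splitting described above was done by hand in Excel, but the same idea can be sketched in Python. This is only an illustration: the "street, city, ST zip" layout, the field names, and the sample address are my own assumptions, not the actual layout of the DNR table.

```python
import re

# Hypothetical layout: "street, city, ST zip" (with an optional ZIP+4 suffix).
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})(?:-\d{4})?\s*$"
)


def normalize_address(raw):
    """Split one free-text address into separate fields.

    Returns a dict with street, city, state, and zip keys, or None when
    the text does not match the assumed layout (for example, a PLSS
    description instead of a street address).
    """
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    fields = match.groupdict()
    # Collapse repeated spaces and use one consistent letter case per field.
    fields["street"] = " ".join(fields["street"].split()).upper()
    fields["city"] = fields["city"].title()
    fields["state"] = fields["state"].upper()
    return fields


# Made-up example row, not a real mine address.
print(normalize_address("  n1234  county road x , arcadia, wi 54612"))
```

Rows that come back as None are the ones that would still need to be located manually with the PLSS grid.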

Once our individual sets of 18 geocoded mines were finalized, we compiled our data in a shared class folder where we could all access each other's mine location shapefiles. We used the Merge tool to combine them into a single shapefile and compared it with the actual mine locations that had recently been given to us. For the distance analysis between the geocoded and actual locations, we used the Point Distance tool (or the Spatial Join tool) to find the actual mine closest to each geocoded mine. After using an SQL query to show only my own geocoded mines in relation to their nearest actual mine locations, I created a table of the distances between them for use in this report.
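The nearest-mine comparison was done with ArcMap tools, but the underlying idea is simple enough to sketch in plain Python. This is not how the ArcGIS tools are implemented; it is a brute-force illustration that assumes both point sets are in the same projected coordinate system (so distances come out in map units), and the IDs and coordinates are made up.

```python
import math


def nearest_actual(geocoded, actual):
    """For each geocoded point, find the closest actual point.

    Both arguments map a mine ID to an (x, y) pair in the same projected
    coordinate system. Returns {geocoded_id: (nearest_actual_id, distance)}.
    """
    result = {}
    for g_id, (gx, gy) in geocoded.items():
        best_id, best_dist = None, math.inf
        for a_id, (ax, ay) in actual.items():
            dist = math.hypot(gx - ax, gy - ay)  # straight-line distance
            if dist < best_dist:
                best_id, best_dist = a_id, dist
        result[g_id] = (best_id, best_dist)
    return result


# Made-up coordinates for illustration only.
my_mines = {"g1": (0.0, 0.0), "g2": (10.0, 10.0)}
actual_mines = {"a1": (3.0, 4.0), "a2": (10.0, 12.0)}
for mine_id, (match_id, dist) in nearest_actual(my_mines, actual_mines).items():
    print(mine_id, match_id, round(dist, 1))
```

Passing in only my own mines plays the same role as the query step: it limits the table to one student's points.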


After completing the previous steps, I created two maps: one showing all of the student-geocoded mines together with the actual mine locations, and one showing my own geocoded mines along with the queried actual mine locations, in order to show visually how far our data was from the actual mine locations.

Results

Figure 1
Figure 2

Figure 3

Figure 4

Figure 5



Discussion

The errors present in the distance data were both inherent and operational. One source was the differing ages of the maps and data points, which can cause discrepancies both in where a mine is located and in whether it is active or still appears in the base maps. Because this project relied heavily on user input, there were also many attribute data input errors in the student-made normalized tables, along with format translation errors between the original DNR Excel sheet and the normalized sheets.

To assess how accurate the geocoded mine points are, it is possible to calculate their root mean square error (RMSE), which measures the spatial x,y coordinate discrepancies between the geocoded mines and the actual mine locations.
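For n geocoded points paired with their actual locations, the usual form is RMSE = sqrt((1/n) * sum((xg - xa)^2 + (yg - ya)^2)), where (xg, yg) is a geocoded point and (xa, ya) is its actual location. I did not compute this for the lab, so the following is only a minimal sketch of the calculation with made-up coordinates, assuming the points are already paired and in a projected coordinate system.

```python
import math


def rmse(geocoded, actual):
    """Root mean square error between two paired lists of (x, y) points."""
    if len(geocoded) != len(actual) or not geocoded:
        raise ValueError("need two equal-length, non-empty point lists")
    # Squared straight-line distance for each geocoded/actual pair.
    squared = [
        (gx - ax) ** 2 + (gy - ay) ** 2
        for (gx, gy), (ax, ay) in zip(geocoded, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up coordinates: each geocoded point is 5 map units from its match.
print(rmse([(0.0, 0.0), (10.0, 10.0)], [(3.0, 4.0), (13.0, 14.0)]))
```

A lower RMSE means the geocoded set as a whole sits closer to the actual locations, though a single badly placed mine can raise the value a lot.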


Conclusion

Overall, the process of collecting address data and geocoding it was difficult, complex, and time consuming, mostly because the data table we acquired from the DNR was so jumbled and because our class normalized our sets of mines independently without a standardized format. Working as a class to normalize and geocode the entire DNR data set proved to be quite a test of skill and technique for us all. I saw firsthand how important it is to be thorough and to follow a standardized format and syntax when working with data, and I am glad I was able to share in this endeavor.
