Matteo Luccio has nearly twenty years of experience as a magazine writer and editor. He has a master’s degree in political science from MIT and he co-founded the public policy magazine Oregon’s Future, which he edited for four years. For the past thirteen years he has edited and written …
Making New Maps from Old Ones – The fastest way to update and aggregate GIS vector data is here
March 27th, 2014 by Matteo Luccio
Data! More data! Still more data! The exploding appetite for enriching GIS datasets with more and better data to support decision-making is driving demand for custom datasets. Richly attributed custom datasets are fast becoming the “coin of the GIS realm.” Besides the increasing availability of precision data, the demand for more, better, and faster GIS data conflation is also driven by the National Geospatial-Intelligence Agency’s (NGA) recent directive to its suppliers to aggregate existing data to meet stringent NGA requirements.
New tools and different methods are now required to create the more comprehensively attributed custom datasets that are replacing the “good enough” datasets that sufficed in the past. Some early adopters in the GIS market already use a new technology: automated conflation, or “intelligent aggregation,” of GIS data. They expect it to give them a competitive advantage that will separate the winners from the losers. This article reviews the concept of conflation and compares two toolkits that can help GIS professionals work more efficiently and accurately.
It’s all about “What’s Where?”
More data, more accuracy, and more responsiveness to schedule and budget requirements, intelligently aggregated into truly beautiful, fit-for-purpose datasets: all of this is possible right now. MapMerger and GeoMedia Fusion are “tool boxes” that can be purchased online, installed the same day, and put to work almost immediately on data aggregation, data merging, attribute transfer, data synchronization, and polygon alignment, right on the desktop. The result is the ability to deliver more and better datasets at lower cost than was possible before automated conflation. Everyone who works with GIS vector data will need to conflate at some point. The winners in the GIS game will be the early adopters who recognize these capable programs now and incorporate them into their workflows today.
Maps charted and steered the discovery of the world. Maps recorded the political agreements that allowed peace to replace conflict on an international scale. On a personal level, we have all been looking at maps since we were children, because they served our purposes. Remember the first treasure map you made? Perhaps you drew it in lemon juice, creating “invisible ink” so that others could not find your treasure? As we grew older, we used maps to study the geography of the world, and eventually to find our way the first time we drove off to college. All along that journey, we often added notes to one map from other sources of information. That process illustrates what GIS professionals now call “attribution”: transferring information about features in one dataset to a different dataset. Today, thousands of times per day, planning and management teams at all levels of government and industry need information from many different sources attributed into a single dataset they can use for decision-making. These decision-making “maps” are now called GIS datasets, and they are essential to the way we plan and manage the world around us.
Conflation typically refers to the merging of two vector datasets of the same feature type that cover the same area, so that the “best of both” is preserved in a third dataset created in the process. The two source datasets may have been created at different times and to different standards of precision; conflating them produces a single dataset with superior positional accuracy, richer attribution, or both. For example, a road dataset with excellent spatial accuracy but little attribute information can be merged with another road dataset that has rich attributes but poor spatial accuracy, producing a result that is both spatially accurate and attribute-rich. Because the new dataset combines accurate information from both sources, it can support decision-making and may reveal spatial relationships that were not discoverable in either dataset alone.
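To make the “best of both” idea concrete, here is a minimal Python sketch of attribute transfer. It assumes both datasets are lists of polylines in the same planar coordinate system and measures similarity with a vertex-based approximation of the Hausdorff distance; `transfer_attributes`, the feature layout, the toy data, and the `tolerance` value are all hypothetical. It illustrates the general technique only and is not how MapMerger or GeoMedia Fusion actually match features.

```python
import math

# Minimal sketch: function names, feature layout, and data are hypothetical.

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Project p onto the segment, clamping to the segment's endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def point_line_distance(p, line):
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def hausdorff_approx(line_a, line_b):
    """Vertex-based approximation of the Hausdorff distance between polylines."""
    return max(max(point_line_distance(p, line_b) for p in line_a),
               max(point_line_distance(p, line_a) for p in line_b))

def transfer_attributes(accurate, rich, tolerance):
    """Keep every geometry from `accurate`; fill in missing attributes from the
    closest feature in `rich`, provided it lies within `tolerance`."""
    result = []
    for feat in accurate:
        best, best_d = None, tolerance
        for cand in rich:
            d = hausdorff_approx(feat["geometry"], cand["geometry"])
            if d <= best_d:
                best, best_d = cand, d
        attrs = dict(feat["attributes"])
        if best is not None:
            for key, value in best["attributes"].items():
                attrs.setdefault(key, value)  # the accurate dataset's values win
        result.append({"geometry": feat["geometry"], "attributes": attrs})
    return result

# Toy data: one accurate county road, two attribute-rich candidate roads.
county = [{"geometry": [(0, 0), (10, 0)], "attributes": {"id": "C1"}}]
tiger = [{"geometry": [(0, 0.5), (10, 0.4)], "attributes": {"id": "T7", "name": "Main St"}},
         {"geometry": [(0, 50), (10, 50)], "attributes": {"name": "Far Rd"}}]
print(transfer_attributes(county, tiger, tolerance=2.0))
# [{'geometry': [(0, 0), (10, 0)], 'attributes': {'id': 'C1', 'name': 'Main St'}}]
```

Matched attributes only fill gaps (`setdefault`), so the accurate dataset's own values win any conflict; that precedence rule is a design choice for this sketch. The nested loops compare every pair, which a real tool would avoid with a spatial index.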
Conflation is also the process we use to add and connect features to a network, or to “densify” a particular area or environment. It can align important features in one dataset to accurately depicted geographic features in another, and it can accurately disclose the changes made from one generation of a dataset to the next. Conflation is therefore an important part of data aggregation, data merging, and data management. Until recently, however, conflation was done manually. It requires many, many decisions involving comparisons of different sorts of information, which makes it labor-intensive and has traditionally made it expensive and a last resort. When conflating manually, all that “picking” through the data produces first-edition products full of errors, which can be found only through careful manual editing of large amounts of data. Despite all that manual work, the GIS files that result from conflation often save money and best serve the purpose for which they were created. Nevertheless, most of us have probably thought, at times, that there ought to be a way to automate the conflation process.
Automated GIS conflation is, in fact, available today. In automated conflation, the computer performs the mountains of calculations and comparisons that used to give headaches to any GIS professional assigned to a large manual conflation project.
Two approaches to a very difficult problem
In the current marketplace, two solutions offer to automate the entire conflation process, each with a well-thought-out approach to a very difficult problem. Both claim to help GIS professionals gain a new competitive edge, and both deliver on that promise. The first of these “tool boxes,” MapMerger, is optimized to work seamlessly with ESRI’s ArcGIS software as an “extension” to the ArcGIS suite. The second, GeoMedia Fusion, is a fully integrated element of Intergraph’s GeoMedia GIS suite.
At first look, MapMerger and GeoMedia Fusion appear similar. Both run on popular GIS platforms and promise a significant improvement in the speed and accuracy with which GIS professionals can conflate disparate datasets. Both use sophisticated algorithms and follow similar workflows. Both use software “tools” to establish “links” or “matches” between features in one dataset and the corresponding features, at precise spatial locations, in another dataset, then conflate the data and transfer the attributes to the matched features. In both cases, the conflation is completed by editing these links or matches and bringing the data together into a third, or “result,” dataset. A GIS professional who uses either product gains a very significant advantage over a colleague who chooses to ignore the capabilities it offers.
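The link-building step both products share can be sketched generically. In this minimal Python example, candidate links between two `{id: polyline}` datasets are scored by the mean distance between line endpoints (checked in both directions) and accepted closest-first so that each feature is linked at most once. The scoring rule, the `confidence` formula, and the names `build_links` and `endpoint_distance` are hypothetical and do not reproduce either vendor's matching algorithm or certainty ratings.

```python
import math

# Minimal sketch: scoring rule and function names are hypothetical.

def endpoint_distance(a, b):
    """Mean distance between the endpoints of two polylines, in either direction."""
    same = (math.dist(a[0], b[0]) + math.dist(a[-1], b[-1])) / 2
    flipped = (math.dist(a[0], b[-1]) + math.dist(a[-1], b[0])) / 2
    return min(same, flipped)

def build_links(source, target, tolerance):
    """Propose one-to-one links between two {id: polyline} datasets.

    Candidate pairs within `tolerance` (must be > 0) are accepted greedily,
    closest first, so each feature ends up in at most one link.
    """
    candidates = sorted(
        (endpoint_distance(sg, tg), sid, tid)
        for sid, sg in source.items()
        for tid, tg in target.items()
    )
    links, used_s, used_t = [], set(), set()
    for d, sid, tid in candidates:
        if d > tolerance:
            break  # candidates are sorted, so the rest are farther still
        if sid in used_s or tid in used_t:
            continue
        used_s.add(sid)
        used_t.add(tid)
        # Illustrative confidence: 1.0 for identical endpoints, 0.0 at the tolerance.
        links.append({"source": sid, "target": tid, "distance": d,
                      "confidence": 1.0 - d / tolerance})
    return links

source = {"s1": [(0, 0), (10, 0)], "s2": [(0, 5), (10, 5)], "s3": [(0, 0.9), (10, 0.9)]}
target = {"t1": [(10, 0.2), (0, 0.2)], "t2": [(0, 5.6), (10, 5.6)]}
for link in build_links(source, target, tolerance=1.0):
    print(link["source"], link["target"], round(link["confidence"], 2))
# s1 t1 0.8
# s2 t2 0.4
```

Here `s3` stays unlinked because its only candidate, `t1`, was already taken by the closer `s1`. Greedy closest-first assignment is the simplest one-to-one strategy and can be suboptimal when features compete for the same partner; in a manual workflow, low-confidence links are the natural candidates for an operator to review.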
However, the two products operate quite differently. Intergraph is well known in the GIS market for supplying a wide range of GIS data management products. MapMerger is supplied by ESEA, whose GIS team has focused solely on developing and refining this product since 1992. MapMerger is available both as a product that organizations can use to perform conflation on their own (if they have an IT staff with GIS experience) and as a conflation service, delivered by ESEA’s team of experienced GIS professionals. We now look at both of these capable applications of GIS technology and compare them.
Many capabilities and features
At ESEA’s request, a GIS professional (GISP) systematically compared his experience using GeoMedia Fusion and MapMerger to perform the same three conflation operations: transferring attributes from one dataset to another, densifying an existing network (that is, adding and connecting features to create a more robust routable network), and aligning two polygon datasets. To perform this comparison, the GISP installed the latest versions of both software suites on the same PC and used the same datasets and the same guidance with both software products. He timed how long it took him to complete each procedure using each product, checked the changes in geometry and attribution, and examined the accuracy of the resultant dataset. He then reported his objective findings, as well as his subjective user comments, on such topics as the software’s ease of use and workflow.
The first exercise consisted of transferring road attributes from U.S. Census Bureau TIGER data to a specific county’s GIS dataset while retaining the original, more accurate county geometry. Any features present in the TIGER data but missing from the county data were added to the resulting dataset. Using MapMerger, the exercise took the GISP 45 minutes; he directly matched and transferred 512 features, added 110 TIGER features to the county dataset, and left 42 county features unchanged because they had no match in the TIGER data. His checks verified that the resulting dataset was accurate and that all features and their attributes had been conflated successfully. He reported that the MapMerger tools and workflow performed the task quickly and efficiently, thanks to the streamlined user interface. In particular, the ability to adjust the conflation parameters through quick trial and error greatly expedited the process while maximizing the number of matches.
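Each feature in this exercise ends up in one of three states: matched and enriched (512 with MapMerger), added from TIGER (110), or left unchanged (42). A minimal Python sketch of assembling such a result, assuming the links have already been decided by some earlier matching step, might look like the following. The `conflate` function, the `"added:"` ID prefix, and the toy data are hypothetical; the sketch only mirrors the rules described for the exercise.

```python
# Minimal sketch: function name, ID scheme, and data are hypothetical.

def conflate(target, source, links):
    """Build a result dataset from {id: feature} dicts and (target_id, source_id) links.

    Linked target features keep their geometry and gain any missing source
    attributes; unlinked source features are added; unlinked target features
    are carried over unchanged.
    """
    source_for = dict(links)
    linked_sources = set(source_for.values())
    result, counts = {}, {"matched": 0, "added": 0, "unchanged": 0}
    for tid, feat in target.items():
        attrs = dict(feat["attributes"])
        if tid in source_for:
            for key, value in source[source_for[tid]]["attributes"].items():
                attrs.setdefault(key, value)
            counts["matched"] += 1
        else:
            counts["unchanged"] += 1
        result[tid] = {"geometry": feat["geometry"], "attributes": attrs}
    for sid, feat in source.items():
        if sid not in linked_sources:
            # Hypothetical ID scheme so added features cannot collide with target IDs.
            result["added:" + sid] = feat
            counts["added"] += 1
    return result, counts

county = {"c1": {"geometry": [(0, 0), (10, 0)], "attributes": {"surface": "paved"}},
          "c2": {"geometry": [(0, 9), (10, 9)], "attributes": {}}}
tiger = {"t1": {"geometry": [(0, 0.3), (10, 0.2)], "attributes": {"name": "Oak Ave"}},
         "t2": {"geometry": [(20, 0), (30, 0)], "attributes": {"name": "Elm St"}}}
result, counts = conflate(county, tiger, links=[("c1", "t1")])
print(counts)  # {'matched': 1, 'added': 1, 'unchanged': 1}
```

Keeping the county geometry for every linked feature is what prevents the duplicate geometries that a naive union of both datasets would produce.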
When the GISP repeated the exercise using GeoMedia Fusion, it took him six and a half hours. He spent much of that time reviewing links and attempting to edit them to create an accurate result, and a significant amount of time consulting the software’s documentation while trying to navigate the numerous dialog windows required to reach the conflation output. He linked and transferred 575 features and exported 73 county features that were not found in the TIGER data to a separate feature class. However, the exercise was not successful: the result did not preserve the county geometry, and unlinked county features were excluded from the resulting conflated feature class. The GISP’s checks revealed that, in addition to the correct geometries, the result included duplicate geometries from both datasets, while much of the tabular attribute data did not transfer because of linking problems. The GISP also found that the “certainty ratings” the software assigned to each link were very often incorrect.
The second exercise consisted of adding 1,314 alleyways to 5,542 routable Chicago streets to create a result dataset that would remain routable and contain the additional geometries and their attributes. In MapMerger, this operation took the GISP 15 minutes and automatically fixed overshoots (dangles) and gaps, as well as splitting address ranges where added features created new intersections. MapMerger split the linear street features automatically, with 100 percent accuracy, at their intersections with the new alley segments. Using GeoMedia Fusion, the exercise took the GISP 150 minutes; the software fixed overshoots but not gaps, and it could not dynamically split address ranges when new alleyways were added to the network. Those address-range errors had to be corrected manually to enable accurate routing, a very time-consuming and laborious process.
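The densification steps described here (closing gaps, trimming overshoots, and splitting a street and its address range where a new alley connects) can be sketched in a few lines. This simplified Python version assumes each street is a straight, non-degenerate two-point segment with an ascending address range, and that only the alley's last vertex needs to connect. It divides addresses by linear interpolation and ignores odd/even parity, which real address-range splitting must respect. `add_alley` and the field names are hypothetical.

```python
import math

# Minimal sketch: function and field names are hypothetical.

def project(p, a, b):
    """Project p onto segment a-b (a != b); return (t, closest_point), t in [0, 1]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return t, (a[0] + t * dx, a[1] + t * dy)

def add_alley(street, alley, tolerance):
    """Connect an alley's last vertex to a straight street segment.

    A gap (end short of the street) or an overshoot (end past it) within
    `tolerance` is snapped onto the street; the street is then split at the
    new intersection and its address range divided proportionally.
    Returns (street_pieces, snapped_alley).
    """
    a, b = street["geometry"]
    t, q = project(alley[-1], a, b)
    if math.dist(alley[-1], q) > tolerance:
        return [street], alley  # too far away: leave for manual review
    snapped = alley[:-1] + [q]
    if t in (0.0, 1.0):
        return [street], snapped  # joins an existing node: no split needed
    lo, hi = street["from_addr"], street["to_addr"]
    split = round(lo + t * (hi - lo))  # linear interpolation; ignores parity
    first = {"geometry": [a, q], "from_addr": lo, "to_addr": split}
    second = {"geometry": [q, b], "from_addr": split + 1, "to_addr": hi}
    return [first, second], snapped

street = {"geometry": [(0, 0), (100, 0)], "from_addr": 100, "to_addr": 198}
pieces, alley = add_alley(street, [(50, 40), (50, 1.5)], tolerance=2.0)  # a gap
print([(p["from_addr"], p["to_addr"]) for p in pieces], alley[-1])
# [(100, 149), (150, 198)] (50.0, 0.0)
```

Splitting the street at the snapped point is what keeps the network routable: without a shared node, a router has no way to turn from the street into the alley.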
The third exercise simulated a realistic U.S. Forest Service use case: aligning 44 forest harvest units (polygons) to 249 forest roads (lines), so that the polygon boundaries in the resulting dataset follow the forest roads precisely. Using MapMerger, the GISP completed this exercise in 20 minutes, and all 44 features were aligned with 100 percent accuracy. Using GeoMedia Fusion, the GISP worked on the exercise for 180 minutes but did not complete it successfully: although the software could conflate lines to polygons, it could not conflate polygons to lines. Consequently, the results produced by GeoMedia Fusion were wholly inaccurate.
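Conflating polygons to lines can be approximated by snapping each polygon vertex onto the nearest road within a tolerance. The Python sketch below, built around a hypothetical `align_polygon` function and toy coordinates, shows only that vertex-snapping step. A production tool would also need to insert road vertices into the polygon boundary so that edges follow bends in the road, and to keep boundaries shared by neighboring polygons consistent.

```python
import math

# Minimal sketch: function name and data are hypothetical.

def nearest_on_segment(p, a, b):
    """Closest point to p on segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)

def align_polygon(ring, roads, tolerance):
    """Move each vertex of `ring` to the closest point on any road polyline,
    if that point is within `tolerance`; other vertices stay where they are."""
    aligned = []
    for p in ring:
        best, best_d = p, tolerance
        for road in roads:
            for a, b in zip(road, road[1:]):
                q = nearest_on_segment(p, a, b)
                d = math.dist(p, q)
                if d <= best_d:
                    best, best_d = q, d
        aligned.append(best)
    return aligned

harvest_unit = [(0, 0.8), (10, 0.5), (10, 10), (0, 10)]  # open ring, toy coordinates
roads = [[(-5, 0), (15, 0)]]
print(align_polygon(harvest_unit, roads, tolerance=1.0))
# [(0.0, 0.0), (10.0, 0.0), (10, 10), (0, 10)]
```

Note the direction: the roads are treated as the fixed reference and only the polygons move, which is the polygons-to-lines case this exercise required.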
The evaluator’s overall conclusion from this comparison was that both MapMerger and GeoMedia Fusion provide accurate conflation in all three examples, but that MapMerger required less time and placed less of a burden on the operator for intervention, input, and interpretation than GeoMedia Fusion did on the same exercises.
Regarding the difference in the two methods of conflation, the evaluator noted:
MapMerger makes many mathematical calculations to measure distances and create accurate spatial adjustments, then automatically captures the resulting new custom dataset and provides a “Match Information File” that allows the user to go back and analyze what has been done. GeoMedia Fusion, on the other hand, makes a best guess at matching features on the basis of proximity and then requires a human operator to accept or reject each of its choices manually, feature by feature. MapMerger’s algorithms automate the process and create a new, custom, fit-for-purpose dataset that is more accurate, without requiring the user to accept or reject the software’s decisions.
This may allow less experienced operators to achieve higher-fidelity conflation. MapMerger’s Match Information File can also be applied to future conflations when the data is updated, so the user never needs to conflate the same features twice.
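The idea behind reusing stored matches can be sketched generically. MapMerger’s actual Match Information File format is not described in this article, so this Python example simply stores links as JSON pairs of feature IDs; the function names and the layout are hypothetical. On an update, links whose features still exist are reused, and only the remaining features go back into matching. The sketch does not detect edits to the geometry or attributes of previously matched features, which a real update workflow would also have to check.

```python
import json

# Minimal sketch: file layout and function names are hypothetical.

def dump_links(links):
    """Serialize (target_id, source_id) links to JSON text for a match file."""
    return json.dumps([{"target": t, "source": s} for t, s in links], indent=2)

def load_links(text):
    return [(rec["target"], rec["source"]) for rec in json.loads(text)]

def plan_update(saved_links, target_ids, source_ids):
    """Reuse links whose features still exist; return them plus the IDs
    that still need matching."""
    kept = [(t, s) for t, s in saved_links if t in target_ids and s in source_ids]
    matched_t = {t for t, _ in kept}
    matched_s = {s for _, s in kept}
    return kept, sorted(target_ids - matched_t), sorted(source_ids - matched_s)

text = dump_links([("c1", "t1"), ("c2", "t2")])
# Later: t2 was deleted from the source, while c3 and t3 are new.
kept, todo_t, todo_s = plan_update(load_links(text), {"c1", "c2", "c3"}, {"t1", "t3"})
print(kept, todo_t, todo_s)  # [('c1', 't1')] ['c2', 'c3'] ['t3']
```

Writing the JSON text to disk is left to the caller.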
Additionally, MapMerger allows large datasets to be sub-divided and worked in parallel by multiple operators and then easily and accurately re-assembled into a single dataset for “write out” of the final result. The GISP did not find this very important capability in GeoMedia Fusion. Finally, MapMerger’s “Match Strategy Library” enables users to choose different matching strategies for different geographic areas, while GeoMedia Fusion’s rule-based strategy applies the same rules across the entire dataset.
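Splitting a job into areas that can be worked in parallel, each with its own matching parameters, and then reassembling the results is a general pattern. The Python sketch below assumes a simple grid of square tiles, assigns each feature to a tile by its first vertex, and looks up a per-region strategy (only a tolerance here). `process_tiles`, `STRATEGIES`, and the `work` callback are illustrative names, not MapMerger’s “Match Strategy Library” API, and threads stand in for the multiple operators. A real workflow must also handle features that cross tile edges, because candidate matches on either side of an edge would otherwise be missed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical strategy library: matching parameters per named region.
STRATEGIES = {"urban": {"tolerance": 2.0}, "rural": {"tolerance": 15.0}}

def partition(features, tile_size):
    """Group {id: polyline} features into square grid tiles by first vertex."""
    tiles = {}
    for fid, geom in features.items():
        x, y = geom[0]
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, {})[fid] = geom
    return tiles

def process_tiles(features, tile_size, region_of, work):
    """Run work(tile_features, strategy) on every tile in parallel and merge
    the per-tile {id: result} dicts into one dataset."""
    tiles = partition(features, tile_size)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(work, feats, STRATEGIES[region_of(key)])
                   for key, feats in tiles.items()]
        merged = {}
        for future in futures:
            merged.update(future.result())  # each ID lives in exactly one tile
    return merged

features = {"a": [(1, 1), (2, 2)], "b": [(150, 5), (160, 5)], "c": [(3, 4), (5, 5)]}
result = process_tiles(
    features, tile_size=100,
    region_of=lambda key: "urban" if key == (0, 0) else "rural",
    # Placeholder work: report which tolerance each feature would be matched with.
    work=lambda feats, strategy: {fid: strategy["tolerance"] for fid in feats},
)
print(result)  # {'a': 2.0, 'c': 2.0, 'b': 15.0}
```

Because every feature belongs to exactly one tile, reassembly is a plain dictionary merge. That property is what makes the final single-dataset “write out” straightforward.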
MapMerger provides immediate visual feedback on successful matches by displaying an easy-to-understand graphical chart and a Match Status Table, which help ensure that 100 percent of features are captured in the conflation process. GeoMedia Fusion, instead, gives feedback only on a link-by-link basis, requiring the user to zoom and pan manually and constantly in order to decide whether to accept or reject each and every match.
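At its core, a status table of this kind is a tally over every feature. In this minimal Python sketch, the status labels (including `unresolved`), the definition of “captured,” and `match_status_table` itself are assumptions, not MapMerger’s actual table layout. The point is that one summary shows at a glance whether every feature has been accounted for, without inspecting links one at a time.

```python
from collections import Counter

# Minimal sketch: status labels and function name are hypothetical.

def match_status_table(statuses):
    """Summarize {feature_id: status} as (status, count, percent) rows plus the
    percentage of features with any status other than 'unresolved'.
    Assumes at least one feature."""
    counts = Counter(statuses.values())
    total = sum(counts.values())
    rows = [(status, n, 100.0 * n / total) for status, n in counts.most_common()]
    captured = 100.0 * (total - counts["unresolved"]) / total
    return rows, captured

statuses = {"f1": "matched", "f2": "matched", "f3": "added", "f4": "unresolved"}
rows, captured = match_status_table(statuses)
for status, n, pct in rows:
    print(f"{status:<12}{n:>5}{pct:>8.1f}%")
print(f"captured: {captured:.1f}%")
```

A captured value below 100 percent points straight to the features that still need attention.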
Data, data, data!
It is clear that rich attribution of feature information from one dataset to another, creating robust spatial datasets with up-to-date geometries, is becoming the “coin of the realm.” The examples outlined in this article emphasize the importance of being able to create fit-for-use datasets by enriching existing data to support decision-making.
Creating new, more comprehensively attributed datasets will require different tools and methods than those that sufficed in the past. The early adopters in the GIS market already know that automated conflation of GIS data is going to be the competitive advantage that separates the winners from the losers. Whether you subscribe to automated conflation services, purchase a software license, or partner with one of these capable automated conflation providers, conflation is going to be part of the future of GIS.
More data, more accuracy, and more responsiveness to schedule and financial requirements, intelligently aggregated into truly beautiful, custom, fit-for-purpose datasets, are possible right now. MapMerger and GeoMedia Fusion are both “tool boxes” that can be used to improve the quality and speed of data aggregation and, therefore, significantly lower its production costs. Everyone who creates or maintains GIS vector data will conflate at some time. The winners in the GIS game will be the early adopters who see these capable programs now and incorporate them into their workflows today. Those who fail to see or understand the value of conflation will miss the boat.