Susan Smith has worked as an editor and writer in the technology industry for over 16 years. As an editor she has been responsible for the launch of a number of technology trade publications, both in print and online. Currently, Susan is the Editor of GISCafe and AECCafe, as well as those sites’ newsletters and blogs. She writes on a number of topics, including but not limited to geospatial, architecture, engineering and construction. As many technologies evolve and occasionally merge, Susan finds herself uniquely situated to be able to cover diverse topics with facility.
Pitney Bowes Brings Location-Based Technology to Big Data Environments with New Cloudera Partnership
February 9th, 2017 by Susan Smith
Pitney Bowes Inc., a global technology company that provides innovative products and solutions to power commerce, and which acquired the mapping company MapInfo some years ago, announced last week that it has entered a partnership with Cloudera to deploy geospatial processing and data quality solutions to end users on top of Cloudera Enterprise. Clients will now have access to powerful location-based technology to enrich their Big Data investments.
According to company materials, Cloudera clients will now be able not only to tackle the volume, velocity and variety of big data, but also to manage its veracity. Pitney Bowes is currently deploying four Cloudera Certified Technology products designed to ensure clients are accessing the highest-quality location data to make more accurate and successful business decisions: Pitney Bowes Spectrum Geocoding for Big Data, Spectrum Location Intelligence for Big Data, Spectrum Data Quality for Big Data and the Spectrum Technology Platform.
In a conversation with Joe Francica, Managing Director, Geospatial Industry Solutions at Pitney Bowes Inc., GISCafe Voice asked about the new partnership with Cloudera.
What precipitated the relationship with Cloudera?
What we’ve been trying to do for at least 12 months is establish Pitney Bowes as a supplier of location intelligence solutions to users of Big Data. It was client driven in many ways: we had a few clients moving to an Apache Hadoop environment, and we found it necessary to develop relationships with organizations that provide integration and deployment services, like Cloudera. But we also have an established relationship with Hortonworks, which drives actionable intelligence with Connected Data Platforms that maximize the value of all data, both data-in-motion and data-at-rest. The company focuses on the development and support of Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers.
We now have relationships with both of these platform service providers and have certified on their Apache Hadoop frameworks. Hadoop is an Open Source technology being looked at by many organizations because of its ability to process very large data streams or very large data repositories. What these organizations are trying to do in a distributed processing environment is implement Hadoop, or whatever the latest flavor is, such as Spark.
We took our Spectrum technology and certified it on both the Cloudera and Hortonworks frameworks. For many years, our Spectrum technology has been a primarily server-based deployment for data management solutions. With it we’ve done data quality, address validation and verification, and data cleansing, as well as the typical spatial processes of geocoding, reverse geocoding, routing, and basic mapping and map analysis. We’ve taken a particular set of those modules from Spectrum and certified them within those two platforms, Cloudera and Hortonworks. We radically reduced the processing time for both data quality and geospatial processing. One example is a client with billions of records to geocode: we could do it with Spectrum on Hadoop and were able to reduce that time dramatically, many times over. While we’re not a big data solution provider, we have the advantage of being able to enable these big data environments to do both data quality and geospatial processing.
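The workflow described above, running geocoding as a map over batches of records rather than one request at a time, can be sketched as follows. This is a minimal illustration in plain Python: the `geocode` lookup and the `SAMPLE_INDEX` table are hypothetical stand-ins for a real geocoder such as the Spectrum geocoding API, and in an actual Hadoop or Spark deployment the same per-record function would run in parallel across cluster partitions rather than in a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real geocoder (e.g. a Spectrum geocoding
# call); it maps an address string to a (latitude, longitude) pair.
SAMPLE_INDEX = {
    "350 Fifth Ave, New York NY": (40.7484, -73.9857),
    "233 S Wacker Dr, Chicago IL": (41.8789, -87.6359),
}

def geocode(address):
    """Return (lat, lon) for an address, or None when it cannot be matched."""
    return SAMPLE_INDEX.get(address)

def geocode_records(addresses, workers=4):
    """Geocode a batch of addresses as a parallel map.

    In a cluster this map would be distributed across Hadoop/Spark
    partitions; a local thread pool stands in for that here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        coords = list(pool.map(geocode, addresses))
    return [{"address": a, "lat_lon": c} for a, c in zip(addresses, coords)]
```

The speedup the interview describes comes from exactly this shape of problem: each record is geocoded independently, so the batch parallelizes with essentially no coordination between workers.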
We’re enabling these Big Data workflows with the processing that we would typically do in our Spectrum technology but now we’ve implemented them for a Big Data framework like Hadoop.
While these Big Data frameworks allow you to do much more, what does the role of Spectrum become?
The Spectrum modules are essentially APIs into Hadoop, and we’ve certified each of these APIs: on the data quality side, our address matching, data normalization and universal name parser; on the location intelligence side, our global geocoding API, our location intelligence API, which handles spatial processes like finding spatial polygons and performing spatial joins, and our routing API. We’ve taken both the data quality APIs and the location intelligence APIs and put a wrapper around them in a software development kit. That SDK is delivered as a certified solution for Hortonworks. Certifying the modules has dramatically reduced the time needed for processing.
Our clients may have had to wait hours to geocode records. For 100 million records, we’ve gone from 12 hours of geocoding to 30 minutes.
If our clients want us to do something every month, instead of waiting 12 hours they can do it quickly. If they want to run a spatial process, say an insurance company wants to find the distance from a firehouse to an individual property to calculate risk, they can now do it on a more regular basis. That time is so radically reduced that they won’t hesitate and ask whether or not to run the process.
With our significant data resources, such as demographic data and new, very unique data sets we’re creating, such as distance to the nearest firehouse or distance to the coast, we’re able to run these calculations in a big data environment, since we can now append these distance calculations to any individual record. If it’s an individual property and we want to know the distance to the coast to identify the flood plain, we can now append that distance to the individual record. So it’s taking our coastline data, calculating a distance to that coastline, retrieving the answer and appending it to the individual client record.
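The distance-to-coast enrichment described here can be sketched with a great-circle (haversine) calculation, assuming the coastline is represented as a set of sample points. The record layout and field names below are illustrative, not Pitney Bowes’s actual schema, and a production pipeline would use an optimized spatial index rather than a brute-force scan over coastline points.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def append_distance_to_coast(record, coastline_points):
    """Enrich a property record with the distance to the nearest
    coastline sample point, appended as a new field."""
    lat, lon = record["lat"], record["lon"]
    record["dist_to_coast_km"] = min(
        haversine_km(lat, lon, clat, clon) for clat, clon in coastline_points
    )
    return record
```

Run per record inside a Hadoop or Spark map, this is exactly the "calculate a distance and append the answer to the client record" step the interview describes, which is why the whole enrichment parallelizes so well.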
And that’s just one data element. Given our history as a data supplier, not just a software company, we now have a huge treasure trove of data products, any number of which can be used as ancillary data you can append to customer records. We’re now doing that internally, using Hadoop in what we call our Human Geoenrichment Module to create datasets for clients.
We are supporting clients in the Hadoop framework, but also using it ourselves to support data product development.
Most clients are now doing this. They’re investigating whether it’s Hadoop or Spark and trying to speed up their processes. We partnered with Cloudera and Hortonworks because they’re experts in deployments in a multi-clustered environment. It helps us to go to market with a reputable service partner, and now it’s upon us to tell the market what you can do with location-based technology.
There is still the challenge of explaining why you would want location-based intelligence, but it’s becoming less of a hurdle as more location-based data is collected: everything from mobile to social media is becoming ubiquitous.
In terms of adding them to clients’ subscription programs, how does that happen, or rather, what kind of payment schedule is the new Cloudera arrangement on?
We’re going to be flexible in terms of how we deploy the technology. Our Commerce Cloud is configured for a subscription-based pricing model for our cloud micro-services. We can run our solution on premise under a subscription-based pricing model today. We also have a relationship with Amazon Web Services, so if somebody’s running a Hadoop cluster in Amazon we can work in that environment too. It all depends on the environment, and on whether the customer wants to work on a public or private cloud. We’re all working through the business models right now. One reason we stood up Commerce Cloud and put our APIs out there is that everybody’s moving toward that subscription type of service.
The competition is also working on the same things. We have presented at OGC meetings. The location power segment of their meetings is specifically devoted to Big Data for everyone involved in geospatial technology.
Organizations are very adept at implementing open source technology. We’re really moving forward in a dramatic and speedy way in making certain our enterprise solutions run in these new environments.
How does the U.S. compare with other countries in deploying the Cloudera and Hadoop technologies?
We do work with multi-nationals, and we see that U.S.-based companies are moving a bit quicker than Europe. But based on work we did in Europe last year, they’re recognizing that big data environments are necessary. This is less so in the Asia/Pacific region, but that’s changing as well. We see uptake in the U.S. market more than in any other region.
Categories: cloud, cloud network analytics, data, geospatial, GIS, handhelds, indoor location technology, location based sensor fusion, location based services, location intelligence, MapInfo, Open Source, OpenGeo, Pitney Bowes, utilities