GISCafe Voice Susan Smith
Susan Smith has worked as an editor and writer in the technology industry for over 16 years. As an editor she has been responsible for the launch of a number of technology trade publications, both in print and online. Currently, Susan is the Editor of GISCafe and AECCafe, as well as those sites’ …
Basis Technology’s Rosette NLP Software Makes Sure Nothing is Lost in Translation
September 30th, 2021 by Susan Smith
In an interview with Robert Wall, Account Executive at Basis Technology, GISCafe president Sanjay Gangal asked about Rosette, Basis Technology’s natural language processing (NLP) software, which has long been widely used in the defense, intelligence and civilian sectors.
When asked to describe his role at Basis and what the company does, Wall replied: “I represent Basis Technology’s federal team, supporting customers across the defense, intelligence and civilian agencies in the U.S. and several of our partner nations. We provide natural language processing software that analyzes text in over 50 languages, extracting key information to improve the accuracy and timeliness of decisions that affect business and national security. Our software, Rosette, helps government and commercial customers efficiently discover the information they need and maximize the value of their largest, messiest and most diverse data streams. Basis supports a broad range of use cases across government and commercial industry, and as we have expanded to new markets, we’ve seen that the interplay of human and artificial intelligence is at the core of many organizations’ strategic plans. We believe that customers in the geospatial world can truly benefit from our products, which leverage AI to augment and enhance analytical efforts.”
And how is Rosette different from other natural language processing software?
“In the world of human language technology, there are really two main categories of tech. There’s text analytics and there’s machine translation,” said Wall. “For the last 25 years, Basis Technology has been a leader in multilingual text analytics. Unlike machine translation, our platform Rosette analyzes content in its original language, which preserves all the context and quite literally ensures nothing is lost in translation. So when it comes to disambiguating references to key people, organizations and places that are mentioned in text, context is critical. For example, if a non-Arabic-speaking analyst enters a string of Arabic text that includes a mention of the Syrian city Ar-Raqqa into a machine translation engine, they’re probably going to come away with a different sense of the text than was intended by the author.
Without an explicit reference to Syria nearby, it’s unlikely that it would be treated properly by the machine translation engine as a place, because Ar-Raqqa has a literal meaning within the language. So its translation to “tenderness” would seem out of place when everything around it is talking about ISIS. By contrast, even without a proximal reference to Syria, Rosette would determine that Ar-Raqqa, based on its surrounding context, is a proper noun, a place, which should be transliterated from Arabic script to Latin characters rather than translated to its literal meaning.
“Within its range of capabilities, Rosette also enables conceptual search, as opposed to keyword or key phrase-based search, helping analysts more quickly and completely discover what they’re looking for. It’s easily adaptable to the content and jargon of new domains. It allows analysts to annotate or correct data rapidly to retrain its models, and it transforms the messy high-volume inputs at the top of the data funnel into structured outputs that are readily usable through the analysts’ applications of choice. In essence, Rosette pulls the signal from the noise, so the analysts can spend their time and talent on actual analysis.
“Right now, every agency across the intelligence community is dealing in some way, shape or form with the challenges of scalably integrating unstructured content from both publicly available and classified sources into traditional analytic tradecraft. But dealing with that challenge doesn’t mean that you have to throw out all of the tools that analysts are used to using or the infrastructure that supports those tools. Case in point, Rosette is an API with a set of functional modules. It’s been designed to integrate into and augment organizations’ existing capabilities.
That makes it much easier, quicker and less risky to undertake the business of enterprise modernization, and it enables existing systems to triage, enrich and fully exploit high-volume multilingual text.”
A telling example of how Basis Technology’s Rosette can be used:
“Rosette has a broad suite of leading capabilities for extracting knowledge from text, but there’s one element of our offerings that I think is probably our most differentiating feature,” said Wall. “Rosette has the particular distinction of being the world’s leading capability for matching the names of people, organizations and places across their many variations, languages and scripts. And why is this so important? Well, in 2013, despite being watch-listed by both the FBI and CIA following his radicalization in Chechnya, Tamerlan Tsarnaev re-entered the U.S. His passage into the country didn’t depend on falsified documents or sophisticated methods to get around U.S. screening processes. Border screeners missed him because his name, once transliterated from Cyrillic characters to Latin characters, was spelled two different ways, neither of which was an exact match for his name as shown on the watchlist. Because Customs and Border Protection couldn’t reconcile the various ways in which his name might be spelled, he was able to kill and maim innocent people at the Boston Marathon. Since that event, CBP has leveraged Rosette for name matching at U.S. border checkpoints, specifically at U.S. airports, to ensure that nothing like the Boston Marathon bombing ever happens again because of a failure to properly recognize the name of a known threat. And we’re proud to serve CBP and the intelligence community, helping identify, resolve, connect and understand the people, places and organizations worldwide that most directly affect our national security.”
“How does the technology apply to geospatial missions?” asked Gangal.
“Even though we tend to associate geospatial analysis with mostly visual information, there’s still a lot of heavy lifting done by those analysts in researching and locating information and names in foreign languages,” Wall said. “Rosette makes that part of the mission easier. So imagine searching for a place by what it sounds like, or matching a scribbled-down reference to that place, which in all likelihood is misspelled, to the correct entry in a geo-database, or not having to search with wild cards and risk completely missing what you’re looking for. Those are all use cases for which Rosette was designed. Rosette disambiguates references to key points of interest around the world across the many sources of data that analysts use, minimizing both false positives and false negatives as analysts search for and gather data. Whether derived from open source or classified collection, procured from industry, or exchanged among foreign partners, an increasing amount of that data exists in languages other than English. Basis continues to work with our partners to extend and accelerate capabilities for extracting geospatially oriented features from text, to leverage Active Learning so that analysts can easily correct source data and retrain AI models when needed, and to make enterprise IT infrastructure better at processing and exploiting multilingual text to support intelligence analysis.”
Wall noted in closing that advances in the application of AI and machine learning are necessary for maintaining global situational awareness, characterizing threats and protecting the nation’s borders. Basis Technology will be demonstrating some of its newest capabilities at the GEOINT Symposium this coming October, many of which address key challenges in the GEOINT workflow.
“Whether or not you are attending that event, if you are interested in hearing more, I invite you to check out our website at www.basistech.com.
You can also reach out to me at rwall@basistech.com.”
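The name-matching problem Wall describes, where two transliterations of the same name fail an exact-match lookup, can be illustrated with a minimal sketch. This is not Rosette's API or method: it assumes a plain string-similarity score from Python's standard library, and the function names, the 0.8 threshold and the small gazetteer are all hypothetical, chosen only for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop accents, and turn punctuation into spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_marks)
    return " ".join(cleaned.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query: str, entries: list[str], threshold: float = 0.8):
    """Return the closest entry, or None if nothing clears the threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    if not entries:
        return None
    score, entry = max((similarity(query, e), e) for e in entries)
    return entry if score >= threshold else None


# Hypothetical gazetteer and a variant spelling of one of its entries.
gazetteer = ["Ar-Raqqa", "Aleppo", "Damascus"]
query = "Al Raqqah"

exact_hit = query in gazetteer              # exact lookup misses the variant
fuzzy_hit = best_match(query, gazetteer)    # similarity lookup can recover it
```

A character-level score like this only handles variants within a single script. Matching across scripts and languages, as described in the interview, requires far more than this sketch attempts.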