FME 2019 sneak peek: machine learning and natural language processing

Mark Ireland

December 13, 2018

Hola amigos! How’s it going? All tickety-boo? There’s a new bit of FME for 2019 that’s dead brilliant, and if you ‘old your ‘osses I’ll tell youse about it.

Well, that’s a unique opening line for a blog post! My usual style is quite formal and (I hope) a lot clearer. But I wanted to start out with more natural speech because today I’m covering Natural Language Processing (NLP), new functionality coming up in FME2019.

NLP is a technique for computer learning of natural human language. Technically, natural language is any normal language that has evolved. It doesn’t have to be slang. But not everyone writes in a formal manner, so often you want to process text that contains unusual phrases.

NLP can even involve a computer generating human-like speech! But today I wanted to show one particular aspect of it: the ability to mine information and categorize it, based on prior examples.

Let’s take a look at how…

FME Natural Language Processing: The Scenario

To test FME’s new capabilities I needed a source of information – in NLP it’s called a corpus – and luckily I found one in the form of product reviews. Each review is matched to a label to define whether it is a positive or negative review:

__label__1 Very disappointed!: This is just AWFUL!
__label__2 Good book: Well written.

So label 1 means a negative review and label 2 means a positive review. I can use that to have FME learn what makes a review positive or negative, and then feed it unlabelled reviews for it to categorize for me. That’s often called sentiment analysis…
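To make the corpus format concrete, here is a minimal Python sketch of splitting those labelled lines into a label and the review text. It is only an illustration: the names parse_review and SENTIMENT are my own, and nothing here is part of FME. It assumes each line starts with a `__label__N` tag followed by a space, as in the two samples above.

```python
LABEL_PREFIX = "__label__"

# Meaning of the two labels, as described in this post.
SENTIMENT = {"1": "negative", "2": "positive"}

def parse_review(line):
    """Split '__label__N some text' into (label, text)."""
    tag, _, text = line.strip().partition(" ")
    if not tag.startswith(LABEL_PREFIX):
        raise ValueError(f"line has no label: {line!r}")
    return tag[len(LABEL_PREFIX):], text

corpus = [
    "__label__1 Very disappointed!: This is just AWFUL!",
    "__label__2 Good book: Well written.",
]

for line in corpus:
    label, text = parse_review(line)
    print(SENTIMENT[label], "|", text)
```

Run against the two sample lines, this prints one negative and one positive review.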

FME Natural Language Processing: The Transformers

FME2019 has two new transformers: the NLPTrainer and the NLPClassifier. The NLPTrainer is what I feed the labelled reviews, from which to build a model. The NLPClassifier is fed new reviews, and compares them to its review model to classify them as positive or negative.
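If it helps to picture the train-then-classify split, here is a rough Python sketch of the same two-step idea. To be clear, this is not how the FME transformers work internally (I don't know what they use); I have swapped in a tiny naive Bayes word-count classifier purely for illustration. The functions train and classify are hypothetical names, and apart from the two sample reviews above, the training sentences are invented.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Build a 'model': per-label word counts plus per-label review counts."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "docs": doc_counts, "vocab": vocab}

def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        counts = model["words"][label]
        n_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not rule out a label.
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Step 1 (the "trainer"): labelled reviews in, model out.
training = [
    ("1", "Very disappointed!: This is just AWFUL!"),
    ("1", "Awful plot and a boring, disappointing ending."),  # invented
    ("2", "Good book: Well written."),
    ("2", "A good story, very well written and fun."),  # invented
]
model = train(training)

# Step 2 (the "classifier"): unlabelled reviews in, labels out.
print(classify(model, "Well written and good fun"))     # prints 2
print(classify(model, "Just awful, so disappointing"))  # prints 1
```

The point is only the shape of the workflow: one step turns labelled text into a model, and a separate step uses that model on text it has never seen.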

Since this is fairly new, beta functionality, and because written words on a blog can be more permanent than might be expected, I thought I’d demonstrate using a video:

I forgot to mention a few items. Firstly that the output from the transformer also includes a summary feature, with information about the accuracy and key words used.

Secondly, that NLP is (mostly) language agnostic. It assumes an English-like sentence structure, but could work just as well on data stored in other languages. I do imagine you must carry out the training in the same language you are going to test against!

Finally, you can’t add to the model. You can overwrite it with new training, but not add to it. So you’d probably keep the original corpus, add to that, and recreate the model when necessary.
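That keep-the-corpus-and-rebuild routine might look something like the following Python sketch. It is a sketch under assumptions, not an FME workflow: append_to_corpus, rebuild_model and the file name reviews.txt are all hypothetical, and count_labels is a stand-in for real training that only counts examples per label.

```python
import tempfile
from collections import Counter
from pathlib import Path

def append_to_corpus(corpus_path, new_lines):
    """Add newly labelled lines to the master corpus file."""
    with open(corpus_path, "a", encoding="utf-8") as f:
        for line in new_lines:
            f.write(line.strip() + "\n")

def rebuild_model(corpus_path, train):
    """Retrain from scratch on the whole corpus, replacing any earlier model."""
    lines = Path(corpus_path).read_text(encoding="utf-8").splitlines()
    return train([line for line in lines if line.strip()])

def count_labels(lines):
    """Stand-in for real training: just counts examples per label."""
    return Counter(line.split(" ", 1)[0] for line in lines)

with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "reviews.txt"  # hypothetical corpus file

    append_to_corpus(corpus, ["__label__1 Very disappointed!: This is just AWFUL!"])
    model = rebuild_model(corpus, count_labels)

    # New labelled data arrives: extend the corpus, then overwrite the model.
    append_to_corpus(corpus, ["__label__2 Good book: Well written."])
    model = rebuild_model(corpus, count_labels)

print(dict(model))
```

The corpus file is the thing you look after; the model is disposable and gets regenerated whenever the corpus grows.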

Anyway, hopefully the video helped you understand what I’m talking about (pun intended). But though NLP is interesting, what might FME users do with it?

FME Natural Language Processing: Examples

I always like to give a few examples of what new technology can be used for. Sometimes my thoughts and ideas lead nowhere, and I don’t mention them. Today I am going to mention them anyway, to help steer you away from what I think are dead ends.

Data Classification and QA

Classification? Well… obviously. This is what the above video already shows. I think this is the most likely use in FME.

One idea is taking weather forecasts and classifying them. For example, I wonder if I could train a model on what conditions lightning can occur in. Then I run new forecasts through the NLPClassifier to see if today’s conditions are favourable for lightning to occur (at which point I could issue a warning). I see a lot of possibility there.

It also made me wonder if NLP could support data QA. At first I thought of an address database. If I train NLP on the difference between a good and bad address, might it help to pick up future problems as they happen? It might; but addresses are very structured and – as I understand it – NLP is all about unstructured, human speech. So although I haven’t tried it, I believe it’s better to stick to the standard transformers (Tester, AttributeValidator) for QA’ing structured data, and use NLP when the input is written sentences.

Classifying and QA’ing data with Natural Language Processing improves the relevance of the output by improving the quality of the input. But what if the NLP analysis is the output…

Business Intelligence Products

Have you ever thought about creating BI products with FME? You wouldn’t be the first! In fact a prior blog post featured a partner (setld) doing just that:

setld, FME and the 4 Vs of Big Data: Building Business Intelligence “Products”

One key sentence in that article says data is evaluated “against a word value lookup table (that setld maintains) in order to rank the top 100 news pieces”.

I won’t claim to know their full methodology, but to me the lookup table they maintain is equivalent to the NLP model that FME can now build. Although it might not be a 1:1 replacement, these new transformers might be able to automate some of their lookup table maintenance.

Basically making a product from NLP output is a real possibility. But it can also help internal processes…

Marketing

The Safe Software marketing team must have triggers to report on new FME-related content. But Google Alerts – as far as I can tell – are just keyword searches:

Yeah… sorry Google, but that’s not the right FME. Of course, that’s understandable since their alerts aren’t trained to our needs. But why shouldn’t our marketing team create an NLP model and run future alerts through the NLPClassifier, to filter out the ones that aren’t the FME we are interested in? If you work at a company with a marketing team, you could help them out by doing the same.

The NLP examples I’ve mentioned so far have all been non-spatial. So could we incorporate geography into NLP…

Spatial NLP

Let’s say you were mapping Twitter alerts about natural disasters. NLP could assess how relevant a tweet is, before adding its information to your map. For example, I guess a suitably trained model would be able to tell the difference between “Help! My house is on FIRE!” and “Yikes! My boss is going to FIRE me!” Basically you add a layer of filtering before the data gets onto your map, by teaching your computer to assess the context of the word “fire” in the tweet.

Interestingly – as this article mentions – you might also analyze language for hints about location. For example, given the tweet “Tornado in Springfield! North of the Cottonwood River”, NLP might be able to identify “Springfield” and “Cottonwood River” as place names (I believe that’s called Named Entity Recognition).

Of course there are many Springfields in the US, but a well-trained model might even be able to tell which Springfield it is by reference to the Cottonwood River.
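As a very rough illustration of that disambiguation step, here is a Python sketch that assumes the place name has already been recognized and simply checks which candidate Springfield has a nearby feature mentioned in the same text. This is a plain lookup, not an NLP model, and it is not something the FME transformers are documented to do. The GAZETTEER contents are a small illustrative sample I put together, not an authoritative dataset, and disambiguate is a hypothetical name.

```python
# Toy gazetteer: a river associated with each of a few Springfields.
# Illustrative entries only; a real system would use a proper gazetteer.
GAZETTEER = {
    "Springfield": [
        {"state": "Minnesota", "nearby": {"Cottonwood River"}},
        {"state": "Massachusetts", "nearby": {"Connecticut River"}},
        {"state": "Oregon", "nearby": {"Willamette River"}},
    ],
}

def disambiguate(place, text):
    """Return the candidates for `place` whose nearby features appear in `text`."""
    candidates = GAZETTEER.get(place, [])
    lowered = text.lower()
    matches = [
        c for c in candidates
        if any(feature.lower() in lowered for feature in c["nearby"])
    ]
    # Fall back to every candidate when the text gives no extra clue.
    return matches or candidates

tweet = "Tornado in Springfield! North of the Cottonwood River"
print([c["state"] for c in disambiguate("Springfield", tweet)])  # ['Minnesota']
```

Without the river in the tweet, the function would return all three candidates, which is exactly the ambiguity a well-trained model would need extra context to resolve.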

But why NLP? Why not human interpretation? Because we’re talking about automated systems. Yes, a human could interpret such messages, but not at scale, not at speed, and not automatically. But with NLP, FME Server can!

What I really wondered is whether spatial data itself could be used as the input! For example, if I train an NLP model using point features labelled with their coordinate system, could I get the NLPClassifier to identify the coordinate system of unlabelled data?! Probably not. That again would be structured data, plus I think NLP only works with words, not numbers. But it’s fun to let the imagination run wild sometimes!

FME Natural Language Processing: Summary

So that was a rough guide to upcoming Natural Language Processing functionality in FME2019.

In general, we can say that much FME use takes raw data and makes useful information from it; whether it’s translating format, restructuring the data, or filtering content. When you look at FME that way, really it’s all about business intelligence. Even spatial data and mapping is about getting the right information to the right people, in order to make better business decisions.

NLP can be a big help with that.

So far I have little idea of what structure an NLP model takes, or what some of the transformer parameters do, so you should take my suggestions as general thoughts rather than definite rules.

What I hope I gave you is a basic understanding, after which you’ll find it easier to experiment.

Incidentally, if you watched the video all the way to the end, what do you think of FME package files? Pretty cool, eh? That’s going to be a huge development in how FME is delivered and updated. I think it may actually have the biggest impact of all the updates planned for 2019.

I don’t know if NLP is fully in the latest beta, because of how it’s packaged, but if you want to try it out, then get in touch. The same applies if you have any general questions. As we might say in the East Midlands, bungem ovva ear me duck!