How to Turn Unstructured Data into Decisions with FME

Unstructured data accounts for 80% of the world's information. Here's how to use AI and FME to process unstructured text, images, documents, and more, and gain actionable insights in an automated, scalable way.

Unstructured data—meaning any information that doesn’t have a predefined model or format, such as text, images, media, documents, and much more—is everywhere. In fact, IDC estimates 80% of the world’s data is unstructured. Scanned forms, satellite imagery, field reports… It’s all incredibly valuable information, but it’s difficult to automate, organize, search, or glean insights from these datasets—until now.

With the help of AI and FME, processing unstructured data no longer involves hours of manual labor. With the right preprocessing, it’s possible to extract insight from this data the same way you would from a database or spreadsheet.

In a recent webinar titled “Taming the Chaos: How to Turn Unstructured Data into Decisions,” we discussed how to process unstructured data types and unlock insights using FME.

Key takeaways:

  • Use AI models within FME to extract insights from unstructured data at scale. Summarization, classification, metadata extraction, and other processing can all be automated, saving manual effort and resulting in powerful workflows.
  • Write clear, structured AI prompts with defined roles and output formats. Include enumerated options and confidence scores to enable auditability and consistency.
  • Semantic search makes unstructured text usable. Enable end users to query documents using plain language and find contextually relevant results, not just exact matches.

How to Do AI Prompt Engineering in FME (It’s Different!)

Since AI prompts in FME are part of automated, repeatable workflows, they need to be clear, specific, and structured. This isn’t a casual ChatGPT conversation: the prompt has to produce accurate, consistent results every time the workflow runs. Some best practices:

  • Be clear and specific, and provide context.
  • Define role-based prompts (e.g., “You are a document classification assistant.”).
  • Ask the model to structure its output in JSON format for easy downstream processing in FME.
  • Provide enumerated options (e.g., a list of categories the model can choose from).
  • Ask the model for an explanation or confidence score, and store that in an attribute for quality testing purposes.
  • Refine prompts iteratively through testing.

A good AI prompt in FME includes all of the above: start by telling the AI its role and the context, tell it what to do with specific constraints (such as classifying text into one of five categories), and define the JSON structure you want for the output, including an attribute for an explanation and a confidence score.

An example AI prompt in FME.
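
To make the practices above concrete, here is a minimal Python sketch of how such a prompt could be assembled and how the model's reply could be checked before it moves downstream. It is not the prompt shown in the webinar: the category list, the JSON field names, and the functions `build_prompt` and `parse_response` are all assumptions made for illustration, and no AI service is called.

```python
import json

# Hypothetical category list; use whatever enumerated options fit your data.
CATEGORIES = ["legal", "financial", "technical", "correspondence", "other"]


def build_prompt(document_text: str) -> str:
    """Assemble a role-based prompt with enumerated options and a JSON output spec."""
    options = ", ".join(CATEGORIES)
    return (
        "You are a document classification assistant.\n"
        f"Classify the document below into exactly one of: {options}.\n"
        "Respond with JSON only, using this structure:\n"
        '{"category": "<one of the options>", '
        '"explanation": "<one sentence>", '
        '"confidence": <number between 0 and 1>}\n\n'
        f"Document:\n{document_text}"
    )


def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply and reject anything outside the agreed contract."""
    data = json.loads(raw)
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return {
        "category": data["category"],
        "explanation": str(data.get("explanation", "")),
        "confidence": confidence,
    }
```

Rejecting replies that fall outside the enumerated options is one way to keep an automated workflow consistent; the stored explanation and confidence value can then be reviewed during quality testing.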


Demo: Summarizing Large PDF Reports

Automating document summarization with AI and FME drastically reduces human effort while improving data accessibility. In the first live demo in the webinar, we showed how to process a 100+ page financial report using the OpenAI Connector in FME. The workflow:

  1. Uploaded the PDF to OpenAI via FME
  2. Used a structured prompt to extract key metrics
  3. Flattened JSON output into attributes
  4. Cleaned and pivoted the data for use in Excel or dashboards
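
Steps 3 and 4 are handled by transformers inside FME, but the underlying idea can be sketched in a few lines of plain Python. The example below is an illustration only: the function names, the dotted attribute naming, and the sample field names (`name`, `value`) are assumptions, not what the demo workspace uses.

```python
def flatten(value, prefix=""):
    """Turn nested JSON (dicts and lists) into a flat {attribute_name: value} dict."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, name))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalars (strings, numbers, booleans, None) become attribute values.
        flat[prefix] = value
    return flat


def pivot(rows, key_field, value_field):
    """Pivot a list of {key, value} rows into one wide record, one column per key."""
    return {row[key_field]: row[value_field] for row in rows}


# Made-up sample data standing in for a model's JSON reply.
reply = {
    "report": {
        "year": 2024,
        "metrics": [
            {"name": "revenue", "value": 120.5},
            {"name": "net_income", "value": 18.2},
        ],
    }
}

attributes = flatten(reply)                                  # flat attribute names
wide_row = pivot(reply["report"]["metrics"], "name", "value")  # one column per metric
```

The flat form is convenient for inspecting everything the model returned, while the pivoted form is closer to what a spreadsheet or dashboard expects.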

This allowed a document that might take hours to analyze manually to be processed in minutes, enabling faster business decisions.

Learn more in our tutorial: Getting Started with AI in FME: Extracting Insights from Unstructured Documents


Demo: Auto-Classifying and Organizing 1000 PDFs

In the next demo, we organized 1000 random PDFs with no clear file names or structure. Using a combination of FME’s file reader, OpenAI for content analysis, JSONFlatteners, and attribute management tools, the workflow:

  • Categorized documents by type (e.g., legal, financial)
  • Extracted keywords and summaries
  • Renamed files automatically
  • Sorted them into categorized folders

Refining prompts improved consistency in categories, file names, and language detection.
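
The renaming and sorting steps come down to building a safe destination path from the model's output and moving the file there. The following is a rough Python sketch of that idea, assuming the model has already returned a category and a short title for each file; `slugify`, `target_path`, and `organize` are hypothetical helpers, not FME transformers.

```python
import re
import shutil
from pathlib import Path


def slugify(text: str, max_length: int = 60) -> str:
    """Reduce free text to a lowercase, hyphen-separated, file-system-safe name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_length].rstrip("-") or "untitled"


def target_path(root, source, category: str, title: str) -> Path:
    """Build <root>/<category>/<title><original extension> for a classified file."""
    suffix = Path(source).suffix.lower()
    return Path(root) / slugify(category) / f"{slugify(title)}{suffix}"


def organize(source, root, category: str, title: str) -> Path:
    """Move a file into its category folder under a descriptive name."""
    destination = target_path(root, source, category, title)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(destination))
    return destination
```

Normalizing the category before using it as a folder name means small variations in the model's wording (capitalization, stray punctuation) still land in the same folder. This sketch does not handle two files that end up with the same title; a real workflow would need a rule for that.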


Demo: Semantic Search with Embeddings

The next demo covered how to enable semantic search by embedding unstructured text and storing it in a Postgres database. The use case involved maintenance logs in Word format, which were turned into searchable vectors. Steps:

  1. Read Word tables using FME’s new Microsoft Word Reader
  2. Turn each entry into an embedding using OpenAI
  3. Store embeddings in Postgres
  4. Build a search interface using plain-language queries

This lightweight, fast vector search allowed users to find relevant results based on meaning, not just exact keywords, surfacing entries about “erosion” when searching for “metal decay.” The demo also addressed data privacy by running models locally (e.g., with Ollama), and the approach scales to use cases like customer chats, legal docs, or field notes.
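
The core of semantic search is ranking stored entries by how close their embedding vectors are to the query's vector. The sketch below shows that ranking step with cosine similarity in plain Python. The three-dimensional vectors and log entries are made up for illustration; real embeddings come from a model, have hundreds or thousands of dimensions, and in the demo live in Postgres rather than in a Python list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, entries, top_k=3):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in entries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Toy data: (log text, pretend embedding). These values are invented.
maintenance_log = [
    ("Erosion noted on pump casing", [0.9, 0.1, 0.0]),
    ("Replaced cabin air filter", [0.1, 0.9, 0.1]),
    ("Surface wear along pipe bend", [0.8, 0.2, 0.1]),
]

# Pretend embedding of the query "metal decay".
query = [1.0, 0.0, 0.0]
results = search(query, maintenance_log, top_k=2)
```

Because the ranking depends only on vector direction, an entry can score highly without sharing a single word with the query, which is what lets plain-language questions find contextually relevant records.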


Demo: Image Metadata Extraction and Trust Scoring

The final demo focused on photo metadata validation, using images submitted through a Survey123 app. The goal was to determine whether a reported issue was trustworthy. Steps:

  • Extract Exif metadata from submitted images
  • Compare it to reported locations and timestamps
  • Run LLM analysis to check if the image matches the report
  • Calculate a confidence score based on metadata consistency

FME generated PDF reports for high, medium, and low-confidence submissions, helping field crews prioritize real issues.
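
One way to picture the metadata-consistency part of the score is as a blend of two checks: how far the photo's Exif location is from the reported location, and how far apart the two timestamps are. The Python sketch below is a simplified illustration, not the scoring used in the demo. The thresholds (200 m, 24 hours), the equal weighting, the band cut-offs, and the dictionary keys are all assumptions, and the LLM image check is left out.

```python
import math
from datetime import datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def trust_score(exif, report, max_distance_m=200.0, max_hours=24.0):
    """Score from 0 to 1: higher when Exif location and time agree with the report."""
    distance = haversine_m(exif["lat"], exif["lon"], report["lat"], report["lon"])
    hours = abs((report["time"] - exif["time"]).total_seconds()) / 3600
    location_score = max(0.0, 1.0 - distance / max_distance_m)
    time_score = max(0.0, 1.0 - hours / max_hours)
    return round(0.5 * location_score + 0.5 * time_score, 3)


def band(score):
    """Map a score to the high / medium / low bands used for reporting."""
    if score >= 0.75:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"


# Made-up submission: the photo and the report agree exactly.
exif = {"lat": 49.2827, "lon": -123.1207, "time": datetime(2025, 5, 1, 10, 0)}
report = {"lat": 49.2827, "lon": -123.1207, "time": datetime(2025, 5, 1, 10, 0)}
label = band(trust_score(exif, report))
```

Keeping the location and time checks as separate sub-scores makes it easier to explain why a submission landed in a given band.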

Metadata is often messy but valuable, so it’s important to clean it, enrich it, and use it to validate claims and reduce false leads.


Learn More

FME works with any data and any AI, making workflows truly flexible. It’s easier than ever to transform unstructured data into actionable insights, which means there is a lot of potential for the 80% of data that was previously unusable by traditional data processing pipelines.

Get started with our free resources such as the FME Academy and Accelerator courses.
