Text extraction methods and integration in Thinkwise

  • August 1, 2024
  • 2 replies
  • 57 views

Hi,

We’re seeking advice on extracting text from documents. Specifically, we’re interested in:

1. What methods are best for extracting text from various document types? For example, should we use OCR for scanned documents, or are there other methods for Office documents and PDFs?


2. How can we incorporate text extraction functionality into Thinkwise? Is there a system task or approach already available for this?

 

Any insights or solutions for this one? Thanks.

This topic has been closed for replies.

Mark Jongeling
Administrator

Hi,

The platform can integrate with a third-party service that extracts text from documents. Jasper has mentioned a couple here:

For example, you can opt to use Klippa DocHorizon.

I believe this can be done using a Web connection inside a process flow. The process flow sends the file to Klippa for OCR, and Klippa returns the file's contents so you can process them in your application.
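
To make the round trip concrete, here is a rough sketch in Python of what such a call could look like outside the platform: upload the file, then read the text from the JSON reply. The URL, the `X-Api-Key` header, the `document` form field and the `{"pages": [{"text": ...}]}` response shape are all placeholders I made up for illustration, not Klippa's actual API, so check the provider's documentation for the real values.

```python
import json
import uuid
import urllib.request

# Hypothetical endpoint and header name; replace with the values from
# the OCR provider's API documentation.
OCR_URL = "https://ocr.example.com/v1/extract"
API_KEY_HEADER = "X-Api-Key"


def build_ocr_request(filename: str, content: bytes, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one document."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        # "document" is an assumed form field name.
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        API_KEY_HEADER: api_key,
    }
    return urllib.request.Request(OCR_URL, data=body, headers=headers, method="POST")


def extract_text(response_body: bytes) -> str:
    """Join the page texts from an assumed reply shape: {"pages": [{"text": "..."}]}."""
    payload = json.loads(response_body)
    return "\n".join(page.get("text", "") for page in payload.get("pages", []))


def ocr_document(filename: str, content: bytes, api_key: str) -> str:
    """Send the document and return the extracted text (performs a network call)."""
    request = build_ocr_request(filename, content, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_text(response.read())
```

In a process flow the same two steps would map onto the Web connection call and whatever follows it to store the returned text; the split between building the request and parsing the reply is only there so each half can be checked separately.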

I'm curious about other solutions of our Community members 😄


Well, I’m curious too… 😀 Thanks a lot for the answer.