Hi,

We’re seeking advice on extracting text from documents. Specifically, we’re interested in:

1. What methods are best for extracting text from various document types? For example, should we use OCR for scanned documents, or are there other methods for Office documents and PDFs?


2. How can we incorporate text extraction functionality into Thinkwise? Is there a system task or approach already available for this?

 

Any insights or solutions for this one? Thanks.

Hi,

The platform can integrate with a third-party service that extracts text from documents. Jasper has mentioned a couple here:

For example, you can opt to use Klippa DocHorizon.

I believe this can be done using a Web connection inside a process flow: you send the file to Klippa for OCR, and it returns the contents of the file for you to process in your application.
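To make the round trip a bit more concrete, here is a rough Python sketch of what such a call looks like outside the platform. The endpoint URL, the `X-Api-Key` header name, and the `{"text": ...}` response shape are all made up for illustration and are not Klippa's actual API, so check the provider's documentation for the real values.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint and header name; take the real ones from the provider's API docs.
OCR_URL = "https://ocr.example.com/v1/extract"
API_KEY_HEADER = "X-Api-Key"


def build_ocr_request(file_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Wrap the document in a JSON body (base64-encoded) ready to POST."""
    payload = {
        "filename": filename,
        "content": base64.b64encode(file_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def extract_text(response_body: bytes) -> str:
    """Pull the recognised text out of an assumed {"text": "..."} response."""
    return json.loads(response_body).get("text", "")


def ocr_document(file_bytes: bytes, filename: str, api_key: str) -> str:
    """Send the file to the OCR service and return the extracted text."""
    request = build_ocr_request(file_bytes, filename, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_text(response.read())
```

In a process flow the same three steps would apply: encode the file, post it over the Web connection, and map the returned text into your application.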

I'm curious about other solutions from our Community members 😄


Well, I’m curious too… 😀 Thanks a lot for the answer.