Blog

Natural language search using generative AI

  • 22 November 2023
  • 3 replies
  • 302 views
Userlevel 3
Badge +1

Search is one of the most used functions of any software application. But there are many ways to implement it. In this video I want to highlight a new way of implementing search using the large language model embedding integration that the Thinkwise Software Factory provides. This allows users to search by using natural language.
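
The video walks through the setup in the Software Factory itself. Purely as a conceptual illustration of what such a search boils down to, here is a rough sketch outside the platform (assuming the openai Python client and numpy; the model name and variable names are just examples): embed the stored content and the user's natural-language query, then rank by cosine similarity.

```python
# Conceptual sketch only: the Software Factory connector handles the embedding
# calls for you. The model name, client usage and document list are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def search(query: str, documents: list[str], top_k: int = 5) -> list[tuple[float, str]]:
    # In a real application the document vectors are computed once and stored.
    doc_vectors = np.array([embed(doc) for doc in documents])
    query_vector = embed(query)
    # Cosine similarity between the natural-language query and every document.
    scores = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    ranked = sorted(zip(scores.tolist(), documents), reverse=True)
    return ranked[:top_k]
```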

 

 


3 replies

Userlevel 5
Badge +16

Hi @Remco Kort

Do you have more information on how to create the base vectors for the searchable content? We have, for example, a database full of knowledge items and services; each consists of a title, an intro and a full description.

I have some doubts about how to structure the embedding request:

  • TW only supports a single string as text input, right? So in this case I should concatenate the title, intro and full description?
  • How does TW/OpenAI treat HTML tags? Do they need to be stripped upfront?
  • There is a max input (in tokens); are these the same as characters? And what needs to happen when you have an item that exceeds this max input?
  • Is there a smart way to validate whether a vector is still up to date? Or do you just need to track updates on the content that was embedded?

Do you have more input or information to share? 

Userlevel 3
Badge +1


  1. Correct, I think that would probably work for your use case. But I believe that the ESG app made by Thinkwise uses larger amounts of data, so maybe ask them (Sander Kiesbrink, Anne Buit) as well.
  2. As far as I am aware, Thinkwise sends the process action input to OpenAI as-is, so I imagine that the HTML will be sent along too. I have not done this personally, so I don't know how OpenAI would treat this data.
  3. If you have an item that exceeds the maximum number of tokens, you will most likely need to find a way to chop the data up into smaller chunks (see the sketch after this list). How to do this will depend on your specific data: maybe cut it into chapters, paragraphs, etc. Tokens are not the same as characters; for more information see: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
  4. I currently just regenerate the vector after every change made to the data. Perhaps you could store a vector generation datetime and compare that to the updated_on field of your data with a system flow or task if you want more certainty.
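
To illustrate point 3, here is a minimal sketch of the chunking idea (this is not Software Factory code; it assumes the tiktoken package, and the field names are just your example fields): concatenate the fields, strip the HTML, and split on token boundaries so every chunk stays under the model's limit.

```python
# Rough sketch: concatenate the fields, strip HTML, and split into chunks that
# stay under the embedding model's token limit. Field names are illustrative.
import re
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")   # tokenizer used by OpenAI embedding models
MAX_TOKENS = 8000                            # keep a margin below the model's token limit

def strip_html(text: str) -> str:
    """Very naive tag stripper; a real HTML parser is safer for production data."""
    return re.sub(r"<[^>]+>", " ", text)

def build_chunks(title: str, intro: str, description: str) -> list[str]:
    # One concatenated string per item, since the embed call takes a single text input.
    full_text = strip_html(f"{title}\n{intro}\n{description}")
    tokens = ENC.encode(full_text)
    # Split on token boundaries; splitting on chapters or paragraphs keeps chunks more coherent.
    return [ENC.decode(tokens[i:i + MAX_TOKENS]) for i in range(0, len(tokens), MAX_TOKENS)]
```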
Userlevel 5
Badge +16

A small update after some testing:

  • You have to strip HTML tags, as some elements are forbidden, and everything you send over is treated as text and is therefore used for matching.
  • When you have a larger dataset to check, the similarity check takes quite some time. I moved the check from the function to the view and that speeds things up quite a bit, but a search against 255 documents (vector embeddings) still takes up to 3 seconds. So I would advise doing the similarity check in the background to reduce loading time for the user (see the sketch below for what that check boils down to).
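
For reference, outside the platform the check itself boils down to one matrix-vector product once the stored vectors are normalised, which is also why running it batched or in the background is cheap. A rough numpy sketch (the file name and shapes are assumptions, not how the SF stores the vectors):

```python
# Rough sketch, not the SF view/function: with all stored embeddings normalised
# up front, scoring one query against every document is a single matrix product.
import numpy as np

doc_matrix = np.load("doc_embeddings.npy")  # assumed file; shape (n_docs, dim)
doc_matrix /= np.linalg.norm(doc_matrix, axis=1, keepdims=True)

def rank(query_vector: np.ndarray, top_k: int = 10) -> np.ndarray:
    query_vector = query_vector / np.linalg.norm(query_vector)
    scores = doc_matrix @ query_vector          # cosine similarity for all documents at once
    return np.argsort(scores)[::-1][:top_k]     # indices of the best-matching documents
```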

For the rest it works quite nicely. Good to have this connector out of the box.

@Anne Buit @Sander Kiesbrink, any tips on handling larger documents?
