Solved

Bulk import CSV / Excel data

  • 2 December 2019
  • 4 replies
  • 93 views

Userlevel 2
Badge +5

I'm trying to let the end user import ‘large’ sets of data (CSV/Excel) into the database via the Web GUI. It's not that large as in this (https://community.thinkwisesoftware.com/development-13/best-practice-for-inserting-large-amounts-of-rows-into-one-table-837) thread, but large enough to let the web interface time-out. The problem then is that the import is partially done: It could have imported for example 3k of 50k records. The end-user doesn't know what to do: What part is partially imported? And how to fix it?

I'm trying to speed this up, but I can't really increase it's performance. When I take a look at the debugger in de Windows GUI I see a lot of database-calls per row. For each row the default and layout concept are triggered, and the record is inserted per piece. Imagine you want to insert 50k records, and each cycle takes only 5ms, but when it's multiplied with 50k it's 250 seconds. From a user point of view this is absolutely unacceptable (besides it probably crashes in the web interface).

I have thought of disabling the logic concepts, but the problem still remains. And disabling these concepts introduces other problems, so it's not really a good idea to do.

I'm not sure if it's an feature request, but this part (the Import) could be improved a lot. If I compare a 50k set that is imported via SqlBulkCopy.WriteToServer it takes only two seconds. Due to the fact that this operation is set-based and does not need unneccesary roundtrips to the server. My suggestion would be to do have a “Bulk” Excel/CSV import that is set-based, ignoring the concepts and inserting set-based, not loop-based.

Are there any other possibilities within the Thinkwise platform to increase the performance of the Import feature, in specific the web GUI?

icon

Best answer by Vincent Doppenberg 3 December 2019, 10:51

Hi René,

I think that both asynchronous imports and transactional imports are nice feature requests. An idea has already been posted for asynchronous imports, you can vote on it here. I recommend posting an idea for transactional imports, if you would like to see this feature.

While asynchrony will solve the timeout issue that you are facing, I don't think it's a complete solution to the problem, which is that imports are simply quite slow when the number of records is very high (i.e. >10k). I think that there's a trade-off here between applying application logic to each record and importing records very quickly. I don't think that either option is a one size fits all solution, so it might be worthwhile to submit an idea to support both and making it configurable.

It is already possible to avoid the timeout by the way. You can set the executionTimeout option in the web.config of the Web GUI. It is not ideal, since this is an application-wide setting and it won't prevent the Web GUI from being blocked for a long time, but it should allow the import to finish.

View original

4 replies

Userlevel 4
Badge +2

Hi René,

The Web GUI does not support bulk imports in a set-based manner. I think there are a few things you can do to speed up the process, but if the order of magnitude is 50k records, then I think our natively supported features will always remain slow. So, if you want native support for fast, bulk imports, then this would indeed be a feature request.

With that said, I think you can actually implement this feature fairly easily yourself. You can for instance create a task with a file upload parameter to accept a CSV file and write it to a location where the database can access it. Then, with a process flow, automatically call another task which performs a bulk insert statement, using the uploaded CSV file.

If you don't want the database server to have access to the file system, then you can probably still achieve a reasonable level of performance by sending the CSV data itself to the task (by using a Read File connector in the process flow), parsing it to a tabular structure and using the select into statement.

Userlevel 2
Badge +5

Hi René,

The Web GUI does not support bulk imports in a set-based manner. I think there are a few things you can do to speed up the process, but if the order of magnitude is 50k records, then I think our natively supported features will always remain slow. So, if you want native support for fast, bulk imports, then this would indeed be a feature request.

With that said, I think you can actually implement this feature fairly easily yourself. You can for instance create a task with a file upload parameter to accept a CSV file and write it to a location where the database can access it. Then, with a process flow, automatically call another task which performs a bulk insert statement, using the uploaded CSV file.

If you don't want the database server to have access to the file system, then you can probably still achieve a reasonable level of performance by sending the CSV data itself to the task (by using a Read File connector in the process flow), parsing it to a tabular structure and using the select into statement.

Hi,

Problem with developing a solution ourself is that it's to complex for the end user. The simplicity of a generic import is easier to understand for an end user. Next to CSV, also Excel files should be supported. Besides that I'm bound to an Azure platform, so file handling isn't the best solution. Building a custom import for generic formats (Excel/CSV) is not the optimal use of the Thinkwise platform I think.

Regarding the feature request, it should be bulk import or not having a time out (which is the big major issue, solved by an asynchronous import that does not time out, and is transactional (!)). Two complete different approaches, I'm not sure yet what is the best. Any thoughts on that?

Userlevel 4
Badge +2

Hi René,

I think that both asynchronous imports and transactional imports are nice feature requests. An idea has already been posted for asynchronous imports, you can vote on it here. I recommend posting an idea for transactional imports, if you would like to see this feature.

While asynchrony will solve the timeout issue that you are facing, I don't think it's a complete solution to the problem, which is that imports are simply quite slow when the number of records is very high (i.e. >10k). I think that there's a trade-off here between applying application logic to each record and importing records very quickly. I don't think that either option is a one size fits all solution, so it might be worthwhile to submit an idea to support both and making it configurable.

It is already possible to avoid the timeout by the way. You can set the executionTimeout option in the web.config of the Web GUI. It is not ideal, since this is an application-wide setting and it won't prevent the Web GUI from being blocked for a long time, but it should allow the import to finish.

Userlevel 3
Badge +3

In case to should build async import in the Web GUI itself.

 

You should think about the load an action like this will cause on your Azure environment, especially when multiple users will run it separately. This will be very costly moments if you Azure environment manage to scale up fast enough.

 

Regards Erik

Reply