Solved

Indicium API performance - large table transfer


Hugo Nienhuis
Superhero

For BI purposes we want to make a call on the Indicium API and retrieve a large table from our application. The table has over 22 million records. We did some testing and the throughput is approximately 1 million records per 12.5 minutes, so we estimate almost 5 hours of total transfer time. We want to speed up the process; it is a nightly job. We are also looking into transferring the data during office hours, but the GUI becomes unresponsive during the process.

 

My questions

  1. How can we increase the performance?
  2. Is it possible to create a second endpoint to relieve the pressure on the GUI endpoint?
  3. If we were to upload the data to another API, would that be faster or put less strain on Indicium?

 

 

 

Best answer by Vincent Doppenberg


5 replies

  • Moderator
  • May 14, 2025

Is it really necessary to get all 22 million records each night? I can’t imagine all of them have changed during the day.

Making a copy of the database solely for the BI process would probably also work, depending on the use case.
You could either back up and restore a copy of the database each night, or sync only the changed records each night.
If there is a trace column on the table, you could filter on that, or add a checkbox to indicate whether a row was updated.
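
A minimal sketch of that incremental approach, assuming the table has a trace column such as `update_date_time` and reading over a direct `pyodbc` connection to SQL Server; the column, table and connection details are assumptions and need to be adapted to your environment:

```python
import pyodbc
from datetime import datetime

# Assumed connection string, table and trace column -- adjust to your environment.
CONN_STR = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"

def fetch_changed_rows(last_sync: datetime):
    """Yield only the rows modified since the previous sync, based on a trace column."""
    query = """
        SELECT *
        FROM dbo.large_table
        WHERE update_date_time > ?
    """
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute(query, last_sync)
        while True:
            rows = cursor.fetchmany(5000)  # stream in batches instead of loading everything
            if not rows:
                break
            yield from rows

# Example: only transfer what changed since the last nightly run.
for row in fetch_changed_rows(datetime(2025, 5, 13, 20, 0)):
    pass  # hand the row to the BI load process
```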

 


Hugo Nienhuis
Superhero

Unfortunately it is not that simple; we are dealing with a large number of records. We run a lot of calculations within a time window between 20:00 and 24:00; one of the results is this 22-million-record table, which is built up from scratch. This is done monthly at night, not every night. However, we need to transfer the data to our BI. Up to now, they have been using the API to retrieve various tables/data. The amount of data is increasing significantly and we need a better-performing solution to transfer all the data (there are also several nightly jobs apart from this one). So my questions regarding this are: how fast can Indicium provide/transfer all the data, what configurations are possible, is it possible to speed this up, or is Indicium not the right choice for handling this amount of data? Would a better app plan on Azure make a big difference?

We are also looking into dumping the table to blob storage and transferring it in an alternative way.
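
For reference, one possible shape for that blob-storage route, sketched with the `azure-storage-blob` package; the connection string, container and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection details -- use your own storage account and container.
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
CONTAINER = "bi-exports"

def upload_export(local_path: str, blob_name: str) -> None:
    """Upload a previously exported data file (e.g. CSV) to Azure Blob Storage."""
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob = service.get_blob_client(container=CONTAINER, blob=blob_name)
    with open(local_path, "rb") as data:
        # upload_blob streams the file in chunks, so the full export
        # does not have to fit in memory.
        blob.upload_blob(data, overwrite=True)

upload_export("large_table.csv", "2025-05/large_table.csv")
```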

 

I just hope to get some answers here so we can decide which way to go.


An interesting topic, which I am curious about as well. We also get more and more requests for analytical data, and that requires passing on a lot of information from our central application, also resulting in record counts in the millions.

The viability of Indicium as the connector is part of our considerations. I wonder what the answers to Hugo’s questions will be.


Vincent Doppenberg

Hello ​@Hugo Nienhuis,

I would say that, as a RESTful API, Indicium is not the recommended medium for transferring 22 million records from the database to some BI system. This is simply not the type of operation that Indicium is optimized for. Indicium operates on the assumption that data is retrieved one page at a time or at least with a top statement of a few hundred or a few thousand records. We are actually thinking of having Indicium enforce a maximum number of records to be returned per request and automatically providing a link to the next ‘page’. The reason for this is that Indicium needs to build the full response object in memory before serializing it to the client as JSON.

JSON itself is also not ideal at the scale of 22 million records, since it introduces quite a lot of overhead by including column names for every record. Comparatively, even a format like CSV is more efficient for large data transfers.
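
As a rough illustration of that overhead, using synthetic rows rather than your actual schema:

```python
import csv
import io
import json

# Synthetic rows standing in for the real table.
rows = [{"id": i, "customer_code": f"C{i:07d}", "amount": 12.34} for i in range(10_000)]

# JSON repeats every column name for every record.
json_size = len(json.dumps(rows).encode("utf-8"))

# CSV writes the column names only once, in the header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_size = len(buffer.getvalue().encode("utf-8"))

print(f"JSON: {json_size:,} bytes, CSV: {csv_size:,} bytes")
# For wide tables the JSON payload is typically several times larger.
```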

You wrote:

We did some testing and the throughput is approximately 1 million records per 12.5 minutes, so we estimate almost 5 hours of total transfer time.

I’m not sure how big your test sample was, but I don’t think you can extrapolate it to 5 hours for 22 million records, as you’ll most likely run into memory problems.

Using pagination to request 500 records at a time would prevent memory problems, but it multiplies the per-request overhead by 44,000, so this would most likely slow down the throughput a fair bit.
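
To make that concrete, here is a client-side sketch of such sequential pagination, assuming the resource accepts OData-style `$top`/`$skip` parameters and returns its records in a `value` array; the URL and credentials are placeholders:

```python
import requests

BASE_URL = "https://myserver/indicium/iam/appl/large_table"  # placeholder resource URL
PAGE_SIZE = 500

session = requests.Session()
session.auth = ("user", "password")  # placeholder credentials

def fetch_all_pages():
    """Fetch the table one page at a time; every page is a separate round trip."""
    skip = 0
    while True:
        resp = session.get(BASE_URL, params={"$top": PAGE_SIZE, "$skip": skip})
        resp.raise_for_status()
        page = resp.json().get("value", [])
        if not page:
            break
        yield from page
        skip += PAGE_SIZE  # 22M records / 500 per page = 44,000 requests
```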

If you insist on using Indicium for this purpose, then it would be a good idea to consider pagination, if the client allows it. Aside from that, it’s important to investigate where the bottlenecks are. A memory bottleneck, for example, could lead to swapping, which will greatly reduce throughput. But there could also be a CPU bottleneck during serialization to JSON, or a bottleneck on the database server. Depending on that, it’s possible that some measures could be taken to improve the transfer speed. Having the Indicium that serves the GUI on a separate web server will ensure its stability, unless it’s the database server that’s having a hard time with the processing. Scaling Indicium and requesting multiple pages in parallel can also greatly increase throughput, if the BI client is able to do such a thing.
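
And a sketch of fetching pages in parallel instead, under the same assumptions about `$top`/`$skip` and the response shape; the worker count, page size and total record count are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://myserver/indicium/iam/appl/large_table"  # placeholder resource URL
PAGE_SIZE = 5_000
TOTAL_RECORDS = 22_000_000  # known or estimated up front

def fetch_page(skip: int) -> list:
    """Fetch a single page of records starting at the given offset."""
    resp = requests.get(
        BASE_URL,
        params={"$top": PAGE_SIZE, "$skip": skip},
        auth=("user", "password"),  # placeholder credentials
    )
    resp.raise_for_status()
    return resp.json().get("value", [])

# A handful of workers keeps several requests in flight without overloading the server.
offsets = range(0, TOTAL_RECORDS, PAGE_SIZE)
with ThreadPoolExecutor(max_workers=8) as pool:
    for page in pool.map(fetch_page, offsets):
        pass  # hand each page to the BI load process
```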

With that said, I recommend using a different solution altogether, rather than trying to make this work via Indicium. There are ETL/ELT solutions available that are designed specifically to handle this type of operation. Some good examples are Azure Data Factory or SQL Server Integration Services. But even exporting the data to CSV and then importing it would be preferable.
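
For comparison, a plain export-to-CSV sketch that reads straight from the database with `pyodbc` and bypasses Indicium entirely; the connection string and table name are assumptions:

```python
import csv

import pyodbc

CONN_STR = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"

def export_to_csv(table: str, path: str) -> None:
    """Stream a table to CSV in batches so memory use stays flat."""
    with pyodbc.connect(CONN_STR) as conn, open(path, "w", newline="") as f:
        cursor = conn.cursor()
        cursor.execute(f"SELECT * FROM {table}")
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])  # header row
        while True:
            rows = cursor.fetchmany(10_000)
            if not rows:
                break
            writer.writerows(rows)

export_to_csv("dbo.large_table", "large_table.csv")
```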

I hope this helps.


Hugo Nienhuis
Superhero

Thank you for your reply Vincent. I will discuss this in our team and find a way forward.



