Hello @Hugo Nienhuis,
I would say that, as a RESTful API, Indicium is not the recommended medium for transferring 22 million records from the database to some BI system. This is simply not the type of operation that Indicium is optimized for. Indicium operates on the assumption that data is retrieved one page at a time or at least with a top statement of a few hundred or a few thousand records. We are actually thinking of having Indicium enforce a maximum number of records to be returned per request and automatically providing a link to the next ‘page’. The reason for this is that Indicium needs to build the full response object in memory before serializing it to the client as JSON.
JSON itself is also not ideal at the scale of 22 million records, since it introduces quite a lot of overhead by including column names for every record. Comparatively, even a format like CSV is more efficient for large data transfers.
You wrote:
We did some testing and the throughput is appr. 1 million records per 12,5 minutes, so we estimate almost 5 hours total transfer time.
I’m not sure how big your test sample was, but I don’t think you can extrapolate it to 5 hours for 22 million records, as you’ll most likely run into memory problems.
Using pagination to request 500 records at a time would prevent memory problems, but causes the overhead of sending the request to be duplicated 44000 times, so this would most likely slow down the throughput a fair bit.
If you insist on using Indicium for this purpose, then it would be a good idea to consider pagination, if the client allows this. Aside from that it’s important to investigate where the bottlenecks are. A memory bottleneck for example could lead to swapping which will greatly reduce throughput. But there could also be CPU bottleneck during serialization to JSON or a bottleneck on the database server. Depending on that, it’s possible that some measures could be taken to improve the transfer speed. Having the Indicium that serves the GUI on a separate web server will ensure its stability, unless it’s the database server that’s having a hard time with the processing. Scaling Indicium and requesting multiple pages in parallel can also greatly increase throughput, if the BI client is able to do such a thing.
With that said, I recommend using a different solution altogether, rather than trying to make this work via Indicium. There are ETL/ELT solutions available that are designed specifically to handle this type of operation. Some good examples are Azure Data Factory or SQL Server Integration Services. But even exporting the data to CSV and then importing it would be preferable.
I hope this helps.