Question

sending a pdf with the http connector

  • 12 February 2022
  • 7 replies

Userlevel 3
Badge +7

I am trying to upload a pdf file to an external server. The http-connector should send a multipart/form-data message consisting of an application/json part and an application/pdf part.

I have constructed the message and I am able to upload a file to the server. However, the pdf file seems to be corrupted or somehow transferred incorrectly.

I am reading the pdf file as a varbinary and since I am unable to insert a binary in the content-variable of the http-connector, I am forced to convert the varbinary to a varchar. I use the following code to do so:

convert(varchar(max), @file_data, 1)

To compare, I sent the message both from the Thinkwise http-connector and from Postman to RequestBin.com. The messages arrive differently:

 

From Postman, the raw message looks like this (only part of the data copied):

0000000 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d
0000010 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 32 34 33 39
0000020 37 33 36 39 30 35 31 32 39 35 31 33 37 37 36 38
0000030 31 37 34 32 0d 0a 43 6f 6e 74 65 6e 74 2d 44 69
0000040 73 70 6f 73 69 74 69 6f 6e 3a 20 66 6f 72 6d 2d
0000050 64 61 74 61 3b 20 6e 61 6d 65 3d 22 70 61 79 6c
0000060 6f 61 64 22 0d 0a 0d 0a 64 69 74 20 69 73 20 74
0000070 65 6b 73 74 31 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d
0000080 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d
0000090 2d 2d 2d 32 34 33 39 37 33 36 39 30 35 31 32 39
00000a0 35 31 33 37 37 36 38 31 37 34 32 0d 0a 43 6f 6e
00000b0 74 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e
00000c0 3a 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d
00000d0 65 3d 22 66 69 6c 65 22 3b 20 66 69 6c 65 6e 61
00000e0 6d 65 3d 22 4f 42 56 2d 32 32 32 32 30 35 31 30
00000f0 30 20 2d 20 43 68 61 6e 74 61 6c 73 20 43 6f 75
0000100 6e 74 65 72 20 20 2e 70 64 66 22 0d 0a 43 6f 6e
0000110 74 65 6e 74 2d 54 79 70 65 3a 20 61 70 70 6c 69
0000120 63 61 74 69 6f 6e 2f 70 64 66 0d 0a 0d 0a 25 50
0000130 44 46 2d 31 2e 37 0d 25 e2 e3 cf d3 0d 0a 34 32
0000140 20 30 20 6f 62 6a 0d 3c 3c 2f 4c 69 6e 65 61 72
0000150 69 7a 65 64 20 31 2f 4c 20 39 36 37 30 38 2f 4f
0000160 20 34 34 2f 45 20 37 31 31 36 30 2f 4e 20 35 2f
0000170 54 20 39 36 33 34 36 2f 48 20 5b 20 34 38 35 20
0000180 32 31 32 5d 3e 3e 0d 65 6e 64 6f 62 6a 0d 20 20
0000190 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
00001a0 0d 0a 35 35 20 30 20 6f 62 6a 0d 3c 3c 2f 44 65

and the 'pretty' view looks like this (also partially copied):

----------------------------243973690512951377681742
Content-Disposition: form-data; name="payload"

dit is tekst1
----------------------------243973690512951377681742
Content-Disposition: form-data; name="file"; filename="OBV-222205100 - Chantals Counter .pdf"
Content-Type: application/pdf

%PDF-1.7.%..... 42 0 obj.<</Linearized 1/L 96708/O 44/E 71160/N 5/T 96346/H [ 485 212]>>.endobj. . 55 0 obj.<</DecodeParms<</Columns 5/Predictor 12>>/Filter/FlateDecode/ID[<BEE714990D59B546AF20ED184B313FB3><BEE714990D59B546AF20ED184B313FB3>]/Index[42 29]/Info 41 0 R/Length 81/Prev 96347/Root 43 0 R/Size 71/Type/XRef/W[1 3 1]>>stream. h.bbd`.``b``Z "..$C9....".sA$.^...........2`..$ ..#.$..;...4..X...j..........`C.". endstream.endobj.startxref. 0. %%EOF. . 70 0 obj.<</Filter/FlateDecode/I 166/Length 130/S 110>>stream. h.b```f``.c`a``[. ....@1..h``h....k.y.pC.....\.....B%..u.w4..Hz..^...

 

When sent from the HTTP connector it looks like this:

--abcde12345
Content-Disposition: form-data; name="payload"
{ "name": "test.pdf", "extract": true, "extractionTypes": [ "TEXT_TAGS" ] }
--abcde12345
Content-Disposition: form-data; name="file"
Content-Type: application/pdf
0x255044462D312E370D25E2E3CFD30D0A343220302…..E6E5DDE5ED5

 

I fear that sending a pdf file this way is not possible, or am I doing something wrong here?
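For reference, style 1 of CONVERT does not copy the bytes into the string; it writes them out as a hexadecimal literal with a 0x prefix, which matches the 0x255044462D... text in the connector output above. The Python sketch below imitates that conversion (the function name is made up for illustration) to show that the server then receives the characters "0x2550..." instead of the pdf signature "%PDF".

```python
def convert_varbinary_style1(data: bytes) -> str:
    """Imitate SQL Server's CONVERT(varchar(max), @bin, 1): '0x' + uppercase hex digits."""
    return "0x" + data.hex().upper()


# First 16 bytes of the pdf, taken from the Postman dump in the question.
pdf_start = b"%PDF-1.7\r%\xe2\xe3\xcf\xd3\r\n"

as_text = convert_varbinary_style1(pdf_start)
sent_bytes = as_text.encode("ascii")  # what ends up in the request body

print(as_text)         # 0x255044462D312E370D25E2E3CFD30D0A
print(sent_bytes[:4])  # b'0x25': the server stores these characters literally
print(pdf_start[:4])   # b'%PDF': what a pdf reader expects at the start
```

The uploaded "pdf" is then a text file of hex digits, 2 + 2n bytes long for an n-byte file, which no pdf reader can open.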

 


7 replies

Userlevel 5
Badge +10

It is difficult to analyze your case with the information available, but here is some advice that may help you figure it out. It's best to send the file as varbinary: instead of converting the file to varchar, convert the headers and separators to varbinary and concatenate everything as binary. Below is an example for you to work with.
 

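-- @pdf_file holds the raw file bytes; one way to fill it (the path is a hypothetical placeholder):
declare @pdf_file varbinary(max) =
    (select BulkColumn from openrowset(bulk 'C:\temp\example.pdf', single_blob) as f);

declare @line_break varbinary(2) = 0x0D0A;
-- Use varchar literals ('...'), not N'...', so each character becomes a single byte
declare @separator varbinary(100) = convert(varbinary(100), '--abcde12345');
declare @final_separator varbinary(100) = convert(varbinary(100), '--abcde12345--');

-- The request's Content-Type header must name the same boundary:
-- multipart/form-data; boundary=abcde12345
declare @request_content varbinary(max) =
@separator + -- Start each block with the multipart separator
@line_break + -- Follow each separator with a line break
convert(varbinary(max), 'Content-Disposition: form-data; name="payload"') + -- The headers
@line_break + -- Always follow the headers with a double line break
@line_break +
convert(varbinary(max), 'dit is tekst1') + -- The data
@line_break + -- Always follow the data with a line break before the next (or final) separator
@separator +
@line_break +
convert(varbinary(max), 'Content-Disposition: form-data; name="file"; filename="OBV-222205100 - Chantals Counter .pdf"') + -- One header per line
@line_break +
convert(varbinary(max), 'Content-Type: application/pdf') +
@line_break + -- Double line break after the last header
@line_break +
@pdf_file + -- The raw pdf bytes, not converted to varchar
@line_break +
@final_separator; -- The final separator is the separator plus two additional dashes at the end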
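To check the byte layout, the same body can be built in Python. This is only a sketch mirroring the T-SQL above, not the connector's own code; the function name build_multipart and the sample bytes are illustrative. Everything stays bytes, so the pdf content is appended unchanged, and the boundary abcde12345 is the one used in the thread.

```python
CRLF = b"\r\n"


def build_multipart(boundary: str, payload: str, filename: str, pdf_bytes: bytes) -> bytes:
    """Build a multipart/form-data body (text part + pdf part) entirely as bytes."""
    separator = b"--" + boundary.encode("ascii")
    file_headers = f'Content-Disposition: form-data; name="file"; filename="{filename}"'
    return b"".join([
        separator, CRLF,
        b'Content-Disposition: form-data; name="payload"', CRLF,
        CRLF,                          # blank line ends the part headers
        payload.encode("utf-8"), CRLF,
        separator, CRLF,
        file_headers.encode("utf-8"), CRLF,
        b"Content-Type: application/pdf", CRLF,
        CRLF,
        pdf_bytes, CRLF,               # raw bytes, never converted to text
        separator, b"--",              # final separator: boundary plus two dashes
    ])


boundary = "abcde12345"
body = build_multipart(boundary, "dit is tekst1", "test.pdf",
                       b"%PDF-1.7\r%\xe2\xe3\xcf\xd3\r\n")
# Header the request itself needs so the server can find the parts.
content_type = f"multipart/form-data; boundary={boundary}"
print(content_type)
```

The design point is the same as in the T-SQL: convert the small text pieces to binary once and concatenate, rather than turning the large binary file into text.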

 

Userlevel 3
Badge +7

Thanks Erwin, I’ll try that and post the results.

Userlevel 3
Badge +7

Using the varbinary data type instead of varchar for the content variable worked fine. I was able to upload the pdf in the correct format using the code above.

However, I had to import the pdf file using SQL (openrowset with single_blob); I couldn't get it to work with the read file connector. Does the connector read a file differently?
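On whether reading a file differently matters: the thread does not say how the read file connector returns its content, so the following is only an assumption about one common cause. If file content passes through a text representation anywhere (a code-page decode, newline translation, a re-encode to UTF-8), binary bytes change, while a binary read such as single_blob returns them untouched. The Python sketch shows both effects on the first bytes of the pdf; the file name sample.pdf is arbitrary.

```python
import os
import tempfile

# First 16 bytes of the pdf from the Postman dump (note the lone \r and the high bytes).
pdf_start = b"%PDF-1.7\r%\xe2\xe3\xcf\xd3\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.pdf")
    with open(path, "wb") as f:
        f.write(pdf_start)

    # Binary read, comparable to openrowset(bulk ..., single_blob): bytes come back unchanged.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text read: \r and \r\n are translated to \n, and each byte becomes a character.
    with open(path, "r", encoding="latin-1") as f:
        as_text = f.read()

# Writing the text back out as UTF-8 turns each character above U+007F into two bytes.
reencoded = as_text.encode("utf-8")

print(as_bytes == pdf_start)           # True
print(reencoded == pdf_start)          # False
print(len(pdf_start), len(reencoded))  # 16 19
```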

Userlevel 5
Badge +10

I see the last question has not been answered yet. Do you still need help with this?

Userlevel 3
Badge +7

Well, it would be nice to know why I cannot import a pdf file using the read file connector. As stated, I import the file in the template with the openrowset statement and that works fine, but using the connector would be better. So if that doesn't work, it would be nice to see it on some to-do list 😉

 

Userlevel 7
Badge +19

The Read file connector is not the same as importing data. The connector only returns the file data to be used in further processing. In SQL you can, for example, convert the received file data to characters in a varchar(max), and then, with some more code, process the file data into tables. Does that answer the question?

The Universal GUI / Indicium does support importing files and is quite powerful; maybe that can help. More information here: https://docs.thinkwisesoftware.com/docs/indicium/importapi

Userlevel 3
Badge +7

The import API is certainly useful, but it does not 'import' a pdf file. The main problem is still that I seem to need the bulk insert to read and process the data from the pdf file correctly. No matter how I configure the read file connector, it does not seem to work in this case. I do not need to import the data; I need to upload it to another server in the correct format. To do that I need to process the file data in a template, because it has to be embedded in a multipart/form-data message as described above.

 

What it all comes down to: why can't I read and then process this pdf correctly with the connector, when it does work with the bulk insert?
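One way to narrow this down is to compare the bytes each read method hands to the template before building the request. In T-SQL, comparing DATALENGTH and HASHBYTES('SHA2_256', ...) of both variables would serve; the Python sketch below shows the same idea with a hypothetical describe helper that reports length, hash, and whether the data starts with the %PDF- signature or with hex text. The two inputs are illustrative, not values taken from the connector.

```python
import hashlib


def describe(data: bytes) -> dict:
    """Summarise file bytes so two read methods can be compared side by side."""
    return {
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "pdf_signature": data.startswith(b"%PDF-"),
        "hex_text": data[:2].lower() == b"0x",
    }


# Hypothetical inputs: bytes from openrowset vs. bytes from another read path.
from_bulk = b"%PDF-1.7\r%\xe2\xe3\xcf\xd3\r\n"
from_other = b"0x255044462D312E370D25E2E3CFD30D0A"

a, b = describe(from_bulk), describe(from_other)
print(a["pdf_signature"], b["pdf_signature"])  # True False
print(a["hex_text"], b["hex_text"])            # False True
print(a["sha256"] == b["sha256"])              # False
```

If the summaries differ, the problem lies in how the file is read rather than in the multipart message.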
