Solved

FTP Connector encoding

  • 2 August 2022
  • 3 replies
  • 117 views

Userlevel 2
Badge +5

Can you tell me how to encode the File data for the FTP connector when using the FTP method 'Upload file (STOR)’?

It seems that the connector uses the UFT-16 Little Endian although the source for the data comes from a VARCHAR.

icon

Best answer by Vincent Doppenberg 16 August 2022, 15:42

View original

This topic has been closed for comments

3 replies

Userlevel 6
Badge +4

Hello Onno,

We can't tell you how you should encode the file data for the FTP connector, because the question is how you want your file data to be encoded. There is no right or wrong answer here, it all depends on the situation.

With that said, encoding is almost never relevant or useful in the various file connectors that are supported by the Thinkwise platform. This question usually arises when an existing file is read by a Read File connector, stored in a VARCHAR process variable, written/uploaded by a Write File or FTP connector and then the final result is a corrupted file which is not the same as the original. If this is the case, then the real question is “How do I prevent files from becoming corrupted by the FTP connector?”. The answer is to use a VARBINARY process variable, instead of a VARCHAR process variable and leave the encoding input parameter blank.

The concept of encoding is only relevant when transferring back and forth between binary and text representations. When dealing with files, it is always better to take a “binary all the way” approach, which eliminates the question of encoding entirely. The encoding input parameter is only useful when dealing with plain-text files (such as .txt, .ini, .json) and only in the cases where you must work with a VARCHAR value (e.g. you already have a VARCHAR value and want to write a plain-text file or you have a plain-text file and you want to do text-based operations on its contents in SQL). Then the question becomes, what is the encoding of your plain-text file or what do you want the encoding of your plain-text file to be? The answer is typically UTF-8 or ANSI, but it really depends on the situation.

I hope this explains a few things. Please let me know if you have further questions about this.

Userlevel 2
Badge +5

Vincent,

Sometimes the encoding is relevant for the process.

In this case it is regarding a XML file. As you know, the header of the XML file contains the encoding: <?xml version="1.0" encoding="utf-8"?>.

Therefor we must be able to have the file encoded properly. However this is not possible with the FTP connector.

We also can not just change the encoding, due to the complete process at customer’s side.

Userlevel 6
Badge +4

Hello Onno,

The key question here is where the XML comes from.

Are you reading an existing XML file with a Read file connector and then uploading it with an FTP connector? If so, then the XML file should already be encoded correctly and reading it into a VARBINARY process variable and then uploading it with the FTP connector will not alter the encoding in any way.

The other option is that there is no existing XML file, but you have a text representation of the XML. For example, an XML or VARCHAR column in the database or a FOR PATH query that produces XML for you. If this is the case, then it’s a bit more tricky because by default VARCHAR uses the ANSI encoding (Windows code page 1252) and NVARCHAR uses the UTF-16 encoding in SQL Server. As of SQL Server 2019 however, there is support for UTF-8 encoding via collations. So you can actually perform UTF-16 NVARCHAR conversions to UTF-8 VARCHAR and then convert to VARBINARY to get a UTF-8 encoded binary representation of the XML text.

CREATE TABLE #temp_utf8
(
col VARCHAR(MAX) COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8
)

INSERT INTO #temp_utf8 VALUES(N'谢谢')

SELECT CAST(t1.col AS VARBINARY(200))
FROM #temp_utf8 t1

I have just tested this myself and the example correctly returns the UTF-8 encoded bytes for those characters.

I hope this helps, please let me know if you have additional questions.