Solved

Base64 encoding


  • Sidekick
  • 11 replies

I'm having some trouble getting things to work. For a webservice we need to base64 encode our message before transmitting it. This worked fine untill we encountered a diacritical mark (ë, î, ú, etc) in the message we want to encode. Then the webservice we're contacting start throwing errors.

I've created a test script to show the exact problem and to provide a starting point to hopefully find a solution to this issue. I contains a sample for a correctly encoded message and an invalid one.

	declare @source varbinary(max)
,@encoded varchar(max)

--SAMPLE MESSAGES ONE WITH AND ONE WITHOUT DIACRITICAL MARKS
DECLARE @message varchar(max) = 'Dit zijn diekrieten ë, î en ú.';
DECLARE @message2 varchar(max) = 'Dit zijn geen diekrieten e, i en u.';

--KNOW CORRECT BASE64 RESULTS AS CALCULATED WITH THE OPENSSL LIBRARY
DECLARE @message_base64 nvarchar(max) = 'RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou';
DECLARE @message2_base64 nvarchar(max) = 'RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=';

--BASE64 ENCODE THE MESSAGE WITH DIACRITICAL MARK
--THIS RESULTS IN A INCORRECT BASE64
SET @source = convert(varbinary(max), @message)
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)')

SELECT @encoded as message, @message_base64 as known_correct_base64;

--BASE64 ENCODE THE MESSAGE WITHOUT DIACRITICAL MARK
--THIS RESULTS IN A CORRECT BASE64
SET @source = convert(varbinary(max), @message2)
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)')

SELECT @encoded as message2, @message2_base64 as known_correct_base64;

The way we calculaten the base64 in normally contained in a function. This function was based of a solution found using the powers of google. Unfortunatly it only works without diacritical marks.

When you run the results for @message thought a service like https://emn178.github.io/online-tools/base64_decode.html then it shows the result is not a utf-8 string. 

Who knows how we can adjust for this? We need te have a valida base64 encoded message because otherwise we're not able to process te information to the external servoce.

icon

Best answer by Anne Buit 8 July 2022, 09:18

View original

6 replies

Userlevel 7
Badge +19

Hi Bas,

Searching a bit online I came across a possible solution: https://dba.stackexchange.com/questions/212160/why-do-i-get-incorrect-characters-when-decoding-a-base64-string-to-nvarchar-in-s

The given solution gives a Tabled-valued function that you can use to decode the incoming Base64. It'll return a table with the correct decoded message:

CREATE FUNCTION dbo.ConvertBase64EncodedUTF8ToUTF16LE
(
@Base64EncodedUTF8String VARCHAR(8000)
)
RETURNS TABLE
AS RETURN

SELECT
CONVERT(NVARCHAR(500),
CONVERT(XML,
'<?xml version="1.0" encoding="UTF-8"?>' +
CONVERT(VARCHAR(500),
CONVERT(XML, @Base64EncodedUTF8String)
.value('.','varbinary(max)')
)
)
) AS [DecodedValue];

GO


select *
from dbo.ConvertBase64EncodedUTF8ToUTF16LE('RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou')
-- Dit zijn diekrieten ë, î en ú.

select *
from dbo.ConvertBase64EncodedUTF8ToUTF16LE('RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=')
-- Dit zijn geen diekrieten e, i en u.

Hope this helps!

Hi Mark,

unfortunatly you example is for decoding a base64 encoded string. We're trying to correctly encode a string. You example will probably be usefull somewhere in the future as we'll probably run into an issue where we need to decode sometime rather sooner then later.

What I need to do is encode the @message string in my example and i already know that @message_base64 is the expected outcome when the @message is utf-8.

 

As far as I understand through reading and googling is that i'm not getting that output because the @message is utf-16 encoded. This isn't a problem as long as there are no diacritical marks in the string (as utf-8 and utf-16 encode those characters exactly the same). But as soon at a diacritical mark is included the encode/decode process fails because the webservice i'm sending it to expects the message to be utf-8 encoded.

Userlevel 6
Badge +4

Hi Bas,

Perhaps you can collate the @message to UTF8. Does the following query pass the test?

declare @data varchar(max) = 'lorem ipsum' collate Latin1_General_100_CI_AI_SC_UTF8

select cast(@data as varbinary(max))
for xml path(''), BINARY BASE64

 

Userlevel 2
Badge +3

My best guess would be to collate to latin1_general_bin2 with @data being nvarchar(max) as input for the casting - this should put out utf16LE into the base64 result.

Userlevel 6
Badge +4

Ok, it turns out that declaring a variable with collation is not enough. You have to actually store it in this collation to have the encoding of utf-8 take effect.

The following example gives the correct results.

declare @source varbinary(max)
,@encoded varchar(max);

-- Create a temp table to store strings as utf8
create table #table_utf8 (
message varchar(max) collate Latin1_General_100_CI_AI_SC_UTF8,
message2 varchar(max) collate Latin1_General_100_CI_AI_SC_UTF8
);

--SAMPLE MESSAGES ONE WITH AND ONE WITHOUT DIACRITICAL MARKS
DECLARE @message varchar(max) = 'Dit zijn diekrieten ë, î en ú.';
DECLARE @message2 varchar(max) = 'Dit zijn geen diekrieten e, i en u.';

--KNOW CORRECT BASE64 RESULTS AS CALCULATED WITH THE OPENSSL LIBRARY
DECLARE @message_base64 nvarchar(max) = 'RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou';
DECLARE @message2_base64 nvarchar(max) = 'RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=';

-- Store the data in the table
insert into #table_utf8 values (@message, @message2);

--BASE64 ENCODE THE MESSAGE WITH DIACRITICAL MARK
--THIS RESULTS IN A (NOW) CORRECT BASE64
SET @source = convert(varbinary(max), (select message from #table_utf8));
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)');

SELECT @encoded as message, @message_base64 as known_correct_base64;

--BASE64 ENCODE THE MESSAGE WITHOUT DIACRITICAL MARK
--THIS RESULTS IN A CORRECT BASE64
SET @source = convert(varbinary(max), (select message2 from #table_utf8));
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)');

SELECT @encoded as message2, @message2_base64 as known_correct_base64;

drop table #table_utf8;

 

Please do take into account that the solution offered by Anne only works on SQL Server 2019 and up. SInce we’re running 2017 we used an external microservice to solve this issue and are planning an upgrade in the upcoming months.

Reply