Skip to main content
Solved

Base64 encoding


Bas
Vanguard
Forum|alt.badge.img+3

I'm having some trouble getting things to work. For a webservice we need to base64 encode our message before transmitting it. This worked fine untill we encountered a diacritical mark (ë, î, ú, etc) in the message we want to encode. Then the webservice we're contacting start throwing errors.

I've created a test script to show the exact problem and to provide a starting point to hopefully find a solution to this issue. I contains a sample for a correctly encoded message and an invalid one.

	declare @source varbinary(max)
			,@encoded varchar(max)

--SAMPLE MESSAGES ONE WITH AND ONE WITHOUT DIACRITICAL MARKS
	DECLARE @message varchar(max) = 'Dit zijn diekrieten ë, î en ú.';
	DECLARE @message2 varchar(max) = 'Dit zijn geen diekrieten e, i en u.';

--KNOW CORRECT BASE64 RESULTS AS CALCULATED WITH THE OPENSSL LIBRARY
	DECLARE @message_base64 nvarchar(max) = 'RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou';
	DECLARE @message2_base64 nvarchar(max) = 'RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=';

--BASE64 ENCODE THE MESSAGE WITH DIACRITICAL MARK
--THIS RESULTS IN A INCORRECT BASE64
    SET @source = convert(varbinary(max), @message)
    SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)')

    SELECT @encoded as message, @message_base64 as known_correct_base64;  

--BASE64 ENCODE THE MESSAGE WITHOUT DIACRITICAL MARK
--THIS RESULTS IN A CORRECT BASE64
    SET @source = convert(varbinary(max), @message2)
    SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)')

    SELECT @encoded as message2, @message2_base64 as known_correct_base64;  

The way we calculaten the base64 in normally contained in a function. This function was based of a solution found using the powers of google. Unfortunatly it only works without diacritical marks.

When you run the results for @message thought a service like https://emn178.github.io/online-tools/base64_decode.html then it shows the result is not a utf-8 string. 

Who knows how we can adjust for this? We need te have a valida base64 encoded message because otherwise we're not able to process te information to the external servoce.

Best answer by Anne Buit

Ok, it turns out that declaring a variable with collation is not enough. You have to actually store it in this collation to have the encoding of utf-8 take effect.

The following example gives the correct results.

declare @source varbinary(max)
		,@encoded varchar(max);

-- Create a temp table to store strings as utf8
create table #table_utf8 (
    message varchar(max) collate Latin1_General_100_CI_AI_SC_UTF8,
    message2 varchar(max) collate Latin1_General_100_CI_AI_SC_UTF8
);

--SAMPLE MESSAGES ONE WITH AND ONE WITHOUT DIACRITICAL MARKS
DECLARE @message varchar(max) = 'Dit zijn diekrieten ë, î en ú.';
DECLARE @message2 varchar(max) = 'Dit zijn geen diekrieten e, i en u.';

--KNOW CORRECT BASE64 RESULTS AS CALCULATED WITH THE OPENSSL LIBRARY
DECLARE @message_base64 nvarchar(max) = 'RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou';
DECLARE @message2_base64 nvarchar(max) = 'RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=';

-- Store the data in the table
insert into #table_utf8 values (@message, @message2);

--BASE64 ENCODE THE MESSAGE WITH DIACRITICAL MARK
--THIS RESULTS IN A (NOW) CORRECT BASE64
SET @source = convert(varbinary(max), (select message from #table_utf8));
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)');

SELECT @encoded as message, @message_base64 as known_correct_base64;  

--BASE64 ENCODE THE MESSAGE WITHOUT DIACRITICAL MARK
--THIS RESULTS IN A CORRECT BASE64
SET @source = convert(varbinary(max), (select message2 from #table_utf8));
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)');

SELECT @encoded as message2, @message2_base64 as known_correct_base64;

drop table #table_utf8;

 

View original
Did this topic help you find an answer to your question?
This topic has been closed for comments

Mark Jongeling
Administrator
Forum|alt.badge.img+23

Hi Bas,

Searching a bit online I came across a possible solution: https://dba.stackexchange.com/questions/212160/why-do-i-get-incorrect-characters-when-decoding-a-base64-string-to-nvarchar-in-s

The given solution gives a Tabled-valued function that you can use to decode the incoming Base64. It'll return a table with the correct decoded message:

CREATE FUNCTION dbo.ConvertBase64EncodedUTF8ToUTF16LE
(
  @Base64EncodedUTF8String VARCHAR(8000)
)
RETURNS TABLE
AS RETURN

    SELECT 
        CONVERT(NVARCHAR(500),
                CONVERT(XML,
                        '<?xml version="1.0" encoding="UTF-8"?>' +
                        CONVERT(VARCHAR(500),
                                CONVERT(XML, @Base64EncodedUTF8String)
                                  .value('.','varbinary(max)')
                               )
                       )
               ) AS [DecodedValue];

GO


select *
from dbo.ConvertBase64EncodedUTF8ToUTF16LE('RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou')
-- Dit zijn diekrieten ë, î en ú.

select *
from dbo.ConvertBase64EncodedUTF8ToUTF16LE('RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=')
-- Dit zijn geen diekrieten e, i en u.

Hope this helps!


Bas
Vanguard
Forum|alt.badge.img+3
  • Vanguard
  • July 7, 2022

Hi Mark,

unfortunatly you example is for decoding a base64 encoded string. We're trying to correctly encode a string. You example will probably be usefull somewhere in the future as we'll probably run into an issue where we need to decode sometime rather sooner then later.

What I need to do is encode the @message string in my example and i already know that @message_base64 is the expected outcome when the @message is utf-8.

 

As far as I understand through reading and googling is that i'm not getting that output because the @message is utf-16 encoded. This isn't a problem as long as there are no diacritical marks in the string (as utf-8 and utf-16 encode those characters exactly the same). But as soon at a diacritical mark is included the encode/decode process fails because the webservice i'm sending it to expects the message to be utf-8 encoded.


Anne Buit
Community Manager
Forum|alt.badge.img+5
  • Community Manager
  • July 7, 2022

Hi Bas,

Perhaps you can collate the @message to UTF8. Does the following query pass the test?

declare @data varchar(max) = 'lorem ipsum' collate Latin1_General_100_CI_AI_SC_UTF8

select cast(@data as varbinary(max)) 
for xml path(''), BINARY BASE64 

 


Ricky
Superhero
Forum|alt.badge.img+8
  • Superhero
  • July 7, 2022

My best guess would be to collate to latin1_general_bin2 with @data being nvarchar(max) as input for the casting - this should put out utf16LE into the base64 result.


Anne Buit
Community Manager
Forum|alt.badge.img+5
  • Community Manager
  • July 8, 2022

Ok, it turns out that declaring a variable with collation is not enough. You have to actually store it in this collation to have the encoding of utf-8 take effect.

The following example gives the correct results.

declare @source varbinary(max)
		,@encoded varchar(max);

-- Create a temp table to store strings as utf8
create table #table_utf8 (
    message varchar(max) collate Latin1_General_100_CI_AI_SC_UTF8,
    message2 varchar(max) collate Latin1_General_100_CI_AI_SC_UTF8
);

--SAMPLE MESSAGES ONE WITH AND ONE WITHOUT DIACRITICAL MARKS
DECLARE @message varchar(max) = 'Dit zijn diekrieten ë, î en ú.';
DECLARE @message2 varchar(max) = 'Dit zijn geen diekrieten e, i en u.';

--KNOW CORRECT BASE64 RESULTS AS CALCULATED WITH THE OPENSSL LIBRARY
DECLARE @message_base64 nvarchar(max) = 'RGl0IHppam4gZGlla3JpZXRlbiDDqywgw64gZW4gw7ou';
DECLARE @message2_base64 nvarchar(max) = 'RGl0IHppam4gZ2VlbiBkaWVrcmlldGVuIGUsIGkgZW4gdS4=';

-- Store the data in the table
insert into #table_utf8 values (@message, @message2);

--BASE64 ENCODE THE MESSAGE WITH DIACRITICAL MARK
--THIS RESULTS IN A (NOW) CORRECT BASE64
SET @source = convert(varbinary(max), (select message from #table_utf8));
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)');

SELECT @encoded as message, @message_base64 as known_correct_base64;  

--BASE64 ENCODE THE MESSAGE WITHOUT DIACRITICAL MARK
--THIS RESULTS IN A CORRECT BASE64
SET @source = convert(varbinary(max), (select message2 from #table_utf8));
SET @encoded = cast('' as xml).value('xs:base64Binary(sql:variable("@source"))', 'varchar(max)');

SELECT @encoded as message2, @message2_base64 as known_correct_base64;

drop table #table_utf8;

 


Bas
Vanguard
Forum|alt.badge.img+3
  • Vanguard
  • July 25, 2022

Please do take into account that the solution offered by Anne only works on SQL Server 2019 and up. SInce we’re running 2017 we used an external microservice to solve this issue and are planning an upgrade in the upcoming months.


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings