Find illegal XML characters in your database

  • 19 May 2020
  • 1 reply
  • 812 views

This is an Archived topic. The solution is available in the Thinkstore inside the Software Factory.

Goal

Many application have functionality for importing or exporting data in XML format. Sometimes you will get an error about an illegal XML character:

XML parsing: line 1, character 2967663, illegal xml character

And then you will have to try to find that character, which is very hard. This solution can help you out.

Solution

This solution creates a stored procedure search_illegal_xml_characters which searches your entire database for illegal XML characters. It is also possible to search in one specific table.

The procedures uses a function, remove_illegal_characters_for_xml, which is also created in this solution. Important: this function does NOT remove illegal characters from your database. It only removes those characters from a string.

Dynamic model add_search_illegal_xml_characters

This dynamic model code creates stored procedure search_illegal_xml_characters and function remove_illegal_characters_for_xml.

If you want to use this procedure as task, create your own task which start the procedure.

Control procedure remove_illegal_characters_for_xml

This control procedure adds a template to function remove_illegal_characters_for_xml.

Template remove_illegal_characters_for_xml

This template provides code for function remove_illegal_characters_for_xml.

Control procedure proc_search_illegal_xml_characters

This control procedure adds three templates to procedure search_illegal_xml_characters.

Template declarations

This template declares needed variables.

Template collect_results

This template searches in a field for an illegal XML character.

Template print_results

This template prints the results to the screen.

In the attachment you will find the code of this solution.


This topic has been closed for comments

1 reply

For future reference, when using this functionality:

USE [PROJECT_SF_ONT]
GO
-- I already know `subroutine_parmtr` has a problem; IAM complained about syncing that table
DECLARE @RC int
DECLARE @project_id [dbo].[project_id] = 'MY_PROG'
DECLARE @project_vrs_id [dbo].[project_vrs_id] = 'TROUBLED_VERSION'
DECLARE @tab_id [dbo].[tab_id] = 'subroutine_parmtr'

EXECUTE @RC = [dbo].[task_search_illegal_xml_characters]
@project_id
,@project_vrs_id
,@tab_id
GO

-- gives a list of subroutine_parmtr records that have a problem, and gives the specific field
-- U+001F:

-- now, for some context, print that field with and without the forbidden characters
select dbo.replace_illegal_characters (alias_subroutine_parmtr_id)
, alias_subroutine_parmtr_id,*
from subroutine_parmtr sp
where project_vrs_id = 'TROUBLED_VERSION'
and dbo.replace_illegal_characters (alias_subroutine_parmtr_id)<>alias_subroutine_parmtr_id

This is what a forbidden character ( U+001F in this case) might look like:” “ (ok, that does not help a lot, it gets filtered out when saving the post)

I have had illegal characters turn up in a variety op places. In the last case they were in the not used and otherwise empty ‘alias_subroutine_parameter_id'.

After editing the record in the SF, ctrl-a and delete the value, the illegal character did not show up when running the query.

When all is clear, the result ends in:


<msg id="no_illegal_xml_character_found"></msg>