Help get this topic noticed by sharing it on Twitter, Facebook, or email.

ContentNote import failing with Unicode characters - Note can't be saved because it contains HTML tags or unescaped characters

I need to import text containing accented/special characters (e.g. á, ä, ü, ß, etc) into the ContentNote object.

The following statement does load, but it is not Unicode friendly, therefore in Salesforce the accented characters are corrupted (question marks on solid rhombus background)


SELECT
CAST('' AS nchar(18)) AS [ID],
CAST('' AS nvarchar(255)) AS Error,
'Note title' as Title,
CAST('áäüß' AS varbinary(max)) AS Content
INTO ContentNote_Load_1

-- Content field gets to be: 0xE1E4FCDF

exec sf_bulkops 'insert', 'salesforce', 'ContentNote_Load_1'


The following statement, which is supposed to conserve the special characters in the Content field, invariably fails:


SELECT
CAST('' AS nchar(18)) AS [ID],
CAST('' AS nvarchar(255)) AS Error,
'Note title' as Title,
CAST(N'áäüß' AS varbinary(max)) AS Content
INTO ContentNote_Load_2

-- Content field gets to be: 0xE100E400FC00DF00

exec sf_bulkops 'insert', 'salesforce', 'ContentNote_Load_2'

The error message is always: "Note can't be saved because it contains HTML tags or unescaped characters that are not allowed in a Note.".

I am aware of the need to escape special characters like < > & ' ", and also to replace line breaks with <br>, but in this case these characters are not present in the source text. Ref: SFDC help article

Here on the forum, I also read a thread where it was suggested to convert to varchar before converting to varbinary, but these are also corrupting the source text.

Did anybody have the same issue? How did you managed to get around it?

My DBamp version is a bit old V3.1.1, but if someone could confirm that a newer version doesn't have this, I will happily upgrade.

Any suggestions greatly appreciated. Many thanks!
1 person has
this problem
+1
Reply
  • If you use the salesforce API, create a note with those characters, and do a select from ContentNote where Id='your new id', what does the Content field look like ?
  • (some HTML allowed)
    How does this make you feel?
    Add Image
    I'm

    e.g. kidding, amused, unsure, silly happy, confident, thankful, excited sad, anxious, confused, frustrated indifferent, undecided, unconcerned

  • If I create the ContentNote record in the application, I am able to query by the record ID. Unfortunately the Content field is always returned as NULL.

    select * from openquery(salesforce, 'select * from ContentNote where id = ''0691w000000MVfO''')


    OR

    select * from salesforce...ContentNote where id = '0691w000000MVfO'


    ... both result in empty Content column.



    However, if I run the same query in workbench, I can see the Content field and also a TextPreview field, which is not available with the linked server.



    So with this I got one step further, to the vexing stage of encodings: I can see that the Content is base64 encoded, which I missed somehow during the research.

    Salesforce apparently uses UTF-8 encoding for the text, which is encoded to base64 in the workbench output. My guess is that the import will also need base64 encoded UTF-8. On SQL server, my content is in nvarchar(max) fields, which are UTF-16. And the road from UTF-16 to UTF-8 to base64 is about as pleasant as falling down a flight of stairs...
  • (some HTML allowed)
    How does this make you feel?
    Add Image
    I'm

    e.g. kidding, amused, unsure, silly happy, confident, thankful, excited sad, anxious, confused, frustrated indifferent, undecided, unconcerned