Badly formed HTML in Aras FormattedText Fields

オフライン

I have the following scenario.

I have a report action bound to an item type which performs a REST call to the Microsoft SSRS Report Server with a delimited set of IDs.

The RDL Report performs an SQL query against a fixed ItemType to retrieve the meta-data for the given IDs.

Problem #1: 

Several of the fields are of type FormattedText - although the relevant value may only be a few lines of text, Aras packs so much HTML and JavaScript around them that these Fields can exceed 100k characters.  If generating an XLS report, there is a hard limit on the length of any given cell (32767 characters) - if this limit is exceeded, the report blows up with a HTTP Response Code error.

Potential Solution #1.

I wrote a simple SQL Function to convert the formatted text field string to XML and back again to a NVARCHAR(max) which effectively strips all XML elements from the text...the *magic* happens in this statement.

  -- convert to XML and back to VARCHAR(max) again
  SET @result = CONVERT(XML, @result , 1).value('.', 'NVARCHAR(max)');

Problem #2:

In several data sets the FormattedText fields contain HTML string content which is not valid XML.   In this case, the conversion function blows up.

Examples:

  • undefined namespace prefixes:  '<o:p>...</o:p>'  (note: the prefix can vary)
  • invalid attribute definitions:  '<br \="" ... >

This means I have to identify and correct each special case with time-consuming REPLACE(Column, Find, Replace) calls - a few of those on 100k+ character strings starts to blow up the query times.  Only when the string is *clean* can I try to convert it to XML.  For example,

  SET @result = REPLACE(REPLACE(@result, '<o:p','<p'), '</o:p', '</p');
  SET @result = REPLACE(REPLACE(@result, '<u5:p','<p'), '</u5:p', '</p');
  SET @result = REPLACE(@result, '\=""', '');

I could go and fix each *special case* with one-time SQL scripts, but I cannot *guarantee* that Aras will not write new fields with invalid XML.

So my question is can Aras take a look at the formatted fields and *guarantee* that they will no longer write poorly-formed XML(HTML).  Maybe they could provide an out-of-the-box SQL functionality to make it possible properly prepare FormattedText fields for export.

NOTE: Maybe this has already been resolved - our current customer project is still using Aras 11 SP9. 
In that case, we just need to clean the old data and we are good to go.

Parents
  • Hi JWT007,

    Are you seeing this behavior on standard ItemTypes? If so, could you specify which one(s), so I can take a look at this in a more current version to see if this is still present?

    Chris

  • Hi Christopher, thanks for the quick response.

    Unfortunately (if I am not mistaken) the only OOTB FormattedText property in the Aras InnovatorSolutions database is on the "Help" ItemType - which is empty by default.

    So while I am sure I could fudge some broken data :) I am not sure that would prove anything.

    In our customer project we have some item types that rely heavily on the FormattedText fields - but all in custom item-types.

  • I see. I set up a custom ItemType with a formatted text property then and tested this in a local 12.0 SP4 instance. While I'm not seeing anything unexpected, I'm also testing it with relatively simple data.

    Are you able to elaborate on how the formatted text field is being used and what kind of data you're trying to retrieve? Are you trying to access a list of values stored inside of an HTML table for instance?

    Chris

Reply
  • I see. I set up a custom ItemType with a formatted text property then and tested this in a local 12.0 SP4 instance. While I'm not seeing anything unexpected, I'm also testing it with relatively simple data.

    Are you able to elaborate on how the formatted text field is being used and what kind of data you're trying to retrieve? Are you trying to access a list of values stored inside of an HTML table for instance?

    Chris

Children
  • Nothing special some fields have a bold header .. few are more than a few lines long.

    Here an example (have to obfuscate a bit to not get in trouble with the customer :P)

    To be implemented by  FOO BAR team for FOO-XYZ (ref. ABC_CFG_001-PTF Configuration management throught MMI).
    To be implemented by FOO L2 team for FOO-AAA.
    FOO-AAA is not operational yet and not handed over to OPS, but must be kept aligned to FOO-XYZ. 

    In the database is the following:

    <html> <head></head> <body > <span lang="EN-GB" style="mso-ansi-language:EN-GB">To be implemented by <span style="mso-spacerun:yes"> </span>FOO BAR team for FOO-XYZ (ref. ABC_CFG_001-PTF Configuration management throught MMI).<br />To be implemented by FOO L2 team for FOO-AAA.<br />FOO-AAA is not operational yet and not handed over to OPS, but must be kept aligned to FOO-XYZ. <o:p></o:p></span></body> </html>

    In this case, the problem is the undefined namespace prefix 'o'  ("<o:p></o:p>")

  • Thanks for the clarification. The trailing <o:p></o:p> at the end looks like a empty paragraph. I tried reproducing this in my 12.0 SP4 instance also trying to add the empty paragraph at the end, and I'm not seeing the same behavior. The HTML itself looks more condensed though this could be because I don't have multiple languages enabled in my environment. Additionally, it seems any trailing whitespace is trimmed from the field before being saved.

    I'd argue your intuition was correct that this has already been addressed in more recent versions. It's difficult to say though. As you're more familiar with the exact data causing the issue, I'd be interested to see if you could reproduce this in a newer instance if you get the chance.

    As a separate matter, I feel there might be better ways to store this data than a formatted text field. If the FOO BAR team or FOO-XYZ are actual items in the database, using a formatted text field limits the ability to actually link these Items where they're being referenced. Would it make more sense to store this data as a relationship? I could see this being a relationship called Implemented By that has an item property to store the Team doing the implementation, an item property to store who the implementation is being done for, and a text property to store any additional comments. As a customer project with existing data, I understand this might be out of your hands, but I could see it making life easier down the line. :) 

    Chris

  • Hi Chris,  thanks for checking on this :).

    The occurrences of this problem were a few records in ca. 21k rows.   This table has about 6 formatted fields...and the problem(s) occurred in each of the columns but on different records - *when* exactly the bad data was introduced I can't say... the modified date doesn't help me because I think if these fields do not change, they are probably saved as-is on updates.

    In this case, the fields contain free-form data (fallback, implementation, rollback plans).  

    To be honest, the customer would be ecstatic if they could create/edit this data in MS Word and copy/paste it with formatting into these fields. :) ...but they are making do with the Aras Editor at the moment. 

    I opened the question/discussion because I thought it might be good if Aras takes a look at how a "<o:p></o:p>" even happens in a document that otherwise has no namespace prefixes.  If they feel that a namespace prefix is required then it should also be ensured that the corresponding namespace definition is added to the element or one of its parents.

    Generally the problem of exporting Aras FormattedText fields to Aras is the bigger problem.  You can very quickly run into the problem of field-length restrictions...some of the fields I checked had large Javascript blocks packed into the field data.

    I guess another option would be a hidden backing VARCHAR field which gets populated with a non-formatted copy of the field value on create/save... from the client/server side I would have more options when converting the field to text than I do from the database side.

    Cheers,

    Jeff