REST APIs generate illegal XML when files contain invalid characters like 0x1b, 0x08

Description

When using the Zanata Maven client to push or pull, if a text flow contains the control character 0x1b, RESTEasy marshalling/unmarshalling fails. Uploading the same file through the server UI does not trigger the problem.

Reproducible: Always

Steps to Reproduce:
1. create a gettext project/version
2. mvn zanata:push

Actual Results:
org.jboss.resteasy.plugins.providers.jaxb.JAXBUnmarshalException: javax.xml.bind.UnmarshalException

  • with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 325; columnNumber: 7; An invalid XML character (Unicode: 0x1b) was found in the element content of the document.]

Expected Results:
push ok

The server's RESTEasy version is different from the client's.
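
For reference, the rejection itself is not specific to RESTEasy or JAXB: XML 1.0 does not allow characters such as 0x1b or 0x08 in a document, either literally or as numeric character references. The sketch below shows this with Python's standard-library parser rather than the JAXB stack from the report. The helper name `parses_as_xml` and the `<content>` element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document):
    """Return True if the string is accepted by the XML parser, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # Raised for ill-formed input, including characters XML 1.0 forbids.
        return False


# Hypothetical payloads: one clean, one with a literal ESC (0x1b),
# and one that tries to smuggle the same character in as a reference.
clean = "<content>ok</content>"
literal_escape = "<content>a\x1bb</content>"
char_reference = "<content>a&#x1b;b</content>"

for payload in (clean, literal_escape, char_reference):
    print(repr(payload), parses_as_xml(payload))
```

Because the character reference form is rejected as well, escaping the character on the sending side does not help with XML 1.0. The character has to be removed or transported some other way.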

Environment

None

Activity

Former user 20 April 2018 at 00:24

We are closing all the old issues to have more clarity in our backlog for Zanata project. Feel free to re-open or leave a comment if you require our attention on your Jira.

Bugzilla Migration 31 July 2015 at 01:47

Sean Flanigan commented on 2014-09-17 22:00:06 -0400:

Just for reference, the workaround was to download the affected document from the web interface (fortunately, it was a PO file, so it could be downloaded that way) and search for the offending character:

grep --color='auto' -P -n '\x08' *.po
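
The grep above looks for one specific character. A more general search, covering every character outside the XML 1.0 `Char` production, could look like the sketch below. It is only an illustration of the same workaround, not part of any Zanata tool, and the function name `find_invalid_xml_chars` is hypothetical.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (line, column, codepoint) for each character XML 1.0 forbids.

    Lines and columns are 1-based. The text is split on "\n" only, because
    str.splitlines() would also split on some of the control characters
    being searched for.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in _INVALID_XML_CHAR.finditer(line):
            hits.append((line_no, match.start() + 1, ord(match.group())))
    return hits


# Hypothetical PO fragment with a backspace (0x08) hidden in the translation.
sample = 'msgid "a"\nmsgstr "b\x08c"\n'
for line_no, column, codepoint in find_invalid_xml_chars(sample):
    print(f"line {line_no}, column {column}: U+{codepoint:04X}")
```

To check a downloaded PO file, read it as text (for example with `open(path, encoding="utf-8")`) and pass the contents to the function.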

Bugzilla Migration 31 July 2015 at 01:47

Chester Cheng commented on 2014-09-17 21:56:50 -0400:

I got a similar error, because of a hidden character in the translation.

==========
$ mvn org.zanata:zanata-maven-plugin:pull -Dzanata.encodeTabs=false
(...)
[ERROR] Operation failed: javax.xml.bind.UnmarshalException

  • with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 35273; An invalid XML character (Unicode: 0x8) was found in the element content of the document.]

To retry from the last document, please set the following option(s):

-Dzanata.fromDoc="Memory"

.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.387 s
[INFO] Finished at: 2014-09-18T11:44:11+10:00
[INFO] Final Memory: 19M/170M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.zanata:zanata-maven-plugin:3.3.2:pull (default-cli) on project standalone-pom: Zanata mojo exception: javax.xml.bind.UnmarshalException
[ERROR] - with linked exception:
[ERROR] [org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 35273; An invalid XML character (Unicode: 0x8) was found in the element content of the document.]
[ERROR] -> [Help 1]
[ERROR]

Bugzilla Migration 31 July 2015 at 01:47

Sean Flanigan commented on 2014-04-27 21:31:31 -0400:

Good idea. Yes, it's worth a try.

JSON can probably escape any problematic characters. There may be portability issues with some characters, but we should be able to choose implementations which are compatible:

http://stackoverflow.com/a/8676021/14379
https://en.wikipedia.org/wiki/JSON#Data_portability_issues
http://www.bennadel.com/blog/2576-testing-which-ascii-characters-break-json-javascript-object-notation-parsing.htm
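
As a quick check of the escaping claim, the sketch below round-trips a string containing 0x1b and 0x08 through Python's standard `json` module. This only shows what one JSON implementation does. It says nothing about the libraries Zanata would actually use, and the `content` key and the function name `json_round_trip` are made up for illustration.

```python
import json


def json_round_trip(text):
    """Encode text inside a JSON object, decode it again, and return both forms."""
    encoded = json.dumps({"content": text})
    decoded = json.loads(encoded)["content"]
    return encoded, decoded


# The same characters that broke the XML payloads in this report.
original = "a\x1bb\x08c"
encoded, decoded = json_round_trip(original)

print(encoded)              # control characters appear as escape sequences
print(decoded == original)  # the original string survives the round trip
```

In the encoded form the control characters show up as escapes (`\u001b` and `\b`) rather than raw bytes, so the payload stays valid JSON and the original text is recovered on decoding.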

Bugzilla Migration 31 July 2015 at 01:47

Patrick Huang commented on 2014-04-27 19:11:40 -0400:

If we use JSON instead, will it help?

Created 31 July 2015 at 01:47
Updated 20 April 2018 at 01:36