REST APIs generate illegal XML when files contain invalid characters like 0x1b, 0x08

Former user 20 April 2018 at 00:24
We are closing all the old issues to have more clarity in our backlog for Zanata project. Feel free to re-open or leave a comment if you require our attention on your Jira.

Bugzilla Migration 31 July 2015 at 01:47
Sean Flanigan commented on 2014-09-17 22:00:06 -0400:
Just for reference, the workaround was to download the affected document from the web interface (fortunately, it was a PO file, so it could be downloaded that way) and search for the offending character:
grep --color='auto' -P -n '\x08' *.po
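For documents that may contain more than one kind of offending character, the same search can be sketched in Python. This is a minimal illustration and not part of Zanata: the function name `find_invalid_xml_chars` is hypothetical, and the character class is taken from the `Char` production of the XML 1.0 specification.

```python
import re

# Anything outside the XML 1.0 "Char" production is illegal in a document:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (line_number, column, code_point) for each illegal character.

    Hypothetical helper for illustration. Lines are split on "\n" only,
    because str.splitlines() would also split on (and drop) some of the
    control characters being searched for.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in _INVALID_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, ord(match.group())))
    return hits
```

To check a downloaded PO file, read it with `open(path, encoding="utf-8", newline="")` and pass the contents to the function; each hit gives the 1-based line, column, and code point of a character that an XML 1.0 parser would reject.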

Bugzilla Migration 31 July 2015 at 01:47
Chester Cheng commented on 2014-09-17 21:56:50 -0400:
I got a similar error, because of a hidden character in the translation.
==========
$ mvn org.zanata:zanata-maven-plugin:pull -Dzanata.encodeTabs=false
(...)
[ERROR] Operation failed: javax.xml.bind.UnmarshalException
with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 35273; An invalid XML character (Unicode: 0x8) was found in the element content of the document.]
To retry from the last document, please set the following option(s):
-Dzanata.fromDoc="Memory"
.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.387 s
[INFO] Finished at: 2014-09-18T11:44:11+10:00
[INFO] Final Memory: 19M/170M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.zanata:zanata-maven-plugin:3.3.2:pull (default-cli) on project standalone-pom: Zanata mojo exception: javax.xml.bind.UnmarshalException
[ERROR] - with linked exception:
[ERROR] [org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 35273; An invalid XML character (Unicode: 0x8) was found in the element content of the document.]
[ERROR] -> [Help 1]
[ERROR]

Bugzilla Migration 31 July 2015 at 01:47
Sean Flanigan commented on 2014-04-27 21:31:31 -0400:
Good idea. Yes, it's worth a try.
JSON can probably escape any problematic characters. There may be portability issues with some characters, but we should be able to choose implementations which are compatible:
http://stackoverflow.com/a/8676021/14379
https://en.wikipedia.org/wiki/JSON#Data_portability_issues
http://www.bennadel.com/blog/2576-testing-which-ascii-characters-break-json-javascript-object-notation-parsing.htm
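As a rough illustration of why JSON might help, the sketch below uses Python's standard `json` module (not RESTEasy or the Zanata client, so it says nothing about how those libraries behave) to show control characters being escaped on output and restored on input. The sample string and the `content` key are made up for the example.

```python
import json

# Hypothetical translation text containing ESC (0x1b) and backspace (0x08).
content = "Memory\x1b[0m usage\x08"

# The encoder writes control characters as escape sequences, so the
# serialized text itself contains no raw control characters.
encoded = json.dumps({"content": content})

# Decoding restores the original string, control characters included.
decoded = json.loads(encoded)["content"]

print(encoded)
print(decoded == content)
```

Whether other JSON implementations round-trip every character the same way is exactly the portability question raised in the links above.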

Bugzilla Migration 31 July 2015 at 01:47
Patrick Huang commented on 2014-04-27 19:11:40 -0400:
If we use JSON instead, will it help?

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31
When the Zanata Maven client pushes or pulls a document in which any text flow contains the Unicode character 0x1b, RESTEasy marshalling/unmarshalling fails. Uploading the same document through the server UI does not hit this problem.
Reproducible: Always
Steps to Reproduce:
1. create a gettext project/version
2. mvn zanata:push
Actual Results:
org.jboss.resteasy.plugins.providers.jaxb.JAXBUnmarshalException: javax.xml.bind.UnmarshalException
with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 325; columnNumber: 7; An invalid XML character (Unicode: 0x1b) was found in the element content of the document.]
Expected Results:
push ok
The server's RESTEasy version is different from the client's.
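The general shape of this failure can be reproduced in miniature outside Zanata. The sketch below uses Python's standard `xml.etree.ElementTree` purely as a stand-in for the JAXB/RESTEasy stack: its serializer, like the one described in this report, writes control characters through unchanged, and its parser then rejects the result. The helper name `survives_xml_round_trip` and the `content` element are hypothetical.

```python
import xml.etree.ElementTree as ET


def survives_xml_round_trip(text):
    """Serialize text inside an element, then try to parse it back.

    Hypothetical helper for illustration only. Returns True when the
    parsed text matches the input, False when the parser rejects the
    serialized document.
    """
    elem = ET.Element("content")
    elem.text = text
    # The serializer escapes &, < and > but does not check for characters
    # that are illegal in XML 1.0, so it can emit an unparseable document.
    serialized = ET.tostring(elem, encoding="unicode")
    try:
        return ET.fromstring(serialized).text == text
    except ET.ParseError:
        return False


print(survives_xml_round_trip("plain text"))
print(survives_xml_round_trip("bad\x1bchar"))
```

A fix on the producing side would have to strip, replace, or otherwise encode such characters before serialization, since XML 1.0 has no way to represent them, not even as character references.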