Create a REST endpoint that accepts files for pushing.

Description

Description of problem:

It would be useful to have a REST endpoint that simply accepts files, leaving the server to process and store them.

This would also allow for a thinner, more streamlined client.

Subtasks

ZNTA-1006

Update {Source,Translation}DocumentUpload to support all DocumentTypes including XLIFF, Properties

Linked issues

relates to

ZNTA-1031

Overhaul Zanata REST API

ZNTA-242

RFE: Remove Project Type, Zanata should recognise file types individually

Activity

Former user 20 April 2018 at 00:23

We are closing all the old issues to have more clarity in our backlog for Zanata project. Feel free to re-open or leave a comment if you require our attention on your Jira.

Sean Flanigan 14 April 2016 at 04:24

I have copied the last few comments to the new issue ZNTA-1031, since they really relate to overhauling the REST API. Enhancing the FILE endpoint was completed last July: https://bugzilla.redhat.com/show_bug.cgi?id=1186972

Sean Flanigan 31 March 2016 at 06:17

As for versioning, we definitely need to bump the API version when making breaking changes. But it doesn't seem like there is a single ideal solution. Here are several imperfect ones: http://www.troyhunt.com/2014/02/your-api-versioning-is-wrong-which-is.html

The most RESTful solution for purists is probably to create a specific versioned vendor media type for each one of our endpoints (eg application/vnd.zanata.sourcedoc.v2+json), but it will take discipline to use the correct media type consistently. (Call this option 0, although it is similar to Troy's option 3.)

If the caller doesn't specify a media type, what will they get?

For our current endpoints, they should probably get something compatible with the API as it is now (pre-versioning), at least while we continue support for the non-versioned requests.
But for new "versioned" endpoints like this one, if the media type is missing, should they get 406 Not Acceptable (for GET without "Accept") / 415 Unsupported Media Type (for PUT/POST without "Content-Type")? What if both headers are missing on a POST?
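One possible reading of the questions above, as a rough Python sketch rather than a decision. It assumes per-endpoint vendor media types shaped like the application/vnd.zanata.sourcedoc.v2+json example, that only v2 exists, and that a POST with both headers missing is answered with 415 because the Content-Type is checked first. The names VENDOR_TYPE, SUPPORTED_VERSIONS, parse_version and negotiate are hypothetical, and an Accept header listing several types with q-values is not handled.

```python
import re

# Hypothetical pattern, modelled on application/vnd.zanata.sourcedoc.v2+json.
VENDOR_TYPE = re.compile(
    r"^application/vnd\.zanata\.(?P<resource>[a-z]+)\.v(?P<version>\d+)\+json$"
)

# Assumption: the versioned source document endpoint only knows v2.
SUPPORTED_VERSIONS = {"sourcedoc": {2}}


def parse_version(media_type, resource):
    """Return the supported version named by a vendor media type, else None."""
    if not media_type:
        return None
    # Drop parameters such as "; charset=utf-8" before matching.
    bare = media_type.split(";", 1)[0].strip().lower()
    match = VENDOR_TYPE.match(bare)
    if match is None or match.group("resource") != resource:
        return None
    version = int(match.group("version"))
    if version in SUPPORTED_VERSIONS.get(resource, set()):
        return version
    return None


def negotiate(method, headers, resource="sourcedoc"):
    """Pick a status code for a versioned endpoint from its media type headers."""
    if method in ("PUT", "POST"):
        # The body must declare a versioned type. Checking this first means
        # a POST with both headers missing gets 415 (an assumption).
        if parse_version(headers.get("Content-Type"), resource) is None:
            return 415  # Unsupported Media Type
    if method == "GET":
        if parse_version(headers.get("Accept"), resource) is None:
            return 406  # Not Acceptable
    return 200
```

With this sketch, negotiate("GET", {}) gives 406 and negotiate("POST", {}) gives 415. Whether a PUT/POST should also be rejected for a missing or unversioned Accept header is left open here.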

A more common variant is to define a single top-level media type (eg application/vnd.zanata.v2+json) used by all endpoints, but where the current "v" could be different for each endpoint. (This seems to be Troy's option 3, but note that the returned media type doesn't match the requested media type: http://www.troyhunt.com/2014/02/your-api-versioning-is-wrong-which-is.html#comment-1938364813 )

But I don't think this is very pure, because if you want to know what "application/vnd.zanata.v2+json" looks like, I must first ask you "where are you using it?".

An even less pure approach, but simpler and perhaps more pragmatic, is to ignore media types and simply append /v1, /v2 to the endpoint as it evolves. (Troy's option 1.) This should work fine, as long as the endpoint has a compatible URL structure.

Perhaps we should give the purist solution a try first, but consider switching to URL versioning if it proves too cumbersome.

Sean Flanigan 31 March 2016 at 05:47

If we want uploads to go quickly, or if we don't want users with slow HTTP connections monopolising the database pool, we should consider streaming each upload directly to a (single) temporary file, and then initiating a background process to process it afterwards. As soon as the upload finishes saving, we could return 202 Accepted (indicating that processing has started) with a Location header pointing to an upload job which can be polled for completion/success/failure status.
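The flow described above could look roughly like this in Python. This is only a sketch under assumptions: the in-memory JOBS dict stands in for a real job store, process_file is a placeholder for parsing and storing the document, the /upload-jobs/ path is invented for illustration, and a plain thread stands in for whatever background mechanism the server would actually use.

```python
import os
import shutil
import tempfile
import threading
import time
import uuid

JOBS = {}  # job id -> {"status": ..., "error": ...}; stand-in for a job store
JOBS_LOCK = threading.Lock()


def process_file(path):
    """Placeholder (hypothetical) for parsing the file and storing it."""
    with open(path, "rb") as f:
        f.read()


def run_job(job_id, path):
    """Process the saved upload in the background and record the outcome."""
    try:
        process_file(path)
        result = {"status": "succeeded", "error": None}
    except Exception as exc:
        result = {"status": "failed", "error": str(exc)}
    finally:
        os.remove(path)  # clean up the temporary file either way
    with JOBS_LOCK:
        JOBS[job_id] = result


def accept_upload(stream):
    """Save the body to a single temp file, start processing, describe the 202."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        # copyfileobj copies in chunks, so the file is never held in memory whole.
        shutil.copyfileobj(stream, tmp)
        path = tmp.name
    job_id = uuid.uuid4().hex
    with JOBS_LOCK:
        JOBS[job_id] = {"status": "processing", "error": None}
    threading.Thread(target=run_job, args=(job_id, path), daemon=True).start()
    return 202, {"Location": f"/upload-jobs/{job_id}"}


def job_status(job_id):
    """What a GET on the job's Location would report."""
    with JOBS_LOCK:
        return dict(JOBS[job_id])


def wait_for_job(job_id, timeout=5.0, interval=0.05):
    """Client-side polling loop: check the job until it leaves 'processing'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = job_status(job_id)
        if status["status"] != "processing":
            return status
        time.sleep(interval)
    return job_status(job_id)
```

Nothing touches the database until run_job starts, so a slow connection only holds a file handle while the upload is being saved.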

If we do that, we will need to implement some form of queue push-back, so that a user won't push 100 files, notice that they're not visible yet, and decide to push them again and again. (The REST clients we control should probably wait for processing to complete successfully before pushing more files, but we also need to think about rogue third-party REST clients.)
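One way push-back might work, sketched in Python under assumptions of my own: a per-user cap on uploads still awaiting processing, 409 Conflict when the same document is pushed again before its earlier upload has finished, and 429 Too Many Requests once the cap is reached. The class name, the cap of 10 and both status code choices are illustrative, not something agreed in this issue.

```python
import threading
from collections import defaultdict

MAX_PENDING_PER_USER = 10  # assumption: an arbitrary cap for illustration


class UploadQueue:
    """Tracks which documents each user has uploaded but not yet had processed."""

    def __init__(self, max_pending=MAX_PENDING_PER_USER):
        self.max_pending = max_pending
        self.pending = defaultdict(set)  # user -> ids of documents in flight
        self.lock = threading.Lock()

    def try_enqueue(self, user, doc_id):
        """Return the status code the upload endpoint would answer with."""
        with self.lock:
            docs = self.pending[user]
            if doc_id in docs:
                return 409  # the same document is still being processed
            if len(docs) >= self.max_pending:
                return 429  # too many unprocessed uploads for this user
            docs.add(doc_id)
            return 202  # accepted for background processing

    def mark_done(self, user, doc_id):
        """Called when background processing finishes, whether it succeeded or not."""
        with self.lock:
            self.pending[user].discard(doc_id)
```

A well-behaved client would treat 409 and 429 as "wait and poll the existing job", while a rogue client that keeps re-pushing is simply turned away without creating more work.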

If we want to implement a streaming parser for certain file types, I think we should make them separate stories.

Okapi's API seems to be compatible with streaming, but OkapiFilterAdapter may need some changes (partly because of our HTextFlow/Target datamodel), and individual Okapi filter implementations may or may not implement streaming.
For Gettext, the JGettext library may need to be overhauled (or replaced, perhaps by Okapi?) to support streaming.
For Properties, well, it might not be worth it. Keeping virtually 100% compatibility with Java Properties by using essentially the same implementation may be more important than supporting improbably large files.

Carlos Munoz 31 March 2016 at 04:27

A few points to keep in mind:

These endpoints (one for source and one for translations?) should be used to accept all accepted file types.
Ideally they would use some kind of streaming to avoid having to store files on the server. In some cases, storing temporary files will be inevitable, so we might need to find a way for the system to clean up.
Be scalable both in terms of large files, as well as multiple calls to the service (e.g. small memory footprint wherever possible)

Technical considerations:

Use headers to control the API version. The Content-Type header is a good candidate for this.
Use json-api as the response format.
Use an asynchronous approach where the file is submitted and the response contains information about how to query the status of the upload.
... or, use streaming to make sure there is no web server disconnect. This might be more difficult as not all formats allow for streaming.
Have a migration strategy for old projects.