Overhaul Zanata REST API
Description
Activity
Sean Flanigan 14 April 2016 at 04:32
@Carlos Munoz Do you want to add anything to the description?
Sean Flanigan 14 April 2016 at 04:21
@Sean Flanigan added a comment - 31/Mar/16 4:17 PM
As for versioning, we definitely need to bump the API version when making breaking changes. But it doesn't seem like there is a single ideal solution. Here are several imperfect ones: http://www.troyhunt.com/2014/02/your-api-versioning-is-wrong-which-is.html
The most RESTful solution for purists is probably to create a specific versioned vendor media type for each one of our endpoints (eg application/vnd.zanata.sourcedoc.v2+json), but it will take discipline to use the correct media type consistently. (Call this option 0, although it is similar to Troy's option 3.)
If the caller doesn't specify a media type, what will they get?
For our current endpoints, they should probably get something compatible with the API as it is now (pre-versioning), at least while we continue support for the non-versioned requests.
But for new "versioned" endpoints like this one, if the media type is missing, should they get 406 Not Acceptable (for GET without "Accept") / 415 Unsupported Media Type (for PUT/POST without "Content-Type")? What if both headers are missing on a POST?
A more common variant is to define a single top-level media type (eg application/vnd.zanata.v2+json) used by all endpoints, but where the current "v" could be different for each endpoint. (This seems to be Troy's option 3, but note that the returned media type doesn't match the requested media type: http://www.troyhunt.com/2014/02/your-api-versioning-is-wrong-which-is.html#comment-1938364813 )
But I don't think this is very pure, because to know what "application/vnd.zanata.v2+json" looks like, you first have to ask where it is being used.
An even less pure approach, but simpler and perhaps more pragmatic, is to ignore media types and simply append /v1, /v2 to the endpoint as it evolves. (Troy's option 1.) This should work fine, as long as the endpoint has a compatible URL structure.
Perhaps we should give the purist solution a try first, but consider switching to URL versioning if it proves too cumbersome.
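To make the purist option concrete, here is a minimal sketch of how a server might extract the version from a vendor media type, falling back to the pre-versioning behaviour when no version is present. The class and method names are illustrative, not existing Zanata code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse a vendor media type such as
// "application/vnd.zanata.sourcedoc.v2+json" and extract the API version.
public class MediaTypeVersion {
    private static final Pattern VENDOR = Pattern.compile(
            "application/vnd\\.zanata(?:\\.[a-z]+)*\\.v(\\d+)\\+json");

    /** Returns the version encoded in the media type, or 1 (pre-versioning) if absent. */
    public static int extractVersion(String mediaType) {
        if (mediaType == null) {
            return 1; // no Accept/Content-Type header: behave like the current API
        }
        Matcher m = VENDOR.matcher(mediaType);
        return m.matches() ? Integer.parseInt(m.group(1)) : 1;
    }
}
```

In a real JAX-RS resource the version would more likely be selected declaratively via @Produces/@Consumes on per-version methods, but a parser like this shows the amount of discipline involved: every endpoint needs its own media type registered and matched consistently.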
Sean Flanigan 14 April 2016 at 04:21
@Sean Flanigan added a comment - 31/Mar/16 3:47 PM
If we want uploads to go quickly, or if we don't want users with slow HTTP connections monopolising the database pool, we should consider streaming each upload directly to a (single) temporary file, and then initiating a background process to process it afterwards. As soon as the upload finishes saving, we could return 202 Accepted (indicating that processing has started) with a Location header pointing to an upload job which can be polled for completion/success/failure status.
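The 202 Accepted flow above could look roughly like this. This is an in-memory sketch under assumed names (not Zanata code, and a real implementation would persist jobs and return real HTTP responses):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: accept an upload, immediately return the Location to poll,
// and run processing in the background.
public class UploadJobs {
    public enum Status { PROCESSING, DONE, FAILED }

    private final Map<String, Status> jobs = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** Simulates "202 Accepted": returns the Location header value, e.g. /jobs/{id}. */
    public String submit(Runnable processing) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, Status.PROCESSING);
        pool.submit(() -> {
            try {
                processing.run();
                jobs.put(id, Status.DONE);
            } catch (RuntimeException e) {
                jobs.put(id, Status.FAILED);
            }
        });
        return "/jobs/" + id;
    }

    /** What a GET on the Location URL would report. */
    public Status poll(String location) {
        return jobs.get(location.substring("/jobs/".length()));
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

The client pushes the file, gets the Location back straight away, and polls it until the status flips to DONE or FAILED.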
If we do that, we will need to implement some form of queue push-back, so that a user won't push 100 files, notice that they're not visible yet, and decide to push them again and again. (The REST clients we control should probably wait for processing to complete successfully before pushing more files, but we also need to think about rogue third-party REST clients.)
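One simple form of push-back would be a per-user cap on queued uploads, with extra requests rejected (e.g. with 429 Too Many Requests) until earlier ones finish processing. A minimal sketch, using hypothetical names:

```java
import java.util.concurrent.Semaphore;

// Hypothetical per-user push-back: cap the number of uploads a single user
// may have queued at once; further requests are refused until a slot frees up.
public class UploadThrottle {
    private final Semaphore slots;

    public UploadThrottle(int maxQueuedPerUser) {
        this.slots = new Semaphore(maxQueuedPerUser);
    }

    /** Returns true if the upload may be queued; false means "try again later". */
    public boolean tryAcquire() {
        return slots.tryAcquire();
    }

    /** Called when background processing of a queued upload completes. */
    public void release() {
        slots.release();
    }
}
```

This is enough to stop a rogue third-party client re-pushing the same 100 files in a loop, while well-behaved clients that wait for job completion would never hit the cap.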
If we want to implement a streaming parser for certain file types, I think we should make them separate stories.
Okapi's API seems to be compatible with streaming, but OkapiFilterAdapter may need some changes (partly because of our HTextFlow/Target datamodel), and individual Okapi filter implementations may or may not implement streaming.
For Gettext, the JGettext library may need to be overhauled (or replaced, perhaps by Okapi?) to support streaming.
For Properties, well, it might not be worth it. Keeping virtually 100% compatibility with Java Properties by using essentially the same implementation may be more important than supporting improbably large files.
Sean Flanigan 14 April 2016 at 04:20
@Carlos Munoz added a comment - 31/Mar/16 2:27 PM
A few points to keep in mind:
These endpoints (one for source and one for translations?) should accept all supported file types.
Ideally they would use some kind of streaming to avoid having to store files on the server. In some cases, storing temporary files will be inevitable, so we might need a way for the system to clean them up.
Scale both to large files and to many concurrent calls to the service (e.g. keep a small memory footprint wherever possible)
Technical considerations:
Use headers to control the API version. The Content-Type header is a good candidate for this.
Use json-api as the response format.
Use an asynchronous approach where the file is submitted and the response contains information about how to query the status of the upload.
... or, use streaming to make sure there is no web server disconnect. This might be more difficult as not all formats allow for streaming.
Have a migration strategy for old projects.
Overhaul Zanata's REST API for file pushing/pulling to improve performance, scalability and maintainability.
In the case of source files, instead of having the client split the file into pieces which are saved to the database before being joined and written to the file system, we may want to stream the file directly to a temp directory, and then perform processing asynchronously.
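Spooling the upload straight to a temp file is a few lines with java.nio; the sketch below uses assumed names (not Zanata code) and a deliberately crude cleanup strategy:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: stream the request body directly to a temp file, so no database
// connection is held while a slow client is still uploading.
public class UploadSpooler {
    /** Copies the upload to a temp file; async processing picks it up from there. */
    public static Path spool(InputStream body) throws IOException {
        Path tmp = Files.createTempFile("zanata-upload-", ".tmp");
        tmp.toFile().deleteOnExit(); // crude; a real system needs a cleanup/reaper job
        Files.copy(body, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The background processor would then read the temp file, write the document to the file system and database, and delete the temp file on success or failure, which ties in with the clean-up concern above.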
(Split out from ZNTA-491).