Progressively degrading performance saving MT to database

Description

MT performance seems to worsen the longer it continues, for larger documents.
The time it takes to process the translations in a (Hibernate?) request at org.zanata.service.impl.TranslationServiceImpl#translate(...) seems to increase from 500~1000 every use, suggesting there's a cache or leak in this area.

The debugging shows:

02:53:44,330 INFO [stdout] (pool-11-thread-1) Starting batch 0-1000 02:53:44,336 INFO [stdout] (pool-11-thread-1) Sending batch 0-1000 02:53:46,993 INFO [stdout] (pool-11-thread-1) Received response [1000] (2654) 02:53:46,993 INFO [stdout] (pool-11-thread-1) Saving in a batch 02:53:46,993 INFO [stdout] (pool-11-thread-1) Saving batch 0-100 02:53:46,993 INFO [stdout] (pool-11-thread-1) Transaction... 02:53:46,996 INFO [stdout] (pool-11-thread-1) Creating save requests... 02:53:49,518 INFO [stdout] (pool-11-thread-1) Done creating (2522) 02:53:49,519 INFO [stdout] (pool-11-thread-1) Committing translations... 02:53:49,599 INFO [stdout] (pool-11-thread-1) Processing translations... 02:53:56,063 INFO [stdout] (pool-11-thread-1) Done processing (6463) 02:53:56,141 INFO [stdout] (pool-11-thread-1) Done committing translations. 02:53:56,510 INFO [stdout] (pool-11-thread-1) Done transaction 100-100(9517) 02:53:56,510 INFO [stdout] (pool-11-thread-1) Saving batch 100-200 02:53:56,510 INFO [stdout] (pool-11-thread-1) Transaction... 02:53:56,511 INFO [stdout] (pool-11-thread-1) Creating save requests... 02:53:59,692 INFO [stdout] (pool-11-thread-1) Done creating (3180) 02:53:59,692 INFO [stdout] (pool-11-thread-1) Committing translations... 02:53:59,694 INFO [stdout] (pool-11-thread-1) Processing translations... 02:54:06,895 INFO [stdout] (pool-11-thread-1) Done processing (7201) 02:54:06,897 INFO [stdout] (pool-11-thread-1) Done committing translations. 02:54:07,005 INFO [stdout] (pool-11-thread-1) Done transaction 200-200(10495) 02:54:07,005 INFO [stdout] (pool-11-thread-1) Saving batch 200-300 02:54:07,005 INFO [stdout] (pool-11-thread-1) Transaction... 02:54:07,005 INFO [stdout] (pool-11-thread-1) Creating save requests... 02:54:10,214 INFO [stdout] (pool-11-thread-1) Done creating (3209) 02:54:10,214 INFO [stdout] (pool-11-thread-1) Committing translations... 02:54:10,216 INFO [stdout] (pool-11-thread-1) Processing translations... 02:54:18,078 INFO [stdout] (pool-11-thread-1) Done processing (7862) 02:54:18,079 INFO [stdout] (pool-11-thread-1) Done committing translations. 02:54:18,236 INFO [stdout] (pool-11-thread-1) Done transaction 300-300(11231) 02:54:18,236 INFO [stdout] (pool-11-thread-1) Saving batch 300-400 02:54:18,236 INFO [stdout] (pool-11-thread-1) Transaction... 02:54:18,237 INFO [stdout] (pool-11-thread-1) Creating save requests... 02:54:21,858 INFO [stdout] (pool-11-thread-1) Done creating (3621) 02:54:21,858 INFO [stdout] (pool-11-thread-1) Committing translations... 02:54:21,860 INFO [stdout] (pool-11-thread-1) Processing translations... 02:54:31,698 INFO [stdout] (pool-11-thread-1) Done processing (9838) 02:54:31,700 INFO [stdout] (pool-11-thread-1) Done committing translations. 02:54:31,934 INFO [stdout] (pool-11-thread-1) Done transaction 400-400(13698)

Explanation: (^^ means return to)

org.zanata.service.impl.MachineTranslationServiceImpl#addMachineTranslationsToDoc
Sending batch 0-1000 sending a batch of 1000 textflows to MT provider
Received response [ batch size] (time) time taken to receive the result

org.zanata.service.impl.MachineTranslationServiceImpl#saveTranslationsInBatches
Saving in a batch immediately before the batch loop
Saving batch x-y immediately before sublist
Transaction... immediately before try { transaction.run(() ->

org.zanata.service.impl.MachineTranslationServiceImpl#makeUpdateRequestsForBatch
Creating save requests... before process loop
Done creating (time) time taken to create the list of TransUnitUpdateRequest

org.zanata.service.impl.MachineTranslationServiceImpl#saveTranslationsInBatches ^^
Committing translations... immediately before translationService.translate(targetLocale.getLocaleId(), updateRequests);

org.zanata.service.impl.TranslationServiceImpl#translate(org.zanata.common.LocaleId, java.util.List<org.zanata.webtrans.shared.model.TransUnitUpdateRequest>, boolean)
Processing translations... before for (TransUnitUpdateRequest request : translationRequests) {
Done processing (time) time taken to process the requests in loop

org.zanata.service.impl.MachineTranslationServiceImpl#saveTranslationsInBatches ^^
Done committing translations. immediately after translationService.translate
Done transaction 100-100(time) loop iteration finished
Done batches all results processed

Environment

None

Activity

Show:

Former user 31 July 2018 at 08:20
Edited

An "entityManager.clear();" at the end of MachineTranslationServiceImpl#saveTranslationsInBatches seems to work, cutting the increasing cycle from 10+ seconds to 1.5, but breaks when adding attribution at
MachineTranslationServiceImpl#prefillProjectVersionWithMachineTranslation entityManager.merge(doc);

It seems there is a refresh or reload required somewhere, but I don't know where.

Further to this, adding attribution as a new transaction after the first has completed seems risky. If the transaction fails, perhaps being a large transaction based on doc size, there may be a lot of unattributed translations.

Ready for Release

Details

Assignee

Reporter

Labels

Tested Version/s

Components

Sprint

Fix versions

Affects versions

Priority

More fields

Created 31 July 2018 at 03:16
Updated 10 August 2018 at 01:43
Resolved 8 August 2018 at 00:16

Flag notifications