Details
Suggestion
Status: Planned for version
Resolution: Unresolved
Description
Customers with a high number of imports suffer from resource starvation on the Jira instance and, eventually, wrong test results.
Context
Xray uses aggregated results for its calculations. When the second result is imported, Xray compares it with the first and saves the outcome; when the 45th is imported, Xray compares it with the aggregate of the previous 44 results and saves the final result. This works well unless customers perform actions that invalidate the aggregated results, such as changing the fix version or the test environment of Test Executions that were already part of the aggregate. The aggregate can then no longer be trusted, and Xray has to recalculate everything, one by one, for the affected Tests. This kind of change (to the fix version or test environment) is far more frequent than expected.
For instance, a customer creates a Test Execution with no fix version and moves it to the "EXECUTING" status. Later, their script searches for the executions in "EXECUTING" status, changes the fix version, and sets the final results. As a result, Xray discards the aggregated result for each Test associated with these executions.
The consequence is that the Jira instance suffers from resource starvation, and Jira admins often restart Xray or the instance itself. Xray then loses the work in progress, which causes wrong Test results.
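The cost difference between the two paths can be illustrated with a minimal sketch. It is not Xray's implementation: the `Run` and `Aggregates` names are hypothetical, and the aggregation rule (the most recently imported run decides the status for its scope) is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical model of one imported result; not Xray's data model.
    test: str
    fix_version: str
    environment: str
    status: str


class Aggregates:
    def __init__(self):
        self.runs = []              # every imported run, in import order
        self.latest = {}            # (test, fix_version, environment) -> status
        self.recalculated_runs = 0  # runs re-read by resets

    @staticmethod
    def _key(run):
        return (run.test, run.fix_version, run.environment)

    def import_run(self, run):
        # Cheap path: only the stored aggregate for this scope is touched.
        self.runs.append(run)
        self.latest[self._key(run)] = run.status

    def change_fix_version(self, run, new_version):
        # Expensive path: the run moves to another scope, so every
        # aggregate of this Test is rebuilt from all of its runs.
        run.fix_version = new_version
        self.latest = {k: v for k, v in self.latest.items() if k[0] != run.test}
        for r in self.runs:
            if r.test == run.test:
                self.recalculated_runs += 1
                self.latest[self._key(r)] = r.status


agg = Aggregates()
runs = [Run("T-1", "", "chrome", "PASS") for _ in range(45)]
for r in runs:
    agg.import_run(r)
print(agg.recalculated_runs)  # 0: 45 imports, no run was re-read

agg.change_fix_version(runs[0], "1.0")
print(agg.recalculated_runs)  # 45: one edit forced a re-read of all runs
```

In this toy model, a script that edits the fix version of many executions triggers one full replay per edit, which matches the pattern described above.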
Main complaints
Users complain about resource starvation caused by Xray;
Users complain about Test Plans with wrong results;
Usually, the causes are some or all of the following:
- Customers with up to 120 imports per day, sometimes 5-10 in a five-minute span (and sometimes with the same results for the same scope)
- Human error creating the scripts that import test results
- A script calls the Xray REST API and a timeout is reached. The script retries until there are no errors or the instance becomes unresponsive. A timeout does not mean the calculation has not started, so Xray ends up with many results for the same Test
- The number of Test Runs starts piling up for each Test
- Some actions, such as changing a fix version, linking to a new Test Plan, removing an already executed Test Execution, or removing Test Executions from Test Plans, cause a reset, i.e., they force Xray to load all the Test Runs and recalculate the Test Run Status
- When multiple imports for the same Tests arrive at the same time, Xray recalculates everything if a race condition occurs while saving the data, i.e., the older results are the last to be written to the DB. This also causes a full reset
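The out-of-order write in the last item can be sketched as follows. How Xray actually detects the race is not described here; the `Aggregate` type, the `save` function, and the `finished_on` timestamp are hypothetical and only illustrate one way such a check could work.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Aggregate:
    status: str
    finished_on: int  # timestamp of the run that produced `status`


def save(stored: Optional[Aggregate], status: str,
         finished_on: int) -> Tuple[Aggregate, bool]:
    """Return (aggregate, needs_reset) for one write reaching the database."""
    if stored is not None and finished_on < stored.finished_on:
        # An older result is being written after a newer one: keep the
        # stored aggregate and flag it for recalculation from all runs.
        return stored, True
    return Aggregate(status, finished_on), False


# Two imports for the same Test finish at t=100 and t=200,
# but reach the database in the opposite order.
aggregate, needs_reset = save(None, "PASS", 200)
aggregate, needs_reset = save(aggregate, "FAIL", 100)
print(needs_reset)  # True: the late, older write forces a reset
```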
Possible solutions
While the Automated Archiving Feature slightly mitigates the problem, we think that several steps are needed to fix it and to prevent this situation:
- Refactor the code path that does its work on the thread of the REST call, thus reducing the probability of false timeouts
- Make the SQL queries involved in the calculation faster
- Prevent the same test from being imported more than once at the same time
- Have a mechanism for recovering from a restart that caused threads to stop in the middle of calculation work
- Have a persisted queue of Tests+scope+result being imported, and a service running every 1-5 minutes that reads from that queue and calculates. This would avoid the immediate calculation on import, reducing the risk of multiple redundant calculations, and would guarantee that the calculation continues after an Xray shutdown event
This list requires further study.
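As a starting point for that study, the persisted-queue idea could look roughly like the sketch below. It is an assumption-laden illustration, not a proposed design: the `pending` table, the `open_queue`, `enqueue`, and `drain` functions, and the `finished_on` column are all hypothetical, and SQLite stands in for whatever store Xray would use. Entries are keyed by Test and scope, so repeated imports collapse into one pending calculation, and an entry is deleted only after its calculation finishes, so unfinished work survives a restart.

```python
import sqlite3


def open_queue(path=":memory:"):
    # Pass a file path instead of ":memory:" to persist across restarts.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        " test TEXT NOT NULL, scope TEXT NOT NULL,"
        " status TEXT NOT NULL, finished_on INTEGER NOT NULL,"
        " PRIMARY KEY (test, scope))"
    )
    return db


def enqueue(db, test, scope, status, finished_on):
    """Record an import; only a newer result replaces a pending entry."""
    with db:
        row = db.execute(
            "SELECT finished_on FROM pending WHERE test = ? AND scope = ?",
            (test, scope),
        ).fetchone()
        if row is None or finished_on >= row[0]:
            db.execute(
                "INSERT OR REPLACE INTO pending VALUES (?, ?, ?, ?)",
                (test, scope, status, finished_on),
            )


def drain(db, calculate):
    """Process each pending entry once; meant to be run by a periodic job."""
    rows = db.execute(
        "SELECT test, scope, status, finished_on FROM pending"
        " ORDER BY finished_on"
    ).fetchall()
    for test, scope, status, finished_on in rows:
        calculate(test, scope, status)
        with db:
            # Delete only after the calculation, and only if no newer
            # result replaced the entry in the meantime.
            db.execute(
                "DELETE FROM pending"
                " WHERE test = ? AND scope = ? AND finished_on = ?",
                (test, scope, finished_on),
            )
    return len(rows)


db = open_queue()
for _ in range(5):  # a script retrying the same import after timeouts
    enqueue(db, "T-1", "1.0/chrome", "PASS", 100)
enqueue(db, "T-1", "1.0/chrome", "FAIL", 200)
enqueue(db, "T-2", "1.0/chrome", "PASS", 150)

calculated = []
processed = drain(db, lambda t, s, st: calculated.append((t, s, st)))
print(processed)  # 2: seven imports collapsed into two calculations
```

A real service would also need locking between the importer and the periodic job, which this single-connection sketch leaves out.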
Estimation
This would be an L or XL project, as a preliminary study is needed, multiple approaches must be compared, and, finally, a lot of non-regression testing is required.