Defer updating workflow completion metrics until completion accepted by server#2742
Defer updating workflow completion metrics until completion accepted by server#2742cretz merged 5 commits intotemporalio:masterfrom
Conversation
|
|
||
| if (context.isCancelRequested()) { | ||
| workflowStateMachines.cancelWorkflow(); | ||
| metricsScope.counter(MetricsType.WORKFLOW_CANCELED_COUNTER).inc(1); |
There was a problem hiding this comment.
These completion counters are now tracked inside workflow state machines object
temporal-sdk/src/test/java/io/temporal/client/functional/MetricsTest.java
Outdated
Show resolved
Hide resolved
|
@maciejdudko as co-maintainer may want to review as well |
maciejdudko
left a comment
There was a problem hiding this comment.
Tests could be more split up but it's not blocking. LGTM
| assertEquals( | ||
| 2, counts.get(io.temporal.worker.MetricsType.WORKFLOW_COMPLETED_COUNTER).intValue()); | ||
| assertEquals( | ||
| 1, counts.get(io.temporal.worker.MetricsType.WORKFLOW_FAILED_COUNTER).intValue()); | ||
| assertEquals( | ||
| 1, | ||
| counts | ||
| .get(io.temporal.worker.MetricsType.WORKFLOW_CONTINUE_AS_NEW_COUNTER) | ||
| .intValue()); | ||
| assertEquals( | ||
| 1, counts.get(io.temporal.worker.MetricsType.WORKFLOW_CANCELED_COUNTER).intValue()); |
There was a problem hiding this comment.
Nit: splitting each workflow run into separate testcase would more strongly verify that the right counter was incremented in each scenario. Could also add checking presence of WORKFLOW_E2E_LATENCY.
There was a problem hiding this comment.
I split the tests. I wasn't going to and was going to add e2e check in here, but it turns out we calculate e2e as current - start event time, and the latter is extremely skewed with time skipping causing negative values (which Tally ignores), so it was easier to check in separate tests.
What was changed
Defer updating workflow completion metrics until completion accepted by server. This is done by capturing the metrics to update in the task handler result and then providing a post-completion runnable to be invoked if the gRPC method is a success.
This technically changes behavior to no longer record metrics in situations like unhandled commands, which is a more accurate approach.
Checklist