Implement Nexus operation cancellation types#2520
Conversation
| * is canceled. The result of the cancellation independently of the type is a {@link | ||
| * CanceledFailure} thrown from the Nexus operation method. | ||
| */ | ||
| public enum NexusOperationCancellationType { |
There was a problem hiding this comment.
Are we releasing this as GA or will it be Pre-release/Public preview?
There was a problem hiding this comment.
Shouldn't it be Pre-release/Public preview, we talked about it during the nexus crew stand up ~2 weeks ago?
There was a problem hiding this comment.
IIRC we did not mark it as experimental in the Go SDK. I could do so here if you prefer, I forget that conversation.
There was a problem hiding this comment.
We did end up marking it experimental for Go, lets mark it experimental here as well
| * @param cancellationType the cancellation type for the Nexus operation | ||
| * @return this | ||
| */ | ||
| public NexusOperationOptions.Builder setCancellationType( |
There was a problem hiding this comment.
Can we document the default?
| }; | ||
| } | ||
|
|
||
| private void notifyNexusOperationCanceled( |
There was a problem hiding this comment.
We need to be careful we don't copy this bug from child workflows to nexus operations #2160
There was a problem hiding this comment.
essentially not run the event loop if we are notified by an event
There was a problem hiding this comment.
I think this point still needs to be addressed
There was a problem hiding this comment.
Does it? I just removed the call to eventLoop(). Was there some other change required?
There was a problem hiding this comment.
Well removing the call to eventLoop() doesn't address the issue because if we are not notified by an event we do need to run the event loop
There was a problem hiding this comment.
In what situation would we not be notified by an event? I guess I don't totally understand the event loop logic or how to check where the notification came from.
|
Looks like this PR changes |
| } | ||
|
|
||
| public Failure createCancelNexusOperationFailure(Failure cause) { | ||
| Failure canceledFailure = Failure.newBuilder(cause).setSource(JAVA_SDK).build(); |
There was a problem hiding this comment.
The source should only be set to the JAVA_SDK on the very specific code path where the "operation canceled before it was started", not in general
There was a problem hiding this comment.
Well it applies in any case where the SDK is generating the failure
|
|
||
| private void notifyCompleted() { | ||
| setInitialCommandEventId(); | ||
| Failure canceledFailure = |
There was a problem hiding this comment.
Why are we passing a failure back on notifyCompleted? If you look at CancelExternalStateMachine which is basically the same state machine but for workflows we don't pass anything back
There was a problem hiding this comment.
It is to pass back the specific Nexus operation failure. I think CancelExternal always has the same cause.
There was a problem hiding this comment.
The completionCallback takes two parameters. The intention is the first parameter is for communicating success, the second is for communication failure. notifyCompleted is not a failure so it shouldn't pass a value on the second parameter
There was a problem hiding this comment.
otherwise completionCallback should just take one parameter since you are never using the first parameter
There was a problem hiding this comment.
Gotcha. Updated, let me know if that looks better.
temporal-sdk/src/test/java/io/temporal/workflow/nexus/CancelWorkflowAsyncOperationTest.java
Outdated
Show resolved
Hide resolved
| stateMachineSink); | ||
| return () -> { | ||
| if (operation.isCancellable()) { | ||
| operation.cancel(); |
There was a problem hiding this comment.
hm, if NexusOperationCancellationType.ABANDON should we cancel?
There was a problem hiding this comment.
I think if the operation hasn't started, we should still cancel yes. ABANDON indicates that we should not send the RequestCancel command which we do not do if the operation hasn't been started.
There was a problem hiding this comment.
hm so looking at the child workflow cancellation
if (cancellationType == ChildWorkflowCancellationType.ABANDON) {
notifyChildCanceled(completionCallback);
return;
}
// The only time child can be canceled directly is before its start command
// was sent out to the service. After that RequestCancelExternal should be used.
if (child.isCancellable()) {
child.cancel();
return;
}
We don't cancel the child workflow if ChildWorkflowCancellationType.ABANDON
There was a problem hiding this comment.
If the schedule command hasn't been sent, I mean. Not that the operation itself has not started.
There was a problem hiding this comment.
I think most users understand ABANDON as we don't request cancellation of the entity (child workflow or nexus operation) regardless of what state the entity happens to be in
There was a problem hiding this comment.
Oh okay. That makes sense.
| import org.junit.Rule; | ||
| import org.junit.Test; | ||
|
|
||
| public class CancelWorkflowAsyncOperationTest extends BaseNexusTest { |
There was a problem hiding this comment.
Does this work?
| public class CancelWorkflowAsyncOperationTest extends BaseNexusTest { | |
| public class CancelWorkflowAsyncOperationTest { |
SDKTestWorkflowRule should do all the setup for this test
There was a problem hiding this comment.
I did it this way only for the replay test since we cannot use the test rule to get the endpoint name from within the static workflow definition
|
|
||
| NexusOperationHandle<Void> handle = | ||
| Workflow.startNexusOperation(serviceStub::operation, 2000L); | ||
| handle.getExecution().get(); |
There was a problem hiding this comment.
Can we also have a test where we cancel with the different cancellation types before calling handle.getExecution().get(); to trigger the code path when we cancel before the command is sent
|
LGTM! would like a bit more test coverage on the start then immediately cancel code path, but otherwise look good. |
|
|
||
| /** | ||
| * Sets the cancellation type for the Nexus operation. Defaults to WAIT_COMPLETED. Note: | ||
| * EXPERIMENTAL |
There was a problem hiding this comment.
Oh sorry I missed this, for the Java SDK we have an annotation to mark APIs as experimental you can add @Experimental to the API and type
temporal-sdk/src/main/java/io/temporal/common/Experimental.java
What was changed
Implemented cancellation types for Nexus operations which specify what action to take when the caller context is cancelled.
ABANDON-> Do not request cancellation of the operation.TRY_CANCEL-> Initiate a cancellation request and immediately report cancellation to the caller.WAIT_REQUESTED-> Request cancellation of the operation and wait for confirmation that the request was received.WAIT_COMPLETED-> Wait for operation completion. Default.These options are set on NexusOperationOptions which are passed in when starting an operation.
Why?
To give users more control over what happens to their handler workflows when their caller workflows are cancelled.
closes #2247