Skip to content

Implement Nexus operation cancellation types#2520

Merged
pdoerner merged 14 commits intomasterfrom
nexus-cancellation-types
Jun 12, 2025
Merged

Implement Nexus operation cancellation types#2520
pdoerner merged 14 commits intomasterfrom
nexus-cancellation-types

Conversation

@pdoerner
Copy link
Contributor

@pdoerner pdoerner commented May 13, 2025

What was changed

Implemented cancellation types for Nexus operations which specify what action to take when the caller context is cancelled.

  • ABANDON -> Do not request cancellation of the operation.
  • TRY_CANCEL -> Initiate a cancellation request and immediately report cancellation to the caller.
  • WAIT_REQUESTED -> Request cancellation of the operation and wait for confirmation that the request was received.
  • WAIT_COMPLETED -> Wait for operation completion. Default.
    These options are set on NexusOperationOptions which are passed in when starting an operation.

Why?

To give users more control over what happens to their handler workflows when their caller workflows are cancelled.

closes #2247

* is canceled. The result of the cancellation independently of the type is a {@link
* CanceledFailure} thrown from the Nexus operation method.
*/
public enum NexusOperationCancellationType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we releasing this as GA or will it be Pre-release/Public preview?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be GA

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be Pre-release/Public preview, we talked about it during the nexus crew stand up ~2 weeks ago?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we did not mark it as experimental in the Go SDK. I could do so here if you prefer, I forget that conversation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did end up marking it experimental for Go, lets mark it experimental here as well

* @param cancellationType the cancellation type for the Nexus operation
* @return this
*/
public NexusOperationOptions.Builder setCancellationType(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document the default?

};
}

private void notifyNexusOperationCanceled(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful we don't copy this bug from child workflows to nexus operations #2160

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially not run the event loop if we are notified by an event

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this point still needs to be addressed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it? I just removed the call to eventLoop(). Was there some other change required?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well removing the call to eventLoop() doesn't address the issue because if we are not notified by an event we do need to run the event loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situation would we not be notified by an event? I guess I don't totally understand the event loop logic or how to check where the notification came from.

@pdoerner pdoerner marked this pull request as ready for review June 11, 2025 14:37
@pdoerner pdoerner requested a review from a team as a code owner June 11, 2025 14:37
@Quinn-With-Two-Ns
Copy link
Contributor

Looks like this PR changes temporal-serviceclient/src/main/protocloud to a very old commit can you please fix?

}

public Failure createCancelNexusOperationFailure(Failure cause) {
Failure canceledFailure = Failure.newBuilder(cause).setSource(JAVA_SDK).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source should only be set to the JAVA_SDK on the very specific code path where the "operation canceled before it was started", not in general

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it applies in any case where the SDK is generating the failure


private void notifyCompleted() {
setInitialCommandEventId();
Failure canceledFailure =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we passing a failure back on notifyCompleted? If you look at CancelExternalStateMachine which is basically the same state machine but for workflows we don't pass anything back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to pass back the specific Nexus operation failure. I think CancelExternal always has the same cause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The completionCallback takes two parameters. The intention is the first parameter is for communicating success, the second is for communication failure. notifyCompleted is not a failure so it shouldn't pass a value on the second parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise completionCallback should just take one parameter since you are never using the first parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Updated, let me know if that looks better.

stateMachineSink);
return () -> {
if (operation.isCancellable()) {
operation.cancel();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, if NexusOperationCancellationType.ABANDON should we cancel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the operation hasn't started, we should still cancel yes. ABANDON indicates that we should not send the RequestCancel command which we do not do if the operation hasn't been started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm so looking at the child workflow cancellation

      if (cancellationType == ChildWorkflowCancellationType.ABANDON) {
        notifyChildCanceled(completionCallback);
        return;
      }
      // The only time child can be canceled directly is before its start command
      // was sent out to the service. After that RequestCancelExternal should be used.
      if (child.isCancellable()) {
        child.cancel();
        return;
      }

We don't cancel the child workflow if ChildWorkflowCancellationType.ABANDON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the schedule command hasn't been sent, I mean. Not that the operation itself has not started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most users understand ABANDON as we don't request cancellation of the entity (child workflow or nexus operation) regardless of what state the entity happens to be in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay. That makes sense.

import org.junit.Rule;
import org.junit.Test;

public class CancelWorkflowAsyncOperationTest extends BaseNexusTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work?

Suggested change
public class CancelWorkflowAsyncOperationTest extends BaseNexusTest {
public class CancelWorkflowAsyncOperationTest {

SDKTestWorkflowRule should do all the setup for this test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way only for the replay test since we cannot use the test rule to get the endpoint name from within the static workflow definition


NexusOperationHandle<Void> handle =
Workflow.startNexusOperation(serviceStub::operation, 2000L);
handle.getExecution().get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a test where we cancel with the different cancellation types before calling handle.getExecution().get(); to trigger the code path when we cancel before the command is sent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Quinn-With-Two-Ns
Copy link
Contributor

LGTM! would like a bit more test coverage on the start then immediately cancel code path, but otherwise look good.


/**
* Sets the cancellation type for the Nexus operation. Defaults to WAIT_COMPLETED. Note:
* EXPERIMENTAL
Copy link
Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I missed this, for the Java SDK we have an annotation to mark APIs as experimental you can add @Experimental to the API and type

temporal-sdk/src/main/java/io/temporal/common/Experimental.java

@pdoerner pdoerner enabled auto-merge (squash) June 12, 2025 17:14
@pdoerner pdoerner merged commit e7e3fa6 into master Jun 12, 2025
18 checks passed
@pdoerner pdoerner deleted the nexus-cancellation-types branch June 12, 2025 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Nexus operation cancellation type

2 participants