feat(metrics): [Trace Metrics 5] Add batch processor for Metrics#4984
Conversation
Instructions and example for changelogPlease add an entry to Example: ## Unreleased
### Features
- [Trace Metrics 5] Add batch processor for Metrics ([#4984](https://github.com/getsentry/sentry-java/pull/4984))If none of the above apply, you can opt out of this check by adding |
ede7fc3 to
d7a5d68
Compare
f3fe6f7 to
b324629
Compare
markushi
left a comment
There was a problem hiding this comment.
Looking good, just one concern within MetricsBatchProcessor.close() which I would like to discuss before merging.
sentry/src/main/java/io/sentry/metrics/DefaultMetricsBatchProcessorFactory.java
Show resolved
Hide resolved
| executorService.submit(() -> executorService.close(options.getShutdownTimeoutMillis())); | ||
| } else { | ||
| executorService.close(options.getShutdownTimeoutMillis()); | ||
| while (!queue.isEmpty()) { |
There was a problem hiding this comment.
h: Hmm if we have some customer emitting metrics in an endless loop, and the app/sdk gets closed - wouldn't this continue to call flushBatch()? Maybe it makes sense to have a isClosed flag to guard against that and early exit here + within add()
There was a problem hiding this comment.
in MetricsApi there's a check that should prevent adding anything after the SDK has closed down:
if (!scopes.isEnabled()) {
options
.getLogger()
.log(SentryLevel.WARNING, "Instance is disabled and this 'metrics' call is a no-op.");
return;
}
There might be an issue where this loops and the SDK can't set enabled to false, gonna verify.
| if (hasScheduled && !forceSchedule) { | ||
| return; | ||
| } | ||
| try (final @NotNull ISentryLifecycleToken ignored = scheduleLock.acquire()) { | ||
| final @Nullable Future<?> latestScheduledFlush = scheduledFlush; |
There was a problem hiding this comment.
Looks like this is the only place where hasScheduled is being used outside the scheduleLock, for consistency it could make sense to move this inside.
| if (hasScheduled && !forceSchedule) { | |
| return; | |
| } | |
| try (final @NotNull ISentryLifecycleToken ignored = scheduleLock.acquire()) { | |
| final @Nullable Future<?> latestScheduledFlush = scheduledFlush; | |
| try (final @NotNull ISentryLifecycleToken ignored = scheduleLock.acquire()) { | |
| if (hasScheduled && !forceSchedule) { | |
| return; | |
| } | |
| final @Nullable Future<?> latestScheduledFlush = scheduledFlush; |
There was a problem hiding this comment.
The idea here was to avoid waiting for the lock when scheduling has already happened anyways.
The code checks whether queue is empty before hasScheduled = false.
Do you think ensuring this check is correct warrants the performance hit from acquiring the lock for each metric added?
There was a problem hiding this comment.
Do you think ensuring this check is correct warrants the performance hit from acquiring the lock for each metric added?
No not at all 😅 I totally overlooked this is being called as part of every metric emission, all good then!
b324629 to
ff0fe29
Compare
| private final @NotNull ISentryExecutorService executorService; | ||
| private volatile @Nullable Future<?> scheduledFlush; | ||
| private static final @NotNull AutoClosableReentrantLock scheduleLock = | ||
| new AutoClosableReentrantLock(); |
There was a problem hiding this comment.
Static lock shared across instances causes concurrency issues
Medium Severity
The scheduleLock field is declared as static final, meaning all instances of MetricsBatchProcessor share the same lock. However, the fields it protects (scheduledFlush, hasScheduled) are instance variables. When multiple processor instances exist (e.g., after SDK reinitialization), one instance's lock acquisition blocks another independent instance, and the hasScheduled state can become incorrectly coordinated across unrelated instances.
|
@sentry review |

📜 Description
💡 Motivation and Context
💚 How did you test it?
📝 Checklist
sendDefaultPIIis enabled.🔮 Next steps