Conversation
🔧 Report generated by pr-comment-scanbuild
e5bce28 to 0918008
I have reasonable performance on most runs; I'm not sure why some runs still blow up for higher numbers of threads.
e0ac246 to 2421ba9
- Reserve padded slots
- Introduce a register / unregister to retrieve slots
- Manage a free list
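A rough sketch of what a register/unregister scheme with padded slots and a free list could look like (illustrative only: the names, the lock, and the capacity are assumptions, and the real implementation presumably keeps the hot path lock-free):

#include <atomic>
#include <mutex>
#include <vector>

// Padded slot: one per registered thread, sized to avoid false sharing.
struct alignas(64) PaddedSlot {
  std::atomic<int> value{-1};   // tid of the owning thread, or -1 when free
};

class SlotRegistry {
  static const int kCapacity = 1024;
  PaddedSlot _slots[kCapacity];
  std::vector<int> _free_list;  // indices of slots available for reuse
  std::mutex _lock;

public:
  SlotRegistry() {
    for (int i = kCapacity - 1; i >= 0; i--) _free_list.push_back(i);
  }

  // Hand out a slot index ("SlotID") for this thread, or -1 if we are full.
  int registerThread(int tid) {
    std::lock_guard<std::mutex> guard(_lock);
    if (_free_list.empty()) return -1;
    int slot = _free_list.back();
    _free_list.pop_back();
    _slots[slot].value.store(tid, std::memory_order_release);
    return slot;
  }

  // Return the slot to the free list so a later thread can reuse it.
  void unregisterThread(int slot) {
    std::lock_guard<std::mutex> guard(_lock);
    _slots[slot].value.store(-1, std::memory_order_release);
    _free_list.push_back(slot);
  }
};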
2421ba9 to 50a8d5f
I did run some comparison of native memory usage with different thread filter implementations; data is in the notebook. TL;DR: there is no observable increase in the native memory usage (the …
…t/thread_filter_squash
If the TLS cleanup fires before the JVMTI hook, we want to ensure that we don't crash while retrieving the ProfiledThread:
- Add a check on the validity of ProfiledThread
- Start the profiler to ensure we have valid thread objects
- Add asserts around missing thread objects
- Remove print (replacing with an assert)
…t/thread_filter_squash
6171739 to e30d88f
- Fix removal of self in timer loop init: it was using a thread ID where a slot ID was expected
- Add an assertion to find other potential issues
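A toy illustration of the mix-up this commit fixes (illustrative names only, not the PR's code): the table is indexed by the SlotID handed out at registration, so removing "self" with the raw thread ID points at the wrong entry.

#include <cassert>

struct SlotTable {
  static const int kSlots = 8;
  int slots[kSlots];

  SlotTable() { for (int i = 0; i < kSlots; i++) slots[i] = -1; }

  // Registration returns a SlotID (an index), which is generally != the tid.
  int registerThread(int tid) {
    for (int i = 0; i < kSlots; i++) {
      if (slots[i] == -1) { slots[i] = tid; return i; }
    }
    return -1;
  }

  // Removal must be keyed by that SlotID, not by the thread ID.
  void unregister(int slot_id) {
    assert(slot_id >= 0 && slot_id < kSlots);
    slots[slot_id] = -1;
  }
};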
e30d88f to e78a6b2
Java_com_datadoghq_profiler_JavaProfiler_filterThreadRemove0(JNIEnv *env,
                                                             jobject unused) {
  ProfiledThread *current = ProfiledThread::current();
  if (unlikely(current == nullptr)) {
I think assert(current != nullptr) should be sufficient; otherwise, we have a bigger problem.
I think this happens on unloading. JVMTI cleanup can be removed before all threads are finished?
We have the assert for debug builds, though we can keep avoiding crashes in release builds. Feel free to answer if you do not agree.
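A small sketch of the compromise being described (illustrative; only the pointer is used, so ProfiledThread is just forward-declared): assert in debug builds to surface the unexpected state, but still bail out safely when NDEBUG compiles the assert away.

#include <cassert>

class ProfiledThread;  // from the profiler codebase; only the pointer is needed here

void filterThreadRemoveSketch(ProfiledThread* current) {
  assert(current != nullptr && "ProfiledThread should exist at this point");
  if (current == nullptr) {   // release builds: avoid the crash and give up
    return;
  }
  // ... proceed with removing the thread from the filter ...
}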
    return;
  }
  int tid = current->tid();
  if (unlikely(tid < 0)) {
Is it possible? Or should we just assert?
Good question.
I think we are missing the instrumentation of some threads. Could non-Java threads call back into Java?
It could be nice to have asserts to debug this, though I think I'd prefer the safer path for a prod release.
private native void stop0() throws IllegalStateException;
private native String execute0(String command) throws IllegalArgumentException, IllegalStateException, IOException;

private native void filterThreadAdd0();
It looks like filterThreadAdd0 == filterThread0(true) and filterThreadRemove0 == filterThread0(false). Please remove the duplication.
We have a definition of DEFAULT_CACHE_LINE_SIZE in dd_arch.h. I would suggest the following code for readability and portability.
struct alignas(DEFAULT_CACHE_LINE_SIZE) Slot {
  std::atomic<int> value{-1};
  char padding[DEFAULT_CACHE_LINE_SIZE - sizeof(value)];
};
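If it helps document the intent, a static_assert next to the struct can pin the expectation (assuming DEFAULT_CACHE_LINE_SIZE stays at least as large as the atomic):

static_assert(sizeof(Slot) == DEFAULT_CACHE_LINE_SIZE,
              "Slot is expected to occupy exactly one cache line");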
Use DEFAULT_CACHE_LINE_SIZE for readability and portability.
memory_order_release should be sufficient.
ThreadFilter::SlotID ThreadFilter::registerThread() {
  // If disabled, block new registrations
  if (!_enabled.load(std::memory_order_acquire)) {
I don't see what memory ordering _enabled is providing. Could you explain what it releases?
We want the init to be called (so that the constructor is finished).
This means other threads might not have the init finished, though we can always lazily register them later (even if the on-thread-start path is better).
So this acts as a load barrier. I think it makes sense. Feel free to challenge.
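For reference, a minimal sketch of the pairing being discussed (assuming _enabled is a std::atomic<bool> published by some init/enable path; all names are illustrative):

#include <atomic>

std::atomic<bool> _enabled{false};

void initFilter() {
  // ... build the slot table, free list, etc. ...
  _enabled.store(true, std::memory_order_release);   // publish the constructed state
}

int registerThreadSketch() {
  // The acquire load pairs with the release store above: observing true here
  // also makes the constructor's writes visible to this thread.
  if (!_enabled.load(std::memory_order_acquire)) {
    return -1;   // registration blocked; the thread can lazily register later
  }
  // ... take a slot ...
  return 0;
}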
I think you need memory_order_acquire ordering here to match the release store.
Same as the add, I think we can keep relaxed. Feel free to challenge.
I think you do need std::memory_order_acquire, just as you did in ThreadFilter::accept()
zhengyu123 left a comment:
I did a partial second round of reviewing; I think there are many inconsistencies in memory ordering.
memory_order_acquire instead of memory_order_acq_rel
Need memory_order_acquire ordering
This was intentional: the idea is that we either have null or a valid chunk. The impact of missing one trace is acceptable if we don't see the chunk in the current thread (which is pretty unlikely). Wdyt?
memory_order_relaxed should be sufficient.
I think this is a tradeoff. I hope this is a path where acquire is acceptable. If you think otherwise I can adjust.
Thanks @zhengyu123, I really appreciate the thorough review. Apologies if I only have a little time to spend on this every week.
…t/thread_filter_squash
In the current merge, I'm removing the active bitmap. The ActiveBitmap needs to be adjusted to the slot logic. We can adjust this by retrieving the address of the slot. This can be simpler than with the bitmap.
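A rough sketch of what "retrieving the address of the slot" could mean in practice (hypothetical; not the PR's code): once a thread holds a pointer to its own slot, the slot's value doubles as the per-thread active flag, so no separate bitmap is needed.

#include <atomic>

struct SlotRef {                        // hypothetical per-thread handle
  std::atomic<int>* value;              // address of this thread's slot
};

bool isActive(const SlotRef& ref) {
  // The thread is "in the filter" when its slot holds a tid rather than -1.
  return ref.value != nullptr && ref.value->load(std::memory_order_relaxed) >= 0;
}

void deactivate(SlotRef& ref) {
  // Clearing through the slot address replaces the ActiveBitmap update.
  ref.value->store(-1, std::memory_order_release);
}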
I think you do need std::memory_order_acquire, just as you did in ThreadFilter::accept(), to match the release from:
ChunkStorage* expected = nullptr;
if (_chunks[chunk_idx].compare_exchange_strong(expected, new_chunk, std::memory_order_release))
I'm not sure we do, but let's measure before we debate. I'll run the test to see if this changes anything on perf.
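A minimal sketch of the pairing under discussion (ChunkStorage and _chunks come from the quoted snippet; the array size, helper names, and chunk contents are assumptions):

#include <atomic>
#include <cstddef>

struct ChunkStorage { /* per-chunk slot storage */ };
static std::atomic<ChunkStorage*> _chunks[64];

ChunkStorage* getChunk(std::size_t chunk_idx) {
  // The acquire load pairs with the release CAS below: a non-null result implies
  // the chunk's construction is visible. A null result just means "not published
  // yet here", which the discussion above treats as an acceptable miss.
  return _chunks[chunk_idx].load(std::memory_order_acquire);
}

ChunkStorage* getOrCreateChunk(std::size_t chunk_idx) {
  ChunkStorage* chunk = getChunk(chunk_idx);
  if (chunk != nullptr) {
    return chunk;
  }
  ChunkStorage* expected = nullptr;
  ChunkStorage* new_chunk = new ChunkStorage();
  if (_chunks[chunk_idx].compare_exchange_strong(expected, new_chunk,
                                                 std::memory_order_release)) {
    return new_chunk;
  }
  delete new_chunk;   // another thread won the race
  return expected;    // the chunk it published
}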
I think you do need std::memory_order_acquire, just as you did in ThreadFilter::accept()
@zhengyu123 are we OK with this version?
    // Collect thread IDs from the fixed-size table into the main set
    _thread_ids[i][old_index].collect(threads);
    _thread_ids[i][old_index].clear();
  }
I think memory_order_acq_rel is good here for the old_index. Though we might be talking about something else.
ProfiledThread *current = ProfiledThread::current();
int tid = -1;

if (current != nullptr) {
I remember seeing a crash around this...
Basically, JVMTI could unload and delete the profiled thread from under our feet.
Notes for the future: if we care about the ~100KB, we can reduce the padding. This becomes a tradeoff between memory and the risk of false sharing.
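To make the tradeoff concrete, a hypothetical comparison (kCacheLine and both struct names are illustrative): halving the per-slot padding halves the table's footprint but lets two threads share a cache line.

#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;                 // assumed cache line size

struct alignas(kCacheLine) PaddedSlot {                // 1 slot per line: no false sharing
  std::atomic<int> value{-1};
  char padding[kCacheLine - sizeof(std::atomic<int>)];
};

struct alignas(kCacheLine / 2) HalfPaddedSlot {        // 2 slots per line: ~half the memory
  std::atomic<int> value{-1};
  char padding[kCacheLine / 2 - sizeof(std::atomic<int>)];
};

static_assert(sizeof(PaddedSlot) == kCacheLine, "one slot per cache line");
static_assert(sizeof(HalfPaddedSlot) == kCacheLine / 2, "two slots per cache line");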
What does this PR do?:
Motivation:
Improve throughput of applications that run on many threads with many context updates.
Additional Notes:
How to test the change?:
For Datadog employees:
… credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.