Could you show a screenshot with at least 3-4 minutes of data? A single dot for only one minute is not very clear.
Review: Support Virtual-GenAI monitoring
Critical Issues
1. Config file exclude mismatch in server-starter/pom.xml
Exclude says gen-ai-settings.yml but actual file is gen-ai-config.yml.
2. requiredModules() returns empty array
GenAIAnalyzerModuleProvider uses CoreModule in start() but doesn't declare it in requiredModules(). Should return new String[] { CoreModule.NAME }.
3. Module naming convention violation
Existing analyzer modules use lowercase-hyphenated: agent-analyzer, log-analyzer, meter-analyzer. New module genAI-analyzer should be gen-ai-analyzer.
4. Package should be org.apache.skywalking.oap.server.analyzer.genai
Currently uses org.apache.skywalking.oap.meter.analyzer.* which collides with the existing meter-analyzer module's package. Since this is a trace span analyzer (shared across SkyWalking/OTEL/Zipkin receivers), the package should be org.apache.skywalking.oap.server.analyzer.genai.*.
Design Issues
5. Duplicate OAL metric
gen_ai_provider_resp_time and gen_ai_provider_latency_avg both compute from(GenAIProviderAccess.latency).longAvg(). Remove one.
6. totalCost semantics are confusing
Stored value is tokens * costPerM, dashboard divides by 1,000,000. Better to store actual cost by dividing at computation time.
7. Missing NamingControl in VirtualGenAIProcessor
Other virtual processors all use NamingControl to normalize service names. GenAI processor skips this.
8. Tag key inconsistency: gen_ai.stream.ttfr vs timeToFirstToken
Tag says "ttfr", field says "timeToFirstToken", doc doesn't mention this tag at all.
Code Quality Issues
9. GenAIConfigLoader constructor ignores Yaml parameter
Accepts Yaml but creates a new one in loadConfig().
10. fastjson dependency in e2e test
No new dependency version should be added directly in sub-module pom.xml.
Dependencies are managed by the BOM. We have decided not to include this library because it has had many critical CVEs in the past. Each time, we had to fix them by re-releasing a patch version, which is too painful.
11. E2E Dockerfile clones unpinned external repo
Dockerfile.provider clones spring-projects/spring-ai-examples without pinning a commit/tag. Any upstream change could break the e2e test.
12. Documentation typo
virtual-genai.md: "Virtual cache represent the Generative AI service nodes" - copy-paste from virtual-cache doc.
Minor Issues
13. Missing newline at end of file in multiple files: gen-ai-config.yml, menu.yaml, SPI files, e2e expected YAMLs, dashboard JSONs.
14. GenAIModelAccessDispatcher bypasses normal dispatch flow - directly calls MetricsStreamProcessor.getInstance().in(traffic).
15. VirtualGenAIProcessor.recordList should be final.
16. Blank line in import block in VirtualServiceAnalysisListener.java between java.util and lombok imports.
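Issue 6 above suggests dividing at computation time rather than on the dashboard. A minimal sketch of that idea follows, assuming prices are configured per 1,000,000 tokens as described in the review. The class name `GenAICostSketch` and the method `estimateCost` are hypothetical and are not taken from the PR.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper for illustration only; not part of the PR. */
final class GenAICostSketch {
    private static final BigDecimal TOKENS_PER_MILLION = BigDecimal.valueOf(1_000_000L);

    /**
     * Returns the actual estimated cost, so that no consumer has to
     * remember to divide by 1,000,000 afterwards.
     */
    static BigDecimal estimateCost(long inputTokens, long outputTokens,
                                   BigDecimal inputCostPerM, BigDecimal outputCostPerM) {
        // tokens * price-per-million is the "amplified" value the review describes.
        BigDecimal amplified = inputCostPerM.multiply(BigDecimal.valueOf(inputTokens))
            .add(outputCostPerM.multiply(BigDecimal.valueOf(outputTokens)));
        // Divide once, here, keeping six decimal places.
        return amplified.divide(TOKENS_PER_MILLION, 6, RoundingMode.HALF_UP);
    }
}
```

With the gpt-4o prices used in the suggested doc (2.5 and 10 per million), 1,000 input tokens and 500 output tokens come to 0.0075 under this sketch.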
Additional issue: should use percentile2 instead of percentile
All production OAL files use percentile2(10). The old percentile function only exists in e2e test OAL for backward-compatibility testing.
In virtual-gen-ai.oal, the following lines should use percentile2:
gen_ai_provider_latency_percentile = from(GenAIProviderAccess.latency).percentile2(10);
gen_ai_model_latency_percentile = from(GenAIModelAccess.latency).percentile2(10);
gen_ai_model_ttft_percentile = from(GenAIModelAccess.timeToFirstToken).filter(timeToFirstToken > 0).percentile2(10);
And your UI doesn't show the correct percentile labels.
|
@wu-sheng |
|
UI side got merged. When you update this PR, please include the submodule update. |
|
Not finished yet. Some checks fail in my local env; still fixing.
| @@ -0,0 +1,16 @@ | |||
| # Virtual GenAI | |||
You need to update the demo to point to here. I think from Marketplace/General Service?
Not just this. menu.yml is not updated in the /docs/en
|
e2e fails, please fix it. |
|
@wu-sheng |
…ng' of github.com:peachisai/skywalking into Support-GenAI-monitoring
|
The UI submodule cannot be pushed successfully; it is always loading. Will try again.
...c/main/java/org/apache/skywalking/oap/analyzer/genai/service/GenAIModelAccessDispatcher.java
…lking into Support-GenAI-monitoring
|
The UI submodule has been updated.
|
You have build errors. |
It works on my side. Could you retrigger the workflow? It should be an occasional issue.
wu-sheng
left a comment
The spring-ai-examples service in docker-compose.yml has no healthcheck. The e2e trigger will start hitting http://localhost:9260/ai/generateStream as soon as setup completes, but the Spring Boot app may not be ready yet. This could cause flaky tests.
Consider adding a healthcheck like:
healthcheck:
  test: ["CMD", "sh", "-c", "nc -nz 127.0.0.1 8080"]
  interval: 5s
  timeout: 60s
  retries: 120
wu-sheng
left a comment
The doc at docs/en/setup/service-agent/virtual-genai.md has a few grammar issues and is missing key information about provider configuration, cost estimation, and available metrics. Here's a suggested version:
# Virtual GenAI
Virtual GenAI represents the Generative AI service nodes detected by [server agents' plugins](server-agents.md). The performance
metrics of the GenAI operations are from the GenAI client-side perspective.
For example, a Spring AI plugin in the Java agent could detect the latency of a chat completion request.
As a result, SkyWalking would show traffic, latency, success rate, token usage (input/output), and estimated cost in the GenAI dashboard.
## Span Contract
The GenAI operation span should have the following properties:
- It is an **Exit** span
- **Span's layer == GENAI**
- Tag key = `gen_ai.provider.name`, value = The Generative AI provider, e.g. openai, anthropic, ollama
- Tag key = `gen_ai.response.model`, value = The name of the GenAI model, e.g. gpt-4o, claude-3-5-sonnet
- Tag key = `gen_ai.usage.input_tokens`, value = The number of tokens used in the GenAI input (prompt)
- Tag key = `gen_ai.usage.output_tokens`, value = The number of tokens used in the GenAI response (completion)
- Tag key = `gen_ai.server.time_to_first_token`, value = The duration in milliseconds until the first token is received (streaming requests only)
- If the GenAI service is a remote API (e.g. OpenAI), the span's peer should be the network address (IP or domain) of the GenAI server.
## Provider Configuration
SkyWalking uses `gen-ai-config.yml` to map model names to providers and configure cost estimation.
When the `gen_ai.provider.name` tag is present in the span, it is used directly. Otherwise, SkyWalking matches the model name
against `prefix-match` rules to identify the provider. For example, a model name starting with `gpt` is mapped to `openai`.
To configure cost estimation, add `models` with pricing under the provider:
```yaml
providers:
  - provider: openai
    prefix-match:
      - gpt
    models:
      - name: gpt-4o
        input-cost-per-m: 2.5 # cost per 1,000,000 input tokens
        output-cost-per-m: 10 # cost per 1,000,000 output tokens
```
## Metrics
The following metrics are available at the **provider** (service) level:
- `gen_ai_provider_cpm` - Calls per minute
- `gen_ai_provider_sla` - Success rate
- `gen_ai_provider_resp_time` - Average response time
- `gen_ai_provider_latency_percentile` - Latency percentiles
- `gen_ai_provider_input_tokens_sum / avg` - Input token usage
- `gen_ai_provider_output_tokens_sum / avg` - Output token usage
- `gen_ai_provider_total_cost / avg_cost` - Estimated cost
The following metrics are available at the **model** (service instance) level:
- `gen_ai_model_call_cpm` - Calls per minute
- `gen_ai_model_sla` - Success rate
- `gen_ai_model_latency_avg / percentile` - Latency
- `gen_ai_model_ttft_avg / percentile` - Time to first token (streaming only)
- `gen_ai_model_input_tokens_sum / avg` - Input token usage
- `gen_ai_model_output_tokens_sum / avg` - Output token usage
- `gen_ai_model_total_cost / avg_cost` - Estimated cost
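The provider lookup described under "Provider Configuration" (use the `gen_ai.provider.name` tag when present, otherwise match the model name against `prefix-match` rules) can be sketched as below. This is only an illustration: the PR itself uses a prefix trie (`GenAIProviderPrefixMatcher`), while this sketch swaps in a plain linear scan, and the longest-prefix-wins rule, the class name `ProviderResolverSketch`, and its methods are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver for illustration; a linear scan, not the PR's trie. */
final class ProviderResolverSketch {
    private final Map<String, String> prefixToProvider = new LinkedHashMap<>();

    /** Registers one prefix-match rule, e.g. "gpt" -> "openai". */
    void addPrefix(String prefix, String provider) {
        prefixToProvider.put(prefix, provider);
    }

    /** The provider tag wins when present; otherwise fall back to prefix rules. */
    Optional<String> resolve(String providerTag, String modelName) {
        if (providerTag != null && !providerTag.isEmpty()) {
            return Optional.of(providerTag);
        }
        if (modelName == null) {
            return Optional.empty();
        }
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> rule : prefixToProvider.entrySet()) {
            String prefix = rule.getKey();
            // Assumption: when several prefixes match, the longest one wins.
            if (modelName.startsWith(prefix) && prefix.length() > bestLength) {
                best = rule.getValue();
                bestLength = prefix.length();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With a `gpt` rule registered for `openai`, `resolve(null, "gpt-4o")` yields `openai`, while a model name that matches no registered prefix yields an empty result.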
And we should mention which version of the Java agent is required.
About the models we provide: could you do some searching (AI should be able to do that)? Adding the official token prices would be better.
I’ve considered this before, but I’m concerned that including pricing configurations for these commercial models might lead to sensitivity issues. Additionally, model pricing tends to change rapidly. |
|
You could add a last-updated date (in the doc and on the UI) so that others know when they need to update the prices.
Pull request overview
This PR adds Virtual GenAI observability to SkyWalking, including backend analysis + metrics (OAL), UI dashboards/menu entries, distribution config, and an E2E scenario to validate emitted metrics.
Changes:
- Add a new `gen-ai-analyzer` module to extract GenAI metrics from spans and emit virtual service/provider/model sources.
- Add Virtual GenAI OAL metrics and UI initialized dashboard templates + menu entries.
- Add an E2E test case (mock LLM endpoint + Spring AI example) and wire it into CI.
Reviewed changes
Copilot reviewed 51 out of 51 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/e2e-v2/script/env | Bumps Java agent commit used by E2E. |
| test/e2e-v2/java-test-service/e2e-service-provider/src/main/java/org/apache/skywalking/e2e/controller/LLMMockController.java | Adds an SSE streaming mock LLM endpoint for E2E. |
| test/e2e-v2/java-test-service/e2e-service-provider/pom.xml | Minor formatting change in test service POM. |
| test/e2e-v2/cases/virtual-genai/virtual-genai.yaml | Adds Virtual GenAI GraphQL metric verification cases. |
| test/e2e-v2/cases/virtual-genai/expected/service.yml | Expected service list output for Virtual GenAI provider. |
| test/e2e-v2/cases/virtual-genai/expected/metrics-has-value.yml | Generic expected “metric has value” template. |
| test/e2e-v2/cases/virtual-genai/expected/metrics-has-value-label.yml | Expected metric output with labels (percentiles). |
| test/e2e-v2/cases/virtual-genai/expected/instance.yml | Expected instance list output for GenAI model. |
| test/e2e-v2/cases/virtual-genai/e2e.yaml | E2E scenario definition for Virtual GenAI. |
| test/e2e-v2/cases/virtual-genai/docker-compose.yml | Compose stack for OAP+BanyanDB+mock provider+Spring AI sample. |
| test/e2e-v2/cases/virtual-genai/Dockerfile.provider | Builds Spring AI example image with SkyWalking agent. |
| test/e2e-v2/cases/storage/expected/config-dump.yml | Updates expected config dump to include gen-ai-analyzer provider. |
| skywalking-ui | Updates UI submodule pointer for new dashboards/menu/icon support. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-root.json | Adds root dashboard template for Virtual GenAI. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-provider.json | Adds provider-level Virtual GenAI dashboard template. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-model.json | Adds model-level Virtual GenAI dashboard template. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/rocketmq/rocketmq-root.json | Fixes RocketMQ doc URL casing. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml | Adds GenAI menu section and Virtual GenAI entry. |
| oap-server/server-starter/src/main/resources/oal/virtual-gen-ai.oal | Adds OAL metrics definitions for GenAI provider/model. |
| oap-server/server-starter/src/main/resources/gen-ai-config.yml | Adds provider prefix-matching + pricing config file. |
| oap-server/server-starter/src/main/resources/application.yml | Adds gen-ai-analyzer module selector wiring. |
| oap-server/server-starter/pom.xml | Excludes gen-ai-config.yml from a build plugin’s scanning/formatting. |
| oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/source/GenAIProviderAccess.java | Adds new source scope for provider-level access metrics. |
| oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/source/GenAIModelAccess.java | Adds new source scope for model-level access metrics. |
| oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/source/GenAIMetrics.java | Adds internal DTO to carry extracted GenAI metrics. |
| oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/source/DefaultScopeDefine.java | Adds new scope IDs for GenAI sources. |
| oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java | Registers Virtual GenAI layer for UI template initialization. |
| oap-server/oal-rt/src/test/java/org/apache/skywalking/oal/v2/generator/RuntimeOALGenerationTest.java | Extends OAL runtime generation test for GenAI OAL + sources. |
| oap-server/oal-grammar/src/main/antlr4/org/apache/skywalking/oal/rt/grammar/OALParser.g4 | Allows GenAI sources in OAL grammar. |
| oap-server/oal-grammar/src/main/antlr4/org/apache/skywalking/oal/rt/grammar/OALLexer.g4 | Adds lexer tokens for GenAI sources. |
| oap-server/analyzer/pom.xml | Adds new analyzer submodule gen-ai-analyzer. |
| oap-server/analyzer/gen-ai-analyzer/src/main/resources/META-INF/services/org.apache.skywalking.oap.server.library.module.ModuleProvider | Registers GenAI analyzer module provider. |
| oap-server/analyzer/gen-ai-analyzer/src/main/resources/META-INF/services/org.apache.skywalking.oap.server.library.module.ModuleDefine | Registers GenAI analyzer module define. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/service/IGenAIMeterAnalyzerService.java | Defines service interface for extracting GenAI metrics from spans. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/service/GenAIMeterAnalyzer.java | Implements tag extraction + provider matching + cost/latency derivation. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/module/GenAIAnalyzerModule.java | Declares gen-ai-analyzer module and its services. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/matcher/GenAIProviderPrefixMatcher.java | Implements provider inference via prefix trie. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/config/GenAITagKey.java | Centralizes GenAI span tag keys. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/config/GenAIOALDefine.java | Registers the Virtual GenAI OAL file for loading. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/config/GenAIConfigLoader.java | Loads gen-ai-config.yml and builds config objects. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/config/GenAIConfig.java | Defines config model for provider matching and pricing. |
| oap-server/analyzer/gen-ai-analyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/GenAIAnalyzerModuleProvider.java | Wires config loading, matcher, and OAL engine loading on start. |
| oap-server/analyzer/gen-ai-analyzer/pom.xml | Adds Maven module for gen-ai-analyzer. |
| oap-server/analyzer/agent-analyzer/src/main/java/org/apache/skywalking/oap/server/analyzer/provider/trace/parser/listener/vservice/VirtualGenAIProcessor.java | Emits virtual GenAI service/instance + access sources from GenAI spans. |
| oap-server/analyzer/agent-analyzer/src/main/java/org/apache/skywalking/oap/server/analyzer/provider/trace/parser/listener/VirtualServiceAnalysisListener.java | Registers VirtualGenAIProcessor in vservice pipeline. |
| oap-server/analyzer/agent-analyzer/pom.xml | Adds dependency on gen-ai-analyzer module. |
| docs/menu.yml | Adds docs navigation for Virtual GenAI. |
| docs/en/setup/service-agent/virtual-genai.md | Adds documentation for Virtual GenAI span contract and config. |
| docs/en/changes/changes.md | Updates CHANGES log with Virtual-GenAI support entry. |
| apm-dist/src/main/assembly/binary.xml | Ships gen-ai-config.yml in binary distribution. |
| .github/workflows/skywalking.yaml | Adds Virtual GenAI E2E job to CI matrix. |
Comments suppressed due to low confidence (3)
oap-server/server-starter/src/main/resources/ui-initialized-templates/virtual_genai/virtual-genai-provider.json:1
- This widget is “Average Cost” (`gen_ai_provider_avg_cost`), but the `metricConfig` label/unit are for latency (ms). This will render misleading UI. Update the label/unit to reflect cost (and align units with how `totalCost` is actually stored, especially if fixing the cost-per-million computation).
test/e2e-v2/java-test-service/e2e-service-provider/src/main/java/org/apache/skywalking/e2e/controller/LLMMockController.java:1
- The streamed JSON is constructed via string concatenation without JSON escaping (`chunk` is embedded inside quotes). If `fullContent` ever includes `"`/`\`/control characters, the mock will emit invalid JSON and can create flaky E2E behavior. Prefer constructing the `delta` JSON via a JSON library (or at least escape `chunk`) to ensure valid SSE payloads.
oap-server/server-starter/src/main/resources/gen-ai-config.yml:1
- Corrected grammar in the comment ('a example' -> 'an example') and add a space after `#` for readability.
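The escaping concern raised for `LLMMockController` can be illustrated with a small sketch. It takes the "at least escape `chunk`" route rather than a JSON library, and the payload shape (an OpenAI-style `choices[0].delta.content` object) as well as the names `SseJsonSketch`, `escapeJson`, and `deltaEvent` are assumptions, not code from the PR.

```java
/** Hypothetical escaping helper for a mock SSE endpoint; illustration only. */
final class SseJsonSketch {

    /** Escapes a string so it can be embedded between double quotes in JSON. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds one SSE "data:" event; the JSON shape is an assumed example. */
    static String deltaEvent(String chunk) {
        return "data: {\"choices\":[{\"delta\":{\"content\":\""
            + escapeJson(chunk) + "\"}}]}\n\n";
    }
}
```

A JSON library would still be the more robust choice in real code, as the review comment suggests; the sketch only shows what the minimum escaping has to cover.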
...lyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/service/GenAIMeterAnalyzer.java
...er/oal-rt/src/test/java/org/apache/skywalking/oal/v2/generator/RuntimeOALGenerationTest.java
...lyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/service/GenAIMeterAnalyzer.java
|
Also, please check copilot comments. |
Documentation & Config Review (issue #20 follow-up)
The doc (
1. YAML key mismatch between doc and actual config/code
The doc example uses:
input-estimated-cost-per-m: 2.5
output-estimated-cost-per-m: 10
But the actual config file (
input-cost-per-m: 1
output-cost-per-m: 1
(Java fields:
The doc example should match the real keys:
2. No pricing added to the shipped config
Since cost estimation is a headline feature of this PR, it would be much more useful to ship with well-known model prices pre-configured (at least for major providers like openai, anthropic). Otherwise users get
3. Markdown formatting issue
Line 29-30:
Should be:
4. Typo in Requirement section
5. Ollama prefix-match (per wu-sheng's earlier comment)
Ollama is a runtime, not a model prefix. The
Will update the doc and config |
|
BTW, I should be able to include this in 10.4 (to be released soon). After this gets merged, let's prepare a blog post to introduce it.
| prefix-match: | ||
| - llama-3 | ||
| models: | ||
| - name: llama-4-scout-17bx16e-128k |
This doesn't follow llama-3 prefix, right?
|
|
||
| private long outputTokens; | ||
|
|
||
| private double totalCost; |
The SumMetrics function you used is for Long. Please align it with the data type.
And you should comment this more accurately: the value is amplified by 10^6.
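If the amplified representation is kept so that the value fits a Long-based sum, the two comments above could be addressed roughly as follows. This is a sketch under that assumption; `AmplifiedCostSketch` and its methods are hypothetical names and do not come from the PR.

```java
/** Hypothetical illustration of a 10^6-amplified cost stored as a long. */
final class AmplifiedCostSketch {
    /** Stored values equal the actual cost multiplied by this factor. */
    static final long AMPLIFICATION = 1_000_000L;

    /**
     * tokens * price-per-million-tokens is exactly (actual cost * 10^6),
     * rounded to the nearest long so it can be summed as a Long.
     */
    static long amplifiedCost(long tokens, double costPerM) {
        return Math.round(tokens * costPerM);
    }

    /** Converts a summed amplified value back to the actual cost. */
    static double toActualCost(long amplifiedSum) {
        return (double) amplifiedSum / AMPLIFICATION;
    }
}
```

One caveat of this representation: fractions below one amplified unit are rounded away per record, so a call with 3 tokens at 0.12 per million contributes 0 to the sum.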
| - name: hunyuan-turbos | ||
| input-estimated-cost-per-m: 0.12 | ||
| output-estimated-cost-per-m: 0.29 | ||
| - name: hunyuan-t1 | ||
| input-estimated-cost-per-m: 0.15 | ||
| output-estimated-cost-per-m: 0.58 | ||
| - name: hunyuan-turbos | ||
| input-estimated-cost-per-m: 0.12 | ||
| output-estimated-cost-per-m: 0.29 |
| - provider: ollama | ||
| prefix-match: |
No match?
Ollama model names are typically user-defined, so I've left this section blank.
| } | ||
|
|
||
| @Override | ||
| public GenAIMetrics extractMetricsFromSWSpan(SpanObject span, SegmentObject segment) { |
Let's prepare a UT for this. As I noticed, the config file seems to be missing something, so we should verify that it is processed correctly.
This UT should include loading files, matching rules, and estimated cost.
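As a starting point for the matching-rule part of such a UT, the sketch below shows the kind of consistency checks a test could assert on the loaded config: no model listed twice under one provider, and every listed model covered by one of that provider's prefixes. It does not load YAML or use a test framework, and the class `GenAIConfigCheckSketch`, its `check` method, and the provider and prefix names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical config sanity checks; illustration only, not PR code. */
final class GenAIConfigCheckSketch {

    /** Returns human-readable problems found in one provider entry. */
    static List<String> check(String provider, List<String> prefixes, List<String> modelNames) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String model : modelNames) {
            if (!seen.add(model)) {
                problems.add(provider + ": duplicate model " + model);
            }
            boolean matched = false;
            for (String prefix : prefixes) {
                if (model.startsWith(prefix)) {
                    matched = true;
                    break;
                }
            }
            // Assumption: an empty prefix list (user-defined names) is allowed.
            if (!prefixes.isEmpty() && !matched) {
                problems.add(provider + ": model " + model + " matches no prefix");
            }
        }
        return problems;
    }
}
```

For example, calling `check` with a provider whose only prefix is `llama-3` and a model named `llama-4-scout-17bx16e-128k` reports one "matches no prefix" problem, and a model list containing `hunyuan-turbos` twice reports one duplicate.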
|
|
||
| public class GenAIMeterAnalyzer implements IGenAIMeterAnalyzerService { | ||
|
|
||
| private static final Logger LOG = LoggerFactory.getLogger(GenAIMeterAnalyzer.class); |
|
|
||
| package org.apache.skywalking.oap.analyzer.genai.config; | ||
|
|
||
| public class GenAITagKey { |
| public class GenAITagKey { | |
| public class GenAITagKeys { |

