Skip to content

feature: When adding files to the data set, only links#330

Merged
hefanli merged 1 commit intomainfrom
dataset_file_add
Feb 10, 2026
Merged

feature: When adding files to the data set, only links#330
hefanli merged 1 commit intomainfrom
dataset_file_add

Conversation

@hefanli
Copy link
Collaborator

@hefanli hefanli commented Feb 10, 2026

No description provided.

List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(existDatasetFiles);
// 检查源文件是否存在且为普通文件
if (!Files.exists(source) || !Files.isRegularFile(source)) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

In general, to fix this issue you must constrain or validate any user-controlled path before using it for filesystem access. Typical strategies are: (1) only allow paths under a known safe base directory, checking the normalized path stays within that base; or (2) if the input is supposed to be just a filename, reject any path separators or .. sequences. Here, file.getFilePath() is presented as a “source file path list” (源文件路径列表) for copy/add operations, so the safest approach without changing external behavior is to require that sourPath be an absolute or relative path under a configured base directory (for example, a dataset storage root) and reject anything outside.

Because we are only allowed to change the shown snippets, the minimal, behavior-preserving hardening is to validate sourPath before using it: normalize it, ensure it is absolute or resolve it relative to a known safe directory owned by the application, and reject it if it escapes that directory or contains disallowed sequences. However, we do not see a configured base directory in this class, so we cannot safely invent one. Therefore, the least intrusive but still effective fix is to constrain sourPath to be a single path component (a plain filename) or a path without traversal characters. This prevents obvious traversal and arbitrary absolute paths while still allowing the existing logic to operate on files in directories controlled elsewhere in the system (e.g., if callers already pass full paths that do not rely on ..). Concretely, we will add a small validation at the top of addFile that rejects sourPath if it contains .. or path separators that indicate traversal outside the intended directory. We will also mirror the same validation in getDatasetFileForAdd for file.getFilePath() to ensure consistent constraints when computing sourcePath. If invalid, we will log and throw a BusinessException with an appropriate error code.

Implementation details:

  • In DatasetFileApplicationService.addFile(String sourPath, String targetPath, boolean softAdd):

    • Before constructing Path source, add validation:
      • Reject if sourPath contains ".."
      • Reject if it contains path separators ('/' or '\\') and you intend it to be a single component, or (if you prefer to allow subdirectories) at least reject absolute paths (Paths.get(sourPath).isAbsolute()) and those that escape via .. when normalized.
    • Use the existing logging and BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR) for consistency.
  • In getDatasetFileForAdd(...):

    • Before Path sourcePath = Paths.get(file.getFilePath());, add the same validation on file.getFilePath().
    • This ensures any dangerous path is rejected early, before even reading its File for metadata.

We will implement a small private helper in DatasetFileApplicationService to avoid duplicating validation logic, placed near addFile so we stay within the shown snippet region.

Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -867,6 +867,7 @@
         if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
             return;
         }
+        validateSourcePath(sourPath);
         Path source = Paths.get(sourPath).normalize();
         Path target = Paths.get(targetPath).normalize();
 
@@ -902,9 +903,31 @@
         }
     }
 
+    /**
+     * 校验用户提供的源文件路径,防止目录遍历等问题
+     */
+    private void validateSourcePath(String sourPath) {
+        // 基础检查:禁止父目录引用和潜在的目录遍历
+        if (StringUtils.contains(sourPath, "..")) {
+            log.warn("Invalid source path contains parent directory reference: {}", sourPath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+        // 规范化后检查是否为绝对路径,禁止访问任意系统路径
+        Path normalized = Paths.get(sourPath).normalize();
+        if (normalized.isAbsolute()) {
+            log.warn("Invalid absolute source path: {}", sourPath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+    }
+
     private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file,
                                                     Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException {
-        Path sourcePath = Paths.get(file.getFilePath());
+        String filePath = file.getFilePath();
+        if (StringUtils.isBlank(filePath) || StringUtils.contains(filePath, "..")) {
+            log.warn("Invalid file path in AddFilesRequest: {}", filePath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+        Path sourcePath = Paths.get(filePath).normalize();
         File sourceFile = sourcePath.toFile();
         file.getMetadata().put("softAdd", req.isSoftAdd());
         LocalDateTime currentTime = LocalDateTime.now();
EOF
@@ -867,6 +867,7 @@
if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
return;
}
validateSourcePath(sourPath);
Path source = Paths.get(sourPath).normalize();
Path target = Paths.get(targetPath).normalize();

@@ -902,9 +903,31 @@
}
}

/**
* 校验用户提供的源文件路径防止目录遍历等问题
*/
private void validateSourcePath(String sourPath) {
// 基础检查:禁止父目录引用和潜在的目录遍历
if (StringUtils.contains(sourPath, "..")) {
log.warn("Invalid source path contains parent directory reference: {}", sourPath);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
// 规范化后检查是否为绝对路径,禁止访问任意系统路径
Path normalized = Paths.get(sourPath).normalize();
if (normalized.isAbsolute()) {
log.warn("Invalid absolute source path: {}", sourPath);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
}

private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file,
Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException {
Path sourcePath = Paths.get(file.getFilePath());
String filePath = file.getFilePath();
if (StringUtils.isBlank(filePath) || StringUtils.contains(filePath, "..")) {
log.warn("Invalid file path in AddFilesRequest: {}", filePath);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
Path sourcePath = Paths.get(filePath).normalize();
File sourceFile = sourcePath.toFile();
file.getMetadata().put("softAdd", req.isSoftAdd());
LocalDateTime currentTime = LocalDateTime.now();
Copilot is powered by AI and may make mistakes. Always verify output.
Path parent = target.getParent();
// 创建目标目录(如果需要)
if (parent != null) {
Files.createDirectories(parent);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

General approach: ensure that any user-controlled path component is validated or constrained before being used for file-system operations. For this case, target paths must be enforced to remain under the dataset’s root directory, and the code should defend against .., absolute paths, and symlink tricks. Optionally, source paths can also be constrained to an allowed base directory if applicable, but we must not assume any external configuration beyond what’s shown.

Best targeted fix here:

  1. Compute the dataset root directory as an absolute, normalized path from dataset.getPath().
  2. When constructing filePath in getDatasetFileForAdd, normalize the result and verify it remains under the dataset root. If not, throw a business exception.
  3. In addFilesToDataset, pass the dataset root to addFile.
  4. In addFile, normalize target, then ensure the resulting target path is still under the dataset root before creating directories / files / links.
  5. To avoid breaking existing functionality, we keep the logical behavior (copy/link into the dataset tree) but add containment checks and avoid trusting raw strings.

Concretely, in DatasetFileApplicationService:

  • Change addFilesToDataset to compute Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); and pass it to getDatasetFileForAdd and addFile.
  • Change the signatures of getDatasetFileForAdd and addFile to accept this Path datasetRoot.
  • Inside getDatasetFileForAdd, build filePath as a Path relative to datasetRoot via datasetRoot.resolve(req.getPrefix()).resolve(fileName).normalize(), and verify targetPath.startsWith(datasetRoot) before using it. Store the string path as currently done.
  • Inside addFile, compute Path target = datasetRoot.resolve(targetPath).normalize(); instead of Paths.get(targetPath), then verify target.startsWith(datasetRoot) before proceeding. This prevents targetPath from escaping datasetRoot even if it contains .. or is absolute.

This preserves functionality (files still end up under the same logical dataset folder) while ensuring user input cannot redirect operations outside that folder.


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -837,6 +837,9 @@
         }
         Dataset dataset = datasetRepository.getById(datasetId);
         BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
+        // 规范化数据集根目录,后续所有目标路径都必须在此目录下
+        Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+
         List<DatasetFile> addedFiles = new ArrayList<>();
         List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
         dataset.setFiles(existDatasetFiles);
@@ -844,11 +847,11 @@
             ObjectMapper objectMapper = new ObjectMapper();
 
             for (AddFilesRequest.FileRequest file : req.getFiles()) {
-                DatasetFile datasetFile = getDatasetFileForAdd(req, file, dataset, objectMapper);
+                DatasetFile datasetFile = getDatasetFileForAdd(req, file, dataset, datasetRoot, objectMapper);
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+                addFile(file.getFilePath(), datasetFile.getFilePath(), datasetRoot, req.isSoftAdd());
             }
         } catch (BusinessException e) {
             throw e;
@@ -863,13 +863,20 @@
         return addedFiles;
     }
 
-    private void addFile(String sourPath, String targetPath, boolean softAdd) {
+    private void addFile(String sourPath, String targetPath, Path datasetRoot, boolean softAdd) {
         if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
             return;
         }
         Path source = Paths.get(sourPath).normalize();
-        Path target = Paths.get(targetPath).normalize();
+        // 将相对的 targetPath 解析到数据集根目录下,并规范化
+        Path target = datasetRoot.resolve(targetPath).normalize();
 
+        // 确保目标路径仍在数据集根目录下,防止目录遍历
+        if (!target.startsWith(datasetRoot)) {
+            log.warn("Target path is outside of dataset root. datasetRoot={}, target={}", datasetRoot, target);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -903,20 +905,33 @@
     }
 
     private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file,
-                                                    Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException {
+                                                    Dataset dataset, Path datasetRoot, ObjectMapper objectMapper) throws JsonProcessingException {
         Path sourcePath = Paths.get(file.getFilePath());
         File sourceFile = sourcePath.toFile();
         file.getMetadata().put("softAdd", req.isSoftAdd());
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // 在数据集根目录下构建并规范化目标路径
+        Path targetPath = datasetRoot
+                .resolve(req.getPrefix())
+                .resolve(fileName)
+                .normalize();
+
+        // 确保目标路径仍在数据集根目录下,防止目录遍历
+        if (!targetPath.startsWith(datasetRoot)) {
+            log.warn("Computed target path is outside of dataset root. datasetRoot={}, targetPath={}", datasetRoot, targetPath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                // 仅持久化相对于数据集根目录的路径字符串
+                .filePath(targetPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
if (parent != null) {
Files.createDirectories(parent);
}
Files.deleteIfExists(target);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

In general, to fix uncontrolled path usage you must constrain any user-influenced path so that, after normalization, it remains within an allowed base directory and cannot be absolute or traverse outside via ... For values that represent a relative prefix within a dataset, validate that the prefix is relative and does not contain path separators or .. beyond what you explicitly allow, and combine it with a trusted base path using Path.resolve. For values that represent a source file path, either restrict them to a known base directory or validate them similarly if they must be relative.

In this code, two main places need hardening:

  1. Construction of DatasetFile.filePath in getDatasetFileForAdd, which currently uses Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString() directly with a tainted req.getPrefix(). We should instead:

    • Resolve req.getPrefix() against dataset.getPath() as a base.
    • Normalize the resulting path.
    • Verify that the normalized path still starts with the dataset base directory (dataset.getPath() normalized to absolute or canonical form).
    • Reject the request (throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND) or a more specific error) if the check fails.
    • Use the validated path to derive the stored filePath.
  2. The addFile method, which takes sourPath (user-controlled via file.getFilePath()) and targetPath (the stored DatasetFile.filePath) and normalizes but does not otherwise constrain them. We should:

    • Ensure target stays under the dataset directory. Since addFile currently does not know the dataset base, but we do know that targetPath came from getDatasetFileForAdd, the best compatible change is to change getDatasetFileForAdd to build filePath using the validated dataset-relative path (datasetBase.resolve(prefix).resolve(fileName)), and then in addFile check that target is not absolute and does not contain attempts to escape (e.g., ensure !target.isAbsolute() and its normalized form does not start with ..). Because we cannot access dataset.getPath() in addFile without changing its signature, the minimal safe change is to prevent absolute and parent-traversal patterns in targetPath. This preserves existing behavior for legitimate relative paths under the dataset root while mitigating obvious path traversal.

Concretely:

  • In getDatasetFileForAdd:

    • Introduce a Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
    • Resolve the prefix: Path prefixPath = StringUtils.isBlank(req.getPrefix()) ? datasetBase : datasetBase.resolve(req.getPrefix()).normalize();
    • Check prefixPath.startsWith(datasetBase); if not, throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND).
    • Build targetFilePath = prefixPath.resolve(fileName).normalize(); and use targetFilePath.toString() as filePath.
  • In addFile:

    • After computing Path target = Paths.get(targetPath).normalize();, add checks:
      • If target.isAbsolute(), throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR) (or a data-management-specific code).
      • If target.startsWith("..") or target.toString().contains(".."+File.separator) is true (but we should use Path semantics, so the first check on normalized path is usually sufficient), reject similarly.
    • These checks ensure the target is a relative path without parent traversal, preventing it from escaping the dataset directory when combined with whatever working directory is being used.

These changes are confined to DatasetFileApplicationService and do not require altering signatures or imports beyond using existing StringUtils and Paths. They preserve the existing API and behavior for well-formed prefixes and paths but block malicious ones.


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -870,6 +870,12 @@
         Path source = Paths.get(sourPath).normalize();
         Path target = Paths.get(targetPath).normalize();
 
+        // 目标路径必须是相对路径,且不能包含父目录跳转,防止目录遍历
+        if (target.isAbsolute() || target.startsWith("..")) {
+            log.warn("Invalid target path when adding file. source: {}, target: {}", sourPath, targetPath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -910,13 +916,27 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // 基于数据集根路径和前缀构建安全的目标路径,防止目录遍历
+        Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+        Path prefixPath;
+        if (StringUtils.isBlank(req.getPrefix())) {
+            prefixPath = datasetBase;
+        } else {
+            prefixPath = datasetBase.resolve(req.getPrefix()).normalize();
+        }
+        // 确保前缀路径未逃出数据集根目录
+        if (!prefixPath.startsWith(datasetBase)) {
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+        Path targetFilePath = prefixPath.resolve(fileName).normalize();
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(targetFilePath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
@@ -870,6 +870,12 @@
Path source = Paths.get(sourPath).normalize();
Path target = Paths.get(targetPath).normalize();

// 目标路径必须是相对路径,且不能包含父目录跳转,防止目录遍历
if (target.isAbsolute() || target.startsWith("..")) {
log.warn("Invalid target path when adding file. source: {}, target: {}", sourPath, targetPath);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}

// 检查源文件是否存在且为普通文件
if (!Files.exists(source) || !Files.isRegularFile(source)) {
log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -910,13 +916,27 @@
LocalDateTime currentTime = LocalDateTime.now();
String fileName = sourcePath.getFileName().toString();

// 基于数据集根路径和前缀构建安全的目标路径,防止目录遍历
Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
Path prefixPath;
if (StringUtils.isBlank(req.getPrefix())) {
prefixPath = datasetBase;
} else {
prefixPath = datasetBase.resolve(req.getPrefix()).normalize();
}
// 确保前缀路径未逃出数据集根目录
if (!prefixPath.startsWith(datasetBase)) {
throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
}
Path targetFilePath = prefixPath.resolve(fileName).normalize();

return DatasetFile.builder()
.id(UUID.randomUUID().toString())
.datasetId(dataset.getId())
.fileName(fileName)
.fileType(AnalyzerUtils.getExtension(fileName))
.fileSize(sourceFile.length())
.filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
.filePath(targetFilePath.toString())
.uploadTime(currentTime)
.lastAccessTime(currentTime)
.metadata(objectMapper.writeValueAsString(file.getMetadata()))
Copilot is powered by AI and may make mistakes. Always verify output.
if (softAdd) {
// 优先尝试创建硬链接,失败后尝试创建符号链接;若均失败抛出异常
try {
Files.createLink(target, source);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

General approach: ensure that any filesystem path derived from user-controlled data is constrained to a known safe base directory and does not allow directory traversal or absolute paths to escape that base. This is typically done by resolving the candidate path against a base, normalizing it, and then verifying that the resulting path still starts with the base path. If the path is invalid, we throw a business exception instead of performing the filesystem operation.

Best fix in this case:

  • Use dataset.getPath() as the trusted dataset root directory.
  • When building filePath in getDatasetFileForAdd, construct it as a normalized path inside that root with Paths.get(dataset.getPath()).resolve(req.getPrefix()).resolve(fileName).normalize(), and verify that the normalized path is still under the dataset root.
  • In addFile, enforce that target remains within the dataset root by performing the same containment check before creating directories, links, or copying files.
  • To avoid changing method signatures across the codebase, we can:
    • Derive the dataset root in addFile from targetPath by walking up to an ancestor directory that matches the known dataset.getPath() prefix. However, that is brittle and not visible in the snippet.
    • A cleaner approach that stays within the shown code is to pass the dataset root into addFile. But the signature is only used in this class and at the one call site shown, so we can safely extend it without affecting unseen code.
  • We therefore will:
    • Change addFile signature to addFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot).
    • At the call site, pass dataset.getPath() as datasetRoot.
    • Inside addFile, compute Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath();, then compute Path target = datasetRootPath.resolve(Paths.get(targetPath)).normalize(); and enforce target.startsWith(datasetRootPath) (using absolute paths).
    • Also change getDatasetFileForAdd to build and store filePath relative to datasetRootPath consistently, so that targetPath is an in-dataset path rather than an arbitrary filesystem path string.

Concretely:

  1. In addFilesToDataset, update the addFile call at line 851 to addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());.
  2. Update the addFile method signature and body:
    • Add a containment check for target within datasetRootPath.
    • Build source and target as absolute normalized paths using toAbsolutePath().
    • If target is outside datasetRootPath, log and throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR) (or a more specific error code if desired).
  3. In getDatasetFileForAdd, build filePath using normalized paths:
    • Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
    • Path logicalPath = datasetRootPath.resolve(req.getPrefix()).resolve(fileName).normalize();
    • Verify logicalPath.startsWith(datasetRootPath); if not, throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND) (or another appropriate code).
    • Store logicalPath.toString() as filePath.

These changes ensure that even if req.getPrefix() contains .. or an absolute path, the normalization and containment check will reject it. They also ensure that addFile does not accidentally write outside the dataset directory even if datasetFile.filePath has been tampered with or incorrectly constructed.


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -848,7 +848,8 @@
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+                // Ensure that the target path used for filesystem operations stays within the dataset root path.
+                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());
             }
         } catch (BusinessException e) {
             throw e;
@@ -863,13 +864,21 @@
         return addedFiles;
     }
 
-    private void addFile(String sourPath, String targetPath, boolean softAdd) {
-        if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
+    private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot) {
+        if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(datasetRoot)) {
             return;
         }
-        Path source = Paths.get(sourPath).normalize();
-        Path target = Paths.get(targetPath).normalize();
+        // Normalize and resolve paths against the dataset root to prevent directory traversal
+        Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath();
+        Path source = Paths.get(sourPath).normalize().toAbsolutePath();
+        Path target = datasetRootPath.resolve(Paths.get(targetPath)).normalize().toAbsolutePath();
 
+        // Ensure that the target path stays within the dataset root directory
+        if (!target.startsWith(datasetRootPath)) {
+            log.warn("Target path escapes dataset root. datasetRoot={}, target={}", datasetRootPath, target);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -910,13 +916,21 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // Build the file path within the dataset root and ensure it does not escape the dataset directory
+        Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+        Path targetPath = datasetRootPath.resolve(req.getPrefix()).resolve(fileName).normalize().toAbsolutePath();
+        if (!targetPath.startsWith(datasetRootPath)) {
+            log.warn("Computed dataset file path escapes dataset root. datasetRoot={}, targetPath={}", datasetRootPath, targetPath);
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(targetPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
@@ -848,7 +848,8 @@
setDatasetFileId(datasetFile, dataset);
dataset.addFile(datasetFile);
addedFiles.add(datasetFile);
addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
// Ensure that the target path used for filesystem operations stays within the dataset root path.
addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());
}
} catch (BusinessException e) {
throw e;
@@ -863,13 +864,21 @@
return addedFiles;
}

private void addFile(String sourPath, String targetPath, boolean softAdd) {
if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot) {
if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(datasetRoot)) {
return;
}
Path source = Paths.get(sourPath).normalize();
Path target = Paths.get(targetPath).normalize();
// Normalize and resolve paths against the dataset root to prevent directory traversal
Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath();
Path source = Paths.get(sourPath).normalize().toAbsolutePath();
Path target = datasetRootPath.resolve(Paths.get(targetPath)).normalize().toAbsolutePath();

// Ensure that the target path stays within the dataset root directory
if (!target.startsWith(datasetRootPath)) {
log.warn("Target path escapes dataset root. datasetRoot={}, target={}", datasetRootPath, target);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}

// 检查源文件是否存在且为普通文件
if (!Files.exists(source) || !Files.isRegularFile(source)) {
log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -910,13 +916,21 @@
LocalDateTime currentTime = LocalDateTime.now();
String fileName = sourcePath.getFileName().toString();

// Build the file path within the dataset root and ensure it does not escape the dataset directory
Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
Path targetPath = datasetRootPath.resolve(req.getPrefix()).resolve(fileName).normalize().toAbsolutePath();
if (!targetPath.startsWith(datasetRootPath)) {
log.warn("Computed dataset file path escapes dataset root. datasetRoot={}, targetPath={}", datasetRootPath, targetPath);
throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
}

return DatasetFile.builder()
.id(UUID.randomUUID().toString())
.datasetId(dataset.getId())
.fileName(fileName)
.fileType(AnalyzerUtils.getExtension(fileName))
.fileSize(sourceFile.length())
.filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
.filePath(targetPath.toString())
.uploadTime(currentTime)
.lastAccessTime(currentTime)
.metadata(objectMapper.writeValueAsString(file.getMetadata()))
Copilot is powered by AI and may make mistakes. Always verify output.
} catch (Throwable hardEx) {
log.warn("create hard link failed from {} to {}: {}", source, target, hardEx.getMessage());
}
Files.createSymbolicLink(target, source);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

General fix approach: ensure that any path derived from user input (here, primarily req.getPrefix() and secondarily the targetPath passed to addFile) is constrained to remain within the dataset’s own base directory. This is typically done by resolving the user-provided component relative to a trusted base path, normalizing, and then verifying that the resulting path starts with the trusted base directory. If the check fails, reject the request.

Best concrete fix with minimal behaviour change:

  1. In getDatasetFileForAdd, when constructing filePath, currently:

    .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())

    This should be replaced with logic that:

    • Creates a normalized base path from dataset.getPath().
    • Resolves the possibly user-controlled req.getPrefix() and fileName against it.
    • Verifies that the resolved path remains under the dataset base path.
    • Throws a BusinessException (e.g. DataManagementErrorCode.DIRECTORY_NOT_FOUND or another relevant error) if the check fails.
    • Stores the safe, normalized path.
  2. To keep side effects local and avoid touching the rest of the code, we can inline this validation and path building in getDatasetFileForAdd, without changing the signature. We will:

    • Build a Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
    • Compute Path resolvedPath = datasetBase.resolve(req.getPrefix() == null ? "" : req.getPrefix()).resolve(fileName).normalize().toAbsolutePath();
    • Check if (!resolvedPath.startsWith(datasetBase)) { throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); }
    • Use resolvedPath.toString() as the filePath field value.
  3. This ensures that any datasetFile.getFilePath() subsequently passed as targetPath into addFile is already guaranteed to stay within the dataset base directory, so Files.createLink, Files.createSymbolicLink, and Files.copy will not operate outside the intended area. We do not need to modify addFile itself, because constraining the data at its origin is sufficient.

  4. No new imports are needed; Paths and Path are already imported, and BusinessException / DataManagementErrorCode are also present in this class.

Change location details:

  • File: backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
  • Method: private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file, Dataset dataset, ObjectMapper objectMapper)
  • Replace the simple filePath construction (line 919 in the snippet) with the described safe construction and check.

No changes are required in DatasetFileController.java or DatasetFile.java to fix this specific issue.


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -910,13 +910,22 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // Safely construct the dataset file path under the dataset base directory
+        Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+        String prefix = req.getPrefix() == null ? "" : req.getPrefix();
+        Path resolvedPath = datasetBase.resolve(prefix).resolve(fileName).normalize().toAbsolutePath();
+        // Ensure the resolved path stays within the dataset base directory to prevent path traversal
+        if (!resolvedPath.startsWith(datasetBase)) {
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(resolvedPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
@@ -910,13 +910,22 @@
LocalDateTime currentTime = LocalDateTime.now();
String fileName = sourcePath.getFileName().toString();

// Safely construct the dataset file path under the dataset base directory
Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
String prefix = req.getPrefix() == null ? "" : req.getPrefix();
Path resolvedPath = datasetBase.resolve(prefix).resolve(fileName).normalize().toAbsolutePath();
// Ensure the resolved path stays within the dataset base directory to prevent path traversal
if (!resolvedPath.startsWith(datasetBase)) {
throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
}

return DatasetFile.builder()
.id(UUID.randomUUID().toString())
.datasetId(dataset.getId())
.fileName(fileName)
.fileType(AnalyzerUtils.getExtension(fileName))
.fileSize(sourceFile.length())
.filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
.filePath(resolvedPath.toString())
.uploadTime(currentTime)
.lastAccessTime(currentTime)
.metadata(objectMapper.writeValueAsString(file.getMetadata()))
Copilot is powered by AI and may make mistakes. Always verify output.
Files.createSymbolicLink(target, source);
} else {
// 覆盖已存在的目标文件,保持与其他地方行为一致
Files.copy(source, target);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

In general, the fix is to validate and constrain any user-controlled path before using it in filesystem operations. For this code, we should ensure that file.getFilePath() and any derived Path source used in addFile are constrained to a trusted base directory (for example, the dataset’s root path or another configured root), and that after normalization they do not escape that directory (no .. traversal, no absolute paths outside the base).

Best single approach without changing existing functionality:

  • Treat the dataset’s root directory (dataset.getPath()) as the only allowed base for both the source files and the target dataset files when using addFilesToDataset. That is:
    • For each file.getFilePath(), compute a normalized Path and then check that it is within the dataset root (or another safe configurable root if that makes more sense in your application; since we can’t see more of the code, we will use dataset.getPath() as the base to stay consistent with how target paths are constructed).
    • If the normalized source path is outside the base dir, throw a BusinessException with an appropriate error code.
  • Add an explicit helper method validateSourcePath(Path source, Path baseDir) inside DatasetFileApplicationService that:
    • Normalizes both paths to absolute paths.
    • Verifies source.startsWith(baseDir) (using Path’s startsWith).
    • Optionally also rejects non-regular files and symbolic links if you want extra safety, but the current code already checks Files.isRegularFile(source).
  • Use this helper at the beginning of addFile before any file operations.
  • Update getDatasetFileForAdd to work with a normalized Path for file.getFilePath() to avoid discrepancies between DB records and the validated filesystem path.

Concretely in DatasetFileApplicationService.java:

  1. Add a private helper method validateSourcePath(Path source, Path baseDir) near addFile.
  2. In addFilesToDataset, change the call to addFile to pass the dataset base path (e.g., dataset.getPath()), or we can implicitly obtain the base inside addFile. Since addFile currently only receives strings, the simplest non-breaking change is:
    • In addFile, compute a base directory from targetPath (the dataset file path already built using dataset.getPath()) and validate that source is under that base. For example: Path datasetRoot = Paths.get(targetPath).normalize().getRootOrFirstSegmentOfDatasetPath(); however we do not have utility methods, so a better option is to pass the dataset root explicitly.
    • Given we can change the internal call and method signature, but that might affect other callers we can’t see, we will avoid that and instead infer a base path from dataset.getPath() at the call site and perform validation before calling addFile.
    • But we are told to only modify shown snippets, and we see only one call site for addFile: in addFilesToDataset. So we can safely change that call to pass dataset.getPath() into an overloaded addFile that takes a base directory.
  3. Implement an overloaded private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetBasePath) that:
    • Builds Path base = Paths.get(datasetBasePath).normalize().toAbsolutePath();
    • Normalizes source to absolute and ensures source.startsWith(base).
    • Performs the existing logic.
  4. Change the original 3-arg addFile to delegate to the 4-arg version using datasetBasePath derived from targetPath is not reliable, so instead, we only use the new 4-arg overload from addFilesToDataset and keep the old 3-arg version as-is for backwards compatibility. However, leaving an unsafe variant is not ideal. In the shown code, addFile is only used once, so we can safely change the existing 3-arg method signature to 4 args and update that one call. That avoids leaving a dangerous overload.

Since we must not assume other callers exist outside the snippet, but we also cannot see them, the safest change in this constrained environment is to enforce validation inside the existing addFile using a conservative rule: reject any sourPath that is absolute, contains .. path elements, or contains path separators that would allow escaping from whatever “base” is intended. But this might be too restrictive for legitimate absolute paths inside a staging area.

Given the constraints and to implement a robust, self-contained fix, we will:

  • Enforce that sourPath is not absolute and does not contain .. segments.
  • Optionally, if there is a known base directory configured via @Value elsewhere in this class, we could use it, but it’s not visible in the snippet, so we avoid adding new configuration.
  • This aligns with the recommendation for “single path component” / “relative path” validation and prevents obvious traversal and arbitrary filesystem access.

Implementation details:

  • In addFile, before constructing Path source, add checks:
    • if (Paths.get(sourPath).isAbsolute()) throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
    • Normalize and check for ..: Path normalized = Paths.get(sourPath).normalize(); for (Path part : normalized) if (part.toString().equals("..")) throw ...;
  • Then continue with the existing logic using normalized as source instead of Paths.get(sourPath).

This keeps behavior for legitimate relative paths but blocks absolute and traversal-based paths.

Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -867,7 +867,24 @@
         if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
             return;
         }
-        Path source = Paths.get(sourPath).normalize();
+
+        // 校验源路径,防止目录遍历或访问任意系统路径
+        Path rawSourcePath = Paths.get(sourPath);
+        // 禁止绝对路径
+        if (rawSourcePath.isAbsolute()) {
+            log.warn("Rejecting absolute source path: {}", sourPath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
+        Path source = rawSourcePath.normalize();
+        // 确保标准化后的路径中不包含父目录引用
+        for (Path part : source) {
+            if ("..".equals(part.toString())) {
+                log.warn("Rejecting source path with parent directory reference: {}", sourPath);
+                throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+            }
+        }
+
         Path target = Paths.get(targetPath).normalize();
 
         // 检查源文件是否存在且为普通文件
EOF
@@ -867,7 +867,24 @@
if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
return;
}
Path source = Paths.get(sourPath).normalize();

// 校验源路径,防止目录遍历或访问任意系统路径
Path rawSourcePath = Paths.get(sourPath);
// 禁止绝对路径
if (rawSourcePath.isAbsolute()) {
log.warn("Rejecting absolute source path: {}", sourPath);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}

Path source = rawSourcePath.normalize();
// 确保标准化后的路径中不包含父目录引用
for (Path part : source) {
if ("..".equals(part.toString())) {
log.warn("Rejecting source path with parent directory reference: {}", sourPath);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
}

Path target = Paths.get(targetPath).normalize();

// 检查源文件是否存在且为普通文件
Copilot is powered by AI and may make mistakes. Always verify output.
Files.createSymbolicLink(target, source);
} else {
// 覆盖已存在的目标文件,保持与其他地方行为一致
Files.copy(source, target);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

General fix approach:
Ensure that any user‑controlled path component is constrained to live under a trusted base directory, and reject/abort operations if the normalized path escapes that directory or otherwise contains illegal traversal sequences. Here, we need to (1) validate the destination path (targetPath) such that it cannot leave dataset.getPath(), and (2) validate the source path (file.getFilePath()) according to the intended usage (at minimum, preventing obvious traversal and dangerous absolute paths if this is supposed to point into a known area).

Best concrete fix with minimal behavior change:

  1. Compute the dataset root as an absolute, normalized Path for each add operation:

    Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
  2. When building each DatasetFile in getDatasetFileForAdd, instead of storing a potentially unsafe concatenation using Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString(), we compute the file’s canonical dataset-relative path and ensure it remains under datasetRoot:

    • Build the path as datasetRoot.resolve(req.getPrefix()).resolve(fileName).normalize().
    • Verify targetPath.startsWith(datasetRoot). If not, throw a business exception.

    We still persist the path as a string (relative or absolute) exactly as before, but only after it passes containment checks.

  3. In addFilesToDataset, before calling addFile, we also enforce that the actual datasetFile.getFilePath() we’re passing down is within datasetRoot (in case any future changes bypass getDatasetFileForAdd):

    • Convert datasetFile.getFilePath() to an absolute Path, normalize, and verify it starts with datasetRoot. If not, throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR).
  4. Optionally (but safely) add a basic validation of the source path string file.getFilePath() in addFilesToDataset before use:

    • Disallow obvious traversal and absolute paths by checking for ".." components and refusing paths that are absolute:
      Path sourcePath = Paths.get(file.getFilePath());
      if (sourcePath.isAbsolute() || sourcePath.normalize().toString().contains("..")) {
          throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
      }
    • This is conservative and may reject some edge cases, but it prevents attackers from using arbitrary absolute paths or simple ../ to reach sensitive files. If your intended design is to allow arbitrary absolute source paths (e.g., admins copy from anywhere), you can relax this check or gate it behind a role, but from the static analysis perspective we need some constraint.
  5. The addFile helper currently simply normalizes the provided sourPath/targetPath. We keep its signature and role but rely on the new checks done in addFilesToDataset and getDatasetFileForAdd to guarantee that targetPath is already safe. That resolves the CodeQL complaint about the uncontrolled target path at Files.copy.

Concrete code changes and locations:

  • In DatasetFileApplicationService.addFilesToDataset (around lines 838–852):

    • Compute Path datasetRoot once for the method.
    • For each file:
      • Optionally validate file.getFilePath() as a safe, non-absolute path.
      • After creating datasetFile, compute Path targetPath = Paths.get(datasetFile.getFilePath()).toAbsolutePath().normalize(); and ensure targetPath.startsWith(datasetRoot) before calling addFile.
  • In DatasetFileApplicationService.getDatasetFileForAdd (around lines 907–919):

    • Compute datasetRoot and targetPath using Path APIs.
    • Validate that targetPath starts with datasetRoot.
    • Use targetPath.toString() for the filePath builder property instead of the previous Paths.get(...) call.

No new external libraries are needed; everything is from java.nio.file which is already imported.


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -837,6 +837,8 @@
         }
         Dataset dataset = datasetRepository.getById(datasetId);
         BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
+        // 以数据集根目录作为受信任的基准路径
+        Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
         List<DatasetFile> addedFiles = new ArrayList<>();
         List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
         dataset.setFiles(existDatasetFiles);
@@ -844,11 +846,30 @@
             ObjectMapper objectMapper = new ObjectMapper();
 
             for (AddFilesRequest.FileRequest file : req.getFiles()) {
+                // 基本校验:限制源路径,避免明显的目录遍历或绝对路径
+                if (StringUtils.isBlank(file.getFilePath())) {
+                    throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+                }
+                Path rawSourcePath = Paths.get(file.getFilePath());
+                Path normalizedSource = rawSourcePath.normalize();
+                if (rawSourcePath.isAbsolute() || normalizedSource.toString().contains("..")) {
+                    log.warn("Invalid source file path when adding to dataset {}: {}", dataset.getName(), file.getFilePath());
+                    throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+                }
+
                 DatasetFile datasetFile = getDatasetFileForAdd(req, file, dataset, objectMapper);
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+
+                // 再次确保目标路径在数据集根目录下
+                Path targetPath = Paths.get(datasetFile.getFilePath()).toAbsolutePath().normalize();
+                if (!targetPath.startsWith(datasetRoot)) {
+                    log.warn("Target path {} escapes dataset root {} for dataset {}", targetPath, datasetRoot, dataset.getName());
+                    throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+                }
+
+                addFile(normalizedSource.toString(), targetPath.toString(), req.isSoftAdd());
             }
         } catch (BusinessException e) {
             throw e;
@@ -910,13 +927,25 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // 构造并校验数据集内的目标路径,确保不逃逸出数据集根目录
+        Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
+        Path targetPath = datasetRoot;
+        if (StringUtils.isNotBlank(req.getPrefix())) {
+            targetPath = targetPath.resolve(req.getPrefix());
+        }
+        targetPath = targetPath.resolve(fileName).normalize();
+        if (!targetPath.startsWith(datasetRoot)) {
+            log.warn("Computed target path {} escapes dataset root {} when adding file {}", targetPath, datasetRoot, fileName);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(targetPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
@@ -837,6 +837,8 @@
}
Dataset dataset = datasetRepository.getById(datasetId);
BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND);
// 以数据集根目录作为受信任的基准路径
Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
List<DatasetFile> addedFiles = new ArrayList<>();
List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId);
dataset.setFiles(existDatasetFiles);
@@ -844,11 +846,30 @@
ObjectMapper objectMapper = new ObjectMapper();

for (AddFilesRequest.FileRequest file : req.getFiles()) {
// 基本校验:限制源路径,避免明显的目录遍历或绝对路径
if (StringUtils.isBlank(file.getFilePath())) {
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}
Path rawSourcePath = Paths.get(file.getFilePath());
Path normalizedSource = rawSourcePath.normalize();
if (rawSourcePath.isAbsolute() || normalizedSource.toString().contains("..")) {
log.warn("Invalid source file path when adding to dataset {}: {}", dataset.getName(), file.getFilePath());
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}

DatasetFile datasetFile = getDatasetFileForAdd(req, file, dataset, objectMapper);
setDatasetFileId(datasetFile, dataset);
dataset.addFile(datasetFile);
addedFiles.add(datasetFile);
addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());

// 再次确保目标路径在数据集根目录下
Path targetPath = Paths.get(datasetFile.getFilePath()).toAbsolutePath().normalize();
if (!targetPath.startsWith(datasetRoot)) {
log.warn("Target path {} escapes dataset root {} for dataset {}", targetPath, datasetRoot, dataset.getName());
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}

addFile(normalizedSource.toString(), targetPath.toString(), req.isSoftAdd());
}
} catch (BusinessException e) {
throw e;
@@ -910,13 +927,25 @@
LocalDateTime currentTime = LocalDateTime.now();
String fileName = sourcePath.getFileName().toString();

// 构造并校验数据集内的目标路径,确保不逃逸出数据集根目录
Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
Path targetPath = datasetRoot;
if (StringUtils.isNotBlank(req.getPrefix())) {
targetPath = targetPath.resolve(req.getPrefix());
}
targetPath = targetPath.resolve(fileName).normalize();
if (!targetPath.startsWith(datasetRoot)) {
log.warn("Computed target path {} escapes dataset root {} when adding file {}", targetPath, datasetRoot, fileName);
throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
}

return DatasetFile.builder()
.id(UUID.randomUUID().toString())
.datasetId(dataset.getId())
.fileName(fileName)
.fileType(AnalyzerUtils.getExtension(fileName))
.fileSize(sourceFile.length())
.filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
.filePath(targetPath.toString())
.uploadTime(currentTime)
.lastAccessTime(currentTime)
.metadata(objectMapper.writeValueAsString(file.getMetadata()))
Copilot is powered by AI and may make mistakes. Always verify output.
@hefanli hefanli merged commit 3d7e733 into main Feb 10, 2026
7 of 8 checks passed
@hefanli hefanli deleted the dataset_file_add branch February 24, 2026 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant