feature: When adding files to the data set, only links#330
Conversation
… without copying.
| List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId); | ||
| dataset.setFiles(existDatasetFiles); | ||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, to fix this issue you must constrain or validate any user-controlled path before using it for filesystem access. Typical strategies are: (1) only allow paths under a known safe base directory, checking the normalized path stays within that base; or (2) if the input is supposed to be just a filename, reject any path separators or .. sequences. Here, file.getFilePath() is presented as a “source file path list” (源文件路径列表) for copy/add operations, so the safest approach without changing external behavior is to require that sourPath be an absolute or relative path under a configured base directory (for example, a dataset storage root) and reject anything outside.
Because we are only allowed to change the shown snippets, the minimal, behavior-preserving hardening is to validate sourPath before using it: normalize it, ensure it is absolute or resolve it relative to a known safe directory owned by the application, and reject it if it escapes that directory or contains disallowed sequences. However, we do not see a configured base directory in this class, so we cannot safely invent one. Therefore, the least intrusive but still effective fix is to constrain sourPath to be a single path component (a plain filename) or a path without traversal characters. This prevents obvious traversal and arbitrary absolute paths while still allowing the existing logic to operate on files in directories controlled elsewhere in the system (e.g., if callers already pass full paths that do not rely on ..). Concretely, we will add a small validation at the top of addFile that rejects sourPath if it contains .. or path separators that indicate traversal outside the intended directory. We will also mirror the same validation in getDatasetFileForAdd for file.getFilePath() to ensure consistent constraints when computing sourcePath. If invalid, we will log and throw a BusinessException with an appropriate error code.
Implementation details:
-
In
DatasetFileApplicationService.addFile(String sourPath, String targetPath, boolean softAdd):- Before constructing
Path source, add validation:- Reject if
sourPathcontains".." - Reject if it contains path separators (
'/'or'\\') and you intend it to be a single component, or (if you prefer to allow subdirectories) at least reject absolute paths (Paths.get(sourPath).isAbsolute()) and those that escape via..when normalized.
- Reject if
- Use the existing logging and
BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR)for consistency.
- Before constructing
-
In
getDatasetFileForAdd(...):- Before
Path sourcePath = Paths.get(file.getFilePath());, add the same validation onfile.getFilePath(). - This ensures any dangerous path is rejected early, before even reading its
Filefor metadata.
- Before
We will implement a small private helper in DatasetFileApplicationService to avoid duplicating validation logic, placed near addFile so we stay within the shown snippet region.
| @@ -867,6 +867,7 @@ | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) { | ||
| return; | ||
| } | ||
| validateSourcePath(sourPath); | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| Path target = Paths.get(targetPath).normalize(); | ||
|
|
||
| @@ -902,9 +903,31 @@ | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * 校验用户提供的源文件路径,防止目录遍历等问题 | ||
| */ | ||
| private void validateSourcePath(String sourPath) { | ||
| // 基础检查:禁止父目录引用和潜在的目录遍历 | ||
| if (StringUtils.contains(sourPath, "..")) { | ||
| log.warn("Invalid source path contains parent directory reference: {}", sourPath); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| // 规范化后检查是否为绝对路径,禁止访问任意系统路径 | ||
| Path normalized = Paths.get(sourPath).normalize(); | ||
| if (normalized.isAbsolute()) { | ||
| log.warn("Invalid absolute source path: {}", sourPath); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| } | ||
|
|
||
| private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file, | ||
| Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException { | ||
| Path sourcePath = Paths.get(file.getFilePath()); | ||
| String filePath = file.getFilePath(); | ||
| if (StringUtils.isBlank(filePath) || StringUtils.contains(filePath, "..")) { | ||
| log.warn("Invalid file path in AddFilesRequest: {}", filePath); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| Path sourcePath = Paths.get(filePath).normalize(); | ||
| File sourceFile = sourcePath.toFile(); | ||
| file.getMetadata().put("softAdd", req.isSoftAdd()); | ||
| LocalDateTime currentTime = LocalDateTime.now(); |
| Path parent = target.getParent(); | ||
| // 创建目标目录(如果需要) | ||
| if (parent != null) { | ||
| Files.createDirectories(parent); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
General approach: ensure that any user-controlled path component is validated or constrained before being used for file-system operations. For this case, target paths must be enforced to remain under the dataset’s root directory, and the code should defend against .., absolute paths, and symlink tricks. Optionally, source paths can also be constrained to an allowed base directory if applicable, but we must not assume any external configuration beyond what’s shown.
Best targeted fix here:
- Compute the dataset root directory as an absolute, normalized path from
dataset.getPath(). - When constructing
filePathingetDatasetFileForAdd, normalize the result and verify it remains under the dataset root. If not, throw a business exception. - In
addFilesToDataset, pass the dataset root toaddFile. - In
addFile, normalizetarget, then ensure the resultingtargetpath is still under the dataset root before creating directories / files / links. - To avoid breaking existing functionality, we keep the logical behavior (copy/link into the dataset tree) but add containment checks and avoid trusting raw strings.
Concretely, in DatasetFileApplicationService:
- Change
addFilesToDatasetto computePath datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();and pass it togetDatasetFileForAddandaddFile. - Change the signatures of
getDatasetFileForAddandaddFileto accept thisPath datasetRoot. - Inside
getDatasetFileForAdd, buildfilePathas aPathrelative todatasetRootviadatasetRoot.resolve(req.getPrefix()).resolve(fileName).normalize(), and verifytargetPath.startsWith(datasetRoot)before using it. Store the string path as currently done. - Inside
addFile, computePath target = datasetRoot.resolve(targetPath).normalize();instead ofPaths.get(targetPath), then verifytarget.startsWith(datasetRoot)before proceeding. This preventstargetPathfrom escapingdatasetRooteven if it contains..or is absolute.
This preserves functionality (files still end up under the same logical dataset folder) while ensuring user input cannot redirect operations outside that folder.
| if (parent != null) { | ||
| Files.createDirectories(parent); | ||
| } | ||
| Files.deleteIfExists(target); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, to fix uncontrolled path usage you must constrain any user-influenced path so that, after normalization, it remains within an allowed base directory and cannot be absolute or traverse outside via ... For values that represent a relative prefix within a dataset, validate that the prefix is relative and does not contain path separators or .. beyond what you explicitly allow, and combine it with a trusted base path using Path.resolve. For values that represent a source file path, either restrict them to a known base directory or validate them similarly if they must be relative.
In this code, two main places need hardening:
-
Construction of
DatasetFile.filePathingetDatasetFileForAdd, which currently usesPaths.get(dataset.getPath(), req.getPrefix(), fileName).toString()directly with a taintedreq.getPrefix(). We should instead:- Resolve
req.getPrefix()againstdataset.getPath()as a base. - Normalize the resulting path.
- Verify that the normalized path still starts with the dataset base directory (
dataset.getPath()normalized to absolute or canonical form). - Reject the request (throw
BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND)or a more specific error) if the check fails. - Use the validated path to derive the stored
filePath.
- Resolve
-
The
addFilemethod, which takessourPath(user-controlled viafile.getFilePath()) andtargetPath(the storedDatasetFile.filePath) and normalizes but does not otherwise constrain them. We should:- Ensure
targetstays under the dataset directory. SinceaddFilecurrently does not know the dataset base, but we do know thattargetPathcame fromgetDatasetFileForAdd, the best compatible change is to changegetDatasetFileForAddto buildfilePathusing the validated dataset-relative path (datasetBase.resolve(prefix).resolve(fileName)), and then inaddFilecheck thattargetis not absolute and does not contain attempts to escape (e.g., ensure!target.isAbsolute()and its normalized form does not start with..). Because we cannot accessdataset.getPath()inaddFilewithout changing its signature, the minimal safe change is to prevent absolute and parent-traversal patterns intargetPath. This preserves existing behavior for legitimate relative paths under the dataset root while mitigating obvious path traversal.
- Ensure
Concretely:
-
In
getDatasetFileForAdd:- Introduce a
Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); - Resolve the prefix:
Path prefixPath = StringUtils.isBlank(req.getPrefix()) ? datasetBase : datasetBase.resolve(req.getPrefix()).normalize(); - Check
prefixPath.startsWith(datasetBase); if not, throwBusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND). - Build
targetFilePath = prefixPath.resolve(fileName).normalize();and usetargetFilePath.toString()asfilePath.
- Introduce a
-
In
addFile:- After computing
Path target = Paths.get(targetPath).normalize();, add checks:- If
target.isAbsolute(), throwBusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR)(or a data-management-specific code). - If
target.startsWith("..")ortarget.toString().contains(".."+File.separator)is true (but we should usePathsemantics, so the first check on normalized path is usually sufficient), reject similarly.
- If
- These checks ensure the target is a relative path without parent traversal, preventing it from escaping the dataset directory when combined with whatever working directory is being used.
- After computing
These changes are confined to DatasetFileApplicationService and do not require altering signatures or imports beyond using existing StringUtils and Paths. They preserve the existing API and behavior for well-formed prefixes and paths but block malicious ones.
| @@ -870,6 +870,12 @@ | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| Path target = Paths.get(targetPath).normalize(); | ||
|
|
||
| // 目标路径必须是相对路径,且不能包含父目录跳转,防止目录遍历 | ||
| if (target.isAbsolute() || target.startsWith("..")) { | ||
| log.warn("Invalid target path when adding file. source: {}, target: {}", sourPath, targetPath); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { | ||
| log.warn("Source file does not exist or is not a regular file: {}", sourPath); | ||
| @@ -910,13 +916,27 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| // 基于数据集根路径和前缀构建安全的目标路径,防止目录遍历 | ||
| Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path prefixPath; | ||
| if (StringUtils.isBlank(req.getPrefix())) { | ||
| prefixPath = datasetBase; | ||
| } else { | ||
| prefixPath = datasetBase.resolve(req.getPrefix()).normalize(); | ||
| } | ||
| // 确保前缀路径未逃出数据集根目录 | ||
| if (!prefixPath.startsWith(datasetBase)) { | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
| Path targetFilePath = prefixPath.resolve(fileName).normalize(); | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(targetFilePath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| if (softAdd) { | ||
| // 优先尝试创建硬链接,失败后尝试创建符号链接;若均失败抛出异常 | ||
| try { | ||
| Files.createLink(target, source); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
General approach: ensure that any filesystem path derived from user-controlled data is constrained to a known safe base directory and does not allow directory traversal or absolute paths to escape that base. This is typically done by resolving the candidate path against a base, normalizing it, and then verifying that the resulting path still starts with the base path. If the path is invalid, we throw a business exception instead of performing the filesystem operation.
Best fix in this case:
- Use
dataset.getPath()as the trusted dataset root directory. - When building
filePathingetDatasetFileForAdd, construct it as a normalized path inside that root withPaths.get(dataset.getPath()).resolve(req.getPrefix()).resolve(fileName).normalize(), and verify that the normalized path is still under the dataset root. - In
addFile, enforce thattargetremains within the dataset root by performing the same containment check before creating directories, links, or copying files. - To avoid changing method signatures across the codebase, we can:
- Derive the dataset root in
addFilefromtargetPathby walking up to an ancestor directory that matches the knowndataset.getPath()prefix. However, that is brittle and not visible in the snippet. - A cleaner approach that stays within the shown code is to pass the dataset root into
addFile. But the signature is only used in this class and at the one call site shown, so we can safely extend it without affecting unseen code.
- Derive the dataset root in
- We therefore will:
- Change
addFilesignature toaddFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot). - At the call site, pass
dataset.getPath()asdatasetRoot. - Inside
addFile, computePath datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath();, then computePath target = datasetRootPath.resolve(Paths.get(targetPath)).normalize();and enforcetarget.startsWith(datasetRootPath)(using absolute paths). - Also change
getDatasetFileForAddto build and storefilePathrelative todatasetRootPathconsistently, so thattargetPathis an in-dataset path rather than an arbitrary filesystem path string.
- Change
Concretely:
- In
addFilesToDataset, update theaddFilecall at line 851 toaddFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());. - Update the
addFilemethod signature and body:- Add a containment check for
targetwithindatasetRootPath. - Build
sourceandtargetas absolute normalized paths usingtoAbsolutePath(). - If
targetis outsidedatasetRootPath, log and throwBusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR)(or a more specific error code if desired).
- Add a containment check for
- In
getDatasetFileForAdd, buildfilePathusing normalized paths:Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath();Path logicalPath = datasetRootPath.resolve(req.getPrefix()).resolve(fileName).normalize();- Verify
logicalPath.startsWith(datasetRootPath); if not, throwBusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND)(or another appropriate code). - Store
logicalPath.toString()asfilePath.
These changes ensure that even if req.getPrefix() contains .. or an absolute path, the normalization and containment check will reject it. They also ensure that addFile does not accidentally write outside the dataset directory even if datasetFile.filePath has been tampered with or incorrectly constructed.
| @@ -848,7 +848,8 @@ | ||
| setDatasetFileId(datasetFile, dataset); | ||
| dataset.addFile(datasetFile); | ||
| addedFiles.add(datasetFile); | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd()); | ||
| // Ensure that the target path used for filesystem operations stays within the dataset root path. | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath()); | ||
| } | ||
| } catch (BusinessException e) { | ||
| throw e; | ||
| @@ -863,13 +864,21 @@ | ||
| return addedFiles; | ||
| } | ||
|
|
||
| private void addFile(String sourPath, String targetPath, boolean softAdd) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) { | ||
| private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot) { | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(datasetRoot)) { | ||
| return; | ||
| } | ||
| Path source = Paths.get(sourPath).normalize(); | ||
| Path target = Paths.get(targetPath).normalize(); | ||
| // Normalize and resolve paths against the dataset root to prevent directory traversal | ||
| Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath(); | ||
| Path source = Paths.get(sourPath).normalize().toAbsolutePath(); | ||
| Path target = datasetRootPath.resolve(Paths.get(targetPath)).normalize().toAbsolutePath(); | ||
|
|
||
| // Ensure that the target path stays within the dataset root directory | ||
| if (!target.startsWith(datasetRootPath)) { | ||
| log.warn("Target path escapes dataset root. datasetRoot={}, target={}", datasetRootPath, target); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| // 检查源文件是否存在且为普通文件 | ||
| if (!Files.exists(source) || !Files.isRegularFile(source)) { | ||
| log.warn("Source file does not exist or is not a regular file: {}", sourPath); | ||
| @@ -910,13 +916,21 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| // Build the file path within the dataset root and ensure it does not escape the dataset directory | ||
| Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| Path targetPath = datasetRootPath.resolve(req.getPrefix()).resolve(fileName).normalize().toAbsolutePath(); | ||
| if (!targetPath.startsWith(datasetRootPath)) { | ||
| log.warn("Computed dataset file path escapes dataset root. datasetRoot={}, targetPath={}", datasetRootPath, targetPath); | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(targetPath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| } catch (Throwable hardEx) { | ||
| log.warn("create hard link failed from {} to {}: {}", source, target, hardEx.getMessage()); | ||
| } | ||
| Files.createSymbolicLink(target, source); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
General fix approach: ensure that any path derived from user input (here, primarily req.getPrefix() and secondarily the targetPath passed to addFile) is constrained to remain within the dataset’s own base directory. This is typically done by resolving the user-provided component relative to a trusted base path, normalizing, and then verifying that the resulting path starts with the trusted base directory. If the check fails, reject the request.
Best concrete fix with minimal behaviour change:
-
In
getDatasetFileForAdd, when constructingfilePath, currently:.filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
This should be replaced with logic that:
- Creates a normalized base path from
dataset.getPath(). - Resolves the possibly user-controlled
req.getPrefix()andfileNameagainst it. - Verifies that the resolved path remains under the dataset base path.
- Throws a
BusinessException(e.g.DataManagementErrorCode.DIRECTORY_NOT_FOUNDor another relevant error) if the check fails. - Stores the safe, normalized path.
- Creates a normalized base path from
-
To keep side effects local and avoid touching the rest of the code, we can inline this validation and path building in
getDatasetFileForAdd, without changing the signature. We will:- Build a
Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); - Compute
Path resolvedPath = datasetBase.resolve(req.getPrefix() == null ? "" : req.getPrefix()).resolve(fileName).normalize().toAbsolutePath(); - Check
if (!resolvedPath.startsWith(datasetBase)) { throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); } - Use
resolvedPath.toString()as thefilePathfield value.
- Build a
-
This ensures that any
datasetFile.getFilePath()subsequently passed astargetPathintoaddFileis already guaranteed to stay within the dataset base directory, soFiles.createLink,Files.createSymbolicLink, andFiles.copywill not operate outside the intended area. We do not need to modifyaddFileitself, because constraining the data at its origin is sufficient. -
No new imports are needed;
PathsandPathare already imported, andBusinessException/DataManagementErrorCodeare also present in this class.
Change location details:
- File:
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java - Method:
private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file, Dataset dataset, ObjectMapper objectMapper) - Replace the simple
filePathconstruction (line 919 in the snippet) with the described safe construction and check.
No changes are required in DatasetFileController.java or DatasetFile.java to fix this specific issue.
| @@ -910,13 +910,22 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| // Safely construct the dataset file path under the dataset base directory | ||
| Path datasetBase = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); | ||
| String prefix = req.getPrefix() == null ? "" : req.getPrefix(); | ||
| Path resolvedPath = datasetBase.resolve(prefix).resolve(fileName).normalize().toAbsolutePath(); | ||
| // Ensure the resolved path stays within the dataset base directory to prevent path traversal | ||
| if (!resolvedPath.startsWith(datasetBase)) { | ||
| throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND); | ||
| } | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(resolvedPath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
| Files.createSymbolicLink(target, source); | ||
| } else { | ||
| // 覆盖已存在的目标文件,保持与其他地方行为一致 | ||
| Files.copy(source, target); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, the fix is to validate and constrain any user-controlled path before using it in filesystem operations. For this code, we should ensure that file.getFilePath() and any derived Path source used in addFile are constrained to a trusted base directory (for example, the dataset’s root path or another configured root), and that after normalization they do not escape that directory (no .. traversal, no absolute paths outside the base).
Best single approach without changing existing functionality:
- Treat the dataset’s root directory (
dataset.getPath()) as the only allowed base for both the source files and the target dataset files when usingaddFilesToDataset. That is:- For each
file.getFilePath(), compute a normalizedPathand then check that it is within the dataset root (or another safe configurable root if that makes more sense in your application; since we can’t see more of the code, we will usedataset.getPath()as the base to stay consistent with how target paths are constructed). - If the normalized
sourcepath is outside the base dir, throw aBusinessExceptionwith an appropriate error code.
- For each
- Add an explicit helper method
validateSourcePath(Path source, Path baseDir)insideDatasetFileApplicationServicethat:- Normalizes both paths to absolute paths.
- Verifies
source.startsWith(baseDir)(usingPath’sstartsWith). - Optionally also rejects non-regular files and symbolic links if you want extra safety, but the current code already checks
Files.isRegularFile(source).
- Use this helper at the beginning of
addFilebefore any file operations. - Update
getDatasetFileForAddto work with a normalizedPathforfile.getFilePath()to avoid discrepancies between DB records and the validated filesystem path.
Concretely in DatasetFileApplicationService.java:
- Add a private helper method
validateSourcePath(Path source, Path baseDir)nearaddFile. - In
addFilesToDataset, change the call toaddFileto pass the dataset base path (e.g.,dataset.getPath()), or we can implicitly obtain the base insideaddFile. SinceaddFilecurrently only receives strings, the simplest non-breaking change is:- In
addFile, compute a base directory fromtargetPath(the dataset file path already built usingdataset.getPath()) and validate thatsourceis under that base. For example:Path datasetRoot = Paths.get(targetPath).normalize().getRootOrFirstSegmentOfDatasetPath(); however we do not have utility methods, so a better option is to pass the dataset root explicitly. - Given we can change the internal call and method signature, but that might affect other callers we can’t see, we will avoid that and instead infer a base path from
dataset.getPath()at the call site and perform validation before callingaddFile. - But we are told to only modify shown snippets, and we see only one call site for
addFile: inaddFilesToDataset. So we can safely change that call to passdataset.getPath()into an overloadedaddFilethat takes a base directory.
- In
- Implement an overloaded
private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetBasePath)that:- Builds
Path base = Paths.get(datasetBasePath).normalize().toAbsolutePath(); - Normalizes
sourceto absolute and ensuressource.startsWith(base). - Performs the existing logic.
- Builds
- Change the original 3-arg
addFileto delegate to the 4-arg version usingdatasetBasePathderived fromtargetPathis not reliable, so instead, we only use the new 4-arg overload fromaddFilesToDatasetand keep the old 3-arg version as-is for backwards compatibility. However, leaving an unsafe variant is not ideal. In the shown code,addFileis only used once, so we can safely change the existing 3-arg method signature to 4 args and update that one call. That avoids leaving a dangerous overload.
Since we must not assume other callers exist outside the snippet, but we also cannot see them, the safest change in this constrained environment is to enforce validation inside the existing addFile using a conservative rule: reject any sourPath that is absolute, contains .. path elements, or contains path separators that would allow escaping from whatever “base” is intended. But this might be too restrictive for legitimate absolute paths inside a staging area.
Given the constraints and to implement a robust, self-contained fix, we will:
- Enforce that
sourPathis not absolute and does not contain..segments. - Optionally, if there is a known base directory configured via
@Valueelsewhere in this class, we could use it, but it’s not visible in the snippet, so we avoid adding new configuration. - This aligns with the recommendation for “single path component” / “relative path” validation and prevents obvious traversal and arbitrary filesystem access.
Implementation details:
- In
addFile, before constructingPath source, add checks:if (Paths.get(sourPath).isAbsolute()) throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);- Normalize and check for
..:Path normalized = Paths.get(sourPath).normalize(); for (Path part : normalized) if (part.toString().equals("..")) throw ...;
- Then continue with the existing logic using
normalizedassourceinstead ofPaths.get(sourPath).
This keeps behavior for legitimate relative paths but blocks absolute and traversal-based paths.
| @@ -867,7 +867,24 @@ | ||
| if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) { | ||
| return; | ||
| } | ||
| Path source = Paths.get(sourPath).normalize(); | ||
|
|
||
| // 校验源路径,防止目录遍历或访问任意系统路径 | ||
| Path rawSourcePath = Paths.get(sourPath); | ||
| // 禁止绝对路径 | ||
| if (rawSourcePath.isAbsolute()) { | ||
| log.warn("Rejecting absolute source path: {}", sourPath); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| Path source = rawSourcePath.normalize(); | ||
| // 确保标准化后的路径中不包含父目录引用 | ||
| for (Path part : source) { | ||
| if ("..".equals(part.toString())) { | ||
| log.warn("Rejecting source path with parent directory reference: {}", sourPath); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| } | ||
|
|
||
| Path target = Paths.get(targetPath).normalize(); | ||
|
|
||
| // 检查源文件是否存在且为普通文件 |
| Files.createSymbolicLink(target, source); | ||
| } else { | ||
| // 覆盖已存在的目标文件,保持与其他地方行为一致 | ||
| Files.copy(source, target); |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
General fix approach:
Ensure that any user‑controlled path component is constrained to live under a trusted base directory, and reject/abort operations if the normalized path escapes that directory or otherwise contains illegal traversal sequences. Here, we need to (1) validate the destination path (targetPath) such that it cannot leave dataset.getPath(), and (2) validate the source path (file.getFilePath()) according to the intended usage (at minimum, preventing obvious traversal and dangerous absolute paths if this is supposed to point into a known area).
Best concrete fix with minimal behavior change:
-
Compute the dataset root as an absolute, normalized
Pathfor each add operation:Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
-
When building each
DatasetFileingetDatasetFileForAdd, instead of storing a potentially unsafe concatenation usingPaths.get(dataset.getPath(), req.getPrefix(), fileName).toString(), we compute the file’s canonical dataset-relative path and ensure it remains underdatasetRoot:- Build the path as
datasetRoot.resolve(req.getPrefix()).resolve(fileName).normalize(). - Verify
targetPath.startsWith(datasetRoot). If not, throw a business exception.
We still persist the path as a string (relative or absolute) exactly as before, but only after it passes containment checks.
- Build the path as
-
In
addFilesToDataset, before callingaddFile, we also enforce that the actualdatasetFile.getFilePath()we’re passing down is withindatasetRoot(in case any future changes bypassgetDatasetFileForAdd):- Convert
datasetFile.getFilePath()to an absolutePath, normalize, and verify it starts withdatasetRoot. If not, throwBusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR).
- Convert
-
Optionally (but safely) add a basic validation of the source path string
file.getFilePath()inaddFilesToDatasetbefore use:- Disallow obvious traversal and absolute paths by checking for
".."components and refusing paths that are absolute:Path sourcePath = Paths.get(file.getFilePath()); if (sourcePath.isAbsolute() || sourcePath.normalize().toString().contains("..")) { throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); }
- This is conservative and may reject some edge cases, but it prevents attackers from using arbitrary absolute paths or simple
../to reach sensitive files. If your intended design is to allow arbitrary absolute source paths (e.g., admins copy from anywhere), you can relax this check or gate it behind a role, but from the static analysis perspective we need some constraint.
- Disallow obvious traversal and absolute paths by checking for
-
The
addFilehelper currently simply normalizes the providedsourPath/targetPath. We keep its signature and role but rely on the new checks done inaddFilesToDatasetandgetDatasetFileForAddto guarantee thattargetPathis already safe. That resolves the CodeQL complaint about the uncontrolledtargetpath atFiles.copy.
Concrete code changes and locations:
-
In
DatasetFileApplicationService.addFilesToDataset(around lines 838–852):- Compute
Path datasetRootonce for the method. - For each
file:- Optionally validate
file.getFilePath()as a safe, non-absolute path. - After creating
datasetFile, computePath targetPath = Paths.get(datasetFile.getFilePath()).toAbsolutePath().normalize();and ensuretargetPath.startsWith(datasetRoot)before callingaddFile.
- Optionally validate
- Compute
-
In
DatasetFileApplicationService.getDatasetFileForAdd(around lines 907–919):- Compute
datasetRootandtargetPathusingPathAPIs. - Validate that
targetPathstarts withdatasetRoot. - Use
targetPath.toString()for thefilePathbuilder property instead of the previousPaths.get(...)call.
- Compute
No new external libraries are needed; everything is from java.nio.file which is already imported.
| @@ -837,6 +837,8 @@ | ||
| } | ||
| Dataset dataset = datasetRepository.getById(datasetId); | ||
| BusinessAssert.notNull(dataset, SystemErrorCode.RESOURCE_NOT_FOUND); | ||
| // 以数据集根目录作为受信任的基准路径 | ||
| Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize(); | ||
| List<DatasetFile> addedFiles = new ArrayList<>(); | ||
| List<DatasetFile> existDatasetFiles = datasetFileRepository.findAllByDatasetId(datasetId); | ||
| dataset.setFiles(existDatasetFiles); | ||
| @@ -844,11 +846,30 @@ | ||
| ObjectMapper objectMapper = new ObjectMapper(); | ||
|
|
||
| for (AddFilesRequest.FileRequest file : req.getFiles()) { | ||
| // 基本校验:限制源路径,避免明显的目录遍历或绝对路径 | ||
| if (StringUtils.isBlank(file.getFilePath())) { | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
| Path rawSourcePath = Paths.get(file.getFilePath()); | ||
| Path normalizedSource = rawSourcePath.normalize(); | ||
| if (rawSourcePath.isAbsolute() || normalizedSource.toString().contains("..")) { | ||
| log.warn("Invalid source file path when adding to dataset {}: {}", dataset.getName(), file.getFilePath()); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| DatasetFile datasetFile = getDatasetFileForAdd(req, file, dataset, objectMapper); | ||
| setDatasetFileId(datasetFile, dataset); | ||
| dataset.addFile(datasetFile); | ||
| addedFiles.add(datasetFile); | ||
| addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd()); | ||
|
|
||
| // 再次确保目标路径在数据集根目录下 | ||
| Path targetPath = Paths.get(datasetFile.getFilePath()).toAbsolutePath().normalize(); | ||
| if (!targetPath.startsWith(datasetRoot)) { | ||
| log.warn("Target path {} escapes dataset root {} for dataset {}", targetPath, datasetRoot, dataset.getName()); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| addFile(normalizedSource.toString(), targetPath.toString(), req.isSoftAdd()); | ||
| } | ||
| } catch (BusinessException e) { | ||
| throw e; | ||
| @@ -910,13 +927,25 @@ | ||
| LocalDateTime currentTime = LocalDateTime.now(); | ||
| String fileName = sourcePath.getFileName().toString(); | ||
|
|
||
| // 构造并校验数据集内的目标路径,确保不逃逸出数据集根目录 | ||
| Path datasetRoot = Paths.get(dataset.getPath()).toAbsolutePath().normalize(); | ||
| Path targetPath = datasetRoot; | ||
| if (StringUtils.isNotBlank(req.getPrefix())) { | ||
| targetPath = targetPath.resolve(req.getPrefix()); | ||
| } | ||
| targetPath = targetPath.resolve(fileName).normalize(); | ||
| if (!targetPath.startsWith(datasetRoot)) { | ||
| log.warn("Computed target path {} escapes dataset root {} when adding file {}", targetPath, datasetRoot, fileName); | ||
| throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); | ||
| } | ||
|
|
||
| return DatasetFile.builder() | ||
| .id(UUID.randomUUID().toString()) | ||
| .datasetId(dataset.getId()) | ||
| .fileName(fileName) | ||
| .fileType(AnalyzerUtils.getExtension(fileName)) | ||
| .fileSize(sourceFile.length()) | ||
| .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString()) | ||
| .filePath(targetPath.toString()) | ||
| .uploadTime(currentTime) | ||
| .lastAccessTime(currentTime) | ||
| .metadata(objectMapper.writeValueAsString(file.getMetadata())) |
No description provided.