@Rutam21
Contributor

@Rutam21 Rutam21 commented Oct 25, 2024

Description

This PR adds a new parameter, always_enqueue, to the Request.from_url constructor. It lets users bypass request deduplication so that every such request is enqueued and processed, even when a request for the same URL was added before.

Key Changes:

  • Added the always_enqueue parameter to the Request.from_url method.
  • When always_enqueue is set to True, a random unique key is generated for each request, so it is never considered a duplicate.
  • This gives users a direct way to enqueue the same URL repeatedly without working around deduplication.

With this addition, users can opt out of deduplication for individual requests. Requests that do not set the flag are still deduplicated.
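
To illustrate the idea behind the random unique key, here is a minimal standalone sketch. It is not the code from this PR: `compute_unique_key` and `ToyQueue` are hypothetical names invented for the example, and the real library's key computation may differ. The sketch only assumes that the queue deduplicates on a per-request key.

```python
import uuid


def compute_unique_key(url: str, always_enqueue: bool = False) -> str:
    """Hypothetical helper: return the key a queue would deduplicate on."""
    if always_enqueue:
        # A random suffix makes every key distinct, so no request collides.
        return f"{url}#{uuid.uuid4().hex}"
    # Without the flag the key depends on the URL alone, so repeats collide.
    return url.strip()


class ToyQueue:
    """Hypothetical stand-in for a request queue that skips duplicate keys."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.pending: list[str] = []

    def add(self, url: str, always_enqueue: bool = False) -> bool:
        """Enqueue the URL and return True, or return False for a duplicate."""
        key = compute_unique_key(url, always_enqueue)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.pending.append(url)
        return True
```

In this toy model, adding the same URL twice without the flag enqueues it once, while every call with `always_enqueue=True` enqueues it again.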

Issues

Fixes #547

Testing

The change can be verified by creating several requests for the same URL with always_enqueue set to True. Each request should be enqueued and processed instead of being dropped as a duplicate.

Checklist

  • CI passed

@Rutam21
Contributor Author

Rutam21 commented Oct 28, 2024

@vdusek I have reviewed this issue and your suggestions and implemented them in this PR. Please review and suggest changes, if any. Thanks.

cc: @janbuchar

@souravjain540 souravjain540 requested a review from vdusek October 29, 2024 03:51
Collaborator

@vdusek vdusek left a comment

This looks quite good, but please make sure the CI passes (the lint check, type check, and unit tests are all failing). See CONTRIBUTING.md for more information.

@Rutam21
Contributor Author

Rutam21 commented Oct 30, 2024

@vdusek Please run the CI checks again. I have pushed a commit to fix them all. Thanks.

cc: @souravjain540

@janbuchar janbuchar self-requested a review October 31, 2024 09:47
@janbuchar janbuchar self-requested a review October 31, 2024 10:45
Collaborator

@janbuchar janbuchar left a comment

LGTM, thank you for your contribution!

@janbuchar janbuchar merged commit 4e59fa4 into apify:master Oct 31, 2024
@Rutam21 Rutam21 deleted the IS-547 branch October 31, 2024 11:21
Pijukatel added a commit to apify/apify-sdk-python that referenced this pull request Nov 28, 2025
#677)

### Description

- Make sure that storage from `ApifyFileSystemStorageClient` does not
get purged twice due to storage from `FileSystemStorageClient` pointing
to the same location. (Those storage clients will have the same cache
key and thus there can be only one.)
- Ensure that `Actor` will open input containing KVS on initialization
to ensure that an aware storage client is used.
- Support any possible pre-existing input key and file that is defined
through `Configuration.input_key`. Different input files will have
different handling based on their suffix:
  - ".json" is parsed as JSON.
  - ".txt" is opened as plain text.
  - Everything else is opened as bytes.
  - A file without an extension is first parsed as JSON, falling back
to bytes if that fails.
- Create a metadata file for the valid pre-existing input file, without
modifying the input file (otherwise, cli might detect the change to the
input, which would be a false positive)
- Raise an error if two valid pre-existing input files exist in the
expected storage directory.
- CLI does not respect env variables with the input key so far. TODO:
apify/apify-cli#960
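
A rough sketch of the suffix-based handling listed above, for illustration only. `load_input_file` is a hypothetical helper, not the SDK's actual function, and UTF-8 is an assumed encoding for the text cases.

```python
import json
from pathlib import Path
from typing import Any


def load_input_file(path: Path) -> Any:
    """Hypothetical loader that picks a parser based on the file suffix."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    raw = path.read_bytes()
    if suffix == "":
        # No extension: try JSON first, then fall back to the raw bytes.
        try:
            return json.loads(raw)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return raw
    # Any other suffix is returned as bytes.
    return raw
```

Reading never modifies the input file, which matches the requirement that the pre-existing input stays untouched.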

### Issues

Closes: apify/crawlee-python#621 
Related to: [#INPUT.json Automatically Deleted on Each Run (Python SDK
Local Storage
Issue)](#686)

### Testing

- Added unit tests.
- Manually tested with
[apify-cli@1.1.2-beta.20](https://www.npmjs.com/package/apify-cli/v/1.1.2-beta.20)
- npx apify-cli@1.1.2-beta.20 run -i {\"a\":\"c\"} with pre-existing
input file or without input and multiple times in a row
- npx apify-cli@1.1.2-beta.20 run with pre-existing input file or
without input and multiple times in a row

Development

Successfully merging this pull request may close these issues.

Add always_enqueue option to Request for bypassing deduplication

3 participants