don't stop on encoding-errors #393
Conversation
for filename in filenames:
    with open(filename, 'r', errors='ignore') as content:
        text_body = content.read()
    with io.open(filename, 'rb') as content:
I'd rather see io.open(filename, encoding='UTF-8') -- errors='ignore' can lead to false negatives
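For context, the two open() calls being weighed here behave quite differently on a file containing bytes that are not valid UTF-8; a rough sketch with a placeholder filename, not the hook's actual code:

```python
import io

filename = 'example.txt'  # placeholder path

# Strict decoding: raises UnicodeDecodeError on the first undecodable byte,
# which is the "stop on encoding errors" behaviour this PR is about.
with io.open(filename, encoding='UTF-8') as f:
    text_body = f.read()

# Lenient decoding: never raises, but silently drops whatever cannot be
# decoded, so the text that gets scanned may differ from the file contents.
with open(filename, 'r', errors='ignore') as f:
    text_body = f.read()
```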
Providing an encoding does not solve my issue (a file which is not UTF-8 encoded).
Using ignore means throwing away the characters which cannot be decoded as UTF-8. Since the AWS credentials will be ASCII strings (this is me claiming something without having any proof), it may only lead to false positives, which would not be that bad imo. I can try to provide further information on this and tests for various scenarios, but it will take a few weeks.
afaik, errors='ignore' and errors='replace' can remove bytes surrounding an error (consider a surrogate or continuation byte followed by garbage), so they could in fact remove part of the key, leading to no match
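For reference, this is what the two error handlers produce for a made-up byte sequence (the 0xff byte and the AKIA-style value below are purely illustrative):

```python
# A made-up config line with an invalid UTF-8 byte (\xff) next to an
# AKIA-style example value.
raw = b'aws_access_key_id = AKIA\xffIOSFODNN7EXAMPLE\n'

print(raw.decode('utf-8', errors='ignore'))
# aws_access_key_id = AKIAIOSFODNN7EXAMPLE
# -> the undecodable byte is silently dropped from the scanned text

print(raw.decode('utf-8', errors='replace'))
# aws_access_key_id = AKIA\ufffdIOSFODNN7EXAMPLE
# -> the undecodable byte is turned into U+FFFD
```

How much gets consumed around an error depends on the codec and the exact byte sequence, which is part of what makes this hard to reason about.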
Sounds plausible to me; I did not know about this. The question is: what is the expected behavior of detect-aws-credentials on non-UTF-8 files? I could live with skipping them, or with opening them as UTF-8 with ignore or replace - in none of these scenarios will credential detection work reliably for the file, as you pointed out. I cannot see how to build an encoding-agnostic solution. If you've got any idea I'd be happy to try to implement it.
another idea would be to scan the files as binary files and use the bytes representation of the credentials -- then you don't worry about the encoding
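A minimal sketch of that idea, assuming the credentials to search for are available as ASCII strings (the function name and arguments are made up for illustration):

```python
def find_secrets_in_file(filename, secrets):
    """Return the secrets that occur verbatim in the file's raw bytes."""
    with open(filename, 'rb') as f:
        contents = f.read()
    # Encode the known secrets instead of decoding the file: the file's
    # encoding never has to be guessed and undecodable bytes are a non-issue.
    return {secret for secret in secrets if secret.encode('utf-8') in contents}
```

This still assumes the file stores the key as its ASCII bytes, which holds for UTF-8 and the common single-byte encodings, but not for something like UTF-16.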
Here's one you can use as an example for that: https://github.com/pre-commit/pre-commit-hooks/blob/master/pre_commit_hooks/detect_private_key.py
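Roughly, that hook is a small argparse-based main() that reads each file as bytes and searches for known byte markers; a sketch of that shape (not a verbatim copy, and the marker values here are only illustrative):

```python
import argparse

# Byte markers to search for; these particular values are only illustrative.
BLACKLIST = (
    b'BEGIN RSA PRIVATE KEY',
    b'BEGIN OPENSSH PRIVATE KEY',
)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('filenames', nargs='*', help='Filenames to check')
    args = parser.parse_args(argv)

    bad_files = []
    for filename in args.filenames:
        # Read raw bytes so the file's text encoding is irrelevant.
        with open(filename, 'rb') as f:
            contents = f.read()
        if any(marker in contents for marker in BLACKLIST):
            bad_files.append(filename)

    for filename in bad_files:
        print('Private key found: {}'.format(filename))

    # A non-zero return value makes the pre-commit hook fail.
    return int(bool(bad_files))


if __name__ == '__main__':
    raise SystemExit(main())
```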
thank you very much - I'll look into it and try to come up with a PR next weekend.
ssbarnea left a comment
I fail to understand the purpose of this change. If it is a bugfix, it should refer to an existing bug that clearly documents the problem.
Based on its title, I would close the PR outright, as I don't consider ignoring errors good practice in general.
we're iterating on the hook as seen above -- there's a way to make this work without dealing with encoding problems

via #453 -- thanks again for working on this!