Skip to content

[ENH] Add TextFeatures transformer for text feature extraction#880

Open
ankitlade12 wants to merge 10 commits intofeature-engine:mainfrom
ankitlade12:add-text-features
Open

[ENH] Add TextFeatures transformer for text feature extraction#880
ankitlade12 wants to merge 10 commits intofeature-engine:mainfrom
ankitlade12:add-text-features

Conversation

@ankitlade12
Copy link
Contributor

  • Add TextFeatures class to extract features from text columns
  • Support for features: char_count, word_count, digit_count, uppercase_count, etc.
  • Add comprehensive tests with pytest parametrize
  • Add user guide documentation

Copy link
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ankitlade12

Thanks a lot!

This transformer, function-wise, I'd say it's ready. I made a few suggestions regarding how to optimize the feature creation functions. Let me know if they make sense.

Other than that, we need the various docs file and we'll be good to go :)

Thanks again!

@ankitlade12 ankitlade12 requested a review from solegalli January 23, 2026 16:40
@solegalli
Copy link
Collaborator

We need to rebase main so the 2 remaining tests pass.

Copy link
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ankitlade12

I am very sorry for the delayed review. I am travelling till end of April, so I am a bit slower than usual.

I think, for the first version of the transformer, let's enforce the user to pass the names of the text variables. They can pass one or more variables in case there are more than one text column.

Other than that, we need to add the tranformer in the docs/index file, in the readme, and in the docs/api, and adjust the tests and the demo to the newer functionality. Then it is good to merge.

Thank you very much for this great addition.

Copy link
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ankitlade12

Thank you very much for adding the doc files back to the PR.

Would you mind going over the comments again? Many are marked as resolved, but haven't been resolved.

We are getting closer to to the final version :)

Thanks a lot!

Copy link
Collaborator

@solegalli solegalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ankitlade12

Thank you very much for addressing all the comments. We are getting to the final version. It's looking really good!

I made a few suggestions for the user guide.

We changed the name of the last feature to lexical diversity at some point, and then the change was lost, so I am bringing it back.

We also changed the tests from having the class to having single tests. That change was also lost, could you please bring it back?

The init file in the tests needs to be deleted.

I think that's pretty much it! Then we should be good to merge!

Thanks a lot for this great addition!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants