[ENH] Add TextFeatures transformer for text feature extraction#880
[ENH] Add TextFeatures transformer for text feature extraction#880ankitlade12 wants to merge 10 commits intofeature-engine:mainfrom
Conversation
ankitlade12
commented
Jan 8, 2026
- Add TextFeatures class to extract features from text columns
- Support for features: char_count, word_count, digit_count, uppercase_count, etc.
- Add comprehensive tests with pytest parametrize
- Add user guide documentation
solegalli
left a comment
There was a problem hiding this comment.
Hi @ankitlade12
Thanks a lot!
This transformer, function-wise, I'd say it's ready. I made a few suggestions regarding how to optimize the feature creation functions. Let me know if they make sense.
Other than that, we need the various docs file and we'll be good to go :)
Thanks again!
|
We need to rebase main so the 2 remaining tests pass. |
solegalli
left a comment
There was a problem hiding this comment.
Hi @ankitlade12
I am very sorry for the delayed review. I am travelling till end of April, so I am a bit slower than usual.
I think, for the first version of the transformer, let's enforce the user to pass the names of the text variables. They can pass one or more variables in case there are more than one text column.
Other than that, we need to add the tranformer in the docs/index file, in the readme, and in the docs/api, and adjust the tests and the demo to the newer functionality. Then it is good to merge.
Thank you very much for this great addition.
877550f to
0b8a183
Compare
solegalli
left a comment
There was a problem hiding this comment.
Hi @ankitlade12
Thank you very much for adding the doc files back to the PR.
Would you mind going over the comments again? Many are marked as resolved, but haven't been resolved.
We are getting closer to to the final version :)
Thanks a lot!
solegalli
left a comment
There was a problem hiding this comment.
Hey @ankitlade12
Thank you very much for addressing all the comments. We are getting to the final version. It's looking really good!
I made a few suggestions for the user guide.
We changed the name of the last feature to lexical diversity at some point, and then the change was lost, so I am bringing it back.
We also changed the tests from having the class to having single tests. That change was also lost, could you please bring it back?
The init file in the tests needs to be deleted.
I think that's pretty much it! Then we should be good to merge!
Thanks a lot for this great addition!
…ersity, improve docs, refactor tests