Korean Ordinal TN support #286
.gitignore
Outdated
.hydra/
nemo_experiments/
*.swp
*.far
let's not edit these files unless strictly necessary
Got it — I’ve removed the *.far line.
8 여덟
9 아홉
10 열
11 열한
for numbers that build from existing ones, let's use rules instead (it seems that 12 is 10 followed by 2, for example, and this pattern repeats up to 39)
also, if there is overlap between these characters and cardinal, it is important to leverage one class to develop the other
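One way to read the two comments above, as a minimal sketch in plain Python rather than pynini: compose the stem for 11 through 39 from a tens table and a units table, and list only the irregular forms as exceptions. Only 8 여덟, 9 아홉, 10 열, 11 열한 and the 1번째~첫번째 test case appear in this PR; the other table entries and the names `UNITS`, `TENS`, `EXCEPTIONS`, and `native_ordinal_stem` are assumptions made for illustration, not the PR's code.

```python
# Hypothetical tables: determiner forms of native Korean numerals.
UNITS = {1: "한", 2: "두", 3: "세", 4: "네", 5: "다섯",
         6: "여섯", 7: "일곱", 8: "여덟", 9: "아홉"}
TENS = {1: "열", 2: "스물", 3: "서른"}

# Irregular forms that the tens + units rule would get wrong.
EXCEPTIONS = {1: "첫", 20: "스무"}


def native_ordinal_stem(n: int) -> str:
    """Return the native stem that precedes 번째, for 1 <= n <= 39."""
    if not 1 <= n <= 39:
        raise ValueError("this sketch only covers 1 through 39")
    if n in EXCEPTIONS:
        return EXCEPTIONS[n]
    tens, unit = divmod(n, 10)
    # 12 is simply the form for 10 followed by the form for 2.
    return TENS.get(tens, "") + UNITS.get(unit, "")


print(native_ordinal_stem(12) + "번째")
```

With tables like these, a 39-row TSV shrinks to nine unit rows, three tens rows, and two exceptions, and the unit rows could in principle be shared with a cardinal class.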
graph_ordinal_1to39 = pynini.string_file(get_abs_path("data/ordinal/digit_1to39.tsv")) + pynini.accep("번째")

graph_cardinal = cardinal.just_cardinals + pynini.accep("번째")
great use of the cardinal graph! let's rename the variable, since it doesn't just represent cardinals
Renamed it to graph_ordinal_from40!
graph_cardinal = cardinal.just_cardinals + pynini.accep("번째")

graph_ordinal = (
    pynutil.add_weight(graph_ordinal_1to39, 0.1) | pynutil.add_weight(graph_cardinal, 1.0)
let's pick a single way to normalize numbers 1 through 39 and add an exception instead, keeping the weights untouched
if one of the two is correct, pick that one. otherwise, pick whichever is most common
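The idea of replacing competing weights with an exception can be sketched without pynini: partition the inputs so that exactly one path accepts each token. The snippet below is an illustration in plain Python under that assumption; `NATIVE_1TO39`, `ANY_ORDINAL`, and `route` are hypothetical names, and the real grammar would express the same exclusion on FSTs.

```python
import re

# Hypothetical patterns: 1 through 39 followed by 번째, and any digits followed by 번째.
NATIVE_1TO39 = re.compile(r"(?:[1-9]|[1-3][0-9])번째")
ANY_ORDINAL = re.compile(r"[0-9]+번째")


def route(token: str) -> str:
    """Name the single path that handles a token: 'native', 'cardinal', or 'none'."""
    if NATIVE_1TO39.fullmatch(token):
        return "native"
    # The cardinal-based path is "any ordinal" minus the 1-39 exception,
    # so the two paths never compete and no weights are needed.
    if ANY_ORDINAL.fullmatch(token):
        return "cardinal"
    return "none"


for token in ["1번째", "39번째", "40번째", "100번째"]:
    print(token, route(token))
```

Because the two sets are disjoint by construction, the choice between them does not depend on tuning 0.1 against 1.0.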
overwrite_cache: set to True to overwrite .far files
"""

def __init__(self, cache_dir: str = None, overwrite_cache: bool = False):
do you need or use this graph elsewhere?
You're right — I'll go ahead and remove it
@@ -0,0 +1,20 @@
1번째~첫번째
can you add some test cases for your different 1 through 39 graph?
I will add some test cases for the 1 through 39 graph!
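For reference, the test file shown in the diff uses one `written~spoken` pair per line. A minimal sketch of reading such lines follows; `parse_test_cases` is a hypothetical helper, not the repository's own loader, and every sample line other than 1번째~첫번째 is an illustrative assumption about what boundary cases around 39 and 40 might look like.

```python
def parse_test_cases(text: str) -> list[tuple[str, str]]:
    """Split 'written~spoken' lines into (written, spoken) pairs, skipping blank lines."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        written, sep, spoken = line.partition("~")
        if not sep:
            raise ValueError(f"missing '~' separator: {line!r}")
        cases.append((written, spoken))
    return cases


# The first line is from the PR; the others are assumed examples.
SAMPLE = """1번째~첫번째
11번째~열한번째
39번째~서른아홉번째
40번째~사십번째
"""

for written, spoken in parse_test_cases(SAMPLE):
    print(written, "->", spoken)
```

Cases on both sides of the 39/40 boundary are the ones most likely to expose an overlap between the two graphs.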
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
for more information, see https://pre-commit.ci
…feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
.gitignore
Outdated
.hydra/
nemo_experiments/
*.swp
*.swp No newline at end of file
can we make sure that this file doesn't show up in the PR at all?
I will delete this file from the PR!
force-pushed from 5313873 to a49a969
…SV file Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
tbartley94 left a comment
add additional text files, remove the tars and revert gitignore
nemo_text_processing/text_normalization/ko/data/ordinal/digit.tsv
Outdated
nemo_text_processing/text_normalization/ko/verbalizers/verbalize.py
Outdated
tools/text_processing_deployment/ko_tn_grammars_lower_cased/classify/tokenize_and_classify.far
Outdated
tools/text_processing_deployment/ko_tn_grammars_lower_cased/ko_tn_True_tokenize.far
Outdated
tools/text_processing_deployment/ko_tn_grammars_lower_cased/verbalize/post_process.far
Outdated
tools/text_processing_deployment/ko_tn_grammars_lower_cased/verbalize/verbalize.far
Outdated
@bbae0312 please refrain from force pushes
tbartley94 left a comment
LGTM. Have all tests passed?
7 칠십
8 팔십
9 구십
9 구십 No newline at end of file
I didn’t change the encoding, but not sure why it’s being detected like that. I’ll double-check just in case.
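The marker in the hunk above, "No newline at end of file", concerns the file's final byte rather than its encoding. A small sketch, assuming the file content is available as bytes, separates the two checks; `describe_file_bytes` is a hypothetical helper, not part of the repository.

```python
def describe_file_bytes(data: bytes) -> dict:
    """Report whether data decodes as UTF-8 and whether it ends with a newline."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"valid_utf8": valid_utf8, "ends_with_newline": data.endswith(b"\n")}


# A TSV row like the one in the diff, without and with a trailing newline.
print(describe_file_bytes("9\t구십".encode("utf-8")))
print(describe_file_bytes("9\t구십\n".encode("utf-8")))
```

If the first check passes and only the second fails, appending a final newline to the TSV should be enough to make the marker disappear from the diff.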
* Add Korean TN support for cardinal numbers and postprocessing
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Refactor Korean TN cardinal and postprocessing logic based on review feedback
* Add Korean Ordinal TN logic and test cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
* Add support for 0 in ordinal tagger
* Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Remove .far files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix(ko/ordinal): update ordinal FST based on review feedback
* [pre-commit.ci] auto fixes from pre-commit.com hooks

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add Korean TN support for cardinal numbers and postprocessing (#285)
  * Add Korean TN support for cardinal numbers and postprocessing
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
  * Refactor Korean TN cardinal and postprocessing logic based on review feedback
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Add __init__.py to ko/data directory
  * Update KO_TN_CACHE to trigger Korean CI run
* Korean Ordinal TN support (#286)
  * Add Korean TN support for cardinal numbers and postprocessing
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Refactor Korean TN cardinal and postprocessing logic based on review feedback
  * Add Korean Ordinal TN logic and test cases
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
  * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
  * Add support for 0 in ordinal tagger
  * Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Remove .far files
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko/ordinal): update ordinal FST based on review feedback
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN Decimal Support (#303)
  * feat(ko/decimal): add Korean decimal TN support
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * feat(ko): Add fraction tagger and verbalizer with tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko): Update decimal and fraction taggers
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN for Date and Time (#316)
  * feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko/date): update date tagger and sparrowhawk test
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko(TN): Date TN fixes & cleanup
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko(TN): Add Time tagger/verbalizer + tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko(TN): Date - strict YYYY for delimited formats; define single-year 1–4 digit behavior
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN for Money and Telephone (#324)
  * feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko/money): polish tagger/verbalizer & expand tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko: refactor money/telephone taggers & verbalizers
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko: update money/telephone taggers and telephone verbalizer
  * ko: update telephone taggers
* Korean TN for Measure and Electronic (#353)
  * Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data)
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Update KO electronic & measure taggers/verbalizers and test cases
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Edited as per review feedback
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN fixes: cardinal, decimal, fraction, date
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add ko electronic extensions and improve electronic/telephone normalization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Fix Korean TN issues and update test cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Fix Korean TN electronic and post-processing issues
* Fix Korean TN spacing and electronic/cardinal handling
* Fix optional token separator and remove redundant whitespace normalization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Remove unused KO post_processing and update exporter
* Add native counting support for number+counter in Korean TN
* [pre-commit.ci] auto fixes from pre-commit.com hooks

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN electronic and post-processing issues Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN spacing and electronic/cardinal handling Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix optional token separator and remove redundant whitespace normalization Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Remove unused KO post_processing and update exporter Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add native counting support for number+counter in Korean TN Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
* Add Korean TN support for cardinal numbers and postprocessing
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Refactor Korean TN cardinal and postprocessing logic based on review feedback
* Add Korean Ordinal TN logic and test cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
* Add support for 0 in ordinal tagger
* Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Remove .far files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix(ko/ordinal): update ordinal FST based on review feedback
* [pre-commit.ci] auto fixes from pre-commit.com hooks

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do?
This PR adds support for Korean ordinal number text normalization.
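To make the behavior concrete, here is a minimal plain-Python sketch of the 1-39 / 40+ split mentioned in the commit log. It is not the pynini grammar from this PR: the function names are hypothetical, the readings are standard Korean rather than the contents of the PR's data files, the use of Sino-Korean readings for 40 and above is an assumption based on the review thread, and the sketch stops at 99 and leaves anything else (including 0) unchanged.

```python
import re

ORDINAL_SUFFIX = "번째"

# Native Korean prenominal forms, composed from tens + units (e.g. 12 = 열 + 두).
NATIVE_UNITS = {1: "한", 2: "두", 3: "세", 4: "네", 5: "다섯",
                6: "여섯", 7: "일곱", 8: "여덟", 9: "아홉"}
NATIVE_TENS = {1: "열", 2: "스물", 3: "서른"}

# Sino-Korean digits, assumed here for 40 and above.
SINO_DIGITS = {1: "일", 2: "이", 3: "삼", 4: "사", 5: "오",
               6: "육", 7: "칠", 8: "팔", 9: "구"}


def native_reading(n: int) -> str:
    """Native reading for 1..39, with the two irregular forms handled first."""
    if n == 1:
        return "첫"    # 1번째 is read 첫번째, not 한번째
    if n == 20:
        return "스무"  # bare 20 contracts before a counter
    tens, unit = divmod(n, 10)
    return NATIVE_TENS.get(tens, "") + NATIVE_UNITS.get(unit, "")


def sino_reading(n: int) -> str:
    """Sino-Korean reading for 40..99 only (hypothetical helper)."""
    tens, unit = divmod(n, 10)
    return SINO_DIGITS[tens] + "십" + SINO_DIGITS.get(unit, "")


def normalize_ordinal(text: str) -> str:
    """Spell out '<digits>번째'; return the input unchanged if out of scope."""
    match = re.fullmatch(r"(\d+)" + ORDINAL_SUFFIX, text)
    if match is None:
        return text
    n = int(match.group(1))
    if 1 <= n <= 39:
        return native_reading(n) + ORDINAL_SUFFIX
    if 40 <= n <= 99:
        return sino_reading(n) + ORDINAL_SUFFIX
    return text


print(normalize_ordinal("1번째"))   # 첫번째
print(normalize_ordinal("11번째"))  # 열한번째
print(normalize_ordinal("40번째"))  # 사십번째
```

Composing tens and units instead of listing all 39 forms mirrors the review suggestion to use rules for numbers that build on existing ones, with 1 and 20 kept as explicit exceptions.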
Included:
- ordinal.py
- verbalize.py
- test_ordinal.py, test_cases_ordinal.txt
- test_sparrowhawk_normalization.sh

Before your PR is "Ready for review"
Pre checks:
- `git commit -s` to sign.
- `pytest` or (if your machine does not have GPU) `pytest --cpu` from the root folder (given you marked your test cases accordingly `@pytest.mark.run_only_on('CPU')`).
- `bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...`
- `pytest` and Sparrowhawk here.
- `__init__.py` for every folder and subfolder, including `data` folder which has .TSV files?
- `Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.` to all newly added Python files?
- `Copyright 2015 and onwards Google, Inc.`. See an example here.
- (`try import: ... except: ...`) if not already done.

PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
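The test-case file listed above stores one `written~spoken` pair per line (for example `1번째~첫번째`). As a rough sketch of how such a file can be checked, the following plain Python parses the pairs and reports mismatches. The helper names `parse_test_cases` and `find_failures` are hypothetical and are not the repository's own test utilities; the toy normalizer is a stand-in for the real grammar.

```python
from typing import Callable, Iterable, List, Tuple


def parse_test_cases(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each non-empty 'written~spoken' line into a pair."""
    cases = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        written, spoken = line.split("~", 1)
        cases.append((written, spoken))
    return cases


def find_failures(
    cases: List[Tuple[str, str]], normalize: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (written, expected, actual) for every case that does not match."""
    failures = []
    for written, expected in cases:
        actual = normalize(written)
        if actual != expected:
            failures.append((written, expected, actual))
    return failures


# Toy stand-in for the real normalizer, covering a single case.
toy_table = {"1번째": "첫번째"}
cases = parse_test_cases(["1번째~첫번째\n", "\n"])
print(find_failures(cases, lambda s: toy_table.get(s, s)))  # []
```

Returning the mismatches rather than asserting inside the loop keeps all failing cases visible in one run.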