Conversation
jhnwu3
commented
Mar 27, 2025
- johnwu3
- Contribute a multimodality addition to the MIMIC3/MIMIC4 implementations and their medical coding counterparts.
- I've created 3 new task templates and added parse_notes() and parse_xrays() for mimic3 and mimic4 following the new event stream format.
- pyhealth.datasets.mimic3.py, pyhealth.datasets.mimic4.py, and pyhealth.tasks.medical_coding.py
Pull Request Overview
This PR adds multimodality support for medical coding tasks by introducing new task templates and dataset utilities for MIMIC-III/MIMIC-IV, including caching, parallel processing in data parsing, and new featurizers. Key changes include:
- A new TaskTemplate class with caching functionality and a unified API for task processing.
- New featurizers for handling image and value data modalities.
- New medical coding task implementations (MIMIC3ICD9Coding, MIMIC4ICD9Coding, and MIMIC4ICD10Coding) and significant updates to dataset parsing in mimic3.py.
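To make the caching idea in the first bullet concrete, here is a minimal sketch of a task base class that caches processed samples on disk. It is not the PR's `TaskTemplate`: the class names `CachedTask` and `CountCodes`, the `run()` method, and the JSON cache format are all assumptions made for illustration.

```python
import hashlib
import json
import os
import tempfile
from abc import ABC, abstractmethod


class CachedTask(ABC):
    """Hypothetical stand-in for a task template; not the PR's TaskTemplate."""

    task_name = "base"

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.calls = 0  # counts how often process() actually ran

    @abstractmethod
    def process(self, records):
        """Turn raw records into task samples."""

    def _cache_path(self, records):
        # Key the cache on the task name plus a hash of the input records.
        payload = json.dumps(records, sort_keys=True).encode()
        key = hashlib.md5(payload).hexdigest()
        return os.path.join(self.cache_dir, f"{self.task_name}_{key}.json")

    def run(self, records):
        path = self._cache_path(records)
        if os.path.exists(path):
            # Cache hit: skip processing entirely.
            with open(path) as f:
                return json.load(f)
        samples = self.process(records)
        self.calls += 1
        with open(path, "w") as f:
            json.dump(samples, f)
        return samples


class CountCodes(CachedTask):
    """Toy task: count the codes attached to each patient record."""

    task_name = "count_codes"

    def process(self, records):
        return [{"patient": r["patient"], "n_codes": len(r["codes"])} for r in records]


records = [{"patient": "p1", "codes": ["428.0", "401.9"]}]
with tempfile.TemporaryDirectory() as tmp:
    task = CountCodes(tmp)
    first = task.run(records)   # processes and writes the cache
    second = task.run(records)  # served from the cache
```

The second call returns the same samples without invoking `process()` again, which is the behavior a cached task template is presumably meant to provide.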
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyhealth/tasks/task_template.py | Introduces a new abstract base task template with caching logic. |
| pyhealth/featurizers/value.py | Adds a basic ValueFeaturizer implementation. |
| pyhealth/featurizers/image.py | Adds an ImageFeaturizer using PIL and torchvision.transforms. |
| pyhealth/tasks/medical_coding.py | Implements multiple medical coding tasks for MIMIC datasets. |
| pyhealth/data/data_v2.py | Adds new definitions for Patient and Event with updated APIs. |
| pyhealth/datasets/base_dataset_v2.py | Provides a new abstract dataset class with cache support. |
| pyhealth/datasets/sample_dataset_v2.py | Introduces an updated SampleDataset for torch integration. |
| test.py | Provides a basic performance test for the MIMIC dataset processing. |
| pyhealth/data/cache.py | Adds cache read/write functions using msgpack for Patient objects. |
| pyhealth/datasets/mimic3.py | Updates processing for MIMIC3, including new note parsing and code mapping. |
| Other files (e.g. `__init__.py`, readmission_prediction.py, drug_recommendation.py, eicu.py, mortality_prediction.py, sample_dataset.py) | Minor refactors and cleanup (e.g. removal of unused imports). |
Comments suppressed due to low confidence (2)
pyhealth/datasets/sample_dataset_v2.py:5
- [nitpick] Modifying sys.path inline can lead to unexpected imports; consider configuring PYTHONPATH or package management instead.
sys.path.append('.')
pyhealth/datasets/mimic3.py:97
- [nitpick] The cache file extension '.pkl' differs from the '.msgpack' format used in other modules; ensure a consistent cache serialization format.
filename = hash_str("+".join([str(arg) for arg in args_to_hash])) + ".pkl"
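One way to keep the extension consistent is to define it once and build every cache filename through a single helper. The sketch below assumes `.msgpack` is the intended project-wide format; `CACHE_EXT` and `cache_filename` are hypothetical names, and `hash_str` here is an MD5-based stand-in, since the real helper's implementation is not shown in this thread.

```python
import hashlib

CACHE_EXT = ".msgpack"  # assumed project-wide cache extension


def hash_str(s):
    """Stand-in for the hash_str helper in the diff; implementation assumed."""
    return hashlib.md5(s.encode()).hexdigest()


def cache_filename(args_to_hash):
    """Build a deterministic cache filename from the dataset arguments."""
    return hash_str("+".join(str(arg) for arg in args_to_hash)) + CACHE_EXT


# Example arguments are illustrative only.
name = cache_filename(["mimic3", "/data/mimic3", ["NOTEEVENTS"], False])
```

Note that changing only the extension does not change the serializer: the code that writes the file would also need to use msgpack rather than pickle.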
pyhealth/tasks/task_template.py
Outdated
    except:
        print(f"Failed to load cache for {self.task_name}. Processing from scratch.")
Use specific exception handling (e.g. 'except Exception as e:') instead of a bare except to avoid masking unexpected errors.
    - except:
    -     print(f"Failed to load cache for {self.task_name}. Processing from scratch.")
    + except Exception as e:
    +     print(f"Failed to load cache for {self.task_name}. Processing from scratch. Error: {e}")
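Going one step further than `except Exception`, the cache loader could catch only the failures it expects and report them through `logging`. This is a rough sketch under stated assumptions, not the PR's code: `load_or_process` is a hypothetical helper, and pickle is used only to keep the example within the standard library.

```python
import logging
import os
import pickle
import tempfile

logger = logging.getLogger(__name__)


def load_or_process(cache_path, process_fn, task_name="task"):
    """Return cached results if readable; otherwise recompute and rewrite the cache."""
    try:
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # A missing cache is the normal first-run case, not an error.
        logger.info("No cache for %s. Processing from scratch.", task_name)
    except (pickle.UnpicklingError, EOFError) as e:
        # A truncated or corrupt cache is worth a warning that includes the cause.
        logger.warning("Failed to load cache for %s: %s. Processing from scratch.", task_name, e)
    result = process_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.pkl")
    open(path, "wb").close()  # an empty file simulates a truncated cache
    rebuilt = load_or_process(path, lambda: {"samples": 3})
    cached = load_or_process(path, lambda: {"samples": -1})  # served from the rewritten cache
```

Any exception outside the listed types still propagates, so an unexpected bug surfaces instead of silently triggering a full reprocess.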
I've force-pushed my dev branch, but it's a little odd that I had to do something like that to get this right. I'll update the PyHealth contributions tutorial accordingly.
    ):
        """Loads tables into a dict of patients and saves it to cache."""

        print("LOL I AM BEING INHERITED!!!")
Remove the debug print statement before merging as it may clutter logs in production.
    -     print("LOL I AM BEING INHERITED!!!")
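If a trace of the call is still useful during development, a debug-level log message is a possible alternative to deleting the line outright. This is only a sketch: the logger name and the toy `load_data` function are assumptions, not the method from the PR.

```python
import logging

logger = logging.getLogger("pyhealth_sketch")  # hypothetical logger name


def load_data(tables):
    """Toy stand-in: maps each table name to an empty list of events."""
    # Debug messages are suppressed at the default WARNING level.
    logger.debug("load_data called with %d tables", len(tables))
    return {name: [] for name in tables}


patients = load_data(["DIAGNOSES_ICD", "NOTEEVENTS"])
```

Because the message is emitted at `DEBUG`, it stays out of production logs unless someone opts in by lowering the log level.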