Medical Coding Pull Request #317

Merged
jhnwu3 merged 4 commits into sunlabuiuc:dev from jhnwu3:dev
Apr 7, 2025

@jhnwu3 jhnwu3 commented Mar 27, 2025

  1. johnwu3
  2. Contribute a multimodality addition to the MIMIC3/MIMIC4 implementations and their medical coding counterparts.
  3. I've created 3 new task templates and added parse_notes() and parse_xrays() for mimic3 and mimic4 following the new event stream format.
  4. pyhealth.datasets.mimic3.py, pyhealth.datasets.mimic4.py, and pyhealth.tasks.medical_coding.py

@zzachw zzachw requested a review from Copilot March 27, 2025 03:08

Copilot AI left a comment

Pull Request Overview

This PR adds multimodality support for medical coding tasks by introducing new task templates and dataset utilities for MIMIC-III/MIMIC-IV, including caching, parallel processing in data parsing, and new featurizers. Key changes include:

  • A new TaskTemplate class with caching functionality and a unified API for task processing.
  • New featurizers for handling image and value data modalities.
  • New medical coding task implementations (MIMIC3ICD9Coding, MIMIC4ICD9Coding, and MIMIC4ICD10Coding) and significant updates to dataset parsing in mimic3.py.
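The task-template idea in the first bullet can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the class names `CachedTaskTemplate` and `ToyCodingTask`, the `run()`/`process()` split, and the use of `pickle` (swapped in to keep the example standard-library only) are hypothetical and are not taken from the PR's `task_template.py`.

```python
import pickle
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class CachedTaskTemplate(ABC):
    """Hypothetical base class: subclasses implement process(); run() adds caching."""

    def __init__(self, task_name: str, cache_dir: str):
        self.task_name = task_name
        self.cache_path = Path(cache_dir) / f"{task_name}.pkl"

    @abstractmethod
    def process(self, patients: dict) -> list:
        """Turn a dict of patients into a list of samples."""

    def run(self, patients: dict) -> list:
        # Reuse cached samples when a cache file already exists.
        if self.cache_path.exists():
            with open(self.cache_path, "rb") as f:
                return pickle.load(f)
        samples = self.process(patients)
        with open(self.cache_path, "wb") as f:
            pickle.dump(samples, f)
        return samples


class ToyCodingTask(CachedTaskTemplate):
    """Toy subclass: one sample per patient, pairing note text with codes."""

    def process(self, patients: dict) -> list:
        return [
            {"patient_id": pid, "text": rec["note"], "codes": rec["codes"]}
            for pid, rec in patients.items()
        ]


patients = {"p1": {"note": "chest pain", "codes": ["786.50"]}}
with tempfile.TemporaryDirectory() as tmp:
    task = ToyCodingTask("toy_coding", tmp)
    first = task.run(patients)  # computed, then written to the cache file
    second = task.run({})       # served from the cache; the input is ignored
```

Note that this sketch keys the cache on the task name alone, so a second call returns stale samples even when the input changes. A real implementation would likely include the dataset arguments in the cache key.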

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pyhealth/tasks/task_template.py | Introduces a new abstract base task template with caching logic. |
| pyhealth/featurizers/value.py | Adds a basic ValueFeaturizer implementation. |
| pyhealth/featurizers/image.py | Adds an ImageFeaturizer using PIL and torchvision.transforms. |
| pyhealth/tasks/medical_coding.py | Implements multiple medical coding tasks for MIMIC datasets. |
| pyhealth/data/data_v2.py | Adds new definitions for Patient and Event with updated APIs. |
| pyhealth/datasets/base_dataset_v2.py | Provides a new abstract dataset class with cache support. |
| pyhealth/datasets/sample_dataset_v2.py | Introduces an updated SampleDataset for torch integration. |
| test.py | Provides a basic performance test for the MIMIC dataset processing. |
| pyhealth/data/cache.py | Adds cache read/write functions using msgpack for Patient objects. |
| pyhealth/datasets/mimic3.py | Updates processing for MIMIC3, including new note parsing and code mapping. |
| Other files (e.g. __init__.py, readmission_prediction.py, drug_recommendation.py, eicu.py, mortality_prediction.py, sample_dataset.py) | Minor refactors and cleanup (e.g. removal of unused imports). |

Comments suppressed due to low confidence (2)

pyhealth/datasets/sample_dataset_v2.py:5

  • [nitpick] Modifying sys.path inline can lead to unexpected imports; consider configuring PYTHONPATH or package management instead.
sys.path.append('.')

pyhealth/datasets/mimic3.py:97

  • [nitpick] The cache file extension '.pkl' differs from the '.msgpack' format used in other modules; ensure a consistent cache serialization format.
filename = hash_str("+".join([str(arg) for arg in args_to_hash])) + ".pkl"
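One way to keep the extension consistent is to define it once and build every cache filename through a single helper. The sketch below is illustrative only: `CACHE_EXT` and `cache_filename` are hypothetical names, and `hash_str` is a stand-in written for this example (an MD5 hex digest), since the PR's actual helper is not shown here.

```python
import hashlib

CACHE_EXT = ".msgpack"  # assumption: one project-wide cache extension


def hash_str(s: str) -> str:
    # Hypothetical stand-in for the helper referenced in the review comment.
    return hashlib.md5(s.encode()).hexdigest()


def cache_filename(*args_to_hash) -> str:
    """Build a deterministic cache filename from the arguments that define the cache."""
    return hash_str("+".join(str(arg) for arg in args_to_hash)) + CACHE_EXT


name = cache_filename("mimic3", ["DIAGNOSES_ICD", "NOTEEVENTS"], False)
```

The extension should match the serializer actually used to write the file. Renaming `.pkl` to `.msgpack` without also switching the read/write functions would only make the mismatch harder to spot.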

Comment on lines +27 to +28
except:
    print(f"Failed to load cache for {self.task_name}. Processing from scratch.")

Copilot AI Mar 27, 2025

Use specific exception handling (e.g. 'except Exception as e:') instead of a bare except to avoid masking unexpected errors.

Suggested change
- except:
-     print(f"Failed to load cache for {self.task_name}. Processing from scratch.")
+ except Exception as e:
+     print(f"Failed to load cache for {self.task_name}. Processing from scratch. Error: {e}")
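Catching `Exception` is safer than a bare `except`, but it is still broad. A narrower variant might catch only the failures expected from a missing or truncated cache file and let everything else propagate. The sketch below assumes a pickle-based cache and a hypothetical `load_cache` helper; neither is taken from the PR, and the exact exception tuple would depend on the serializer in use.

```python
import logging
import pickle
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def load_cache(path: Path, task_name: str):
    """Return cached samples, or None if the cache is missing or unreadable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError) as e:
        # Only cache-related failures are handled here; other exceptions propagate.
        logger.warning(
            "Failed to load cache for %s (%s). Processing from scratch.", task_name, e
        )
        return None


with tempfile.TemporaryDirectory() as tmp:
    # Missing file: open() raises FileNotFoundError, a subclass of OSError.
    missing = load_cache(Path(tmp) / "absent.pkl", "toy_task")

    # Empty file: pickle.load() raises EOFError.
    empty_path = Path(tmp) / "empty.pkl"
    empty_path.write_bytes(b"")
    empty = load_cache(empty_path, "toy_task")

    # Valid file: the cached object is returned unchanged.
    good_path = Path(tmp) / "good.pkl"
    good_path.write_bytes(pickle.dumps([1, 2, 3]))
    good = load_cache(good_path, "toy_task")
```

Returning `None` lets the caller decide whether to recompute, and the warning records why the cache was skipped.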

Collaborator Author

I've force-pushed my dev branch, but it's a little odd that I have to do something like this to get this correct. Will update the PyHealth contributions tutorial accordingly.

):
"""Loads tables into a dict of patients and saves it to cache."""

print("LOL I AM BEING INHERITED!!!")

Copilot AI Mar 27, 2025

Remove the debug print statement before merging as it may clutter logs in production.

Suggested change
- print("LOL I AM BEING INHERITED!!!")
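If tracing which loader method runs is still useful during development, a debug-level log message is a common alternative to a print statement, since it stays silent unless the application opts in. The sketch below is an illustration under assumptions: `BaseLoader`, `ChildLoader`, and the logger name are hypothetical and do not correspond to classes in the PR.

```python
import logging

logger = logging.getLogger("pyhealth.sketch")  # hypothetical logger name


class BaseLoader:
    def load(self) -> dict:
        # Debug-level message: hidden unless the application enables DEBUG logging.
        logger.debug("%s.load() called", type(self).__name__)
        return {}


class ChildLoader(BaseLoader):
    """Inherits load() unchanged; the log message reports the subclass name."""


logging.basicConfig(level=logging.WARNING)  # debug output stays suppressed at this level
result = ChildLoader().load()
```

Setting the level to `logging.DEBUG` instead would surface the message when needed, without editing the loader code.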

@jhnwu3 jhnwu3 merged commit c8c58f0 into sunlabuiuc:dev Apr 7, 2025