Add/cxr comprehensive tutorial and benchmarks for PyHealth paper #773
…example that were never fixed before
I converted chestxray14_binary_classification.ipynb and chestxray14_multilabel_classification.ipynb to .py files in #777. I tried to use git mv to first move them to examples/cxr/ to avoid a conflict with this PR, but it looks like that didn't work. You can delete these two files; if you merge them, they will duplicate the .py versions.
No worries, I think the notebook version isn't necessarily a bad idea to have too.
* - ``cxr/cnn_cxr.ipynb``
  - CNN for chest X-ray classification (notebook)
* - ``chestXray_image_generation_VAE.py``
* - ``cxr/chestxray14_binary_classification.ipynb``
- * - ``cxr/chestxray14_binary_classification.ipynb``
+ * - ``cxr/chestxray14_binary_classification.py``
- parse_options=pv.ParseOptions(delimiter=delimiter),
+ parse_options=pv.ParseOptions(
+     delimiter=delimiter, newlines_in_values=True
+ ),
I need someone like @Logiquo to check whether this will break any test cases or workflows, since I changed this option to support reading notes.
This should be safe unless it breaks some unusual test case. In theory, a well-formatted CSV file should not be affected by this.
This pull request introduces several documentation improvements and adds a new benchmarking script for length of stay prediction using pandas. The most significant changes are the addition of interpretability documentation for Vision Transformers (ViT), new visualization utility documentation, expanded tutorial listings for image analysis, and a comprehensive benchmark script for MIMIC-IV data.

Documentation Enhancements for Interpretability and Vision Transformers (ViT):
- … (docs/api/interpret.rst).
- … on utilities for attribution overlays and ViT-specific visualizations, with links to relevant utility functions (docs/api/interpret.rst, docs/api/interpret/pyhealth.interpret.utils.rst).

Expanded Tutorials and Example Listings:
- … (docs/tutorials.rst).

New Benchmark Script for Length of Stay Prediction:
- examples/benchmark_perf/benchmark_pandas_los.py, a standalone script that benchmarks visit-level length of stay prediction on MIMIC-IV data using pandas. The script processes admissions, diagnoses, procedures, and prescriptions, categorizes LOS, tracks memory usage, and outputs summary statistics and results.

This pull request introduces significant improvements to the documentation and benchmarking utilities for interpretability and image analysis in PyHealth, as well as adds a new benchmarking script for length-of-stay prediction using pandas. The main changes include expanded interpretability documentation (especially for Vision Transformers), a detailed API reference for visualization utilities, improved organization of image analysis tutorials, and the addition of a comprehensive benchmarking script for MIMIC-IV data processing.

Documentation Improvements
- docs/api/interpret.rst now includes a new ViT/Chefer attribution example, providing step-by-step guidance on using CheferRelevance for Vision Transformers and visualizing model attributions.
- docs/api/interpret.rst also introduces the pyhealth.interpret.utils module and its specialized support for Vision Transformer attribution visualizations.
- docs/api/interpret/pyhealth.interpret.utils.rst details all visualization functions, normalization utilities, and ViT-specific visualization helpers, including example usage for both standard and ViT attributions.

Tutorial and Example Organization
- docs/tutorials.rst now clarifies that chest X-ray examples are located in the examples/cxr/ directory, and the list of example files is reorganized for better clarity and coverage of new notebooks and scripts.

Benchmarking Utilities
- examples/benchmark_perf/benchmark_pandas_los.py benchmarks length-of-stay prediction processing using pandas on MIMIC-IV data, mirroring the PyHealth LengthOfStayPredictionMIMIC4 task. The script includes patient-level processing, LOS categorization, memory and time tracking, and outputs detailed statistics and results.
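To give a sense of what a pandas length-of-stay benchmark of this kind involves, here is a minimal sketch. It is not the script from this PR: the function names, the ten-bucket LOS scheme, and the toy admissions table are assumptions for illustration, and the timing and memory tracking use only the standard library (`time.perf_counter`, `tracemalloc`).

```python
import time
import tracemalloc

import pandas as pd


def categorize_los(days: int) -> int:
    """Assumed bucketing: 0 for under a day, 1-7 per day, 8 for 8-14 days, 9 beyond."""
    if days < 1:
        return 0
    if days <= 7:
        return days
    if days <= 14:
        return 8
    return 9


def benchmark_los(admissions: pd.DataFrame):
    """Hypothetical helper: label each visit with an LOS bucket, tracking time and memory."""
    tracemalloc.start()
    start = time.perf_counter()

    # Whole days between admission and discharge for each visit.
    los_days = (admissions["dischtime"] - admissions["admittime"]).dt.days
    result = admissions[["subject_id", "hadm_id"]].copy()
    result["los_category"] = los_days.map(categorize_los)

    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak_bytes


# Toy stand-in for an admissions table (made-up values, MIMIC-style column names).
admissions = pd.DataFrame(
    {
        "subject_id": [1, 1, 2, 3],
        "hadm_id": [10, 11, 20, 30],
        "admittime": pd.to_datetime(
            ["2150-01-01 08:00", "2150-03-01 00:00", "2150-05-01 00:00", "2150-06-01 00:00"]
        ),
        "dischtime": pd.to_datetime(
            ["2150-01-01 20:00", "2150-03-04 12:00", "2150-05-11 00:00", "2150-07-01 00:00"]
        ),
    }
)

result, elapsed, peak_bytes = benchmark_los(admissions)
print(result)
print(f"elapsed: {elapsed:.6f}s, peak traced memory: {peak_bytes} bytes")
```

Note that `tracemalloc` reports only Python-level allocations it can trace, so the peak figure is a rough indicator rather than total process memory.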