Commit 1d99f08f authored by Chia Ying Chiu

Update README.md

parent a8a218ad
@@ -6,11 +6,10 @@ The International Classification of Diseases (ICD) standardizes the format for r
Method
- Data:
- Data are from the MIMIC-III (Medical Information Mart for Intensive Care) database, comprising health information for each encounter at the critical care units of a large tertiary care hospital (Johnson et al., 2016). For this study, diagnostic codes and clinical notes are included.
+ Data: Data are from the MIMIC-III (Medical Information Mart for Intensive Care) database, comprising health information for each encounter at the critical care units of a large tertiary care hospital (Johnson et al., 2016). For this study, 40,000 clinical notes were used.
Preprocessing:
- To simulate a coder’s work in hospitals, our goal is to construct a model that predicts ICD-10 codes from the given free-form texts. In our model, we first apply basic preprocessing methods via NLTK [8], and then build a neural network model for learning the features from input texts. The preprocessing procedure includes spell checking, lowercasing, stop-word removal, tokenization, and removing infrequent
+ To simulate a coder’s work in hospitals, our goal is to construct a model that predicts ICD-9 codes from the given free-form texts. In our model, we first apply basic preprocessing methods via NLTK, and then build a model for learning the features from input texts. The preprocessing procedure includes spell checking, lowercasing, stop-word removal, tokenization, and removing infrequent
words. The preprocessed data are then split into training and validation sets using the Scikit-Learn library (https://www.ejbi.org/scholarly-articles/using-deep-learning-for-automatic-icd10-classification-from-freetext-data.pdf).
Texts are vectorized using term frequency-inverse document frequency (TF-IDF). I passed these vectors into Random Forest, SVC, Multinomial Naive Bayes, and Logistic Regression classifiers and evaluated them using cross-validation. I further explored Random Forest by creating a confusion matrix and a classification report.
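
The preprocessing and TF-IDF steps described in the README text above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this repository. The stop-word set is a tiny hypothetical stand-in for the NLTK list, the sample notes are invented, the function names `preprocess`, `drop_infrequent`, and `tfidf` are made up for this sketch, and the smoothed IDF formula with L2 normalization is one common variant that the README does not confirm. Spell checking, the train/validation split, and the classifiers are left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "with", "was", "to", "for", "in"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters/digits, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def drop_infrequent(docs, min_count=2):
    """Remove tokens that occur fewer than min_count times in the corpus."""
    counts = Counter(t for doc in docs for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in docs]


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


# Invented sample notes, for illustration only.
notes = [
    "Patient admitted with chest pain and shortness of breath.",
    "Chest pain resolved; patient discharged.",
    "Patient with pneumonia was treated with antibiotics.",
]
tokens = drop_infrequent([preprocess(note) for note in notes])
vectors = tfidf(tokens)
```

In this sketch, terms that appear in every note (such as "patient") receive the lowest IDF, so rarer terms carry more weight within a note. The resulting vectors would then be the input to the classifiers named above.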