Commit b8aeda35 authored by Chia Ying Chiu

Update README.md

parent 1d99f08f
@@ -9,10 +9,7 @@ Method
Data: Data are from the MIMIC-III (Medical Information Mart for Intensive Care) database, comprising health information for each encounter at the critical care units of a large tertiary care hospital (Johnson et al., 2016). For this study, 40,000 clinical notes were used.
Preprocessing:
To simulate a coder's work in hospitals, our goal is to construct a model that predicts ICD-9 codes from the given free-form text. We first apply basic preprocessing with NLTK and then build a model that learns features from the input text. The preprocessing procedure includes spell checking, lowercasing, stop-word removal, tokenization, and removal of infrequent
words. The preprocessed data are then split into training and validation sets using the Scikit-Learn library. (https://www.ejbi.org/scholarly-articles/using-deep-learning-for-automatic-icd10-classification-from-freetext-data.pdf)
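The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the stop-word set, the `min_count` threshold, the function names, and the two sample notes are all hypothetical, and the regular-expression tokenizer stands in for NLTK's tokenizer and stop-word list. Spell checking and the training/validation split are omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set; NLTK's English list is much larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "with", "was", "is", "in", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(notes, min_count=2):
    """Lowercase, tokenize, drop stop words, then drop infrequent words."""
    tokenized = [[t for t in tokenize(note) if t not in STOP_WORDS] for note in notes]
    # Count each word across the whole corpus to find the infrequent ones.
    counts = Counter(t for doc in tokenized for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokenized]


# Made-up example notes, not MIMIC-III data.
notes = [
    "Patient admitted with chest pain and shortness of breath.",
    "Chest pain resolved; patient discharged.",
]
cleaned = preprocess(notes)
```

With only two notes and `min_count=2`, every word that appears in a single note is discarded, which shows how aggressive a frequency cutoff can be on a small corpus.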
Term frequency-inverse document frequency (TF-IDF) was used to vectorize the text. The resulting vectors were passed into Random Forest, SVC, Multinomial Naive Bayes, and Logistic Regression classifiers, which were evaluated using cross-validation. I further explored Random Forest by creating a confusion matrix and a classification report.
To simulate a coder's work in hospitals, our goal is to construct a model that predicts ICD-9 codes from the given free-form text. We first apply basic preprocessing with NLTK and then build a model that learns features from the input text. Preprocessing consists of lowercasing, stop-word removal, and tokenization. The texts were converted to vectors using term frequency-inverse document frequency (TF-IDF) and passed into four machine learning models: Multinomial Naive Bayes, logistic regression, random forest, and linear SVC. The results were then evaluated using cross-validation. The random forest model was explored further by creating a confusion matrix and a classification report.
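The modeling steps described above might look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions rather than the repository's code: the twelve short texts and their two ICD-9 labels are invented toy data (not MIMIC-III notes), the vectorizer's built-in lowercasing and English stop-word list stand in for the NLTK preprocessing, and the 3-fold setting and hyperparameters are arbitrary choices.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy notes with illustrative ICD-9 labels (assumption, not real data).
texts = [
    "congestive heart failure with edema",
    "heart failure exacerbation and dyspnea",
    "chronic systolic heart failure",
    "acute heart failure on diuretics",
    "heart failure with reduced ejection fraction",
    "decompensated heart failure and edema",
    "pneumonia with fever and cough",
    "right lower lobe pneumonia",
    "community acquired pneumonia treated with antibiotics",
    "pneumonia with productive cough",
    "aspiration pneumonia and fever",
    "bilateral pneumonia on chest imaging",
]
labels = ["428.0"] * 6 + ["486"] * 6

models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Linear SVC": LinearSVC(),
}


def make_model(clf):
    """Chain TF-IDF vectorization and a classifier so each fold is vectorized separately."""
    return make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"), clf)


# Mean cross-validated accuracy per model (3 stratified folds).
scores = {
    name: cross_val_score(make_model(clf), texts, labels, cv=3).mean()
    for name, clf in models.items()
}

# Out-of-fold predictions for the random forest, used for the detailed reports.
rf_pred = cross_val_predict(make_model(models["Random forest"]), texts, labels, cv=3)
cm = confusion_matrix(labels, rf_pred)
report = classification_report(labels, rf_pred)
```

Putting the vectorizer inside the pipeline means the TF-IDF vocabulary and weights are fitted only on each fold's training portion, so the held-out notes do not leak into the features.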
Result
xxx