Commit a43f6a1b authored by Chia Ying Chiu
parent b8aeda35
@@ -11,8 +11,6 @@ Data: Data are from MIMIC-III, Medical Information Mart for Intensive Care, data
To simulate the work of medical coders in hospitals, our goal is to build a model that predicts ICD-9 codes from free-form text. We first apply basic preprocessing with NLTK: lowercasing, stop-word removal, and tokenization. We then convert the texts to vectors using term frequency-inverse document frequency (TF-IDF) and pass them to four machine learning models: Multinomial Naive Bayes, logistic regression, random forest, and linear SVC. Lastly, the results were evaluated using cross-validation. The random forest model was examined further with a confusion matrix and a classification report.
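The preprocessing and vectorization steps described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the stop-word list, the function names `preprocess` and `tfidf_vectors`, and the three toy notes are all invented for the example (the project uses NLTK's stop-word list and MIMIC-III notes). The weighting shown is the textbook form, tf = count / document length and idf = ln(N / df); library vectorizers typically add smoothing and normalization, so their values will differ.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the project uses NLTK's).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "with", "was", "is", "in", "for", "on"}


def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with tf = count/len(doc) and idf = ln(N/df)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Invented toy notes for illustration only; not MIMIC-III data.
notes = [
    "Patient admitted with chest pain and shortness of breath.",
    "Chest x-ray was clear; patient discharged.",
    "History of diabetes, admitted for insulin adjustment.",
]
vocab, vectors = tfidf_vectors(notes)
```

Each row of `vectors` is one note, and each column is one vocabulary term. A term that appears in only one note (such as "diabetes") receives a higher weight than one shared across notes (such as "patient"), and these rows are what a classifier would take as input.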
1. TF-IDF vectorization is not suitable for clinical notes