"Article used the same data set MIMIC-III to evaluate the ICD9 code assignment of RNNs and CNNs. https://github.com/lsy3/clinical-notes-diagnosis-dl-nlp \n",
"\n",
"GitHub seems to provide code and cleaned data sets. \n",
"Using the resources from the GitHub project to assign ICD9 code using different multi-label text classification models \n",
"\n",
"Is it possible to optimize their models? \n",
"\n",
"Prediction models using the ICD9 codes with covariates (insurance type, gender*, ethnicity, marital status, admission type) to see what are the top ICD codes that are associated with prolonged length of stay. https://towardsdatascience.com/predicting-hospital-length-of-stay-at-time-of-admission-55dfdfe69598 \n",
"\n",
"Compare the prediction models of different multi-label text classification models, and see if the results are agreed across models \n",
"\n",
" \n",
"| Task | Assigned To | Deadline|\n",
" ------|-------------|----------\n",
"| Run the CNN and RNN models, refer to the GitHub link above| Zaid & Chia | 11/15|\n",
"|------|\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"For this project, our goal is create an NLP model to automatically assign ICD-9 encodings, given the clinical notes at each encounter)."
"full_dataset = pd.merge(diagnoses, note_events, on =[\"HADM_ID\", \"SUBJECT_ID\"])\n",
"full_dataset = full_dataset[:40000]\n",
"\n",
""
"print(full_dataset)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72a0901e",
"metadata": {},
"outputs": [],
"source": []
...
...
@@ -94,9 +116,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python [conda env:nlp2021]",
"language": "python",
"name": "python3"
"name": "conda-env-nlp2021-py"
},
"language_info": {
"codemirror_mode": {
...
...
@@ -108,9 +130,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
"nbformat_minor": 4
}
%% Cell type:markdown id:bc36f532 tags:
NLP project ideas:
An article used the same data set (MIMIC-III) to evaluate ICD-9 code assignment with RNNs and CNNs: https://github.com/lsy3/clinical-notes-diagnosis-dl-nlp
The GitHub repository appears to provide both code and cleaned data sets.
Use the resources from that repository to assign ICD-9 codes with different multi-label text classification models.
Is it possible to optimize their models?
Build prediction models that use the ICD-9 codes together with covariates (insurance type, gender*, ethnicity, marital status, admission type) to find which ICD-9 codes are most strongly associated with prolonged length of stay. https://towardsdatascience.com/predicting-hospital-length-of-stay-at-time-of-admission-55dfdfe69598
Compare the predictions of the different multi-label text classification models and check whether the results agree across models.
| Task | Assigned To | Deadline |
|------|-------------|----------|
| Run the CNN and RNN models (see the GitHub link above) | Zaid & Chia | 11/15 |
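
One possible way to check whether two multi-label models agree, sketched here under assumptions: compute the Jaccard overlap between the code sets that each model predicts for the same admission, then average over admissions. The function name `jaccard_agreement`, the dictionary layout keyed by `HADM_ID`, and the example predictions are all hypothetical and are not taken from the linked repository.

```python
def jaccard_agreement(preds_a, preds_b):
    """Mean Jaccard overlap of predicted code sets over shared admissions.

    preds_a, preds_b: dict mapping an admission ID to a set of ICD-9 codes
    (hypothetical layout chosen for this sketch).
    """
    shared = sorted(set(preds_a) & set(preds_b))
    if not shared:
        raise ValueError("the two models have no admissions in common")
    scores = []
    for hadm_id in shared:
        a, b = set(preds_a[hadm_id]), set(preds_b[hadm_id])
        union = a | b
        # Two empty predictions are treated as perfect agreement.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Made-up predictions for two admissions, for illustration only.
cnn_preds = {100001: {"4019", "25000"}, 100002: {"41401"}}
rnn_preds = {100001: {"4019"}, 100002: {"41401"}}
agreement = jaccard_agreement(cnn_preds, rnn_preds)
print(f"mean Jaccard agreement: {agreement:.2f}")
```

A score of 1.0 means both models predicted identical code sets for every shared admission. Agreement between models says nothing about correctness, so it complements rather than replaces a comparison against the recorded codes.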
%% Cell type:markdown id: tags:
For this project, our goal is to create an NLP model that automatically assigns ICD-9 codes from the clinical notes recorded at each encounter.
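
Since a single admission can carry several ICD-9 codes, the target for each admission is typically a multi-hot vector rather than a single class. The following is a minimal sketch of building such targets in plain Python. It assumes the input is a list of `(HADM_ID, ICD9_CODE)` pairs; the function name `build_multi_hot` and the example rows are hypothetical.

```python
from collections import defaultdict


def build_multi_hot(rows):
    """Turn (admission ID, ICD-9 code) pairs into multi-hot target vectors."""
    codes_by_admission = defaultdict(set)
    for hadm_id, code in rows:
        codes_by_admission[hadm_id].add(code)

    # Fixed, sorted code vocabulary so every vector uses the same column order.
    vocabulary = sorted({c for codes in codes_by_admission.values() for c in codes})
    index = {code: i for i, code in enumerate(vocabulary)}

    targets = {}
    for hadm_id, codes in codes_by_admission.items():
        vector = [0] * len(vocabulary)
        for code in codes:
            vector[index[code]] = 1
        targets[hadm_id] = vector
    return vocabulary, targets


# Made-up rows for illustration only.
rows = [(100001, "4019"), (100001, "25000"), (100002, "4019")]
vocabulary, targets = build_multi_hot(rows)
print(vocabulary)
print(targets)
```

In practice the vocabulary is often restricted to the most frequent codes, because rare codes have too few examples to learn from; that cutoff is a design choice left open here.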