-
Notifications
You must be signed in to change notification settings - Fork 9
Class Dynamic Feature -> Ts_extraction #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
martindaniel4
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Top, merci @clembac pour cette PR et ce premier ajout à la classe feattures 🚀
J'ai ajouté des remarques au sein de ton code, en gros:
- Il y a quelques dependancies qui me semble manquer (notamment
threshold.json) - Ca serait top si tu pouvais utiliser un linter python (on utilise les règles Flake8 pour construire les classes et ca catche les petites erreurs de syntax (lignes trop longues, indentation, etc..)
- Remarques sur la nomenclature
- Je pense que tu pourrais ajouter ton graph au sein de la classe visualization, prévue à cet effet.
| import json | ||
| import os | ||
| import pandas as pd | ||
| import seaborn as sns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ajouter au fichier requirements.txt pour être bien sûr qu'on ait toutes les libs au setup!
| def __init__(self, df): | ||
| self.df = df | ||
|
|
||
| def f_treshold(self, treshold_json): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Où se trouve le fichier threshold_json ? Je l'ajouterais au fichier config.py cf ici
|
|
||
| for i in treshold_json.keys(): | ||
|
|
||
| tjson = treshold_json[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cela suppose qu'on ait des valeurs de threshold pour toutes les colonnes du DataFrame df non? Est ce que le script plante s'il n'y a pas de valeurs correspondante trouvée?
| self.df[i + "_abnormal_tresh"] = np.where(cond1 | cond2, 1, 0) | ||
| self.df[i + "_clean_tresh"] = np.where((serie >= tjson["etendue"][0]) & (serie <= tjson["etendue"][1] ), 1, 0) | ||
| self.df[i + "_normal_tresh"] = np.where((serie >= tjson["normalite"][0]) & (serie <= tjson["normalite"][1] ), 1, 0) | ||
| self.df[i + "_dirty_tresh"] = np.where((serie <= tjson["etendue"][0]) & ( serie >= tjson["etendue"][1] ), 1, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Je ne trouve pas les prefix dirty vs clean très clair. Pourquoi ne pas simplement détecter si les valeurs sont normal ou abnormal (Sachant que ce sera plutôt cela qu'on regardera) ?
Aussi en terme d'output, là tu renvoies quatre colonnes par variable ce qui me parait beaucoup. Pourquoi ne pas renvoyer une nouvelle colonne avec un booléen, e.g:
etco2_threshold: avec1ou0où 1 désigne une anormalité et 0 normalité
| cond2 = (serie <= tjson["etendue"][1] ) & (serie > tjson["normalite"][1]) | ||
|
|
||
| self.df[i + "_abnormal_tresh"] = np.where(cond1 | cond2, 1, 0) | ||
| self.df[i + "_clean_tresh"] = np.where((serie >= tjson["etendue"][0]) & (serie <= tjson["etendue"][1] ), 1, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: > 80 char
| cond1 = (y <= treshold_json[var]["normalite"][0]) | ||
| cond2 = (y >= treshold_json[var]["normalite"][1]) | ||
|
|
||
| ax.fill_between(x, y, treshold_json[var]["normalite"][0], where = cond1, facecolor='red', interpolate=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: > 80 char
| self.df[i + "_normal_tresh"] = np.where((serie >= tjson["normalite"][0]) & (serie <= tjson["normalite"][1] ), 1, 0) | ||
| self.df[i + "_dirty_tresh"] = np.where((serie <= tjson["etendue"][0]) & ( serie >= tjson["etendue"][1] ), 1, 0) | ||
|
|
||
| def plot_tresh(self, treshold_json ,id_patient = 100, var = "Pmean", title = '') : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Est ce qu'on ne mettrait pas ça plutôt dans la classe visualization ? https://github.com/dataforgoodfr/batch_5_transplant/blob/master/transplant/visualization/graph.py
On pourrait imaginer un paramètre lorsque tu plot une timeseries avec une variable qui renverrait les limites. Tu pourrais aussi créer une autre fonction pour plotter les aires au-dessus / en-dessous des variables anormales? Qu'en penses tu ?
In this PR :
-> ts_extraction.f_treshold : output a df with new features based on treshold provided in a json file
-> ts_extraction.plot_tresh : plot time serie filled with treshold provided in the json file