Add Malicious Webpage Detection Example #976
base: develop
Add Malicious Webpage Detection Example by PaddleNLP
TCChenlong
left a comment
A few places have issues; I've left comments on them. Please fix those, thanks~
| "source": [ | ||
| "# 使用LSTM的恶意网页识别\n", | ||
| "\n", | ||
| "**作者:** [PaddlePaddle](https://github.com/PaddlePaddle) <br>\n", |
For the author field here, please put your own GitHub username and link. Thanks to everyone for contributing~
| "source": [ | ||
| "## 三、网络搭建\n", | ||
| "\n", | ||
| "### 3.1 构造dataloder\n", |
dataloder -> DataLoader
| "import paddlenlp\n", | ||
| "import paddle.nn as nn\n", | ||
| "import paddle.nn.functional as F\n", | ||
| "import paddlenlp as ppnlp\n", |
This alias isn't recommended; just use `paddlenlp` directly~
So I should just delete line 72?
TCChenlong
left a comment
LGTM
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "!pip install lxml -i https://mirror.baidu.com/pypi/simple/\r\n", |
If lxml and html5lib are not used later in the notebook, they should be removed.
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "class SelfDefinedDataset(paddle.io.Dataset):\n", |
PaddleNLP offers several ways to define a custom dataset; see: https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_self_defined.html
That said, the custom definition used here is fine too~
| "然后接一个线性变换层,完成二分类任务。\n", | ||
| "\n", | ||
| "- `paddle.nn.Embedding`组建word-embedding层\n", | ||
| "- `ppnlp.seq2vec.LSTMEncoder`组建句子建模层\n", |
This needs to be changed here as well: ppnlp -> paddlenlp
| " padding_idx=padding_idx)\n", | ||
| "\n", | ||
| " # 将word embedding经过LSTMEncoder变换到文本语义表征空间中\n", | ||
| " self.lstm_encoder = ppnlp.seq2vec.LSTMEncoder(\n", |
This needs to be changed here as well: ppnlp -> paddlenlp
| "# 提取全部被黑页面样本\r\n", | ||
| "d_page = tempdf[tempdf['flag']=='d']\r\n", | ||
| "# 合并样本\r\n", | ||
| "train_page = pd.concat([n_page,d_page],axis=0)\r\n", |
The merge is done twice here; wouldn't merging once be enough?
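For illustration, a minimal sketch of what a single merge could look like. The toy `tempdf` below is a made-up stand-in for the notebook's DataFrame, and the `'n'` flag value for normal pages is an assumption (only `'d'` appears in the quoted diff); `ignore_index=True` is an optional extra, not something the notebook uses.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `tempdf`. Only the 'flag' column
# and the 'd' value come from the quoted diff; 'n' and 'p' are assumed here.
tempdf = pd.DataFrame({
    "text": ["page-a", "page-b", "page-c", "page-d"],
    "flag": ["n", "d", "n", "p"],
})

# Select the normal pages (assumed flag 'n') and the hacked pages (flag 'd').
n_page = tempdf[tempdf["flag"] == "n"]
d_page = tempdf[tempdf["flag"] == "d"]

# A single concat builds the training set; a second concat over the same
# pieces would only duplicate rows or repeat the work.
train_page = pd.concat([n_page, d_page], axis=0, ignore_index=True)
```

Since `pd.concat` accepts the whole list of frames at once, one call covers both subsets.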