Emotion classification with a fine-tuned deberta-v3-large

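We manually feed in some sentences to test how accurate our fine-tuned emotion classification model is. We also show how to get the predicted label indices and how to use the softmax function to compute the corresponding probabilities.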

若石之上

Published 2023-09-06 19:50:03

Preface:

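Yesterday we covered how to fine-tune the deberta-v3-large model on the emotion dataset. Today we feed the model some data to see how accurate it is. For convenience, I simply use the first ten examples of the test set.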

Code:

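from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import numpy

# Tokenizer of the base model ("deberta-v3-large" is presumably a local directory;
# on the Hugging Face Hub the base model is "microsoft/deberta-v3-large")
tokenizer = AutoTokenizer.from_pretrained("deberta-v3-large")
# Fine-tuned weights saved during training; the emotion dataset has 6 labels
model = AutoModelForSequenceClassification.from_pretrained("result/checkpoint-500", num_labels=6)

# The first ten sentences of the emotion test set
raw_inputs = [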
    "im feeling rather rotten so im not very ambitious right now",
    "im updating my blog because i feel shitty",
    "i never make her separate from me because i don t ever want her to feel like i m ashamed with her",
    "i left with my bouquet of red and yellow tulips under my arm feeling slightly more optimistic than when i arrived",
    "i was feeling a little vain when i did this one",
    "i cant walk into a shop anywhere where i do not feel uncomfortable",
    "i felt anger when at the end of a telephone call",
    "i explain why i clung to a relationship with a boy who was in many ways immature and uncommitted despite the excitement i should have been feeling for getting accepted into the masters program at the university of virginia",
    "i like to have the same breathless feeling as a reader eager to see what will happen next",
    "i jest i feel grumpy tired and pre menstrual which i probably am but then again its only been a week and im about as fit as a walrus on vacation for the summer"
]
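# Pad/truncate the batch to a common length and return PyTorch tensors
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)
# Index of the highest logit for each sentence = predicted label id
print(outputs.logits.argmax(-1).numpy())

# Turn the logits into probabilities (each row sums to 1)
output_tensor = torch.softmax(outputs.logits, dim=1)

# Print in fixed-point notation instead of scientific notation
numpy.set_printoptions(suppress=True, precision=15)
print(output_tensor.detach().numpy())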

Ground-truth labels:

[0 0 0 1 0 4 3 1 1 3]

Model output (predicted label ids, then softmax probabilities):

[0 0 0 1 0 4 4 2 1 3]
[[0.99185866    0.0011510316  0.00038844926 0.0026896652  0.0029623401
  0.00094986777]
 [0.9918577     0.0011512033  0.00038886679 0.0026923663  0.0029585315
  0.000951257  ]
 [0.99185807    0.0011446937  0.00038163515 0.0026456509  0.0030354485
  0.00093440723]
 [0.00041773843 0.9972398     0.0014854104  0.0002909223  0.00036231524
  0.00020376328]
 [0.99185014    0.0011451623  0.00038086114 0.0026396883  0.0030524035
  0.00093187904]
 [0.015044774   0.0025362356  0.00041989447 0.015223678   0.95009714
  0.016678285  ]
 [0.11319714    0.030935207   0.007336047   0.3035547     0.47545433
  0.069522515  ]
 [0.0011094044  0.18334262    0.8081213     0.0011003793  0.0007297965
  0.005596481  ]
 [0.0004444314  0.9972433     0.0014491597  0.00028465112 0.00037411976
  0.00020446534]
 [0.00241266    0.00079152075 0.00092184055 0.9924028     0.0024109248
  0.0010602956 ]]
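To double-check the printed output, here is a small NumPy sketch. It takes the ten probability rows shown above, verifies that each row sums to 1, and confirms that the row-wise argmax gives back the predicted ids on the first line. Reading the largest probability in a row as a "confidence" score is my own addition for illustration. It shows that the 7th sentence, one of the two misclassified examples, is also the one the model is least sure about: label 4 at 0.475 versus the correct label 3 at 0.304.

```python
import numpy as np

# Softmax probabilities copied from the output above
# (rows = sentences, columns = label ids 0..5)
probs = np.array([
    [0.99185866, 0.0011510316, 0.00038844926, 0.0026896652, 0.0029623401, 0.00094986777],
    [0.9918577, 0.0011512033, 0.00038886679, 0.0026923663, 0.0029585315, 0.000951257],
    [0.99185807, 0.0011446937, 0.00038163515, 0.0026456509, 0.0030354485, 0.00093440723],
    [0.00041773843, 0.9972398, 0.0014854104, 0.0002909223, 0.00036231524, 0.00020376328],
    [0.99185014, 0.0011451623, 0.00038086114, 0.0026396883, 0.0030524035, 0.00093187904],
    [0.015044774, 0.0025362356, 0.00041989447, 0.015223678, 0.95009714, 0.016678285],
    [0.11319714, 0.030935207, 0.007336047, 0.3035547, 0.47545433, 0.069522515],
    [0.0011094044, 0.18334262, 0.8081213, 0.0011003793, 0.0007297965, 0.005596481],
    [0.0004444314, 0.9972433, 0.0014491597, 0.00028465112, 0.00037411976, 0.00020446534],
    [0.00241266, 0.00079152075, 0.00092184055, 0.9924028, 0.0024109248, 0.0010602956],
])

# Each row is a probability distribution, so it sums to 1 (up to float32 rounding)
assert np.allclose(probs.sum(axis=1), 1.0, atol=1e-5)

# The argmax of each row reproduces the predicted label ids printed first
pred_ids = probs.argmax(axis=1)
print(pred_ids)

# The largest probability per row, read as the model's confidence
confidence = probs.max(axis=1)
least_confident = int(confidence.argmin())
print(f"least confident: sentence #{least_confident + 1} ({confidence[least_confident]:.3f})")
```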

Comparison:

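Apart from the 7th and 8th examples, the other eight are all classified correctly, an accuracy of 8/10. The 7th is labeled 3 but predicted as 4. The 8th is labeled 1 but predicted as 2.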
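The pure-Python sketch below turns this comparison into a number and lists which emotions were confused. It assumes the label ids follow the emotion dataset's order (0 sadness, 1 joy, 2 love, 3 anger, 4 fear, 5 surprise). That holds if the model was fine-tuned directly on the dataset's integer labels. LABEL_NAMES, accuracy and mismatches are helper names introduced here for illustration.

```python
# Label order of the emotion dataset (assumed to match the fine-tuned model's ids)
LABEL_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]

labels = [0, 0, 0, 1, 0, 4, 3, 1, 1, 3]       # ground truth from the test set
predictions = [0, 0, 0, 1, 0, 4, 4, 2, 1, 3]  # model output


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mismatches(y_true, y_pred):
    """Return (1-based position, true label name, predicted label name) for each error."""
    return [
        (i, LABEL_NAMES[t], LABEL_NAMES[p])
        for i, (t, p) in enumerate(zip(y_true, y_pred), start=1)
        if t != p
    ]


print(f"accuracy: {accuracy(labels, predictions):.0%}")
for pos, true_name, pred_name in mismatches(labels, predictions):
    print(f"#{pos}: labeled {true_name}, predicted {pred_name}")
```

Under this mapping, the two errors are an anger sentence predicted as fear and a joy sentence predicted as love. Both are emotionally close pairs.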

Code explanation:

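1. raw_inputs: the user's input data. Instead of hard-coding it, you can use a while loop with input() to interact with the user. It is simplest to keep raw_inputs a list even when the user types only one sentence. The tokenizer then returns a batch, and argmax and softmax produce one result per sentence. (The tokenizer also accepts a single string, which it treats as a batch of one.)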
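Here is a minimal sketch of the interactive while loop mentioned in item 1. collect_inputs is a hypothetical helper. The reading function is a parameter (defaulting to input) only so that the loop can be tested without a keyboard.

```python
def collect_inputs(read=input, prompt="Enter a sentence (empty line to finish): "):
    """Read sentences one at a time until an empty line or end of input.

    Always returns a list, so a single sentence still becomes a batch of one.
    """
    sentences = []
    while True:
        try:
            text = read(prompt).strip()
        except EOFError:  # e.g. Ctrl-D, or the end of piped input
            break
        if not text:  # an empty line ends the input
            break
        sentences.append(text)
    return sentences
```

In the script above, `raw_inputs = collect_inputs()` would replace the hard-coded list and the rest of the code stays the same. Check that the list is not empty before calling the tokenizer.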

2. return_tensors="pt": makes the tokenizer return PyTorch tensors ("np" would return NumPy arrays and "tf" TensorFlow tensors).

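3. argmax(-1): the logits are a float tensor of shape (batch_size, num_labels). Taking the argmax along the last axis (axis -1) returns, for each sentence, the index of the highest score, which is the predicted label id.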
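argmax(-1) behaves the same way on a NumPy array as on a PyTorch tensor, so the small NumPy example below illustrates it. The logits are made-up numbers for two sentences and six labels, not real model output.

```python
import numpy as np

# Hypothetical logits for a batch of 2 sentences and 6 labels (made-up numbers)
logits = np.array([
    [4.1, -1.2, -2.3, 0.3, 0.5, -0.9],
    [-0.7, 0.2, -1.5, 1.1, 1.4, -0.3],
])
print(logits.shape)  # (batch_size, num_labels)

# argmax over the last axis: one label index per sentence
pred_ids = logits.argmax(-1)
print(pred_ids)

# For a 2-D array, axis -1 and axis 1 are the same axis
assert (pred_ids == logits.argmax(axis=1)).all()
```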

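4. softmax(outputs.logits, dim=1): dim specifies the dimension along which softmax is computed. Because the logits have shape (batch_size, 6), dim=1 normalizes each row, turning each sentence's six scores into probabilities that sum to 1. For a 2-D tensor, dim=-1 (the last dimension) is equivalent. Always pass dim explicitly rather than relying on a default.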
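The NumPy reimplementation below shows what softmax(..., dim=1) computes. It is an illustration, not the PyTorch code itself: each score is exponentiated and divided by the sum over the chosen axis. Subtracting the row maximum first is the usual trick to avoid overflow in exp and does not change the result. The logits are again made-up numbers.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# Hypothetical logits for 2 sentences and 6 labels
logits = np.array([
    [4.1, -1.2, -2.3, 0.3, 0.5, -0.9],
    [-0.7, 0.2, -1.5, 1.1, 1.4, -0.3],
])

probs = softmax(logits, axis=1)  # normalize each row, i.e. each sentence
print(probs.sum(axis=1))         # every row sums to 1

# softmax is monotonic, so it never changes which label scores highest
assert (probs.argmax(axis=1) == logits.argmax(axis=1)).all()
# for a 2-D array, axis=1 and axis=-1 give identical results
assert np.allclose(probs, softmax(logits, axis=-1))

# Normalizing along axis=0 would mix different sentences: the columns sum to 1, not the rows
print(softmax(logits, axis=0).sum(axis=0))
```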

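5. numpy.set_printoptions(suppress=True, precision=15): sets NumPy's global print options to adjust the output format. suppress=True turns off scientific notation for small numbers, and precision sets the maximum number of digits printed after the decimal point. Because the softmax output is float32, NumPy prints only as many digits as are needed to represent each value uniquely. That is why the output above shows about 8 significant digits rather than 15.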
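The effect of suppress is easiest to see on an array that mixes a very small and a large value, like the probability rows above. This NumPy sketch uses the np.printoptions context manager, which applies the same options only inside a with block. By contrast, numpy.set_printoptions in the article changes them for the rest of the program.

```python
import numpy as np

# One tiny and one large value, similar to a row of the softmax output
probs = np.array([0.00001, 0.99999])

# Default options: the tiny value makes NumPy switch to scientific notation
default_str = str(probs)

# Same options as in the article, but only for the duration of the with block
with np.printoptions(suppress=True, precision=15):
    fixed_str = str(probs)

print(default_str)  # scientific notation (values with an e exponent)
print(fixed_str)    # fixed-point notation
```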
