DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. It outperforms BERT and RoBERTa on the majority of NLU tasks with 80GB of training data.
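Disentangled attention can be summarized with the single-head formulation from the DeBERTa paper (cited below). This is a condensed restatement, not text from the original card. Each token has a content vector (rows of H), and relative positions use a shared embedding table P. Here k is the maximum relative distance and d is the hidden size.

$$
Q_c = H W_{q,c},\quad K_c = H W_{k,c},\quad V_c = H W_{v,c},\quad Q_r = P W_{q,r},\quad K_r = P W_{k,r}
$$

$$
\tilde{A}_{i,j} = \underbrace{Q^c_i \, {K^c_j}^{\top}}_{\text{content-to-content}}
+ \underbrace{Q^c_i \, {K^r_{\delta(i,j)}}^{\top}}_{\text{content-to-position}}
+ \underbrace{K^c_j \, {Q^r_{\delta(j,i)}}^{\top}}_{\text{position-to-content}}
$$

$$
\delta(i,j) = \begin{cases} 0 & i - j \le -k \\ 2k - 1 & i - j \ge k \\ i - j + k & \text{otherwise} \end{cases}
\qquad
H_o = \operatorname{softmax}\!\left(\frac{\tilde{A}}{\sqrt{3d}}\right) V_c
$$

The scaling uses 1/√(3d) rather than 1/√d because three score terms are summed. The enhanced mask decoder is a separate component. It adds absolute position information after the Transformer layers, just before the masked-token prediction layer used in pretraining.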
Please check the official repository (https://github.com/microsoft/DeBERTa) for more details and updates.
This is the DeBERTa XLarge model (750M parameters) fine-tuned on the MNLI task.
Fine-tuning on NLU tasks
We present the dev set results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
| Model | SQuAD 1.1 | SQuAD 2.0 | MNLI-m/mm | SST-2 | QNLI | CoLA | RTE | MRPC | QQP | STS-B |
|---|---|---|---|---|---|---|---|---|---|---|
| | F1/EM | F1/EM | Acc | Acc | Acc | MCC | Acc | Acc/F1 | Acc/F1 | Pearson/Spearman |
| BERT-Large | 90.9/84.1 | 81.8/79.0 | 86.6/- | 93.2 | 92.3 | 60.6 | 70.4 | 88.0/- | 91.3/- | 90.0/- |
| RoBERTa-Large | 94.6/88.9 | 89.4/86.5 | 90.2/- | 96.4 | 93.9 | 68.0 | 86.6 | 90.9/- | 92.2/- | 92.4/- |
| XLNet-Large | 95.1/89.7 | 90.6/87.9 | 90.8/- | 97.0 | 94.9 | 69.0 | 85.9 | 90.8/- | 92.3/- | 92.5/- |
| DeBERTa-Large¹ | 95.5/90.1 | 90.7/88.0 | 91.3/91.1 | 96.5 | 95.3 | 69.5 | 91.0 | 92.6/94.6 | 92.3/- | 92.8/92.5 |
| DeBERTa-XLarge¹ | -/- | -/- | 91.5/91.2 | 97.0 | - | - | 93.1 | 92.1/94.3 | - | 92.9/92.7 |
| DeBERTa-V2-XLarge¹ | 95.8/90.8 | 91.4/88.9 | 91.7/91.6 | 97.5 | 95.8 | 71.1 | 93.9 | 92.0/94.2 | 92.3/89.8 | 92.9/92.9 |
| DeBERTa-V2-XXLarge¹,² | 96.1/91.4 | 92.2/89.7 | 91.7/91.9 | 97.2 | 96.0 | 72.0 | 93.5 | 93.1/94.9 | 92.7/90.3 | 93.2/93.1 |
Notes.
- ¹ Following RoBERTa, for RTE, MRPC, and STS-B we fine-tune starting from DeBERTa-Large-MNLI, DeBERTa-XLarge-MNLI, DeBERTa-V2-XLarge-MNLI, and DeBERTa-V2-XXLarge-MNLI. Results on SST-2, QQP, QNLI, and SQuAD v2 also improve slightly when starting from the MNLI fine-tuned models. However, for those four tasks we report only the numbers obtained by fine-tuning from the pretrained base models.
- ² To try the XXLarge model with HF Transformers, you need to specify --sharded_ddp:
cd transformers/examples/text-classification/
export TASK_NAME=mrpc
# One process per GPU (8 GPUs) with 4 examples each: global batch size = 8 x 4 = 32.
# --sharded_ddp enables FairScale sharded DDP (fairscale must be installed).
python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-v2-xxlarge \
--task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
--learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp --fp16
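Note 1 and the command above involve two small decisions. One is which checkpoint to start fine-tuning from. The other is the effective batch size produced by the launch settings. The sketch below encodes both as shell functions.

These functions are hypothetical helpers. They are not part of the DeBERTa repository or HF Transformers. The checkpoint mapping assumes the Hugging Face Hub naming `microsoft/<model>` and `microsoft/<model>-mnli`, and the task names follow run_glue.py's GLUE keys (rte, mrpc, stsb, sst2, ...).

```shell
# Hypothetical helper: pick the starting checkpoint per note 1.
# RTE, MRPC and STS-B start from the MNLI fine-tuned model; other tasks
# start from the pretrained base model. Assumes microsoft/<base>[-mnli] naming.
init_checkpoint() {
  local task="$1" base="$2"
  case "$task" in
    rte|mrpc|stsb) echo "microsoft/${base}-mnli" ;;
    *)             echo "microsoft/${base}" ;;
  esac
}

# Hypothetical helper: global batch size =
# processes x per-device batch x gradient accumulation steps (default 1).
global_batch_size() {
  local nproc="$1" per_device="$2" accum="${3:-1}"
  echo $(( nproc * per_device * accum ))
}

# Hypothetical helper: optimizer steps per epoch, rounding up so the
# last partial batch is counted.
steps_per_epoch() {
  local num_examples="$1" batch="$2"
  echo $(( (num_examples + batch - 1) / batch ))
}
```

For example, `init_checkpoint mrpc deberta-v2-xxlarge` prints `microsoft/deberta-v2-xxlarge-mnli`. With MRPC's 3,668 training pairs and the global batch size of 32 from the command above, `steps_per_epoch 3668 32` prints 115.

An MNLI checkpoint has a three-way classification head. Starting an MRPC or STS-B run from it therefore means the head must be replaced with a two-label or regression head. How run_glue.py handles that size mismatch depends on your Transformers version.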
Citation
If you find DeBERTa useful for your work, please cite the following paper:
@inproceedings{
he2021deberta,
title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}
}

