dpr-question_encoder-fr_qa-camembert

Description

French DPR model using CamemBERT as base and then fine-tuned on a combination of three French Q&A datasets.

Data

French Q&A

We use a combination of three French Q&A datasets:

PIAF v1.1
FQuAD v1.0
SQuAD-FR (SQuAD automatically translated to French)

Training

We use 90 562 random questions for training and 22 391 for dev. No question in train appears in dev. For each question, we have a single positive_context (the paragraph where the answer to this question is found) and around 30 hard_negative_contexts. Hard negative contexts are found by querying an Elasticsearch instance (BM25 retrieval) and keeping the top-k candidates that do not contain the answer.

The files are over here.
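The hard-negative selection described above can be sketched as follows. This is a minimal illustration, not the authors' script: the helper names `select_hard_negatives` and `build_example` are hypothetical, the ranked list stands in for the BM25 results returned by Elasticsearch, and the output keys assume the JSON layout commonly used by DPR training files.

```python
def select_hard_negatives(ranked_passages, answer, k=30):
    """Keep the first k ranked passages whose text does not contain the answer."""
    answer_lc = answer.lower()
    negatives = []
    for passage in ranked_passages:
        if answer_lc not in passage["text"].lower():
            negatives.append(passage)
        if len(negatives) == k:
            break
    return negatives


def build_example(question, answer, positive, ranked_passages, k=30):
    """Assemble one training example (key names are an assumption)."""
    return {
        "question": question,
        "answers": [answer],
        "positive_ctxs": [positive],
        "hard_negative_ctxs": select_hard_negatives(ranked_passages, answer, k),
    }


# Toy data standing in for BM25 results, ordered from best to worst match.
positive = {"title": "Paris", "text": "Paris est la capitale de la France."}
ranked = [
    positive,
    {"title": "France", "text": "La France est un pays d'Europe de l'Ouest."},
    {"title": "Lyon", "text": "Lyon est une grande ville française."},
    {"title": "Capitale", "text": "La capitale française, Paris, est très peuplée."},
]

example = build_example(
    "Quelle est la capitale de la France ?", "Paris", positive, ranked, k=2
)
negative_titles = [ctx["title"] for ctx in example["hard_negative_ctxs"]]
print(negative_titles)
```

Filtering on the answer string also drops the positive paragraph itself, since it contains the answer by construction.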

Evaluation

We use the FQuAD v1.0 and French-SQuAD evaluation sets.

Training Script

We use the official Facebook DPR implementation with a slight modification: by default, the code can work with RoBERTa models, but we changed a single line to make it easier to work with CamemBERT. This modification can be found over here.

Hyperparameters

python -m torch.distributed.launch --nproc_per_node=8 train_dense_encoder.py \
--max_grad_norm 2.0 --encoder_model_type hf_bert --pretrained_file data/bert-base-multilingual-uncased \
--seed 12345 --sequence_length 256 --warmup_steps 1237 --batch_size 16 --do_lower_case \
--train_file DPR_FR_train.json \
--dev_file  ./data/100_hard_neg_ctxs/DPR_FR_dev.json \
--output_dir ./output/bert --learning_rate 2e-05 --num_train_epochs 35 \
--dev_batch_size 16 --val_av_rank_start_epoch 25 \
--pretrained_model_cfg ./data/bert-base-multilingual-uncased

Evaluation results

We obtain the following results on the FQuAD and SQuAD-FR evaluation (or validation) sets. To obtain these results, we use haystack’s evaluation script (we report retrieval results only).

DPR

FQuAD v1.0 Evaluation

For 2764 out of 3184 questions (86.81%), the answer was in the top-20 candidate passages selected by the retriever.
Retriever Recall: 0.87
Retriever Mean Avg Precision: 0.57

SQuAD-FR Evaluation

For 8945 out of 10018 questions (89.29%), the answer was in the top-20 candidate passages selected by the retriever.
Retriever Recall: 0.89
Retriever Mean Avg Precision: 0.63

BM25

For reference, BM25 gets the results shown below. As in the original paper, on SQuAD-like datasets BM25 consistently outperforms DPR.

FQuAD v1.0 Evaluation

For 2966 out of 3184 questions (93.15%), the answer was in the top-20 candidate passages selected by the retriever.
Retriever Recall: 0.93
Retriever Mean Avg Precision: 0.74

SQuAD-FR Evaluation

For 9353 out of 10018 questions (93.36%), the answer was in the top-20 candidate passages selected by the retriever.
Retriever Recall: 0.93
Retriever Mean Avg Precision: 0.77
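To make the two reported metrics concrete, here is a minimal sketch of how top-k recall and mean average precision can be computed from per-question relevance flags. It is not haystack's evaluation code: the function names and the toy data are hypothetical, and average precision is normalized here by the number of relevant passages retrieved, which is one of several possible conventions.

```python
def recall_at_k(ranked_hits, k=20):
    """Fraction of questions with at least one relevant passage in the top k."""
    found = sum(1 for hits in ranked_hits if any(hits[:k]))
    return found / len(ranked_hits)


def mean_average_precision(ranked_hits, k=20):
    """Mean over questions of the average precision within the top k."""
    total = 0.0
    for hits in ranked_hits:
        relevant_seen = 0
        precision_sum = 0.0
        for rank, is_relevant in enumerate(hits[:k], start=1):
            if is_relevant:
                relevant_seen += 1
                precision_sum += relevant_seen / rank
        total += precision_sum / relevant_seen if relevant_seen else 0.0
    return total / len(ranked_hits)


# One list per question; True marks a retrieved passage containing the answer.
toy_hits = [
    [True, False, False],   # relevant passage at rank 1
    [False, True, False],   # relevant passage at rank 2
    [False, False, False],  # answer not retrieved
]
recall = recall_at_k(toy_hits, k=3)
map_score = mean_average_precision(toy_hits, k=3)
print(recall, map_score)

# The recall figures above are plain ratios, e.g. for DPR on FQuAD v1.0:
dpr_fquad_recall = 2764 / 3184
print(round(dpr_fquad_recall, 4))
```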

Usage

The results reported here are obtained with the haystack library. To obtain similar embeddings using only the HF Transformers library, you can do the following:

from transformers import AutoTokenizer, AutoModel

model_name = "etalab-ia/dpr-question_encoder-fr_qa-camembert"
query = "Salut, mon chien est-il mignon ?"

# Load the tokenizer and the question encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name, do_lower_case=True)
model = AutoModel.from_pretrained(model_name, return_dict=True)

# Encode the query and use the pooled output as its embedding
input_ids = tokenizer(query, return_tensors="pt")["input_ids"]
embeddings = model(input_ids).pooler_output
print(embeddings)
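Once question and passage embeddings are available, DPR-style retrieval ranks passages by the dot product between the two. The sketch below shows that ranking step on toy vectors in plain Python. The function names and the numbers are hypothetical, and real embeddings would be tensors produced by the question encoder and the matching context encoder.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_embedding, passage_embeddings, top_k=20):
    """Return (passage_index, score) pairs sorted by decreasing dot product."""
    scored = [
        (idx, dot(query_embedding, emb))
        for idx, emb in enumerate(passage_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings; real ones are much larger.
query_embedding = [1.0, 0.0, 2.0]
passage_embeddings = [
    [0.5, 1.0, 0.0],  # score 0.5
    [1.0, 0.0, 1.0],  # score 3.0
    [0.0, 2.0, 0.5],  # score 1.0
]

ranking = rank_passages(query_embedding, passage_embeddings, top_k=3)
print(ranking)
```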

And with haystack, we use it as a retriever:

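# Assumes `document_store` and `dpr_model_tag` are already defined and that
# DensePassageRetriever has been imported from haystack (the import path
# depends on the haystack version).
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="etalab-ia/dpr-question_encoder-fr_qa-camembert",
    passage_embedding_model="etalab-ia/dpr-ctx_encoder-fr_qa-camembert",
    model_version=dpr_model_tag,
    infer_tokenizer_classes=True,
)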

Acknowledgments

This work was performed using HPC resources from GENCI–IDRIS (Grant 2020-AD011011224).

Citations

Datasets

PIAF

@inproceedings{KeraronLBAMSSS20,
  author    = {Rachel Keraron and
               Guillaume Lancrenon and
               Mathilde Bras and
               Fr{\'{e}}d{\'{e}}ric Allary and
               Gilles Moyse and
               Thomas Scialom and
               Edmundo{-}Pavel Soriano{-}Morales and
               Jacopo Staiano},
  title     = {Project {PIAF:} Building a Native French Question-Answering Dataset},
  booktitle = {{LREC}},
  pages     = {5481--5490},
  publisher = {European Language Resources Association},
  year      = {2020}
}

FQuAD

@article{dHoffschmidt2020FQuADFQ,
  title={FQuAD: French Question Answering Dataset},
  author={Martin d'Hoffschmidt and Maxime Vidal and Wacim Belblidia and Tom Brendl{\'e} and Quentin Heinrich},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.06071}
}

SQuAD-FR

 @MISC{kabbadj2018,
   author =       "Kabbadj, Ali",
   title =        "Something new in French Text Mining and Information Extraction (Universal Chatbot): Largest Q&A French training dataset (110 000+) ",
   editor =       "linkedin.com",
   month =        "November",
   year =         "2018",
   url =          "\url{https://www.linkedin.com/pulse/something-new-french-text-mining-information-chatbot-largest-kabbadj/}",
   note =         "[Online; posted 11-November-2018]",
 }

Models

CamemBERT

HF model card: https://huggingface.co/camembert-base

@inproceedings{martin2020camembert,
  title={CamemBERT: a Tasty French Language Model},
  author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}

DPR

@misc{karpukhin2020dense,
    title={Dense Passage Retrieval for Open-Domain Question Answering},
    author={Vladimir Karpukhin and Barlas Oğuz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen-tau Yih},
    year={2020},
    eprint={2004.04906},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
