RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
I am training on Korquad data with the RoBERTa model supported by transformers. To handle Korean, I modified the source to use XLM-RoBERTa, which supports multilingual text. After making the changes and running run_squad.py, the following error occurred:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [6,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [6,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [6,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/content/drive/My Drive/models/transformers/examples/run_squad.py", line 858, in <module>
    main()
  File "/content/drive/My Drive/models/transformers/examples/run_squad.py", line 797, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "/content/drive/My Drive/models/transformers/examples/run_squad.py", line 231, in train
    outputs = model(**inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_roberta.py", line 677, in forward
    inputs_embeds=inputs_embeds,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 806, in forward
    encoder_attention_mask=encoder_extended_attention_mask,
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 423, in forward
    hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 384, in forward
    self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 330, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 232, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1372, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
Looking at the error message, something already goes wrong right at the start: the `srcIndex < srcSelectDimSize` assertions fire before the CUBLAS error does.
Unit tests had not shown any obvious problem, so once again this took several rounds of trial and error.
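For errors like this, the CUDA assert is reported asynchronously, so the Python traceback can point at an unrelated spot (here, the cuBLAS call inside nn.Linear). A trick that would have saved some of that digging is to force synchronous kernel launches, or to run one batch on CPU, so the real failure surfaces as an ordinary IndexError. A rough sketch:

# Force synchronous CUDA launches so the traceback points at the op that actually fails.
# This must be set before the first CUDA call (or on the command line:
# CUDA_LAUNCH_BLOCKING=1 python run_squad.py ...).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, move the failing batch to CPU; the embedding lookup then raises
# a plain "index out of range" IndexError instead of a device-side assert, e.g.:
#   model.cpu()
#   outputs = model(**{k: v.cpu() for k, v in inputs.items()})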
While going back over everything I had changed, I noticed that config.json was still the RoBERTa one.
Early on, when I was testing loading from a path instead of a model_type, I had used RoBERTa.
After changing this config file to the XLMRobertaConfig one, I confirmed the run completed normally.
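The point is that the config, tokenizer, and model all have to come from the same XLM-RoBERTa family. As a rough sketch of the idea (using the public xlm-roberta-base checkpoint name as an example rather than my local files; the actual run wires these up through run_squad.py's arguments):

from transformers import (
    XLMRobertaConfig,
    XLMRobertaTokenizer,
    XLMRobertaForQuestionAnswering,  # missing in some older transformers releases;
                                     # subclassing RobertaForQuestionAnswering works the same way
)

model_name = "xlm-roberta-base"  # example checkpoint, not my local path
config = XLMRobertaConfig.from_pretrained(model_name)
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = XLMRobertaForQuestionAnswering.from_pretrained(model_name, config=config)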
Comparing the two, the vocab_size values were different. The problem seems to have been the input size going into the model: the XLM-RoBERTa tokenizer produces token IDs larger than the 50,265-entry embedding table defined by the RoBERTa config, which is exactly what the `srcIndex < srcSelectDimSize` assertion is complaining about.
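You can see the mismatch directly by comparing the token IDs the XLM-RoBERTa tokenizer actually produces with the embedding size each config declares. A small sanity check (again using the public checkpoint names, not my local files):

from transformers import RobertaConfig, XLMRobertaConfig, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
ids = tokenizer.encode("한국어 질문과 답변 데이터를 학습합니다.")

# Korean text usually produces token IDs far beyond 50265, so a model built from
# the RoBERTa config (a 50265-row embedding table) hits the
# `srcIndex < srcSelectDimSize` assert shown above.
print(max(ids))
print(RobertaConfig.from_pretrained("roberta-base").vocab_size)          # 50265
print(XLMRobertaConfig.from_pretrained("xlm-roberta-base").vocab_size)   # 250002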
For reference, here are the contents of RobertaConfig and XLMRobertaConfig.
RobertaConfig
{ "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "do_sample": false, "eos_token_ids": 0, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "intermediate_size": 3072, "is_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-05, "length_penalty": 1.0, "max_length": 20, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_beams": 1, "num_hidden_layers": 12, "num_labels": 2, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pad_token_id": 0, "pruned_heads": {}, "repetition_penalty": 1.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "torchscript": false, "type_vocab_size": 1, "use_bfloat16": false, "vocab_size": 50265 } |
XLMRobertaConfig
{ "architectures": [ "XLMRobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "do_sample": false, "eos_token_ids": 0, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "intermediate_size": 3072, "is_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-05, "length_penalty": 1.0, "max_length": 20, "max_position_embeddings": 514, "model_type": "xlm-roberta", "num_attention_heads": 12, "num_beams": 1, "num_hidden_layers": 12, "num_labels": 2, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pad_token_id": 0, "pruned_heads": {}, "repetition_penalty": 1.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "torchscript": false, "type_vocab_size": 1, "use_bfloat16": false, "vocab_size": 250002 } |
If this post was helpful, please leave a like.
Thank you.