Latest commit: imyzx 37293aaca5 (5 months ago)

Repository contents:
- llama_vocab/llama_zh_hf
- task_dataset
- readme.txt
- tokenlizer_test.py
Usage

# 1. Define the tokenizer
from tokenizer.spm_13w.tokenizer import SpmTokenizer

vocab_file = '/path/to/spm.133952.PanGu.model'
tokenizer = SpmTokenizer(vocab_file)

EOT = tokenizer.eot_id             # 128298
EOD = tokenizer.eod_id             # 128299
PAD = tokenizer.pad_id             # 128297
vocab_size = tokenizer.vocab_size  # 133952

# 2. Tokenize an input sentence into ids
input_sentence = "你今天中午吃的什么?"
input_id = tokenizer.encode(input_sentence)

# 3. Decode output ids (e.g. produced by the model) back into a sentence
output_sentence = tokenizer.decode(output_id)

# [Note] The input text does NOT need to be word-segmented in advance.
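For readers without access to the repository, the encode/decode round-trip contract shown above can be illustrated with a minimal stand-in. The ToyTokenizer below is a hypothetical character-level sketch, not the real SpmTokenizer (which wraps a trained SentencePiece model and produces subword ids); it only demonstrates the API shape: encode takes raw, unsegmented text and returns ids, decode inverts it, and the special ids (pad/eot/eod) sit at the top of the vocabulary.

```python
# Hypothetical toy tokenizer illustrating the SpmTokenizer API contract.
# NOT the real implementation -- the real class loads a SentencePiece
# .model file; this one just uses a character vocabulary for demonstration.
class ToyTokenizer:
    def __init__(self, corpus):
        # Build a character vocabulary; reserve the highest ids for the
        # special tokens, mirroring the layout documented above
        # (pad/eot/eod at the top of the id range).
        chars = sorted(set(corpus))
        self._id = {c: i for i, c in enumerate(chars)}
        self._char = {i: c for c, i in self._id.items()}
        self.pad_id = len(chars)      # cf. 128297 in the real vocab
        self.eot_id = len(chars) + 1  # cf. 128298
        self.eod_id = len(chars) + 2  # cf. 128299
        self.vocab_size = len(chars) + 3

    def encode(self, sentence):
        # Raw text in, list of ids out -- no pre-segmentation needed.
        return [self._id[c] for c in sentence]

    def decode(self, ids):
        # Inverse of encode; special ids are simply skipped.
        return ''.join(self._char[i] for i in ids if i in self._char)


tok = ToyTokenizer("你今天中午吃的什么?")
ids = tok.encode("你今天中午吃的什么?")
assert tok.decode(ids) == "你今天中午吃的什么?"  # lossless round trip
assert tok.pad_id == tok.vocab_size - 3          # specials at the top
```

The same round-trip property (decode(encode(s)) == s, up to special tokens) is what the real tokenizer provides over its 133952-entry subword vocabulary.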