Character-level tokenization: instead of using the official BERT tokenizer (https://github.com/google-research/bert/blob/master/tokenization.py), write a simple one yourself:

```python
def tokenize_to_str_list(textString):
    # Split the input string into a list of single-character tokens.
    split_tokens = []
    for i in range(len(textString)):
        split_tokens.append(textString[i])
    return split_tokens
```
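As a quick sanity check, character-level splitting treats Chinese characters and ASCII letters the same way, one token per character. A minimal sketch (the sample input string below is my own example, not from the original post):

```python
def tokenize_to_str_list(textString):
    # Character-level tokenizer: one token per character.
    # Equivalent to the loop version; list() iterates over characters.
    return list(textString)

tokens = tokenize_to_str_list("BERT中文")
print(tokens)  # → ['B', 'E', 'R', 'T', '中', '文']
```

Because every character becomes its own token, there is no out-of-vocabulary segmentation logic (no WordPiece `##` subword prefixes), which is exactly why this is simpler than the official tokenizer for Chinese text.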