Some of the data in `dict/zh` comes from [github.com/fxsjy/jieba](https://github.com/fxsjy/jieba).

Update on 2023-11-16: added two new dictionary files, derived from [github.com/GuocaiL/nlp_corpus](https://github.com/GuocaiL/nlp_corpus).
They were generated from `nlp_corpus/open_ner_data/boson/boson.txt`, `open_ner_data/people_daily/people_daily_ner.txt`, `open_ner_data/tianchi_yiyao/train.txt`, and `open_ner_data/ResumeNER/dev.txt`.

1. `tf_idf.txt`

   Each line has three columns: the term, the term's word frequency, and the term's inverse document frequency (IDF).

2. `tf_idf_origin.txt`

   The original corpus text.
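The following is a minimal Python sketch of reading `tf_idf.txt` using the three-column layout described above. It assumes that the columns are whitespace-separated and that the file is UTF-8 encoded. Neither assumption is stated in this README, so check the file before relying on them. `TermStats`, `parse_tf_idf`, `load_tf_idf`, and `tf_idf_score` are hypothetical helper names, not part of any library, and the sample rows are made up for illustration. The sketch also treats the second column as the "tf" factor of a TF-IDF weight.

```python
from typing import Dict, Iterable, NamedTuple


class TermStats(NamedTuple):
    """One row of tf_idf.txt: word frequency and inverse document frequency."""
    freq: float
    idf: float


def parse_tf_idf(lines: Iterable[str]) -> Dict[str, TermStats]:
    """Parse "term freq idf" rows (assumed whitespace-separated) into a dict.

    Blank lines are skipped; rows without exactly 3 columns raise ValueError.
    """
    table: Dict[str, TermStats] = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        term, freq, idf = fields
        table[term] = TermStats(float(freq), float(idf))
    return table


def load_tf_idf(path: str) -> Dict[str, TermStats]:
    """Read the dictionary file from disk (UTF-8 encoding is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return parse_tf_idf(f)


def tf_idf_score(stats: TermStats) -> float:
    """TF-IDF weight: the frequency column multiplied by the IDF column."""
    return stats.freq * stats.idf


if __name__ == "__main__":
    # Hypothetical sample rows; real values come from tf_idf.txt.
    sample = ["北京 120 3.5", "", "医院 40 5.25"]
    table = parse_tf_idf(sample)
    for term, stats in table.items():
        print(term, stats.freq, stats.idf, tf_idf_score(stats))
```

Because the parser takes an iterable of lines rather than a path, it can be tested without touching the filesystem. For real use, `load_tf_idf("tf_idf.txt")` wraps it.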