fix: load GSE data files from the gse/dict directory

2026-04-21 10:24:47 +08:00
parent f671096dbe
commit e051046f77
12 changed files with 1886750 additions and 47 deletions

gse/dict/README.md (Normal file, +15)

@@ -0,0 +1,15 @@
Some dict/zh data is from [github.com/fxsjy/jieba](https://github.com/fxsjy/jieba).

Update (2023-11-16):

Added two new dictionary files, sourced from [github.com/GuocaiL/nlp_corpus](https://github.com/GuocaiL/nlp_corpus) and generated from `nlp_corpus/open_ner_data/boson/boson.txt`, `open_ner_data/people_daily/people_daily_ner.txt`, `open_ner_data/tianchi_yiyao/train.txt`, and `open_ner_data/ResumeNER/dev.txt`:

1. tf_idf.txt
   Each row has three columns: the term, the term's word frequency, and the term's inverse document frequency.
2. tf_idf_origin.txt
   The original corpus text.

gse/dict/en/dict.txt (Normal file, +0)

gse/dict/jp/README.md (Normal file, +1)

@@ -0,0 +1 @@
dict.txt was generated with an internal tool. Copyright 2017 ego authors. For commercial use or copying, please credit the source and copyright.

gse/dict/jp/dict.txt (Normal file, +885298)

File diff suppressed because it is too large.

gse/dict/zh/idf.txt (Normal file, +270132)

File diff suppressed because it is too large.

gse/dict/zh/s_1.txt (Normal file, +352279)

File diff suppressed because it is too large.

gse/dict/zh/stop_tokens.txt (Normal file, +1161)

File diff suppressed because it is too large.

gse/dict/zh/stop_word.txt (Normal file, +88)

@@ -0,0 +1,88 @@
,
.
?
!
"
@
 
~
*
<
>
/
\
|
-
_
+
=
&
^
%
#
`
;
$
︿
哎呀
哎哟
俺们
按照
吧哒
罢了
本着
比方
比如
鄙人
彼此
别的
别说

gse/dict/zh/t_1.txt (Normal file, +236754)

File diff suppressed because it is too large.

gse/dict/zh/tf_idf.txt (Normal file, +107536)

File diff suppressed because it is too large.

gse/dict/zh/tf_idf_origin.txt (Normal file, +33450)

File diff suppressed because one or more lines are too long