How Smart Search handles Chinese text
- 羽城君拉
- Topic Author
- Administrator
2012-03-20 01:33 #21829
羽城君拉 created the topic: How Smart Search handles Chinese text
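Someone just asked how to use Smart Search with Chinese. I tested this a while ago and also tracked down the relevant code. So how is Chinese handled? Here is the explanatory comment from the code: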
If we have Unicode support and are dealing with Chinese text, Chinese has to be handled specially because there are not necessarily any spaces between the "words". So, we have to test if the words belong to the Chinese character set and if so, explode them into single glyphs or "words". Chinese, Japanese, Lao, Khmer, Thai, Myanmar and Tibetan have to be handled specially because there are not necessarily any spaces between the "words." So, we have to test if the words belong to the specific character set and if so, explode them into single glyphs or "words."
Note: Modern Korean uses spaces so Korean texts do not need to be separated.
https://github.com/elinw/joomla-cms/commit/318523fd116cc0fe545f5361bd1ff7d5b67402af#diff-0
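The idea in that comment can be sketched in a few lines. This is only an illustration in Python, not the Joomla code itself: the function names `needs_split` and `tokenize` are made up here, and the Unicode block ranges are my own assumption about which characters count as belonging to each script.

```python
from __future__ import annotations

# Assumed Unicode block ranges for the scripts named in the comment.
# These ranges are illustrative; the real code may test different ones.
# Hangul is deliberately absent because modern Korean uses spaces.
SPLIT_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (Chinese, Japanese kanji)
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x0F00, 0x0FFF),  # Tibetan
    (0x1000, 0x109F),  # Myanmar
    (0x1780, 0x17FF),  # Khmer
)


def needs_split(ch: str) -> bool:
    """Return True if the character belongs to a script written without spaces."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SPLIT_RANGES)


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then explode unspaced-script runs into single glyphs."""
    tokens = []
    for word in text.split():
        buffer = ""  # collects consecutive characters that stay together
        for ch in word:
            if needs_split(ch):
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)  # each glyph becomes its own "word"
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


print(tokenize("Joomla 智慧搜尋 search"))
print(tokenize("한국어 검색"))
```

The first call splits the Chinese run into four single-character tokens while leaving the English words whole; the Korean input is only split on its spaces.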
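English text is split into words on whitespace, but that does not work for scripts such as Chinese or Japanese.
Text in those scripts can only be indexed one character at a time.
Natural-language or fuzzy search for Chinese is a more specialized research topic; the Chinese and Japanese support in Smart Search is only basic support.
As for efficiency... it is naturally not that good.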
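To get a feel for why per-character indexing is only basic support, here is a toy inverted index in Python. It is a simplified illustration and not how Smart Search actually stores or matches its index: `build_index` and `search` are hypothetical helpers, every non-space character is treated as a token, and a query matches any document that contains all of its characters, in any order.

```python
from collections import defaultdict


def build_index(docs):
    """Map each single character to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for ch in text:
            if not ch.isspace():
                index[ch].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every character of the query."""
    chars = [ch for ch in query if not ch.isspace()]
    if not chars:
        return set()
    result = set(index.get(chars[0], set()))
    for ch in chars[1:]:
        result &= index.get(ch, set())  # keep only docs that have this character too
    return result


docs = {
    1: "我住在北京",
    2: "京都在北海道的南邊",
}
index = build_index(docs)

# Document 2 never mentions 北京, but it contains both 北 and 京.
print(sorted(search(index, "北京")))  # [1, 2]
```

With no notion of where a word begins or ends, the query is only as precise as its individual characters, which is why proper Chinese word segmentation is a separate topic.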
If you are interested, you can try it out by following a question I just answered:
http://www.joomla.org.tw/component/kunena/Joomla-25x/21827-%E6%99%BA%E6%85%A7%E6%90%9C%E5%B0%8B%E7%84%A1%E6%B3%95%E4%BD%BF%E7%94%A8%E4%B8%AD%E6%96%87?Itemid=0