Notes on how Smart Search handles Chinese

羽城君拉
Topic Author
Administrator

2012-03-20 01:33 #21829 by 羽城君拉

羽城君拉 created the topic: Notes on how Smart Search handles Chinese

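A user just asked how to use Smart Search with Chinese. I tested this a while ago and also tracked down the relevant code, so how is Chinese actually handled? Here is the explanatory comment from the code: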

If we have Unicode support and are dealing with Chinese text, Chinese has to be handled specially because there are not necessarily any spaces between the "words". So, we have to test if the words belong to the Chinese character set and if so, explode them into single glyphs or "words". Chinese, Japanese, Lao, Khmer, Thai, Myanmar and Tibetan have to be handled specially because there are not necessarily any spaces between the "words." So, we have to test if the words belong to the specific character set and if so, explode them into single glyphs or "words."

Note: Modern Korean uses spaces so Korean texts do not need to be separated.

https://github.com/elinw/joomla-cms/commit/318523fd116cc0fe545f5361bd1ff7d5b67402af#diff-0

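English text is split into words on whitespace, but scripts such as Chinese or Japanese cannot be split that way.
They can only be indexed one character at a time.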
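
To make the idea concrete, here is a minimal sketch in Python of the "explode into single glyphs" approach the comment describes. It is not Joomla's code (Joomla is written in PHP, and the real logic is in the commit linked above); the function name `tokenize` and the Unicode ranges are my own assumptions and cover only CJK ideographs, hiragana and katakana, not the other scripts the comment lists.

```python
import re

# Assumed ranges, for illustration only:
# CJK Unified Ideographs, Hiragana, Katakana.
SINGLE_GLYPH = re.compile(r'[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]')


def tokenize(text):
    """Split on whitespace, then break CJK runs into one token per character."""
    tokens = []
    for chunk in text.split():
        buffer = ''  # collects characters of a space-delimited word
        for ch in chunk:
            if SINGLE_GLYPH.match(ch):
                # Flush any pending non-CJK word before emitting the glyph.
                if buffer:
                    tokens.append(buffer)
                    buffer = ''
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


print(tokenize('Joomla 智慧搜尋'))
```

Under these assumptions, Latin words stay whole and each Chinese character becomes its own index term. Hangul falls outside the ranges above, so Korean text is still split on spaces only, which matches the note about modern Korean.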

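As for natural-language or fuzzy search in Chinese, that is a far more specialized field of study. The Chinese and Japanese handling in Smart Search is only basic support.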

As for efficiency... naturally it is not that good.

If you are interested, you can try it out by following my reply to a question I just answered:
http://www.joomla.org.tw/component/kunena/Joomla-25x/21827-%E6%99%BA%E6%85%A7%E6%90%9C%E5%B0%8B%E7%84%A1%E6%B3%95%E4%BD%BF%E7%94%A8%E4%B8%AD%E6%96%87?Itemid=0

...
