Natural Language Toolkit (NLTK)

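In the early 21st century, NLTK was widely adopted by people working in fields related to natural language processing (NLP). Work in linguistics, cognitive science, artificial intelligence, and information science often requires getting computers to process text data, and experience showed NLTK to be convenient and easy to use. It therefore became a standard tool for NLP, and practitioners across all of these fields use NLTK when writing programs.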

Key features

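Lets users download and load the resources in its corpora, including well-known literary works, word lists, dictionaries, and semantic networks. For example:

    import nltk.corpus  # load NLTK's corpus module
    # The corpus data must already be on disk, e.g. after nltk.download('gutenberg').
    emma = nltk.corpus.gutenberg.words('austen-emma.txt')  # set emma to the ... in nltk.corpus
    print(emma)  # print emma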

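Lets users find relationships between words with a single statement. For example:

    ...
    from nltk.text import Text
    # Make moby a Text object whose words come from 'melville-moby_dick.txt' in nltk.corpus...
    moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))
    # Both methods print their results themselves, so no print() wrapper is needed.
    moby.concordance("monstrous")  # prints every context in which "monstrous" occurs
    moby.similar("monstrous")  # prints the words that occur in contexts similar to those of "monstrous"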
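
To make the idea of a concordance concrete, the following is a minimal plain-Python sketch of keyword-in-context lookup. It is not NLTK's implementation; the function name `concordance`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
def concordance(tokens, target, width=3):
    """Return (left context, match, right context) for each occurrence of target."""
    target = target.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            # Take up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, split on whitespace.
sample = "the monstrous whale rose and a most monstrous size it was".split()

for left, word, right in concordance(sample, "monstrous"):
    print(f"{left:>20} [{word}] {right}")
```

Each result pairs the matched word with a few tokens on either side, which is the same kind of view that `Text.concordance` prints for a real corpus.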

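Lets users draw, with a single statement, a plot showing how often and where words occur in a text. For example:

    ...
    # Draws a plot showing where "citizens", "democracy", "freedom", "duties" and "America" occur across the different parts of moby.
    moby.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])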
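
The data behind such a plot is simply the list of positions at which each word occurs. The sketch below computes those positions in plain Python and prints a rough text rendering; the helper name `word_offsets` and the sample sentence are hypothetical, and this is only an approximation of the idea, not what NLTK does internally.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    offsets = {t: [] for t in targets}
    for i, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(i)
    return offsets


# Hypothetical sample text, split on whitespace.
sample = "freedom and duties of citizens in America demand freedom".split()
offsets = word_offsets(sample, ["citizens", "freedom", "America"])

for word, positions in offsets.items():
    # One row per word, with a mark at every position where it occurs.
    row = "".join("|" if i in positions else "." for i in range(len(sample)))
    print(f"{word:>10} {row}")
```

A dense cluster of marks in one region of a row indicates that the word is concentrated in that part of the text.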

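Lets users load a semantic network and look up a word's synonyms, and also perform operations such as finding the lowest common hypernym of two words or computing the semantic distance between two words. For example^{:Ch. 2.5}:

    from nltk.corpus import wordnet as wn
    print(wn.synsets('word'))  # prints the synsets (synonym sets) that contain "word"
    # Prints a path-based similarity score between the two synsets.
    print(wn.synset('word.n.01').path_similarity(wn.synset('whale.n.01')))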
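
The two operations mentioned above, lowest common hypernym and semantic distance, can be illustrated on a toy taxonomy. The sketch below is plain Python and rests on several assumptions: the `HYPERNYM` table is invented, each word has only one parent (WordNet allows several), and the score `1 / (1 + path length)` is used as a stand-in that, to my understanding, resembles how path-based similarity is scored.

```python
# Hypothetical toy taxonomy: each word maps to its single hypernym (parent).
HYPERNYM = {
    "whale": "mammal",
    "dog": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": None,
}


def hypernym_path(word):
    """Return the chain from word up to the root, starting with word itself."""
    path = []
    while word is not None:
        path.append(word)
        word = HYPERNYM[word]
    return path


def lowest_common_hypernym(a, b):
    """Return the most specific ancestor shared by a and b."""
    ancestors_of_b = set(hypernym_path(b))
    for node in hypernym_path(a):
        if node in ancestors_of_b:
            return node
    return None


def path_similarity(a, b):
    """Score in (0, 1]: 1 / (1 + number of edges on the path between a and b)."""
    lch = lowest_common_hypernym(a, b)
    distance = hypernym_path(a).index(lch) + hypernym_path(b).index(lch)
    return 1 / (1 + distance)


print(lowest_common_hypernym("whale", "dog"))
print(lowest_common_hypernym("whale", "sparrow"))
print(path_similarity("whale", "dog"))
print(path_similarity("whale", "sparrow"))
```

Words that meet at a more specific hypernym are separated by a shorter path and therefore score higher. With WordNet loaded, the corresponding NLTK calls are `lowest_common_hypernyms` and `path_similarity` on synset objects.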

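Lets users carry out preprocessing steps such as tokenization, lemmatization, stemming, and stop-word removal with a single statement, for example word_tokenize(my_string).
Lets users compute, with a single statement, the frequency distribution of the words in a corpus (FreqDist(my_text) and ConditionalFreqDist(my_text_cond))^{:Ch. 2}.
Lets users apply regular expressions (regex) and related functions with a single statement^{:Ch. 3.4, p. 117}, such as re.search('ed$', w) (matches if w ends in -ed), re.search('^..j..t..$', w) (matches if w has the form ..j..t.., where . stands for any single character) and re.findall(r'[aeiou]', w) (finds every a, e, i, o, u in w), and so on.
Lets users build n-grams with a single statement.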

... and so on.
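
Several of the features listed above can be sketched with the Python standard library alone. The block below is an illustration under stated assumptions, not NLTK code: `Counter` stands in for FreqDist, a `defaultdict` of counters stands in for ConditionalFreqDist, the `ngrams` helper is hypothetical, and the sample sentence and the choice of "first letter" as the condition are invented.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample text, split on whitespace.
tokens = "the whale jumped and the crew watched the whale".split()

# Frequency distribution: how often each token occurs.
freq = Counter(tokens)

# Conditional frequency distribution: one counter per condition.
# Here the (arbitrary) condition is the token's first letter.
cond_freq = defaultdict(Counter)
for tok in tokens:
    cond_freq[tok[0]][tok] += 1


def ngrams(seq, n):
    """Return all runs of n consecutive items in seq as tuples."""
    return list(zip(*(seq[i:] for i in range(n))))


bigrams = ngrams(tokens, 2)

# Regex filters: distinct tokens ending in -ed, and the vowels in one word.
ed_words = sorted(w for w in set(tokens) if re.search(r"ed$", w))
vowels = re.findall(r"[aeiou]", "whale")

print(freq.most_common(2))
print(dict(cond_freq["w"]))
print(bigrams[:3])
print(ed_words, vowels)
```

The `re` calls are the same ones quoted in the list above; they come from Python's standard library and are typically applied to each word of a word list in turn.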

See also

Python
spaCy
WordNet
Associative array
Text file
Web scraping; NLTK is often used together with this kind of functionality^{:Ch. 3}.

References

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.
