Programmatically install NLTK corpora / models, i.e. without the GUI downloader?

goodcopy 2021. 1. 14. 23:14

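My project uses NLTK. How can I list the project's corpus and model requirements so that they can be installed automatically? I don't want to click through the nltk.download() GUI, installing packages one by one.

Also, is there any way to freeze that same list of requirements (like pip freeze)?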

The NLTK site lists a command-line interface for downloading packages and collections at the bottom of this page:

http://www.nltk.org/data

The command-line usage varies by which version of Python you are using, but on my Python 2.6 install I noticed I was missing the 'spanish_grammars' model, and installed it with:

python -m nltk.downloader spanish_grammars

You mention listing the project's corpus and model requirements. I'm not sure of a way to do that automatically, but I figured I would at least share this.

To install all NLTK corpora and models:

python -m nltk.downloader all

Alternatively, on Linux, you can use:

sudo python -m nltk.downloader -d /usr/local/share/nltk_data all

Replace all with popular to get just the most popular corpora and models.
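
For a setup script, the same commands could be assembled and run from Python. This is only a sketch: the helper names `build_downloader_command` and `run_downloader` are made up for illustration, and it assumes the downloader accepts several package identifiers in one invocation, which the commands above do not show.

```python
import subprocess
import sys


def build_downloader_command(package_ids, download_dir=None):
    """Build the argument list for `python -m nltk.downloader`."""
    # sys.executable keeps the download tied to the interpreter running this script.
    command = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        # -d selects the target directory, as in the Linux command above.
        command += ["-d", download_dir]
    return command + list(package_ids)


def run_downloader(package_ids, download_dir=None):
    """Run the downloader in a child process."""
    # check=True raises CalledProcessError if the process exits with a non-zero status.
    subprocess.run(build_downloader_command(package_ids, download_dir), check=True)
```

For example, `run_downloader(["popular"], download_dir="/usr/local/share/nltk_data")` would mirror the Linux command above, minus the sudo. Whether a failed download produces a non-zero exit status is not something I have checked, so treat `check=True` as a safety net rather than a guarantee.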

You can also browse the corpora and models from the command line:

mlee@server:/scratch/jjylee/tests$ sudo python -m nltk.downloader
[sudo] password for jjylee:
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> l
Packages:
  [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
  [ ] basque_grammars..... Grammars for Basque
  [ ] bllip_wsj_no_aux.... BLLIP Parser: WSJ Model
  [ ] book_grammars....... Grammars from NLTK Book
  [ ] cess_esp............ CESS-ESP Treebank
  [ ] chat80.............. Chat-80 Data Files
  [ ] city_database....... City Database
  [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
  [ ] comparative_sentences Comparative Sentence Dataset
  [ ] comtrans............ ComTrans Corpus Sample
  [ ] conll2000........... CONLL 2000 Chunking Corpus
  [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
  [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan
                           and Basque Subset)
  [ ] crubadan............ Crubadan Corpus
  [ ] dependency_treebank. Dependency Parsed Treebank
  [ ] europarl_raw........ Sample European Parliament Proceedings Parallel
                           Corpus
  [ ] floresta............ Portuguese Treebank
  [ ] framenet_v15........ FrameNet 1.5
Hit Enter to continue: 
  [ ] framenet_v17........ FrameNet 1.7
  [ ] gazetteers.......... Gazeteer Lists
  [ ] genesis............. Genesis Corpus
  [ ] gutenberg........... Project Gutenberg Selections
  [ ] hmm_treebank_pos_tagger Treebank Part of Speech Tagger (HMM)
  [ ] ieer................ NIST IE-ER DATA SAMPLE
  [ ] inaugural........... C-Span Inaugural Address Corpus
  [ ] indian.............. Indian Language POS-Tagged Corpus
  [ ] jeita............... JEITA Public Morphologically Tagged Corpus (in
                           ChaSen format)
  [ ] kimmo............... PC-KIMMO Data Files
  [ ] knbc................ KNB Corpus (Annotated blog corpus)
  [ ] large_grammars...... Large context-free and feature-based grammars
                           for parser comparison
  [ ] lin_thesaurus....... Lin's Dependency Thesaurus
  [ ] mac_morpho.......... MAC-MORPHO: Brazilian Portuguese news text with
                           part-of-speech tags
  [ ] machado............. Machado de Assis -- Obra Completa
  [ ] masc_tagged......... MASC Tagged Corpus
  [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
  [ ] moses_sample........ Moses Sample Models
Hit Enter to continue: x


Download which package (l=list; x=cancel)?
  Identifier> conll2002
    Downloading package conll2002 to
        /afs/mit.edu/u/m/mlee/nltk_data...
      Unzipping corpora/conll2002.zip.

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader>

In addition to the command-line option already mentioned, you can install NLTK data programmatically from your Python script by passing an argument to the download() function.

See the help(nltk.download) text, specifically:

Individual packages can be downloaded by calling the ``download()``
function with a single argument, giving the package identifier for the
package that should be downloaded:

    >>> download('treebank') # doctest: +SKIP
    [nltk_data] Downloading package 'treebank'...
    [nltk_data]   Unzipping corpora/treebank.zip.

This works for downloading one package at a time, or when passed a list or tuple.

>>> import nltk
>>> nltk.download('wordnet')
[nltk_data] Downloading package 'wordnet' to
[nltk_data]     C:\Users\_my-username_\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.
True
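
Since a list is accepted, one way to approximate a requirements list is to keep the package identifiers in a text file and pass them to download() in one call. A minimal sketch, assuming a hypothetical file with one identifier per line and `#` comments; the function names and the `download` parameter (there so the parsing can be tried without NLTK or a network connection) are my own, not part of NLTK.

```python
from pathlib import Path


def read_package_ids(requirements_path):
    """Return the package ids listed one per line, skipping blanks and # comments."""
    package_ids = []
    text = Path(requirements_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop trailing comments
        if entry:
            package_ids.append(entry)
    return package_ids


def download_requirements(requirements_path, download=None):
    """Pass every id in the file to the downloader as a single list."""
    if download is None:
        import nltk  # imported lazily; only needed for a real download

        download = nltk.download
    return download(read_package_ids(requirements_path))
```

With a file containing `wordnet` and `treebank` on separate lines, `download_requirements("nltk-requirements.txt")` would call `nltk.download(["wordnet", "treebank"])`. The file name is only an example.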

You can also try to download a package that is already downloaded without any problems:

>>> nltk.download('wordnet')
[nltk_data] Downloading package 'wordnet' to
[nltk_data]     C:\Users\_my-username_\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
True

Also, it appears the function returns a boolean value that you can use to see whether or not the download succeeded:

>>> nltk.download('not-a-real-name')
[nltk_data] Error loading not-a-real-name: Package 'not-a-real-name'
[nltk_data]     not found in index
False
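
That return value can be used to stop a setup step early instead of failing later at lookup time. A rough sketch, where `download_all_or_fail` and its `download` parameter are hypothetical names of mine; it downloads one identifier per call so that each failure can be reported by name.

```python
def download_all_or_fail(package_ids, download=None):
    """Download each package and raise if any call reports failure."""
    if download is None:
        import nltk  # imported lazily; only needed for a real download

        download = nltk.download
    # download() returns False on failure, as in the session above.
    failed = [package_id for package_id in package_ids if not download(package_id)]
    if failed:
        raise RuntimeError("NLTK packages failed to download: " + ", ".join(failed))
```

Calling `download_all_or_fail(["wordnet", "not-a-real-name"])` would then raise a RuntimeError that names `not-a-real-name`.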

I've managed to install the corpora and models inside a custom directory using the following code:

import nltk
nltk.download(info_or_id="popular", download_dir="/path/to/dir")
nltk.data.path.append("/path/to/dir")

This will install the "popular" collection of corpora/models inside /path/to/dir and tell NLTK where to look for it (data.path.append).

You can't "freeze" the data in a requirements file, but you could add this code to your __init__, along with some code to check whether the files are already there.
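
The check could look something like the sketch below. It relies on `nltk.data.find`, which raises LookupError when a resource is missing. The function name `ensure_nltk_data`, the `find` and `download` parameters, and the mapping of package ids to resource paths (such as `corpora/wordnet`, guessed from the unzip messages above) are assumptions for illustration.

```python
def ensure_nltk_data(resources, download_dir=None, find=None, download=None):
    """Download only the packages whose resource path cannot be found.

    `resources` maps a package id to its resource path,
    for example {"wordnet": "corpora/wordnet"}.
    Returns the list of package ids that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily; only needed for a real lookup/download

        # Make sure NLTK searches the custom directory as well.
        if download_dir is not None and download_dir not in nltk.data.path:
            nltk.data.path.append(download_dir)
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for package_id, resource_path in resources.items():
        try:
            find(resource_path)  # raises LookupError if the resource is missing
        except LookupError:
            if not download(package_id, download_dir=download_dir):
                raise RuntimeError("could not download NLTK package: " + package_id)
            fetched.append(package_id)
    return fetched
```

A call such as `ensure_nltk_data({"wordnet": "corpora/wordnet"}, download_dir="/path/to/dir")` in your package's __init__ would then skip the download on every run after the first.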

Reference URL: https://stackoverflow.com/questions/5843817/programmatically-install-nltk-corpora-models-i-e-without-the-gui-downloader
