[CMU Sphinx] How to Create a Language Model (LM) File

Typical Usage

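Given a large corpus of text in a file a.text, but no specified vocabulary, the following CMU-Cambridge Statistical Language Modeling Toolkit commands build and evaluate a language model.

Compute the word unigram counts: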

cat a.text | text2wfreq > a.wfreq
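
To make concrete what this step produces, here is a minimal Python sketch of unigram counting. It is not the toolkit's code: it assumes tokens are simply separated by whitespace, and the helper name `word_frequencies` is hypothetical.

```python
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count each whitespace-separated token, roughly what text2wfreq reports."""
    return Counter(text.split())


if __name__ == "__main__":
    counts = word_frequencies("<s> the cat sat on the mat </s>")
    # Print one word and its count per line.
    for word, n in sorted(counts.items()):
        print(word, n)
```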

Convert the word unigram counts into a vocabulary consisting of the 20,000 most common words:

cat a.wfreq | wfreq2vocab -top 20000 > a.vocab
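
The `-top 20000` option keeps only the most frequent words. A hypothetical Python equivalent over the counts from the previous step might look like this. Ties at the cut-off are broken by `Counter.most_common`'s ordering, which may differ from what wfreq2vocab does, and sorting the result alphabetically is a choice of this sketch.

```python
from collections import Counter


def top_vocab(counts: Counter, top: int) -> list[str]:
    """Return the `top` most frequent words, sorted alphabetically."""
    # most_common(top) yields (word, count) pairs, highest counts first.
    kept = [word for word, _ in counts.most_common(top)]
    return sorted(kept)


if __name__ == "__main__":
    freqs = Counter({"the": 5, "cat": 3, "sat": 2, "mat": 1})
    print(top_vocab(freqs, 2))  # ['cat', 'the']
```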

Generate a binary id 3-gram file of the training text based on this vocabulary:

cat a.text | text2idngram -vocab a.vocab > a.idngram
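
Conceptually, text2idngram replaces each word with a numeric id from the vocabulary and counts every run of three consecutive ids. The sketch below does that in memory instead of writing the binary .idngram file; numbering vocabulary words from 1 and sending every out-of-vocabulary word to id 0 are assumptions of this example, and `id_trigrams` is a hypothetical name.

```python
from collections import Counter

UNK_ID = 0  # assumed id for out-of-vocabulary words in this sketch


def id_trigrams(text: str, vocab: list[str]) -> Counter:
    """Map tokens to integer ids and count each consecutive 3-gram of ids."""
    word_to_id = {word: i + 1 for i, word in enumerate(vocab)}
    ids = [word_to_id.get(tok, UNK_ID) for tok in text.split()]
    # zip over three shifted views of the id list yields every trigram once.
    return Counter(zip(ids, ids[1:], ids[2:]))


if __name__ == "__main__":
    print(id_trigrams("a b c a b c", ["a", "b", "c"]))
```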

Convert the idngram into a binary-format language model:

idngram2lm -idngram a.idngram -vocab a.vocab -binary a.binlm
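
idngram2lm turns these counts into conditional probabilities. The toolkit also applies discounting (it offers methods such as Good-Turing and Witten-Bell) and backs off to shorter n-grams, neither of which is shown here. This sketch only computes the unsmoothed maximum-likelihood trigram estimate that those methods refine; `mle_trigram_prob` is a hypothetical helper.

```python
from collections import Counter


def mle_trigram_prob(trigrams: Counter, w1: int, w2: int, w3: int) -> float:
    """P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2 *), with no smoothing."""
    context_total = sum(
        count for (a, b, _), count in trigrams.items() if (a, b) == (w1, w2)
    )
    if context_total == 0:
        # Unseen context: a real back-off model would fall back to bigram estimates.
        return 0.0
    return trigrams[(w1, w2, w3)] / context_total


if __name__ == "__main__":
    counts = Counter({(1, 2, 3): 2, (1, 2, 4): 2, (2, 3, 1): 1})
    print(mle_trigram_prob(counts, 1, 2, 3))  # 0.5
```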

Compute the perplexity of the language model with respect to some test text, b.text:

evallm -binary a.binlm
Reading in language model from file a.binlm
Done.
evallm : perplexity -text b.text
Computing perplexity of the language model with respect to the text b.text
Perplexity = 128.15, Entropy = 7.00 bits
Computation based on 8842804 words.
Number of 3-grams hit = 6806674 (76.97%)
Number of 2-grams hit = 1766798 (19.98%)
Number of 1-grams hit = 269332 (3.05%)
1218322 OOVs (12.11%) and 576763 context cues were removed from the calculation.
evallm : quit
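
The numbers in this session fit together: perplexity is 2 raised to the entropy in bits, and the three hit counts add up to the 8,842,804 evaluated words. The short Python check below recomputes the reported figures from the raw counts. That the OOV percentage is taken relative to words plus OOVs is an inference from these numbers, not something the output states.

```python
import math

# Raw figures copied from the evallm session above.
perplexity = 128.15
words = 8842804
hits = {3: 6806674, 2: 1766798, 1: 269332}
oovs = 1218322

# Perplexity = 2 ** entropy, so entropy in bits is log2(perplexity).
entropy_bits = math.log2(perplexity)
print(f"Entropy = {entropy_bits:.2f} bits")  # Entropy = 7.00 bits

# The hit counts sum to the number of evaluated words.
print(sum(hits.values()) == words)  # True

for n, count in hits.items():
    print(f"{n}-grams hit: {100 * count / words:.2f}%")
# 3-grams hit: 76.97%
# 2-grams hit: 19.98%
# 1-grams hit: 3.05%

# Assumption inferred from the numbers: OOV% is relative to words + OOVs.
print(f"OOVs: {100 * oovs / (words + oovs):.2f}%")  # OOVs: 12.11%
```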

Alternatively, some of these processes can be piped together:

cat a.text | text2wfreq | wfreq2vocab -top 20000 > a.vocab
cat a.text | text2idngram -vocab a.vocab | \
   idngram2lm -vocab a.vocab -idngram - \
   -binary a.binlm -spec_num 5000000 15000000
echo "perplexity -text b.text" | evallm -binary a.binlm
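
The same chaining can be driven from Python with `subprocess`, which also sidesteps the cat/type difference on Windows because the input file is opened directly and handed to the first program as its standard input. This is a sketch: `run_pipeline` is a hypothetical helper, and the toolkit programs must be on the PATH for the commented real-use line to work. The demo substitutes two tiny Python filters so it runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_pipeline(commands: list[list[str]], input_path: str, output_path: str) -> list[int]:
    """Run `cmd1 < input | cmd2 | ... > output` without a shell; return exit codes."""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        procs: list[subprocess.Popen] = []
        stdin = src
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd, stdin=stdin, stdout=dst if is_last else subprocess.PIPE
            )
            if procs:
                # Drop the parent's copy of the pipe so only the child reads it.
                procs[-1].stdout.close()
            procs.append(proc)
            stdin = proc.stdout
        codes = [proc.wait() for proc in procs]
    if any(codes):
        raise RuntimeError(f"pipeline failed with exit codes {codes}")
    return codes


if __name__ == "__main__":
    # Real use (tools on PATH), mirroring the first piped command above:
    # run_pipeline([["text2wfreq"], ["wfreq2vocab", "-top", "20000"]], "a.text", "a.vocab")
    upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    reverse = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]
    with tempfile.TemporaryDirectory() as tmp:
        in_file = Path(tmp, "in.txt")
        in_file.write_text("abc")
        out_file = Path(tmp, "out.txt")
        run_pipeline([upper, reverse], str(in_file), str(out_file))
        print(out_file.read_text())  # CBA
```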

==============================================================================================================

To run these commands on Windows, use the type command instead of cat.
Below are the basic Linux commands and their Windows Command Prompt (DOS) equivalents:

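Task                            Linux             /   Windows (cmd)
List directory contents         ls                /   dir
Create a directory              mkdir             /   mkdir, md
Remove a directory              rmdir             /   rmdir, rd
Show directory tree             ls -R             /   tree
Delete a file                   rm                /   del, erase
Copy a file                     cp                /   copy
Move a file                     mv                /   move
Rename a file                   mv                /   rename
Change directory                cd                /   cd
Show current directory          pwd               /   cd
Clear the screen                clear             /   cls
Command interpreter             sh, csh, bash     /   command.com, cmd.exe
Display file contents           cat               /   type
Help, manual                    man               /   help
Exit shell / prompt window      exit              /   exit
Show the time                   date              /   time
Print text as-is                echo              /   echo
Show environment variables      set, env          /   set
Show the search path            echo $PATH        /   path
Show version information        uname -a          /   ver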

Creative Commons license: Attribution, NonCommercial, ShareAlike
