Namazu 2.0 tutorial


This tutorial is for users who are new to Namazu 2.0.

Mission

This tutorial is written to reduce the workload of getting started with Namazu. Please refer to the manual to learn about all of Namazu's features. An installation guide is provided in the INSTALL file.

History of development

The history of Namazu development from 1.3.0.x through 2.0 is as follows.

1.3.0.x
Old stable version. Using 1.3.0.11 is recommended, since versions 1.3.0.10 and earlier may allow junk files to be created from outside.
1.3.0.11 is the latest version in this series.
1.3.1.0
Development version. Introduced a checkpoint function (-s option: mknmz periodically re-executes itself with "exec" to keep the process from growing too large). However, this version was not released to the public, and development moved on to 1.4.0.0.
1.4.0.0
Development version. Improved performance by using Perl modules.
However, this version was not released to the public, and development moved on to 1.9.x.
1.9.x
Development version. In-progress versions released during the development of version 2.0.
2.0
New stable version.
current
The in-progress version. The current sources can be obtained via CVS.

Namazu components

Namazu broadly consists of three components: mknmz, namazu, and namazu.cgi.

Preparation and make

You need the following software to build Namazu 2.0.

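Perl
Description: Perl Language
Status: Required
Current version: 5.005_03
Required version: 5.004
File name: perl-5.005_03.tar.gz
Development and distribution sources (example): Larry Wall, GNU, CPAN

nkf
Description: Network Kanji Filter
Status: For Japanese processing only
Current version: 1.9
Required version: 1.71
File name: nkf171.shar
Development and distribution sources (example): Shinji Kono, University of the Ryukyus

NKF
Description: nkf Perl Module
Status: For Japanese processing only. ++
Current version: 1.9
Required version: 1.71

KAKASI
Description: Japanese/Romaji Conversion
Status: For Japanese processing only. **
Current version: 2.3.1
Required version: <=
File name: kakasi-2.3.1.tar.gz
Development and distribution sources (example): KAKASI Project, namazu.org

Text::Kakasi
Description: KAKASI Perl Module
Status: For Japanese processing only. ++
Current version: 1.01
Required version: <=
File name: Text-Kakasi-1.01.tar.gz
Development and distribution sources (example): NOKUBI Takatsugu, Text::Kakasi

ChaSen
Description: Japanese Morphological Analyzer
Status: For Japanese processing only. **
Current version: 2.02
Required version: <=
File name: chasen-sys2.02 -ipadic2.1.tar.gz
Development and distribution sources (example): Nara Institute of Science and Technology | ftp://
Others: Distribution Policy. For libchasen.a in ChaSen 2.02 or earlier, refer below.

Text::ChaSen
Description: ChaSen Perl Module
Status: For Japanese processing only. ++
Current version: 0.20
Required version: <=
File name: Text-ChaSen-0.20.tar.gz
Development and distribution sources (example): NOKUBI Takatsugu, Text::ChaSen

File::MMagic
Description: File Type
Status: Included
Current version: 0.20
Required version: <=
File name: File-MMagic-0.20.tar.gz
Development and distribution sources (example): NOKUBI Takatsugu, CPAN dist
Others: This is packaged in the Namazu distribution.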

Japanese Environment

To use Namazu 2.0 in a Japanese environment, you need to set the LANG environment variable. On Windows, set LANG to ja_JP.SJIS. On Unix, set LANG to ja, ja_JP.eucJP, ja_JP.EUC, or ja_JP.ujis; the appropriate value depends on your system. If you are unsure, just set LANG to ja.
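
As a minimal sketch of the setting described above, the variable could be set like this in a Bourne-style shell on Unix. The value ja_JP.eucJP is only one of the candidates listed; which locale names actually exist depends on your system.

```shell
# Set LANG for the current shell session and its child processes.
# ja_JP.eucJP is an assumed example; use whichever locale name your system provides.
LANG=ja_JP.eucJP
export LANG

# Show the value that mknmz and namazu would inherit.
echo "LANG=$LANG"
```

To make the setting permanent, the same two lines could be placed in your shell startup file.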

Test before "make install"

If you wish to test mknmz before running make install, cd into the scripts directory and run ./mknmz. It will then load the adjacent ../pl/*.pl files rather than those under /usr/local/share/namazu.

(For details, see the handling of @INC in load_module() in mknmz.)

For example, you can try:


    ./mknmz -C
    ./mknmz --help
    ./mknmz -O /tmp ~/Mail

Help Menu

If you type mknmz or namazu with no arguments, a short usage message is displayed. If you pass --help as an argument, a long usage message is displayed. The -C option displays the current configuration. It is useful to remember these three forms.

How to get help on the command line
Argument  Meaning        Other arguments
None      Short usage    Cannot be combined with any other argument
--help    Long usage     Other arguments are ignored
-C        Configuration  Other arguments take effect

Running mknmz

First, create an index. (If you wish to run mknmz before make install, please see Test before "make install".)
The format has changed slightly since version 1.4.0.8. URI replacement is now specified with the --replace option. URI replacement can also be done when namazu or namazu.cgi runs. In that case, run mknmz without the --replace option, and set up .namazurc so that the replacement is performed at search time.

Run mknmz as follows.

mknmz [options] target directory

This creates the index in the current directory. Use the -O option to specify a different output directory.

For example,

      mkdir /tmp/index
      mknmz -O /tmp/index \
      --replace='s#/foo/bar/doc/#http://foo.bar.jp/software/#' \
      /foo/bar/doc
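
To see what the substitution passed to --replace does to an indexed path, the same expression can be tried on a single path with sed. This is only an illustration: mknmz applies the expression itself, and the file name index.html is a made-up example.

```shell
# Apply the same s#...#...# expression used with --replace to one sample path.
# index.html is a hypothetical file under /foo/bar/doc.
path=/foo/bar/doc/index.html
uri=$(printf '%s\n' "$path" | sed 's#/foo/bar/doc/#http://foo.bar.jp/software/#')
echo "$uri"
```

The local directory prefix is replaced by the public URI prefix, which is what search results will then show.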

mknmz outputs the following messages while creating the index. If you wish to display the messages in Japanese, please refer to Japanese Environment.


    14 files are found to be indexed.
    1/14 - /foo/bar/acrobat3.pdf [application/pdf]
    2/14 - /foo/bar/excel97.xls [application/excel]
    3/14 - /foo/bar/html.html [text/html]
    4/14 - /foo/bar/mail-multipart.txt [message/rfc822]
    5/14 - /foo/bar/mail.txt [message/rfc822]
    6/14 - /foo/bar/man.1 [text/x-roff]
    7/14 - /foo/bar/msg00000.html [text/html; x-type=mhonarc]
    8/14 - /foo/bar/plain.txt [text/plain]
    9/14 - /foo/bar/plain.txt.Z [text/plain]
    10/14 - /foo/bar/plain.txt.bz2 [text/plain]
    11/14 - /foo/bar/plain.txt.gz [text/plain]
    12/14 - /foo/bar/rfc0000.txt [text/plain; x-type=rfc]
    13/14 - /foo/bar/tex.tex [application/x-tex]
    14/14 - /foo/bar/word97.doc [application/msword]
    Writing index files...
    [Base]
    Date:                Thu Mar 16 22:14:01 2000
    Added Documents:     14
    Size (bytes):        58,701
    Total Documents:     14
    Added Keywords:      95
    Total Keywords:      95
    Wakati:              module_kakasi -ieuc -oeuc -w
    Time (sec):          14
    File/Sec:            1.00
    System:              linux
    Perl:                5.00503
    Namazu:              2.0.2

Customizing mknmz

Namazu was originally developed for processing HTML documents, but it can now deal with various document formats. You will find useful scripts in /usr/local/share/namazu/filter, and a detailed explanation is given under Document filters in the Namazu manual.

Mails in MH format
Run mknmz on the mail folder:
% mknmz ~/Mail/foobar
MHonArc
Namazu applies special processing to HTML generated by MHonArc.
hnf
A .mknmzrc for hnf and a guide can be obtained from the Hyper NIKKI System.
Documents stored on other machines
Namazu alone cannot index documents on other machines. Use it in combination with other tools (e.g. wget, NFS) that make the documents available locally.

For the mknmz command-line arguments, mknmz --help prints usage information. With the -C option, mknmz prints the current configuration.


    Loaded rcfile: /home/foobar/.mknmzrc
    System: linux
    Namazu: 2.0.2
    Perl: 5.00503
    NKF: module_nkf
    KAKASI: module_kakasi -ieuc -oeuc -w
    ChaSen: module_chasen -j -F '%m '
    Wakati: module_kakasi -ieuc -oeuc -w
    Lang: C
    Coding System: euc
    CONFDIR: /usr/local/etc/namazu
    LIBDIR: /usr/local/share/namazu/pl
    FILTERDIR: /usr/local/share/namazu/filter
    TEMPLATEDIR: /usr/local/share/namazu/template
    Supported media types: 
      application/excel
      application/msword
      application/pdf
      application/x-bzip2
      application/x-compress
      application/x-gzip
      application/x-tex
      message/news
      message/rfc822
      text/hnf
      text/html
      text/html; x-type=mhonarc
      text/plain
      text/plain; x-type=rfc
      text/x-roff

Targets of index creation

Short name  Long name            Description
-F          --target-list=FILE   Read the list of target files for index creation from FILE
-t          --media-type=MTYPE   Specify the document format of the target files
            --allow=PATTERN      Regular expression for file names to be indexed
            --deny=PATTERN       Regular expression for file names to be excluded
            --exclude=PATTERN    Regular expression for path names to be excluded

The current version cannot handle symbolic links in the target directory.
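
As a sketch of how the -F option above might be used, the following builds an explicit list of regular files with find and passes it to mknmz. The directories and sample files are temporary placeholders created for the demonstration, and the mknmz call is skipped when the command is not installed.

```shell
# Create a throwaway document tree (placeholder content for the demo).
docs=$(mktemp -d)
index=$(mktemp -d)
printf '<html><body>hello</body></html>\n' > "$docs/a.html"
printf 'plain text\n' > "$docs/b.txt"
printf 'scratch\n' > "$docs/c.bak"

# List regular files only; find -type f does not list symbolic links.
list="$index/targets.txt"
find "$docs" -type f \( -name '*.html' -o -name '*.txt' \) | sort > "$list"
cat "$list"

# Index exactly the listed files, if mknmz is available.
if command -v mknmz >/dev/null 2>&1; then
    mknmz -O "$index" -F "$list"
fi
```

A similar selection could be expressed with --allow or --deny patterns instead of an explicit list.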

Running namazu

To search documents, run

      % namazu query index

If you omit index, namazu searches /usr/local/var/namazu/index by default.

The namazu command is configured in namazurc. An example can be found in /usr/local/etc/namazu/namazurc-sample, which is part of the Namazu distribution package.
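
A minimal sketch of a personal configuration file might look like the following. Index, Replace and Lang are among the namazurc settings named in this tutorial, but the exact syntax shown here is an assumption, so check namazurc-sample before relying on it. The sketch writes to a temporary file rather than to your real ~/.namazurc.

```shell
# Write an example configuration to a temporary file (not to ~/.namazurc).
rc=$(mktemp)
cat > "$rc" <<'EOF'
# Default index directory to search (assumed syntax: directive, then value).
Index   /usr/local/var/namazu/index
# Rewrite local paths to URIs at search time (assumed syntax: from, then to).
Replace /foo/bar/doc/ http://foo.bar.jp/software/
# Language used for messages.
Lang    ja
EOF
cat "$rc"
```

With a Replace line like this, mknmz can be run without --replace and the URI rewriting happens when namazu or namazu.cgi runs.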

To use the CGI on the web, you need to configure the web server. For Apache, the relevant configuration directives are:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/
Directory that /cgi-bin/ in the URI is aliased to.
AddHandler cgi-script .cgi
Execute files ending in ".cgi" as CGI programs.
AllowOverride All
Allow configuration through .htaccess. (Web administrator)
Options ExecCGI
Allow CGI execution.
DirectoryIndex index.html
File name to display when the URI specifies a directory.

Settings other than the one marked (Web administrator) can be made in .htaccess. (Note that the Apache configuration may forbid these settings there.)
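
For a directory where .htaccess overrides are allowed, some of the settings listed above could be collected as follows. This is a sketch only: it writes to a temporary directory standing in for the real directory that holds namazu.cgi, and whether each directive is honored depends on the AllowOverride setting chosen by the web administrator.

```shell
# A temporary directory stands in for the directory that holds namazu.cgi.
cgidir=$(mktemp -d)
cat > "$cgidir/.htaccess" <<'EOF'
# Allow CGI execution in this directory.
Options ExecCGI
# Treat files ending in .cgi as CGI scripts.
AddHandler cgi-script .cgi
# File served when the URI names the directory itself.
DirectoryIndex index.html
EOF
cat "$cgidir/.htaccess"
```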

What you can do with Namazu

What is written here is not a guarantee. It merely introduces advanced uses that the developers have in mind.

What you cannot do with Namazu

Others

Targets of index creation
Which files in the specified target directory are indexed depends on the $ALLOW_FILE and $DENY_FILE settings in mknmzrc and on the -a, --allow, --deny, and --exclude command-line options.
mew-1.94b2x and mew-nmz.el
Mew works in combination with Namazu. The integration features are implemented in contrib/mew-nmz.el, and further information can be found in contrib/00readme-namazu.jis.

Terminology

KAKASI
Software that converts Kanji to Hiragana/Katakana/Romaji. Namazu uses it as a segmentation tool.
ChaSen
Japanese morphological analyzer. Namazu uses it as a segmentation tool.
Segmentation
Unlike English, Japanese does not put spaces between words. Plain Japanese text is therefore preprocessed so that it is split into words with spaces inserted between them. This is called segmentation. (The term "segmentation" is also used in the same sense outside computing.)
Index(Noun)
               (Preparation)               (Search display)
                          mknmz       namazu
                         ^     |     ^      |
                         |     v     |      v
      Original Document        Index         Search Result
Namazu prepares an index of words before any search request is made, and on request it searches the documents using that prepared index. This prepared index is simply called the index. In Namazu, the NMZ.* files are the index.
Index (verb)
To create the index described above. Use mknmz.
Multiple indices
The ability to create more than one index and search across all of them.
Phrase searching
The basis of a Namazu search is a combination of words. "foo and bar" and "bar and foo" (reverse order) are treated the same way, and foo and bar may appear anywhere in the document. In contrast, searching for the string "foo bar" in exactly this order is called a phrase search.
namazu.conf, conf.pl
In version 1.4 and earlier, namazu and mknmz were configured in namazu.conf and conf.pl respectively. In version 2.0, these have been replaced by namazurc and mknmzrc respectively.
mknmzrc (/usr/local/etc/namazu/mknmzrc)
Basic configuration for mknmz.
namazurc (/usr/local/etc/namazu/namazurc)
Configure this if you wish to change the behaviour of namazu and/or namazu.cgi. You can configure Index, Replace, Base, Logging, Lang, and Template. For further details, see the manual.
Perl module
In old versions, NKF, KAKASI, and ChaSen were called from Namazu as external processes. A process was started for each file, which made execution slow. In the current version, they are available as Perl modules, so execution is faster because no external process is started.
This feature is not available in Namazu 1.3 or earlier; it requires Namazu 1.4 or later. To test whether the Perl modules used by Namazu are installed, run
perl -MText::Kakasi -e ''
perl -MText::ChaSen -e ''
perl -MNKF -e ''
If nothing is displayed, the modules are installed and you can take advantage of them. If you then run ./configure in the Namazu source directory, these Perl modules will be used.
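
The three checks above can be wrapped in a small loop that reports each module by name. This is a convenience sketch; the function name check_perl_modules is made up for illustration, and a missing perl command is reported the same way as a missing module.

```shell
# Report whether each named Perl module can be loaded.
# check_perl_modules is a hypothetical helper, not part of Namazu.
check_perl_modules() {
    for mod in "$@"; do
        if perl -M"$mod" -e '' >/dev/null 2>&1; then
            echo "$mod: available"
        else
            echo "$mod: not found"
        fi
    done
}

check_perl_modules NKF Text::Kakasi Text::ChaSen
```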

References

KAKASI 2.3.1
Program and dictionary for converting Kanji-Kana text to Hiragana/Romaji text.
Creator: Hironobu Takahashi, Maintenance: KAKASI Project
In Namazu, KAKASI is used for Japanese segmentation.
http://kakasi.namazu.org/
Development and Distribution
http://www.namazu.org/
FAQ (Japanese)
http://www.namazu.org/FAQ.html
Namazu Mailing List
http://www.namazu.org/ml.html
Akira Yamada's namazu.el (Emacs/Mule client)
http://arika.org/linux/tools/namazu-el/
Namazu Development version
http://www.namazu.org/unstable/

Namazu Homepage

$Id: tutorial.html,v 1.4 2000/04/01 16:41:35 satoru Exp $
developers@namazu.org