The search package is responsible for taking a text string like "God & loves & world" and turning it into a series of calls to Book and Passage to find the answer. I am a little concerned that this design is rather complex, but I'm sure that stewing on it will make things clearer.
These are the requirements of the search.Engine:
This is how the current design works. The user types a string like "aaron ~5 & moses & thesaurus talk". (This means: find Moses within 5 verses of Aaron, alongside some speech-type activity, i.e. "talk" or one of its synonyms.) The Engine prepends the string "/" and tokenizes this into a Vector of SearchWords, one SearchWord for each part of the search string. The Vector (using Java array syntax) looks like this: { "/", "aaron", "~", "5", "&", "moses", "&", "thesaurus", "talk" }.
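A minimal Java sketch of the tokenizing step follows. It rests on assumptions that the text does not state: the command symbols are taken to be the single characters /, ~ and & (a hypothetical set; the real Engine may recognise others), and everything else is split on whitespace. The class name Tokenizer is illustrative and not part of the package.

```java
import java.util.Vector;

// Hypothetical tokenizer sketch, not the real Engine code.
class Tokenizer {
    // Assumed set of single-character command symbols.
    static final String SYMBOLS = "/~&";

    static Vector<String> tokenize(String search) {
        Vector<String> tokens = new Vector<>();
        // The Engine prepends "/" so the first word has a command in front of it.
        String input = "/" + search;
        StringBuilder word = new StringBuilder();
        for (char c : input.toCharArray()) {
            boolean symbol = SYMBOLS.indexOf(c) >= 0;
            if (symbol || Character.isWhitespace(c)) {
                // End of a plain word: flush it.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                // Symbols are tokens in their own right; whitespace is dropped.
                if (symbol) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }
}
```

Splitting symbols off as separate tokens is what turns "~5" into "~" followed by "5". Words such as "thesaurus" are left for the Hashtable lookup to recognise.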
SearchWord is an interface, implemented in several ways. The Engine selects which SearchWord to use from a Hashtable of SearchWords. The members of this Hashtable are the available SearchWords, keyed on a token (in this example the matching tokens are /, ~, &, & and thesaurus). A DefaultParamWord is created for the words in the search string that do not have keys in the Hashtable (aaron, moses and talk).
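A sketch of the Hashtable lookup might look like the code below. It assumes CommandWord and ParamWord are sub-interfaces of SearchWord (the text only says they inherit from it) and uses SymbolCommand and ThesaurusWord as hypothetical stand-ins for whatever concrete SearchWords the package actually registers.

```java
import java.util.Hashtable;
import java.util.Vector;

// SearchWord and its two branches, as described in the text.
interface SearchWord {}
interface CommandWord extends SearchWord {}
interface ParamWord extends SearchWord {}

// Hypothetical stand-in for a registered command such as "/", "~" or "&".
class SymbolCommand implements CommandWord {
    final String symbol;
    SymbolCommand(String symbol) { this.symbol = symbol; }
}

// Hypothetical stand-in for a registered parameter word such as "thesaurus".
class ThesaurusWord implements ParamWord {}

// Created on the fly for any token that has no key in the Hashtable.
class DefaultParamWord implements ParamWord {
    final String word;
    DefaultParamWord(String word) { this.word = word; }
}

class WordLookup {
    // Build the table of known SearchWords, keyed on their token.
    static Hashtable<String, SearchWord> knownWords() {
        Hashtable<String, SearchWord> known = new Hashtable<>();
        for (String symbol : new String[] {"/", "~", "&"}) {
            known.put(symbol, new SymbolCommand(symbol));
        }
        known.put("thesaurus", new ThesaurusWord());
        return known;
    }

    // Map each token to its registered SearchWord, or wrap it in a DefaultParamWord.
    static Vector<SearchWord> lookup(Vector<String> tokens, Hashtable<String, SearchWord> known) {
        Vector<SearchWord> words = new Vector<>();
        for (String token : tokens) {
            SearchWord registered = known.get(token);
            words.add(registered != null ? registered : new DefaultParamWord(token));
        }
        return words;
    }
}
```

Because the table holds one instance per token, both & tokens resolve to the same SymbolCommand object, while every unknown word (including "5") gets its own DefaultParamWord.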
The Vector is better understood like this:

    / aaron
    ~ 5
    & moses
    & thesaurus talk
Each of these 9 elements in the Vector is a SearchWord. The first element on each line (/, ~, & and &) is a CommandWord; the others (aaron, 5, moses, thesaurus and talk) are ParamWords. CommandWord and ParamWord inherit from SearchWord.
So in other words you could write the Vector like this (the new bullet points are for each CommandWord; the Vector itself is strictly 1D and does not care at all about the difference between CommandWords and ParamWords):
It is worth noting that all the DefaultParamWords are created from unknown tokens. The other SearchWords, both the CommandWords (/, ~ and &) and the ParamWord (thesaurus), were members of the Hashtable in the Engine.
The search Engine loops, taking an element from the Vector, expecting it to be a CommandWord, and calling CommandWord.updatePassage(). Each CommandWord has the opportunity to take further elements from the Vector and treat them as ParamWords. Any mistake in the search string therefore shows up as a ClassCastException (for example, a CommandWord found where a ParamWord was expected), which is caught and translated into a sensible error message.
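A minimal sketch of that loop is shown below, with a Passage modelled as a TreeSet of verse ordinals and a two-word toy index in place of a real Book. The updatePassage(Iterator, TreeSet) signature, the AddCommand, BlurCommand and RetainCommand classes, and the index contents are all hypothetical; only the overall shape (cast to CommandWord, let it pull ParamWords, translate ClassCastException) comes from the text.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;
import java.util.Vector;

interface SearchWord {}
interface ParamWord extends SearchWord { String getWord(); }
interface CommandWord extends SearchWord {
    // Hypothetical signature: a command may pull its parameters from the iterator.
    void updatePassage(Iterator<SearchWord> words, TreeSet<Integer> passage);
}

class DefaultParamWord implements ParamWord {
    private final String word;
    DefaultParamWord(String word) { this.word = word; }
    public String getWord() { return word; }
}

// Toy verse index standing in for a real Book (hypothetical data).
class ToyBible {
    static final Map<String, Set<Integer>> INDEX =
        Map.of("aaron", Set.of(10, 20), "moses", Set.of(12, 30));

    static Set<Integer> find(Iterator<SearchWord> words) {
        // The cast throws ClassCastException if the next word is not a ParamWord.
        String word = ((ParamWord) words.next()).getWord();
        return INDEX.getOrDefault(word, Set.of());
    }
}

class AddCommand implements CommandWord {       // "/"
    public void updatePassage(Iterator<SearchWord> words, TreeSet<Integer> passage) {
        passage.addAll(ToyBible.find(words));
    }
}

class RetainCommand implements CommandWord {    // "&"
    public void updatePassage(Iterator<SearchWord> words, TreeSet<Integer> passage) {
        passage.retainAll(ToyBible.find(words));
    }
}

class BlurCommand implements CommandWord {      // "~"
    public void updatePassage(Iterator<SearchWord> words, TreeSet<Integer> passage) {
        int n = Integer.parseInt(((ParamWord) words.next()).getWord());
        TreeSet<Integer> blurred = new TreeSet<>();
        for (int verse : passage) {
            for (int d = -n; d <= n; d++) {
                if (verse + d > 0) blurred.add(verse + d);
            }
        }
        passage.clear();
        passage.addAll(blurred);
    }
}

class Engine {
    static TreeSet<Integer> search(Vector<SearchWord> words) {
        TreeSet<Integer> passage = new TreeSet<>();
        Iterator<SearchWord> it = words.iterator();
        try {
            while (it.hasNext()) {
                // Every step must start with a CommandWord.
                CommandWord command = (CommandWord) it.next();
                command.updatePassage(it, passage);
            }
        } catch (ClassCastException ex) {
            throw new IllegalArgumentException("Badly formed search: a word is in the wrong place", ex);
        } catch (NoSuchElementException ex) {
            throw new IllegalArgumentException("Badly formed search: a command is missing its parameter", ex);
        }
        return passage;
    }
}
```

With this toy index, "/ aaron ~ 2 & moses" yields the single verse 12, and a Vector such as { "/", "&" } fails with the translated error rather than a raw ClassCastException.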
What follows does NOT represent the current design; I've left it here to show the steps I went through to get to the current design. There were two possible designs: the smart engine model and the smart data model. The latter won. The ideas were like this:
In the smart engine model, the engine understands how to parse the search string into a series of calls to the relevant places. The engine is extensible by adding new 'commands' (which must follow a SearchWords interface, now deleted). This model has the advantage of simplicity and memory efficiency.
In the smart data model, the engine simply turns the search string into a data structure; the nodes of this structure are instantiated as classes that implement an interface with a getAnswer() method. Calling getAnswer() on the root node recurses down to find the answer. The big advantage of this model is that it can be readily extended to several types of interface, from the most basic GUI find dialog to a ridiculously powerful command line version.
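The smart data model can be sketched as a small tree of nodes sharing one getAnswer() method. The node classes (WordNode, BlurNode, AndNode), the SearchNode interface name and the toy verse index are hypothetical illustrations, not the package's classes.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Every node of the parsed search answers for its own subtree.
interface SearchNode {
    Set<Integer> getAnswer();
}

// Leaf: the verses containing one word (toy index, hypothetical data).
class WordNode implements SearchNode {
    static final Map<String, Set<Integer>> INDEX =
        Map.of("aaron", Set.of(10, 20), "moses", Set.of(12, 30));
    private final String word;
    WordNode(String word) { this.word = word; }
    public Set<Integer> getAnswer() {
        return new TreeSet<>(INDEX.getOrDefault(word, Set.of()));
    }
}

// Verses within n of any verse found by the child.
class BlurNode implements SearchNode {
    private final SearchNode child;
    private final int n;
    BlurNode(SearchNode child, int n) { this.child = child; this.n = n; }
    public Set<Integer> getAnswer() {
        Set<Integer> result = new TreeSet<>();
        for (int verse : child.getAnswer()) {
            for (int d = -n; d <= n; d++) {
                if (verse + d > 0) result.add(verse + d);
            }
        }
        return result;
    }
}

// Verses found by both children.
class AndNode implements SearchNode {
    private final SearchNode left, right;
    AndNode(SearchNode left, SearchNode right) { this.left = left; this.right = right; }
    public Set<Integer> getAnswer() {
        Set<Integer> result = new TreeSet<>(left.getAnswer());
        result.retainAll(right.getAnswer());
        return result;
    }
}
```

Calling getAnswer() on new AndNode(new BlurNode(new WordNode("aaron"), 2), new WordNode("moses")) recurses through the whole tree. A GUI dialog or a command-line parser would only differ in how it builds the tree.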
I toyed with an XML-based engine, in which the Engine would parse the search string into an XML Document, something like this:
XML representation of the above search, and the code that implements it
<search>                            // ref = new Passage();
  <add>                             // ref.addAll(
    <word>aaron</word>              //     default_bible.getPassages("aaron")
  </add>                            // );
  <blur>5</blur>                    // ref.blur(5);
  <retain>                          // ref.retainAll(
    <word>moses</word>              //     default_bible.getPassages("moses")
  </retain>                         // );
  <retain>                          // ref.retainAll(
    <words>                         //     default_bible.getPassages(
      <thesaurus>talk</thesaurus>   //         thesaurus.getSynonyms("talk")
    </words>                        //     )
  </retain>                         // );
</search>                           // return ref;
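As a rough illustration of how such a document could be executed, the sketch below walks the XML with the standard javax.xml DOM parser and applies each element to a TreeSet of verse ordinals. The XmlSearch name, the toy index and its verse numbers are hypothetical, and the <words>/<thesaurus> branch is left out to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlSearch {
    // Toy index standing in for default_bible.getPassages() (hypothetical data).
    static final Map<String, Set<Integer>> INDEX =
        Map.of("aaron", Set.of(10, 20), "moses", Set.of(12, 30));

    static TreeSet<Integer> run(String xml) {
        Element search;
        try {
            search = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid search document", e);
        }
        TreeSet<Integer> ref = new TreeSet<>();                        // ref = new Passage();
        for (Element step : children(search)) {
            switch (step.getTagName()) {
                case "add":    ref.addAll(passages(step));    break;     // ref.addAll(...)
                case "retain": ref.retainAll(passages(step)); break;     // ref.retainAll(...)
                case "blur":   blur(ref, Integer.parseInt(step.getTextContent().trim())); break;
                default: throw new IllegalArgumentException("Unsupported element: " + step.getTagName());
            }
        }
        return ref;
    }

    // Union of the verses for every <word> child, like default_bible.getPassages(word).
    static Set<Integer> passages(Element parent) {
        Set<Integer> verses = new TreeSet<>();
        for (Element word : children(parent)) {
            verses.addAll(INDEX.getOrDefault(word.getTextContent().trim(), Set.of()));
        }
        return verses;
    }

    // Widen the passage to include every verse within n of a verse already in it.
    static void blur(TreeSet<Integer> ref, int n) {
        TreeSet<Integer> blurred = new TreeSet<>();
        for (int verse : ref) {
            for (int d = -n; d <= n; d++) {
                if (verse + d > 0) blurred.add(verse + d);
            }
        }
        ref.clear();
        ref.addAll(blurred);
    }

    // The element children of a node, skipping whitespace text nodes.
    static List<Element> children(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }
}
```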
The benefit of this is that it would allow us to easily remote the whole search engine. I seem to have an XML disease, so why shouldn't it show up here too!
However, I decided that a remote search engine was of little benefit, since the individual SearchWords can be remoted via a very simple stub, giving an engine that can be remoted piecemeal. The only drawback to this solution is on high-latency networks (erm, like the Internet), where a set of simple requests can take a lot longer than a single complex one. Even so, I am sure that I could XMLize or serialize the Vector invented above.
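A small sketch of the "serialize the Vector" idea uses standard Java object serialization, so that a whole search could travel as one request. It assumes the token strings are sent rather than the SearchWord objects (which would otherwise need to implement Serializable), with the far end doing its own Hashtable lookup; VectorWire is a hypothetical name. Java deserialization should only ever be applied to trusted input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Vector;

class VectorWire {
    // Serialize the token Vector into one byte array that could go out in a single request.
    static byte[] toBytes(Vector<String> tokens) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tokens);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the Vector on the receiving side.
    @SuppressWarnings("unchecked")
    static Vector<String> fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Vector<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```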
Some code to do Soundex matching (JavaScript):
// SOUNDEX value for each letter:
// -1 means the letter is not coded, but it still separates coded letters
// 1 is for BFPV
// 2 is for CGJKQSXZ
// 3 is for DT
// 4 is for L
// 5 is for MN
// 6 is for R
const SNDX = {
  a: -1, b: 1, c: 2, d: 3, e: -1, f: 1, g: 2, h: -1, i: -1,
  j: 2, k: 2, l: 4, m: 5, n: 5, o: -1, p: 1, q: 2, r: 6,
  s: 2, t: 3, u: -1, v: 1, w: -1, x: 2, y: -1, z: 2
};

// Check that the input is a non-empty string containing only letters
function isSurname(name) {
  return typeof name === "string" && /^[A-Za-z]+$/.test(name);
}

// Collapse out directly adjacent sounds
// 1. Assumes that surname.length >= 1
// 2. Assumes that surname contains only lowercase letters
function collapse(surname) {
  if (surname.length === 1) {
    return surname;
  }
  const right = collapse(surname.substring(1));
  if (SNDX[surname.charAt(0)] === SNDX[right.charAt(0)]) {
    return surname.charAt(0) + right.substring(1);
  }
  return surname.charAt(0) + right;
}

// Compute the SOUNDEX code for the surname, e.g. soundex("Robert") returns "R-163"
function soundex(surname) {
  if (!isSurname(surname)) {
    throw new Error("Please enter only letters in the surname.");
  }
  const stage1 = collapse(surname.toLowerCase());
  let result = stage1.charAt(0).toUpperCase() + "-"; // retain first letter, separate with a dash
  const stage2 = stage1.substring(1);
  let count = 0;
  for (let i = 0; i < stage2.length && count < 3; i++) {
    const code = SNDX[stage2.charAt(i)];
    if (code > 0) {
      result += code;
      count++;
    }
  }
  for (; count < 3; count++) {
    result += "0"; // pad to three digits
  }
  return result;
}
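Since the rest of the search package is Java, here is a hedged sketch of the same algorithm ported to Java, plus a hypothetical soundsLike helper that does the actual matching by comparing codes. SoundexMatch and soundsLike are illustrative names. The port keeps the table above, which treats H and W like vowels, so results can differ from standard American Soundex (Ashcraft comes out as A-226 here rather than the standard A261).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SoundexMatch {
    // Same table as the JavaScript: -1 = not coded (separates codes), otherwise the digit.
    private static final int[] CODES = {
        -1, 1, 2, 3, -1, 1, 2, -1, -1, 2, 2, 4, 5,   // a..m
         5, -1, 1, 2, 6, 2, 3, -1, 1, -1, 2, -1, 2   // n..z
    };

    private static int code(char c) {
        return CODES[c - 'a'];
    }

    // Collapse directly adjacent letters that share a code, keeping the first.
    private static String collapse(String s) {
        if (s.length() == 1) {
            return s;
        }
        String right = collapse(s.substring(1));
        if (code(s.charAt(0)) == code(right.charAt(0))) {
            return s.charAt(0) + right.substring(1);
        }
        return s.charAt(0) + right;
    }

    // Soundex code in the same "R-163" format as the JavaScript version.
    static String soundex(String surname) {
        if (surname == null || !surname.matches("[A-Za-z]+")) {
            throw new IllegalArgumentException("Only letters are allowed: " + surname);
        }
        String stage1 = collapse(surname.toLowerCase(Locale.ROOT));
        StringBuilder result = new StringBuilder();
        result.append(Character.toUpperCase(stage1.charAt(0))).append('-');
        int count = 0;
        for (int i = 1; i < stage1.length() && count < 3; i++) {
            int c = code(stage1.charAt(i));
            if (c > 0) {
                result.append(c);
                count++;
            }
        }
        for (; count < 3; count++) {
            result.append('0');
        }
        return result.toString();
    }

    // Words from the list whose Soundex code equals that of the target.
    static List<String> soundsLike(String target, List<String> words) {
        String wanted = soundex(target);
        List<String> matches = new ArrayList<>();
        for (String w : words) {
            if (soundex(w).equals(wanted)) {
                matches.add(w);
            }
        }
        return matches;
    }
}
```

For example, soundsLike("Robert", ...) picks out "Rupert" because both code to R-163.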