<table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="80%" id="AutoNumber1">
<a href="../index.htm">
<img border="0" src="w4small.GIF" width="101" height="101"></a></td>
<p align="center"><span lang="pt"><font face="Arial" size="7">MODULE IRI</font></span></td>
<p align="center">&nbsp;</td>
<p align="center"><font face="Arial">(c)
<a href="http://centria.di.fct.unl.pt/~cd">Carlos Viegas Damásio</a>,
<span lang="pt">October </span>2003</font></td>
<p align="left">&nbsp;</p>
<table border="0" cellpadding="0" cellspacing="5" style="border-collapse: collapse" bordercolor="#111111" width="80%" id="AutoNumber2">
<font face="Arial Black" size="5" color="#0000FF"><span lang="pt">
</span>Description</font></td>
<td width="100%">This module implements a set of library predicates for
parsing and working with IRI references, according to RFC 2396, RFC
2732, and the draft proposals of RFC 2396 bis and Internationalized
Resource Identifiers:<ul>
<li>This module implements an IRI parser and resolution of IRI
<li>It also provides conversion predicates from atoms and strings to IRI
references, and vice versa.</li>
<li>The mapping of IRIs to ordinary URIs is also supported.</li>
<li>Resolves IRI references according to RFC 2396 bis.</li>
<p>Currently, the parser does not fully validate IPv4 and IPv6
addresses, i.e. some invalid IPv4 and IPv6 addresses may be accepted.
Since this module depends on draft specifications, the user is advised to
restrict the usage of this module to the parsing of ordinary URI
<span lang="pt"><font face="Arial Black" size="5" color="#0000FF">
<a name="sec2"></a>2. Internationalized Resource Identifiers References
Representation</font></span><p>The user can use this module to construct a
Prolog term representation of Internationalized Resource Identifier
references and to resolve them. IRIs can be parsed either from lists of
Unicode character codes terminated by -1 or from atoms whose names encode
the IRI in UTF-8. In both cases, the following term is constructed when
a syntactically correct IRI is provided:</p>
<table border="1" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%" id="AutoNumber3">
<td width="100%" colspan="2"><span lang="pt"><b>Internationalized
Resource Identifiers References Representation</b></span></td>
<td width="100%" colspan="2"><span lang="pt">
iriref(Scheme,Authority,Path,Query,Fragment)</span></td>
<td width="16%"><span lang="pt">Scheme:</span></td>
<td width="84%">The term <i>scheme( ListOfCodes )</i> represents an
existing scheme component part in the IRI reference, where ListOfCodes
is a list of Unicode character codes. <br>
The empty list <i>[]</i> if there is no scheme component in the IRI
<td width="16%"><span lang="pt">Authority:</span></td>
<td width="84%">The term <i>authority( UserInfo, Host, Port )</i>,
where UserInfo, Host, and Port are (possibly empty) lists of Unicode
The empty list <i>[]</i> if there is no authority component part in
the IRI reference.</td>
<td width="16%"><span lang="pt">Path:</span></td>
<td width="84%">The term <i>path( rel, Segments ) </i>or<i> path( abs,
Segments) </i>represents a relative or an absolute path, respectively.
Segments is a (possibly empty) list of terms of the form <i>
segment(ListOfCodes)</i>, where ListOfCodes is a (possibly empty) list
of Unicode character codes.<br>
The empty list <i>[]</i> if there is no path component.</td>
<td width="16%"><span lang="pt">Query:</span></td>
<td width="84%">The term <i>query( ListOfCodes ) </i>represents an
existing query component part in the IRI reference, where ListOfCodes
is a list of Unicode character codes.<br>
The empty list <i>[]</i> if there is no query component in the IRI
<td width="16%"><span lang="pt">Fragment:</span></td>
<td width="84%">The term <i>fragment( ListOfCodes )</i> represents an
existing fragment component in the IRI reference, where ListOfCodes is
a list of Unicode character codes.<br>
The empty list <i>[]</i> if there is no fragment part in the IRI.</td>
<p>The main predicates are <b>parseIRIref/2</b>, <b>parseIRIref/3</b>
and <b>atom2iriref/2</b>, for parsing and constructing the IRI
reference term representation, and <b>resolveIRIref/3</b>, for resolving
relative references with respect to a base IRI. The separators of the
several IRI components, i.e. '<font face="Courier New">:</font>', '<font face="Courier New">@</font>', '<font face="Courier New">/</font>', '<font face="Courier New">?</font>'
and '<font face="Courier New">#</font>', are not maintained in the IRI reference term
representation.</p>
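<p>As a rough illustration of this representation (a hand-written sketch derived from the description above, not output produced by the module), the hypothetical IRI reference <font face="Courier New">http://user@www.example.com:8080/a/b?q=1#top</font> would be expected to correspond to the term below. XSB reads a double-quoted string as a list of character codes, so <font face="Courier New">"http"</font> stands for <font face="Courier New">[104,116,116,112]</font>.</p>
<p><font face="Courier New">% Sketch only: components written as described in the table above.<br>
% The example IRI is an assumption made for illustration.<br>
iriref( scheme("http"),<br>
authority("user", "www.example.com", "8080"),<br>
path(abs, [segment("a"), segment("b")]),<br>
query("q=1"),<br>
fragment("top") )</font></p>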
<span lang="pt"><font face="Arial Black" size="5" color="#0000FF">
<a name="sec3"></a>3. Installation of the IRI Module</font></span><ol>
<li>Unpack the <a href="iri-1.0-beta.zip">package</a> containing the
source files to a library directory. This package should contain the
<li><b>iri.P</b> and <b>iriparse.P</b>. The latest version of <b>
utilities.P</b> and <b>builtins.P</b> should also be available, and
are included in the package.</li>
<li>The file <b>iriparse.G</b>, containing the source code for
generating <b>iriparse.P</b>, if necessary. To generate <b>iriparse.P</b>
the user should use our lookup DCG parser generator.</li>
<li>This user's manual and the file <b>testiri.P</b> are also
provided. The test file illustrates the parsing and resolution of
<li>Compile the main file with the goal
<font face="Courier New">?-[iri]</font>.</li>
<li>The module can be tested by compiling <b>testiri.P</b> and
executing the goals <font face="Courier New">?- testiris.</font> and
<font face="Courier New">?- testresolution.</font></li>
<li>The module's predicates can be made available through import declarations.
The full set of predicates is described in the following section.</li>
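<p>For instance, a program that only parses and resolves references might start with declarations along the following lines. This is a minimal sketch: it assumes the module is named <font face="Courier New">iri</font>, after the file <b>iri.P</b>, and the selection of predicates is arbitrary.</p>
<p><font face="Courier New">% Assumption: the module name is iri (taken from the file name iri.P).<br>
:- import parseIRIref/2, parseIRIref/3, resolveIRIref/3 from iri.<br>
:- import atom2iriref/2, iriref2atom/2 from iri.</font></p>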
<span lang="pt"><font face="Arial Black" size="5" color="#0000FF">
<a name="sec4"></a>4. Usage of the IRI Module</font></span><p>
<font face="Arial Black" color="#0000FF">4.1 Parsing of IRI references:</font></p>
<p><font size="2">IRI references can be parsed using <b>parseIRIref/2</b>
and <b>parseIRIref/3</b>. For efficiency, the programmer should use <b>
parseIRIref/2</b>, which requires terminated lists of Unicode character
<li><font face="Courier New">parseIRIref( + TermUCSList, IRIref )</font></li>
<p>Given a list of Unicode character codes terminated with -1, <b>
parseIRIref/2</b> returns the term representation of the IRI reference.
It fails if the first argument is not a syntactically correct IRI
reference. Notice that the separators of the several IRI components do
not appear in the IRI reference term representation.<br>
The production ihostname of Internationalized Resource Identifiers is
not fully implemented: the parser only checks that the ihostname part does not
contain illegal characters. The syntax of IPv6 addresses is not checked,
and IPv4 addresses are checked only if UserInfo is present.</p>
<p><b><font color="#FF0000">Example:</font></b></p>
<p><font face="Courier New">| ?- append(
"http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING", [-1],
_Codes ), <br>
parseIRIref( _Codes, Ref ).<br>
<br>
Ref = iriref(scheme([104,116,116,112]),<br>
authority([],[119,119,119,46,105,99,115,46,117,99,105,46,101,100,117],[]),<br>
path(abs,[segment([112,117,98]),<br>
segment([105,101,116,102]),<br>
segment([117,114,105]),<br>
segment([104,105,115,116,111,114,105,99,97,108,46,104,116,109,108])<br>
]<br>
),<br>
[],<br>
fragment([87,65,82,78,73,78,71])<br>
);</font></p>
<li><font face="Courier New">parseIRIref( + Terminated, + ListOfCodes,
<p>The first argument <font face="Courier New">Terminated</font> may
take the values <font face="Courier New">yes</font> or
<font face="Courier New">no</font>, indicating respectively whether or not the
list of Unicode character codes in the second argument is terminated with -1. A
call of the form <font face="Courier New">parseIRIref( yes, ListOfCodes,
IRIref )</font> is equivalent to <font face="Courier New">parseIRIref(
ListOfCodes, IRIref )</font>. If the first argument is
<font face="Courier New">no</font>, then the terminator -1 is appended to
the second argument and <b>parseIRIref/2</b> is called. Because of this extra
append step, the <font face="Courier New">no</font> form should be used sparingly.</p>
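<p>The relation between the two forms can be sketched as follows. The query is illustrative only: the example IRI is arbitrary, and <font face="Courier New">append/3</font> is assumed to be available at the prompt, as in the example for <b>parseIRIref/2</b>. Since the <font face="Courier New">no</font> form is described as appending -1 and then calling <b>parseIRIref/2</b>, both calls are expected to build the same term.</p>
<p><font face="Courier New">% Sketch: parse the same unterminated code list in both ways.<br>
| ?- parseIRIref( no, "http://www.example.com/a/b", Ref1 ), <br>
append( "http://www.example.com/a/b", [-1], _Codes ), <br>
parseIRIref( _Codes, Ref2 ), <br>
Ref1 == Ref2.</font></p>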
<p><font face="Arial Black" color="#0000FF">4.2 Testing and Inspection of
IRI reference terms:</font></p>
<p>The following predicates determine the type of the IRI reference
parsed or constructed:</p>
<li><font face="Courier New">isIRIref(+ IRIref )</font><br>
This predicate succeeds when its argument is a term with the IRI reference function
symbol. For efficiency, it does not check if its component arguments are
<li><font face="Courier New">isIRI(+ IRIref )</font><br>
This predicate succeeds when its argument is an IRI, i.e. an IRI
reference with a non-empty scheme component.<br>
<li><font face="Courier New">isAbsoluteIRI(+ IRIref )</font><br>
This predicate succeeds when its argument is an absolute IRI, i.e. an
IRI without a fragment part.<br>
<li><font face="Courier New">isRelativeIRI(+ IRIref )</font><br>
This predicate succeeds when its argument is a relative IRI, i.e. an IRI
reference with an empty scheme component.</li>
<p>To obtain the several components of an IRI reference term, the
following predicates may be used:</p>
<li><font face="Courier New">getIRIrefScheme(+ IRIref,Scheme)<br>
</font>Obtains the scheme component of a given IRI reference. The scheme
component is a term of the form <i>scheme( ListOfCodes )</i> or an empty
list, as described in <a href="#sec2">Section 2</a>.<br>
<li><font face="Courier New">getIRIrefAuthority(+ IRIref,Authority)<br>
</font>Obtains the authority component of a given IRI reference. The
authority component is a term of the form <i>authority( UserInfo, Host,
Port)</i> or an empty list, as described in <a href="#sec2">Section 2</a>.<br>
<li><font face="Courier New">getIRIrefPath(+ IRIref,Path)<br>
</font>Obtains the path component of a given IRI reference. The path
component is a term of the form <i>path( AbsRel, Segments )</i> or an
empty list, as described in <a href="#sec2">Section 2</a>.<br>
<li><font face="Courier New">getIRIrefQuery(+ IRIref,Query)<br>
</font>Obtains the query component of a given IRI reference. The query
component is a term of the form <i>query( ListOfCodes )</i> or an empty
list, as described in <a href="#sec2">Section 2</a>.<br>
<li><font face="Courier New">getIRIrefFragment(+ IRIref,Fragment)<br>
</font>Obtains the fragment component of a given IRI reference. The
fragment component is a term of the form <i>fragment( ListOfCodes )</i>
or an empty list, as described in <a href="#sec2">Section 2</a>.<br>
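<p>A short sketch combining the test and access predicates is given below. It reuses the IRI from the example in Section 4.1; the bindings noted in the comments are read off the term shown there rather than taken from a separate run.</p>
<p><font face="Courier New">| ?- parseIRIref( no, "http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING", Ref ), <br>
isIRI( Ref ), <br>
getIRIrefScheme( Ref, Scheme ), <br>
getIRIrefQuery( Ref, Query ), <br>
getIRIrefFragment( Ref, Fragment ).<br>
% Expected, given the term shown in Section 4.1:<br>
% Scheme = scheme([104,116,116,112])<br>
% Query = []<br>
% Fragment = fragment([87,65,82,78,73,78,71])</font></p>
<p>Because this reference carries a fragment, the goal <font face="Courier New">isAbsoluteIRI( Ref )</font> would be expected to fail for it, according to the definition given above.</p>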
<p><font face="Arial Black" color="#0000FF">4.3 Construction of IRI
references:</font></p>
<p>The next predicates provide mechanisms to construct IRI
references dynamically. The recommended way to construct IRIs is to parse them from
lists of Unicode character codes. The predicates described in this section
should be used with care, since no checking of arguments is performed. </p>
<li><font face="Courier New"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">createEmptyIRI</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">ref(</span></font></span><font size="2"><span style="font-size: 10pt">
</span></font></font>This predicate creates an empty IRI reference<br>
<li><font face="Courier New"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">createIRI</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">ref(</span></font></span><font size="2"><span style="font-size: 10pt">
+ Scheme, + Authority, + Path, + Query, + Fragment, IRIref )<br>
</span></font></font>This predicate creates an IRI reference from the
several components of the IRI reference. The input arguments are either
empty lists or component terms as described in <a href="#sec2">Section 2</a>
<li><font face="Courier New"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">setIRIrefScheme</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">(</span></font></span><font size="2"><span style="font-size: 10pt">
+ <span class="SpellE">OldIRIref</span>, + Scheme, <span class="SpellE">
NewIRIref</span> )<br>
</span></font></font>The predicate <b>setIRIrefScheme/3</b> replaces
the scheme component in the IRI reference term <font face="Courier New">
OldIRIref</font> by the list of Unicode character codes in argument
<font face="Courier New">Scheme</font>, returning the new IRI reference
term in the last argument <font face="Courier New">NewIRIref</font>.<br>
<li><font face="Courier"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">setIRIrefAuthority</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">(</span></font></span></font><font size="2"><span style="font-size: 10pt"><font face="Courier">
+ <span class="SpellE">OldIRIref</span>, + <span class="SpellE">UserInfo</span>,
+ Host, + Port, <span class="SpellE">NewIRIref</span> )<br>
</span></font>The predicate <b>setIRIrefAuthority/5</b> replaces the
authority component in the IRI term <font face="Courier New">OldIRIref</font>
by the authority term constructed from the lists of Unicode character
codes arguments <font face="Courier New">UserInfo</font>,
<font face="Courier New">Host</font> and<font face="Courier New"> Port.</font>
The new IRI reference term is returned in the last argument
<font face="Courier New">NewIRIref.<br>
<li><font face="Courier"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">setIRIref</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">Path(</span></font></span></font><font size="2"><span style="font-size: 10pt"><font face="Courier">
+ <span class="SpellE">OldIRIref</span>, + <span class="SpellE">AbsRel</span>,
+ Path, <span class="SpellE">NewIRIref</span> )<br>
</span></font>The predicate <b>setIRIrefPath/4</b> replaces the
path component in the IRI term <font face="Courier New">OldIRIref</font>
by the path term constructed from the list of segments in argument
<font face="Courier New">Path</font>, and the flag
<font face="Courier New">AbsRel</font>, which may take the values
<font face="Courier New">abs</font> or <font face="Courier New">rel</font>.
The new IRI reference term is returned in the last argument
<font face="Courier New">NewIRIref</font>.<br>
<li><font face="Courier New"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">setIRIrefQuery</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">(</span></font></span><font size="2"><span style="font-size: 10pt">
+ <span class="SpellE">OldIRIref</span>, + Query, <span class="SpellE">
NewIRIref</span> )<br>
</span></font></font>The predicate <b>setIRIrefQuery/3</b> replaces the
query component in the IRI reference term <font face="Courier New">
OldIRIref</font> by the list of Unicode character codes in argument
<font face="Courier New">Query</font>, returning the new IRI reference
term in the last argument <font face="Courier New">NewIRIref</font>.<br>
<li><font face="Courier New"><span class="SpellE"><span class="GramE">
<font size="2"><span style="font-size: 10pt">setIRIrefFragment</span></font></span></span><span class="GramE"><font size="2"><span style="font-size: 10pt">(</span></font></span><font size="2"><span style="font-size: 10pt">
+ <span class="SpellE">OldIRIref</span>, + Fragment, <span class="SpellE">
NewIRIref</span> )<br>
</span></font></font>The predicate <b>setIRIrefFragment/3</b> replaces
the fragment component in the IRI reference term
<font face="Courier New">OldIRIref</font> by the list of Unicode
character codes in argument <font face="Courier New">Fragment</font>,
returning the new IRI reference term in the last argument
<font face="Courier New">NewIRIref</font>.</li>
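<p>The construction predicates can be combined as in the following sketch, which builds a reference from component terms and then sets its fragment. The host, path and fragment are made-up values, and since no argument checking is performed, the component terms must already be well formed.</p>
<p><font face="Courier New">% Sketch: components follow the term shapes of Section 2;<br>
% the fragment is passed as a plain list of character codes.<br>
| ?- createIRIref( scheme("http"), <br>
authority([], "www.example.com", []), <br>
path(abs, [segment("a"), segment("b")]), <br>
[], [], IRI0 ), <br>
setIRIrefFragment( IRI0, "top", IRI1 ), <br>
iriref2atom( IRI1, Atom ).</font></p>
<p>If the components are assembled as described in this section, <font face="Courier New">Atom</font> would be expected to denote <font face="Courier New">http://www.example.com/a/b#top</font>.</p>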
<p><font face="Arial Black" color="#0000FF">4.4 Resolution of IRI
references:</font></p>
<p>The IRI module implements resolution of IRI references according to the
algorithms described in RFC 2396 bis. Therefore, empty references are
allowed and abnormal relative path ".." segments are removed from the
<li><font face="Courier New">resolveIRIref( + IRIref, + BaseIRI, ResIRI)<br>
The first argument of <b>resolveIRIref/3</b> is an arbitrary IRI
reference term, while <font face="Courier New">BaseIRI </font>should
be an IRI term, i.e. a reference with a scheme component. The resolved IRI is
returned in the last argument.<br>
<font color="#FF0000"><b>Example:</b></font><br>
<font face="Courier New">| ?- atom2iriref(
'http://www.example.com:8080/a/b/c', BaseIRI ), <br>
atom2iriref( '../x/y&query#123', RelIRI ), <br>
resolveIRIref( RelIRI, BaseIRI, ResIRI ), <br>
iriref2atom( ResIRI, Resolved ).<br>
<br>
BaseIRI = iriref(scheme([104,116,116,112]),<br>
authority([],[119,119,119,46,101,120,97,109,112,108,101,46,99,111,109],[56,48,56,48]),<br>
path(abs,[segment([97]),segment([98]),segment([99])]),<br>
[],<br>
[]<br>
)<br>
RelIRI = iriref([],<br>
[],<br>
path(rel,[segment([46,46]),segment([120]),segment([121,38,113,117,101,114,121])]),<br>
[],<br>
fragment([49,50,51])<br>
)<br>
ResIRI = iriref(scheme([104,116,116,112]),<br>
authority([],[119,119,119,46,101,120,97,109,112,108,101,46,99,111,109],[56,48,56,48]),<br>
path(abs,[segment([97]),segment([120]),segment([121,38,113,117,101,114,121])]),<br>
[],<br>
fragment([49,50,51])<br>
)<br>
Resolved = http://www.example.com:8080/a/x/y&query#123;</font></li>
<p><font color="#0000FF" face="Arial Black">4.5 Conversion and mapping of
IRI references</font></p>
<li><font face="Courier New">atom2iriref( + AtomInUTF8, IRIref).<br>
This predicate converts an IRI reference represented as a UTF-8 sequence
of octets to the IRI reference term representation. It fails if the atom is
not a syntactically correct IRI reference.<br>
<li><font face="Courier New">iriref2atom( + IRIref, AtomInUTF8 ).</font><br>
Predicate iriref2atom/2 converts the IRI reference term representation
to an atom in UTF-8 encoding. <br>
<b><font color="#FF0000">Example:<br>
</font></b><font face="Courier New">| ?- atom2iriref(
'mailto:Carlos.Damasio@di.fct.unl.pt', IRIref ), <br>
iriref2atom(IRIref, Atom ).<br>
<br>
IRIref = iriref(scheme([109,97,105,108,116,111]),<br>
[],<br>
path(rel,[segment([67,97,114,108,111,115,46,68,97,109,97,115,105,111,<br>
64,100,105,46,102,99,116,46,117,110,108,46,112,116])]<br>
),<br>
[],<br>
[]<br>
)<br>
Atom = mailto:Carlos.Damasio@di.fct.unl.pt;<br>
<li><font face="Courier New">iriref2string( + IRIref, StringInUTF8) <br>
iriref2string( + IRIref, StringInUTF8, RestStringInUTF8 ).<br>
Predicates iriref2string convert an IRI reference term representation
to a list of Unicode characters in UTF-8 encoding. The three-argument
version returns an incomplete list, where RestStringInUTF8 is the
<li><font face="Courier New">iri2uri( + UCSList, URIList )<br>
iri2uri( + UCSList, URIList, RestURIList )</font><br>
Predicates iri2uri convert an IRI reference represented by a list of
Unicode character codes to a proper Uniform Resource Identifier, using
the algorithm described in Internationalized Resource Identifiers.
The three-argument version returns an incomplete list, where RestURIList
is the variable tail.<br>
<b><font color="#FF0000">Example:<br>
</font></b><font face="Courier New">| ?- iri2uri(
"mailto://Carlos.Damásio@di.fct.unl.pt", L ), <br>
atom_codes( URI, L ).<br>
<br>
L = [109,97,105,108,116,111,58,47,47,67,97,114,108,111,115,46,68,97,109,37,67,<br>
50,37,65,48,115,105,111,64,100,105,46,102,99,116,46,117,110,108,46,112,116]<br>
URI = mailto://Carlos.Dam%C2%A0sio@di.fct.unl.pt;<br>
<li><font face="Courier New">filename2uri( + UCSList, URIList )<br>
filename2uri( + UCSList, URIList, RestURIList )</font><br>
Predicates <b>filename2uri</b> take an absolute file path,
represented by a list of ASCII character codes, and convert it to a Uniform Resource
Identifier, escaping excluded characters. The three-argument version
returns an incomplete list, where RestURIList is the variable tail. This
predicate uses XSB-specific built-in predicates to detect the
underlying operating system in order to recognize path separators: '\'
in Windows-based systems. <br>
In the case of Windows operating systems, the absolute file path must
contain the drive letter. For non-Windows operating systems, the path
must start with '/'.<br>
<b><font color="#FF0000">Example (Windows):<br>
</font></b><font face="Courier New">| ?- filename2uri( "C:\My
Documents\Jo%A0o", L, [-1] ), <br>
parseIRIref( L, _IRI ), <br>
iriref2atom( _IRI, FilePath ).<br>
<br>
L = [102,105,108,101,58,47,47,67,58,47,77,121,37,50,48,68,111,99,117,109,101,110,116,115,47,74,111,37,65,48,111,-1]<br>
FilePath = file://C:/My%20Documents/Jo%A0o;<br>
<br>
| ?- filename2uri( "C:/My Documents/Jo%A0o", L, [-1] ), <br>
parseIRIref( L, _IRI ),<br>
iriref2atom( _IRI, FilePath ).<br>
<br>
L = [102,105,108,101,58,47,47,67,58,47,77,121,37,50,48,68,111,99,117,109,101,110,116,115,47,74,111,37,65,48,111,-1]<br>
FilePath = file://C:/My%20Documents/Jo%A0o</font></li>
<p><b><font color="#FF0000">Example (Non-Windows):<br>
</font></b><font face="Courier New">| ?- filename2uri( "/My
Documents/Jo%A0o", L, [-1] ), <br>
parseIRIref( L, _IRI ), <br>
iriref2atom( _IRI, FilePath ).<br>
<br>
L = [102,105,108,101,58,47,77,121,37,50,48,68,111,99,117,109,101,110,116,115,47,74,111,37,65,48,111,-1]<br>
FilePath = file:/My%20Documents/Jo%A0o;<br>
<p>&nbsp;</font></td>
<p><span lang="pt"><font face="Arial Black" size="5" color="#0000FF">
<a name="sec5"></a>5.
Copyright</font></span><p>This is an academic and experimental tool. It
cannot be used for commercial purposes without explicit consent of the
<p><span lang="pt"><font face="Arial Black" size="5" color="#0000FF">
<a name="sec6"></a>6.
Disclaimer</font></span><p>This is an academic and experimental tool. I
do not give any guarantee of any form regarding the use of this tool.</td>
<td width="100%" valign="top">
Last update: October 30th, 2003</td>