<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<title>an incomplete guide to mozilla/string</title>
<link rel="stylesheet" href="http://www.mozilla.org/projects/string/string-guide.css" title="remote stylesheet" type="text/css">
<link rel="alternate stylesheet" href="string-guide.css" title="local stylesheet" type="text/css">
<!-- ----|---------|---------|---------|---------|---------|---------|---------| -->
<!-- ...............................................................Front Matter -->
<h1>an incomplete guide to <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/string/">mozilla/string</a></h1>
<h1><font color="red">This document is now deprecated in favor of <a href="http://www.mozilla.org/projects/xpcom/string-guide.html">The new string guide</a>.</font></h1>
<div class="author-note">
<p>by <a href="http://ScottCollins.net/">Scott Collins</a><!-- /p -->
<p>last modified 8 April 2001<!-- /p -->
<div class="abstract">
This document <span class="LXRSHORTDESC">provides
an <a href="#users_guide">introduction</a> to the design and use of the string classes in mozilla,
<a href="#implementors_guide">detailed information</a> on their implementation and how one may extend them,
and <a href="#faq">answers</a> to frequently asked questions about strings</span>.
<h2><a name="contents">contents</a></h2>
<div class="contents">
<li><a href="#users_guide" >user's guide</a></li>
<li><a href="#implementors_guide">implementor's guide</a></li>
<li><a href="#faq" >frequently asked questions</a></li>
Please direct all comments, requests, and contributions to,
in order of preference,
the tracking bug <a href="http://bugzilla.mozilla.org/show_bug.cgi?id=70076">#70076</a> for this document,
the author <a class="exact-uri" href="mailto:scc@mozilla.org?subject=string-guide">scc@mozilla.org</a>, and/or
the newsgroup <a class="exact-uri" href="news:netscape.public.mozilla.xpcom">news:netscape.public.mozilla.xpcom</a>
(should there be a strings newsgroup?)
<div class="author-note">
A note to potential editors:
don't even <strong>consider</strong> modifying this document with an HTML editor.
That would destroy the internal formatting,
and make patches unmanageable.
<!-- ...............................................................User's Guide -->
<h1><a name="users_guide">user's guide</a></h1>
<div class="author-note">
Strings in mozilla are a world apart from <span class="code">char*</span>s.
If you don't know why they are different,
this section is the place for you to start.
If you're already familiar with the hierarchy of string classes in mozilla,
then you might want to skip ahead to the <a href="#implementors_guide">implementor's guide</a>
or the <a href="#faq">FAQ</a>.
<div class="contents">
<li><a href="#users_guide_introduction">introduction</a></li>
<li><a href="#users_guide_how_to" >using the string classes correctly; using the correct string class</a></li>
<li><a href="#users_guide_iterators" >using string iterators</a></li>
<li><a href="#users_guide_summary" >summary</a></li>
<h2><a name="users_guide_introduction">introduction</a></h2>
<h3>what is and what isn't a string?</h3>
A string is an opaque container holding a (possibly zero-length) linear sequence of characters.
Understanding the implications of this statement is the foundation for understanding all of mozilla's string classes.
<h3>readable and writable</h3>
<h3>dependent strings</h3>
<h2><a name="users_guide_how_to">using the string classes correctly; using the correct string class</a></h2>
<h3>basic string operations</h3>
<h4>concatenation</h4>
<h4>find and replace</h4>
<h4>calling a function that expects a different kind of string</h4>
<h4>converting between string classes</h4>
<h4>converting between encodings</h4>
<h3>selecting the right string class</h3>
<h4>user string classes</h4>
<h4>selecting the right string class for a parameter</h4>
<h4>selecting the right string class for a local variable</h4>
<h4>selecting the right string class for a member variable</h4>
<h4>selecting the right string class for a return value</h4>
<h4>selecting the right string class in IDL</h4>
<h2><a name="users_guide_iterators">using string iterators</a></h2>
<h3>what is an iterator?</h3>
<h3>reading iterators and writing iterators</h3>
<h3>`chunky' iterating for efficiency</h3>
<h3><span class="code">copy_string</span>, character sources and sinks</h3>
<h3>encoding conversion iterators</h3>
<h2><a name="users_guide_summary">summary</a></h2>
<!-- ........................................................Implementor's Guide -->
<h1><a name="implementors_guide">implementor's guide</a></h1>
<div class="author-note">
<div class="contents">
<!-- ........................................................................FAQ -->
<h1><a name="faq">frequently asked questions</a></h1>
<div class="author-note">
<div class="contents">
I have a wide string, i.e., an instance of a class derived from <span class="code">nsAString</span>
<li>I want a pointer to the characters</li>
<li>I want a narrow string</li>
<li>I want to <span class="code">printf</span> it</li>
I have a <span class="code">PRUnichar*</span>
<li>I want a wide string</li>
<li>I want a narrow string</li>
<li>I want to <span class="code">printf</span> it</li>
I have a narrow string, i.e., an instance of a class derived from <span class="code">nsACString</span>
<li>I want a pointer to the characters</li>
<li>I want a wide string</li>
<li>I want to <span class="code">printf</span> it</li>
I have a <span class="code">char*</span>
<li>I want a wide string</li>
<li>I want a narrow string</li>
I have a literal character sequence, e.g., <span class="code">"Hello, World!\n"</span>
<li>I want a wide string</li>
<li>I want a narrow string</li>
<li>What's the best way to return a string?</li>
<li>How can I get a pointer to the characters in a string?</li>
<li>How can I <span class="code">printf</span> a string?</li>
<table class="chart">
<th colspan="5">you have some <span class="code">char</span>s</th>
<th><span class="code">'x'</span></th>
<th><span class="code">char c</span></th>
<th><span class="code">"foo"</span></th>
<th><span class="code">char* cp</span></th>
<th><span class="code">nsACString& cs</span></th>
<th class="row-label"><span class="code">char</span></th>
<td colspan="2">.</td>
<!-- "foo" --> <td><span class="code">[]</span></td>
<!-- char* cp --> <td><span class="code">[]</span></td>
<!-- nsACString& cs --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
<th class="row-label"><span class="code">PRUnichar</span></th>
<!-- 'x' --> <td><span class="code">PRUnichar('x')</span></td>
<!-- char c --> <td><span class="code">PRUnichar(c)</span></td>
<td colspan="3"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_extract_a_character">extract a character</a></td>
<th class="row-label"><span class="code">char*</span></th>
<!-- 'x' --> <td><span class="code">&</span></td>
<!-- char c --> <td><span class="code">&</span></td>
<!-- "foo" --> <td><span class="code">&</span></td>
<!-- char* cp --> <td>.</td>
<!-- nsACString& cs --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
<th class="row-label"><span class="code">PRUnichar*</span></th>
<td colspan="5"><a href="#faq_how_to_convert_encoding">convert encoding</a>, <a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
<th class="row-label"><span class="code">nsACString</span></th>
<!-- 'x' --> <td><span class="code">NS_LITERAL_CSTRING("x")</span></td>
<!-- char c --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
<!-- "foo" --> <td><span class="code">NS_LITERAL_CSTRING("foo")</span></td>
<!-- char* cp --> <td><a href="#faq_how_to_make_a_string">make a string</a></td>
<!-- nsACString& cs --> <td>.</td>
<th class="row-label"><span class="code">nsAString</span></th>
<!-- 'x' --> <td><span class="code">NS_LITERAL_STRING("x")</span></td>
<!-- char c --> <td><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
<!-- "foo" --> <td><span class="code">NS_LITERAL_STRING("foo")</span></td>
<td colspan="2"><a href="#faq_how_to_convert_encoding">convert encoding</a></td>
<th class="row-label">to call <span class="code">printf</span></th>
<td colspan="4">.</td>
<!-- nsACString& cs --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
<table class="chart">
<th colspan="3">you have some <span class="code">PRUnichar</span>s</th>
<th><span class="code">PRUnichar w</span></th>
<th><span class="code">PRUnichar* wp</span></th>
<th><span class="code">nsAString& s</span></th>
<th class="row-label"><span class="code">char</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
<th class="row-label"><span class="code">PRUnichar</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td><span class="code">[]</span></td>
<!-- nsAString& s --> <td><a href="#faq_how_to_extract_a_character">extract a character</a></td>
<th class="row-label"><span class="code">char*</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
<th class="row-label"><span class="code">PRUnichar*</span></th>
<!-- PRUnichar w --> <td><span class="code">&</span></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td><a href="#faq_how_to_get_a_pointer">get a pointer</a></td>
<th class="row-label"><span class="code">nsACString</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
<th class="row-label"><span class="code">nsAString</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
<th class="row-label">to call <span class="code">printf</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td><a href="#faq_how_to_call_printf">call <span class="code">printf</span></a></td>
is there any string doc?
Yes, you're soaking in it!
<!-- getting a pointer -->
<a name="faq_how_to_get_a_pointer">I have a string, how do I get a pointer to the characters?</a>
You want to avoid this situation.
In your own interfaces, prefer string types over raw pointers.
Any interface that wants to process a string using a single pointer is making two expensive assumptions.
First, that the string is stored in one contiguous hunk; and
second, that the string is zero-terminated.
If this isn't the case,
then to get a pointer, storage must be allocated and the entire string must be copied to it and zero-terminated.
You may not be able to avoid needing a pointer when interacting with system calls.
Some string classes guarantee that they are `flat'.
That is, that their data is stored in one contiguous zero-terminated hunk.
This <strong>does not</strong> imply that there are no embedded nulls. Caveat emptor.
All strings that explicitly promise flatness
inherit from the class <span class="code">nsAFlatString</span>
or <span class="code">nsAFlatCString</span>
and can produce a constant pointer to their data with the <span class="code">get()</span> member function.
Even strings that don't explicitly promise to be flat
may happen to be flat.
The helper function <span class="code">PromiseFlatString</span> will produce
a <span class="code">const</span> dependent string that is guaranteed to be flat.
If you use this on a string that already happens to be flat,
the result is simply a reference through to that string.
<span class="code">PromiseFlatString</span> does the work to allocate, copy, terminate, and manage
a temporary flat string.
Since the result of <span class="code">PromiseFlatString</span> is a temporary,
you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives.
<div class="source-code">
/* I have a string, how do I get a pointer to the characters? */

extern void EvilNarrowOSFunction( const char* ); // evil OS routines that want a pointer
extern void EvilWideOSFunction( const PRUnichar* );

void func( const nsAString& aString, const nsACString& aCString )
  {
    EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").<span class="notice">get()</span> );
      // literal strings are flat already (as are |nsString|s, et al), just use |.get()|

    EvilWideOSFunction( <span class="notice">PromiseFlatString(</span>aString<span class="notice">).get()</span> );
      // for strings that don't explicitly guarantee flatness, use |PromiseFlatString|

      // beware holding the pointer for longer than the life of the promise
    <span class="warning">const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles
    EvilWideOSFunction(wp);</span>

      // if you really need to use the pointer from |PromiseFlatString| in more than one expression...
    const nsAFlatString& flat = <span class="notice">PromiseFlatString(</span>aString<span class="notice">)</span>;
    EvilWideOSFunction(flat.<span class="notice">get()</span>);
    SomeOtherFunction(flat.<span class="notice">get()</span>);

      // similarly for |char| strings
    EvilNarrowOSFunction( <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span> );
  }
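<p>The lifetime rule at work here is ordinary C++, so it can be tried outside the mozilla tree.
The following is a minimal sketch using <span class="code">std::string</span>;
<span class="code">MakeFlat</span> and <span class="code">CallTwice</span> are hypothetical names that merely stand in for
<span class="code">PromiseFlatString</span> and its caller, and nothing here is the real string API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for |PromiseFlatString|: returns a flat,
// zero-terminated copy by value, so every call produces a temporary.
std::string MakeFlat(const std::string& aString) {
    return std::string(aString.data(), aString.size());
}

// Stand-in for an OS routine that insists on a raw pointer.
std::size_t EvilNarrowOSFunction(const char* aChars) {
    return std::strlen(aChars);
}

std::size_t CallTwice(const std::string& aString) {
    // GOOD: the temporary lives until the end of the full expression.
    std::size_t first = EvilNarrowOSFunction(MakeFlat(aString).c_str());

    // GOOD: binding the temporary to a const reference extends its
    // lifetime to the end of this scope, so the pointer can be reused.
    const std::string& flat = MakeFlat(aString);
    std::size_t second = EvilNarrowOSFunction(flat.c_str());

    // BAD (deliberately left commented out): the temporary dies at the
    // semicolon and |p| would dangle.
    // const char* p = MakeFlat(aString).c_str();

    return first + second;
}
```

Calling <span class="code">CallTwice("Hello")</span> measures the five characters twice; the commented-out line is the pattern to avoid.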
<!-- extracting a character -->
<a name="faq_how_to_extract_a_character">How do I get a particular character out of a string?</a>
Flat strings provide <span class="code">operator[]</span> and <span class="code">CharAt()</span>.
All strings provide <span class="code">First()</span>, <span class="code">Last()</span>, and access with iterators.
<strong>Don't</strong> promise a string flat just to do character indexing.
Prefer, instead, to get an iterator and <span class="code">advance</span> it to the position you care about.
<div class="source-code">
/* How do I get a particular character out of a string? */
PRUnichar Get5thCharacterOf( const nsAString& aString )
if ( aString.Length() >= 5 )
nsAString::const_iterator iter;
aString.BeginReading(iter); // make |iter| point to the beginning of |aString|
Using iterators isn't as bad as the example above makes it feel.
The typical use is for advancing through a string, examining many characters.
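<p>For readers who want to experiment with the idiom, here is a rough analogue in standard C++.
It assumes <span class="code">std::u16string</span> in place of <span class="code">nsAString</span> and
<span class="code">std::advance</span> in place of the mozilla iterator's own <span class="code">advance</span>;
the function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Returns the fifth code unit of |aString|, or 0 if it is too short.
char16_t FifthCharacterOf(const std::u16string& aString) {
    if (aString.length() < 5)
        return 0;
    std::u16string::const_iterator iter = aString.begin();
    std::advance(iter, 4);  // four steps past the first character
    return *iter;
}

// The more typical use: one pass over the string, examining each character.
std::size_t CountSpaces(const std::u16string& aString) {
    std::size_t count = 0;
    for (auto iter = aString.begin(); iter != aString.end(); ++iter) {
        if (*iter == u' ')
            ++count;
    }
    return count;
}
```

The second function shows why iterators pay off: once you are walking the whole string, no indexing (and no flattening) is needed.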
<!-- how to convert encoding -->
<a name="faq_how_to_convert_encoding">How do I convert from one encoding to another?</a>
<!-- how to make a string -->
<a name="faq_how_to_make_a_string">How do I create a string?</a>
<!-- how to return a string -->
What is the best way to return a string?
There are several reasonable ways to produce a string result from a function.
If you are already holding the answer as a sharable string,
you can simply return that string (pass-by-value).
the most efficient and flexible way to return a string is
to assign your result into a non-<span class="code">const</span> reference parameter.
Don't bother to create a sharable string from scratch with your generated result.
The two things you want to minimize in string manipulation are,
in order of importance,
moving characters around.
<div class="source-code">
/* What is the best way to return a string? */
void GetShortName( nsAString& aResult ) const;
nsCommonString GetFullName() const;
nsCommonString mFullName;
const PRUnichar* mShortName;
PRUint32 mShortNameLength;
foo::GetFullName() const
foo::GetShortName( nsAString& aResult ) const
aResult = DependentString(mShortName, mShortNameLength);
<a name="faq_how_to_call_printf">How do I <span class="code">printf</span> a string, e.g., for debugging?</a>
If your string is already narrow, you just have to worry about <a href="#faq_how_to_get_a_pointer">making it flat, and then getting a pointer</a>.
If your string happens to be wide,
you'll need to convert it before you can <span class="code">printf</span> something reasonable.
If it's just for debugging,
you probably wouldn't care if something odd was printed in the case of a Unicode character that didn't have
an ASCII equivalent. (If you have a UTF-8 terminal, the result is
perfectly legible and nothing odd is printed.)
The simplest thing in this case is to make a temporary conversion using <span class="code">NS_ConvertUTF16toUTF8</span>.
The result is conveniently flat already, so getting the pointer is simple.
Remember not to hold onto the pointer you get out of this beyond the lifetime of the temporary.
<div class="source-code">
/* How do I |printf| a string? */

void PrintSomeStrings( const nsAString& aString, const PRUnichar* aKey, const nsACString& aCString )
  {
      // |printf|ing a narrow string is easy
    printf("%s\n", <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span>); // GOOD

      // the simplest way to get a |printf|-able |const char*| out of a string
    printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aKey<span class="notice">).get()</span>); // GOOD

      // works just as well with a formal wide string type...
    printf("%s\n", <span class="notice">NS_ConvertUTF16toUTF8(</span>aString<span class="notice">).get()</span>);

      // But don't hold onto the pointer longer than the lifetime of the temporary!
    <span class="warning">const char* cstring = NS_ConvertUTF16toUTF8(aKey).get(); // BAD! |cstring| is dangling
    printf("%s\n", cstring);</span>
  }
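<p>To make concrete what such a conversion has to do before <span class="code">printf</span> can show anything sensible,
here is a self-contained sketch in standard C++.
<span class="code">Utf16ToUtf8</span> and <span class="code">PrintWide</span> are hypothetical helpers written for this illustration;
they are not the implementation behind <span class="code">NS_ConvertUTF16toUTF8</span>,
and replacing unpaired surrogates with U+FFFD is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Converts UTF-16 code units to UTF-8 bytes (illustrative only).
std::string Utf16ToUtf8(const std::u16string& aSource) {
    std::string result;
    for (std::size_t i = 0; i < aSource.size(); ++i) {
        char32_t cp = aSource[i];
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < aSource.size()) {
            char32_t low = aSource[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        // Anything still in the surrogate range was unpaired.
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;

        if (cp < 0x80) {
            result += static_cast<char>(cp);
        } else if (cp < 0x800) {
            result += static_cast<char>(0xC0 | (cp >> 6));
            result += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            result += static_cast<char>(0xE0 | (cp >> 12));
            result += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            result += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            result += static_cast<char>(0xF0 | (cp >> 18));
            result += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            result += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            result += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return result;
}

// The temporary returned by |Utf16ToUtf8| outlives the call to |printf|,
// which is exactly the pattern marked GOOD in the FAQ entry.
void PrintWide(const std::u16string& aString) {
    std::printf("%s\n", Utf16ToUtf8(aString).c_str());
}
```

As with the mozilla helper, the converted string is a temporary: use its pointer within the same expression or give the result a name.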
Here are the email answers I have yet to format into the FAQ.
Some of the URLs may be out-dated or moved.
The messages are in order from oldest to newest.
<p class="editnote">[Note : In June, 2003, these emails were modified
to better reflect what is stored in 'wide' string
classes (UTF-16 string instead of UCS-2) and what
related methods do as a part of the patch for <a href=
"http://bugzilla.mozilla.org/show_bug.cgi?id=183156"
title="replace UCS2 in function/class/method names with UTF16">bug 183156</a>.
Therefore, they're a little different from the original emails
written by <a href="http://ScottCollins.net/">Scott Collins</a>]
Date: Thu, 13 Apr 2000 19:41:47 -0400
<p>This message is all about strings and the various encodings that might
be used to interpret their contents, the ramifications of that, and
where we're heading. The point of this message is to say what we're
currently thinking, and get feedback. I apologize in advance for the
rambling, and for the fact that this message may accidentally mix
discussion of how things <strong>are</strong> and how they will be.
<p>There are many different possible encodings. Three in common use in
the Mozilla source base are: ASCII, UTF-16, and UTF-8. In ASCII, every
<!--the Mozilla source base are: ASCII, UCS2, and UTF8. In ASCII, every-->
character fits in 7 bits and is typically stored in an 8-bit byte. We
usually represent ASCII strings with <span class="code">nsCString</span>s, <span class="code">nsXPIDLCString</span>s,
or <span class="code">char</span> string literals. In UTF-16, characters occupy one 16-bit code unit (
<a href="http://www.unicode.org/glossary/index.html#BMP_character">
<abbr title="Basic Multilingual Plane">BMP</abbr> characters</a>)
or two 16-bit code units
(<a href="http://www.unicode.org/glossary/index.html#supplementary_character">
<abbr title="Supplementary Plane : Plane 1 through 16">non-BMP</abbr> characters</a>).
We usually represent UTF-16 strings as <span class="code">nsString</span>s, etc., i.e., two-byte
or `wide' strings. UTF-8 is a multi-byte encoding. A character might
occupy one, two, three, or four bytes. It is easiest to store and
manipulate such a string within a single-byte or `narrow' string
<p>None of our current string implementations know the encoding of the
data they hold at any given moment. An <span class="code">nsCString</span> might legitimately
hold data encoded in ASCII, UTF-8 or even EBCDIC for that matter.
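<p class="editnote">[Illustration, not part of the original message: the storage costs described above
(one or two 16-bit code units in UTF-16, one to four bytes in UTF-8) can be written down as two small functions.
The names are made up for this sketch, and valid Unicode scalar values are assumed as input.]

```cpp
#include <cassert>

// Number of 16-bit code units UTF-16 needs for one code point:
// one for BMP characters, two (a surrogate pair) for everything else.
constexpr int Utf16UnitsFor(char32_t aCodePoint) {
    return aCodePoint < 0x10000 ? 1 : 2;
}

// Number of bytes UTF-8 needs for one code point.
constexpr int Utf8BytesFor(char32_t aCodePoint) {
    return aCodePoint < 0x80    ? 1
         : aCodePoint < 0x800   ? 2
         : aCodePoint < 0x10000 ? 3
                                : 4;
}

// ASCII fits in a single unit of either encoding.
static_assert(Utf16UnitsFor(U'A') == 1 && Utf8BytesFor(U'A') == 1,
              "ASCII is one unit in both encodings");
```

Neither count is stored anywhere in a string object, which is the point of the paragraph above: the container holds code units and leaves their interpretation to the caller.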
<p>Operations that convert from one encoding to another, or operations
that are encoding sensitive (e.g., <span class="code">to_upper</span>), rightly belong in
i18n. The fact that our current string interfaces automatically and
implicitly convert between wide and narrow strings is actually the
source of many errors in two particular categories: (1) unintended
extra work, (2) mistaken re-encoding, e.g., accidentally `converting'
a UTF-8 string to UTF-16 by pretending the UTF-8 string is ASCII and then
padding with <span class="code">'\0'</span>s.
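<p class="editnote">[Illustration, not part of the original message: the second category of error is easy to reproduce.
The sketch below uses standard C++ types, and <span class="code">PadToWide</span> is a hypothetical name for the
pad-with-zeros `conversion' being criticized.]

```cpp
#include <cassert>
#include <string>

// "Converts" narrow to wide by widening each byte, i.e., by padding
// every byte with a zero high byte. This is only correct for ASCII.
std::u16string PadToWide(const std::string& aNarrow) {
    std::u16string result;
    for (unsigned char byte : aNarrow)
        result += static_cast<char16_t>(byte);
    return result;
}
```

For <span class="code">"Hello"</span> the result is right. For the UTF-8 bytes <span class="code">"\xC3\xA9"</span>
(the single character U+00E9) the result is the two code units U+00C3 U+00A9, not U+00E9, and nothing in the type system complains.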
<p>We've known these were bad for a long time, and have been trying to
find the right way to fix them. The current thinking is to just bite
the bullet and eliminate implicit conversions. That has interesting
<div class="source-code">
void foo( const nsString& aUTF16string );
foo("hello"); // works! constructs a temporary |nsString| by
// converting the ASCII literal with padding.
// Note: this requires an allocation
<p>Though we've always hated this form since it requires a heap
allocation. In current code, we recommend
<div class="source-code">
foo( nsAutoString("hello") );
<p>which still copy/converts, but at least it probably doesn't need to do
a heap allocation. In the best of all worlds, no conversion, copying,
or allocation would be necessary. To do that, you would need to be
able to directly specify a UTF-16 string, e.g., with the <span class="code">L"hello"</span>
notation, and wrap that in an interface that just held a pointer.
<div class="source-code">
void foo( const nsAReadableString& aUTF16string );
foo( nsLiteralString(L"hello") );
<p>There are problems with this example, however. The <span class="code">L</span> notation
specifically makes objects that are arrays of <span class="code">wchar_t</span>, which under
GCC is a 4-byte element. This leads to incompatibility with JS, and
the annoyance of possibly bloated storage (I'm sort of minimizing the
situation here. It's worse than I make it sound). More about tricks
to get around this in a bit, but first, let me talk about what to do
in the meantime while we're just getting rid of implicit constructors.
Initially to get around this problem (what problem? The problem that
<span class="code">foo("hello")</span> stopped compiling on my machine when I threw the
switch) I made a routine called <span class="code">NS_ConvertToString</span> which looked like
<div class="source-code">
NS_ConvertToString( const char* anASCIIstring )
nsAutoString aUCS2string;
aUCS2string.AssignWithConversion(anASCIIstring);
<p>Which lets me write
<div class="source-code">
foo( NS_ConvertToString("hello") );
<p>This was <strong>OK</strong>, but in discussion there were concerns about performance
on machines that didn't <span class="code">inline</span> well, and issues about naming. In
that meeting we came up with an alternate naming strategy that we
think has room for growth and an implementation more likely to be
efficient on every platform. The implementation is to define a new
class that derives from <span class="code">nsAutoString</span>, but allows construction from a
<span class="code">char*</span>
<div class="source-code">
class NS_ConvertASCIItoUTF16 : public nsAutoString
NS_ConvertASCIItoUTF16( const char* );
<p>Which gives identical (though renamed) notation for calling <span class="code">foo</span>:
<div class="source-code">
foo( NS_ConvertASCIItoUTF16("hello") );
<p>It looks like a function call to an explicit encoding conversion. It
acts like a function call to an explicit encoding conversion. It <strong>is</strong>
a function call to an explicit encoding conversion. We think that
this naming pattern has room for growth. In the meeting, we concluded
that the best representation for encoding conversions is a family of
functions, and <span class="code">NS_ConvertASCIItoUTF16</span> fits right in. We think that
XPCOM probably can't live without the ASCII to UTF-16 conversion (though
as explicit as possible) but that all others rightly belong in i18n
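<p class="editnote">[Illustration, not part of the original message: the shape of the idea, a class that is a string
but is only constructible by an explicitly named conversion, can be tried in standard C++.
<span class="code">ConvertASCIItoUTF16</span> below is a hypothetical analogue built on <span class="code">std::u16string</span>;
it is not the mozilla class, and it assumes its input really is ASCII.]

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A wide string that can only be built from a narrow one by naming the
// conversion at the call site. Deriving from a standard container is done
// here purely to mirror the "derives from nsAutoString" description.
class ConvertASCIItoUTF16 : public std::u16string {
public:
    explicit ConvertASCIItoUTF16(const char* aASCII) {
        for (; *aASCII; ++aASCII)
            push_back(static_cast<char16_t>(static_cast<unsigned char>(*aASCII)));
    }
};

// A function that wants wide characters.
std::size_t foo(const std::u16string& aUTF16string) {
    return aUTF16string.length();
}

// foo("hello");                        // does not compile: no implicit conversion
// foo(ConvertASCIItoUTF16("hello"));   // compiles, and the cost is visible
```

The commented-out first call is the point: with no implicit narrow-to-wide constructor available, the work has to be requested by name.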
<p>You can probably deduce from the clues in <span class="code">NS_ConvertToString</span>, above,
that constructors weren't the only thing that became explicit.
Assignment, appending, comparison, et al, got renamed so that when
assigning, appending, or comparing to a value in a different encoding
the `WithConversion' form must be used. E.g.,
<div class="source-code">
nsString aUTF16string;
nsCString anASCIIstring;
aUTF16string += anASCIIstring; // Currently legal, but not for long
aUTF16string.Append(anASCIIstring); // same
aUTF16string.AppendWithConversion(anASCIIstring); // the new way
if ( aUTF16string == anASCIIstring ) // Sorry, this is going away too
if ( aUTF16string.EqualsWithConversion(anASCIIstring) )
<p>Yes, it's long and annoying. Just like the extra work you were
implicitly asking to have done, perhaps incorrectly. There are other
reasons to rename these functions. When <span class="code">nsString</span> and <span class="code">nsCString</span>
defined a ton of, e.g., <span class="code">Append</span>s each there was no problem, because
nobody wanted to override <span class="code">Append</span>. Now, with strings inheriting from
abstract base classes we immediately run into the problem that
overriding and overloading don't mix very well in C++. Because of a
feature of C++ called name hiding, it is problematic to override only
a single signature of a name overloaded in a base class. The base
<span class="code">nsAWritableString</span> provides several <span class="code">Append</span>s, all for objects of
(hopefully) the same encoding. <span class="code">nsString</span> can't easily add a bunch of
new <span class="code">Append</span>s (the converting ones) without running face first into
the name hiding problem. The discussion of the fix for this is mostly
unrelated to encoding issues, so I'll defer it to another post.
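<p class="editnote">[Illustration, not part of the original message: name hiding is a general C++ rule and can be shown
without any mozilla classes. The three structs below are hypothetical; a <span class="code">using</span> declaration is one
standard remedy, and it is shown only as an example, not as the fix the author had in mind.]

```cpp
#include <cassert>
#include <string>

struct WritableString {
    std::u16string mData;
    virtual ~WritableString() = default;
    virtual void Append(const std::u16string& aWide) { mData += aWide; }
    virtual void Append(char16_t aChar) { mData += aChar; }
};

struct HidingString : WritableString {
    // Overriding one signature hides |Append(char16_t)| for callers that
    // use a |HidingString| directly: |h.Append(u'x')| no longer compiles.
    void Append(const std::u16string& aWide) override { mData += aWide; }
};

struct FixedString : WritableString {
    using WritableString::Append;  // re-expose every base-class overload
    void Append(const std::u16string& aWide) override { mData += aWide; }

    // The converting form gets its own name, so it never takes part in
    // overload resolution against the same-encoding |Append|s.
    void AppendWithConversion(const char* aASCII) {
        for (; *aASCII; ++aASCII)
            mData += static_cast<char16_t>(static_cast<unsigned char>(*aASCII));
    }
};
```

Through a reference to the base class the hidden overload is still reachable, which is why the problem tends to surface only in code that works with the derived type.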
<p>In hindsight, after the meeting, it seemed clear that all the
`WithConversion' forms would be better named
<div class="source-code">
xxxConvertingASCIItoUTF16
xxxConvertingUTF16toASCII
<p>however, the <strong>real</strong> goal (probably) is to move most such conversions
into i18n. Just bringing attention to the previously implicit
conversions is a good first step. Renaming these conversions as just
suggested is probably the right thing to do, though it sort of
validates them, which I'm not sure we really want. This is a decision
we need to discuss further.
<p>Now, back to the string literal problem above. One possible solution
is to use a macro. Imagine
<div class="source-code">
NS_LITERAL_STRING("Hello")
<p>which on a machine where the <span class="code">L</span> trick works, turns into
<div class="source-code">
nsLiteralString(L"Hello")
<p>but on a machine where there is trouble, turns into something less
appealing, but more likely to work, like
<div class="source-code">
NS_ConvertASCIItoUTF16("Hello")
<p>Another solution is to add a compilation step that fixes <span class="code">L</span> strings
on bad platforms to be non-<span class="code">L</span> strings, but padded with <span class="code">\0</span>s. E.g.,
<span class="code">L"Hello"</span> gets preprocessed into <span class="code">"\000H\000e\000l\000l\000o\000"</span>.
This solution is more annoying to the developer, where the prior
solution is more annoying during the runtime.
<p>Before we go to too much trouble on this specific feature, we will
probably want to do more measurement to see just how much and how
often we are converting constant literal strings, and why.
<p>I'm currently ripping through the tree fixing things to use the
`WithConversion' forms where appropriate. I was also converting
things to use <span class="code">NS_ConvertToString</span> where appropriate; unless I get
talked out of it, I want to switch midstream to
<span class="code">NS_ConvertASCIItoUTF16</span>, then go back and fix up the
<span class="code">NS_ConvertToString</span> instances later. I've set things up so I can
check in as I go. After all these conversions have been done, I'll be
able to throw the switch (what switch? NEW_STRING_APIS) which will
make <span class="code">nsString</span> inherit from <span class="code">nsAWritableString</span>, etc. and allow us to
start exploiting these other opportunities (e.g., for literal strings,
shared strings, etc. See
<a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=28221">http://bugzilla.mozilla.org/show_bug.cgi?id=28221</a> for details and
<p>I guess I'm expecting comments on:
<li>how really annoying this whole topic is
<li>how bad <span class="code">L"xxx"</span> is
<li>whether to move forward with <span class="code">NS_ConvertASCIItoUTF16</span>
<li>whether we should move to xxxConvertingASCIItoUTF16 etc instead
<li>arguments about where encoding conversions should live
<li>arguments about whether going between 1 and 2 byte storage is an
<li>questions about stuff I didn't mention or didn't explain well
<li>pointing out stuff I'm just plain wrong about, or things I forgot
<p>So as not to jumble the discussion, I'll be separately posting other
requests for comments about specific features of the design of the new
<p>I hope this helps keep everybody filled in on what we're thinking and
able to point out what we're forgetting or screwing up :-)
Date: Wed, 19 Apr 2000 21:12:47 -0400
Subject: more string info
<p> <a class="exact-uri" href="news://news.mozilla.org/scc-705460.16423913042000@news.mozilla.org">news://news.mozilla.org/scc-705460.16423913042000@news.mozilla.org</a>
Date: Fri, 26 May 2000 15:31:37 -0400
Subject: Re: Question on ==
<p>I would prefer you compare with <span class="code">Equals</span> (which should really be named
<span class="code">IsEqualTo</span>) rather than <span class="code">operator==()</span> because of this:
<div class="source-code">
<p>Comparing two raw `string' pointers doesn't compare the characters
they point to, but instead compares the bits of the pointers. For
this reason, I may eventually make comparison of a string with a
pointer using operators just go away.
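<p class="editnote">[Illustration, not part of the original message: the pitfall is plain C++ and does not depend on the
string classes. The two helper names below are made up for this sketch.]

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Compares the addresses held by the two pointers.
bool SamePointer(const char* aLhs, const char* aRhs) {
    return aLhs == aRhs;
}

// Compares the characters the two pointers refer to.
bool SameCharacters(const char* aLhs, const char* aRhs) {
    return std::strcmp(aLhs, aRhs) == 0;
}
```

Two separate arrays that both hold <span class="code">"hello"</span> are equal by <span class="code">SameCharacters</span>
and unequal by <span class="code">SamePointer</span>. A named member such as <span class="code">Equals</span> leaves no doubt
about which comparison is intended.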
Date: Wed, 14 Jun 2000 14:38:55 -0400
Subject: Re: Fix to XprtDefs.h
<p>Yes, we're aware that turning off <span class="code">wchar_t</span> support makes <span class="code">wchar_t</span> be
a synonym for <span class="code">unsigned short</span> under Metrowerks. We know that the
current version of VC++ also makes these types equivalent. In theory,
though, the types are distinct even when they are the same size and
shape. By using real <span class="code">wchar_t</span> support, we are forced to recognize
909
the distinction and navigate it appropriately with <span class="code">reinterpret_cast</span>
910
(via <span class="code">NS_REINTERPRET_CAST</span>). The win here is that we aren't caught by
911
compiler changes that suddenly make some set of compilers compliant
912
and therefore break our code. We will add an autoconf test that lets
913
UNIX compilers opt in to our string scheme when they have an
914
appropriately shaped <span class="code">wchar_t</span>. If these happen to be compliant
915
compilers, all will be well. If they don't, the casts don't hurt,
916
because they are type correct. We are writing our code to meet the
917
standard as we move forward.
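<p>One way to see that the two types are distinct is to overload on both.
This is a small sketch assuming a compiler with real <span class="code">wchar_t</span> support;
<span class="code">which_overload</span> and <span class="code">kRaw</span> are hypothetical names used only here.

```cpp
#include <cassert>
#include <string>

// If wchar_t were merely a synonym for unsigned short, these two
// declarations would be the same function and the second would be a
// redefinition error. With real wchar_t support they coexist, whatever
// the size of wchar_t happens to be.
inline std::string which_overload(const wchar_t*)
{
  return "wchar_t";
}

inline std::string which_overload(const unsigned short*)
{
  return "unsigned short";
}

// A raw buffer of 16-bit units: 'x' followed by a zero terminator.
static const unsigned short kRaw[] = { 0x78, 0 };
```

<p>A wide literal such as <span class="code">L"x"</span> selects the first overload and
<span class="code">kRaw</span> selects the second, so crossing from one to the other takes an explicit cast.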
<p>The win for us is realized by the following macros

<div class="source-code">
#ifdef HAVE_CPP_2BYTE_WCHAR_T
#define NS_LITERAL_STRING(s) nsLiteralString(L##s, \
(sizeof(L##s)/sizeof(wchar_t))-1)
#define NS_LITERAL_STRING(s) NS_ConvertASCIItoUTF16(s, \
</div>

<p>An <span class="code">nsLiteralString</span> points directly to the literal characters. No
copying, no conversion, and the length calculation happens at compile
time. This has turned out to be as large a savings as 15% of code
space and 8% of data space, net, in our string test harness. It's
faster as well, again by eliminating the copying, conversion, and
length calculation. We don't know yet what those numbers translate
into in our real code base, but we have high hopes.
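<p>The idea of pointing at the literal and taking its length from the array type
can be sketched in standard C++. <span class="code">LiteralView</span> and <span class="code">make_literal</span>
are hypothetical stand-ins, not the real <span class="code">nsLiteralString</span>, and the sketch
uses <span class="code">char16_t</span> literals rather than the <span class="code">L</span> notation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a literal string class: it records where
// the characters live and how many there are, and owns nothing.
struct LiteralView
{
  const char16_t* data;
  std::size_t     length;
};

// N is deduced from the array type at compile time. It counts the zero
// terminator, hence N - 1. Nothing is copied, converted, or scanned.
template <std::size_t N>
constexpr LiteralView make_literal(const char16_t (&s)[N])
{
  return LiteralView{ s, N - 1 };
}

static const char16_t kBar[] = u"bar";
```

<p>The view's <span class="code">data</span> member is the address of the original array, which is the
sense in which no copy takes place.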
<p>I don't want to be in the position to ask you to change your code. I
don't think it's appropriate for me to do so. The AIM application
that is your client is our client as well. They need to resolve this
difference between us in whatever way they think best. That may mean
asking you if changing your apis is the right thing to do. Or it may
mean applying the casts. Our code-base and yours, Justin, are more
like cousins. I don't think you should have to change just to conform
to us. You may think my arguments for using real <span class="code">wchar_t</span> have
merit, and adopt similar usage just because you agree; but I think the
only obligation you have is to follow the technical solution you think
is right for your code.

<p>If you decide to make this api change, it will mean shipping a new
binary (on Mac) for your library to clients who want to switch over to
the new api (since the name mangling will be different, and therefore,
the link requirements will change).
Date: Thu, 15 Jun 2000 19:36:55 -0400
Subject: Re: Checkin approval for bug 32336

<div class="source-code">
S.Equals(NS_LITERAL_STRING("bar"), PR_TRUE, 3)
</div>

<p>doesn't compile because there is no three parameter form for <span class="code">Equals</span>.
For all definitions of <span class="code">Equals</span> on strings, see "nsAReadableString.h"

<p><a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>

<p>There is an <span class="code">EqualsWithConversion</span> that takes three parameters.

<p> <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString2.h#731</a>

<p>It is ``EqualsWithConversion'' because it admits the possibility of an
encoding specific transformation, in this case to provide
case-insensitive comparison. This also wouldn't compile, however,
since, at the moment, an <span class="code">nsLiteralString</span> doesn't provide an operator
to produce a <span class="code">const PRUnichar*</span> (though perhaps it should), and it
doesn't satisfy the other interfaces that match this call, e.g., a
<span class="code">const nsString&</span>.

<p>Perhaps I need to move case-insensitive comparison up out of
<span class="code">nsString</span> into a global encoding specific transformations and
algorithms file (which was on its way anyway, as Waterson knows); this
use is one bit of evidence to support this. In the short term, this
can be fixed (if we think the current behavior is wrong) by providing
<span class="code">operator const CharT*() const</span> on literal string.

<p>If you can live without case-folding, the earlier form is preferred

<div class="source-code">
S == NS_LITERAL_STRING("bar")
</div>

<p>if you can't, then one of the fixes I mentioned is in order.
Date: Thu, 15 Jun 2000 19:47:12 -0400
Subject: Re: [Fwd: how to use nsString ?]

<pre class="email-quote">
>I see these same examples time and again in the embedding
>samples/docs, but I can't compile them.
</pre>

<p>Apologies. Documentation mentioning strings is getting out of date.
Here are some specific answers.

<pre class="email-quote">
>nsString URLString("http://www.mozilla.org");
</pre>

<p>...is now perhaps best expressed as

<div class="source-code">
nsString URLString( NS_LITERAL_STRING("http://www.mozilla.org") );
</div>

<p>since an <span class="code">nsString</span> is a sequence of 2-byte wide characters, and the
routines that implicitly convert 1-byte sequences (like the literal
sequence you specified, "http:...") are now gone.

<p>Up until not too long ago, one would have had to say

<div class="source-code">
URLString.AssignWithConversion("http://www.mozilla.org");
</div>

<p>The <span class="code">NS_LITERAL_STRING</span> construction is new machinery that has the
potential to make many operations much more efficient.

<pre class="email-quote">
>nsString URLString;
>URLString.SetString("www.mozilla.org");
</pre>

<p><span class="code">SetString</span> was a synonym for <span class="code">Assign</span> or assignment with
<span class="code">operator=()</span>; it too went away. The equivalent is the second
example I gave above, that is, the one with <span class="code">AssignWithConversion</span>.

<p><span class="code">Assign</span> still exists. <span class="code">AssignWithConversion</span> takes on that
functionality for assignments that require encoding transformations
(e.g., from ASCII to UTF16). <span class="code">SetString</span> is gone, since it was always
a synonym for <span class="code">Assign</span>.
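<p>What an explicit ASCII-to-UTF16 assignment has to do can be sketched in
standard C++. <span class="code">widen_ascii</span> is a hypothetical helper written for this
illustration; it is not the implementation of <span class="code">AssignWithConversion</span>.

```cpp
#include <cassert>
#include <string>

// Explicit, visible conversion from 1-byte ASCII to 2-byte characters.
// (Hypothetical helper standing in for the ...WithConversion family.)
inline std::u16string widen_ascii(const std::string& narrow)
{
  std::u16string wide;
  wide.reserve(narrow.size());
  for (std::string::size_type i = 0; i < narrow.size(); ++i)
  {
    const unsigned char c = static_cast<unsigned char>(narrow[i]);
    assert(c < 0x80);  // only 7-bit ASCII maps one-to-one onto UTF-16
    wide.push_back(static_cast<char16_t>(c));
  }
  return wide;
}
```

<p>Because the conversion is a named call, the per-character work it does is
visible at the call site instead of hiding behind an implicit constructor.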
<p>Learn more about the general APIs for strings that we are trying to
move to by examining
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>
Date: Thu, 15 Jun 2000 21:26:51 -0400
Subject: Re: Checkin approval for bug 32336

<pre class="email-quote">
>I *need* the count attribute, because I need to compare only the first
>chars (that's inherent to the logic).
</pre>

<p>This is what substrings are for. In that case, you could use

<div class="source-code">
Substring(S, 0, 3) == NS_LITERAL_STRING("bar")
</div>

<p>As for case-folding, it's best if you can case-fold everything up
front, instead of doing it repeatedly. I'll have to get back to you
on a general solution to that problem, or what my schedule for getting
it checked in would be. I'm sorry, I know that's not what you needed
to hear. If the source string is an <span class="code">nsString</span>, you can continue to
exploit its implementation of these routines, e.g., <span class="code">ToLower</span> all
Date: Mon, 19 Jun 2000 14:23:47 -0400
Subject: Re: string fu

<pre class="email-quote">
>It seems less convenient to have to first check path.IsEmpty, and
>then if false get path.Last and test it.
</pre>

<p>What would you prefer? That extracting a character not in the string
always return <span class="code">CharT(0)</span>? Can't do it for two reasons: (1) <span class="code">0</span> may be
a valid character in a particular encoding, so it can't be used in
general as a ``no character at that position'' marker; and (2) I can't
control what an individual string implementation does when asked to
get an out-of-bounds fragment; it's explicitly undefined. That means
the result of <span class="code">CharAt</span> is explicitly undefined for indexes outside the
defined contents of the string. As a debugging convenience, I have
made this assert, but it has always been the case that retrieving such
a character had undefined results ... even in [the old] code.

<p>OK, you might say, well at least let me ask for a character that is
only off the end by one. E.g., <span class="code">Last</span> of an empty string. Reason (1)
from above still applies. How bad is it to say, for the case you gave

<div class="source-code">
PRBool needsDelim = PR_FALSE;
if ( !path.IsEmpty() )
  {
    PRUnichar last = path.Last();
    needsDelim = !(last == '/' || last == '\\');
  }
</div>

<p>In general, you probably want to opt out of a whole lot of work when
the source string is empty. It is slightly less convenient, but it
doesn't tie us to a bunch of implementation specific mojo.
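<p>The same check-before-you-look shape can be tried out in standard C++. This
is a sketch only: <span class="code">last_char</span> and <span class="code">needs_delimiter</span> are hypothetical
names, and <span class="code">std::string</span> stands in for the Mozilla string classes.

```cpp
#include <cassert>
#include <string>

// Like an asserting Last(): asking an empty string for its last
// character is the caller's bug, not something to paper over.
inline char last_char(const std::string& s)
{
  assert(!s.empty());
  return s[s.size() - 1];
}

// True when a path needs a separator appended before joining.
inline bool needs_delimiter(const std::string& path)
{
  if (path.empty())
    return false;  // opt out early; there is nothing to inspect
  const char last = last_char(path);
  return !(last == '/' || last == '\\');
}
```

<p>The empty case is decided before any character is extracted, so no sentinel
value is ever needed.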
<pre class="email-quote">
>Can we fix GetUnicode in this case?
</pre>

<p>This is an annoying property of auto strings, e.g., that they always
have an allocated buffer. I'm happy to fix this bug; however, be
aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts of [the old]
implementation that we don't want to support. They are not part of
the abstract interface. We will keep them no longer than we have to.
They don't support our multi-fragment paradigm. People who require a
contiguous hunk of characters in the future, and are unwilling to
switch over to chunky-iterators, may be forced to copy the string to
their own buffer. There will be an implementation of narrow character
string that guarantees contiguous allocation and a zero-terminator,
much as <span class="code">nsCString</span> does now, for compatibility with platform uses,
but this won't be the default string class.
Date: Mon, 19 Jun 2000 17:22:31 -0400

<p>Clarifying String Semantics

<p>Recently, I added an assert to the string operations that extract
characters, namely <span class="code">First()</span>, <span class="code">Last()</span>, <span class="code">CharAt()</span>, and
<span class="code">operator[]()</span>. This assert fires when any of these routines are used
to access a character outside the defined contents of the string. For
<span class="code">First()</span> and <span class="code">Last()</span> that means whenever they are applied to an
empty string. For <span class="code">CharAt()</span> and <span class="code">operator[]()</span>, that means whenever
they are used to access an index outside the range of
<span class="code">0</span>..<span class="code">Length()-1</span>. There have been some complaints; however, the
result was always undefined. What follows is extracted from an email
exchange between me and warren on this topic. I hope it clarifies
<pre class="email-quote">
>I hit your funky CharAt assertion tonight in this piece of code:
>nsIOService::ResolveRelativePath(
> const char *relativePath,
> const char* basePath,
> nsCAutoString name;
> nsCAutoString path(basePath);
> PRUnichar last = path.Last();
> PRBool needsDelim = !(last == '/' || last == '\\' || last ==
>where basePath is null. It seems less convenient to have to first
>check path.IsEmpty, and then if false get path.Last and test it.
</pre>

<pre class="email-quote">
>What would you prefer? That extracting a character not in the
>string always return <span class="code">CharT(0)</span>? Can't do it for two reasons:
>(1) <span class="code">0</span> may be a valid character in a particular encoding, so it
>can't be used in general as a ``no character at that position''
>marker; and (2) I can't control what an individual string
>implementation does when asked to get an out-of-bounds fragment,
>it's explicitly undefined. That means the result of <span class="code">CharAt</span> is
>explicitly undefined for indexes outside the defined contents of
>the string. As a debugging convenience, I have made this assert,
>but it has always been the case that retrieving such a character
>had undefined results ... even in [the old] code.
>OK, you might say, well at least let me ask for a character that
>is only off the end by one. E.g., <span class="code">Last</span> of an empty string.
>Reason (1) from above still applies. How bad is it to say, for the
> PRBool needsDelim = PR_FALSE;
> if ( !path.IsEmpty() )
> PRUnichar last = path.Last();
> needsDelim = !(last == '/' || last == '\\');
>In general, you probably want to opt out of a whole lot of work
>when the source string is empty. It is slightly less convenient,
>but it doesn't tie us to a bunch of implementation specific mojo.
</pre>
<p>Warren also asks:

<pre class="email-quote">
>Here's another issue, perhaps more serious. If I say this:
> foo(const PRUnichar* s) {
> nsAutoString str(s);
>where s is null, bar will get passed a zero-length PRUnichar
>sequence instead of null. This makes it so that you can't just
>test for the argument == null. You have to nsCRT::strlen(arg) == 0
>which is much less efficient. Can we fix GetUnicode in this case?
</pre>

<pre class="email-quote">
>This is an annoying property of auto strings, e.g., that they
>always have an allocated buffer. I'm happy to fix this bug,
>however, be aware that <span class="code">GetUnicode</span> and <span class="code">GetBuffer</span> are artifacts
>of [the old] implementation that we don't want to support. They
>are not part of the abstract interface. We will keep them no
>longer than we have to. They don't support our multi-fragment
>paradigm. People who require a contiguous hunk of characters in
>the future, and are unwilling to switch over to chunky-iterators,
>may be forced to copy the string to their own buffer. There will
>be an implementation of narrow character string that guarantees
>contiguous allocation and a zero-terminator, much as <span class="code">nsCString</span>
>does now, for compatibility with platform uses, but this won't be
>the default string class.
</pre>

<p>In a later message, Chris Waterson asks a related question

<pre class="email-quote">
>scc: should we add <span class="code">operator PRUnichar*()</span> to
>NS_ConvertASCIItoUTF16?
</pre>

<pre class="email-quote">
>It seems reasonable. A lot more reasonable than forcing people to
>call <span class="code">GetUnicode()</span>. I alluded to platform specific classes in an
>earlier message to warren that you were cc'd on, Chris. I imagine
>that the <span class="code">...Convert...</span> routines would be required to produce
>contiguous allocation 0-terminated strings (though the as yet
>unimplemented <span class="code">...Copy...</span> forms, of course, wouldn't). So <span class="code">operator
>const PRUnichar*() const</span> makes perfect sense to me here.
</pre>

<p>Hope this makes sense,
Date: Tue, 20 Jun 2000 04:05:31 -0400
Subject: Re: NS_LITERAL_STRING is broken

<p>The behavior you describe sounds exactly like when you say

<div class="source-code">
const char* foobar = "foobar";
... NS_LITERAL_STRING(foobar).get() ...
</div>

<p>because in this case, the thing passed in is a <span class="code">const char*</span>.
<span class="code">NS_LITERAL_STRING</span> is not meant to be used in this way. It is only
meant to be used around a <span class="code">"</span> delimited string. The type of such is
<span class="code">const char[N]</span> where N is the number of characters in the string + 1
for the zero terminator it helpfully adds. <span class="code">sizeof</span> such a type is
<span class="code">N</span>.
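<p>The difference between the array and the pointer is easy to demonstrate in
isolation. In this sketch, <span class="code">SKETCH_LITERAL_LENGTH</span> is a hypothetical macro that
computes a length the way a <span class="code">sizeof</span>-based literal macro would; it is not the
real <span class="code">NS_LITERAL_STRING</span>.

```cpp
#include <cassert>
#include <cstddef>

// Length of a narrow string as a sizeof-based macro would compute it.
// (Hypothetical macro for illustration only. sizeof(char) is 1, so no
// division is needed.)
#define SKETCH_LITERAL_LENGTH(s) (sizeof(s) - 1)

static const char* const kPointer = "foobar";

// Applied to a quoted literal: the argument has type const char[7].
inline std::size_t length_of_literal()
{
  return SKETCH_LITERAL_LENGTH("foobar");
}

// Applied to a pointer: sizeof sees only the pointer itself, so the
// answer depends on the platform's pointer size, not on the text.
inline std::size_t length_of_pointer()
{
  return SKETCH_LITERAL_LENGTH(kPointer);
}
```

<p>The first function returns 6. The second returns the size of a pointer minus
one, which only matches the text's length by accident.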
<p>Are you sure you had the actual string as an argument, as in your
example to me? Or could the actual code have been like my sample,
Date: Thu, 29 Jun 2000 13:35:10 -0400

<pre class="email-quote">
> + if (Length() == 0) { return nsnull; }
</pre>
<a class="exact-uri" href="news://news.mozilla.org/scc-314ABF.14261619062000@news.mozilla.org">news://news.mozilla.org/scc-314ABF.14261619062000@news.mozilla.org</a>

<p>It's just plain wrong to let people try to index into a string outside
its defined contents. I can't just return <span class="code">'\0'</span> or <span class="code">PRUnichar('\0')</span>
there as that <strong>could</strong> be a legal value to have somewhere in your
string for some encodings ... and the encoding is not specified. So
your patch has the basic problem of defeating my plan to stop people
from doing this bad thing.

<p>The second problem with your patch is that you use the symbolic
constant <span class="code">nsnull</span>, which is ostensibly a pointer value; <span class="code">Last</span> returns
a character. <span class="code">nsnull</span> is not appropriate for that purpose. In fact,
C++ gurus pretty much eschew the use of symbolic constants for <span class="code">0</span>.
<span class="code">NULL</span> is to be avoided. <span class="code">nsnull</span> is wrong-headed in that it presumes
we could have some <strong>other</strong> application specific value for <span class="code">NULL</span>. We
can't, it would never work. It's just wasted brain-print. Always use
<span class="code">0</span> for these situations, and if you want to communicate the fact that
something is a pointer type, either use a comment or a
(construction-style) cast, like so (graded examples from worst to
<li>F: FindChildByNameWithHint("Chuck", nsnull);
<li>D: FindChildByNameWithHint("Chuck", NULL);
<li>C: FindChildByNameWithHint("Chuck", /* Child* */ 0);
<li>B: typedef Child* Child_ptr;
FindChildByNameWithHint("Chuck", Child_ptr(0));
<li>A: FindChildByNameWithHint("Chuck", 0);

<p>Don't let this discourage you; keep up the good work :-)
Date: Tue, 8 Aug 2000 23:47:16 -0400
Subject: Re: nsWritingIterator?

<pre class="email-quote">
>Can you give me any pointers to examples, or docs, or just some
</pre>
<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>

<p>I can personally walk you through any specific scenario you need.
Date: Wed, 9 Aug 2000 02:35:03 -0400
Subject: Re: nsWritingIterator?

<p>You got it right... it's <span class="code">nsWritingIterator<CharT></span> for whichever
character type you care about, either <span class="code">char</span> or <span class="code">PRUnichar</span>. You
_can_ use this iterator like a character pointer ... that is, you can
dereference it, assign into its dereference, etc. It is more
efficient, though, to directly address a particular range of
characters around where it points by asking it for its actual
character pointer with <span class="code">get</span>, and knowing that there are
<span class="code">size_forward()</span> characters available ahead of that pointer and
<span class="code">size_backward()</span> characters available behind it. After examining
those characters by hand, you can advance the iterator beyond the
characters you have examined (and possibly into the next chunk, should
one exist) by adding into it (with +=) the count of the characters you

<p>Here are three examples of running through a string and modifying some
of the characters in it. All use <span class="code">nsWritingIterator</span>s.
<div class="source-code">
// inefficient, but works in a pinch:
// iterators can hide all details of chunks by acting like
// a raw character pointer
nsWritingIterator<PRUnichar> s = S.BeginWriting();
nsWritingIterator<PRUnichar> done_with_string = S.EndWriting();
// for each character in the string |S|
while ( s != done_with_string )
// if the character is lower case, capitalize it
if ( 'a' <= *s && *s <= 'z' )
// iterators provide a mechanism by which you can process
// a chunk-at-a-time
nsWritingIterator<PRUnichar> iter = S.BeginWriting();
nsWritingIterator<PRUnichar> done_with_string = S.EndWriting();
// for each chunk of the string
while ( iter != done_with_string )
size_t N = iter.size_forward(); // # of chars in this chunk
PRUnichar* s = iter.get();
PRUnichar* done_with_chunk = s + N;
// for each character in this chunk
for ( ; s < done_with_chunk; ++s )
// if the character is lower case, capitalize it
if ( 'a' <= *s && *s <= 'z' )
*s = *s - 'a' + 'A';
// advance the iterator past characters
// we examined (and into the next chunk, if any)
// pull your transformation into a `sink', and |copy_string|
// will efficiently pump any kind of string into it
write( PRUnichar* s, PRUint32 N )
// processes one chunk, called repeatedly by |copy_string|
PRUnichar* done_with_chunk = s + N;
// for each character in this chunk
for ( ; s < done_with_chunk; ++s )
// if the character is lower case, capitalize it
if ( 'a' <= *s && *s <= 'z' )
*s = *s - 'a' + 'A';
copy_string(S.BeginWriting(), S.EndWriting(), Capitalize());
</div>

<p>Does this show it better?
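<p>For readers without the Mozilla headers at hand, the chunk-at-a-time and
sink patterns from the second and third examples can be approximated in standard
C++. Everything named here is an assumption made for the sketch:
<span class="code">ChunkedString</span> models a multi-fragment string as a vector of contiguous
pieces, and <span class="code">for_each_chunk</span> plays the role that <span class="code">copy_string</span> plays above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical multi-fragment string: each element is one contiguous chunk.
typedef std::vector<std::u16string> ChunkedString;

// A sink in the style of the third example: handles one chunk per call.
struct Capitalize
{
  std::size_t write(char16_t* s, std::size_t n) const
  {
    char16_t* const done_with_chunk = s + n;
    for ( ; s < done_with_chunk; ++s)
      if (u'a' <= *s && *s <= u'z')
        *s = static_cast<char16_t>(*s - u'a' + u'A');
    return n;
  }
};

// Stand-in for a copy_string-style driver: hands every chunk to the sink.
template <class Sink>
void for_each_chunk(ChunkedString& str, const Sink& sink)
{
  for (std::size_t i = 0; i < str.size(); ++i)
    if (!str[i].empty())
      sink.write(&str[i][0], str[i].size());
}

// Convenience for trying it out: capitalize a copy and join the chunks.
inline std::u16string capitalized(ChunkedString str)
{
  for_each_chunk(str, Capitalize());
  std::u16string joined;
  for (std::size_t i = 0; i < str.size(); ++i)
    joined += str[i];
  return joined;
}

// A sample string split across two chunks.
inline ChunkedString two_chunks()
{
  ChunkedString str;
  str.push_back(u"hello, ");
  str.push_back(u"World");
  return str;
}
```

<p>The sink never learns how many chunks there are or where their boundaries
fall, which is what lets the same transformation run over any string layout.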
Date: Thu, 17 Aug 2000 18:23:22 -0400

<pre class="email-quote">
>I tried looking at the string header files but they
>are awfully complicated.
</pre>

<p>I'll explain things in a little <strong>more</strong> detail than you need, so
that some of the stuff you see in these headers will make more sense.
I'll also answer your questions out of order.

<p>First: the string hierarchy looks like this
<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_hierarchy.gif">http://ScottCollins.net/Journal/discussion/string_hierarchy.gif</a>

<p>The two most important headers are:
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAReadableString.h</a>
<a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h">http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsAWritableString.h</a>

<p>These abstract classes, <span class="code">nsAReadable[C]String</span> and
<span class="code">nsAWritable[C]String</span>, are typically what you will want to use in the
interfaces of new code. If you write a piece of code that takes a
string for input, consider, e.g.,

<div class="source-code">
void consumes_a_string( const nsAReadableString& aInput );
</div>

<p>If you write a piece of code that modifies a string, consider

<div class="source-code">
void modifies_a_string( nsAWritableString& aResult );
</div>

<p>When creating your own classes, member strings will typically be
<span class="code">nsString</span>s. When you can't avoid creating a short string that you
need only temporarily during a function, you will typically use
<span class="code">nsAutoString</span>. When someone passes you a raw pointer, or a raw
pointer and a length, representing a buffer of characters that you may
examine, but won't own, you can treat it like a string by wrapping it
in an <span class="code">nsLiteralString</span>, e.g.,

<div class="source-code">
reads_a_buffer( const PRUnichar* aInput, PRUint32 aInputLength )
nsLiteralString input(aInput, aInputLength);
// doesn't allocate or copy
</div>

<p>You will use <span class="code">nsLiteralString</span> around quoted constant strings as well,
though typically through the <span class="code">NS_LITERAL_STRING</span> macro, to avoid doing
a length calculation

<div class="source-code">
NS_LITERAL_STRING("x")
</div>

<div class="source-code">
nsLiteralString(L"x", (sizeof(L"x")/sizeof(PRUnichar) - 1))
</div>

<p>if <span class="code">L</span> notation works as needed on your platform.

<p>Those are the basics. Now on to your questions:
<pre class="email-quote">
>For example this won't compile. [...]
>str1 += L"abc " + str2 + L"def";
</pre>

<p><span class="code">L"abc "</span> makes an object that is a <span class="code">const wchar_t[5]</span>, and none of
the string code knows about <span class="code">wchar_t</span>. The main reason is that
<span class="code">wchar_t</span> is not necessarily the right size (it can be 4 bytes under
gcc). If you wrap these constant expressions in <span class="code">NS_LITERAL_STRING</span>,
as described above, you should get the right thing, e.g.,

<div class="source-code">
str1 += NS_LITERAL_STRING("abc ") + str2 + NS_LITERAL_STRING("def");
</div>

<pre class="email-quote">
>function(const PRUnichar *foo);
>call function(L"abc " + str2);
>It won't create a temporary nsString.
</pre>

<p>This one, I have a quick and easy explanation for. If <span class="code">function</span> was

<div class="source-code">
function( const nsAReadableString& )
</div>

<p>then, no problem, since an <span class="code">nsPromiseConcatenation</span> (which was the
result of adding those two things together) <strong>is</strong> a readable string.
No other objects need to be created; no copying needs to be performed.

<p>In all cases, we want the creation of <span class="code">nsString</span>s et al, to be
<span class="code">explicit</span>, since creation is unbelievably expensive, requiring heap
allocation, locks, copying, etc.
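<p>How a concatenation can be readable without creating a new string can be
sketched in a few lines of standard C++. <span class="code">ConcatenationPromise</span> and
<span class="code">materialize</span> are hypothetical names; this illustrates the idea only and
is not how <span class="code">nsPromiseConcatenation</span> is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical promise of a concatenation: it remembers its two
// operands, allocates nothing, and answers reads by delegating to them.
// Both operands must outlive the promise.
class ConcatenationPromise
{
public:
  ConcatenationPromise(const std::u16string& left, const std::u16string& right)
    : mLeft(left), mRight(right) { }

  std::size_t Length() const { return mLeft.size() + mRight.size(); }

  char16_t CharAt(std::size_t i) const
  {
    assert(i < Length());  // out-of-bounds reads are the caller's bug
    return i < mLeft.size() ? mLeft[i] : mRight[i - mLeft.size()];
  }

private:
  const std::u16string& mLeft;
  const std::u16string& mRight;
};

// Creating a real string is a separate, explicit, visible step.
inline std::u16string materialize(const ConcatenationPromise& p)
{
  std::u16string result;
  result.reserve(p.Length());
  for (std::size_t i = 0; i < p.Length(); ++i)
    result.push_back(p.CharAt(i));
  return result;
}

static const std::u16string kAbc = u"abc ";
static const std::u16string kDef = u"def";
```

<p>A function that only reads can take the promise directly. The allocation and
copy happen only if someone calls <span class="code">materialize</span>.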
<p>I hope this answers both your posts,
Date: Thu, 17 Aug 2000 20:57:08 -0400
Subject: re our conversation

<div class="source-code">
return ToNewUnicode( nsLiteralCString(buffer) );
</div>
Date: Fri, 18 Aug 2000 02:52:45 -0400
Subject: Re: More questions and new string API

<pre class="email-quote">
>1) How do I return a static string?
>const nsAReadableString& foo() {return NS_LITERAL_STRING("x");}
>errors on taking the address of a temporary variable.
</pre>

<p>Unfortunately, <span class="code">NS_LITERAL_STRING</span>'s definition is not particularly
amenable to this use. Instead, you would have to say something like

<div class="source-code">
const nsAReadableString&
#ifdef HAVE_CPP_2BYTE_WCHAR_T
static nsLiteralString static_foo(L"x", 1);
static nsLiteralString static_foo;
static PRBool initialized = PR_FALSE;
static_foo.AssignWithConversion("x", 1);
initialized = PR_TRUE;
</div>
<pre class="email-quote">
>2) I'm using these with the STL library in an XPCOM component.
>What type should I use with map? This doesn't work...
>typedef map<const nsAReadableString&, myType*> mapStringMyType;
>mapStringMyType foo;
>foo.find(nsAReadableString); - I want to find on a ReadableString
</pre>

<p>I don't know what errors you are getting; but it probably doesn't work
because a reference isn't an assignable type. This is just a guess.

<div class="source-code">
map<const nsAReadableString*, myType*>
</div>

<p>If you actually want the map to manage ownership of the keys, then
you'll want to use a concrete type, e.g.,

<div class="source-code">
map<nsString, myType*>
</div>

<div class="source-code">
map<nsSharedStringPtr, myType*>
</div>

<p>Or maybe there's something else wrong. Send me the error messages.
If you end up using a pointer, then of course you'll have to supply a
comparison function to the <span class="code">map</span> template. You won't be satisfied
with the default comparison of pointers :-) Sorry I couldn't answer
this one more completely.
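<p>The comparison function for pointer keys might look like the following
sketch. It uses <span class="code">std::u16string</span> in place of the Mozilla string classes, and
<span class="code">PointeeLess</span>, <span class="code">MapByContent</span>, and the helper functions are hypothetical
names chosen for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Orders pointer keys by the strings they point at, not by address.
struct PointeeLess
{
  bool operator()(const std::u16string* a, const std::u16string* b) const
  {
    assert(a && b);
    return *a < *b;
  }
};

typedef std::map<const std::u16string*, int, PointeeLess> MapByContent;

static const std::u16string kStoredKey = u"x";
static const std::u16string kLookupKey = u"x";  // equal text, different object

// A map keyed by pointer that compares the pointed-to characters.
inline MapByContent make_map()
{
  MapByContent m;
  m[&kStoredKey] = 42;
  return m;
}

// The same lookup with the default comparison, which orders addresses.
inline bool found_by_address()
{
  std::map<const std::u16string*, int> m;
  m[&kStoredKey] = 42;
  return m.count(&kLookupKey) == 1;
}
```

<p>With <span class="code">PointeeLess</span> the lookup through <span class="code">kLookupKey</span> finds the entry stored
under <span class="code">kStoredKey</span>. With the default comparison it does not, because the two
objects live at different addresses. The map does not own the keys, so they have
to outlive it.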
<pre class="email-quote">
>3) How do I get a raw PRUnichar pointer out of nsAReadableString
>when I need to call something that wants 'unsigned short *'?
</pre>

<p>The problem with this scenario is that an <span class="code">nsAReadableString</span> doesn't
promise that all its data is contiguous, nor that it is
zero-terminated, which is what I suspect you want in this case. If
the function you want to call can take {pointer, length} tuples, and
can consume the string in hunks without zero termination ... then you
can use <span class="code">copy_string</span> to pump the string into your function, see
<a class="exact-uri" href="http://ScottCollins.net/Journal/discussion/string_iterators.html">http://ScottCollins.net/Journal/discussion/string_iterators.html</a>

<p>If not, and you absolutely have to have a contiguous zero-terminated
buffer, then there is a new facility (part of the DOMAPI branch) that
does what you need. It's not checked in on the trunk; it should
be in early next week. It is <span class="code">nsPromiseFlatString</span>. This class
promises a contiguous zero-terminated buffer, and has an <span class="code">operator
PRUnichar*</span> to produce a pointer to that buffer automatically. If the
underlying class <strong>is</strong> one that happens to be a single fragment and
zero-terminated, then, like <span class="code">nsPromiseSubstring</span> and
<span class="code">nsPromiseConcatenation</span>, this class merely holds a reference into the
original data. If, however, the underlying string is multi-fragment
or not zero-terminated, then <span class="code">nsPromiseFlatString</span> allocates a
contiguous buffer of appropriate size and copies the fragmented string
data to it. So given

<div class="source-code">
void ReadBuffer( PRUnichar* );
</div>

<p>You can call this as efficiently as possible with an arbitrary string

<div class="source-code">
ReadBuffer( nsPromiseFlatString(aString) );
</div>

<p>If the function you are calling needs to take ownership of the buffer
you hand it, then you will probably call <span class="code">ToNewUnicode</span> like so

<div class="source-code">
void ConsumeBuffer( PRUnichar* );
ConsumeBuffer( ToNewUnicode(aString) );
</div>

<p>The global function <span class="code">ToNewUnicode</span> is declared in "nsReadableUtils.h",
and was only recently added to the build. It is currently being used
in the DOMAPI branch. It is part of the build, but the file
"dlldeps.c" in XPCOM may need to be modified to ensure it is exported
on your platform if you are building the tip.

<p>Needless to say, you want to avoid functions that require bare
pointers for several reasons: (a) they typically assume
zero-termination, which is not guaranteed by the normal encodings; (b)
they require contiguous allocation, which may not be possible; (c)
they scan for the end of the string, at linear cost (if the encoding
makes it possible at all), when the length could be known in advance.
If you have to do it, the above mechanisms work, but be aware of the
cost and the potential need to copy.
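<p>The borrow-when-possible, copy-when-necessary behavior described above for
<span class="code">nsPromiseFlatString</span> can be modeled in standard C++. This is a sketch under
stated assumptions: <span class="code">Fragments</span> models a multi-fragment string as a vector of
pieces, and <span class="code">FlatPromise</span> is a hypothetical class, not the real one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical multi-fragment string: each element is one contiguous piece.
typedef std::vector<std::u16string> Fragments;

// Promises a contiguous, zero-terminated buffer for the source's text.
class FlatPromise
{
public:
  explicit FlatPromise(const Fragments& source)
    : mBorrowed(0)
  {
    if (source.size() == 1)
      mBorrowed = source[0].c_str();  // already contiguous and terminated
    else
      for (std::size_t i = 0; i < source.size(); ++i)
        mOwned += source[i];          // must copy to make it contiguous
  }

  const char16_t* get() const { return mBorrowed ? mBorrowed : mOwned.c_str(); }
  bool copied() const { return mBorrowed == 0; }

private:
  const char16_t* mBorrowed;  // points into the source; not owned
  std::u16string  mOwned;     // used only when a copy was unavoidable
};

static const Fragments kSingle(1, std::u16string(u"abc"));

inline Fragments two_fragments()
{
  Fragments f;
  f.push_back(u"abc");
  f.push_back(u"def");
  return f;
}
```

<p>In the single-fragment case <span class="code">get()</span> returns the source's own buffer, so the
source must stay alive while the pointer is in use. In the multi-fragment case the
linear-time copy that bare-pointer interfaces force on you becomes visible.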

<pre class="email-quote">
>4) How do I declare a local variable to hold a nsAReadableString?
>and a member variable?

<p><span class="code">nsAReadableString</span> is an abstract type, so you can't have a concrete
instance of it. All strings in the hierarchy are readable strings.
If you just want a reference to a readable string, you can say, e.g.,

<div class="source-code">
const nsAReadableString& mString;
foo( const nsAReadableString& aString ) : mString(aString) { }

<p>...similarly with pointers; but I suspect you are looking for
something more concrete. An <span class="code">nsString</span> is an <span class="code">nsAReadableString</span>, and
is the typical thing you want as a member variable. An <span class="code">nsAutoString</span>
is also an <span class="code">nsAReadableString</span> and is typically what you would use for
a short (in length) temporary (in lifetime) local variable, as I
mentioned in my previous post.
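
<p>The difference between holding a reference to the abstract type and holding a concrete string by value can be sketched in standard C++. The classes <span class="code">Readable</span>, <span class="code">OwnedString</span>, <span class="code">Viewer</span>, and <span class="code">Keeper</span> are hypothetical stand-ins for this illustration only; they are not the Mozilla classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an abstract readable string interface.
class Readable {
 public:
  virtual ~Readable() = default;
  virtual std::size_t Length() const = 0;
};

// The interface is abstract, so no object of this exact type can exist.
static_assert(std::is_abstract<Readable>::value,
              "Readable cannot be instantiated directly");

// Hypothetical stand-in for a concrete string that owns its data.
class OwnedString : public Readable {
 public:
  explicit OwnedString(std::u16string data) : mData(std::move(data)) {}
  std::size_t Length() const override { return mData.size(); }

 private:
  std::u16string mData;
};

// Holds only a reference: cheap, but the referent must outlive this object.
class Viewer {
 public:
  explicit Viewer(const Readable& aString) : mString(aString) {}
  std::size_t Length() const { return mString.Length(); }

 private:
  const Readable& mString;
};

// Holds a concrete string by value: the object owns its own copy.
class Keeper {
 public:
  explicit Keeper(const OwnedString& aString) : mString(aString) {}
  std::size_t Length() const { return mString.Length(); }

 private:
  OwnedString mString;
};
```

<p>A <span class="code">Viewer</span> is only safe while the string it refers to is alive, which is why a concrete member is usually the better choice for anything stored beyond the current call.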

<pre class="email-quote">
>5) If I call a function that returns a PRUnichar* and I want to
>use it as a nsAReadableString should I wrap it in a

<p>Yes, though remember, an <span class="code">nsLiteralString</span> assumes the lifetime of the
underlying data is under someone else's control. If the called
function gives you a buffer that you need to <span class="code">delete</span>, you will have
to manage that yourself. Currently, people often use <span class="code">nsXPIDLString</span>
to handle that. XPIDL strings are <strong>not</strong> part of the hierarchy. They
are only used as a sort of string-<span class="code">auto_ptr</span>. However, I'm
integrating their functionality into <span class="code">nsString</span>. There is no problem
in wrapping the same pointer in both as two separate local variables,
one to give you the readable interface, and one to manage the

<p>If it's OK with you, I'd like to post this reply (including your
quoted questions) to n.p.m.xpcom and also put a copy near the string
iterator discussion I provided a link to above, so that other people
with similar questions can see these answers.

Date: Sun, 3 Sep 2000 03:52:17 -0400

<p>In article <8nu9m2$eo14@secnews.netscape.com>, "Jon Smirl"
<jonsmirl@mediaone.com> wrote:

> I have the new strings up and running in my app. They work as
> I haven't found any bugs. Thanks for the good job in designing and
> implementing them. Here's a summary of issues I've encountered

<p>Thanks, and I appreciate your comments and insights.

> 1) Should there be a nsSegmentedString derived from nsString instead
> of building segment support into nsString? None of my strings are
> I keep executing code that supports it. nsPromiseFlatString would
> be trivial in the non-segmented case.

<p>The general case is that a string does not promise to have contiguous
data. A specific case is that, for some implementations, it does.
You couldn't do it the other way around, because a segmented string
couldn't satisfy all the promises of a flat string. However, through
the use of chunky iterators, operating on strings that happen to be
flat is very efficient. In fact, <span class="code">nsPromiseFlatString</span> is trivial in
the non-segmented case. In addition, I'll be adding an abstract flat
class into the hierarchy, which will present an additional interface ...
in your local routines where you actually have declared a concrete
string instance that happens to be flat, the compiler will give you
the benefit of using the flat-specific routines (e.g., a substring
object over a flat string is simpler than the general-purpose
substring). I need to be cautious about this, though, since I don't
automatically want people propagating the flat type through their
interfaces. That would put us in the same boat we're in right now ...
where routines only work on a specific kind of string, which denies
other parts of the code the opportunity to use an implementation
beneficial to their specific needs, and typically for no good reason.

> 2) Should nsAWritableString have a way to get the buffer and then
> I need to get the buffer to pass it to OS calls. I'm doing this now
> by passing around nsStrings instead of the interface. If I just use
> the interface I incur an extra copy since I have to use a temporary

<p>A specific string implementation could promise this, but in general, a
writable could not. After all, a writable doesn't even guarantee
contiguous storage. To some degree, this is what
<span class="code">nsPromiseFlatString</span> is for. However, this is a readable promise
only. It will also be the case that <span class="code">ns[C]String</span>s, in the very near
future, will be able to just assume ownership of an arbitrary buffer
allocated on the free store with the XPCOM allocators ... getting one
to give up its buffer, on the other hand, presents some problems. Do
you have a lot of places where the system writes into your string
buffer space? Or do you have a lot of system routines that return you
new buffers? I can imagine using <span class="code">nsPromiseFlatString</span> for this, but
what happens when the OS alters the underlying data? If the promise
had generated that flat data on behalf of a multi-fragment string,
should it now put the changes back? It's possible to do; I just want
to know if it's correct to allow this situation to happen.

> 3) There needs to be a NS_LITERAL_CHAR() to go along with
> NS_LITERAL_STRING().

> Having NS_LITERAL_STRING() all over the code clutters
> it up and makes it hard to tell what the code is doing, could we
> have a standard short alias for this?

<p>Yes, I'll try to think of something ... perhaps <span class="code">NS_LSTR</span>?

> 4) nsLiteralString should support n.ToInteger(&error);

<p><span class="code">ToInteger</span> is actually a bad interface. It's only good if your
entire string is the number; this encourages you to edit your string
until it is one, or perhaps copy the numeric part to another string.
Better if you just <span class="code">sscanf</span> a string (don't know if I can provide
that in the general case, but I'm thinking about it), or else use
regular C++ extractors (which wouldn't be too hard for me to
provide), or else I could give you a <span class="code">ToInteger</span> that works on a pair
of iterators, extracting the integer from the digits between them.
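
<p>To make the last option concrete, here is one way an iterator-pair integer extractor might look. This is only a sketch in standard C++: <span class="code">ParseInteger</span> is a hypothetical name, not an existing Mozilla routine, and it ignores overflow.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not an actual Mozilla API: parse an optionally signed
// decimal integer from the range [first, last). On success, |first| is moved
// just past the last character consumed. |ok| is false if no digits were
// found, in which case |first| is left untouched.
// Overflow is not checked in this sketch.
template <typename Iterator>
long ParseInteger(Iterator& first, Iterator last, bool& ok) {
  Iterator cursor = first;
  bool negative = false;
  if (cursor != last && (*cursor == '-' || *cursor == '+')) {
    negative = (*cursor == '-');
    ++cursor;
  }
  long value = 0;
  bool sawDigit = false;
  while (cursor != last && *cursor >= '0' && *cursor <= '9') {
    value = value * 10 + (*cursor - '0');
    sawDigit = true;
    ++cursor;
  }
  ok = sawDigit;
  if (ok) first = cursor;  // advance only on success
  return negative ? -value : value;
}
```

<p>Because the caller chooses where the range begins, a number embedded in a longer string can be extracted in place, with no editing or copying of the string beforehand.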

> 5) There should be a global define for an interface to a readonly

<p>Yes, there will be.

> 6) Something is wrong with concatenation....

<p>Hopefully I've fixed this now.

> 8) A forward definition is missing in the h files

<p>I'll check it out.

<p>My understanding is that you have already found the answers to your

<p>I hope this helps,

Date: Wed, 20 Sep 2000 17:32:13 -0400
Subject: Re: how to free an nsString::ToNewCString

<pre class="email-quote">
>What's the current approved way to free an nsString::ToNewCString?

<p><span class="code">nsMemory::Free</span>

<p>You use several <span class="code">NS_ConvertASCIItoUTF16("...").get()</span>; these should be

NS_LITERAL_STRING("...").get()

<p>Don't do this to the very first case where you aren't wrapping an actual literal string.
The first instance should exploit <span class="code">NS_LITERAL_STRING</span> technology as well,
around the initial declarations of the strings ... you probably want to do this with
<span class="code">NS_NAMED_LITERAL_STRING</span>.

Date: Thu, 12 Oct 2000 00:57:28 -0400
Subject: string answers

<div class="source-code">
DoSomething( nsAWritableString& answer )
nsXPIDLString registry_data;
Fetch("key", getter_Shares(registry_data));
nsLiteralString path(not_my_string);
PRInt32 first_colon = path.FindChar(PRUnichar(':'));
if ( first_colon != -1 )
// convert ... extract path from |path|
nsCOMPtr<nsILocalFile> localFile( do_CreateInstance(CID, &rv) );
localFile->SetPersistentDescriptor(NS_ConvertUTF16toUTF8(path));
nsXPIDLString converted_path;
localFile->GetUnicodePath(getter_Copies(converted_path));
answer = converted_path.get();

Date: Thu, 12 Oct 2000 02:03:49 -0400
Subject: Re: and the answer is ...

<p>You can see from the line of code that you're on that this should
have been fine. <span class="code">nsMemory::Alloc</span> would be asked to allocate a 1-byte
object, but it failed trying to allocate that, which suggests that
the allocator was busy and non-reentrant and the debugger tried to

<p>Of course, this doesn't solve your problem. Perhaps we need to go
back to the idea of a function that returns a pointer to the first

<div class="source-code">
debug_string( const nsAReadableCString& aCString )
nsReadingIterator<char> iter;
aCString.BeginReading(iter);
return aCString.IsEmpty() ? "" : iter.get();

<p>This code should work regardless of what the allocator is doing. The
downsides are (a) it only returns the first hunk of the string, in the
case of a multi-fragment string; and (b) that hunk <strong>might</strong> not be

Date: Thu, 12 Oct 2000 08:30:32 -0400
Subject: Re: Self healing the cache :-)

<p>At 3:04 PM -0400 10/11/00, Mike Shaver wrote:
<pre class="email-quote">
>NS_LITERAL_STRING(NS_XPCOM_SHUTDOWN_OBSERVER_ID);

<p>Macro ugliness makes <span class="code">NS_LITERAL_STRING</span> inappropriate for use over
other macros. In other words:

<div class="source-code">
NS_LITERAL_STRING("foo")

<p>is <strong>good</strong>.

<div class="source-code">
NS_LITERAL_STRING(FOO)

<p>is <strong>bad</strong>. Why? Because it turns into

<div class="source-code">
nsLiteralString(LFOO, sizeof(LFOO)...

<p>and there is no <span class="code">LFOO</span>. Sorry. If you have to do this to a
macro-ized string, do the magic by hand, e.g.,

<div class="source-code">
nsLiteralString(FOO, sizeof(FOO)/sizeof(PRUnichar)
                       - 1)  // length excludes the terminating zero

<p>or else if you don't care that <span class="code">nsLiteralString</span> will scan for the

<div class="source-code">
nsLiteralString(FOO)

Date: Thu, 12 Oct 2000 08:36:14 -0400
Subject: Re: Self healing the cache :-)

<p>Actually, I'm not even sure you can do it by hand, since you didn't

<div class="source-code">

<p>and <strong>can't</strong> do that cross-platform. The other way around this is to
define a global instead of a macro, that is, instead of saying

<div class="source-code">

<p>at the top of your file, say

<div class="source-code">
NS_NAMED_LITERAL_STRING(FOO, "foo")

<p>or else, if the macro was used only in one spot ... perhaps you could
just eliminate the macro in favor of <span class="code">NS_NAMED_LITERAL</span> in situ.

<p>Arghh. In this case, you may be stuck with the extra work of
<span class="code">AssignWithConversion</span>.
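
<p>The macro problem discussed in this thread comes from how the preprocessor handles token pasting: an argument that is the operand of <span class="code">##</span> is pasted before it is macro-expanded. The sketch below demonstrates this in standard C++ with hypothetical macros (<span class="code">WIDEN_DIRECT</span>, <span class="code">WIDEN</span>, <span class="code">LITERAL_LENGTH</span>); they are not the Mozilla definitions. The sketch uses <span class="code">wchar_t</span>, whose width varies between platforms, so it does not address the cross-platform concern raised above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macros for illustration; these are not the Mozilla definitions.

// One-level pasting: the argument is pasted before it is macro-expanded,
// so WIDEN_DIRECT(FOO) would become the unknown identifier LFOO.
#define WIDEN_DIRECT(s) L##s

// Two-level pasting: the outer macro expands its argument first, so the
// inner macro pastes L onto the string literal itself.
#define WIDEN_INNER(s) L##s
#define WIDEN(s) WIDEN_INNER(s)

// Number of characters in a literal, excluding the terminating zero.
#define LITERAL_LENGTH(s) (sizeof(s) / sizeof((s)[0]) - 1)

#define FOO "foo"

const wchar_t* const kDirect = WIDEN_DIRECT("foo");  // fine: L"foo"
const wchar_t* const kViaMacro = WIDEN(FOO);         // fine: FOO expands first
const std::size_t kLength = LITERAL_LENGTH(WIDEN(FOO));
// const wchar_t* kBroken = WIDEN_DIRECT(FOO);       // error: LFOO undeclared

static_assert(LITERAL_LENGTH(WIDEN(FOO)) == 3,
              "length is computed at compile time, with no scan");
```

<p>The extra level of indirection is the usual preprocessor idiom for forcing an argument to expand before it is pasted.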

Date: Sun, 3 Dec 2000 16:38:07 -0400
Subject: Re: another copy_string question

<pre class="email-quote">
>Is there a way to tell, inside the write() sink, if one is in the
>final hunk? I need to do some special processing at the end.

<p>No, there isn't. But you could move such special processing into the
destructor of the sink. Remember, the sink is passed by reference, so
you can exactly control its lifetime.
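
<p>The destructor idea can be sketched in standard C++ without the Mozilla classes. Here <span class="code">CountingSink</span>, <span class="code">CopyHunks</span>, and <span class="code">TotalLength</span> are hypothetical names invented for this illustration; <span class="code">CopyHunks</span> merely stands in for a routine that feeds a sink one hunk at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink for illustration: it receives the source one hunk at a
// time and performs its end-of-input processing in the destructor.
class CountingSink {
 public:
  explicit CountingSink(std::size_t& aTotal) : mTotal(aTotal) {}

  // Final processing: runs once, when the sink goes out of scope.
  ~CountingSink() { mTotal = mPending; }

  // Called once per hunk; the sink cannot tell whether this hunk is the last.
  void write(const char16_t* /*aHunk*/, std::size_t aLength) {
    mPending += aLength;
  }

 private:
  std::size_t& mTotal;
  std::size_t mPending = 0;
};

// Stand-in for a copy routine that feeds hunks to a sink taken by reference.
template <typename Sink>
void CopyHunks(const std::vector<std::u16string>& aHunks, Sink& aSink) {
  for (const std::u16string& hunk : aHunks) {
    aSink.write(hunk.data(), hunk.size());
  }
}

// The caller controls the sink's lifetime with a nested scope.
std::size_t TotalLength(const std::vector<std::u16string>& aHunks) {
  std::size_t total = 0;
  {
    CountingSink sink(total);
    CopyHunks(aHunks, sink);
  }  // |sink| destructor runs here and publishes the result
  return total;
}
```

<p>Because the sink is passed by reference rather than copied, exactly one destructor call happens, at the point the caller chooses.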

<div class="source-code">
nsReadingIterator<PRUnichar> sourceStart = aStr.BeginReading();
nsReadingIterator<PRUnichar> sourceEnd = aStr.EndReading();
copy_string(sourceStart, sourceEnd, sink);
// |sink| destructor executed here

Date: Fri, 15 Dec 2000 20:02:08 -0400
Subject: fragment of code

<div class="source-code">
nsPromiseFlatString flatKey(aReadable);

Date: Tue, 16 Jan 2001 16:47:37 -0400
Subject: Re: a few string questions...

>I've accumulated a few questions I've been wanting to ask you, mostly
>about string stuff. Nothing urgent, but I want to ask them before I
>forget. So here goes...:

>1) Is it acceptable to use nsLiteralCString or nsLiteralString on
>something that's not a literal? This can be useful in some places,
>for example, to convert a char* to PRUnichar*:

>PRUnichar* new = ToNewUnicode(nsLiteralCString(myCharPtr));

<p>This is explicitly allowed. That's why I'm proposing to change the
names of those classes to <span class="code">nsLocal[C]String</span>.

>2) Should nsString2x.h and nsString2x.cpp go away? They look like a
>never-completed rewrite or something...

<p>Yes. They should go away. They are uncompleted [old] bullshit,
exactly as you diagnosed.

<p>I'll look into the other two questions.

Date: Thu, 1 Feb 2001 15:12:41 -0400
Subject: Re: [Fwd: bad string, bad string]

<p>We've been removing implicit conversion operators because they
<strong>always</strong> lead to trouble. Usually they make it harder to pick the
right function when overloading is involved and in the past they have
led to huge performance suckage because we ended up doing conversions
when we didn't need to because the implicit operator made us pick the

<p>It's borderline when the class implements something that is <strong>so</strong>
close, as with a guaranteed flat string or an <span class="code">nsCOMPtr</span> ... but the
general recommendation is to avoid implicit conversions.

Date: Tue, 6 Feb 2001 18:52:23 -0400
Subject: seeking review for bug #57087

<a class="exact-uri" href="http://bugzilla.mozilla.org/show_bug.cgi?id=57087">http://bugzilla.mozilla.org/show_bug.cgi?id=57087</a>
<a class="exact-uri" href="http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576">http://bugzilla.mozilla.org/showattachment.cgi?attach_id=24576</a>

<p>This patch is supposed to add the ability to define very long literal
strings more easily by breaking lines, e.g.,

<div class="source-code">
NS_MULTILINE_LITERAL( NS_L("This is the start of a very long line")
                      NS_L(" which actually continues across")
                      NS_L(" a couple more.") )

<p>The main danger in this scheme is callers who omit the inner <span class="code">NS_L</span>
wrapping, though I believe this will be caught at compile time as an
initializer of the wrong type.
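
<p>The language feature underneath such a scheme is that adjacent string literals are concatenated by the compiler. The following is a small standard C++ sketch of that feature only; it does not reproduce <span class="code">NS_MULTILINE_LITERAL</span> or <span class="code">NS_L</span>, and the names <span class="code">kLong</span> and <span class="code">kLongLength</span> are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Adjacent string literals with the same prefix are concatenated by the
// compiler, so a long literal can be broken across lines at no runtime cost.
const wchar_t kLong[] = L"This is the start of a very long line"
                        L" which actually continues across"
                        L" a couple more.";

// Length known at compile time, excluding the terminating zero.
const std::size_t kLongLength = sizeof(kLong) / sizeof(kLong[0]) - 1;
```

<p>One caveat: in C++11 and later, an unprefixed literal placed next to a prefixed one takes on that prefix, so the language itself does not necessarily diagnose a missing prefix. Whether an omitted <span class="code">NS_L</span> is caught therefore depends on how the macros are defined, which this sketch does not model.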

<p>Seeking input from everybody, and waterson in particular.

Date: Wed, 14 Feb 2001 16:09:10 -0400
Subject: Re: Question...

<p>There are some utilities in "xpcom/ds/nsReadableUtils.h". In
particular, if you want to get back a new heap-allocated ASCII string
with minimal work, you would say

<div class="source-code">
PRUnichar* sourceChars = ...;
char* destChars = ToNewCString(nsLiteralString(sourceChars));

<p>It's more efficient if you happen to already know the length. If you
don't, don't bother counting; that's what I'll do in the constructor
for <span class="code">nsLiteralString</span>. If you do, then call like this

<div class="source-code">
destChars = ToNewCString( nsLiteralString(sourceChars, length) );

<p>Other routines in that file will help you if, for instance, you wanted
to translate into a buffer you had already allocated.
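
<p>For readers without the Mozilla tree at hand, the shape of such a narrowing copy, and why a known length saves a scan, can be sketched in standard C++. <span class="code">NewNarrowCopy</span> is a hypothetical stand-in, not the real <span class="code">ToNewCString</span>, and it uses <span class="code">std::malloc</span> where the real routine would presumably use the XPCOM allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a lossy 16-bit to 8-bit copy; it is not the real
// ToNewCString. The caller owns the result and releases it with std::free.
char* NewNarrowCopy(const char16_t* aSource, std::size_t aLength) {
  assert(aSource != nullptr);
  char* result = static_cast<char*>(std::malloc(aLength + 1));
  if (!result) return nullptr;
  for (std::size_t i = 0; i < aLength; ++i) {
    // Keeps only the low byte, so this is correct for ASCII input only.
    result[i] = static_cast<char>(aSource[i]);
  }
  result[aLength] = '\0';
  return result;
}

// Overload for callers that do not know the length: one linear scan to count.
char* NewNarrowCopy(const char16_t* aSource) {
  assert(aSource != nullptr);
  std::size_t length = 0;
  while (aSource[length] != u'\0') ++length;
  return NewNarrowCopy(aSource, length);
}
```

<p>The two-argument form skips the counting loop entirely, which is the saving the text above refers to.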

Date: Fri, 23 Feb 2001 03:12:58 -0400
Subject: string snippet

<div class="source-code">
nsReadingIterator<char> search_start;
aInput.BeginReading(search_start);
nsReadingIterator<char> search_end;
aInput.EndReading(search_end);
if ( FindCharInReadable(':', search_start, search_end) )
return ToNewCString( Substring(aInput, search_start, search_end) );

Date: Wed, 7 Mar 2001 19:44:08 -0400
Subject: string help

<p>Here you go, Mike:

http://scottcollins.net/journal/discussion/mjudge-scratch.cpp

Date: Fri, 9 Mar 2001 20:56:07 -0400
Subject: Re: string assertions

<p>If you get an iterator into a string and you advance it all the way to
the end of the string, and then <strong>keep</strong> trying to advance it, you hit
this assert. This could happen, for example, if you tried to copy 10
characters out of a 9 character string. I've tried to make this
impossible to get to. As far as I know, all my routines trim requests
in advance of manipulating iterators. When you see this, you should
get the stack. That will take you right to the bad spot.
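
<p>Trimming a request before touching any iterator might look like the following. This is a sketch in standard C++ with a hypothetical helper, <span class="code">CopyTrimmed</span>; it is not taken from the Mozilla string code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy up to |aCount| characters starting at |aStart|,
// trimming the request to what the source actually holds so that no iterator
// is ever advanced past the end.
std::u16string CopyTrimmed(const std::u16string& aSource, std::size_t aStart,
                           std::size_t aCount) {
  const std::size_t start = std::min(aStart, aSource.size());
  const std::size_t count = std::min(aCount, aSource.size() - start);
  std::u16string::const_iterator first =
      aSource.begin() + static_cast<std::ptrdiff_t>(start);
  std::u16string::const_iterator last =
      first + static_cast<std::ptrdiff_t>(count);
  assert(last <= aSource.end());  // cannot fire once the request is trimmed
  return std::u16string(first, last);
}
```

<p>Asking for 10 characters from a 9 character string simply yields all 9, and the assertion documents the invariant that the trimming establishes.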

Date: Sat, 31 Mar 2001 11:04:03 -0400
Subject: Re: Sun bustage and string advice

<p>You do know you are comparing two pointers now? It seems unlikely
those two pointers would ever be the same pointer. You probably want
to say something like

<div class="source-code">
NS_LITERAL_STRING("foo").Equals(aTopic) // or
NS_LITERAL_STRING("foo") == nsLiteralString(aTopic)

<p>...so that you compare the <strong>contents</strong> of two strings. Right now,
you're just testing to see if two pointers both point to the same
location in memory. A lot of people make this mistake. I would like
to make it obvious to people that comparing two pointers does not
compare strings. Can you tell me what gave you that impression so
that I can figure out how to better educate people not to do this? By
the way, it's not that I don't <strong>want</strong> to make this compare two
strings; it's that in C++, you can't overload operators for built-in
types. And pointers are built-in types. So I can't make
<span class="code">operator==(const PRUnichar*, const PRUnichar*)</span> do anything different
than it already does, which is the same thing it does for any other
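
<p>The distinction between comparing addresses and comparing contents can be shown in a few lines of standard C++. The functions <span class="code">SameContents</span> and <span class="code">SameAddress</span> are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Compare the contents of two zero-terminated 16-bit strings.
bool SameContents(const char16_t* aLeft, const char16_t* aRight) {
  assert(aLeft != nullptr && aRight != nullptr);
  std::size_t i = 0;
  while (aLeft[i] != u'\0' && aLeft[i] == aRight[i]) ++i;
  return aLeft[i] == aRight[i];
}

// Compare only the addresses; this is what operator== does for raw pointers.
bool SameAddress(const char16_t* aLeft, const char16_t* aRight) {
  return aLeft == aRight;
}
```

<p>Two separate buffers that both hold "foo" are equal by content but not by address, which is exactly the trap described above.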
<!-- .................................................................End Matter -->