@c Copyright (C) 1996-2013 John W. Eaton
@c This file is part of Octave.
@c Octave is free software; you can redistribute it and/or modify it
@c under the terms of the GNU General Public License as published by the
@c Free Software Foundation; either version 3 of the License, or (at
@c your option) any later version.
@c Octave is distributed in the hope that it will be useful, but WITHOUT
@c ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
@c FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
@c You should have received a copy of the GNU General Public License
@c along with Octave; see the file COPYING. If not, see
@c <http://www.gnu.org/licenses/>.
@cindex character strings

A @dfn{string constant} consists of a sequence of characters enclosed in
either double-quote or single-quote marks. For example, both
@qcode{"parrot"} and @qcode{'parrot'} represent the string whose contents
are @samp{parrot}. Strings in Octave can be of any length.

Since the single-quote mark is also used for the transpose operator
(@pxref{Arithmetic Ops}) but double-quote marks have no other purpose in Octave,
it is best to use double-quote marks to denote strings.

Strings can be concatenated using the notation for defining matrices. For
example, the expression

@example
[ "foo" , "bar" , "baz" ]
@end example

@noindent
produces the string whose contents are @samp{foobarbaz}. @xref{Numeric Data
Types}, for more information about creating matrices.

@menu
* Escape Sequences in String Constants::
* Character Arrays::
* Creating Strings::
* Comparing Strings::
* Manipulating Strings::
* String Conversions::
* Character Class Functions::
@end menu
@node Escape Sequences in String Constants
@section Escape Sequences in String Constants
@cindex escape sequence notation
In double-quoted strings, the backslash character is used to introduce
@dfn{escape sequences} that represent other characters. For example,
@samp{\n} embeds a newline character in a double-quoted string and
@samp{\"} embeds a double-quote character. In single-quoted strings, backslash
is not a special character, so @samp{\n} remains the two characters backslash
and @samp{n}.
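
The difference between the two quoting styles can be sketched as follows.
This is a minimal illustration only; the variable names @code{dq} and
@code{sq} and the sample text are assumptions chosen for the example.

```octave
# Assumed sample strings; only the quoting style differs.
dq = "a\nb";   # double-quoted: \n becomes a single newline character
sq = 'a\nb';   # single-quoted: the backslash and the n are kept as-is

length (dq)    # 3
length (sq)    # 4
double (dq)    # 97 10 98
double (sq)    # 97 92 110 98
```

Looking at the character codes with @code{double} is a convenient way to
check which characters a string constant actually contains.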

Here is a table of all the escape sequences used in Octave (within
double-quoted strings). They are the same as those used in the C
programming language.

@table @code
@item \\
Represents a literal backslash, @samp{\}.

@item \"
Represents a literal double-quote character, @samp{"}.

@item \'
Represents a literal single-quote character, @samp{'}.

@item \0
Represents the null character, control-@@, ASCII code 0.

@item \a
Represents the ``alert'' character, control-g, ASCII code 7.

@item \b
Represents a backspace, control-h, ASCII code 8.

@item \f
Represents a formfeed, control-l, ASCII code 12.

@item \n
Represents a newline, control-j, ASCII code 10.

@item \r
Represents a carriage return, control-m, ASCII code 13.

@item \t
Represents a horizontal tab, control-i, ASCII code 9.

@item \v
Represents a vertical tab, control-k, ASCII code 11.

@item \@var{nnn}
Represents the octal value @var{nnn}, where @var{nnn} is one to three
digits between 0 and 7. For example, the code for the ASCII ESC
(escape) character is @samp{\033}.

@item \x@var{hh}@dots{}
Represents the hexadecimal value @var{hh}, where @var{hh} consists of
hexadecimal digits (@samp{0} through @samp{9} and either @samp{A} through
@samp{F} or @samp{a} through @samp{f}). Like the same construct in @sc{ansi} C,
the escape sequence continues until the first non-hexadecimal digit is seen.
However, using more than two hexadecimal digits produces undefined results.
@end table
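
As a rough sketch of the numeric escape sequences described above, the
following converts a few double-quoted strings to their character codes.
The variable names are assumptions chosen for the example.

```octave
esc    = "\033";   # octal escape: one character, the ASCII ESC
cap    = "\x41";   # hexadecimal escape: one character, the letter A
tabbed = "a\tb";   # horizontal tab between two letters

double (esc)       # 27
double (cap)       # 65
double (tabbed)    # 97 9 98
```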

In a single-quoted string there is only one escape sequence: you may insert a
single-quote character using two single-quote characters in succession. For
example,

@example
@group
'I can''t escape'
    @result{} I can't escape
@end group
@end example

In scripts the two different string types can be distinguished if necessary
by using @code{is_dq_string} and @code{is_sq_string}.

@DOCSTRING(is_dq_string)

@DOCSTRING(is_sq_string)
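
A short sketch of how these two predicates might be used is shown below.
The sample strings are assumptions; note that the two strings still compare
equal, since only the quoting style differs.

```octave
dq = "parrot";   # double-quoted string
sq = 'parrot';   # single-quoted string

is_dq_string (dq)   # 1
is_sq_string (dq)   # 0
is_sq_string (sq)   # 1
strcmp (dq, sq)     # 1: same contents regardless of quoting
```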
@node Character Arrays
@section Character Arrays

The string representation used by Octave is an array of characters, so
internally the string @nospell{@qcode{"dddddddddd"}} is actually a row vector
of length 10 containing the value 100 in all places (100 is the ASCII code of
@qcode{"d"}). This lends itself to the obvious generalization to character
matrices. Using a matrix of characters, it is possible to represent a
collection of same-length strings in one variable. The convention used in
Octave is that each row in a character matrix is a separate string, but letting
each column represent a string is equally possible.

The easiest way to create a character matrix is to put several strings
together into a matrix.

@example
collection = [ "String #1"; "String #2" ];
@end example

@noindent
This creates a 2-by-9 character matrix.
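
The row-per-string convention can be sketched as follows. The variable
@code{second} is an assumption introduced for the example.

```octave
collection = [ "String #1"; "String #2" ];

size (collection)            # 2 9
second = collection(2,:)     # second = String #2
collection(:,8:9)            # last two columns: "#1" above "#2"
```

Indexing a whole row with @code{(2,:)} returns that row as an ordinary string.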

The function @code{ischar} can be used to test if an object is a character
matrix.

To test if an object is a string (i.e., a character vector and not a character
matrix) you can use the @code{ischar} function in combination with the
@code{isvector} function as in the following example:

@example
@group
ischar (collection) && isvector (collection)
     @result{} 0

ischar ("my string") && isvector ("my string")
     @result{} 1
@end group
@end example

A relevant question is what happens when a character matrix is
created from strings of different lengths. Octave pads strings shorter
than the longest string with trailing blank characters. A padding
character other than the blank can be selected with the
@code{string_fill_char} function.

@DOCSTRING(string_fill_char)

This shows a limitation of character matrices: every row must have the same
length, so strings of different lengths cannot be stored without padding. The
solution is to use a cell array of strings, which is described in
@ref{Cell Arrays of Strings}.
@node Creating Strings
@section Creating Strings

The easiest way to create a string is, as illustrated in the introduction,
to enclose text in double quotes or single quotes. It is, however,
possible to create a string without typing out its contents. The
function @code{blanks} creates a string of a given length consisting
only of blank characters (ASCII code 32).
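
A minimal sketch of @code{blanks} in use follows. The variable names
@code{pad} and @code{label} are assumptions chosen for the example.

```octave
pad = blanks (5);

length (pad)                    # 5
double (pad)                    # 32 32 32 32 32
label = ["[" blanks(3) "]"]     # label = [   ]
```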

@menu
* Concatenating Strings::
* Converting Numerical Data to Strings::
@end menu

@node Concatenating Strings
@subsection Concatenating Strings

Strings can be concatenated using matrix notation
(@pxref{Strings}, @ref{Character Arrays}), which is often the most natural
method. For example:

@example
@group
fullname = [fname ".txt"];
email = ["<" user "@@" domain ">"];
@end group
@end example

@noindent
In each case it is easy to see what the final string will look like. This
method is also the most efficient. When using matrix concatenation the parser
immediately begins joining the strings without incurring
the overhead of a function call and the input validation of the associated
function.

Nevertheless, there are several other functions for concatenating string
objects which can be useful in specific circumstances: @code{char},
@code{strvcat}, @code{strcat}, and @code{cstrcat}. Finally, the general
purpose concatenation functions can be used: see @ref{XREFcat,,cat},
@ref{XREFhorzcat,,horzcat}, and @ref{XREFvertcat,,vertcat}.

@itemize @bullet
@item All string concatenation functions except @code{cstrcat}
convert numerical input into character data by taking the corresponding ASCII
character for each element, as in the following example:

@example
@group
char ([98, 97, 110, 97, 110, 97])
     @result{} banana
@end group
@end example

@item
@code{char} and @code{strvcat}
concatenate vertically, while @code{strcat} and @code{cstrcat} concatenate
horizontally. For example:

@example
@group
char ("an apple", "two pears")
     @result{} an apple
        two pears
@end group

@group
strcat ("oc", "tave", " is", " good", " for you")
     @result{} octave is good for you
@end group
@end example

@item @code{char} generates an empty row in the output
for each empty string in the input. @code{strvcat}, on the other hand,
eliminates empty strings.

@example
@group
char ("orange", "green", "", "red")
     @result{} orange
        green

        red
@end group

@group
strvcat ("orange", "green", "", "red")
     @result{} orange
        green
        red
@end group
@end example

@item All string concatenation functions except @code{cstrcat} also accept cell
array data (@pxref{Cell Arrays}). @code{char} and
@code{strvcat} convert cell arrays into character arrays, while @code{strcat}
concatenates within the cells of the cell arrays:

@example
@group
char (@{"red", "green", "", "blue"@})
     @result{} red
        green

        blue
@end group

@group
strcat (@{"abc"; "ghi"@}, @{"def"; "jkl"@})
     @result{}
        @{
          [1,1] = abcdef
          [2,1] = ghijkl
        @}
@end group
@end example

@item @code{strcat} removes trailing white space in the arguments (except
within cell arrays), while @code{cstrcat} leaves white space untouched. Both
kinds of behavior can be useful as can be seen in the examples:

@example
@group
strcat (["dir1";"directory2"], ["/";"/"], ["file1";"file2"])
     @result{} dir1/file1
        directory2/file2
@end group

@group
cstrcat (["thirteen apples"; "a banana"], [" 5$";" 1$"])
     @result{} thirteen apples 5$
        a banana        1$
@end group
@end example

Note that in the above example for @code{cstrcat}, the white space originates
from the internal representation of the strings in a string array
(@pxref{Character Arrays}).
@end itemize
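
The different treatment of trailing white space can also be seen on plain
row strings. The following is a small sketch; the sample strings and the
result variable names are assumptions chosen for the example.

```octave
a = "red ";    # note the trailing blank
b = "apple";

s1 = strcat (a, b)    # s1 = redapple  (trailing blank of a is removed)
s2 = cstrcat (a, b)   # s2 = red apple (blank is kept)
s3 = [a b]            # s3 = red apple (matrix notation also keeps it)
```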

@node Converting Numerical Data to Strings
@subsection Converting Numerical Data to Strings
Apart from the string concatenation functions (@pxref{Concatenating Strings})
which cast numerical data to the corresponding ASCII characters, there are
several functions that format numerical data as strings. @code{mat2str} and
@code{num2str} convert real or complex matrices, while @code{int2str} converts
integer matrices. @code{int2str} takes the real part of complex values and
rounds fractional values to integers. A more flexible way to format numerical
data as strings is the @code{sprintf} function (@pxref{Formatted Output},
@ref{XREFsprintf,,sprintf}).
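
The following sketch contrasts these formatting functions on a few sample
values. The values and the variable names are assumptions chosen for the
example.

```octave
s_num = num2str (pi)              # s_num = 3.1416
s_int = int2str (2.7)             # s_int = 3 (rounded to an integer)
s_mat = mat2str ([1 2; 3 4])      # s_mat = [1 2;3 4]
s_fmt = sprintf ("%5.2f", pi)     # field width 5, two decimals
msg   = ["x = " num2str(42)]      # msg = x = 42
```

Because the results are ordinary strings, they can be joined with other
text using matrix notation, as in the last line.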

@node Comparing Strings
@section Comparing Strings

Since a string is a character array, comparisons between strings work
element by element as the following example shows:

@example
@group
GNU = "GNU's Not UNIX";

spaces = (GNU == " ")
     @result{} spaces =
       0 0 0 0 0 1 0 0 0 1 0 0 0 0
@end group
@end example

@noindent
To determine if two strings are identical it is necessary to use the
@code{strcmp} function. It compares complete strings and is case
sensitive. @code{strncmp} compares only the first @code{N} characters (with
@code{N} given as a parameter). @code{strcmpi} and @code{strncmpi} are the
corresponding functions for case-insensitive comparison.
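
A brief sketch of these comparison functions, next to the element-wise
@code{==} operator, is given below. The sample strings and result names are
assumptions chosen for the example.

```octave
same  = strcmp ("hello", "Hello")          # 0: case matters
samei = strcmpi ("hello", "Hello")         # 1: case is ignored
pre   = strncmp ("abcdef", "abcxyz", 3)    # 1: first three characters match
elem  = ("abc" == "abd")                   # 1 1 0: one result per character
```

Note that @code{==} requires operands of compatible sizes, which is another
reason to prefer @code{strcmp} when testing whole strings for equality.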

@DOCSTRING(validatestring)

@node Manipulating Strings
@section Manipulating Strings

Octave supports a wide range of functions for manipulating strings.
Since a string is just a matrix, simple manipulations can be accomplished
using standard operators. The following example shows how to replace
all blank characters with underscores.

@example
@group
quote = ...
  "First things first, but not necessarily in that order";
quote( quote == " " ) = "_"
@result{} quote =
    First_things_first,_but_not_necessarily_in_that_order
@end group
@end example

For more complex manipulations, such as searching, replacing, and
general regular expressions, the following functions come with Octave.

@DOCSTRING(ostrsplit)

@DOCSTRING(regexprep)

@DOCSTRING(regexptranslate)
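
As a rough sketch of searching and replacing with functions from this
section, the following applies @code{strrep}, @code{regexprep}, and
@code{ostrsplit} to sample text. The sample strings and variable names are
assumptions chosen for the example.

```octave
s = "First things first";

r1 = strrep (s, "first", "last")         # r1 = First things last
r2 = regexprep (s, "[aeiou]", "*")       # r2 = F*rst th*ngs f*rst
parts = ostrsplit ("a:b;c", ":;");       # split at every ":" or ";"
numel (parts)                            # 3
```

@code{strrep} replaces literal text and is case sensitive, so only the
lowercase occurrence is changed, while @code{regexprep} replaces every match
of a regular expression.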

@node String Conversions
@section String Conversions

Octave supports various kinds of conversions between strings and
numbers. As an example, it is possible to convert a string containing
a hexadecimal number to a floating-point number.
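
A minimal sketch of such conversions, using @code{hex2dec}, @code{dec2hex},
and @code{str2double}, is shown below. The sample inputs and variable names
are assumptions chosen for the example.

```octave
n = hex2dec ("FF")          # n = 255
h = dec2hex (255)           # h = FF
x = str2double ("1e3")      # x = 1000
bad = str2double ("abc")    # bad = NaN: not a valid number
```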

@DOCSTRING(str2double)

@DOCSTRING(do_string_escapes)

@DOCSTRING(undo_string_escapes)

@node Character Class Functions
@section Character Class Functions

Octave also provides the following character class test functions
patterned after the functions in the standard C library. They all
operate on string arrays and return matrices of zeros and ones.
Elements that are nonzero indicate that the condition was true for the
corresponding character in the string array. For example:

@example
@group
isalpha ("!Q@@WERT^Y&")
     @result{} [ 0, 1, 0, 1, 1, 1, 1, 0, 1, 0 ]
@end group
@end example
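
Because the result is a logical mask, it can be used directly as an index.
The following sketch assumes a sample string @code{s}; the other variable
names are likewise chosen only for the example.

```octave
s = "abc123 DEF";

digits_mask = isdigit (s)      # 0 0 0 1 1 1 0 0 0 0
letters = s(isalpha (s))       # letters = abcDEF
upper   = s(isupper (s))       # upper = DEF
has_space = any (isspace (s))  # has_space = 1
```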
@DOCSTRING(isstrprop)