<title>Manipulating Strings - Untitled</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="Untitled">
<meta name="generator" content="makeinfo 4.11">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Strings.html#Strings" title="Strings">
<link rel="prev" href="Comparing-Strings.html#Comparing-Strings" title="Comparing Strings">
<link rel="next" href="String-Conversions.html#String-Conversions" title="String Conversions">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp { font-size:smaller }
  span.sc { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; }
  span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
<a name="Manipulating-Strings"></a>
<h3 class="section">5.3 Manipulating Strings</h3>
<p>Octave supports a wide range of functions for manipulating strings.
Since a string is just a matrix, simple manipulations can be accomplished
using standard operators. The following example shows how to replace
all blank characters with underscores.
<pre class="example">     quote = ...
       "First things first, but not necessarily in that order";
     quote( quote == " " ) = "_"
     &rArr; quote =
         First_things_first,_but_not_necessarily_in_that_order
</pre>
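<p>Because a string is an ordinary character matrix, the usual comparison and
indexing operators apply to it directly. The following is a minimal sketch of
that idea; the variable names are illustrative only and do not come from the
manual.

<pre class="example">     s = "hello world";
     mask = (s == "o");        # logical vector, true where s holds "o"
     s(mask) = "0";            # s is now "hell0 w0rld"
     r = s(end:-1:1);          # reverse the string by indexing
     n = sum (s == " ");       # count the blanks (here, 1)
     codes = double (s(1:2));  # character codes of "he": [104 101]
</pre>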
<p>For more complex manipulations, such as searching, replacing, and
matching general regular expressions, the following functions come with Octave.
<p><a name="doc_002ddeblank"></a>
&mdash; Function File:  <b>deblank</b> (<var>s</var>)<var><a name="index-deblank-271"></a></var><br>
<blockquote><p>Remove trailing blanks and nulls from <var>s</var>. If <var>s</var>
is a matrix, <code>deblank</code> trims each row to the length of the longest
string. If <var>s</var> is a cell array, operate recursively on each
element of the cell array.
</p></blockquote></div>
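<p>A short sketch of the three input kinds described above (a string, a
character matrix, and a cell array) may help. The variable names are
illustrative only.

<pre class="example">     a = deblank ("  padded   ");        # "  padded": leading blanks are kept
     b = deblank (["ab  "; "cde "]);     # ["ab "; "cde"]: only the shared
                                         # trailing blank column is removed
     c = deblank ({"ab  ", "cde "});     # {"ab", "cde"}: each element trimmed
</pre>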
<p><a name="doc_002dfindstr"></a>
&mdash; Function File:  <b>findstr</b> (<var>s, t, overlap</var>)<var><a name="index-findstr-272"></a></var><br>
<blockquote><p>Return the vector of all positions in the longer of the two strings
<var>s</var> and <var>t</var> where an occurrence of the shorter of the two starts.
If the optional argument <var>overlap</var> is nonzero, the returned vector
can include overlapping positions (this is the default). For example,

<pre class="example">     findstr ("ababab", "a")
     &rArr; [1, 3, 5]
     findstr ("abababa", "aba", 0)
     &rArr; [1, 5]
</pre>
</blockquote>
<p><a name="doc_002dindex"></a>
&mdash; Function File:  <b>index</b> (<var>s, t</var>)<var><a name="index-index-273"></a></var><br>
&mdash; Function File:  <b>index</b> (<var>s, t, direction</var>)<var><a name="index-index-274"></a></var><br>
<blockquote><p>Return the position of the first occurrence of the string <var>t</var> in the
string <var>s</var>, or 0 if no occurrence is found. For example,

<pre class="example">     index ("Teststring", "t")
     &rArr; 4
</pre>
<p>If <var>direction</var> is ‘<samp><span class="samp">"first"</span></samp>’, return the position of the first occurrence.
If <var>direction</var> is ‘<samp><span class="samp">"last"</span></samp>’, return the position of the last occurrence.
The <code>rindex</code> function is equivalent to <code>index</code> with
<var>direction</var> set to ‘<samp><span class="samp">"last"</span></samp>’.

<p><strong>Caution:</strong> This function does not work for arrays of
character strings.

<strong>See also:</strong> find, rindex.
</p></blockquote></div>
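<p>The effect of the <var>direction</var> argument can be sketched as follows.
The variable names are illustrative only.

<pre class="example">     s = "Teststring";
     first = index (s, "t");            # 4: the capital "T" does not match
     last  = index (s, "t", "last");    # 6: same result as rindex (s, "t")
     none  = index (s, "x");            # 0: no occurrence
</pre>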
<p><a name="doc_002drindex"></a>
&mdash; Function File:  <b>rindex</b> (<var>s, t</var>)<var><a name="index-rindex-275"></a></var><br>
<blockquote><p>Return the position of the last occurrence of the character string
<var>t</var> in the character string <var>s</var>, or 0 if no occurrence is
found. For example,

<pre class="example">     rindex ("Teststring", "t")
     &rArr; 6
</pre>
<p><strong>Caution:</strong> This function does not work for arrays of
character strings.

<strong>See also:</strong> find, index.
</p></blockquote></div>
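<p>One common use of <code>rindex</code> is to split a name at its last
separator. The sketch below assumes a hypothetical file name and guards
against the 0 that is returned when no separator is found.

<pre class="example">     fname = "archive.tar.gz";          # hypothetical file name
     dot = rindex (fname, ".");         # position of the last ".", here 12
     if (dot != 0)
       ext = fname(dot+1:end);          # "gz"
     else
       ext = "";                        # no "." in the name
     endif
</pre>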
<p><a name="doc_002dstrfind"></a>
&mdash; Function File: <var>idx</var> = <b>strfind</b> (<var>str, pattern</var>)<var><a name="index-strfind-276"></a></var><br>
&mdash; Function File: <var>idx</var> = <b>strfind</b> (<var>cellstr, pattern</var>)<var><a name="index-strfind-277"></a></var><br>
<blockquote><p>Search for <var>pattern</var> in the string <var>str</var> and return the
starting index of every such occurrence in the vector <var>idx</var>.
If there is no such occurrence, or if <var>pattern</var> is longer
than <var>str</var>, then <var>idx</var> is the empty array <code>[]</code>.

<p>If the cell array of strings <var>cellstr</var> is specified instead of the
string <var>str</var>, then <var>idx</var> is a cell array of vectors, one for
each string, as described above.

<strong>See also:</strong> findstr, strmatch, strcmp, strncmp, strcmpi, strncmpi.
</p></blockquote></div>
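<p>The two calling forms can be sketched as follows. Note that overlapping
occurrences are reported. The variable names are illustrative only.

<pre class="example">     idx = strfind ("abababa", "aba");              # [1 3 5]
     c = strfind ({"abababa", "bebebe"}, "aba");    # one vector per string
     hits = c{1};                                   # [1 3 5]
     miss = isempty (c{2});                         # true: no match in "bebebe"
</pre>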
<p><a name="doc_002dstrmatch"></a>
&mdash; Function File:  <b>strmatch</b> (<var>s, a, "exact"</var>)<var><a name="index-strmatch-278"></a></var><br>
<blockquote><p>Return the indices of the entries of <var>a</var> that match the string <var>s</var>.
The second argument <var>a</var> may be a string matrix or a cell array of
strings. If the third argument <code>"exact"</code> is not given, then
<var>s</var> only needs to match <var>a</var> up to the length of <var>s</var>. Null
characters match blanks. Results are returned as a column vector.
</p></blockquote></div>
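<p>The difference between prefix matching and <code>"exact"</code> matching
can be sketched as follows. The cell array <code>names</code> is a
hypothetical example.

<pre class="example">     names = {"apple", "apple juice", "app", "banana"};
     p = strmatch ("app", names);            # [1; 2; 3]: every entry starting with "app"
     x = strmatch ("app", names, "exact");   # 3: only the entry equal to "app"
</pre>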
<p><a name="doc_002dstrtok"></a>
&mdash; Function File: [<var>tok</var>, <var>rem</var>] = <b>strtok</b> (<var>str, delim</var>)<var><a name="index-strtok-279"></a></var><br>
<blockquote><p>Find all characters in <var>str</var> up to, but not including, the first
character that is in the string <var>delim</var>. If <var>rem</var> is requested, it contains the
remainder of the string, starting at the first delimiter. Leading
delimiters are ignored. If <var>delim</var> is not specified, space is assumed.
</p></blockquote>
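<p>Calling <code>strtok</code> repeatedly on the remainder splits a string into
all of its tokens. The loop below is a minimal sketch of that pattern. The
remainder is stored in a variable called <code>rest</code>, a name chosen here
so that the built-in <code>rem</code> function is not shadowed.

<pre class="example">     rest = "one two  three";
     words = {};
     while (! isempty (rest))
       [tok, rest] = strtok (rest);     # default delimiter
       if (! isempty (tok))
         words{end+1} = tok;
       endif
     endwhile
     # words is now {"one", "two", "three"}

     [key, value] = strtok ("name=Octave", "=");
     # key is "name"; value is "=Octave" (it still starts at the delimiter)
</pre>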
<p><a name="doc_002dsplit"></a>
&mdash; Function File:  <b>split</b> (<var>s, t, n</var>)<var><a name="index-split-280"></a></var><br>
<blockquote><p>Divide the string <var>s</var> into pieces separated by <var>t</var>, returning
the result in a string array (padded with blanks to form a valid
matrix). If the optional input <var>n</var> is supplied, split <var>s</var>
into at most <var>n</var> different pieces.

<p>For example,

<pre class="example">     split ("Test string", "t")
     &rArr; "Tes "
             " s  "
             "ring"
</pre>
<pre class="example">     split ("Test string", "t", 2)
     &rArr; "Tes    "
             " string"
</pre>
</blockquote>
<p><a name="doc_002dstrrep"></a>
&mdash; Function File:  <b>strrep</b> (<var>s, x, y</var>)<var><a name="index-strrep-281"></a></var><br>
<blockquote><p>Replace all occurrences of the substring <var>x</var> of the string <var>s</var>
with the string <var>y</var>. For example,

<pre class="example">     strrep ("This is a test string", "is", "&amp;%$")
     &rArr; "Th&amp;%$ &amp;%$ a test string"
</pre>
</blockquote>
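<p>Two further uses of <code>strrep</code>, changing a separator and deleting a
substring by replacing it with the empty string, can be sketched as follows.
The input strings are illustrative only.

<pre class="example">     d = strrep ("2024-01-15", "-", "/");    # "2024/01/15"
     t = strrep ("a-b-c", "-", "");          # "abc": empty replacement deletes
     u = strrep ("abc", "x", "y");           # "abc": unchanged when x is absent
</pre>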
<p><a name="doc_002dsubstr"></a>
&mdash; Function File:  <b>substr</b> (<var>s, offset, len</var>)<var><a name="index-substr-282"></a></var><br>
<blockquote><p>Return the substring of <var>s</var> which starts at character number
<var>offset</var> and is <var>len</var> characters long.

<p>If <var>offset</var> is negative, extraction starts that far from the end of
the string. If <var>len</var> is omitted, the substring extends to the end
of the string.

<p>For example,

<pre class="example">     substr ("This is a test string", 6, 9)
     &rArr; "is a test"
</pre>
<p>This function is patterned after AWK. You can get the same result by
<var>s</var><code> (</code><var>offset</var><code> : (</code><var>offset</var><code> + </code><var>len</var><code> - 1))</code>.
</p></blockquote></div>
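<p>The equivalence with plain indexing mentioned above can be checked with a
short sketch. The variable names are illustrative only.

<pre class="example">     s = "This is a test string";
     a = substr (s, 6, 9);        # "is a test"
     b = s(6:(6 + 9 - 1));        # the same characters by range indexing
     same = strcmp (a, b);        # true
     tail = substr (s, 16);       # len omitted: "string"
</pre>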
<p><a name="doc_002dregexp"></a>
&mdash; Loadable Function: [<var>s</var>, <var>e</var>, <var>te</var>, <var>m</var>, <var>t</var>, <var>nm</var>] = <b>regexp</b> (<var>str, pat</var>)<var><a name="index-regexp-283"></a></var><br>
&mdash; Loadable Function: [<small class="dots">...</small>] = <b>regexp</b> (<var>str, pat, opts, <small class="dots">...</small></var>)<var><a name="index-regexp-284"></a></var><br>
<blockquote>
<p>Regular expression string matching. Matches <var>pat</var> in <var>str</var> and
returns the position and matching substrings or empty values if there are
none.

<p>The matched pattern <var>pat</var> can include any of the standard regex
operators, including:
<dl>
<dt><code>.</code><dd>Match any character
<br><dt><code>* + ? {}</code><dd>Repetition operators, representing
<dl>
<dt><code>*</code><dd>Match zero or more times
<br><dt><code>+</code><dd>Match one or more times
<br><dt><code>?</code><dd>Match zero or one time
<br><dt><code>{}</code><dd>Match range operator, which is of the form <code>{</code><var>n</var><code>}</code> to match exactly
<var>n</var> times, <code>{</code><var>m</var><code>,}</code> to match <var>m</var> or more times,
<code>{</code><var>m</var><code>,</code><var>n</var><code>}</code> to match between <var>m</var> and <var>n</var> times.
</dl>
<br><dt><code>[...] [^...]</code><dd>List operators, where for example <code>[ab]c</code> matches <code>ac</code> and <code>bc</code>
<br><dt><code>()</code><dd>Grouping operator
<br><dt><code>|</code><dd>Alternation operator. Match one of a choice of regular expressions. The
alternatives must be delimited by the grouping operator <code>()</code> above
<br><dt><code>^ $</code><dd>Anchoring operator. <code>^</code> matches the start of the string <var>str</var> and
<code>$</code> the end
</dl>
<p>In addition, the following escaped characters have special meaning. It is
recommended to quote <var>pat</var> in single quotes rather than double quotes,
so that Octave does not interpret the escape sequences
before they are passed to <code>regexp</code>.
<dl>
<dt><code>\b</code><dd>Match a word boundary
<br><dt><code>\B</code><dd>Match within a word
<br><dt><code>\w</code><dd>Matches any word character
<br><dt><code>\W</code><dd>Matches any non word character
<br><dt><code>\&lt;</code><dd>Matches the beginning of a word
<br><dt><code>\&gt;</code><dd>Matches the end of a word
<br><dt><code>\s</code><dd>Matches any whitespace character
<br><dt><code>\S</code><dd>Matches any non whitespace character
<br><dt><code>\d</code><dd>Matches any digit
<br><dt><code>\D</code><dd>Matches any non-digit
</dl>
<p>By default, the outputs of <code>regexp</code> are returned in the order given below.

<dl>
<dt><var>s</var><dd>The start indices of each of the matching substrings
<br><dt><var>e</var><dd>The end indices of each matching substring
<br><dt><var>te</var><dd>The extents of each of the matched tokens surrounded by <code>(...)</code> in
<var>pat</var>.
<br><dt><var>m</var><dd>A cell array of the text of each match.
<br><dt><var>t</var><dd>A cell array of the text of each token matched.
<br><dt><var>nm</var><dd>A structure containing the text of each matched named token, with the name
being used as the fieldname. A named token is denoted as
<code>(?&lt;name&gt;...)</code>
</dl>
<p>Particular output arguments, or the order of the output arguments, can be
selected by additional <var>opts</var> arguments. These are strings, and the
correspondence between the output arguments and the optional arguments
is as follows:
<p><table summary=""><tr align="left"><td valign="top" width="20%"></td><td valign="top" width="30%">'start' </td><td valign="top" width="30%"><var>s</var> </td><td valign="top" width="20%">
<br></td></tr><tr align="left"><td valign="top" width="20%"></td><td valign="top" width="30%">'end' </td><td valign="top" width="30%"><var>e</var> </td><td valign="top" width="20%">
<br></td></tr><tr align="left"><td valign="top" width="20%"></td><td valign="top" width="30%">'tokenExtents' </td><td valign="top" width="30%"><var>te</var> </td><td valign="top" width="20%">
<br></td></tr><tr align="left"><td valign="top" width="20%"></td><td valign="top" width="30%">'match' </td><td valign="top" width="30%"><var>m</var> </td><td valign="top" width="20%">
<br></td></tr><tr align="left"><td valign="top" width="20%"></td><td valign="top" width="30%">'tokens' </td><td valign="top" width="30%"><var>t</var> </td><td valign="top" width="20%">
<br></td></tr><tr align="left"><td valign="top" width="20%"></td><td valign="top" width="30%">'names' </td><td valign="top" width="30%"><var>nm</var> </td><td valign="top" width="20%">
<br></td></tr></table>
<p>A further optional argument is 'once', which limits the returned
matches to the first match. Additional arguments are
<dl>
<dt>matchcase<dd>Make the matching case sensitive.
<br><dt>ignorecase<dd>Make the matching case insensitive.
<br><dt>stringanchors<dd>Match the anchor characters at the beginning and end of the string.
<br><dt>lineanchors<dd>Match the anchor characters at the beginning and end of the line.
<br><dt>dotall<dd>The character <code>.</code> matches the newline character.
<br><dt>dotexceptnewline<dd>The character <code>.</code> matches all but the newline character.
<br><dt>freespacing<dd>The pattern can include arbitrary whitespace and comments starting with
<code>#</code>.
<br><dt>literalspacing<dd>The pattern is taken literally.
</dl>
</p></blockquote></div>
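<p>The default outputs and the option strings described above can be sketched
as follows. The input strings and variable names are illustrative only, and
the patterns are written in single quotes so that the backslash escapes reach
<code>regexp</code> unchanged.

<pre class="example">     str = "foo123bar45";
     [s, e] = regexp (str, '\d+');                     # s = [4 10], e = [6 11]
     m = regexp (str, '\d+', 'match');                 # {"123", "45"}
     first = regexp (str, '\d+', 'match', 'once');     # "123": first match only
     t = regexp ("key=value", '(\w+)=(\w+)', 'tokens');
     val = t{1}{2};                                    # "value": second token of the first match
</pre>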
<p><a name="doc_002dregexpi"></a>
&mdash; Loadable Function: [<var>s</var>, <var>e</var>, <var>te</var>, <var>m</var>, <var>t</var>, <var>nm</var>] = <b>regexpi</b> (<var>str, pat</var>)<var><a name="index-regexpi-285"></a></var><br>
&mdash; Loadable Function: [<small class="dots">...</small>] = <b>regexpi</b> (<var>str, pat, opts, <small class="dots">...</small></var>)<var><a name="index-regexpi-286"></a></var><br>
<blockquote>
<p>Case insensitive regular expression string matching. Matches <var>pat</var> in
<var>str</var> and returns the position and matching substrings or empty values
if there are none. See <code>regexp</code> for more details.
</p></blockquote></div>
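<p>The difference from <code>regexp</code> can be sketched with a single
pattern applied both ways. The input string is illustrative only.

<pre class="example">     str = "Hello HELLO hello";
     m1 = regexp (str, 'hello', 'match');                  # {"hello"}
     m2 = regexpi (str, 'hello', 'match');                 # {"Hello", "HELLO", "hello"}
     m3 = regexp (str, 'hello', 'match', 'ignorecase');    # same matches as m2
</pre>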
<p><a name="doc_002dregexprep"></a>
&mdash; Loadable Function: <var>string</var> = <b>regexprep</b> (<var>string, pat, repstr, options</var>)<var><a name="index-regexprep-287"></a></var><br>
<blockquote><p>Replace matches of <var>pat</var> in <var>string</var> with <var>repstr</var>.

<p>The replacement can contain <code>$i</code>, which substitutes
for the ith set of parentheses in the match string. E.g.,
<pre class="example">
     regexprep("Bill Dunn",'(\w+) (\w+)','$2, $1')
</pre>
<p>returns "Dunn, Bill"

<p><var>options</var> may be zero or more of
<dl>
<dt>‘<samp><span class="samp">once</span></samp>’<dd>Replace only the first occurrence of <var>pat</var> in the result.
<br><dt>‘<samp><span class="samp">warnings</span></samp>’<dd>This option is present for compatibility but is ignored.
<br><dt>‘<samp><span class="samp">ignorecase or matchcase</span></samp>’<dd>Ignore or respect case in the pattern matching (see <code>regexpi</code>).
Alternatively, use (?i) or (?-i) in the pattern.
<br><dt>‘<samp><span class="samp">lineanchors and stringanchors</span></samp>’<dd>Whether characters ^ and $ match the beginning and ending of lines.
Alternatively, use (?m) or (?-m) in the pattern.
<br><dt>‘<samp><span class="samp">dotexceptnewline and dotall</span></samp>’<dd>Whether . matches newlines in the string.
Alternatively, use (?s) or (?-s) in the pattern.
<br><dt>‘<samp><span class="samp">freespacing or literalspacing</span></samp>’<dd>Whether whitespace and # comments can be used to make the regular expression more readable.
Alternatively, use (?x) or (?-x) in the pattern.
</dl>

<strong>See also:</strong> regexp, regexpi.
</p></blockquote></div>
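<p>The effect of the ‘<samp><span class="samp">once</span></samp>’ and ‘<samp><span class="samp">ignorecase</span></samp>’ options can be sketched as
follows. The input strings are illustrative only.

<pre class="example">     a = regexprep ("a1b22c333", '\d+', '#');                   # "a#b#c#"
     b = regexprep ("a1b22c333", '\d+', '#', 'once');           # "a#b22c333"
     c = regexprep ("Hello hello", 'hello', 'bye', 'ignorecase');   # "bye bye"
     d = regexprep ("Bill Dunn", '(\w+) (\w+)', '$2, $1');      # "Dunn, Bill"
</pre>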