<?xml version="1.0" encoding="ISO-8859-1"?>

<document>

  <properties>
    <title>HttpClient Tutorial</title>
    <author email="adrian@ephox.com">Adrian Sutton</author>
    <revision>$Id: tutorial.xml,v 1.2.2.3 2004/02/23 23:09:05 olegk Exp $</revision>
  </properties>

  <body>
    <section name="Overview">
      <p>This tutorial is designed to provide a basic overview of how to use
        <em>HttpClient</em>.  When you have completed the tutorial you will have written
        a simple application that downloads a page using <em>HttpClient</em>.</p>

      <p>It is assumed that you have an understanding of how to program in
      Java and are familiar with the development environment you are using.</p>
    </section>

    <section name="Getting Ready">
      <p>The first thing you need to do is get a copy of <em>HttpClient</em> and its
      dependencies.  Currently the only required dependency is
      <a href="/commons/logging/">commons-logging</a>.  This tutorial was
      written for <em>HttpClient</em> 2.0 and at a minimum requires 2.0 Alpha
      2.  You will also need JDK 1.2.2 or above.</p>

      <p>Once you've downloaded <em>HttpClient</em> and commons-logging, you'll need to
      put them on your classpath.  There is also an optional dependency on JSSE
      which is required for HTTPS connections;  this is not required for this
      tutorial.</p>
    </section>

    <section name="Concepts">
      <p>The general process for using <em>HttpClient</em> consists of a number of
      steps:</p>

      <ol>
        <li>Create an instance of <code>HttpClient</code>.</li>
        <li>Create an instance of one of the methods (<code>GetMethod</code> in this
        case).  The URL to connect to is passed in to the method
        constructor.</li>
        <li>Tell <code>HttpClient</code> to execute the method.</li>
        <li>Read the response.</li>
        <li>Release the connection.</li>
        <li>Deal with the response.</li>
      </ol>

      <p>We'll cover how to perform each of these steps below.  Notice that we
      go through the entire process regardless of whether the server returned
      an error or not.  This is important because HTTP 1.1 allows multiple
      requests to use the same connection by simply sending the requests one
      after the other.  Obviously, if we don't read the entire response to
      the first request, the leftover data will get in the way of the second
      response.  <em>HttpClient</em> tries to handle this, but to avoid problems it is
      important to always read the entire response and release the connection.</p>

      <div style="font-style: italic; border: 1px solid #888; margin-left: 7px; margin-right: 7px; margin-top: 1em; margin-bottom: 1px;">
        <p>
          It is important to always read the entire
          response and release the connection regardless of whether the server
          returned an error or not.
        </p>
      </div>
    </section>

    <section name="Instantiating HttpClient">
      <p>The no-argument constructor for <code>HttpClient</code> provides a good set of
      defaults for most situations, so that is what we'll use.</p>

      <source>HttpClient client = new HttpClient();</source>
    </section>

    <section name="Creating a Method">
      <p>The methods defined by the HTTP specification correspond to
      classes in <em>HttpClient</em> which implement the <code>HttpMethod</code>
      interface.  These classes are all found in the package
      <code>org.apache.commons.httpclient.methods</code>.</p>

      <p>We will be using the GET method, which simply takes a URL and
      retrieves the document the URL points to.</p>

      <source>HttpMethod method = new GetMethod("http://www.apache.org/");</source>
    </section>

    <section name="Execute the Method">
      <p>The actual execution of the method is performed by calling
      <code>executeMethod</code> on the client and passing in the method to
      execute.  Since network connections are unreliable, we also need to deal
      with any errors that occur.</p>

      <p>There are two kinds of exceptions that could be thrown by
      <code>executeMethod</code>: <code>HttpRecoverableException</code> and
      <code>IOException</code>.</p>
    
        <subsection name="HttpRecoverableException">
          <p>An <code>HttpRecoverableException</code> is thrown when an error occurs that is
          likely to be a one-off problem.  Usually the request will succeed on
          a second attempt, so retrying the request is generally
          recommended.  Note that <code>HttpRecoverableException</code> extends
          <code>IOException</code>, so if your application does not retry the request
          you can simply catch <code>IOException</code> and handle both cases together.</p>
        </subsection>
          
        <subsection name="IOException">
          <p>An IOException is thrown when the request cannot be sent at all
          and retrying the connection is also likely to fail.  This may be
          caused by a number of situations including the server being down,
          inability to resolve the domain name or the server refusing the
          connection.</p>
        </subsection>

        <p>The other useful piece of information is the status code
        returned by the server.  <code>executeMethod</code> returns this code as an
        int.  It can be used to determine whether the request was successful,
        and it can sometimes indicate that further action is required by the
        client, such as providing authentication credentials.</p>

        <source><![CDATA[
          int statusCode = -1;
          // We will retry up to 3 times.
          for (int attempt = 0; statusCode == -1 && attempt < 3; attempt++) {
              try {
                  // execute the method.
                  statusCode = client.executeMethod(method);
              } catch (HttpRecoverableException e) {
                  // A string literal cannot span lines, so concatenate instead.
                  System.err.println("A recoverable exception occurred, retrying.  "
                      + e.getMessage());
              } catch (IOException e) {
                  System.err.println("Failed to download file.");
                  e.printStackTrace();
                  System.exit(-1);
              }
          }
          // Check that we didn't run out of retries.
          if (statusCode == -1) {
              System.err.println("Failed to recover from exception.");
              System.exit(-2);
          }]]>
        </source>
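
        <p>As a rough illustration of how the status code might be interpreted,
        the sketch below classifies a code by its range and singles out 401 as
        the case where credentials are needed.  The class
        <code>StatusCheck</code> and its methods are hypothetical names invented
        for this example and are not part of <em>HttpClient</em>; the numeric
        ranges follow the HTTP specification.</p>

        <source><![CDATA[
```java
// Hypothetical helper, not part of HttpClient: interprets the int
// returned by executeMethod using the status ranges defined by HTTP.
class StatusCheck {

    // 2xx codes mean the request succeeded.
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode <= 299;
    }

    // 401 means the server wants authentication credentials.
    static boolean needsCredentials(int statusCode) {
        return statusCode == 401;
    }

    // Returns a short human-readable label for the status code.
    static String describe(int statusCode) {
        if (isSuccess(statusCode)) {
            return "success";
        } else if (needsCredentials(statusCode)) {
            return "authentication required";
        } else if (statusCode >= 100 && statusCode <= 199) {
            return "informational";
        } else if (statusCode >= 300 && statusCode <= 399) {
            return "redirection";
        } else if (statusCode >= 400 && statusCode <= 499) {
            return "client error";
        } else if (statusCode >= 500 && statusCode <= 599) {
            return "server error";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Sample codes chosen for illustration only.
        int[] samples = { 200, 301, 401, 404, 500 };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(samples[i] + ": " + describe(samples[i]));
        }
    }
}
```
]]></source>

        <p>Note that the body still has to be read and the connection released
        whichever category the status code falls into.</p>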
    </section> 

    <section name="Read the Response">
      <p>It is vital that the response body is always read regardless of the
      status returned by the server.  There are three ways to do this:</p>

      <ul>
        <li>Call <code>method.getResponseBody()</code>.  This will return a
        byte array containing the data in the response body.</li>
        <li>Call <code>method.getResponseBodyAsString()</code>.  This will
        return a String containing the response body.  Be warned though that
        the conversion from bytes to a String is done using the default
        encoding so this method may not be portable across all platforms.</li>
        <li>Call <code>method.getResponseBodyAsStream()</code> and read the
        entire contents of the stream then call <code>stream.close()</code>.
        This method is best if it is possible for a lot of data to be received
        as it can be buffered to a file or processed as it is read.  Be sure to
        always read the entirety of the data and call close on the stream.</li>
      </ul>

      <p>For this tutorial we will use <code>getResponseBody()</code> for simplicity.</p>

      <source>byte[] responseBody = method.getResponseBody();</source>
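
      <p>If the stream option is chosen instead, the whole body still has to be
      consumed and the stream closed.  The sketch below shows one way to do
      that using only <code>java.io</code>.  The class
      <code>StreamDrain</code> and the method <code>readFully</code> are
      hypothetical names for this example, and a
      <code>ByteArrayInputStream</code> stands in for the stream that
      <code>method.getResponseBodyAsStream()</code> would return.</p>

      <source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of HttpClient.
class StreamDrain {

    // Reads the stream until end of data, then closes it even if
    // reading fails part way through.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int count;
            while ((count = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, count);
            }
        } finally {
            in.close();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for method.getResponseBodyAsStream().
        InputStream in = new ByteArrayInputStream("hello".getBytes("US-ASCII"));
        byte[] body = readFully(in);
        System.out.println(body.length + " bytes read");
    }
}
```
]]></source>

      <p>For a large response, the loop could write each chunk to a file
      rather than to an in-memory buffer.</p>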
    </section>

    <section name="Release the Connection">
      <p>This is a crucial step to keep things flowing.  We must tell
      <em>HttpClient</em> that we are done with the connection and that it can now be
      reused.  Otherwise <em>HttpClient</em> will wait indefinitely for a
      connection to free up so that it can be reused.</p>

      <source>method.releaseConnection();</source>
    </section>

    <section name="Deal with the Response">
      <p>We've now completed our interaction with <em>HttpClient</em> and can just
      concentrate on doing what we need to do with the data.  In our case,
      we'll just print it out to the console.</p>

      <p>It's worth noting that if you were retrieving the response as a stream
      and processing it as it is read, this step would actually be combined
      with reading the response; once you had finished processing all the
      data, you would close the input stream and release the connection.</p>

      <p>Note: We should pay attention to character encodings here instead of
      just using the system default.</p>

      <source>System.err.println(new String(responseBody));</source>
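
      <p>One way to act on the note above about character encodings is to
      decode the bytes with an explicitly named charset rather than the
      platform default.  In the sketch below, the class
      <code>BodyDecoder</code> and the method <code>decode</code> are
      hypothetical names for this example.  The charset name is assumed to
      come from somewhere such as the response's Content-Type header, and the
      fallback to ISO-8859-1 (the default HTTP/1.1 assumes for text when no
      charset is given) is a choice made for this example.</p>

      <source><![CDATA[
```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of HttpClient.
class BodyDecoder {

    // Fallback chosen for this example.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Decodes the body with the named charset, falling back to
    // DEFAULT_CHARSET when no name is given or the name is unsupported.
    static String decode(byte[] body, String charsetName) {
        if (charsetName != null) {
            try {
                return new String(body, charsetName);
            } catch (UnsupportedEncodingException e) {
                // Fall through to the default below.
            }
        }
        try {
            return new String(body, DEFAULT_CHARSET);
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support ISO-8859-1.
            throw new RuntimeException(e.getMessage());
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample body for illustration: "cafe" with an accented e, in UTF-8.
        byte[] body = "caf\u00e9".getBytes("UTF-8");
        System.out.println(decode(body, "UTF-8"));
        System.out.println(decode(body, null));
    }
}
```
]]></source>

      <p>Decoding the same bytes with the wrong charset does not fail; it
      silently produces different characters, which is why the charset should
      be chosen deliberately.</p>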
    </section>

    <section name="Final Source Code">
      <p>When we put all of that together plus a little bit of glue code we get
      the program below.</p>

      <source><![CDATA[
        import org.apache.commons.httpclient.*;
        import org.apache.commons.httpclient.methods.*;
        import java.io.*;

        public class HttpClientTutorial {
          
          private static String url = "http://www.apache.org/";

          public static void main(String[] args) {
            // Create an instance of HttpClient.
            HttpClient client = new HttpClient();

            // Create a method instance.
            HttpMethod method = new GetMethod(url);

            // Execute the method.
            int statusCode = -1;
            // We will retry up to 3 times.
            for (int attempt = 0; statusCode == -1 && attempt < 3; attempt++) {
              try {
                // execute the method.
                statusCode = client.executeMethod(method);
              } catch (HttpRecoverableException e) {
                System.err.println(
                  "A recoverable exception occurred, retrying.  " + 
                  e.getMessage());
              } catch (IOException e) {
                System.err.println("Failed to download file.");
                e.printStackTrace();
                System.exit(-1);
              }
            }
            // Check that we didn't run out of retries.
            if (statusCode == -1) {
              System.err.println("Failed to recover from exception.");
              System.exit(-2);
            }

            // Read the response body.
            byte[] responseBody = method.getResponseBody();

            // Release the connection.
            method.releaseConnection();

            // Deal with the response.
            // Use caution: make sure the correct character encoding is used
            // and that the body is not binary data.
            System.err.println(new String(responseBody));
          }
        }
      ]]></source>
    </section>
  </body>
</document>