Class HttpRequest

java.lang.Object
sunlabs.brazil.util.http.HttpRequest

public class HttpRequest extends Object
Sends an HTTP request to some target host and gets the answer back. Similar to the URLConnection class.

Caches connections to hosts and reuses them when possible. Talks HTTP/1.1 to the hosts in order to keep connections alive as much as possible.

The sequence of events for using an HttpRequest is similar to how URLConnection is used:

  1. A new HttpRequest object is constructed.
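  2. The setup parameters are modified (setMethod, setProxy, setRequestHeader, getOutputStream).
  3. The host (or proxy) is contacted and the HTTP request is issued (connect).
  4. The response headers and body are examined (getResponseCode, getResponseHeader, getInputStream).
  5. The connection is closed (close).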

In the common case, all the setup parameters are initialized to sensible values and won't need to be modified. Most users will only need to construct a new HttpRequest object and then call getInputStream to read the contents. The rest of the member variables and methods are only needed for advanced behavior.

The HttpRequest class is intended to be a replacement for the URLConnection class. It operates at a lower level and makes fewer decisions on behavior. Some differences between the HttpRequest class and the URLConnection class follow:

  • There are no undocumented global variables (specified in System.getProperties) that modify the behavior of HttpRequest.
  • HttpRequest does not automatically follow redirects.
  • HttpRequest does not turn HTTP responses with a status code other than "200 OK" into IOExceptions. Sometimes it may be necessary and even quite useful to examine the results of an "unsuccessful" HTTP request.
  • HttpRequest issues HTTP/1.1 requests and handles HTTP/0.9, HTTP/1.0, and HTTP/1.1 responses.
  • The URLConnection class leaks open sockets if there is an error reading the response or if the target does not use Keep-Alive. In these cases it depends upon the garbage collector to close and release the open socket, which is unreliable: the program may intermittently run out of sockets if the garbage collector doesn't run often enough.
  • If the user doesn't read all the data from a URLConnection, there are bugs in its implementation (as of JDK1.2) that may cause the program to block forever and/or read an insufficient amount of data before trying to reuse the underlying socket.

A number of the fields in the HttpRequest object are public, by design. Most of the methods mentioned above are convenience methods; the underlying data fields are meant to be accessed for more complicated operations, such as changing the socket factory or accessing the raw HTTP response line. Note however, that the order of the methods described above is important. For instance, the user cannot examine the response headers (by calling getResponseHeader or by examining the variable responseHeaders) without first having connected to the host.

To modify the default behavior, the user can change a number of variables and HTTP headers that the HttpRequest consults (and otherwise sets automatically) when sending the request. These settings can be changed up until the time connect is called, as follows:

variable version
By default, the HttpRequest issues HTTP/1.1 requests. The user can set version to change this to HTTP/1.0.
variable method
If method is null (the default), the HttpRequest decides what the HTTP request method should be as follows: If the user has called getOutputStream, then the method will be "POST", otherwise the method will be "GET".
variable proxyHost
If the proxy host is specified, the HTTP request will be sent via the specified proxy:
  • connect opens a connection to the proxy.
  • uses the "Proxy-Connection" header to keep alive the connection.
  • sends a fully qualified URL in the request line, for example "http://www.foo.com/index.html". The fully qualified URL tells the proxy to forward the request to the specified host.
Otherwise, the HTTP request will go directly to the host:
  • connect opens a connection to the remote host.
  • uses the "Connection" header to keep alive the connection.
  • sends a host-relative URL in the request line, for example "/index.html". The relative URL is derived from the fully qualified URL used to construct this HttpRequest.
header "Connection" or "Proxy-Connection"
The HttpRequest sets the appropriate connection header to "Keep-Alive" to keep alive the connection to the host or proxy (respectively). By setting the appropriate connection header, the user can control whether the HttpRequest tries to use Keep-Alives.
header "Host"
The HTTP/1.1 protocol requires that the "Host" header be set to the name of the machine being contacted. By default, this is derived from the URL used to construct the HttpRequest, and is set automatically if the user does not set it.
header "Content-Length"
If the user calls getOutputStream and writes some data to it, the "Content-Length" header will be set to the amount of data that has been written at the time that connect is called.
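
The defaults described above can be sketched with plain JDK classes. This is an illustration only, not the Brazil implementation: the class RequestSketch and its methods are hypothetical names, a java.net.URI stands in for the request URL, a case-insensitive map stands in for MimeHeaders, and appending an explicit port to the "Host" value is an assumption.

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the documented defaults; not the Brazil source.
final class RequestSketch {
    /** An explicit method wins; otherwise POST if a body was written, else GET. */
    static String chooseMethod(String method, boolean bodyWritten) {
        if (method != null) {
            return method;
        }
        return bodyWritten ? "POST" : "GET";
    }

    /** Fully qualified URL when going via a proxy, host-relative URL otherwise. */
    static String requestLine(String method, URI url, String proxyHost, String version) {
        String target;
        if (proxyHost != null) {
            target = url.toString();
        } else {
            String path = url.getRawPath();
            target = (path == null || path.isEmpty()) ? "/" : path;
            if (url.getRawQuery() != null) {
                target += "?" + url.getRawQuery();
            }
        }
        return method + " " + target + " " + version;
    }

    /** Adds the keep-alive, Host, and Content-Length headers the user did not set. */
    static Map<String, String> withDefaults(Map<String, String> userHeaders, URI url,
                                            String proxyHost, byte[] body) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.putAll(userHeaders);
        String keepAlive = (proxyHost != null) ? "Proxy-Connection" : "Connection";
        headers.putIfAbsent(keepAlive, "Keep-Alive");
        // Assumption: an explicit port in the URL is appended to the Host value.
        String hostValue = (url.getPort() == -1)
                ? url.getHost() : url.getHost() + ":" + url.getPort();
        headers.putIfAbsent("Host", hostValue);
        if (body != null && body.length > 0) {
            headers.put("Content-Length", Integer.toString(body.length));
        }
        return headers;
    }
}
```

Because user-supplied headers are copied in first, a caller that sets "Connection" to "close" keeps that value, which mirrors the statement that the user controls whether Keep-Alive is attempted.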

Once all data has been read from the remote host, the underlying socket may be automatically recycled and used again for subsequent requests to the same remote host. If the user is not planning on reading all the data from the remote host, the user should call close to release the socket. Although it happens under the covers, the user should be aware that close is called automatically if an IOException occurs or once all the data has been read normally from the remote host. This ensures that as few sockets as possible are left open at any time.

The input stream that getInputStream provides automatically hides whether the remote host is providing HTTP/1.1 "chunked" encoding or regular streaming data. The user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. Currently, no access is provided to the underlying raw input stream.
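
To make the de-chunking concrete, the following is a minimal sketch of decoding an HTTP/1.1 chunked body with plain JDK streams. It is not the HttpInputStream implementation; DechunkSketch and its methods are hypothetical names, chunk extensions are simply dropped, and any trailer lines are skipped rather than collected.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of chunked-body decoding; not the Brazil source.
final class DechunkSketch {
    /** Reads one line, without its CRLF, as ISO-8859-1 text. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                line.write(c);
            }
        }
        if (c == -1 && line.size() == 0) {
            throw new EOFException("unexpected end of chunked stream");
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Returns the body with the chunk framing removed. */
    static byte[] dechunk(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');          // drop optional chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // chunk size is hexadecimal
            if (size == 0) {
                break;                                  // zero-length chunk ends the body
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new EOFException("truncated chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                               // CRLF that follows the chunk data
        }
        while (!readLine(in).isEmpty()) {
            // skip trailer lines up to the terminating blank line
        }
        return body.toByteArray();
    }
}
```

A malformed chunk size surfaces here as a NumberFormatException; a production decoder would likely translate that into an IOException.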

Version:
2.7
Author:
Colin Stevens (colin.stevens@sun.com)
  • Field Summary

    Fields
    Modifier and Type Field - Description
    protected boolean connected
    static String defaultHTTPVersion - The default HTTP version string to send to the remote host when issuing requests.
    static String defaultProxyHost - The default proxy host for HTTP requests.
    static int defaultProxyPort - The default proxy port for HTTP requests.
    static boolean displayAllHeaders - Setting this to true causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.
    static int DRAIN_TIMEOUT - Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.
    String host - The host extracted from the URL used to construct this HttpRequest.
    static int LINE_LIMIT - Maximum length of a line in the HTTP response headers (sanity check).
    String method - The HTTP method, such as "GET", "POST", or "HEAD".
    static HttpSocketPool pool - The cache of idle sockets.
    int port - The port extracted from the URL used to construct this HttpRequest.
    String proxyHost - If non-null, sends this HTTP request via the specified proxy host and port.
    int proxyPort - The proxy port.
    MimeHeaders requestHeaders - The headers for the HTTP request.
    MimeHeaders responseHeaders - The headers that were present in the HTTP response.
    MimeHeaders responseTrailers - An artifact of HTTP/1.1 chunked encoding.
    static SocketFactory socketFactory - The factory for constructing new Socket objects used to connect to remote hosts when issuing HTTP requests.
    String status - The status line from the HTTP response.
    URL url - The URL used to construct this HttpRequest.
    String version - The HTTP version string.
  • Constructor Summary

    Constructors
    Constructor - Description
    HttpRequest(String url) - Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
    HttpRequest(URL url) - Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
  • Method Summary

    Modifier and Type Method - Description
    int addHeaders(String tokens, Properties props) - Convenience method for adding request headers by looking them up in a properties object.
    void close() - Gracefully closes this HTTP request when the user is done with it.
    void connect() - Connect to the target host (or proxy), send the request, and read the response headers.
    void disconnect() - Interrupts this HTTP request.
    String getContent() - Return the content as a string.
    String getContent(String encoding) - Get the content as a string.
    int getContentLength() - Convenience method to get the "Content-Length" header from the HTTP response.
    String getEncoding()
    HttpInputStream getInputStream() - Gets an input stream that can be used to read the body of the HTTP response.
    OutputStream getOutputStream() - Gets an output stream that can be used for uploading data to the host.
    int getResponseCode() - Gets the HTTP response status code.
    String getResponseHeader(String key) - Gets the value associated with the given case-insensitive header name from the HTTP response.
    static void main(String[] args) - Grab HTTP document(s) and save them in the filesystem.
    static void removePointToPointHeaders(MimeHeaders headers, boolean response) - Removes all the point-to-point (hop-by-hop) headers from the given mime headers.
    void setMethod(String method) - Sets the HTTP method to the specified value.
    void setProxy(String proxyHost, int proxyPort) - Sets the proxy for this request.
    void setRequestHeader(String key, String value) - Sets a request header in the HTTP request that will be issued.

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DRAIN_TIMEOUT

      public static int DRAIN_TIMEOUT
      Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.

      If the user closes the HttpRequest before reading all of the data, but the remote host has agreed to keep this socket alive, we need to read and discard the rest of the response before issuing a new request. If it takes longer than DRAIN_TIMEOUT to read and discard the data, we will just forcefully close the connection to the remote host rather than waiting to read any more.

      Default value is 10000.
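
The drain-or-give-up decision described above might look roughly like the following. This is a sketch under stated assumptions, not the Brazil code: DrainSketch and drain are hypothetical names, and the deadline is only checked between reads, so a real implementation working on a socket would presumably also bound each blocking read (for example with Socket.setSoTimeout).

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of draining an unread response body; not the Brazil source.
final class DrainSketch {
    /**
     * Reads and discards the rest of the response.
     * Returns true if the end of the body was reached before the deadline
     * (the connection could be reused), false if the caller should
     * forcefully close the connection instead.
     */
    static boolean drain(InputStream in, long timeoutMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        byte[] scratch = new byte[4096];
        while (in.read(scratch) != -1) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
        }
        return true;
    }
}
```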

    • LINE_LIMIT

      public static int LINE_LIMIT
      Maximum length of a line in the HTTP response headers (sanity check).

      If an HTTP response line is longer than this, the response is considered to be malformed.

      Default value is 1000.
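
A length-limited line reader of the kind this sanity check implies can be sketched as follows. LineLimitSketch is a hypothetical name, and reporting an over-long line as an IOException is an assumption; the documentation only says the response is considered malformed.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a bounded header-line reader; not the Brazil source.
final class LineLimitSketch {
    /** Reads one header line without its CRLF, rejecting lines longer than limit. */
    static String readLine(InputStream in, int limit) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c == '\r') {
                continue;
            }
            if (line.length() >= limit) {
                // Assumption: a malformed response is surfaced as an IOException.
                throw new IOException("HTTP response line longer than " + limit + " characters");
            }
            line.append((char) c);
        }
        return line.toString();
    }
}
```

Bounding the line length keeps a misbehaving or hostile server from making the client buffer an arbitrarily long header line.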

    • defaultHTTPVersion

      public static String defaultHTTPVersion
      The default HTTP version string to send to the remote host when issuing requests.

      The default value can be overridden on a per-request basis by setting the version instance variable.

      Default value is "HTTP/1.1".

    • defaultProxyHost

      public static String defaultProxyHost
      The default proxy host for HTTP requests. If non-null, then all new HTTP requests will be sent via this proxy. If null, then all new HTTP requests are sent directly to the host specified when the HttpRequest object was constructed.

      The default value can be overridden on a per-request basis by calling the setProxy method or setting the proxyHost instance variables.

      Default value is null.

    • defaultProxyPort

      public static int defaultProxyPort
      The default proxy port for HTTP requests.

      Default value is 80.

    • socketFactory

      public static SocketFactory socketFactory
      The factory for constructing new Socket objects used to connect to remote hosts when issuing HTTP requests. The user can set this to provide a new type of socket, such as SSL sockets.

      Default value is null, which signifies plain sockets.

    • pool

      public static HttpSocketPool pool
      The cache of idle sockets. Once a request has been handled, the now-idle socket can be remembered and reused later if another HTTP request is made to the same remote host.
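
The idea of remembering idle connections per remote host can be sketched generically. The actual HttpSocketPool interface is not shown in this page, so everything here is hypothetical: the class IdlePoolSketch, the take and recycle methods, and the choice of a lower-cased "host:port" key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of an idle-connection cache; not the HttpSocketPool API.
final class IdlePoolSketch<C> {
    private final Map<String, Deque<C>> idle = new HashMap<>();

    private static String key(String host, int port) {
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    /** Returns an idle connection to host:port, or null if a new one must be opened. */
    synchronized C take(String host, int port) {
        Deque<C> queue = idle.get(key(host, port));
        return (queue == null) ? null : queue.pollFirst();
    }

    /** Remembers a now-idle connection so a later request to the same host can reuse it. */
    synchronized void recycle(String host, int port, C connection) {
        idle.computeIfAbsent(key(host, port), k -> new ArrayDeque<>()).addFirst(connection);
    }
}
```

A real pool would also need to discard connections the remote host has closed while they sat idle, which this sketch does not attempt.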
    • url

      public URL url
      The URL used to construct this HttpRequest.
    • host

      public String host
      The host extracted from the URL used to construct this HttpRequest.
    • port

      public int port
      The port extracted from the URL used to construct this HttpRequest.
    • proxyHost

      public String proxyHost
      If non-null, sends this HTTP request via the specified proxy host and port.

      Initialized from defaultProxyHost, but may be changed by the user at any time up until the HTTP request is actually sent.

    • proxyPort

      public int proxyPort
      The proxy port.
    • connected

      protected boolean connected
    • method

      public String method
      The HTTP method, such as "GET", "POST", or "HEAD".

      May be set by the user at any time up until the HTTP request is actually sent.

    • version

      public String version
      The HTTP version string.

      Initialized from defaultHTTPVersion, but may be changed by the user at any time up until the HTTP request is actually sent.

    • requestHeaders

      public MimeHeaders requestHeaders
      The headers for the HTTP request. All of these headers will be sent when the connection is actually made.
    • displayAllHeaders

      public static boolean displayAllHeaders
      Setting this to true causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.
    • status

      public String status
      The status line from the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.
    • responseHeaders

      public MimeHeaders responseHeaders
      The headers that were present in the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.
    • responseTrailers

      public MimeHeaders responseTrailers
      An artifact of HTTP/1.1 chunked encoding. At the end of an HTTP/1.1 chunked response, there may be more MimeHeaders. It is only possible to access these MimeHeaders after all the data from the input stream returned by getInputStream has been read. At that point, this field will automatically be initialized to the set of any headers that were found. If not reading from an HTTP/1.1 chunked source, then this field is irrelevant and will remain null.
  • Constructor Details

    • HttpRequest

      public HttpRequest(URL url)
      Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.

      The host specified by the URL is not contacted at this time.

      Parameters:
      url - A fully qualified "http:" URL.
      Throws:
      IllegalArgumentException - if url is not an "http:" URL.
    • HttpRequest

      public HttpRequest(String url)
      Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.

      The host specified by the URL is not contacted at this time.

      Parameters:
      url - A string representing a fully qualified "http:" URL.
      Throws:
      IllegalArgumentException - if url is not a well-formed "http:" URL.
  • Method Details

    • setMethod

      public void setMethod(String method)
      Sets the HTTP method to the specified value. Some of the normal HTTP methods are "GET", "POST", "HEAD", "PUT", "DELETE", but the user can set the method to any value desired.

      If this method is called, it must be called before connect is called. Otherwise it will have no effect.

      Parameters:
      method - The string for the HTTP method, or null to allow this HttpRequest to pick the method for itself.
    • setProxy

      public void setProxy(String proxyHost, int proxyPort)
      Sets the proxy for this request. The HTTP proxy request will be sent to the specified proxy host.

      If this method is called, it must be called before connect is called. Otherwise it will have no effect.

      Parameters:
      proxyHost - The proxy that will handle the request, or null to not use a proxy.
      proxyPort - The port on the proxy, for the proxy request. Ignored if proxyHost is null.
    • setRequestHeader

      public void setRequestHeader(String key, String value)
      Sets a request header in the HTTP request that will be issued. In order to do fancier things like appending a value to an existing request header, the user may directly access the requestHeaders variable.

      If this method is called, it must be called before connect is called. Otherwise it will have no effect.

      Parameters:
      key - The header name.
      value - The value for the request header.
    • getOutputStream

      public OutputStream getOutputStream() throws IOException
      Gets an output stream that can be used for uploading data to the host.

      If this method is called, it must be called before connect is called. Otherwise it will have no effect.

      Currently the implementation is not as good as it could be. The user should avoid uploading huge amounts of data, for some definition of huge.

      Throws:
      IOException
    • connect

      public void connect() throws UnknownHostException, IOException
      Connect to the target host (or proxy), send the request, and read the response headers. Any setup routines must be called before the call to this method, and routines to examine the result must be called after this method.

      Throws:
      UnknownHostException - if the target host (or proxy) could not be contacted.
      IOException - if there is a problem writing the HTTP request or reading the HTTP response headers.
    • getInputStream

      public HttpInputStream getInputStream() throws IOException
      Gets an input stream that can be used to read the body of the HTTP response. Unlike the other convenience methods for accessing the HTTP response, this one automatically connects to the target host if not already connected.

      The input stream that getInputStream provides automatically hides the differences between "Content-Length", no "Content-Length", and "chunked" for HTTP/1.0 and HTTP/1.1 responses. In all cases, the user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. (If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. There is no way to access the raw underlying stream that contains the HTTP/1.1 chunking packets.)

      Throws:
      IOException - if there is a problem connecting to the target.
    • close

      public void close()
      Gracefully closes this HTTP request when the user is done with it.

      The user can either call this method or close on the input stream obtained from the getInputStream method -- the results are the same.

      When all the response data is read from the input stream, the input stream is automatically closed (recycled). If the user is not going to read all the response data from the input stream, the user must call close to release the resources associated with the open request. Otherwise the program may consume all available sockets, waiting forever for the user to finish reading.

      Note that the input stream is automatically closed if the input stream throws an exception while reading.

      In order to interrupt a pending I/O operation in another thread (for example, to stop a request that is taking too long), the user should call disconnect or interrupt the blocked thread. The user should not call close in this case because close will not interrupt the pending I/O operation.

      Closing the request multiple times is allowed.

      In order to make sure that open sockets are not left lying around the user should use code similar to the following:

      OutputStream out = ...
      HttpRequest http = new HttpRequest("http://bob.com/index.html");
      try {
          HttpInputStream in = http.getInputStream();
          in.copyTo(out);
      } finally {
          // Copying to "out" could have failed.  Close "http" in case
          // not all the data has been read from it yet.
          http.close();
      }
      
    • disconnect

      public void disconnect()
      Interrupts this HTTP request. Can be used to halt an in-progress HTTP request from another thread, by causing it to throw an InterruptedIOException during the connect or while reading from the input stream, depending upon what state this HTTP request is in when it is disconnected.
    • getResponseCode

      public int getResponseCode()
      Gets the HTTP response status code. From responses like:
      HTTP/1.0 200 OK
      HTTP/1.0 401 Unauthorized
      
      this method extracts the integers 200 and 401 respectively. Returns -1 if the response status code was malformed.

      If this method is called, it must be called after connect has been called. Otherwise the information is not yet available and this method will return -1.

      For advanced features, the user can directly access the status variable.

      Returns:
      The integer status code from the HTTP response.
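
The extraction of the status code from the status line can be illustrated with a small stand-alone sketch. StatusLineSketch and responseCode are hypothetical names, and treating a null status (not yet connected) the same as a malformed one follows the -1 behavior described above.

```java
// Hypothetical sketch of status-line parsing; not the Brazil source.
final class StatusLineSketch {
    /** Returns the numeric code from a line such as "HTTP/1.0 200 OK", or -1. */
    static int responseCode(String status) {
        if (status == null) {
            return -1;                       // no response has been read yet
        }
        String[] parts = status.trim().split("\\s+");
        if (parts.length < 2) {
            return -1;                       // no status code field at all
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;                       // code field is not a number
        }
    }
}
```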
    • getResponseHeader

      public String getResponseHeader(String key)
      Gets the value associated with the given case-insensitive header name from the HTTP response.

      If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return null.

      For advanced features, such as enumerating over all response headers, the user should directly access the responseHeaders variable.

      Parameters:
      key - The case-insensitive name of the response header.
      Returns:
      The value associated with the given name, or null if there is no such header in the response.
    • getContentLength

      public int getContentLength()
      Convenience method to get the "Content-Length" header from the HTTP response.

      If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return -1.

      Returns:
      The content length specified in the response headers, or -1 if the length was not specified or malformed (not a number).
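
As an illustration of the lookup and its -1 fallbacks, the sketch below uses a case-insensitive map in place of MimeHeaders. ContentLengthSketch is a hypothetical name, and passing null to represent "connect has not been called" is an assumption made for the example.

```java
import java.util.Map;

// Hypothetical sketch of a Content-Length lookup; not the Brazil source.
final class ContentLengthSketch {
    /**
     * Expects a case-insensitive map, for example
     * new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER).
     */
    static int contentLength(Map<String, String> responseHeaders) {
        if (responseHeaders == null) {
            return -1;                       // headers not available yet
        }
        String value = responseHeaders.get("Content-Length");
        if (value == null) {
            return -1;                       // length not specified
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return -1;                       // malformed (not a number)
        }
    }
}
```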
    • removePointToPointHeaders

      public static void removePointToPointHeaders(MimeHeaders headers, boolean response)
      Removes all the point-to-point (hop-by-hop) headers from the given mime headers.
      Parameters:
      headers - The mime headers to be modified.
      response - true to remove the point-to-point response headers, false to remove the point-to-point request headers.
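
For readers unfamiliar with hop-by-hop headers, here is a rough sketch of stripping them from a header map before forwarding a message. It is not the Brazil method: HopByHopSketch is a hypothetical name, the header list is the one given in RFC 2616 section 13.5.1 plus the non-standard "Proxy-Connection" mentioned earlier, and the request/response distinction controlled by the response parameter is not modeled.

```java
import java.util.Map;

// Hypothetical sketch of hop-by-hop header removal; not the Brazil source.
final class HopByHopSketch {
    // RFC 2616 section 13.5.1, plus the non-standard Proxy-Connection header.
    private static final String[] HOP_BY_HOP = {
        "Connection", "Keep-Alive", "Proxy-Authenticate", "Proxy-Authorization",
        "TE", "Trailers", "Transfer-Encoding", "Upgrade", "Proxy-Connection"
    };

    /** Expects a case-insensitive map; modifies it in place. */
    static void removeHopByHop(Map<String, String> headers) {
        // Headers named in the Connection header are hop-by-hop as well.
        String connection = headers.get("Connection");
        if (connection != null) {
            for (String token : connection.split(",")) {
                String name = token.trim();
                if (!name.isEmpty()) {
                    headers.remove(name);
                }
            }
        }
        for (String name : HOP_BY_HOP) {
            headers.remove(name);
        }
    }
}
```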
    • addHeaders

      public int addHeaders(String tokens, Properties props)
      Convenience method for adding request headers by looking them up in a properties object.
      Parameters:
      tokens - a white space delimited set of tokens that refer to headers that will be added to the HTTP request.
      props - Keys of the form [token].name and [token].value are used to lookup additional HTTP headers to be added to the request.
      Returns:
      The number of headers added to the request
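
The token-based lookup can be sketched as follows, with a plain map standing in for the request headers. AddHeadersSketch is a hypothetical name, and skipping a token unless both its .name and .value properties are present is an assumption about how incomplete entries are handled.

```java
import java.util.Map;
import java.util.Properties;
import java.util.StringTokenizer;

// Hypothetical sketch of property-driven request headers; not the Brazil source.
final class AddHeadersSketch {
    /** Adds a header for each token that has both token.name and token.value set. */
    static int addHeaders(Map<String, String> requestHeaders, String tokens, Properties props) {
        int added = 0;
        StringTokenizer st = new StringTokenizer(tokens);   // splits on white space
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            String name = props.getProperty(token + ".name");
            String value = props.getProperty(token + ".value");
            // Assumption: incomplete entries are silently skipped.
            if (name != null && value != null) {
                requestHeaders.put(name, value);
                added++;
            }
        }
        return added;
    }
}
```

With tokens "agent lang" and properties agent.name=User-Agent, agent.value=sketch/1.0, lang.name=Accept-Language, lang.value=en, this sketch adds two headers and returns 2.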
    • getContent

      public String getContent(String encoding) throws IOException, UnsupportedEncodingException
      Get the content as a string. Uses the character encoding specified in the HTTP headers if available. Otherwise the supplied encoding is used, or (if encoding is null), the platform default encoding.
      Parameters:
      encoding - The ISO character encoding to use, if the encoding can't be determined by context.
      Returns:
      The content as a string.
      Throws:
      IOException
      UnsupportedEncodingException
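
The order of preference described above (encoding from the HTTP headers, then the supplied encoding, then the platform default) can be sketched like this. CharsetSketch and chooseEncoding are hypothetical names, and reading the encoding from a charset parameter of the "Content-Type" header is an assumption about where the header-specified encoding comes from.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of encoding selection; not the Brazil source.
final class CharsetSketch {
    /** Picks the charset parameter of Content-Type, else fallback, else the platform default. */
    static String chooseEncoding(String contentType, String fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String cs = p.substring(8).trim();
                    // Strip optional double quotes around the value.
                    if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
                        cs = cs.substring(1, cs.length() - 1);
                    }
                    if (!cs.isEmpty()) {
                        return cs;
                    }
                }
            }
        }
        return (fallback != null) ? fallback : Charset.defaultCharset().name();
    }
}
```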
    • getContent

      public String getContent() throws IOException, UnsupportedEncodingException
      Return the content as a string.
      Throws:
      IOException
      UnsupportedEncodingException
    • getEncoding

      public String getEncoding()
    • main

      public static void main(String[] args) throws Exception
      Grab HTTP document(s) and save them in the filesystem. This is a simple batch HTTP URL fetcher. Usage:
      java ... sunlabs.brazil.util.http.HttpRequest [-v(erbose)] [-h(eaders)] [-p<http://proxyhost:port>] url...
      
      -v
      Verbose. Print the target URL and destination file on stderr
      -h
      Print all the HTTP headers on stderr
      -phttp://proxyhost:port
      The following URLs are to be fetched via a proxy.
      The options and URLs may be given in any order. Use "-p" by itself to disable the proxy for all following requests.

      There are many limitations: only HTTP GET requests are supported; the output filename is derived automatically from the URL and can't be overridden; and if a destination file already exists, it is overwritten.

      Throws:
      Exception