Working with HTTP

Handling mail messages is certainly interesting, and mail protocols are probably still the most widespread Internet protocols. The other popular protocol is HTTP, which is used by web servers and web browsers. I'll devote the rest of this chapter to this protocol (along with a discussion of HTML); the following two chapters also discuss it.

On the client side of the Web, the main activity is browsing, that is, reading HTML files. Besides building a custom browser, you can embed the Internet Explorer ActiveX control within your program (as I've done in the WebDemo example in Chapter 12, "From COM to COM+"). You can also directly activate the browser installed on the user's computer, for example by opening an HTML page with a call to the ShellExecute function (defined in the ShellApi unit).

Using ShellExecute, you can simply execute a document, such as a file. Windows will start the program associated with the HTM extension, using the action passed as a parameter (in the call described here, open; passing nil would invoke the default action, producing the same effect). You can use a similar call to view a website, by using a string like "http://www.example.com" instead of a filename. In this case, the system recognizes the http section of the request as requiring a web browser and launches it.

On the server side, you generate and make available the HTML pages. At times, it may be enough to have a way to produce static pages, occasionally extracting new data from a database table to update the HTML files as needed. In other cases, you'll need to generate pages dynamically based on a request from a user. As a starting point, I'll discuss HTTP by building a simple but complete client and server; then I'll move on to discuss HTML producer components.
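As a concrete illustration of the ShellExecute technique mentioned earlier in this section, here is a minimal sketch written as a console program. The target URL is a placeholder, and the error check is an addition for illustration rather than part of any example program in this chapter.

```pascal
program OpenPage;

{$APPTYPE CONSOLE}

uses
  Windows, ShellApi;

const
  // placeholder target: a local file name such as 'C:\docs\page.htm'
  // can be used in the same way
  Target = 'http://www.example.com';

begin
  // 'open' is the action; passing nil instead requests the default action.
  // ShellExecute returns a value greater than 32 when it succeeds.
  if ShellExecute (0, 'open', Target, nil, nil, SW_SHOWNORMAL) <= 32 then
    WriteLn ('Could not open ', Target);
end.
```

In a form-based program you would typically pass the form's Handle as the first parameter instead of 0, so that any message box displayed by the shell is owned by your window.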
In Chapter 20, I'll move from this "core technology" level to the RAD development style for the web supported by Delphi, introducing the web server extension technologies (CGI, ISAPI, and Apache modules) and discussing the WebBroker and WebSnap architectures.

Grabbing HTTP Content

As an example of the use of the HTTP protocol, I've written a specific search application. The program hooks onto the Google website, searches for a keyword, and retrieves the first 100 sites found. Instead of showing the resulting HTML file, the program parses it to extract only the URLs of the related sites to a list box. The description of these sites is kept in a separate string list and is displayed as you click a list-box item. So, the program demonstrates two techniques at once: retrieving a web page and parsing its HTML code.

To demonstrate how you should work with blocking connections, such as those used by Indy, I've implemented the program using a background thread for the processing. This approach also gives you the advantage of being able to start multiple searches at once. The thread class used by the WebFind application receives as input a URL to look for, strUrl. The class has two output procedures, AddToList and ShowStatus, to be called inside the Synchronize method. The code of these two methods sends some results or some feedback to the main form, respectively adding a line to the list box and changing the status bar's SimpleText property. The key method of the thread is Execute.
Before we look at it, however, here is how the thread is activated by the main form:

    const
      strSearch = 'http://www.google.com/search?as_q=';

    procedure TForm1.BtnFindClick(Sender: TObject);
    var
      FindThread: TFindWebThread;
    begin
      // create suspended, set initial values, and start
      FindThread := TFindWebThread.Create (True);
      FindThread.FreeOnTerminate := True;
      // grab the first 100 entries
      FindThread.strUrl := strSearch + EditSearch.Text + '&num=100';
      FindThread.Resume;
    end;

The URL string is made of the main address of the search engine, followed by some parameters. The first, as_q, indicates the words you are looking for. The second, num=100, indicates the number of sites to retrieve; you cannot use numbers at will but are limited to a few alternatives, with 100 being the largest possible value.
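The search text is appended to the URL exactly as typed, so a query containing spaces or characters such as & would produce a malformed request. The following sketch shows one possible way to encode the terms first, assuming the ParamsEncode class function of Indy's TIdURI class is available in your Indy version. The BuildSearchUrl helper is hypothetical and is not part of the WebFind program.

```pascal
program BuildQuery;

{$APPTYPE CONSOLE}

uses
  IdURI;

const
  strSearch = 'http://www.google.com/search?as_q=';

// Hypothetical helper: builds the search URL from the user's terms,
// encoding characters that are not valid inside a query parameter.
function BuildSearchUrl (const Terms: string): string;
begin
  Result := strSearch + TIdURI.ParamsEncode (Terms) + '&num=100';
end;

begin
  // a query with a space and an ampersand, both of which need encoding
  WriteLn (BuildSearchUrl ('delphi & threads'));
end.
```

In the button's OnClick handler, the assignment to strUrl could then become FindThread.strUrl := BuildSearchUrl (EditSearch.Text).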
The thread's Execute method, activated by the Resume call, calls the two methods doing the work (shown in Listing 19.1). In the first, GrabHtml, the program connects to the HTTP server using a dynamically created IdHttp component and reads the HTML with the result of the search. The second method, HtmlToList, extracts the URLs referring to other websites from the result, the strRead string.
Listing 19.1: The TFindWebThread Class (of the WebFind Program)
    unit FindTh;

    interface

    uses
      Classes, IdComponent, SysUtils, IdHTTP;

    type
      TFindWebThread = class(TThread)
      protected
        Addr, Text, Status: string;
        procedure Execute; override;
        procedure AddToList;
        procedure ShowStatus;
        procedure GrabHtml;
        procedure HtmlToList;
        procedure HttpWork (Sender: TObject; AWorkMode: TWorkMode;
          const AWorkCount: Integer);
      public
        strUrl: string;
        strRead: string;
      end;

    implementation

    { TFindWebThread }

    uses
      WebFindF;

    // called via Synchronize: runs in the context of the main thread
    procedure TFindWebThread.AddToList;
    begin
      if Form1.ListBox1.Items.IndexOf (Addr) < 0 then
      begin
        Form1.ListBox1.Items.Add (Addr);
        Form1.DetailsList.Add (Text);
      end;
    end;

    procedure TFindWebThread.Execute;
    begin
      GrabHtml;
      HtmlToList;
      Status := 'Done with ' + StrUrl;
      Synchronize (ShowStatus);
    end;

    procedure TFindWebThread.GrabHtml;
    var
      Http1: TIdHTTP;
    begin
      Status := 'Sending query: ' + StrUrl;
      Synchronize (ShowStatus);
      Http1 := TIdHTTP.Create (nil);
      try
        Http1.Request.UserAgent := 'User-Agent: NULL';
        Http1.OnWork := HttpWork;
        strRead := Http1.Get (StrUrl);
      finally
        Http1.Free;
      end;
    end;

    procedure TFindWebThread.HtmlToList;
    var
      strAddr, strText: string;
      nText: integer;
      nBegin, nEnd: Integer;
    begin
      Status := 'Extracting data for: ' + StrUrl;
      Synchronize (ShowStatus);
      strRead := LowerCase (strRead);
      repeat
        // find the initial part of the HTTP reference
        nBegin := Pos ('href=http', strRead);
        if nBegin <> 0 then
        begin
          // get the remaining part of the string, starting with 'http'
          strRead := Copy (strRead, nBegin + 5, 1000000);
          // find the end of the HTTP reference
          nEnd := Pos ('>', strRead);
          strAddr := Copy (strRead, 1, nEnd - 1);
          // move on
          strRead := Copy (strRead, nEnd + 1, 1000000);
          // add the URL if 'google' is not in it
          if Pos ('google', strAddr) = 0 then
          begin
            nText := Pos ('</a>', strRead);
            strText := copy (strRead, 1, nText - 1);
            // remove cached references and duplicates
            if (Pos ('cached', strText) = 0) then
            begin
              Addr := strAddr;
              Text := strText;
              // AddToList touches the form, so it must run in the main thread
              Synchronize (AddToList);
            end;
          end;
        end;
      until nBegin = 0;
    end;

    procedure TFindWebThread.HttpWork(Sender: TObject; AWorkMode: TWorkMode;
      const AWorkCount: Integer);
    begin
      Status := 'Received ' + IntToStr (AWorkCount) + ' for ' + strUrl;
      Synchronize (ShowStatus);
    end;

    // called via Synchronize: runs in the context of the main thread
    procedure TFindWebThread.ShowStatus;
    begin
      Form1.StatusBar1.SimpleText := Status;
    end;

    end.
The program looks for successive occurrences of the href=http substring, copying the text up to the closing > character. If the found string contains the word google, or its target text includes the word cached, it is omitted from the result. You can see the effect of this code in the output shown in Figure 19.4. You can start multiple searches at the same time, but be aware that the results will be added to the same list box.

Figure 19.4: The WebFind application can be used to search for a list of sites on the Google search engine.

The WinInet API

When you need to use the FTP and HTTP protocols, as an alternative to using particular VCL components, you can use a specific API provided by Microsoft in the WinInet DLL. This library is part of the core operating system and implements the FTP and HTTP protocols on top of the Windows sockets API. With just three calls (InternetOpen, InternetOpenURL, and InternetReadFile) you can retrieve a file corresponding to any URL and store a local copy or analyze it. Other simple functions can be used for FTP; I suggest you look at the source code of the WinInet.pas Delphi unit, which lists all the functions.
The InternetOpen function establishes a generic connection and returns a handle you can use in the InternetOpenURL call. This second call returns a handle to the URL that you can pass to the InternetReadFile function in order to read blocks of data. In the following sample code, the data is stored in a local string. When all the data has been read, the program closes the connection to the URL and the Internet session by calling the InternetCloseHandle function twice:

    var
      hHttpSession, hReqUrl: HInternet;
      Buffer: array [0..1023] of Char;
      nRead: Cardinal;
      strRead, strChunk: string;
    begin
      strRead := '';
      hHttpSession := InternetOpen ('FindWeb',
        INTERNET_OPEN_TYPE_PRECONFIG, nil, nil, 0);
      try
        hReqUrl := InternetOpenURL (hHttpSession, PChar(StrUrl), nil, 0, 0, 0);
        try
          // read all the data, one block at a time
          repeat
            if not InternetReadFile (hReqUrl, @Buffer,
                SizeOf (Buffer), nRead) then
              Break;
            // append only the data actually read: the buffer is not
            // null-terminated (this assumes a single-byte Char type)
            SetString (strChunk, PChar (@Buffer [0]), nRead);
            strRead := strRead + strChunk;
          until nRead = 0;
        finally
          InternetCloseHandle (hReqUrl);
        end;
      finally
        InternetCloseHandle (hHttpSession);
      end;
    end;

Browsing on Your Own

Although I doubt you are interested in writing a new web browser, it might be interesting to see how you can grab an HTML file from the Internet and display it locally, using the HTML viewer available in CLX (the TextBrowser control). Connecting this control to an Indy HTTP client, you can quickly come up with a simplistic text-only browser with limited navigation. The core is

    TextBrowser1.Text := IdHttp1.Get (NewUrl);

where NewUrl is the complete location of the web resource you want to access. In the BrowseFast example, this URL is entered in a combo box, which keeps track of recent requests. The effect of such a call is to return the textual portion of a web page (see Figure 19.5), because grabbing the graphic content requires much more complex coding. The TextBrowser control really is better defined as a local file viewer than as a browser. I've added to the program only very limited support for hyperlinks.
When a user moves the mouse over a link, its link text is copied to a local variable (NewRequest), which is then used in case of a click on the control to compute the new HTTP request to forward. Merging the current address (LastUrl) with the request, though, is far from trivial, even with the help of the TIdURI class provided by Indy. Here is my code, which handles only the simplest cases:

    procedure TForm1.TextBrowser1Click(Sender: TObject);
    var
      Uri: TIdUri;
    begin
      if NewRequest <> '' then
      begin
        Uri := TIdUri.Create (LastUrl);
        try
          if Pos ('http:', NewRequest) > 0 then
            // absolute URL: use it as is
            GoToUrl (NewRequest)
          else if NewRequest [1] = '/' then
            // path relative to the root of the current host
            GoToUrl ('http://' + Uri.Host + NewRequest)
          else
            // path relative to the current document's folder
            GoToUrl ('http://' + Uri.Host + Uri.Path + NewRequest);
        finally
          // release the URI object, which is no longer needed
          Uri.Free;
        end;
      end;
    end;

Again, this example is trivial and far from usable, but building a browser involves little more than the ability to connect via HTTP and display HTML files.

A Simple HTTP Server

The situation with the development of an HTTP server is quite different. Building a server to deliver static pages based on HTML files is far from simple, although one of the Indy demos provides a good starting point. However, a custom HTTP server might be interesting when building a totally dynamic site, something I'll focus on in more detail in Chapter 20.

To show you how to begin the development of a custom HTTP server, I've built the HttpServ example. This program has a form with a list box used for logging requests and an IdHTTPServer component with these settings:

    object IdHTTPServer1: TIdHTTPServer
      Active = True
      DefaultPort = 8080
      OnCommandGet = IdHTTPServer1CommandGet
    end

The server uses port 8080 instead of the standard port 80, so that you can run it alongside another web server.
All the custom code is in the OnCommandGet event handler, which returns a fixed page plus some information about the request:

    procedure TForm1.IdHTTPServer1CommandGet(AThread: TIdPeerThread;
      RequestInfo: TIdHTTPRequestInfo; ResponseInfo: TIdHTTPResponseInfo);
    var
      HtmlResult: String;
    begin
      // log
      Listbox1.Items.Add (RequestInfo.Document);
      // respond
      HtmlResult := '<h1>HttpServ Demo</h1>' +
        '<p>This is the only page you''ll get from this example.</p><hr>' +
        '<p>Request: ' + RequestInfo.Document + '</p>' +
        '<p>Host: ' + RequestInfo.Host + '</p>' +
        '<p>Params: ' + RequestInfo.UnparsedParams + '</p>' +
        '<p>The headers of the request follow: <br>' +
        RequestInfo.RawHeaders.Text + '</p>';
      ResponseInfo.ContentText := HtmlResult;
    end;

By passing a path and some parameters in the address bar of the browser, you'll see them reinterpreted and displayed. For example, Figure 19.6 shows the effect of this URL:

    http://localhost:8080/test?user=marco

If this example seems too trivial, you'll see a slightly more interesting version in the next section, where I discuss the generation of HTML with Delphi's producer components.
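To suggest how the handler could start distinguishing between requests, here is a hedged variation that is not part of the HttpServ example. It assumes the same Indy event signature shown above, reads the user parameter through the Params string list of the request object, and returns a 404 status for any other path. The /test path and the greeting text are purely illustrative.

```pascal
procedure TForm1.IdHTTPServer1CommandGet(AThread: TIdPeerThread;
  RequestInfo: TIdHTTPRequestInfo; ResponseInfo: TIdHTTPResponseInfo);
var
  UserName: string;
begin
  if RequestInfo.Document = '/test' then
  begin
    // Params holds the parsed name=value pairs of the query string
    UserName := RequestInfo.Params.Values ['user'];
    if UserName = '' then
      UserName := 'anonymous';
    // a real server should HTML-escape UserName before echoing it back
    ResponseInfo.ContentText := '<h1>Hello, ' + UserName + '</h1>';
  end
  else
  begin
    // any other path: report that the document does not exist
    ResponseInfo.ResponseNo := 404;
    ResponseInfo.ContentText := '<h1>Not found</h1>';
  end;
end;
```

With this handler in place, requesting http://localhost:8080/test?user=marco would take the first branch, while any other path would receive the 404 response.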
