Delphi Programming Guide
Delphi Programmer 

Menu  Table of contents
Bookmark and Share

Part I - Foundations
  Chapter 1 Delphi 7 and Its IDE
  Chapter 2 The Delphi Programming Language
  Chapter 3 The Run-Time Library
  Chapter 4 Core Library classes
  Chapter 5 Visual Controls
  Chapter 6 Building the User Interface
  Chapter 7 Working with Forms
Part II - Delphi Object-Oriented Architectures
  Chapter 8 The Architecture of Delphi Applications
  Chapter 9 Writing Delphi Components
  Chapter 10 Libraries and Packages
  Chapter 11 Modeling and OOP Programming (with ModelMaker)
  Chapter 12 From COM to COM+
Part III - Delphi Database-Oriented Architectures
  Chapter 13 Delphi's Database Architecture
  Chapter 14 Client/Server with dbExpress
  Chapter 15 Working with ADO
  Chapter 16 Multitier DataSnap Applications
  Chapter 17 Writing Database Components
  Chapter 18 Reporting with Rave
Part IV - Delphi, the Internet, and a .NET Preview
  Chapter 19 Internet Programming: Sockets and Indy
  Chapter 20 Web Programming with WebBroker and WebSnap
  Chapter 21 Web Programming with IntraWeb
  Chapter 22 Using XML Technologies
  Chapter 23 Web Services and SOAP
  Chapter 24 The Microsoft .NET Architecture from the Delphi Perspective
  Chapter 25 Delphi for .NET Preview: The Language and the RTL
       
  Appendix A Extra Delphi Tools by the Author
  Appendix B Extra Delphi Tools from Other Sources
  Appendix C Free Companion Books on Delphi
       
  Index    
  List of Figures    
  List of tables    
  List of Listings    
  List of Sidebars  

 
Previous Section Next Section

Processing Large XML Documents

As you have seen, there are often many different techniques to accomplish the same task with XML. In many cases you can choose any solution with the goal of writing less and more maintainable code; but when you need to process a large number of XML documents or very large XML documents, you must consider efficiency.

Discussing theory by itself is not terribly useful, so I've built an example you can use (and modify) to test different solutions. The example is called LargeXml, and it covers a specific area: moving data from a database to an XML file and back. The example can open a dataset (using dbExpress) and then replicate the data many times in a ClientDataSet in memory. The structure of the in-memory ClientDataSet is cloned from that of the data access component:

SimpleDataSet1.Open;
ClientDataSet1.FieldDefs := SimpleDataSet1.FieldDefs;
ClientDataSet1.CreateDataSet;

After using a radio group to determine the amount of data you want to process (some options require minutes on a slow computer), the data is cloned with this code:

while ClientDataSet1.RecordCount < nCount do
begin
  SimpleDataSet1.RecNo := Random (SimpleDataSet1.RecordCount) + 1;
  ClientDataSet1.Insert;
  ClientDataSet1.Fields [0].AsInteger := Random (10000);
  for I := 1 to SimpleDataSet1.FieldCount - 1 do
    ClientDataSet1.Fields [i].AsString := SimpleDataSet1.Fields [i].AsString;
  ClientDataSet1.Post;
end;

From a ClientDataSet to an XML Document

Now that the program has a (large) dataset in memory, it provides three different ways to save the dataset to a file. The first is to save the XMLData of the ClientDataSet directly to a file, obtaining an attribute-based document. This is probably not the format you want, so the second solution is to apply an XmlMapper transformation with an XMLTransformClient component. The third solution involves processing the dataset directly and writing out each record to a file:

procedure TForm1.btnSaveCustomClick(Sender: TObject);
var
  str: TFileStream;
  s: string;
  i: Integer;
begin
  str := TFileStream.Create ('data3.xml', fmCreate);
  try
    ClientDataSet1.First;
    s := '<?xml version="1.0" standalone="yes" ?><employee>';
    str.Write(s[1], Length (s));
   
    while not ClientDataSet1.EOF do
    begin
      s := '';
      for i := 0 to ClientDataSet1.FieldCount - 1 do
        s := s + MakeXmlstr (ClientDataSet1.Fields[i].FieldName,
          ClientDataSet1.Fields[i].AsString);
      s := MakeXmlStr ('employeeData', s);
      str.Write(s[1], length (s));
      ClientDataSet1.Next
    end;
    s := '</employee>';
    str.Write(s[1], length (s));
  finally
    str.Free;
  end;
end;

This code uses a simple (but effective) support function to create XML nodes:

function MakeXmlstr (node, value: string): string;
begin
  Result := '<' + node + '>' + value + '</' + node + '>';
end;

If you run the program, you can see the time taken by each operation, as shown in Figure 22.13. Saving the ClientDataSet data is the fastest approach, but you probably don't get the result you want. Custom streaming is only slightly slower; but you should consider that this code doesn't require you to first move the data to a ClientDataSet, because you can apply it directly even to a unidirectional dbExpress dataset. You should forget using the code based on the XmlMapper for a large dataset, because it is hundreds of times slower, even for a small dataset (I haven't been able to try a large dataset, because the process takes too long). For example, the 50 milliseconds required by custom streaming for a small dataset become more than 10 seconds when I use the mapping, and the result is very similar.

Click To expand
Figure 22.13: The LargeXml example in action

From an XML Document to a ClientDataSet

Once you have a large XML document, obtained by a program (as in this case) or from an external source, you need to process it. As you have seen, XmlMapper support is far too slow, so you are left with three alternatives: an XSL transformation, a SAX, or a DOM. XSL trans-formations will probably be fast enough, but in this example I've opened the document with a SAX; it's the fastest approach and doesn't require much code. The program can also load a document in a DOM, but I haven't written the code to navigate the DOM and save the data back to a ClientDataSet.

In both cases, I've tested the OpenXml engine versus the MSXML DOM. This allows you to see the two SAX solutions compared, because (unluckily) the code is slightly different. I can summarize the results here: Using the MSXML SAX is slightly faster than using the OpenXml SAX (the difference is about 20 percent), whereas loading in the DOM marks a large advantage in favor of MSXML.

The MSXML SAX code uses the same architecture discussed in the SaxDemo1 example, so here I've listed only the code of the handlers you use. As you can see, at the beginning of an employeeData element you insert a new record, which is posted when the same node is closed. Lower-level nodes are added as fields of the current record. Here is the code:

procedure TMyDataSaxHandler.startElement(var strNamespaceURI, strLocalName,
  strQName: WideString; const oAttributes: IVBSAXAttributes);
begin
  inherited;
  if strLocalName = 'employeeData' then
    Form1.clientdataset2.Insert;
  strCurrent := '';
end;
   
procedure TMyDataSaxHandler.characters(var strChars: WideString);
begin
  inherited;
  strCurrent := strCurrent + RemoveWhites(strChars);
end;
   
procedure TMyDataSaxHandler.endElement(var strNamespaceURI, strLocalName,
  strQName: WideString);
begin
  if strLocalName = 'employeeData' then
    Form1.clientdataset2.Post;
  if stack.Count > 2 then
    Form1.ClientDataSet2.FieldByName (strLocalName).AsString := strCurrent;
  inherited;
end;

The code for the event handlers in the OpenXml version is similar. All that changes are the interface of the methods and the names of the parameters:

type
  TDataSaxHandler = class (TXmlStandardHandler)
  protected
    stack: TStringList;
    strCurrent: string;
  public
    constructor Create(aowner: TComponent); override;
    function endElement(const sender: TXmlCustomProcessorAgent;
      const locator: TdomStandardLocator;
      namespaceURI, tagName: wideString): TXmlParserError; override;
    function PCDATA(const sender: TXmlCustomProcessorAgent;
      const locator: TdomStandardLocator; data: wideString):
      TXmlParserError; override;
    function startElement(const sender: TXmlCustomProcessorAgent;
      const locator: TdomStandardLocator; namespaceURI, tagName: wideString;
      attributes: TdomNameValueList): TXmlParserError; override;
    destructor Destroy; override;
  end;

It is also more difficult to invoke the SAX engine, as shown in the following code (from which I've removed the code for the creation of the dataset, the timing, and the logging):

procedure TForm1.btnReadSaxOpenClick(Sender: TObject);
var
  agent: TXmlStandardProcessorAgent;
  reader: TXmlStandardDocReader;
  filename: string;
begin
  Log := memoLog.Lines;
  filename := ExtractFilePath (Application.Exename) + 'data3.xml';
  agent := TXmlStandardProcessorAgent.Create(nil);
  reader:= TXmlStandardDocReader.Create (nil);
  try
    reader.NextHandler := TDataSaxHandler.Create (nil); // our custom class
    agent.reader := reader;
    agent.processFile(filename, filename);
  finally
    agent.free;
    reader.free;
  end;
end;

 
Previous Section Next Section


 


 


Copyright © 2004-2016 "Delphi Sources". Delphi Programming Guide
     Twitter     Facebook