[xquery-talk] Re: Finding a name and the resulting

Graham Reeds grahamr at ntlworld.com
Wed Aug 30 01:45:39 PDT 2006


Sorry about the delay in replying to the questions - other matters to 
attend to.

 > Hey, that was not a question that can be answered using yes/no! ;-)

Sorry about that.  Didn't read your response properly.

 >
 > I think you really have to come up with the problem solution in
 > non-XQuery terms before we (the list) can help you implement that in
 > XQuery. E.g. find out how you can determine which table cell relates
 > to which user, what do you want do to with multiple values for one
 > user, are there any exceptions etc.

The table that I deemed would be the easiest to work with is 4 cells 
wide, with the possibility of a row having just 2 cells populated with data.

The cells are simply name-value pairs, with the first cell the name and 
the second cell the value (an alphanumeric string).  To conserve screen 
space the original authors placed 2 name-value pairs per row - awkward, I 
know (they didn't even hyperlink them; instead you have to go to another 
screen to see how far along the workers are on the project).

An ASCII example of the layout:

+-------+--------+------+--------+
| Tom   | ABC123 | Dick | DEF456 |
+-------+--------+------+--------+
| Harry | IJK789 |      |        |
+-------+--------+------+--------+
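
To make the pairing concrete, here is a minimal Java sketch of how the cells in each row could be read as name-value pairs. It assumes the cell text has already been extracted into lists of strings, one list per row; the class and method names (CellPairs, toPairs) are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the actual scraper: reads each row as
// (name, value), (name, value) and skips pairs whose name cell is empty.
class CellPairs {
    static Map<String, String> toPairs(List<List<String>> rows) {
        // LinkedHashMap keeps the pairs in the order they appear on screen.
        Map<String, String> pairs = new LinkedHashMap<>();
        for (List<String> row : rows) {
            // Step through the row two cells at a time: name, then value.
            for (int i = 0; i + 1 < row.size(); i += 2) {
                String name = row.get(i).trim();
                String value = row.get(i + 1).trim();
                if (!name.isEmpty()) {
                    pairs.put(name, value);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("Tom", "ABC123", "Dick", "DEF456"),
            Arrays.asList("Harry", "IJK789", "", ""));
        System.out.println(toPairs(rows));
        // prints {Tom=ABC123, Dick=DEF456, Harry=IJK789}
    }
}
```

Note that a later occurrence of the same name would overwrite the earlier value in this sketch; how to treat multiple values for one user is still an open question.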

This table is nested within other tables for layout, and it really is an 
antiquated system - given the number of hours I have put into this 
(between other tasks), I think I could have written the features and 
learnt the finer points of Java in the same time (C++ is my first language).

Currently I have a program that can read in a page using a combination 
of Nux, XOM, TagSoup and Saxon.  I tried to implement the scraping of the 
Yahoo stock quote for IBM described in 
http://www-128.ibm.com/developerworks/xml/library/j-jtp03225.html 
but the code below simply outputs <table /> instead of 81.40.  I may have 
misinterpreted how to get the value out of the results, but I should have 
got slightly more than an empty table.  It is entirely possible, though, 
that TagSoup has nuked all possibility of extracting the expected value.  
That is something I need to look into.

Anyway, thanks for all your continued help.

Graham Reeds.

source:

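     import nu.xom.Builder;
     import nu.xom.Document;
     import nu.xom.Nodes;
     import nux.xom.xquery.XQueryUtil;
     import org.xml.sax.XMLReader;
     import org.xml.sax.helpers.XMLReaderFactory;

     public void getPage()
     {
         try
         {
             // TagSoup turns the real-world HTML into well-formed SAX events.
             XMLReader tagsoup =
                 XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
             Document doc =
                 new Builder(tagsoup).build("http://finance.yahoo.com/q?s=IBM");

             // Wrap the cell(s) following the "Last Trade" cell in a table.
             String query = "<table>\n"+
                 "{\n"+
                 "  for $d in //td\n"+
                 "  where contains($d/text()[1], \"Last Trade\")\n"+
                 "  return <tr><td> { data($d/following-sibling::td) } </td></tr>\n"+
                 "}\n"+
                 "</table>";
             Nodes results = XQueryUtil.xquery(doc, query);

             for (int i=0; i < results.size(); i++)
             {
                 System.out.println(results.get(i).toXML());
//                System.out.println(results.get(i));
             }
         }
         catch (Exception e) // stands in for the various specific exceptions
         {
          // ...
         }
     }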
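
One possible explanation for the empty <table />, offered as an assumption rather than a confirmed diagnosis of this page: TagSoup by default reports HTML elements in the XHTML namespace (http://www.w3.org/1999/xhtml), in which case an unprefixed //td step would select nothing. The sketch below does not use Nux or Saxon; it uses only the JDK's DOM and XPath 1.0 classes on a made-up sample document to show the effect. The class name NamespaceCheck and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceCheck {
    // Made-up stand-in for namespaced parser output; not the real Yahoo page.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table><tr>"
        + "<td>Last Trade:</td><td>81.40</td>"
        + "</tr></table></body></html>";

    // Returns how many nodes the given XPath 1.0 expression selects in SAMPLE.
    static int count(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep the XHTML namespace on elements
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name test: only matches elements in no namespace.
        System.out.println(count("//td"));                    // prints 0
        // Matching on the local name ignores the namespace.
        System.out.println(count("//*[local-name()='td']"));  // prints 2
    }
}
```

If this is indeed the cause, the analogous change in the XQuery itself would be a namespace wildcard step such as //*:td, or a "declare default element namespace" prolog; neither has been tested against this page.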


