[xquery-talk] Re: Finding a name and the resulting

Wolfgang Hoschek wolfgang.hoschek at mac.com
Tue Aug 29 18:05:30 PDT 2006


TagSoup outputs an XML document in the XHTML namespace, so a query
needs to look for namespaced nodes and declare the XHTML namespace,
for example:

[hoschek /Users/hoschek/unix/devel/nux] curl 'http://finance.yahoo.com/q?s=IBM' > test5.xml
[hoschek /Users/hoschek/unix/devel/nux] fire-xquery --validate=html --query='{declare namespace xhtml="http://www.w3.org/1999/xhtml"; //xhtml:td}' test5.xml

Without the namespace declaration, an unprefixed step such as //td
only matches td elements in no namespace, so the query returns an
empty result sequence.
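
To illustrate the effect in isolation, here is a minimal sketch that
uses only the JDK's built-in XPath 1.0 API rather than Nux and Saxon.
The class name NamespaceDemo and the inline SAMPLE document are
made up for illustration; SAMPLE only stands in for what TagSoup
would produce, it is not the real Yahoo page.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for TagSoup output: every element
    // inherits the XHTML namespace from the root element.
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML + "\"><body><table><tr>"
        + "<td>Last Trade:</td><td>81.40</td>"
        + "</tr></table></body></html>";

    // Counts the nodes that the given XPath expression selects in SAMPLE.
    public static int count(String expr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this the parser drops namespace info
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the prefix "xhtml" to the XHTML namespace URI; this plays
        // the role of "declare namespace xhtml=..." in the XQuery prolog.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "xhtml".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//td"));       // prints 0
        System.out.println(count("//xhtml:td")); // prints 2
    }
}
```

The same reasoning should carry over to a query string passed to
XQueryUtil.xquery: the prolog declares the xhtml prefix and every
element step (td, following-sibling::td) uses it.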

Wolfgang.

On Aug 29, 2006, at 4:45 PM, Graham Reeds wrote:

> Sorry about the delay in replying to the questions - other matters  
> to attend to.
>
> > Hey, that was not a question that can be answered using yes/no! ;-)
>
> Sorry about that.  Didn't read your response properly.
>
> >
> > I think you really have to come up with the problem solution in
> > non-XQuery terms before we (the list) can help you implement that in
> > XQuery. E.g. find out how you can determine which table cell relates
> > to which user, what you want to do with multiple values for one
> > user, whether there are any exceptions, etc.
>
> The table that I deemed the easiest to work with is 4 cells wide,
> with the possibility of just 2 cells being populated with data.
>
> The cells are simply name-value pairs, with the first cell the name
> and the second cell the value (an alphanumeric string).  To conserve
> screen space the original authors placed 2 name-value pairs per row.
> Awkward, I know (they didn't even hyperlink them; instead you have
> to go to another screen to see how far along the workers are on the
> project).
>
> An ASCII example of the layout:
>
> +-------+--------+------+--------+
> | Tom   | ABC123 | Dick | DEF456 |
> +-------+--------+------+--------+
> | Harry | IJK789 |      |        |
> +-------+--------+------+--------+
>
> This table is nested within other tables for layout, and it really
> is an antiquated system.  In the hours I have put into this (between
> other tasks) I think I could have written the features and learnt
> the finer points of Java (C++ is my first language).
>
> Currently I have a program that can read in a page using a
> combination of Nux, XOM, TagSoup and Saxon.  In trying to implement
> the scraping of the Yahoo stock quote for IBM described in
> http://www-128.ibm.com/developerworks/xml/library/j-jtp03225.html
> the code below simply outputs <table /> instead of 81.40.  I may
> have misinterpreted how to get the value out of the results, but I
> should have got slightly more than an empty table.  It is entirely
> possible, though, that TagSoup has nuked all possibility of
> extracting the expected value.  That is something I need to look into.
>
> Anyway, thanks for all your continued help.
>
> Graham Reeds.
>
> source:
>
>     public void getPage()
>     {
>         try
>         {
>             XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
>             Document doc = new Builder(tagsoup).build("http://finance.yahoo.com/q?s=IBM");
>
>             String query = "<table>\n"+
> "{\n"+
> "  for $d in //td\n"+
> "  where contains($d/text()[1], \"Last Trade\")\n"+
> "  return <tr><td> { data($d/following-sibling::td) } </td></tr>\n"+
> "}\n"+
> "</table>";
>             Nodes results = XQueryUtil.xquery(doc, query);
>
>             for (int i=0; i < results.size(); i++)
>             {
>                 System.out.println(results.get(i).toXML());
> //                System.out.println(results.get(i));
>             }
>         }
>         catch (/* the various exceptions */)
>         {
>          // ...
>         }
>     }
>


