[xquery-talk] Re: Finding a name and the resulting
Wolfgang Hoschek
wolfgang.hoschek at mac.com
Tue Aug 29 18:05:30 PDT 2006
TagSoup outputs an XML document in the XHTML namespace, so a query
needs to look for namespaced nodes and declare the XHTML namespace,
for example:
[hoschek /Users/hoschek/unix/devel/nux] curl 'http://finance.yahoo.com/q?s=IBM' > test5.xml
[hoschek /Users/hoschek/unix/devel/nux] fire-xquery --validate=html --query='{declare namespace xhtml="http://www.w3.org/1999/xhtml"; //xhtml:td}' test5.xml
Without the namespace declaration (and the xhtml: prefix on the element
names), the query comes up with an empty result sequence.
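The same effect can be reproduced without Nux. The following is a minimal sketch that swaps in the JDK's own javax.xml.xpath API (XPath rather than XQuery, so it runs without extra jars). The XHTML snippet is a made-up stand-in for TagSoup output, and the prefix "xhtml" is arbitrary: only the namespace URI has to match. An unprefixed //td selects nothing, while //xhtml:td selects both cells.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for TagSoup output: every element is in the XHTML namespace
    static final String SAMPLE = "<html xmlns='" + XHTML_NS + "'><body><table>"
        + "<tr><td>Last Trade:</td><td>81.40</td></tr>"
        + "</table></body></html>";

    /** Number of nodes the XPath expression selects in SAMPLE. */
    static int count(String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep element namespaces, as TagSoup does
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // XPath counterpart of: declare namespace xhtml="http://www.w3.org/1999/xhtml";
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList("xhtml").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("//td       selects " + count("//td"));       // 0: looks for no-namespace td
        System.out.println("//xhtml:td selects " + count("//xhtml:td")); // 2: the XHTML td cells
    }
}
```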
Wolfgang.
On Aug 29, 2006, at 4:45 PM, Graham Reeds wrote:
> Sorry about the delay in replying to the questions; I had other matters
> to attend to.
>
> > Hey, that was not a question that can be answered using yes/no! ;-)
>
> Sorry about that. I didn't read your response properly.
>
> >
> > I think you really have to come up with the problem solution in
> > non-XQuery terms before we (the list) can help you implement that in
> > XQuery. E.g. find out how you can determine which table cell relates
> > to which user, what you want to do with multiple values for one
> > user, are there any exceptions etc.
>
> The table that I judged would be the easiest to work with is four
> cells wide, though some rows may have only two cells populated
> with data.
>
> The cells are simply name-value pairs, with the first cell the name
> and the second cell the value (an alphanumeric string). To conserve
> screen space, the original authors placed two name-value pairs per
> row. Awkward, I know (they didn't even hyperlink them; instead you
> have to go to another screen to see how far along the workers are
> on the project).
>
> An ASCII example of the layout:
>
> +-------+--------+------+--------+
> | Tom   | ABC123 | Dick | DEF456 |
> +-------+--------+------+--------+
> | Harry | IJK789 |      |        |
> +-------+--------+------+--------+
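Taking that layout at face value, the pairing rule could look like the minimal sketch below: read each row's cells in order, treat cells 0/1 and 2/3 as name/value pairs, and skip a pair whose name cell is empty. It uses plain JDK DOM calls rather than Nux/XQuery, assumes the element passed in is the data table itself (already located among the layout tables, with no further tables nested inside it), and the sample markup is a hypothetical rendering of the ASCII table above, not the real page. A repeated name simply overwrites the earlier value, which is only one possible answer to the multiple-values question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NameValuePairs {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical markup for the ASCII layout: two name/value pairs per row
    static final String SAMPLE = "<table xmlns='" + XHTML_NS + "'>"
        + "<tr><td>Tom</td><td>ABC123</td><td>Dick</td><td>DEF456</td></tr>"
        + "<tr><td>Harry</td><td>IJK789</td><td></td><td></td></tr>"
        + "</table>";

    /** Parses namespace-aware XML and returns its root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects name/value pairs from rows laid out as name, value, name, value. */
    static Map<String, String> pairs(Element table) {
        Map<String, String> result = new LinkedHashMap<>();
        NodeList rows = table.getElementsByTagNameNS(XHTML_NS, "tr");
        for (int r = 0; r < rows.getLength(); r++) {
            // Gather the trimmed text of this row's direct td children, in order
            List<String> cells = new ArrayList<>();
            for (Node n = rows.item(r).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE
                        && XHTML_NS.equals(n.getNamespaceURI())
                        && "td".equals(n.getLocalName())) {
                    cells.add(n.getTextContent().trim());
                }
            }
            // Cells 0/1 are the first pair, 2/3 the second; an empty name means no pair
            for (int i = 0; i + 1 < cells.size(); i += 2) {
                if (!cells.get(i).isEmpty()) {
                    result.put(cells.get(i), cells.get(i + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs(parse(SAMPLE))); // {Tom=ABC123, Dick=DEF456, Harry=IJK789}
    }
}
```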
>
> This table is nested within other tables for layout, and the whole
> thing really is an antiquated system. Given the hours I have put into
> this (between other tasks), I think I could have written the features
> and learnt the finer points of Java in the same time (C++ is my first
> language).
>
> Currently I have a program that reads in a page using a combination
> of Nux, XOM, TagSoup and Saxon. Trying to implement the Yahoo stock
> quote scraping for IBM from
> http://www-128.ibm.com/developerworks/xml/library/j-jtp03225.html
> with the code below simply outputs <table /> instead of 81.40. I may
> have misinterpreted how to get the value out of the results, but I
> should have got slightly more than an empty table element. It is
> entirely possible, though, that TagSoup has nuked all possibility of
> extracting the expected value. That is something I need to look into.
>
> Anyway, thanks for all your continued help.
>
> Graham Reeds.
>
> source:
>
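> import java.io.IOException;
> import nu.xom.Builder;
> import nu.xom.Document;
> import nu.xom.Nodes;
> import nu.xom.ParsingException;
> import nux.xom.xquery.XQueryUtil;
> import org.xml.sax.SAXException;
> import org.xml.sax.XMLReader;
> import org.xml.sax.helpers.XMLReaderFactory;
>
> public class QuotePage
> {
>     public void getPage()
>     {
>         try
>         {
>             // TagSoup parses the (non-well-formed) HTML and feeds SAX events to XOM
>             XMLReader tagsoup = XMLReaderFactory.createXMLReader(
>                 "org.ccil.cowan.tagsoup.Parser");
>             Document doc = new Builder(tagsoup).build(
>                 "http://finance.yahoo.com/q?s=IBM");
>
>             // For each cell whose first text node contains "Last Trade",
>             // emit the values of the cells that follow it in the same row
>             String query = "<table>\n" +
>                 "{\n" +
>                 "  for $d in //td\n" +
>                 "  where contains($d/text()[1], \"Last Trade\")\n" +
>                 "  return <tr><td> { data($d/following-sibling::td) } </td></tr>\n" +
>                 "}\n" +
>                 "</table>";
>             Nodes results = XQueryUtil.xquery(doc, query);
>
>             for (int i = 0; i < results.size(); i++)
>             {
>                 System.out.println(results.get(i).toXML());
>                 // System.out.println(results.get(i));
>             }
>         }
>         catch (SAXException | ParsingException | IOException e)
>         {
>             e.printStackTrace(); // or handle as appropriate
>         }
>     }
> }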
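The <table /> result is the namespace problem described at the top: //td and following-sibling::td look for td elements in no namespace, so the for clause has nothing to iterate over. A minimal sketch of the same lookup with the prefix bound follows, again swapping in the JDK's XPath API for Nux so it runs standalone. The markup is a hypothetical fragment (the real Yahoo page may nest the value differently), and it takes only the first following cell (following-sibling::xhtml:td[1]) rather than all of them. In the Nux query itself, the analogous change is to start with declare namespace xhtml="http://www.w3.org/1999/xhtml"; and write xhtml:td in both places.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LastTrade {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical fragment shaped like a quote page after TagSoup
    static final String SAMPLE = "<html xmlns='" + XHTML_NS + "'><body><table>"
        + "<tr><td>Last Trade:</td><td>81.40</td></tr>"
        + "<tr><td>Trade Time:</td><td>4:00PM ET</td></tr>"
        + "</table></body></html>";

    /** Text of the cell after the first cell whose first text node contains "Last Trade". */
    static String lastTrade(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList("xhtml").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            // Both name tests carry the prefix; evaluate() returns the string value of
            // the first matching node in document order, or "" when nothing matches
            return xpath.evaluate(
                "//xhtml:td[contains(text()[1], 'Last Trade')]/following-sibling::xhtml:td[1]",
                doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lastTrade(SAMPLE)); // 81.40
    }
}
```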
>
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk