[xquery-talk] Finding a name and the resulting

Graham Reeds grahamr at ntlworld.com
Thu Aug 17 02:38:34 PDT 2006


I have an HTML file containing a table with information about 
projects and the progress of the people working on them.

The file itself is passed through TagSoup to make it well formed.  From 
this I would like to extract the information.  For this I am using 
XOM to build the document and NUX to provide the XQuery capability.

Now I am looking for a name followed by a number.  This is a sample with 
whitespace removed, but linebreaks left in:

<td align="left" colspan="1" rowspan="1" valign="TOP" width="200">
<b>Bob Stevens:
</b>
</td>
<td align="center" colspan="1" rowspan="1" valign="TOP" width="75">
<b/>
<p class="purple">
<b>

<b>SCMM9</b>
</b>
</p>
<b>
</b>
</td>

Other pages have varying numbers of columns but this is the simplest 
page with a single name, followed by a letter/number combo.  Some early 
documents are just a number.

What I need is to take each name from a pre-generated list, parse out 
that person's document, and feed the results into an array. So a list of 
Tom, Dick and Harry would mean Tom is [0] in the array, Dick is [1], etc.
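
The lookup described above could be sketched roughly as follows. This is 
only a sketch under assumptions, not the XOM/NUX setup: it swaps in the 
JDK's built-in DOM parser and XPath 1.0 engine so that it runs without 
extra libraries, and the class name DocumentLookup, the method lookup and 
the constant SAMPLE are made up for illustration. It also assumes the 
elements are in no namespace; TagSoup output normally puts them in the 
XHTML namespace, so a real query would likely need a namespace prefix.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XOM or NUX.
class DocumentLookup {

    // The sample cells from above (some attributes dropped), wrapped in a
    // table row so that they parse as a single well-formed document.
    static final String SAMPLE =
        "<table><tr>"
        + "<td align=\"left\" width=\"200\"><b>Bob Stevens:\n</b></td>"
        + "<td align=\"center\" width=\"75\"><b/><p class=\"purple\"><b>\n"
        + "<b>SCMM9</b></b></p><b>\n</b></td>"
        + "</tr></table>";

    // Returns one entry per name, in list order; null when a name is not found.
    static String[] lookup(String xml, List<String> names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String[] result = new String[names.size()];
            for (int i = 0; i < names.size(); i++) {
                // Assumption: the name cell holds exactly "Name:" once
                // whitespace is normalized, as in the sample.
                final String label = names.get(i) + ":";
                // Pass the label in as $name instead of pasting it into the
                // expression, so quotes in a name cannot break the query.
                xpath.setXPathVariableResolver(qname -> label);
                // Text of the first cell after the matching name cell,
                // with nested <b>/<p> markup and line breaks flattened.
                String code = xpath.evaluate(
                    "normalize-space(//td[normalize-space(.) = $name]"
                    + "/following-sibling::td[1])", doc);
                result[i] = code.isEmpty() ? null : code;
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("lookup failed", e);
        }
    }
}
```

Calling DocumentLookup.lookup(DocumentLookup.SAMPLE, List.of("Bob Stevens", 
"Tom")) would put the text of Bob Stevens' cell at [0] and null at [1], 
since Tom does not appear in the sample. A similar path expression should 
carry over to XQuery, though that is untested here.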

Most sites I have found that discuss XQuery show simple queries that 
turn one type of XML into another. One of the best is 
http://www-128.ibm.com/developerworks/xml/library/j-jtp03225.html.  I 
need some more examples similar to that, if possible.

Thanks, Graham Reeds.



