[xquery-talk] Doing some Pattern Frequency Distribution

Kusunam, Srinivas SKusunam at rlpt.com
Thu Jun 8 14:22:13 PDT 2006


Thanks for the reply, Andrew. This is just one of the things I am
trying to achieve using XQuery.

I have a requirement to get Count, Blank Count, Max Length, Frequency
Distribution and Pattern Frequency Distribution on some of the elements
in an XML file that can be up to 5GB in size. From my initial reading on
XSLT and XQuery, I felt XQuery was the best candidate for this. If I
follow your suggestion and use XSLT for "Pattern Frequency Distribution
(PFD)", I would need to change the whole solution from XQuery to XSLT.
I gave Phone number as a simple example; I might need to get PFD on
Address, ZIP and literally any element type.

What do you guys suggest? 

** Count, Blank Count, Max Length and Frequency Distribution seem to be
working fine on an XML file of 1.5GB using both DataDirect and
Saxon.

Thanks,
Srini

-----Original Message-----
From: andrew welch [mailto:andrew.j.welch at gmail.com] 
Sent: Wednesday, June 07, 2006 4:02 PM
To: Kusunam, Srinivas
Subject: Re: [xquery-talk] Doing some Pattern Frequency Distribution

On 6/7/06, Kusunam, Srinivas <SKusunam at rlpt.com> wrote:
> I have a requirement to find the pattern frequency distribution of
> some elements, for example Phone number.
>
> Here is the example
>
> <DOC>
>         <ELEMENT>
>                 <PHONE>123-456-7890 </PHONE>
>         </ELEMENT>
>         <ELEMENT>
>                 <PHONE>123-456-7899 </PHONE>
>         </ELEMENT>
>         <ELEMENT>
>                 <PHONE>123.456.7890 </PHONE>
>         </ELEMENT>
>         <ELEMENT>
>                 <PHONE>(123)456-7890 </PHONE>
>         </ELEMENT>
> </DOC>
>
> Output should be something like this:
>         Pattern: 999-999-9999   count:2
>         Pattern: 999.999.9999   count:1
>         Pattern: (999)999-9999  count:1
>
> Is it possible to achieve this using XQuery? If so, how do we do it?
> Any pointers or suggestions are welcome.

This can be achieved in XQuery, but the grouping facility of XSLT 2.0
makes it easier:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Emit plain text rather than XML -->
<xsl:output method="text"/>

<xsl:variable name="zeroToNine" select="'0123456789'"/>

<xsl:template match="/">
  <!-- Map every digit to 9, then group on the resulting pattern -->
  <xsl:for-each-group select="//PHONE"
      group-by="translate(., $zeroToNine, '9999999999')">
    <xsl:value-of select="concat('Pattern: ', current-grouping-key(),
                                 ' count: ', count(current-group()),
                                 '&#xa;')"/>
  </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>
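
For comparison, here is a minimal sketch of how the same grouping might
look in XQuery 1.0, which has no grouping construct of its own and so
relies on distinct-values(). The file name "phones.xml" is hypothetical
and stands in for the sample document above; this is an illustration
under that assumption, not a tuned solution.

```xquery
(: Sketch only: "phones.xml" is a hypothetical name for the sample DOC. :)
let $patterns :=
  for $p in doc("phones.xml")//PHONE
  (: Map every digit to 9, as in the stylesheet's group-by key :)
  return translate($p, "0123456789", "9999999999")
(: One result string per distinct pattern, with its occurrence count :)
for $pattern in distinct-values($patterns)
return concat("Pattern: ", $pattern,
              " count: ", count($patterns[. = $pattern]),
              "&#xa;")
```

The order in which distinct-values() returns the patterns is
implementation-dependent, so add an "order by" clause if a stable order
matters. Note also that this rescans the pattern sequence once per
distinct pattern, so on multi-gigabyte input the cost will depend
heavily on the processor used.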

cheers
andrew