From joewiz at gmail.com Mon Apr 2 12:00:45 2018
From: joewiz at gmail.com (Joe Wicentowski)
Date: Mon, 02 Apr 2018 19:00:45 +0000
Subject: [xquery-talk] Accessing empty namespaced elements in non-empty contexts

Hi all,

A common question for beginners is how to access empty namespaced elements in non-empty contexts. The easiest answer is to use a wildcard. For example, we use one here inside the curly braces:

  let $x :=
  return
    { $x/*:y }

As expected, this query returns:

The other method I know of is to use the URI-qualified name:

  let $x :=
  return
    { $x/Q{}y }

A third method occurred to me, but it does not work, and I haven't been able to pin down the reason despite reading the spec. The idea is to bind the empty namespace URI to a namespace prefix, e.g., "my":

  declare namespace my="";

  let $x :=
  return
    { $x/my:y }

In BaseX, eXist, and Saxon, this returns err:XPST0081 (https://www.w3.org/TR/xquery-31/#ERRXPST0081). The location of the error points to the step, "my:y" on the final line.

Can anyone enlighten me on the reason this 3rd method is invalid? Bonus points for an explanation that includes why I cannot bind a namespace prefix to the empty namespace, but I can access it via *: and Q{}?

Thanks,
Joe

From mike at saxonica.com Mon Apr 2 12:19:04 2018
From: mike at saxonica.com (Michael Kay)
Date: Mon, 2 Apr 2018 20:19:04 +0100
Subject: [xquery-talk] Accessing empty namespaced elements in non-empty contexts

XML doesn't allow you to bind a prefix to the thing-that-is-not-a-namespace.

With XML namespaces 1.1, the declaration xmlns:x="" doesn't bind x to anything, it unbinds x. Within the scope of this (un)declaration, the prefix x cannot be used.

It makes sense for XQuery to do the same thing.
The rules are in §3.9.1.2:

  If the namespace URI is a zero-length string and the implementation supports [XML Names 1.1], any existing namespace binding for the given prefix is removed from the in-scope namespaces of the constructed element and from the statically known namespaces of the constructor expression. If the namespace URI is a zero-length string and the implementation does not support [XML Names 1.1], a static error is raised [err:XQST0085]. It is implementation-defined whether an implementation supports [XML Names] or [XML Names 1.1].

Michael Kay
Saxonica

From joewiz at gmail.com Tue Apr 3 05:20:28 2018
From: joewiz at gmail.com (Joe Wicentowski)
Date: Tue, 03 Apr 2018 12:20:28 +0000
Subject: [xquery-talk] Accessing empty namespaced elements in non-empty contexts

Hi Michael,

Thank you for your reply. It's interesting that until XPath 3.0, we could *construct* empty-namespace elements inside non-empty namespace elements (via the xmlns="" declaration), but we couldn't *query* them without resorting to the somewhat indirect means of wildcards (e.g., `/*:elem`) or use of the local-name function (e.g., `/*[local-name() eq 'elem']`). Is that right? If so, this drives home the importance to me of the URI-qualified names feature (e.g., `/Q{}elem`) released with XPath 3.0.

I'd be interested to know more about the history of the development of namespaces and efforts to bridge the world of namespaces (namespace axis-land) with the world of documents without this axis (namespace flat-land). Is there any good published account of this?

Joe

From benito at benibela.de Wed Apr 4 09:14:27 2018
From: benito at benibela.de (Benito van der Zander)
Date: Wed, 4 Apr 2018 18:14:27 +0200
Subject: [xquery-talk] [ANN] Xidel 0.9.8 released
Message-ID: <735fcdb7-cc19-93a4-02c7-92d4714f76a2@benibela.de>

Hello,

Xidel is a small command line XQuery interpreter to run queries on downloaded X/HTML pages or JSON-APIs. It supports XPath 3.0, XQuery 3.0 + JSONiq expressions, compatibility modes for older XPath/XQuery versions as well as CSS 3 selectors and custom pattern matching.

The 0.9.8 version improves various things:

- Cookie handling follows RFC 6265 rather than sending all cookies to all servers.
- add t:siblings-header/siblings elements to pattern matcher to match certain element siblings regardless of their ordering (e.g. table columns).
- add functions x:call-action, x:has-action, x:get-log, x:clear-log to give programmatic access to multipage templates and variable changelog.
- add --module, --module-path parameters to load XQuery modules into (xpath) queries and properly resolve relative paths for module imports.
- fix system(), file:exists, file:move (override), file:path-to-uri on Windows
- further minor bug fixes and performance improvements

You can learn more on the homepage here: http://www.videlibri.de/xidel.html

Benito

From bdysonsmith at gmail.com Wed Apr 4 09:26:57 2018
From: bdysonsmith at gmail.com (Bridger Dyson-Smith)
Date: Wed, 4 Apr 2018 12:26:57 -0400
Subject: [xquery-talk] [ANN] Xidel 0.9.8 released
In-Reply-To: <735fcdb7-cc19-93a4-02c7-92d4714f76a2@benibela.de>
References: <735fcdb7-cc19-93a4-02c7-92d4714f76a2@benibela.de>

Hi Benito -

Congratulations on the new release! Xidel has been a great and helpful utility - thanks for the announcement.

Best,
Bridger

From joewiz at gmail.com Sat Apr 14 11:14:50 2018
From: joewiz at gmail.com (Joe Wicentowski)
Date: Sat, 14 Apr 2018 18:14:50 +0000
Subject: [xquery-talk] Adaptive serialization of an empty sequence

Hi all,

Many thanks, as always, for the very helpful feedback here.

I have noticed that Saxon, eXist, and BaseX all serialize the empty sequence `()` not as `()` but instead as the empty string ``. Sample code:

  serialize((), map { "method": "adaptive" })

I was expecting to see `()` because when serializing a map entry, the empty sequence is serialized as `()`:

  serialize(map { "test": () }, map { "method": "adaptive" })

This returns `map{"test":()}`.

Can anyone enlighten me on why the empty sequence is serialized as `()` in the latter context and the empty string in the former?

Thanks,
Joe

From mike at saxonica.com Sat Apr 14 11:58:40 2018
From: mike at saxonica.com (Michael Kay)
Date: Sat, 14 Apr 2018 19:58:40 +0100
Subject: [xquery-talk] Adaptive serialization of an empty sequence

As always with "why?" questions, it's difficult to know what kind of answer you want, between

(a) Where does the spec say this should happen?

(b) Why does the spec say this should happen?

and (b) breaks down to either

(b1) Why would this be a reasonable choice for the spec-writers to make?

(b2) As a matter of historical record, who proposed that it should be like this and what justification did they put forward?
Regarding (a), the spec says:

  Each item in the supplied sequence is serialized individually as follows, with an occurrence of the chosen item-separator between successive items.

I think it's a reasonable reading of that that adaptive(S) == string-join(S!adaptive(.), item-separator), which leads to the conclusion that the serialization of () is "".

Regarding (b2), my main recollection of relevant discussions concerns streamability: specifically, it should be possible to serialize each item independently without knowing what follows. But if someone had proposed serializing () as "()", I don't think that could really have been opposed on streamability grounds. But I don't think anyone proposed it.

Regarding (b1), the main clue about the WG's thinking is the sentence

  The intention of this is to allow any valid XDM instance to be serialized without raising a serialization error.

So you find that the adaptive method focuses on how to serialize cases that otherwise would fail. Serializing the empty sequence wouldn't otherwise fail, so I guess it didn't receive much attention. Whether a proposal to serialize () as "()" would have been accepted is anyone's guess.

Michael Kay
Saxonica

From joewiz at gmail.com Sat Apr 14 12:47:15 2018
From: joewiz at gmail.com (Joe Wicentowski)
Date: Sat, 14 Apr 2018 19:47:15 +0000
Subject: [xquery-talk] Adaptive serialization of an empty sequence

Thank you, Mike. That explanation is perfectly reasonable; this handling certainly meets the stated intention.

Joe

From joewiz at gmail.com Mon Apr 23 09:22:40 2018
From: joewiz at gmail.com (Joe Wicentowski)
Date: Mon, 23 Apr 2018 16:22:40 +0000
Subject: [xquery-talk] An analyze-string stumper

Hi all,

I have encountered an unexpected challenge constructing a regex for a pattern I am looking for. I am looking for numbers in parentheses. For example, in the following string:

  "On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171)"

... I would like to match "79" and "171" (but not "UAR" or "13" or "1968"). I have been trying to construct a regex for use with analyze-string to capture this pattern, but I have not been successful.
I have tried the following:

  analyze-string($string, "(?:\()(?:(\d+)(?:, )?)+(?:\))")

In other words, there are these 3 components:

  1. (?:\() a non-capturing group consisting of an open parens, followed by
  2. (?:(\d+)(?:, )?)+ one or more non-capturing groups consisting of (a number followed by an optional, non-matching comma-and-space), followed by
  3. (?:\)) a non-capturing group consisting of a close parens

I was expecting to get the following output:

  On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace.
  (79, 171)

However, the actual result is that the first number ("79") is skipped, and only the 2nd number ("171") is captured:

  (79, 171)

What am I missing? Can anyone suggest a regex that is able to capture both numbers inside the parentheses? Or do I need to make a two-pass run through this, finding parenthetical text with a first analyze-string like "\(.+\)" and then looking inside its matches with a second analyze-string like "(\d+)(?:, )?"?

Thanks,
Joe

From patrick at durusau.net Mon Apr 23 11:50:42 2018
From: patrick at durusau.net (Patrick Durusau)
Date: Mon, 23 Apr 2018 14:50:42 -0400
Subject: [xquery-talk] An analyze-string stumper
Message-ID: <920bceb3-f255-96d7-3e61-a79dfef431fd@durusau.net>

Joe,

Forgive the length but I'm likely to bump my head on this issue in the future, so a fuller than necessary explanation:

Started with the simplest regex that would capture the parens:

1. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d.*\)")

1. Result:

  On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic
  (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171)

OK, so what do we know about the desired matches? Digits plus (, ) with no spaces. Yes?

2. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d, \d+\)")

So I match parens plus digits, ", " (comma plus whitespace), digits plus paren.

2. Result:

  On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace.
  (79, 171)

I need to split the two numbers and what better to do that than alternative matching?

3. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d+ | \d+\)")

3. Result:

  On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79,
  171)

You're probably already laughing because you see my mistake, which I correct in #4:

4. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d+|\d+\)")

4. Result:

  On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace.
  (79
  ,
  171)

The error was here: "\(\d+ | \d+\)", which would only match (any-digit plus a white space, whereas the number in question was followed by *no space* and a comma.

Know thy data!

Examples created on BaseX. BTW, I started from known good examples in XQuery Functions 3.1, verified that they worked and then created the search strings.

Hope this helps!

Patrick

-- 
Patrick Durusau
patrick at durusau.net
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau

From joewiz at gmail.com Mon Apr 23 12:50:52 2018
From: joewiz at gmail.com (Joe Wicentowski)
Date: Mon, 23 Apr 2018 19:50:52 +0000
Subject: [xquery-talk] An analyze-string stumper
In-Reply-To: <920bceb3-f255-96d7-3e61-a79dfef431fd@durusau.net>
References: <920bceb3-f255-96d7-3e61-a79dfef431fd@durusau.net>

Hi Patrick,

Thanks for your reply! That 4th version is certainly promising, but I wonder, will it capture a case I have but regrettably didn't mention explicitly: more than 2 numbers? Here's an example:

  The most significant elements in the package were 18 F-104 fighters and 100 M 48 tanks. (72, 76, 77, 82, 89, 95, 99, 107, 111, 125)

Here, I've got more than 2 numbers inside the parentheses, so I can't count on a parens to begin or end a number. I was hoping to find a pattern that would wrap each of the numbers inside the parentheses in an element, without risking inadvertent hits on numbers outside the parentheses.
I'd take any solution or hint, but what really threw me about my attempts was that I wasn't able to use the open and close parentheses to anchor my search and allow arbitrary repeats of number-plus-optional-comma-and-space "(\d+(, )?)+" within a pair of parentheses. I couldn't see why this wasn't capturing each of the numbers within the parentheses.

Joe
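
One way to handle the multi-number case is the two-pass idea Joe floats in his first message. What follows is a minimal sketch under stated assumptions, not something posted to the list: the function name `local:mark-numbers` and the `num` and `result` element names are made up for illustration, and the patterns assume the lists always look like `(n, n, ...)`. A single pattern seems to fall short because, as the actual result reported in the thread shows, a capturing group inside a repeated group reports only its last iteration.

```xquery
xquery version "3.1";

(: Hypothetical helper, not from the thread: wraps every number that sits
   inside a parenthesized, comma-separated list of numbers in a <num>
   element and passes all other text through unchanged. :)
declare function local:mark-numbers($string as xs:string) as node()* {
  (: Pass 1: find whole lists such as "(79, 171)". "(UAR)" is skipped
     because the pattern allows only digits, commas and spaces. :)
  for $part in analyze-string($string, "\((?:\d+, )*\d+\)")/*
  return
    if ($part/self::fn:match) then (
      (: Pass 2: inside one list, every run of digits is a number. :)
      for $piece in analyze-string(string($part), "\d+")/*
      return
        if ($piece/self::fn:match)
        then <num>{ string($piece) }</num>
        else text { string($piece) }
    )
    else
      text { string($part) }
};

(: Shortened sample input; the element name "result" is an assumption. :)
<result>{
  local:mark-numbers(
    "a letter to United Arab Republic (UAR) President Nasser in 1968. (79, 171)"
  )
}</result>
```

Because the second pass looks only at text the first pass has already accepted, numbers outside the parentheses (such as 1968 in the sample) are left alone, and the length of the list no longer matters, so the ten-number example from Joe's follow-up should be handled the same way.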