
UserManual.XMLSpecificationDiscussion History


June 02, 2009, at 02:53 PM by 72.200.112.238 -

Deleted lines 0-86:

!!!! Parsing Temporal Constants (Related to Properties)

One way to parse the input is to tokenize it on whitespace (according to the value of the "whitespace" attribute in <format>) and on special characters
(only the ones defined in <format>). These tokens are then tested against field value tables (FVTs), and the test semantics are based on the maximally successful test.
That is, if a token is checked against an FVT successfully, the concatenation of this token and the subsequent token is checked against the same FVT, and so on until a test fails.
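
The token-based approach above can be sketched roughly as follows. This is only an illustration: the class `MaximalTokenMatch`, its methods, the representation of an FVT as a plain set of strings, and the choice to rejoin tokens with a single space are all assumptions, not part of the actual design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of "maximally successful test" matching against an FVT. */
class MaximalTokenMatch {

    /** Splits the input on whitespace and on the given special characters. */
    static List<String> tokenize(String input, String specialChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            boolean separator = Character.isWhitespace(c) || specialChars.indexOf(c) >= 0;
            if (!separator) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /**
     * Starting at tokens.get(start), returns how many consecutive tokens form the
     * longest run accepted by the table, or 0 if the first token alone fails.
     * Tokens are rejoined with one space (an assumption of this sketch).
     */
    static int longestMatch(List<String> tokens, int start, Set<String> fvt) {
        int matched = 0;
        StringBuilder candidate = new StringBuilder();
        for (int i = start; i < tokens.size(); i++) {
            if (i > start) {
                candidate.append(' ');
            }
            candidate.append(tokens.get(i));
            if (!fvt.contains(candidate.toString())) {
                break; // stop at the first failing test
            }
            matched = i - start + 1;
        }
        return matched;
    }
}
```

With a table containing "first" and "first quarter", the tokens "first", "quarter", "2000" would yield a run of two tokens, since "first quarter 2000" is the first concatenation that fails.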

A more elegant way to achieve the same thing is to use regular expressions on the input. A regular expression can be determined at two levels:

With this approach, each field value represented in a property value has a regular expression, which comes either from the field value table referenced by the same field value element or, as a default, from the calendric system specification file.

So, when a property's value is used, the related field value table is checked first for a regular expression for the field value (which also
corresponds to a variable name in the format element). If one exists, this regex is used to parse the temporal input (based on the maximally successful test
semantics). If no regex is specified in the field value table, the calendric system specification file is consulted for a default one.
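
The two-level lookup described above might look like the following sketch. The class `FieldRegexResolver` and its methods are hypothetical, and a simple in-memory map stands in for the field value tables and the calendric system specification file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical two-level regex lookup: field value table first, then a default. */
class FieldRegexResolver {
    // Field value table name -> regex declared in that table (if any).
    private final Map<String, String> tableRegexes = new HashMap<>();
    // Stand-in for the default taken from the calendric system specification file.
    private final String calendricDefault;

    FieldRegexResolver(String calendricDefault) {
        this.calendricDefault = calendricDefault;
    }

    void registerTableRegex(String tableName, String regex) {
        tableRegexes.put(tableName, regex);
    }

    /** Returns the table's own regex if it has one, otherwise the default. */
    Pattern resolve(String tableName) {
        String regex = tableRegexes.get(tableName);
        if (regex == null) {
            regex = calendricDefault;
        }
        return Pattern.compile(regex);
    }
}
```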

When UCS is asked to parse a temporal constant into a timestamp with a given property, the parsing is character based, as opposed to using any kind of XML document parsing (DOM or SAX).

Character based parsing:

Given a property and a temporal constant, we first determine the regexes for the variables in the format element (as described above) and substitute them for the variables. We then form regexes for the remaining parts of the format element, taking into account the value of the "whitespace" attribute and the special characters in the format, again replacing each format part with its regex. Finally, we apply the resulting regexes to the temporal input and, if they match, assign the matched values to the variables.
For example, given the following input and the corresponding property value:

* September, 2000

<value>
<format whitespace = "no">$month, $year</format>
<fieldValue variable="month" name="month of year" using="englishMonthNames"/>
<fieldValue variable="year" name="year" using="arabic numeral"/>
</value>

We first try to get the regexes for the variables from the field value tables referenced by the "using" attributes of the fieldValue elements. Assuming none are found, we use the default one (say [^,\s]+). This determines two of our regexes.

We then go through the content of the format element. Since the value of the "whitespace" attribute is "no" and there is a special character ",", we form our last regex, which goes between the two regexes formed earlier.

regexForMonth = [^,\s]+
regexForDelimiters = , (there are 4 whitespaces after ,)
regexForYear = [^,\s]+

When we apply these regexes, in order, to the temporal input, we get the values of the variables. (The same holds for XML-like inputs.)
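
Applying the regexes one after another can be sketched with java.util.regex as follows. The class `SequentialRegexParse` is hypothetical, and the delimiter regex used in the usage note is simply the literal ", " from the format; how the "whitespace" attribute would alter it is not modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: apply a sequence of regexes to the input, in order. */
class SequentialRegexParse {

    /**
     * names[i] is the variable filled by regexes[i], or null for a delimiter.
     * Each regex must match exactly where the previous one stopped.
     * Returns the variable values, or null if the input does not fit.
     */
    static Map<String, String> parse(String input, String[] names, String[] regexes) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = 0;
        for (int i = 0; i < regexes.length; i++) {
            Matcher m = Pattern.compile(regexes[i]).matcher(input);
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                return null; // this part of the format is not present at pos
            }
            if (names[i] != null) {
                values.put(names[i], m.group());
            }
            pos = m.end();
        }
        // Reject trailing characters that no regex consumed.
        return pos == input.length() ? values : null;
    }
}
```

Calling `parse("September, 2000", new String[] {"month", null, "year"}, new String[] {"[^,\\s]+", ", ", "[^,\\s]+"})` assigns "September" to month and "2000" to year.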

Another nice property is that, depending on which operation costs more (checking a token against the values in an FVT, or checking a regex against the input), we can first concatenate all the generated regexes and apply them to the input in a single pass, getting a result up front.
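
The concatenation idea can be sketched by turning a whole format string into one pattern with a capturing group per variable. The class `FormatRegex` is hypothetical; the sketch assumes variables are written as `$name`, that literal text in the format is matched verbatim, and that the per-field regexes contain no capturing groups of their own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one concatenated regex built from a format string. */
class FormatRegex {
    private static final Pattern VARIABLE = Pattern.compile("\\$(\\w+)");

    final Pattern pattern;
    final List<String> variables = new ArrayList<>();

    FormatRegex(String format, Map<String, String> fieldRegexes, String defaultRegex) {
        StringBuilder regex = new StringBuilder();
        Matcher m = VARIABLE.matcher(format);
        int last = 0;
        while (m.find()) {
            // Literal text between variables is quoted so it matches verbatim.
            regex.append(Pattern.quote(format.substring(last, m.start())));
            String name = m.group(1);
            String fieldRegex = fieldRegexes.containsKey(name)
                    ? fieldRegexes.get(name) : defaultRegex;
            regex.append('(').append(fieldRegex).append(')');
            variables.add(name);
            last = m.end();
        }
        regex.append(Pattern.quote(format.substring(last)));
        pattern = Pattern.compile(regex.toString());
    }

    /** Returns variable -> matched text, or null if the whole input does not match. */
    Map<String, String> parse(String input) {
        Matcher m = pattern.matcher(input);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < variables.size(); i++) {
            values.put(variables.get(i), m.group(i + 1));
        }
        return values;
    }
}
```

A `FormatRegex` built from "$month, $year" with the default [^,\s]+ parses "September, 2000" in a single `matches()` call, and the compiled pattern can be reused for every input that uses the same property.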

For now this style of parsing is sufficient, but it has some inherent limitations. In particular, attribute order matters. When we form a regex from an XML format, we might form it in such a way that attribute order is not significant; this is viable and not very complex, but exactly how to do it is not yet clear.

On the other hand, parsing each XML temporal constant into a DOM object would be rather inefficient, with a cost that grows with the number of temporal inputs.

Note: Java 2 Platform, Standard Edition, version 1.4 contains a new package named java.util.regex. It seems easier to use and better than the StringTokenizer and BreakIterator classes.

In fact, the example above and some others were run with that package in a Windows environment. The package has very nice features, such as greedy, reluctant and possessive quantifiers; however, we should not increase the complexity
of forming regexes for field values in field value tables.
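
For readers unfamiliar with the three quantifier styles in java.util.regex, the small sketch below shows how they differ. The helper class `QuantifierStyles` and the sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for comparing greedy, reluctant and possessive quantifiers. */
class QuantifierStyles {

    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

On the input "1 Jan 2000, 2 Feb 2001," the greedy regex `.*,` matches the whole string (it backtracks to the last comma), the reluctant `.*?,` matches only "1 Jan 2000,", and the possessive `.*+,` finds no match at all, because `.*+` consumes the final comma and never gives it back.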

Please see a [[Developers.DesignParsing | detailed explanation ]] of how parsing is done.

[[#AnotherDiscussionOnProperties]]
!!!! Another Discussion on Properties

Efficiency of push/pop operations: pushing a whole <value> element is very inefficient, which is a concern when only a <format> needs to be pushed. The same holds for <fieldValue> elements.
We note that <format> and <fieldValue> elements are somewhat orthogonal to each other, except that their variable names should
match. Variable names can be anything (meaningful names), but usually they are formed from granularity names and other notions related to granularity processing (e.g. distribution, lower, beginning). This means we might define a set of generic variable names and, additionally, provide a supplementary variable set to satisfy other needs. A predefined set could also help a user who wants to create a property (format, fieldValue, or both) via a propertyGenerator wizard (with a GUI), which offers these variable names and lets the user choose among them and build his/her format in a nice and generic way.

Although format and fieldValue elements are orthogonal, it would be too low level to push them separately. However, pushing by URL is easy.
Another idea (related to the above):
Pushing a single new format to an existing property overrides the previous ones. If the new format fails, however, the others are checked too. (This design differs from the property management of previous projects.)

pushFormat("nameOfProperty", "<format>..</format>"), here content of <format> consists of generic variable names.
pushFieldValue("nameOfProperty", "<fieldValue.../>"),
pushProperty("nameOfProperty", "<value>...</value>")

A check may be performed along with the push operation to verify the generic variable names; alternatively, if an error eventually comes up because of a variable name mismatch, an appropriate error is returned.
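
A push-time check of variable names could be sketched as follows. The class `VariableNameCheck` and its methods are hypothetical. Consistent with the character-based approach above, it extracts names with regexes rather than an XML parser, assuming variables appear as `$name` in the format and as `variable="name"` attributes in fieldValue elements.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check that every format variable has a matching fieldValue. */
class VariableNameCheck {
    private static final Pattern FORMAT_VAR = Pattern.compile("\\$(\\w+)");
    private static final Pattern FIELD_VAR =
            Pattern.compile("\\bvariable\\s*=\\s*\"([^\"]*)\"");

    /** Collects the first capturing group of every match, in order of appearance. */
    private static Set<String> collect(Pattern p, String text) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /**
     * Returns the format variables that no fieldValue element declares.
     * An empty result means the pushed format is consistent.
     */
    static Set<String> unmatched(String formatContent, String fieldValuesXml) {
        Set<String> missing = collect(FORMAT_VAR, formatContent);
        missing.removeAll(collect(FIELD_VAR, fieldValuesXml));
        return missing;
    }
}
```

A pushFormat implementation could call `unmatched` and report an error when the result is not empty, or defer the check until the property is first used.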

In addition to the operations above, we have pushProperty(url) as the main operation. Although the operations above seem low level, they would be handy to have.

[[#DescriptionElements]]
!!!! Description Elements in Specification Files

We may add an optional description element to each of the specification files discussed.

<description version = "..." >
<author firstName = "..." middleName = "..." lastName = "..." />
<company name = "" />
<pgpKey>...</pgpKey>
<previousVersion url = "..." />
<licence>...</licence>
</description>


April 14, 2009, at 08:12 PM by Chris Goo -

Added line 55:

Added line 59:


April 14, 2009, at 08:11 PM by Chris Goo -

Deleted lines 54-55:

~~!!!! Another Discussion on Properties~~

Changed lines 56-57 from:

to:

!!!! Another Discussion on Properties


April 14, 2009, at 08:11 PM by Chris Goo -

Changed line 53 from:

Please see a ~~detailed explanation~~ of how parsing is done.

to:

Please see a [[Developers.DesignParsing | detailed explanation ]] of how parsing is done.

Changed line 55 from:

~~[#anotherDiscussionOnProperties]~~

to:

Changed lines 57-58 from:

to:

[[#AnotherDiscussionOnProperties]]


April 14, 2009, at 08:07 PM by Chris Goo -

Changed line 25 from:

<value>

to:

<value>

Changed lines 27-28 from:

to:

Changed lines 30-31 from:

</value>

to:

</value>

Deleted line 79:

Deleted line 81:

Deleted line 82:

Deleted line 83:


April 14, 2009, at 08:03 PM by Chris Goo -
