<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DaveOnCode &#187; regex</title>
	<atom:link href="http://www.daveoncode.com/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.daveoncode.com</link>
	<description>Objective C, iOS and more programming stuff</description>
	<lastBuildDate>Mon, 19 Dec 2011 12:11:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Reading querystring variables using SSI and regular expressions</title>
		<link>http://www.daveoncode.com/2009/10/29/reading-querystring-variables-using-server-side-includes-and-regular-expressions/</link>
		<comments>http://www.daveoncode.com/2009/10/29/reading-querystring-variables-using-server-side-includes-and-regular-expressions/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 12:09:23 +0000</pubDate>
		<dc:creator>Davide Zanotti</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[include]]></category>
		<category><![CDATA[querystring]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[ssi]]></category>
		<category><![CDATA[url]]></category>

		<guid isPermaLink="false">http://www.daveoncode.com/?p=501</guid>
		<description><![CDATA[I recently worked on a project in which I had to rely on SSI (Apache&#8217;s Server Side Includes) to read URL parameters and use them to dynamically include certain HTML files with the &#8220;include virtual&#8221; directive. Unfortunately the documentation available online is not exhaustive, and I had to figure out some things by [...]]]></description>
			<content:encoded><![CDATA[<p>I recently worked on a project in which I had to rely on SSI (Apache&#8217;s Server Side Includes) to read URL parameters and use them to dynamically include certain HTML files with the &#8220;include virtual&#8221; directive. Unfortunately the documentation available online is not exhaustive, and I had to figure out some things by myself.<br />
According to the docs, SSI exposes several global variables. Two of them, <strong>DOCUMENT_URI</strong> and <strong>QUERY_STRING</strong>, are the ones we can use to handle the page URL. The first returns the (%-decoded) URL path of the document; the second returns the query string, that is, everything after the &#8220;?&#8221;.<br />
So how can we extract the variables we want from these strings, given that SSI doesn&#8217;t offer methods such as &#8220;substring&#8221;, &#8220;split&#8221;, &#8220;indexOf&#8221; and similar? The answer is: by using regular expressions in a slightly tricky way!<br />
SSI offers basic flow control (<strong>if</strong>, <strong>elif</strong>, <strong>else</strong>, <strong>endif</strong>). The <strong>if</strong> command has an <strong>expr</strong> attribute holding the expression to be evaluated, and that expression can test a string against a regex. Knowing this, we can declare an SSI variable that holds the desired querystring parameter in the following way:</p>
<div class="codecolorer-container xml mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="xml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #808080; font-style: italic;">&lt;!--#if expr=&quot;$QUERY_STRING = /year=([0-9]{4})/&quot; --&gt;</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">&lt;!--ssi-comment: year found --&gt;</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">&lt;!--#set var=&quot;year&quot; value=&quot;$1&quot; --&gt;</span><br />
<span style="color: #808080; font-style: italic;">&lt;!--#else --&gt;</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">&lt;!--ssi-comment: year NOT found --&gt;</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">&lt;!--#set var=&quot;year&quot; value=&quot;$DATE_LOCAL&quot; --&gt;</span> <br />
<span style="color: #808080; font-style: italic;">&lt;!--#endif --&gt;</span></div></td></tr></tbody></table></div>
<p>The code above looks for a querystring parameter called <strong>year</strong> whose value is a 4-digit ({4}) number ([0-9]).<br />
If the pattern matches, the text captured by the parentheses is available as <strong>$1</strong> and is assigned to the SSI variable <strong>year</strong>; otherwise the current server year ($DATE_LOCAL) is assigned.</p>
<p>Notes:<br />
1. &#8220;ssi-comment:&#8221; is not special syntax, just a comment style I adopted for readability.<br />
2. To get only the year from the $DATE_LOCAL variable, you must first configure the time format with <strong>&lt;!--#config timefmt=&quot;%Y&quot; --&gt;</strong>.</p>
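<p>For readers more familiar with JavaScript, the same decision can be sketched outside of SSI. This is only an illustration of the regex logic, not something SSI can run; the function name <strong>readYear</strong> and its arguments are assumptions made for the example:</p>

```javascript
// Plain JavaScript mirror of the SSI block (illustration only).
// readYear is a hypothetical helper name, not part of any library.
function readYear(queryString, now) {
  // Same pattern as the SSI expr: "year=" followed by four digits,
  // with the digits captured as group 1.
  var match = /year=([0-9]{4})/.exec(queryString);
  if (match) {
    return match[1]; // plays the role of $1 in the SSI block
  }
  // Fallback: the current year, like $DATE_LOCAL with timefmt="%Y".
  return String(now.getFullYear());
}

console.log(readYear("page=2&year=2009", new Date()));
console.log(readYear("page=2", new Date(2011, 11, 19)));
```

<p>The pattern is not anchored, so a parameter such as &#8220;birthyear=1980&#8221; would also match; if that matters in your project, a stricter pattern may be preferable.</p>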
]]></content:encoded>
			<wfw:commentRss>http://www.daveoncode.com/2009/10/29/reading-querystring-variables-using-server-side-includes-and-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Find outermost top level XML/HTML tags with regular expressions</title>
		<link>http://www.daveoncode.com/2009/09/10/top-level-html-xml-tags-regex-regular-expressions/</link>
		<comments>http://www.daveoncode.com/2009/09/10/top-level-html-xml-tags-regex-regular-expressions/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 14:58:31 +0000</pubDate>
		<dc:creator>Davide Zanotti</dc:creator>
				<category><![CDATA[javascript]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.daveoncode.com/?p=484</guid>
		<description><![CDATA[I&#8217;m working on a big personal project (which I&#8217;m going to release soon) in which I need to parse strings containing XHTML tags in order to extract the top-level occurrences of a given tag name, e.g. from: &#60;onetag id=&#34;t1&#34;&#62; &#160; &#160; &#60;onetag id=&#34;t1-1&#34;&#62;&#60;/onetag&#62; &#160; &#160; &#60;onetag id=&#34;t1-2&#34;&#62;&#60;/onetag&#62; &#60;/onetag&#62; &#60;onetag id=&#34;t2&#34;&#62;&#60;/onetag&#62; &#60;onetag id=&#34;t3&#34;&#62;&#60;/onetag&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m working on a big personal project (which I&#8217;m going to release soon) in which I need to parse strings containing XHTML tags in order to extract the top-level occurrences of a given tag name, e.g. from:</p>
<div class="codecolorer-container html4strict mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="html4strict codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t1&quot;</span>&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t1-1&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t1-2&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t2&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t3&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t4&quot;</span>&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t4-1&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span></div></td></tr></tbody></table></div>
<p>I have to get 4 tags (t1, t2, t3, t4, with t1 and t4 containing their child nodes).<br />
My regex knowledge is unfortunately very basic, so I googled for a ready-to-use regex, but none satisfied my needs&#8230; none of the examples I found handled nested tags properly. So, after some hours of testing, I wrote my own regex (my first real one). The result is the following:</p>
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #003366; font-weight: bold;">var</span> pattern <span style="color: #339933;">=</span> <span style="color: #009966; font-style: italic;">/&lt;(onetag)[^&lt;&gt;]*&gt;(&lt;\1[^&lt;&gt;]*&gt;&lt;\/\1&gt;)*&lt;\/\1&gt;/gi</span><span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<p>In my case I&#8217;m using the pattern in JavaScript, but I think it can be used in any language whose regex engine supports backreferences (\1), because it doesn&#8217;t rely on advanced features like &#8220;atomic grouping&#8221; and similar &#8220;black magic&#8221;. To match a different tag, replace &#8220;onetag&#8221; with the tag you are looking for (even a tag with a namespace like &#8220;&lt;foo:mytag&gt;&#8221;).</p>
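<p>If the tag name changes often, the substitution can be wrapped in a small function. The sketch below is only one way to do it; <strong>buildTopLevelPattern</strong> is a hypothetical helper name, and the sample input is made up for the example:</p>

```javascript
// buildTopLevelPattern is a hypothetical helper: it only substitutes the
// tag name into the pattern shown above.
function buildTopLevelPattern(tagName) {
  // Escape regex metacharacters so the tag name is matched literally.
  var escaped = tagName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(
    "<(" + escaped + ")[^<>]*>(<\\1[^<>]*><\\/\\1>)*<\\/\\1>",
    "gi"
  );
}

// Made-up single-line input using a namespaced tag.
var input = '<foo:mytag id="a"><foo:mytag id="a-1"></foo:mytag></foo:mytag>' +
            '<foo:mytag id="b"></foo:mytag>';
var found = input.match(buildTopLevelPattern("foo:mytag"));
console.log(found);
```

<p>Like the original pattern, this only handles one level of empty children on a single-line string.</p>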
<p>EDIT:</p>
<p>The pattern above works only when applied to a single-line string (e.g. var myString = &quot;&lt;onetag id=&#39;t1&#39;&gt;&#8230;&quot;). If you use it on a &#8220;complex string&#8221; (one containing spaces and newlines between tags) it won&#8217;t work properly. Fortunately you can remove the &#8220;bad characters&#8221; beforehand with a simple replace:</p>
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #003366; font-weight: bold;">var</span> parsedString <span style="color: #339933;">=</span> originalString.<span style="color: #660066;">replace</span><span style="color: #009900;">&#40;</span><span style="color: #009966; font-style: italic;">/\s(?!\w)/gi</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">&quot;&quot;</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">match</span><span style="color: #009900;">&#40;</span>pattern<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<p><strong>\s(?!\w)</strong> matches any whitespace character (space, tab or newline) that is not followed by a word character (a letter, digit or underscore), so the space between a tag name and its attributes is not removed.</p>
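<p>To see what the cleanup step does, here is a small sketch on a made-up multi-line string (the sample markup and the variable names <strong>compact</strong> and <strong>matches</strong> are assumptions for the example):</p>

```javascript
// Made-up multi-line input with indentation between the tags.
var originalString = [
  '<onetag id="t1">',
  '    <onetag id="t1-1"></onetag>',
  '</onetag>',
  '<onetag id="t2"></onetag>'
].join("\n");

// Pattern from the post: one level of empty children of the same tag.
var pattern = /<(onetag)[^<>]*>(<\1[^<>]*><\/\1>)*<\/\1>/gi;

// Remove every whitespace character that is not followed by a word character.
// The space before each "id" survives because "i" is a word character.
var compact = originalString.replace(/\s(?!\w)/g, "");
console.log(compact);

// The compacted string is now a single line, so the pattern can be applied.
var matches = compact.match(pattern);
console.log(matches.length);
```

<p>Whitespace that precedes a letter or digit is kept, so a newline followed directly by text content would still be there after the replace.</p>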
<p>EDIT 2:</p>
<p>The pattern <strong>/&lt;(onetag)[^&lt;&gt;]*&gt;(&lt;\1[^&lt;&gt;]*&gt;&lt;\/\1&gt;)*&lt;\/\1&gt;/gi</strong> won&#8217;t work properly when different types of tags are nested, e.g.:</p>
<div class="codecolorer-container html4strict mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="html4strict codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t1&quot;</span>&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;anothertag&gt;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t1-1&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t1-2&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>anothertag&gt;</span><br />
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t2&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t3&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
<span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t4&quot;</span>&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;anothertag&gt;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&lt;onetag <span style="color: #000066;">id</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;t4-1&quot;</span>&gt;&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span><br />
&nbsp; &nbsp; <span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>anothertag&gt;</span><br />
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span>onetag&gt;</span></div></td></tr></tbody></table></div>
<p>The updated pattern is the following:</p>
<div class="codecolorer-container javascript mac-classic" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="javascript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #003366; font-weight: bold;">var</span> newP <span style="color: #339933;">=</span> <span style="color: #009966; font-style: italic;">/&lt;(onetag)[^&lt;&gt;]*&gt;.*?(&lt;\1[^&lt;&gt;]*&gt;.*?&lt;\/\1&gt;)*.*?&lt;\/\1&gt;/gi</span><span style="color: #339933;">;</span></div></td></tr></tbody></table></div>
<p>I hope this will work without further modifications :P</p>
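<p>Lazy quantifiers do not keep track of nesting depth, so a pattern built on <strong>.*?</strong> can stop at the first inner closing tag when the children are wrapped in another tag. As a hedged alternative (not the approach used in this post), the sketch below walks the opening and closing tags in order and counts depth. <strong>findTopLevel</strong> is a hypothetical helper name; it assumes well-formed markup and ignores self-closing tags, comments and CDATA sections:</p>

```javascript
// findTopLevel is a hypothetical helper, not the pattern from the post:
// it scans opening/closing tags of one name and tracks nesting depth.
function findTopLevel(source, tagName) {
  // Escape regex metacharacters so the tag name is matched literally.
  var escaped = tagName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Matches an opening or a closing tag of the requested name.
  // Group 1 is "/" for a closing tag and "" for an opening tag.
  var tagPattern = new RegExp("<(\\/?)" + escaped + "(?=[\\s>])[^<>]*>", "gi");
  var results = [];
  var depth = 0;
  var start = -1;
  var m;
  while ((m = tagPattern.exec(source)) !== null) {
    if (m[1] === "") {
      // Opening tag: at depth 0 a new top-level element begins here.
      if (depth === 0) {
        start = m.index;
      }
      depth += 1;
    } else if (depth > 0) {
      // Closing tag: back at depth 0 means the top-level element is complete.
      depth -= 1;
      if (depth === 0) {
        results.push(source.slice(start, tagPattern.lastIndex));
      }
    }
  }
  return results;
}

// The mixed-nesting example from EDIT 2, left as a multi-line string.
var sample = [
  '<onetag id="t1">',
  '    <anothertag>',
  '        <onetag id="t1-1"></onetag>',
  '        <onetag id="t1-2"></onetag>',
  '    </anothertag>',
  '</onetag>',
  '<onetag id="t2"></onetag>',
  '<onetag id="t3"></onetag>',
  '<onetag id="t4">',
  '    <anothertag>',
  '        <onetag id="t4-1"></onetag>',
  '    </anothertag>',
  '</onetag>'
].join("\n");

var top = findTopLevel(sample, "onetag");
console.log(top.length);
```

<p>Because the scanner does not rely on &#8220;.&#8221;, the whitespace cleanup step is not needed here, and any depth of nesting is handled the same way.</p>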
]]></content:encoded>
			<wfw:commentRss>http://www.daveoncode.com/2009/09/10/top-level-html-xml-tags-regex-regular-expressions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

