Is there a RegEx Length Limit?

Topics: User Forum
Jun 25, 2010 at 8:10 PM
Edited Jun 28, 2010 at 3:31 PM
I seem to be having a problem with a RegEx provided to me by users of Apache's mod_rewrite...
I formatted it for IIRF to the best of my knowledge, but the string is so long that it wraps in Notepad.
This seems to be causing an error in IIRF.
In the log file I get the following:

Fri Jun 25 14:45:19 - 10008 - ReadSiteConfig: line 8: RewriteCond %{HTTP_USER_AGENT} ^(1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a\020wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r\020|s\020)|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1\020u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp(\020i|ip)|hs\-c|ht(c(\-|\020|_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac(\020|\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt(\020|\/)|klon|kpt\020|kwc\-|kyo(c|k)|le(no|xi)|lg(\020g|\/(k|l|u)|50|54|e\-|e\/|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(di|rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-|\020|o|v)|zz)|mt(50|p1|v\020)|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|w

Fri Jun 25 14:45:19 - 10008 - ReadSiteConfig: ERROR: compilation of RewriteCond expression '^(1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a\020wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r\020|s\020)|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1\020u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp(\020i|ip)|hs\-c|ht(c(\-|\020|_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac(\020|\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt(\020|\/)|klon|kpt\020|kwc\-|kyo(c|k)|le(no|xi)|lg(\020g|\/(k|l|u)|50|54|e\-|e\/|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(di|rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-|\020|o|v)|zz)|mt(50|p1|v\020)|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|w' failed at offset 992: missing )

Fri Jun 25 14:45:19 - 10008 - ReadSiteConfig: WARNING: line 9: unrecognized directive, ignoring it: 'g|wt)|nok(6|i)|nzph|o2im|op(ti|wv)|oran|owg1|p800|pan(a|d|t)|pdxg|pg(13|\-([1-8]|c))|phil|pire|pl(ay|uc)|pn\-2|po(ck|rt|se)|prox|psio|pt\-g|qa\-a|qc(07|12|21|32|60|\-[2-7]|i\-)|qtek|r380|r600|raks|rim9|ro(ve|zo)|s55\/|sa(ge|ma|mm|ms|ny|va)|sc(01|h\-|oo|p\-)|sdk\/|se(c(\-|0|1)|47|mc|nd|ri)|sgh\-|shar|sie(\-|m)|sk\-0|sl(45|id)|sm(al|ar|b3|it|t5)|so(ft|ny)|sp(01|h\-|v\-|v\020)|sy(01|mb)|t2(18|50)|t6(00|10|18)|ta(gt|lk)|tcl\-|tdg\-|tel(i|m)|tim\-|t\-mo|to(pl|sh)|ts(70|m\-|m3|m5)|tx\-9|up(\.b|g1|si)|utst|v400|v750|veri|vi(rg|te)|vk(40|5[0-3]|\-v)|vm40|voda|vulc|vx(52|53|60|61|70|80|81|83|85|98)|w3c(\-|\020)|webc|whit|wi(g\020|nc|nw)|wmlb|wonu|x700|xda(\-|2|g)|yas\-|your|zeto|zte\-)'

This is what line 8 is in my iirf.ini file:

RewriteCond %{HTTP_USER_AGENT} ^(1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a\020wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r\020|s\020)|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1\020u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp(\020i|ip)|hs\-c|ht(c(\-|\020|_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac(\020|\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt(\020|\/)|klon|kpt\020|kwc\-|kyo(c|k)|le(no|xi)|lg(\020g|\/(k|l|u)|50|54|e\-|e\/|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(di|rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-|\020|o|v)|zz)|mt(50|p1|v\020)|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|wg|wt)|nok(6|i)|nzph|o2im|op(ti|wv)|oran|owg1|p800|pan(a|d|t)|pdxg|pg(13|\-([1-8]|c))|phil|pire|pl(ay|uc)|pn\-2|po(ck|rt|se)|prox|psio|pt\-g|qa\-a|qc(07|12|21|32|60|\-[2-7]|i\-)|qtek|r380|r600|raks|rim9|ro(ve|zo)|s55\/|sa(ge|ma|mm|ms|ny|va)|sc(01|h\-|oo|p\-)|sdk\/|se(c(\-|0|1)|47|mc|nd|ri)|sgh\-|shar|sie(\-|m)|sk\-0|sl(45|id)|sm(al|ar|b3|it|t5)|so(ft|ny)|sp(01|h\-|v\-|v\020)|sy(01|mb)|t2(18|50)|t6(00|10|18)|ta(gt|lk)|tcl\-|tdg\-|tel(i|m)|tim\-|t\-mo|to(pl|sh)|ts(70|m\-|m3|m5)|tx\-9|up(\.b|g1|si)|utst|v400|v750|veri|vi(rg|te)|vk(40|5[0-3]|\-v)|vm40|voda|vulc|vx(52|53|60|61|70|80|81|83|85|98)|w3c(\-|\020)|webc|whit|wi(g\020|nc|nw)|wmlb|wonu|x700|xda(\-|2|g)|yas\-|your|zeto|zte\-) [I]

So, is there a limit on the length of the RegEx or is there some way around this?
Could I just manually put a line break in there, and would IIRF pick it up correctly?

Thanks!

~Mike
Jun 28, 2010 at 4:51 PM
Edited Jun 28, 2010 at 4:52 PM
So it seems that my RewriteCond line cannot be longer than 1024 total characters?
Currently that line is 1713 total characters.

Is there a reason for this?
Is there a way to break this RegEx up so that I can correctly process it as was given to me?
Is there any possible way around this?

I tried the line break and that still threw errors...
Anyone have any ideas?

Thanks!

~Mike
Coordinator
Jun 28, 2010 at 7:45 PM

Hey Mike, 

Yes, there's a limit of 1024 characters for a line in the configuration file - that sounds right to me. I just checked the documentation, and it seems that limit is not explicitly noted. I'll need to add that to the documentation.  

As for the reason for the 1024-character limit: it's an arbitrary number, but I thought 1k of characters on a single line ought to be enough.
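
To find offending lines before IIRF does, a small script can scan the ini text. This is only a sketch and not part of IIRF; the function name `find_long_lines` is made up for illustration. It assumes the limit counts characters per line and, to be conservative, flags a line of exactly 1024 characters as well. The compile error earlier in this thread reports offset 992, which, added to the 31 characters of `RewriteCond %{HTTP_USER_AGENT} `, appears consistent with only about 1023 characters of a line being read.

```python
def find_long_lines(text, limit=1024):
    """Return (line_number, length) pairs for lines at or above `limit` characters.

    Assumption: the limit applies to the character count of each line.
    A line of exactly `limit` characters is flagged too, to stay on the safe side.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) >= limit
    ]


if __name__ == "__main__":
    # Synthetic sample: one short comment line and one overlong RewriteCond line.
    sample = (
        "# sample config\n"
        "RewriteCond %{HTTP_USER_AGENT} ^(" + "abcd|" * 300 + "zzzz) [I]\n"
    )
    for number, length in find_long_lines(sample):
        print(f"line {number}: {length} characters")
```

To check a real file, read its contents into a string and pass that string to `find_long_lines`.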

You cannot simply insert a newline to get around that limit.  In IIRF ini files, newlines are significant.

What you can do: use multiple RewriteCond statements, joined by a logical OR. For example, this set of lines:

RewriteCond %{HTTP_USER_AGENT} (wmlb|wonu|x700|xda(\-|2|g))   [I,OR]
RewriteCond %{HTTP_USER_AGENT} (yas\-|your)                   [I,OR]
RewriteCond %{HTTP_USER_AGENT} (zeto|zte\-)                   [I]

is equivalent to this single line:

RewriteCond %{HTTP_USER_AGENT} (wmlb|wonu|x700|xda(\-|2|g)|yas\-|your|zeto|zte\-)  [I]

Using that principle, you can split a pattern that is greater than 1024 characters into multiple smaller patterns.
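
Splitting a 1700-character pattern by hand is error-prone, so here is a rough Python sketch of the idea. It is not part of IIRF, and the function names (`split_top_level`, `build_conds`) and the 1000-character default budget are assumptions chosen for illustration. It cuts the alternation only at `|` characters that sit outside any parentheses or character class, then packs the alternatives into RewriteCond lines that stay under the budget. Your original pattern is anchored with `^`, so each smaller pattern keeps the `^(` prefix by default.

```python
def split_top_level(body):
    """Split a regex body on '|' characters outside any group or character class.

    Simplification: a ']' placed first inside a class (as in '[]a]') is not handled.
    """
    parts, current = [], []
    depth = 0          # parenthesis nesting depth
    in_class = False   # True while inside [...]
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            current.append(body[i:i + 2])  # keep an escaped pair together
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))  # top-level separator: close this part
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def build_conds(alternatives, variable="%{HTTP_USER_AGENT}",
                prefix="^(", suffix=")", max_len=1000):
    """Pack alternatives into RewriteCond lines no longer than max_len.

    A single alternative that is itself too long is still emitted on its own line.
    """
    def render(chunk, flags):
        return f"RewriteCond {variable} {prefix}{'|'.join(chunk)}{suffix} {flags}"

    chunks, current = [], []
    for alt in alternatives:
        # Measure with the longer flag set so a line fits with either flag.
        if current and len(render(current + [alt], "[I,OR]")) > max_len:
            chunks.append(current)
            current = [alt]
        else:
            current.append(alt)
    if current:
        chunks.append(current)

    # Every line but the last is OR-ed with the next one.
    lines = [render(chunk, "[I,OR]") for chunk in chunks[:-1]]
    if chunks:
        lines.append(render(chunks[-1], "[I]"))
    return lines


if __name__ == "__main__":
    body = r"wmlb|wonu|x700|xda(\-|2|g)|yas\-|your|zeto|zte\-"
    for line in build_conds(split_top_level(body), max_len=60):
        print(line)
```

Pass in only the part of the pattern between the outer `^(` and `)`. The tiny `max_len=60` in the demo is just there to force a split on a short pattern; for a real file, a budget a little below 1024 leaves some margin.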

Coordinator
Jun 28, 2010 at 7:48 PM
This discussion has been copied to a work item; the discussion continues there.