PHP Classes

Fatal error

PHP Search Large Files  >  All threads  >  Fatal error
Subject: Fatal error
Summary: SearchableFile.phpclass Uncaught exception
Messages: 10
Author: afstarconn
Date: 2016-04-21 12:45:20
 

  1. Fatal error
afstarconn - 2016-04-21 12:45:20
Hi, I posted this issue on your GitHub repo and am copying it here as well.

I think this class would be very useful for my needs.
I started by trying preg_match.php with a 25 MB CSV file, and got this error:

Fatal error:
Uncaught exception 'RuntimeException' with message 'Invalid modifier 'o' in regular expression.' in SearchableFile.phpclass:941
Stack trace:
#0 SearchableFile.phpclass(724): SearchableFile->transform_regex('52442735-365670...')
#1 preg_match.php(18): SearchableFile->pcre_match('52442735-365670...', NULL, 256, 0)
#2 {main} thrown in SearchableFile.phpclass on line 941

Then I found that the line reported in the fatal error changes depending on the search string.
E.g.
$re = "52442735-3656706299-816328" shows fatal error on line 941.
$re = "52442735" shows fatal error on line 732.

Any ideas?
Thank you!

  2. Re: Fatal error
Christian Vigh - 2016-04-21 13:36:57 - In reply to message 1 from afstarconn
Hello,

As far as I can see, your regular expressions do not include delimiters. So, instead of writing:

$re = "52442735-3656706299-816328"
$re = "52442735"

you should write:

$re = "/52442735-3656706299-816328/"
$re = "/52442735/"

The first one failed on line 941 because that is where I perform some basic checks and transformations on the regex, and it found that the regex had no delimiters (or mismatched ones: the '5' and the '8').

The second one failed on line 732 because that is where I execute the pcregrep command, and the command returned an error: the leading and trailing "5" characters were mistakenly accepted as valid delimiters by my class, but not by the pcregrep command.

However, if you want to search for simple strings, I suggest you use the strpos() or stripos() methods available in this class.
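
As a minimal sketch of the two options for a literal search string, using only PHP built-in functions (not the SearchableFile methods, whose signatures are not shown in this thread): a plain substring search, or a regex built by escaping the string and wrapping it in delimiters. The sample line is made up for illustration.

```php
<?php
// The literal search string from the original post.
$needle = '52442735-3656706299-816328';

// Hypothetical CSV line, invented for this example.
$line = 'id;52442735-3656706299-816328;some value';

// Option 1: plain substring search, no regular expression involved.
$pos = strpos($line, $needle);

// Option 2: turn the literal into a valid regex.
// preg_quote() escapes regex metacharacters (and the '/' delimiter, given
// as second argument); the result is then wrapped in '/' delimiters.
$re = '/' . preg_quote($needle, '/') . '/';
$found = preg_match($re, $line, $m, PREG_OFFSET_CAPTURE);

var_dump($pos, $re, $found);
```

Both options locate the string at the same byte offset; the regex form only becomes worthwhile once the pattern is more than a fixed string.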

Anyway, I will add an additional check in the transform_regex() method (near line 941) to throw an exception if the starting character is alphanumeric (this is forbidden as a delimiter in regular expressions).
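
A check of this kind could look roughly like the sketch below. It is not the actual code of transform_regex(); the function name has_plausible_delimiters() is hypothetical, and it only applies the documented PCRE rule that a delimiter must be a non-alphanumeric, non-backslash, non-whitespace character (with bracket-style pairs allowed).

```php
<?php
// Hypothetical helper, not part of SearchableFile: returns true when $re
// starts with an acceptable delimiter, contains a matching closing
// delimiter, and has only letters (candidate modifiers) after it.
function has_plausible_delimiters(string $re): bool
{
    if (strlen($re) < 2) {
        return false;
    }

    $open = $re[0];

    // Alphanumeric, whitespace and backslash cannot be delimiters.
    if (ctype_alnum($open) || ctype_space($open) || $open === '\\') {
        return false;
    }

    // Bracket-style delimiters close with their counterpart.
    $pairs = ['(' => ')', '[' => ']', '{' => '}', '<' => '>'];
    $close = $pairs[$open] ?? $open;

    // The closing delimiter is the last occurrence of $close.
    $end = strrpos($re, $close);
    if ($end === false || $end === 0) {
        return false;
    }

    // Whatever follows must be empty or letters only. This does not
    // verify that each letter is a modifier PCRE actually knows.
    $modifiers = (string) substr($re, $end + 1);
    return $modifiers === '' || ctype_alpha($modifiers);
}

var_dump(has_plausible_delimiters('52442735'));    // undelimited
var_dump(has_plausible_delimiters('/52442735/'));  // delimited
```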

The update should be available by tomorrow. Meanwhile, you can still run some tests yourself, but make sure that your regular expressions include delimiters.

Please feel free to contact me if you have any problems or questions,
Christian.

  3. Re: Fatal error
afstarconn - 2016-04-21 13:49:42 - In reply to message 2 from Christian Vigh
Christian, thanks for the response.

Now I have surrounded the string with delimiters, like this:
$re = "/52442735-3656706299-816328/";

And I get this error:

Fatal error:
Uncaught exception 'RuntimeException' with message 'An error occurred during the execution of the pcregrep command : ' in SearchableFile.phpclass:732

Stack trace:
#0 preg_match.php(25): SearchableFile->pcre_match('/52442735-36567...', NULL, 256, 0)
#1 {main} thrown in SearchableFile.phpclass on line 732

  4. Re: Fatal error
afstarconn - 2016-04-21 13:53:35 - In reply to message 3 from afstarconn
By the way, I started by testing the preg_match function because I will need complex preg_match searches in the future, on many big CSV files at once.

  5. Re: Fatal error
Christian Vigh - 2016-04-21 17:48:28 - In reply to message 3 from afstarconn
This is a perfectly normal message. It means that you don't have the pcregrep command on your system. The README.md file explains this; here is a summary:

- If you're running Debian, you need to execute the following command:

apt-get install pcregrep

- If you're running CentOS, I suppose that you would run:

yum install pcregrep

- If you're running Windows, you have no option other than installing Cygwin (http://www.cygwin.org/). I'm personally running Cygwin and pcregrep is installed (during the Cygwin installation, I selected ALL the packages NOT related to X-Windows and the graphical environment).

After this setup process, everything should run smoothly.
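
To confirm from PHP that the command is reachable before running a search, a small check like the following can help. This is only a sketch: command_exists() is a hypothetical helper, and it assumes a POSIX shell (Linux, macOS, Cygwin) where the `command -v` builtin is available and exec() is not disabled.

```php
<?php
// Hypothetical helper: returns true when $name resolves to an executable
// (or shell builtin) in the shell that PHP's exec() uses.
// Assumption: a POSIX shell providing the `command -v` builtin.
function command_exists(string $name): bool
{
    $output = [];
    $code = 1;
    exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null', $output, $code);
    return $code === 0;
}

if (command_exists('pcregrep')) {
    echo "pcregrep is available\n";
} else {
    echo "pcregrep was not found on the PATH\n";
}
```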

I initially planned to use the grep command instead of pcregrep but, as far as I understood, it only supports posix and ereg regular expressions, and trying to transform pcre expressions into either posix or ereg style would have been some kind of nightmare. But I'm still thinking about it.

I'm also thinking about implementing some kind of regex matching using PHP built-in functions, but that's not trivial. The preg_* functions only accept strings as input; since the searched file data comes in blocks of 8 KB by default, how many blocks should be collected before trying a preg_match? And what should happen if the captured match spans two, three, or more blocks? I don't have answers to those questions yet, except by imposing limitations that preg_match does not have (maybe something like: OK, I accept that a captured expression will not exceed x kilobytes; if captures bigger than that exist in my file, I accept that they won't be captured).
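
The limitation described above can be made concrete with a deliberately naive sketch. It is not part of SearchableFile; chunked_preg_match() and its $maxMatchLen parameter are hypothetical. It reads the file block by block, keeps a tail of at most $maxMatchLen - 1 bytes between blocks so that a match straddling a boundary can still be found, and assumes no match is longer than $maxMatchLen bytes. Anchors such as ^ and lookbehind assertions are not handled correctly, because each buffer starts at an arbitrary file offset, and the match returned is not guaranteed to be the earliest one in the file.

```php
<?php
// Hypothetical sketch: find a match of $pattern in a file without loading
// the whole file. Assumption: no match is longer than $maxMatchLen bytes.
function chunked_preg_match(
    string $path,
    string $pattern,
    int $blockSize = 8192,
    int $maxMatchLen = 1024
): ?array {
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $buffer = '';      // carried-over tail plus the current block
    $bufferStart = 0;  // file offset of the first byte of $buffer

    try {
        while (($block = fread($fp, $blockSize)) !== false && $block !== '') {
            $buffer .= $block;
            $len = strlen($buffer);

            if (preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE) === 1) {
                [$text, $pos] = $m[0];
                // Accept the match unless it touches the end of the buffer,
                // in which case more data might extend it.
                if (feof($fp) || $pos + strlen($text) < $len) {
                    return ['match' => $text, 'offset' => $bufferStart + $pos];
                }
                $keepFrom = $pos;
            } else {
                // Keep only the tail that could start a straddling match.
                $keepFrom = max(0, $len - ($maxMatchLen - 1));
            }

            $bufferStart += $keepFrom;
            $buffer = substr($buffer, $keepFrom);
        }

        // End of file reached: test whatever is left in the buffer.
        if ($buffer !== '' && preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE) === 1) {
            return ['match' => $m[0][0], 'offset' => $bufferStart + $m[0][1]];
        }
    } finally {
        fclose($fp);
    }

    return null;
}
```

The trade-off is the one stated in the paragraph above: memory use stays bounded by roughly one block plus the tail, at the price of missing any match longer than the chosen limit.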

The other alternative would be to develop a PHP extension where the preg_* functions could read from a stream, but this would require a considerable amount of my time, just to develop the extension and then package it for various operating systems. And it would still require end users to install it on their servers.

Please let me know if you were able to install the pcregrep command successfully.

Christian.

  6. Re: Fatal error
Christian Vigh - 2016-04-21 17:55:23 - In reply to message 4 from afstarconn
Yes, I have had the same issue: searching through very big files, hence the creation of the SearchableFile class. I tested it on files larger than 2 GB.

I have been able to compare the performance of the various search functions offered by SearchableFile with that of the PHP built-in functions, on files up to 800 MB (this was the approximate limit of what I could load into memory on the two systems I used).

If you have a good IO subsystem (like my system #1), then you can expect a small overhead of around 5% compared with the in-memory version.

For standard IO subsystems (like my system #2), expect the elapsed time to be multiplied by two.


  7. Re: Fatal error
Christian Vigh - 2016-04-21 18:09:18 - In reply to message 4 from afstarconn
By the way, I updated the class to check for invalid regular expressions that start with an alphanumeric character. You can download it again!

  8. Re: Fatal error
afstarconn - 2016-04-21 18:10:44 - In reply to message 6 from Christian Vigh
It worked!!

Elapsed (SearchableFile) : 0.084 (count = 1)
Elapsed (preg_match) : 0.06 (count = 1)

What do you think of those results? (0.084 vs 0.06)

I'm interested in extending your class to search in CSV files, I mean search by columns, etc. What do you think? Can you tell me your email? (I don't know if it is OK to publish emails here.)


  9. Re: Fatal error
Christian Vigh - 2016-04-21 18:35:05 - In reply to message 8 from afstarconn
Very good! (Sorry, I'm always surprised when software I create works fine :-) ...)

Regarding the results, I can say a few things:
1) The SearchableFile class performs well on your system, so you must have a good IO subsystem.
2) You have to run the same benchmark several times, just to figure out the overhead implied by the system itself.
3) Keep in mind the order in which the benchmark is performed: SearchableFile is called first, so it pays the penalty of being the first to access your file. preg_match then operates faster, because the file contents are already in the operating system cache. I should have put a dummy file_get_contents() before starting anything in my example script, so that SearchableFile and preg_match start on a more or less equal footing. Anyway, you can run the script several times to ensure that the file data is already in the OS cache, and then check the results.
4) I cannot go much further on the subject, because a few tens of milliseconds of elapsed time may include a significant share of system (not user) overhead, especially for SearchableFile, which had to bear the burden of loading the file contents for the first time. I tend to be more confident in execution times that exceed several seconds, because operating system overhead then accounts for a smaller percentage of the whole execution. Your results show a 40% overhead for the SearchableFile class compared to the preg_match function, that is 24 milliseconds for a 25 MB file, but what will the overhead be for a 250 MB file? I bet it will be significantly smaller as a percentage.
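
The points above (warm the OS cache first, repeat the measurement, compare as a percentage) can be sketched as follows. This is not the example script shipped with the class; time_runs() and overhead_percent() are hypothetical helpers, the test file is a generated temporary file, and only the built-in preg_match() is timed since the SearchableFile API is not shown in this thread.

```php
<?php
// Hypothetical helper: run $fn $runs times, return each elapsed time in seconds.
function time_runs(callable $fn, int $runs = 5): array
{
    $times = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $times[] = microtime(true) - $start;
    }
    return $times;
}

// Hypothetical helper: overhead of $candidate relative to $baseline, in percent.
function overhead_percent(float $candidate, float $baseline): float
{
    return ($candidate - $baseline) / $baseline * 100.0;
}

// Generated sample data, invented for this example.
$file = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($file, str_repeat("a;b;c\n", 10000));

// Dummy read so that every timed run starts with the file in the OS cache.
file_get_contents($file);

$times = time_runs(function () use ($file) {
    preg_match('/b;c/', file_get_contents($file));
});

printf("fastest of %d runs: %.6f s\n", count($times), min($times));

// The figures reported in this thread: 0.084 s vs 0.06 s.
printf("overhead: %.0f%%\n", overhead_percent(0.084, 0.06));

unlink($file);
```

With the two timings reported in the thread, overhead_percent() reproduces the 40% figure, corresponding to a 0.024 s difference.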

Well, this is all I can say for now regarding the results!

Concerning your last point:

"I'm interested in extending your class to search in CSV files, I mean search by columns, etc. What do you think? Can you tell me your email? (I don't know if it is OK to publish emails here.)"

My email is visible in my profile, so I think it's OK; in any case, there is a moderator, Manuel Lemos, who can explain the correct way to do so.

Meanwhile, you can reach me at the following address:

christian.vigh@orange.fr

Christian.

  10. Re: Fatal error
Christian Vigh - 2016-04-21 18:37:00 - In reply to message 8 from afstarconn
I forgot to answer one point: yes, extending this class to CSV files would be a really good idea, so feel free to contact me if you want to discuss it.