The Computer Language Benchmarks Game

regex-redux Hack #4 program

source code

<?hh
/* The Computer Language Benchmarks Game
   http://benchmarksgame.alioth.debian.org/

   regex-dna program contributed by Danny Sauer
   modified by Josh Goldfoot
   modified by Sergey Khripunov
   modified by Craig Russell
   PHP as HHVM/Hack by Isaac Gouy
   converted from regex-dna program
*/

// System V message queue through which the forked workers return their results
$tok = ftok(__FILE__, chr(time() & 255));
$queue = msg_get_queue($tok);

$variants = array(
    'agggtaaa|tttaccct',
    '[cgt]gggtaaa|tttaccc[acg]',
    'a[act]ggtaaa|tttacc[agt]t',
    'ag[act]gtaaa|tttac[agt]ct',
    'agg[act]taaa|ttta[agt]cct',
    'aggg[acg]aaa|ttt[cgt]ccct',
    'agggt[cgt]aa|tt[acg]accct',
    'agggta[cgt]a|t[acg]taccct',
    'agggtaa[cgt]|[acg]ttaccct',
);

// IUB replacement: patterns ($IUB) and their substitutions ($IUBnew) as parallel arrays
$IUB = array(
   '/tHa[Nt]/S',
   '/aND|caN|Ha[DS]|WaS/S',
   '/a[NSt]|BY/S',
   '/<[^>]*>/S',
   '/\\|[^|][^|]*\\|/S'
);
$IUBnew = array(
   '<4>',
   '<4>',
   '<4>',
   '|',
   '-'
);

// read in file
$contents = file_get_contents('php://stdin');
$initialLength = strlen($contents);

// strip FASTA description lines (those starting with '>') and all newlines
$contents = preg_replace('/^>.*$|\n/mS', '', $contents);
$codeLength = strlen($contents);

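// do regexp counts: four forked workers each count two variants and run the
// IUB replacements on one quarter of the sequence; the parent counts the last variant
// $messages is keyed by regex so the output keeps the order of $variants
$messages = array_flip($variants);
// the replacements run per chunk, so a match that spans a chunk boundary is
// missed; this may explain a final length that differs from the expected one
$chunks = str_split($contents, ceil(strlen($contents) / 4));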
$workers = $results = array();
foreach ($variants as $key => $regex){
   if($key == 0 || $key == 2 || $key == 4 || $key == 6) {
      if($pid = pcntl_fork()) $workers[] = $pid;
   }
   if($pid && $key > 7) {
      $messages[$regex] =
         preg_match_all('/' . $regex . '/iS', $contents, $discard);
   }
   else if(!$pid) {
      $results[] = $regex . ',' . 
         preg_match_all('/' . $regex . '/iS', $contents, $discard);
      if($key == 1 || $key == 3 || $key == 5 || $key == 7) {
         $results[] = strlen(preg_replace($IUB, $IUBnew, $chunks[(int)($key / 2)]));
         msg_send($queue, 2, implode(';', $results), false, false, $errno);
         exit;
      }
   }
}

// receive and output the counts; each worker message has the form
// "regex,count;regex,count;chunkLength"
$contentLength = 0;
foreach($workers as $worker) {
   pcntl_waitpid($worker, $status);
   msg_receive($queue, 2, $msgtype, 4096, $message, false);
   $message = explode(';', $message);
   foreach($message as $key => $line) {
      if($key == 2)
         $contentLength += $line;
      else {
         $tmp = explode(',', $line, 2);
         $messages[$tmp[0]] = $tmp[1];
      }
   }
}
foreach($messages as $regex => $count) {
   echo $regex, ' ', $count, "\n";
}

echo "\n",
      $initialLength, "\n",
      $codeLength, "\n",
      $contentLength, "\n";
    

notes, command-line, and program output

NOTES:
64-bit Ubuntu quad core
HipHop VM 3.21.0 (rel)
Compiler: 3.21.0+dfsg-2
Repo schema: 1c159cf2047dca5f4a3363b2138a33e14a1e99fa


Thu, 16 Nov 2017 00:39:27 GMT

MAKE:
/usr/bin/hh_client
No errors!

0.03s to complete and log all make actions

COMMAND LINE:
/usr/bin/hhvm  -d hhvm.hack.lang.look_for_typechecker=0 regexredux.hack-4.hack 0 < regexredux-input50000.txt

UNEXPECTED OUTPUT 

13c13
< 273943
---
> 273927

PROGRAM OUTPUT:
agggtaaa|tttaccct 3
[cgt]gggtaaa|tttaccc[acg] 12
a[act]ggtaaa|tttacc[agt]t 43
ag[act]gtaaa|tttac[agt]ct 27
agg[act]taaa|ttta[agt]cct 58
aggg[acg]aaa|ttt[cgt]ccct 16
agggt[cgt]aa|tt[acg]accct 15
agggta[cgt]a|t[acg]taccct 18
agggtaa[cgt]|[acg]ttaccct 20

508411
500000
273943