The Computer Language
Benchmarks Game

regex-redux Matz's Interpreter #6 program

source code

# The Computer Language Benchmarks Game
# http://benchmarksgame.alioth.debian.org
#
# regex-dna program contributed by jose fco. gonzalez
# optimized & parallelized by Rick Branson
# optimized for ruby2 by Aaron Tavistock
# converted from regex-dna program

require 'fiber'

seq = $stdin.read.force_encoding("ASCII-8BIT")
origin_len = seq.size

seq.gsub!(/>.*\n|\n/,'')
clean_len = seq.size

matchers = [
  'agggtaaa|tttaccct',
  '[cgt]gggtaaa|tttaccc[acg]',
  'a[act]ggtaaa|tttacc[agt]t',
  'ag[act]gtaaa|tttac[agt]ct',
  'agg[act]taaa|ttta[agt]cct',
  'aggg[acg]aaa|ttt[cgt]ccct',
  'agggt[cgt]aa|tt[acg]accct',
  'agggta[cgt]a|t[acg]taccct',
  'agggtaa[cgt]|[acg]ttaccct'
]

results = matchers.map do |matcher|
  Fiber.new do
    count = seq.scan( Regexp.new(matcher) ).size
    Fiber.yield "#{matcher} #{count}"
  end.resume
end

replacements = {
  /tHa[Nt]/ => '<4>',
  /aND|caN|Ha[DS]|WaS/ => '<3>',
  /a[NSt]|BY/ => '<2>',
  /<[^>]*>/ => '|',
  /\|[^|][^|]*\|/ => '-'
}
seq.gsub!(Regexp.new(replacements.keys.join('|')), replacements)

puts "#{results.join("\n")}\n\n#{origin_len}\n#{clean_len}\n#{seq.size}"
    

notes, command-line, and program output

NOTES:
64-bit Ubuntu quad core
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]


Mon, 27 Nov 2017 16:46:41 GMT

COMMAND LINE:
/usr/bin/ruby regexredux.mri-6.mri 0 < regexredux-input50000.txt

PROGRAM FAILED 


PROGRAM OUTPUT:

regexredux.mri-6.mri:9:in `require': no such file to load -- fiber (LoadError)
	from regexredux.mri-6.mri:9