module Buryspam::Stats

Calculates and displays statistics on the operation of the filter since it was last initialized: the number of spam messages received, the spam frequency and period, and the accuracy rates.

Constants

BURYSPAM_HDR
FROM_HDR
SA_SPAM_HDR
SECS_PER_DAY
SECS_PER_HOUR
SECS_PER_MINUTE

Public Class Methods

show()

Display the statistics regarding the performance of the filter.

# File buryspam.rb, line 4009
    def show
      get_stats

      if @total_spams.zero?
        puts "No spams received since reinitialization."
        return
      end

      unless @unprocessed_from.empty?
        print "Messages not processed by buryspam:\n "
        print @unprocessed_from.join(" ")
        puts ""
      end

      secs = (Time.now - @inittime)
      days = secs / SECS_PER_DAY

      spam_freq = "%.1f spam%s/day".pluralize(@total_spams / days)
      spam_period = time_units(secs / @total_spams).to_s + "/spam"

      misses_per_day, accuracy = rate_accuracy(@misses, days)
      false_neg     = "%-6s" % @misses
      false_neg_day = "%-6s" % misses_per_day
      accuracy      = "%-6s" % accuracy
      sas = ":     "

      if @sa_installed
        sas = " (SA):"
        sa_misses_per_day, sa_accuracy = rate_accuracy(@sa_misses, days)
        false_neg     << " (%s)" % @sa_misses
        false_neg_day << " (%s)" % sa_misses_per_day
        accuracy      << " (%s)" % sa_accuracy
      end

      print "Last reinitialized:   #{@inittime.strftime("%F %T")} (#{time_units(secs)} ago)
Unprocessed messages: #{@unprocessed}
Total spams:          #{@total_spams}
Spam frequency:       #{spam_freq}
Spam period:          #{spam_period}
False neg.#{sas}      #{false_neg}
False neg./day#{sas}  #{false_neg_day}
Accuracy#{sas}        #{accuracy}

"
    end
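
The frequency and period figures printed by show come from two divisions, which can be sketched in isolation. This is a minimal illustration, not the module's own code: `spam_rates` is a hypothetical helper, and the value 86,400 for `SECS_PER_DAY` is an assumption, since the constant's definition is not shown on this page.

```ruby
# Hypothetical helper mirroring the arithmetic in show.
# secs_per_day defaults to 86_400, the assumed value of SECS_PER_DAY.
def spam_rates(total_spams, secs, secs_per_day: 86_400)
  days = secs / secs_per_day.to_f
  {
    per_day: total_spams / days,            # spam frequency
    secs_per_spam: secs / total_spams.to_f  # spam period
  }
end

# 120 spams received over three days.
rates = spam_rates(120, 3 * 86_400)
printf("%.1f spams/day, one spam every %.0f seconds\n",
       rates[:per_day], rates[:secs_per_spam])
```

As in show, the caller is expected to rule out a zero spam count first, because the period divides by it.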

Private Class Methods

file_stats(file)

Scan the given mbox file and count the total number of spam messages, missed spam messages and unprocessed messages in the mbox. Used exclusively by the ::get_stats method.

# File buryspam.rb, line 4093
def file_stats(file)
  return unless Mbox.is_valid?(file) && File.mtime(file) > @inittime
  mbox = IO.binread(file)
  mbox.scan(FROM_HDR) { |from, hdr|
    msgtime = Message.extract_time(from)
    next if msgtime < @inittime
    if BURYSPAM_HDR.match(hdr)
      @total_spams += 1
      @misses += 1 if $1 == "No"
      if SA_SPAM_HDR.match(hdr)
        @sa_installed = true
      else
        @sa_misses += 1
      end
    else
      @unprocessed += 1
      @unprocessed_from << from
    end
  }
end
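
The counting logic above can be tried on a small in-memory mbox. The sketch below is only an approximation under stated assumptions: the three patterns are hypothetical stand-ins for FROM_HDR, BURYSPAM_HDR and SA_SPAM_HDR (whose definitions are not shown on this page), the header names `X-Buryspam` and `X-Spam-Flag` are guesses, and the timestamp filtering is left out.

```ruby
# Count spams, misses and unprocessed messages in mbox text.
# All three patterns are assumptions, not the module's real constants.
def count_spams(mbox_text)
  from_hdr     = /^(From .*)\n((?:.+\n)*)/  # "From " line plus its header block
  buryspam_hdr = /^X-Buryspam: (Yes|No)/    # assumed header name and values
  sa_spam_hdr  = /^X-Spam-Flag: YES/        # assumed SpamAssassin header

  stats = { total_spams: 0, misses: 0, sa_misses: 0, unprocessed: 0 }
  mbox_text.scan(from_hdr) do |_from, hdr|
    if (m = buryspam_hdr.match(hdr))
      stats[:total_spams] += 1
      stats[:misses] += 1 if m[1] == "No"   # the filter said "not spam"
      stats[:sa_misses] += 1 unless sa_spam_hdr.match?(hdr)
    else
      stats[:unprocessed] += 1              # the filter never saw this message
    end
  end
  stats
end

sample = <<~MBOX
  From alice@example.com Mon Jan  1 00:00:00 2024
  X-Buryspam: Yes
  X-Spam-Flag: YES
  Subject: a

  body

  From bob@example.com Mon Jan  1 00:01:00 2024
  X-Buryspam: No
  Subject: b

  body

  From carol@example.com Mon Jan  1 00:02:00 2024
  Subject: c

  body
MBOX

stats = count_spams(sample)
puts stats.inspect
```

Using the match object instead of `$1` keeps the captured value tied to the pattern that produced it, which matters once a second regex is matched in the same block.
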
get_stats()

Count all the spam messages and false negatives that have occurred since the last initialization. Give an estimated comparison with the previous version of buryspam.

# File buryspam.rb, line 4071
def get_stats
  # It's a bit expensive to load in the entire bayesian database
  # just to get access to the initialization timestamp...
  db = Hashbase.load(Config.word_file)
  raise "Cannot load '#{Config.word_file}'" if db.nil?
  raise "No timestamp in '#{Config.word_file}'" if db[:timestamp].nil?

  @inittime = db[:timestamp]
  @total_spams = @misses = @sa_misses = @unprocessed = 0
  @sa_installed = false
  @unprocessed_from = []
  Config.bad_dirs.each { |dir|
    Dir[File.join(dir, '*')].each { |f|
      file_stats(f)
    }
  }
  file_stats(Config.missed_spam_file)
end

rate_accuracy(misses, days)

Given the number of missed spam messages and the number of days since last (re)initialization of the filter, return the average number of misses per day and the accuracy of the filter with respect to the total number of spams received.

# File buryspam.rb, line 4062
def rate_accuracy(misses, days)
  misses_per_day = "%.1f" % (misses / days)
  accuracy = "%.2f%%" % (100 - misses / @total_spams.to_f * 100)
  return misses_per_day, accuracy
end
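
A worked example may make the two formulas concrete. The variant below is a sketch, not the module's method: it takes the spam total as an explicit third argument (the original reads `@total_spams`), and it converts `days` with `to_f` so that an integer day count does not trigger integer division.

```ruby
# Standalone variant of rate_accuracy; total_spams is passed in explicitly.
def rate_accuracy(misses, days, total_spams)
  misses_per_day = "%.1f" % (misses / days.to_f)
  accuracy = "%.2f%%" % (100 - misses / total_spams.to_f * 100)
  [misses_per_day, accuracy]
end

# 3 missed spams out of 200 received over 2 days.
per_day, accuracy = rate_accuracy(3, 2, 200)
puts "#{per_day} misses/day, #{accuracy} accurate"
```

With these inputs the miss rate is 1.5 per day and the accuracy is 100 - 3/200 × 100 = 98.50%.
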
time_units(secs)

Convert the given number of seconds to a string in the most appropriate time unit (seconds, minutes, hours, or days).

# File buryspam.rb, line 4116
def time_units(secs)
  days  = "%.1f" % (secs / SECS_PER_DAY)
  hours = "%.1f" % (secs / SECS_PER_HOUR)
  mins  = "%.1f" % (secs / SECS_PER_MINUTE)
  secs  = "%.1f" % secs

  case secs.to_i
    when 0...SECS_PER_MINUTE
      "%g second%s".pluralize(secs.to_f)
    when SECS_PER_MINUTE...SECS_PER_HOUR
      "%g minute%s".pluralize(mins.to_f)
    when SECS_PER_HOUR...SECS_PER_DAY
      "%g hour%s".pluralize(hours.to_f)
    else
      "%g day%s".pluralize(days.to_f)
  end
end
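
The unit selection can be exercised on its own with a simplified variant. Two things here are assumptions: `pluralize` is a plain helper standing in for the `String#pluralize` extension used above (its definition is not shown on this page), and the literals 60, 3600 and 86,400 stand in for the SECS_PER_* constants. Unlike the original, this sketch picks the unit from the unrounded value, so inputs just below a boundary may be reported in the smaller unit.

```ruby
# Stand-in for the String#pluralize extension: fills in the number
# and appends "s" unless the number is exactly 1.
def pluralize(fmt, n)
  fmt % [n, n == 1 ? "" : "s"]
end

# Simplified variant of time_units using literal unit sizes.
def time_units(secs)
  case secs
  when 0...60        then pluralize("%g second%s", secs.round(1))
  when 60...3600     then pluralize("%g minute%s", (secs / 60.0).round(1))
  when 3600...86_400 then pluralize("%g hour%s", (secs / 3600.0).round(1))
  else                    pluralize("%g day%s", (secs / 86_400.0).round(1))
  end
end

[45, 90, 3600, 172_800].each { |s| puts time_units(s) }
```

The `%g` conversion drops a trailing `.0`, which is why one hour prints as "1 hour" instead of "1.0 hours".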