~nejucomo/divmod.org/fix-nevow-setup

Viewing changes to Quotient/benchmarks/spambayes

Committer: Jean-Paul Calderone
Date: 2012-09-13 12:12:56 UTC
mfrom: (2696.1.17 spambayes-fewer-potatoes)
Revision ID: exarkun@twistedmatrix.com-20120913121256-tg7d6l1w3rkpfehr

Improve SQLite3 Spambayes storage implementation performance

Author: exarkun
Reviewer: mithrandi

Change the SQLite3 Spambayes storage implementation to load all of a
message's token data in as few SQL operations as possible, instead of
loading one token's data per SQL operation.
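
The batching idea described in the commit message can be sketched as follows. This is a minimal illustration and not the code from xquotient/spam.py: the table name `bayes`, its columns, the helper `load_token_counts`, and the chunk size are all hypothetical. The chunk size is kept under 999 because older SQLite builds default to that limit on bound parameters per statement.

```python
import sqlite3

# Hypothetical schema and names, for illustration only; the real
# xquotient.spam storage layout may differ.
CHUNK = 500  # stays under the 999 bound-parameter default of older SQLite builds


def load_token_counts(conn, tokens):
    """Return {token: (nspam, nham)} using one SELECT per chunk of tokens."""
    tokens = list(set(tokens))  # a message may repeat a token; look it up once
    found = {}
    for start in range(0, len(tokens), CHUNK):
        chunk = tokens[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))
        query = (
            "SELECT word, nspam, nham FROM bayes WHERE word IN (%s)"
            % placeholders
        )
        for word, nspam, nham in conn.execute(query, chunk):
            found[word] = (nspam, nham)
    return found


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes (word TEXT PRIMARY KEY, nspam INTEGER, nham INTEGER)"
)
conn.executemany(
    "INSERT INTO bayes VALUES (?, ?, ?)",
    [("viagra", 9, 0), ("meeting", 1, 7)],
)
counts = load_token_counts(conn, ["viagra", "meeting", "unseen"])
```

Tokens that were never trained on are simply absent from the result, so a caller would treat a missing key as zero counts. A 500-token message then costs one query instead of 500.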

files added:
Quotient/benchmarks

Quotient/benchmarks/spambayes

files modified:
Quotient/xquotient/spam.py

Quotient/xquotient/test/test_spambayes.py

Quotient/benchmarks/spambayes

#!/usr/bin/python

# Benchmark of Quotient spambayes filter, both training and classification.

import sys, tempfile, random, time

from xquotient.spam import _SQLite3Classifier

words = list(open('/usr/share/dict/words', 'r'))

TRAINING_FACTOR = 50
MESSAGE_FACTOR = 500

def adj(duration):
    return duration / (TRAINING_FACTOR * MESSAGE_FACTOR) * 1000.0


def main(argv):
    prng = random.Random()
    prng.seed(12345)
    prng.shuffle(words)

    classifier = _SQLite3Classifier(tempfile.mktemp())

    before = time.time()
    for i in range(TRAINING_FACTOR):
        classifier.learn(words[i:i + MESSAGE_FACTOR], True)

    for i in range(TRAINING_FACTOR, TRAINING_FACTOR * 2):
        classifier.learn(words[i:i + MESSAGE_FACTOR], False)
    after = time.time()

    print 'Learning: %.2f ms/word' % (adj(after - before),)

    before = time.time()
    for i in range(TRAINING_FACTOR * 2):
        classifier.spamprob(words[i:i + MESSAGE_FACTOR])
    after = time.time()

    print 'Guessing: %.2f ms/word' % (adj(after - before),)


if __name__ == '__main__':
    main(sys.argv)
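
In the script above, each timed phase processes TRAINING_FACTOR * 2 messages, while adj divides by TRAINING_FACTOR * MESSAGE_FACTOR. If a figure per word actually processed is wanted, the normalization could be sketched as below. This is an assumption-laden illustration in Python 3, not part of the commit: `ms_per_word` and `timed` are hypothetical helper names, and `time.perf_counter` is swapped in for `time.time`.

```python
import time


def ms_per_word(duration, messages, words_per_message):
    """Convert a duration in seconds to milliseconds per word processed."""
    return duration / (messages * words_per_message) * 1000.0


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    before = time.perf_counter()  # monotonic clock, suited to short intervals
    result = func(*args)
    return result, time.perf_counter() - before


# Example: 100 messages of 500 words each, taking 2.5 seconds in total.
example = ms_per_word(2.5, 100, 500)
total, elapsed = timed(sum, range(1000))
```

With the script's constants, the message count passed to `ms_per_word` would be TRAINING_FACTOR * 2 for both the learning and the guessing phase.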
