============================
OOPS Repository design notes
============================

Design goals
============

OOPS Repository is intended to scale up to 1 million OOPS reports a day (and
possibly further). This target is based on a 1% soft failure rate needing
collection.
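
As a rough sizing sketch, the target can be turned into request and write
rates. This assumes the 1% figure means one OOPS per hundred requests and that
load is spread evenly over the day; both are assumptions made for illustration,
and real traffic will have peaks well above the mean.

```python
# Back-of-the-envelope sizing for the stated design target.
OOPS_PER_DAY = 1_000_000        # design target from these notes
FAILURES_PER_HUNDRED = 1        # assumption: "1%" means 1 OOPS per 100 requests
SECONDS_PER_DAY = 24 * 60 * 60

# Request volume that would generate the target number of OOPS reports.
requests_per_day = OOPS_PER_DAY * 100 // FAILURES_PER_HUNDRED

# Mean insertion rate, assuming evenly spread load (peaks will be higher).
mean_oops_per_second = OOPS_PER_DAY / SECONDS_PER_DAY

print(requests_per_day)
print(round(mean_oops_per_second, 1))
```

Under these assumptions the target corresponds to about 100 million requests a
day and a mean of roughly a dozen OOPS insertions per second.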

It needs to support an extensible model, aggregation, automated garbage
collection, emitting messages for trend and fault detection systems, and
realtime insertion and display of individual OOPSes.

Components
==========

Cassandra
---------

Cassandra was chosen because it offers a very simple method for increasing the
write and read bandwidth available in the system.

Schema
======

OOPS : Individual OOPSes are in this column family.
  row key : The OOPS ID supplied by the inserter.
  mandatory columns:
    'date': LONG. Used to build a secondary index for garbage collection.
  optional known columns (all strings):
    'bug.*': Maps to bugs.
    'HTTP.*': HTTP variables, e.g. HTTP.method is PUT, POST, GET etc.
    'REQUEST.*': Arbitrary request variables.
    'context': The context for the fault report, e.g. a page template or a
               particular API call.
    'exception': The exception causing the fault.
    'URL': The URL of the request.
    'username': The username.
    'userid': A database id for the user.
    'branch': Source code branch for the server.
    'revision': Revision of the server.
    'duration': The duration of the request.
    'timeline': A JSON sequence describing the actions taken during the
        request. This may be split out to a separate CF in future. For now
        an example would be [{"start": "0", "length": "34", "database": "main",
            "statement": "SELECT ...", "callstack": "...."}, {....}]