Phase I: the framework
-----------------------
Develop 4 very robust components:
. csv.in, csv.out, sort, logger (sends to a connector, not to the screen)
. subjob and event/trigger system
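The subjob and event/trigger item (and Test3 below) could rest on a small publish/subscribe mechanism. A minimal sketch; `EventBus`, `Subjob` and the `"<name>.end"` event naming are hypothetical, not existing etl.py APIs:

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Maps an event name to the callbacks triggered when that event is emitted."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[], None]]] = defaultdict(list)

    def on(self, event: str, callback: Callable[[], None]) -> None:
        self._listeners[event].append(callback)

    def emit(self, event: str) -> None:
        for callback in self._listeners[event]:
            callback()


class Subjob:
    def __init__(self, name: str, bus: EventBus, log: List[str]) -> None:
        self.name = name
        self.bus = bus
        self.log = log

    def run(self) -> None:
        self.log.append(f"{self.name} ran")
        # Signal the end of this subjob so that dependent subjobs can start
        self.bus.emit(f"{self.name}.end")


bus = EventBus()
log: List[str] = []
first = Subjob("first", bus, log)
second = Subjob("second", bus, log)
bus.on("first.end", second.run)  # the second subjob is triggered by the first
first.run()
print(log)  # ['first ran', 'second ran']
```

Callbacks run synchronously here; a real job might queue events instead so that a long chain of subjobs does not grow the call stack.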
Implement 1 input connector:
Implement 2 output connectors:
Components must be robust:
. manage encoding types
. support different formats (separators, delimiters, ...)
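A sketch of how csv.in and csv.out could expose the encoding and separator options using only the standard `csv` and `io` modules; `read_csv`, `write_csv` and their parameters are assumptions:

```python
import csv
import io
from typing import BinaryIO, Dict, Iterator, List, Sequence


def read_csv(stream: BinaryIO, encoding: str = "utf-8", delimiter: str = ",",
             quotechar: str = '"') -> Iterator[Dict[str, str]]:
    """csv.in sketch: decode a binary stream and yield one dict per row."""
    text = io.TextIOWrapper(stream, encoding=encoding, newline="")
    yield from csv.DictReader(text, delimiter=delimiter, quotechar=quotechar)


def write_csv(rows: List[Dict[str, str]], fieldnames: Sequence[str], encoding: str = "utf-8",
              delimiter: str = ",", quotechar: str = '"') -> bytes:
    """csv.out sketch: encode rows with the requested encoding and separators."""
    buffer = io.BytesIO()
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    writer = csv.DictWriter(text, fieldnames=fieldnames, delimiter=delimiter, quotechar=quotechar)
    writer.writeheader()
    writer.writerows(rows)
    text.flush()
    text.detach()  # keep the BytesIO open when the wrapper is discarded
    return buffer.getvalue()


# A Latin-1 file using ';' as separator, re-encoded as UTF-8 with '|'
raw = "name;city\nJosé;Liège\n".encode("latin-1")
rows = list(read_csv(io.BytesIO(raw), encoding="latin-1", delimiter=";"))
print(rows)  # [{'name': 'José', 'city': 'Liège'}]
print(write_csv(rows, ["name", "city"], encoding="utf-8", delimiter="|"))
```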
Automatic generation of documentation
. readable display of components
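Documentation could be generated from the components' docstrings. A sketch assuming each component class describes its channels in its docstring; `CsvIn`, `Sort` and `generate_doc` are hypothetical:

```python
import inspect


class CsvIn:
    """Read records from a CSV file.

    Output channels: main
    """


class Sort:
    """Sort records on a field.

    Input channels: main
    Output channels: main
    """


def generate_doc(components):
    """Build a plain-text page with one underlined section per component."""
    lines = []
    for component in components:
        lines.append(component.__name__)
        lines.append("-" * len(component.__name__))
        # getdoc() strips the docstring's indentation
        lines.append(inspect.getdoc(component) or "(undocumented)")
        lines.append("")
    return "\n".join(lines)


print(generate_doc([CsvIn, Sort]))
```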
Manage all internal data in a uniform way:
. use date and datetime objects for dates and datetimes
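A sketch of normalizing incoming strings to `datetime.date` / `datetime.datetime` objects; the accepted formats in `DATE_FORMATS` and `DATETIME_FORMATS` are assumptions and would come from the component configuration:

```python
from datetime import date, datetime
from typing import Union

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")       # assumed input formats
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S",)


def to_internal(value: str) -> Union[date, datetime, str]:
    """Convert a date/datetime string to an object; leave other strings unchanged."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            pass
    return value


print(to_internal("2024-01-05"))           # 2024-01-05 (a date object)
print(to_internal("2024-01-05 10:30:00"))  # 2024-01-05 10:30:00 (a datetime object)
```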
Schema validation on components (optional)
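Optional schema validation could be as simple as checking field presence and Python types. A minimal sketch; `validate`, `SchemaError` and the dict-of-types schema format are assumptions:

```python
from typing import Any, Mapping


class SchemaError(ValueError):
    """Raised when a record does not match the expected schema."""


def validate(record: Mapping[str, Any], schema: Mapping[str, type]) -> None:
    """Check that every schema field is present and has the expected Python type."""
    missing = sorted(set(schema) - set(record))
    if missing:
        raise SchemaError(f"missing fields: {missing}")
    for field, expected in schema.items():
        value = record[field]
        if not isinstance(value, expected):
            raise SchemaError(
                f"{field!r}: expected {expected.__name__}, got {type(value).__name__}")


schema = {"name": str, "quantity": int}
validate({"name": "pen", "quantity": 3}, schema)  # passes silently
try:
    validate({"name": "pen", "quantity": "3"}, schema)
except SchemaError as exc:
    print(exc)  # 'quantity': expected int, got str
```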
We must be able to save a job in a file
If possible, use pickle?
Otherwise, we should implement __repr__ on each component.
We must have a function to instantiate a job from a saved file.
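With pickle, saving and loading a job takes a few lines. A sketch with a hypothetical `Job` class (components are plain strings for brevity), which also gets a `__repr__` as a readable fallback:

```python
import os
import pickle
import tempfile


class Job:
    def __init__(self, name, components=None, transitions=None):
        self.name = name
        self.components = components or []
        self.transitions = transitions or []

    def __repr__(self):
        # Readable, Python-like description of the job
        return (f"Job({self.name!r}, components={self.components!r}, "
                f"transitions={self.transitions!r})")


def save_job(job, path):
    """Serialize the whole job (components and transitions) to a file."""
    with open(path, "wb") as f:
        pickle.dump(job, f)


def load_job(path):
    """Instantiate a job from a file written by save_job."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job.pickle")
    job = Job("demo", ["csv.in", "logger"])
    save_job(job, path)
    restored = load_job(path)
print(restored)  # Job('demo', components=['csv.in', 'logger'], transitions=[])
```

Pickle only works if every component is picklable: open files, database connections or sockets must be reopened at run time (or handled with `__getstate__`/`__setstate__`). Unpickling can execute arbitrary code, so job files should only be loaded from trusted sources.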
Components must have basic statistics functions:
. number of records processed per channel
Statistics are sent to a statistics channel at the end of processing or when the job is paused.
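A sketch of per-channel counters; the channel names `in.main`/`out.main` and the list standing in for the statistics channel are hypothetical:

```python
from collections import Counter
from typing import Dict, List


class Component:
    """Counts the records seen on each of its channels."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stats: Counter = Counter()

    def count(self, channel: str, n: int = 1) -> None:
        self.stats[channel] += n

    def send_statistics(self, statistic_channel: List[Dict]) -> None:
        """Meant to be called by the job at the end of processing or on pause."""
        statistic_channel.append({"component": self.name, "records": dict(self.stats)})


statistics: List[Dict] = []
sort = Component("sort")
for record in [{"id": 1}, {"id": 2}]:
    sort.count("in.main")   # record received
    sort.count("out.main")  # record forwarded
sort.send_statistics(statistics)
print(statistics)  # [{'component': 'sort', 'records': {'in.main': 2, 'out.main': 2}}]
```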
Write automated tests, in a directory with a Makefile:
* Test1: csv.in -> logger
* Test2: csv.in -> sort -> logger -> csv.out
* Test3: 2 subjobs, the second one is triggered after the first one
* Manage the interfaces of components: input/output channel names.
If a transition has a source or destination channel that does not exist on
the component, it should raise an exception.
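A sketch of the channel check, assuming each component declares its input and output channel names (all class names here are hypothetical). It also gives `Transition` a `__str__` for printing:

```python
class ChannelError(Exception):
    """Raised when a transition uses a channel the component does not declare."""


class Component:
    def __init__(self, name, inputs=(), outputs=()):
        self.name = name
        self.inputs = frozenset(inputs)    # declared input channel names
        self.outputs = frozenset(outputs)  # declared output channel names


class Transition:
    def __init__(self, source, destination, channel_source="main", channel_destination="main"):
        if channel_source not in source.outputs:
            raise ChannelError(f"{source.name} has no output channel {channel_source!r}")
        if channel_destination not in destination.inputs:
            raise ChannelError(f"{destination.name} has no input channel {channel_destination!r}")
        self.source = source
        self.destination = destination
        self.channel_source = channel_source
        self.channel_destination = channel_destination

    def __str__(self):
        return (f"{self.source.name}[{self.channel_source}] -> "
                f"{self.destination.name}[{self.channel_destination}]")


csv_in = Component("csv.in", outputs=["main"])
logger = Component("logger", inputs=["main"])
print(Transition(csv_in, logger))  # csv.in[main] -> logger[main]
try:
    Transition(csv_in, logger, channel_source="errors")
except ChannelError as exc:
    print(exc)  # csv.in has no output channel 'errors'
```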
* Implement __str__ on jobs, components and transitions so they can be printed
* Integrate a profiler (cProfile) into the job code
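A sketch of wrapping a job run with `cProfile` and formatting the result with `pstats`; `run_profiled` is a hypothetical helper and `fake_job_run` stands in for a real job:

```python
import cProfile
import io
import pstats


def run_profiled(func, *args, sort_by="cumulative", limit=10):
    """Run func(*args) under cProfile; return its result and a printable report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats(sort_by).print_stats(limit)
    return result, report.getvalue()


def fake_job_run(n):
    # Placeholder workload standing in for job.run()
    return sorted(range(n, 0, -1))


result, report = run_profiled(fake_job_run, 1000)
print(result[:3])  # [1, 2, 3]
print(report)
```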
* Change etl.py so that it can be used either as a Python library or as a standalone
executable application, by adding an if __name__=="__main__" block. Command-line arguments:
--job=job.pickle (load a job and process it)
--profile (profile the job process and print the results)
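A sketch of the command-line entry point using `argparse`; `parse_args`, `main`, and the assumption that an unpickled job exposes a `run()` method are not taken from the existing code:

```python
import argparse
import cProfile
import pickle
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Load a saved ETL job and process it.")
    parser.add_argument("--job", required=True, metavar="FILE",
                        help="pickled job to load and process, e.g. job.pickle")
    parser.add_argument("--profile", action="store_true",
                        help="profile the job process and print the results")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    with open(args.job, "rb") as f:
        job = pickle.load(f)  # assumes the job object exposes a run() method
    if args.profile:
        # Runs job.run() under cProfile and prints stats sorted by cumulative time
        cProfile.runctx("job.run()", globals(), {"job": job}, sort="cumulative")
    else:
        job.run()
    return 0
```

etl.py would then end with `if __name__ == "__main__": sys.exit(main())`, so `python etl.py --job=job.pickle --profile` processes a job while `import etl` does not.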
* Build a unit-test system and implement unit tests in each component file (using
if __name__=='__main__'). For example, a unit test of a component could be defined
like this (this is just a proposal; there may be existing test frameworks):
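    if __name__=='__main__':
        # 'test' is the proposed helper wrapping the component under test
        test.input('main', [{...},{...}])
        oldt = None
        for t in test.output('main'):
            # records must come out of the 'main' channel sorted by name
            if oldt and oldt['name'] > t['name']:
                raise unit_test('Validation Error')
            oldt = t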
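For comparison, the same check written with the standard `unittest` module, one of the existing frameworks. `sort_component` is a hypothetical stand-in for the real sort component:

```python
import unittest
from operator import itemgetter


def sort_component(records, field="name"):
    """Hypothetical stand-in for the sort component."""
    return sorted(records, key=itemgetter(field))


class SortComponentTest(unittest.TestCase):
    def test_output_is_sorted_by_name(self):
        output = sort_component([{"name": "b"}, {"name": "a"}, {"name": "c"}])
        names = [record["name"] for record in output]
        self.assertEqual(names, sorted(names))

    def test_no_record_is_lost(self):
        records = [{"name": "b"}, {"name": "a"}]
        # Same records, in any order
        self.assertCountEqual(sort_component(records), records)
```

In the component file, `if __name__ == '__main__': unittest.main()` would run these tests, and the Makefile could call `python -m unittest` on each component module.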
* We need a system to log data operations and consolidate these logs in the job
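The standard `logging` module already supports this kind of consolidation through logger hierarchies: components log to child loggers and the job attaches one handler to the parent. A sketch; the `etl.job.<name>.<component>` naming and `JobLogHandler` are assumptions:

```python
import logging
from typing import List


class JobLogHandler(logging.Handler):
    """Collects the log records of every component of a job in one list."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(self.format(record))


job_handler = JobLogHandler()
job_handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
job_logger = logging.getLogger("etl.job.demo")
job_logger.setLevel(logging.INFO)
job_logger.addHandler(job_handler)

# Each component logs through a child logger; records propagate to the job logger
logging.getLogger("etl.job.demo.csv_in").info("read %d records", 3)
logging.getLogger("etl.job.demo.sort").info("sorted %d records", 3)
print(job_handler.records)
# ['etl.job.demo.csv_in: read 3 records', 'etl.job.demo.sort: sorted 3 records']
```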
* We should implement the job execution functions: run, step(1), pause, stop,
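A simplified sketch of the execution controls over a single stream of records; a real job would step through its components and transitions, and the internals of this `Job` class are hypothetical:

```python
from typing import Callable, Iterable, Iterator


class Job:
    """Hypothetical execution control: run, step(n), pause and stop."""

    def __init__(self, records: Iterable[dict], process: Callable[[dict], None]) -> None:
        self._records: Iterator[dict] = iter(records)
        self._process = process
        self.paused = False
        self.stopped = False

    def step(self, n: int = 1) -> int:
        """Process at most n records and return how many were processed."""
        done = 0
        while done < n and not self.stopped:
            try:
                record = next(self._records)
            except StopIteration:
                self.stopped = True  # input exhausted: the job is finished
                break
            self._process(record)
            done += 1
        return done

    def run(self) -> None:
        """Process records until the input ends, or pause()/stop() is called."""
        self.paused = False
        while not self.paused and not self.stopped:
            self.step(1)

    def pause(self) -> None:
        self.paused = True  # run() can be called again to resume

    def stop(self) -> None:
        self.stopped = True  # no further record will be processed


processed = []
job = Job(({"id": i} for i in range(5)), processed.append)
print(job.step(2))  # 2
job.run()
print([r["id"] for r in processed])  # [0, 1, 2, 3, 4]
```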
* We should be able to copy() a job (have two instances of the same job)
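`copy.deepcopy` gives two independent instances of the same job. A sketch with hypothetical `Job` and `Component` classes:

```python
import copy


class Component:
    def __init__(self, name, **params):
        self.name = name
        self.params = params


class Job:
    def __init__(self, name, components):
        self.name = name
        self.components = components

    def copy(self):
        """Return an independent instance; deepcopy also duplicates the components."""
        return copy.deepcopy(self)


job = Job("import", [Component("csv.in", filename="in.csv")])
clone = job.copy()
clone.components[0].params["filename"] = "other.csv"
print(job.components[0].params["filename"])  # in.csv
```

As with pickle, `deepcopy` fails on objects holding open files or sockets, so components would need to store connection parameters rather than live connections.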
* We should be able to deactivate some transitions (status on transitions)
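A sketch of the transition status, using a hypothetical `active` flag that the job checks when routing data:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transition:
    source: str
    destination: str
    active: bool = True  # inactive transitions carry no data


@dataclass
class Job:
    transitions: List[Transition] = field(default_factory=list)

    def active_transitions(self) -> List[Transition]:
        return [t for t in self.transitions if t.active]


job = Job([Transition("csv.in", "sort"), Transition("sort", "logger"),
           Transition("sort", "csv.out")])
job.transitions[1].active = False  # e.g. silence the logger temporarily
print([(t.source, t.destination) for t in job.active_transitions()])
# [('csv.in', 'sort'), ('sort', 'csv.out')]
```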