<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
<TITLE>Spooling framework</TITLE>
<META NAME="GENERATOR" CONTENT="StarOffice 6.0 (Solaris Sparc)">
<META NAME="CREATED" CONTENT="20020524;12211900">
<META NAME="CHANGEDBY" CONTENT="Joachim Gabler">
<META NAME="CHANGED" CONTENT="20020621;13010600">
<META NAME="CLASSIFICATION" CONTENT="Analysis and Redesign">
<META NAME="DESCRIPTION" CONTENT="Analysis of the current spooling functionality and possibilities for a redesign">
@page { size: 21.59cm 27.94cm; margin-left: 3.18cm; margin-right: 3.18cm; margin-top: 2.54cm; margin-bottom: 2.54cm }
TD P { margin-bottom: 0.21cm }
P { margin-bottom: 0.21cm }
H2.western { font-family: "Albany", sans-serif; font-size: 14pt; font-style: italic }
H2.cjk { font-family: "MSung Light SC"; font-size: 14pt; font-style: italic }
H2.ctl { font-size: 14pt; font-style: italic }
H3.western { font-family: "Albany", sans-serif }
H3.cjk { font-family: "MSung Light SC" }
H4.western { font-family: "Albany", sans-serif; font-size: 11pt; font-style: italic }
H4.cjk { font-family: "MSung Light SC"; font-size: 11pt; font-style: italic }
H4.ctl { font-size: 11pt; font-style: italic }
H5.western { font-family: "Albany", sans-serif; font-size: 11pt }
H5.cjk { font-family: "MSung Light SC"; font-size: 11pt }
H5.ctl { font-size: 11pt }
TH P { margin-bottom: 0.21cm; font-style: italic }
<H1>Spooling framework</H1>
<H2 CLASS="western">Idea</H2>
<P>Spooling is done through a spooling framework that can have
different implementations, e.g. spooling in ascii files, in a database
<P>In a first step, spooling for monitoring and accounting is done in
a separate event client that subscribes to a certain number of object
types and simply spools them through the spooling framework.</P>
<P>Qmaster still spools its own ascii files. If the spooling framework
proves to be stable, qmaster is switched to the spooling framework and
the Grid Engine admin decides which spooling type to use.</P>
<P>If qmaster is set to spool into a database, and a common production
and reporting database is to be used, the event client is not needed.</P>
<H2 CLASS="western">Spooled Objects – current implementation</H2>
<P>There is one implementation for each object type; most objects
are read through a common function call, read_object.</P>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#000000" CELLPADDING=4 CELLSPACING=0>
<P>daemons/qmaster/job_exit.c,</P>
<P>clients/qacct/qacct.c</P>
<P>Ascii file, one line per record, fixed delimiter</P>
<P>Nothing to do. The same information can come from spooling
<P>common/read_write_cal.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>Checkpoint Environment</P>
<P>common/read_write_ckpt.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>sublist: queues, only names, could be stored as string</P>
<P>Cluster configuration</P>
<P>common/rw_configuration.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>Probably merge with host objects</P>
<P>common/sge_complex.c</P>
<P>Ascii file per complex, one line per complex attribute,
whitespace separated fields</P>
<P>Need rules for spooling of complex attributes. On/Off.
Min, Max, Avg in a certain interval.</P>
<P>common/complex_history.c</P>
<P>Directory for hosts and queues, one file per timestamp,
complex file format</P>
<P>Nothing to do. The same information can come from spooling
<P>common/read_write_host.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>Admin and submit hosts only contain one attribute, the name</P>
<P>Admin, exec and submit hosts are different objects. They should
be merged into one object.</P>
<P>common/read_write_host_group.c</P>
<P>daemons/common/read_write_job.c</P>
<P>Directory structure, multiple binary files (cull packing
<P>Job script is stored separately</P>
<P>daemons/qmaster/read_write_manop.c</P>
<P>Ascii files, one line per user name</P>
<P>Should rather be an attribute of a user object</P>
<P>Ascii files, one line per record, fixed delimiter</P>
<P>No real objects at the moment. But each message has a
structure well suited for storage in database tables.</P>
<P>Parallel Environment</P>
<P>common/read_write_pe.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>sublist: queues, only names, could be stored as string</P>
<P>common/read_write_userprj.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>Usage and long-term usage are sublists, stored as name/value
pairs: cpu, mem, io, finished jobs. Could also be stored as
<P>common/read_write_queue.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P>Qtype is stored as a bitfield, spooled as a list of type
<P>sublists: thresholds (name/value pairs), owner (string list),
user (string list), xuser (string list), subordinates (string
list), complexes (string list), complex_values (name/value
pairs), projects (string list), xprojects (string list)</P>
<P>common/sge_sharetree.c</P>
<P>One ascii file, references by node ids within the file</P>
<P>common/read_write_userprj.c</P>
<P>Ascii file per object, one whitespace separated name/value per
line, special format for project related data</P>
<P>common/read_write_ume.c</P>
<P>common/read_write_userset.c</P>
<P>Ascii file per object, one whitespace separated name/value per
<P STYLE="margin-bottom: 0cm"><BR>
<P STYLE="margin-bottom: 0cm"><BR>
<H2 CLASS="western">Implementation</H2>
<H3 CLASS="western">Types of spooling</H3>
<P>Spooling is done in a certain spooling context.</P>
<P>A spooling context defines how objects are spooled.</P>
<P>Multiple spooling contexts can be used within one process.</P>
<P>Examples of spooling types/destinations:</P>
<LI><P>Ascii file, one record per file, name/value pairs per line</P>
<LI><P>Ascii file, fixed delimiters for objects and attributes</P>
<LI><P>Cull binary file (currently used for jobs, combined with a
sophisticated directory structure).</P>
<LI><P>XML files. They could easily replace the Cull binary file
format, as hierarchies can be implemented in a straightforward and
<LI><P>Database files (e.g. Xbase)</P>
<LI><P>SQL Database</P>
<LI><P>LDAP Repository (for certain objects like users)</P>
<P>Further information stored in a spooling context:</P>
<LI><P>spool historical data (with timestamp) or snapshot</P>
<LI><P>spooling type specific information, e.g. delimiters for ascii
file spooling, file handles, database connections etc. if they are
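<P>A minimal sketch of what such a spooling context could look like
in C. The enum values, the struct fields and both function signatures
are assumptions made for illustration; only the names
create_spooling_context and free_spooling_context appear in this
document.</P>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical spooling types, following the list of examples above. */
typedef enum {
    SPOOL_ASCII_NAME_VALUE,   /* one record per file, name/value per line */
    SPOOL_ASCII_DELIMITED,    /* fixed delimiters for objects and attributes */
    SPOOL_CULL_BINARY,
    SPOOL_XML,
    SPOOL_SQL_DATABASE
} spooling_type;

/* Hypothetical context: describes how objects are spooled. */
typedef struct {
    spooling_type type;
    int historical;   /* 1: timestamped history, 0: snapshot */
    char delimiter;   /* only meaningful for SPOOL_ASCII_DELIMITED */
    void *handle;     /* type specific: file handle, database connection, ... */
} spooling_context;

/* Allocates a context; returns NULL if memory is exhausted. */
spooling_context *create_spooling_context(spooling_type type, int historical)
{
    spooling_context *ctx;

    assert(historical == 0 || historical == 1);

    ctx = malloc(sizeof(*ctx));
    if (ctx == NULL) {
        return NULL;
    }
    ctx->type = type;
    ctx->historical = historical;
    ctx->delimiter = ':';   /* assumed default delimiter */
    ctx->handle = NULL;     /* opened lazily by the type specific code */
    return ctx;
}

/* Frees the context and returns NULL so the caller can reset its pointer. */
spooling_context *free_spooling_context(spooling_context *ctx)
{
    free(ctx);
    return NULL;
}
```

<P>Because all state lives in the context, a process such as the
event client could hold one context for file spooling and a second
one for database spooling at the same time.</P>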
<H3 CLASS="western">Spooling of sublists</H3>
<P>Many Grid Engine object types contain sublists.
<P>In the current implementation, these hierarchical data structures
are stored in different ways:</P>
<LI><P>by referencing other objects using string lists, e.g. the
queue names in pe objects reference queue objects</P>
<LI><P>by using name/value pairs in string lists, e.g. complex
variables set for queues are stored in a string list containing
tuples in the format &lt;name&gt;=&lt;value&gt;</P>
<LI><P>by using special formats within the same ascii file (e.g. the
user object or the sharetree). We should avoid these in the future.</P>
<LI><P>by using the cull binary format as spool file format
including sublists. We should not differentiate between ascii and
cull binary file formats in the future.</P>
<LI><P>by using directory hierarchies (e.g. storing array tasks
within the jobs spool directory). For file based storage, we will
also need them in future implementations.</P>
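<P>The &lt;name&gt;=&lt;value&gt; tuples mentioned above are simple
to take apart. The following is a minimal sketch, not existing Grid
Engine code; the function name parse_name_value and its signature are
assumptions made for illustration.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split one "<name>=<value>" tuple at the first '='.
 * Returns 0 on success and -1 if there is no '=', if the name is empty,
 * or if an output buffer is too small. Buffer sizes include the '\0'.
 */
int parse_name_value(const char *tuple,
                     char *name, size_t name_size,
                     char *value, size_t value_size)
{
    const char *eq;
    size_t name_len, value_len;

    assert(tuple != NULL && name != NULL && value != NULL);

    eq = strchr(tuple, '=');
    if (eq == NULL || eq == tuple) {
        return -1;
    }

    name_len = (size_t)(eq - tuple);
    value_len = strlen(eq + 1);
    if (name_len >= name_size || value_len >= value_size) {
        return -1;
    }

    memcpy(name, tuple, name_len);
    name[name_len] = '\0';
    memcpy(value, eq + 1, value_len + 1);   /* copies the terminator too */
    return 0;
}
```

<P>An empty value (e.g. "arch=") is accepted in this sketch; whether
the real format allows it is not stated here.</P>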
<P>For the new implementation, we'll have to differentiate between
file based formats and database storage.</P>
<P>For file based storage, we should use the following strategies:</P>
<LI><P>when referencing other spooled objects, we should store
unique keys. Lists of such keys can be stored as string lists.</P>
<LI><P>name/value pairs can be stored in string lists in the
existing format &lt;name&gt;=&lt;value&gt;</P>
<LI><P>We'll have to continue the use of directory hierarchies for
job spooling due to limitations of the number of files per
<P>For database storage, we should use the following strategies:</P>
<LI><P>referencing single other objects can be done by storing a
<LI><P>referencing lists of other objects can also be done by
storing a string list of keys, if we want to accept performance
drawbacks for certain queries, e.g. "which pe's contain queue
xyz".<BR>It would be better to use mapping tables, e.g. a table
pe_queues that links queues to pe's. Problem: special keywords like
"all" would have to be handled by either a pseudo queue
"all" or a mapping entry without queue reference.</P>
<LI><P>name/value pairs have to be stored in additional tables. In
certain cases these can be extended mapping tables, e.g. mapping
complex attributes to queues and giving them a value.</P>
<LI><P>The hierarchy job – ja_task – pe_task can be
easily implemented by referencing the hierarchical superior object
in the subordinated object – pe_tasks reference the ja_task,
ja_tasks reference the job.</P>
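<P>To illustrate the mapping table idea, the following sketch models
rows of a pe_queues table in memory and answers the query "which pe's
contain queue xyz". The struct, the function name and the use of a
reserved id for the keyword "all" are assumptions; a real
implementation would issue the equivalent query against the
database.</P>

```c
#include <assert.h>
#include <stddef.h>

/* Assumed reserved id standing for the keyword "all" (pseudo queue). */
#define QUEUE_ALL (-1)

/* One row of a hypothetical pe_queues mapping table. */
typedef struct {
    int pe_id;
    int queue_id;   /* id of a queue, or QUEUE_ALL */
} pe_queue_row;

/*
 * Collect the ids of all pe's that contain the given queue, either
 * through an explicit row or through an "all" row. Each pe is reported
 * once. Returns the number of ids written to out (at most out_size).
 */
size_t pes_containing_queue(const pe_queue_row *rows, size_t n_rows,
                            int queue_id, int *out, size_t out_size)
{
    size_t i, j, count = 0;

    assert(rows != NULL || n_rows == 0);
    assert(out != NULL || out_size == 0);

    for (i = 0; i < n_rows; i++) {
        if (rows[i].queue_id != queue_id && rows[i].queue_id != QUEUE_ALL) {
            continue;
        }
        /* skip pe's that were already reported */
        for (j = 0; j < count; j++) {
            if (out[j] == rows[i].pe_id) {
                break;
            }
        }
        if (j == count && count < out_size) {
            out[count++] = rows[i].pe_id;
        }
    }
    return count;
}
```

<P>With a string list of keys, the same question would require
parsing the queue list of every pe, which is the performance drawback
mentioned above.</P>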
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#000000" CELLPADDING=4 CELLSPACING=0>
<P>reference type</P>
<P>current implementation</P>
<P>referencing objects</P>
<P>object id from cull</P>
<P>object id from cull</P>
<P>object id, either from cull or database internal serial number</P>
<P>list of references</P>
<P>string list or cull sublist</P>
<P>name/value pairs</P>
<P>string list or cull sublist</P>
<P>mapping table with value</P>
<P>subordinate objects</P>
<P>special format or spool in cull binary format</P>
<P>break up such hierarchies (e.g. possible in the user object)
or store data in additional files or directory structure and
reference these files</P>
<P>store them in additional tables and make them reference their
<P>directory hierarchy</P>
<P>directory hierarchy</P>
<P>subordinate objects reference superior objects</P>
<P STYLE="margin-bottom: 0cm"><BR>
<H3 CLASS="western">Spooling policies dependent on component</H3>
<H4 CLASS="western">Current implementation</H4>
<P>In the current implementation we have different spooling policies
depending on the component that does the spooling.</P>
<P>The main spooling component is the qmaster.</P>
<P>Execd also spools jobs and related information, e.g.
queues or parallel environment information.
<P>The related information reflects the status of the spooled object
at the time the job was delivered to execd.</P>
<P>Execd may also spool other job attributes than qmaster does.</P>
<H4 CLASS="western">Suggestions for a new implementation</H4>
<P>Different approaches are possible to address this issue. The
following sections discuss some ideas.</P>
<H5 CLASS="western">Multiple writing instances to one global database</H5>
<P>All daemons use a common database. The execds can write directly
to the database. Qmaster is notified about changes by the database.</P>
<LI><P>Reduced message transfer volume between qmaster and execd</P>
<LI><P>Reduced spooling overhead in qmaster</P>
<LI><P>More accurate data in the database, as data doesn't have to
<LI><P>Danger of inconsistencies between data in qmaster and data in
the database. This problem exists with any implementation, but most
probably qmaster should be the instance that holds the most recent
<LI><P>Scalability issues. It takes away the possibility of local
<P>Probably not an option for the near future.</P>
<H5 CLASS="western">Restrict to file spooling in execd</H5>
<P>Each execd has its own area for spooling, usually file based,
either on a local disk (recommended) or via NFS mount.</P>
<P>Use formats that allow the spooling of hierarchical data, i.e.
either cull binary format or XML format.</P>
<P>As execd spools information in a different way than qmaster (not
all attributes or other attributes, a different strategy for
sublists), the spooling implementation has to provide means to
override the spooling strategies defined as default for certain
object types, or two spooling strategies have to be defined for
these object types.</P>
<LI><P>spooling load can be easily distributed by using local file
<LI><P>execd is the only instance that needs to spool hierarchical
data not normalized, as the sub objects that have to be spooled are
only valid for the lifetime of the only spooled object types (job
<LI><P>Different spooling strategies within one cluster have to be
<LI><P>spooling remains a bottleneck when NFS has to be used for
some reason, e.g. diskless compute engines</P>
<LI><P>on very big SMP machines (some hundred processors) spooling
could become a bottleneck due to slow file spooling</P>
<H3 CLASS="western">Cull enhancements</H3>
<H4 CLASS="western">Definition of attributes</H4>
<P>The cull definition will have to state which fields have to be
spooled and how sublists are spooled.</P>
<P>Replace the many similar definitions for the same data type by a
combination of flags. Example:</P>
<P>We have now 14 definitions for the string datatype (SGE_STRING,
SGE_STRINGH, SGE_STRING_HU, SGE_KSTRING, ...)</P>
<P>A list element definition like
<P>SGE_KULONGH(JB_job_number)</P>
<P>could be replaced by
<P>SGE_ULONG(JB_job_number, HASH | UNIQUE | SPOOL | QIDL_K)</P>
<P>SGE_LIST_ELEMENT(JB_job_number, ULONG | HASH | UNIQUE | SPOOL |
<P>A keyword DEFAULT could be used, if no special settings are done
<P>The descriptor field mt has lots of free space (currently only 4
bits of a 32 bit integer are used for the data types) that could hold
the following additional information:</P>
<LI><P>ARRAY <BR>For an array implementation (optionally to be done
in a separate step)</P>
<LI><P>HASH<BR>Enable hashing for the field.</P>
<LI><P>UNIQUE<BR>Attribute has unique values within one list. This
is at the moment only checked for attributes that have hashing
enabled, but could be extended to any operations setting values.</P>
<LI><P>SPOOL<BR>Shall the attribute be spooled.</P>
<LI><P>SHOW<BR>Shall the attribute be shown (e.g. in qconf -s*,
<LI><P>CONFIG<BR>Shall the attribute be configurable, i.e. be
contained in the temporary files created for qconf -m* operations or
for qconf -mattr operations</P>
<P>Probably we should use a prefix like CULL_ or SGE_ to ensure
uniqueness, e.g. CULL_HASH instead of HASH.</P>
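<P>A minimal sketch of how data type and flags could share one 32 bit
mt value. The numeric values and the helper functions mt_make,
mt_type and mt_has are assumptions made for illustration and are not
taken from the cull sources; only the flag names come from the list
above.</P>

```c
#include <assert.h>
#include <stdint.h>

/* Low 4 bits: data type. The codes are illustrative, not the real ones. */
#define CULL_TYPE_MASK 0x0000000Fu
#define CULL_ULONG     0x00000003u
#define CULL_STRING    0x00000008u

/* Remaining bits: attribute flags, using the suggested CULL_ prefix. */
#define CULL_HASH      0x00000010u
#define CULL_UNIQUE    0x00000020u
#define CULL_SPOOL     0x00000040u
#define CULL_SHOW      0x00000080u
#define CULL_CONFIG    0x00000100u

/* Combine a data type and a set of flags into one mt value. */
uint32_t mt_make(uint32_t type, uint32_t flags)
{
    assert((type & ~CULL_TYPE_MASK) == 0);   /* type must fit into 4 bits */
    assert((flags & CULL_TYPE_MASK) == 0);   /* flags must not touch the type */
    return type | flags;
}

/* Extract the data type from an mt value. */
uint32_t mt_type(uint32_t mt)
{
    return mt & CULL_TYPE_MASK;
}

/* Returns 1 if the given flag is set in mt, 0 otherwise. */
int mt_has(uint32_t mt, uint32_t flag)
{
    return (mt & flag) != 0;
}
```

<P>Existing code that only looks at the data type would have to mask
the value with CULL_TYPE_MASK (or call a helper like mt_type) once
flags are stored in the same field.</P>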
<H4 CLASS="western">Tracking of changed attributes</H4>
<P>To be able to interface a database using mechanisms like SQL, each
object must know which attributes have changed. Otherwise, the whole
object has to be spooled on each spooling function call, even if only
a few attributes have been changed or the object hasn't been changed
at all.</P>
<P>This could be achieved by making a struct around the lMultiType
enum type and reserving "one bit" for the changed
<P>Alternatively, a bitfield containing this information could be
added to the lListElem data type, which would consume less memory.</P>
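<P>The bitfield variant could look roughly like the following sketch.
The struct list_elem is a simplified stand-in for lListElem, and the
functions elem_set and elem_spool_changes are hypothetical; the sketch
also assumes that an object has at most 32 attributes.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ATTRS 32   /* assumption: one 32 bit word is sufficient */

/* Simplified stand-in for a list element with change tracking. */
typedef struct {
    long values[MAX_ATTRS];   /* placeholder for the real attribute storage */
    uint32_t changed;         /* bit i set: attribute i changed since last spool */
} list_elem;

/* Set an attribute and remember the change, but only if the value differs. */
void elem_set(list_elem *ep, unsigned pos, long value)
{
    assert(ep != NULL && pos < MAX_ATTRS);
    if (ep->values[pos] != value) {
        ep->values[pos] = value;
        ep->changed |= (uint32_t)1 << pos;
    }
}

/*
 * Pass every changed attribute to write_attr (if given), then clear the
 * change bits. Returns the number of attributes that had changed.
 */
int elem_spool_changes(list_elem *ep,
                       void (*write_attr)(unsigned pos, long value, void *arg),
                       void *arg)
{
    unsigned pos;
    int written = 0;

    assert(ep != NULL);
    for (pos = 0; pos < MAX_ATTRS; pos++) {
        if (ep->changed & ((uint32_t)1 << pos)) {
            if (write_attr != NULL) {
                write_attr(pos, ep->values[pos], arg);
            }
            written++;
        }
    }
    ep->changed = 0;
    return written;
}
```

<P>A database spooling implementation could use the callback to build
an UPDATE statement that names only the changed columns, and skip the
object entirely when nothing has changed.</P>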
<H4 CLASS="western">Attribute names</H4>
<P>A set of attribute names is generated using the NAMEDEF macros
for each object type.</P>
<P>These attribute names have very limited use in the current
implementation – they are only used for debugging purposes
(lWrite* function calls).</P>
<P>For spooling, information output and configuration changes we also
need attribute names. These names are at the moment hardcoded in the
spooling, output and parsing functions.</P>
<P>It would be better to extend the existing NAMEDEF macros to
create struct objects containing both the internal attribute name and
an attribute name to be used for the other purposes.</P>
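<P>One possible shape for such struct objects, sketched with a
generic table macro instead of the real NAMEDEF macros, whose exact
form is not shown here. The macro JOB_ATTRS, the struct attr_name, the
lookup function and the external names "job_number" and "owner" are
all assumptions made for illustration.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute table: internal id and external name per row. */
#define JOB_ATTRS(X) \
    X(JB_job_number, "job_number") \
    X(JB_owner,      "owner")

/* Attribute ids, one enumerator per table row. */
#define AS_ENUM(id, ext) id,
enum { JOB_ATTRS(AS_ENUM) JB_ATTR_COUNT };

/* Both names of an attribute, generated from the same table. */
typedef struct {
    int nm;                      /* attribute id */
    const char *internal_name;   /* e.g. for debugging output */
    const char *external_name;   /* e.g. for spooling, output, parsing */
} attr_name;

#define AS_NAME(id, ext) { id, #id, ext },
static const attr_name job_attr_names[] = { JOB_ATTRS(AS_NAME) };

/* Look up an attribute by its external name; returns NULL if unknown. */
const attr_name *attr_by_external_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof job_attr_names / sizeof job_attr_names[0]; i++) {
        if (strcmp(job_attr_names[i].external_name, name) == 0) {
            return &job_attr_names[i];
        }
    }
    return NULL;
}
```

<P>Since ids and names come from one table, spooling, output and
parsing functions would no longer need their own hardcoded copies of
the attribute names.</P>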
<H3 CLASS="western">Functions</H3>
<P>create_spooling_context</P>
<P>free_spooling_context</P>
<P>spool_attribute</P>
<H3 CLASS="western">Installation issues</H3>
<P>Provide an install_monitoring script to set up the event client and
its spooling configuration.</P>
<P>In qmaster install, decide which spooling type to use, with type
specific further actions (for SQL database, query user for parameters
and test the database).</P>
<P STYLE="margin-bottom: 0cm"><BR>
<H2 CLASS="western">Implementation proposal</H2>
<P>The implementation can be done in separate steps that can each
face thorough testing. Time estimates are net times and include
documentation and testing.</P>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#000000" CELLPADDING=4 CELLSPACING=0>
<TH WIDTH=79% BGCOLOR="#e6e6ff">
<TH WIDTH=21% BGCOLOR="#e6e6ff">
<P>est. time [weeks]</P>
<TD WIDTH=79% VALIGN=TOP>
<P>implement the suggested cull object definition changes</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>implement tracking of attribute changes</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>implement file based spooling. Restrict to the following text
<LI><P>one record per file, name/value pairs per line</P>
<LI><P>fixed delimiters for objects and attribute values</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="3" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>add a compile time switch that makes qmaster use the new
spooling functions for some selected object types. For test
purposes only.</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="1" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>implement database storage</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="8" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>create an event client that subscribes to all events for all
object types and spools them to a database</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>do extensive tests with qmaster using some of the new spooling
functions to files and the event client attached, continue tests
during the next phases.</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP BGCOLOR="#e6e6e6">
<P><I><B>Sum essential steps</B></I></P>
<TD WIDTH=21% VALIGN=BOTTOM BGCOLOR="#e6e6e6" SDVAL="20" SDNUM="1023;">
<P ALIGN=RIGHT>20</P>
<TD WIDTH=79% VALIGN=TOP>
<P>make qmaster and execd use the new spooling framework (compile
time option), test different spooling strategies</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="4" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>make new spooling framework the default, create means to
configure spooling strategies during the installation process
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>create install_monitoring that will install the event client
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="1" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>create means to update the database structure, to back up data
and to purge outdated information</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>build clients that use the database as source of information
instead of qmaster (qhost, qstat, qacct)</P>
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP>
<P>change qconf and qalter to use the new spooling framework for
reading information and for creating and processing the data to
<TD WIDTH=21% VALIGN=BOTTOM SDVAL="2" SDNUM="1023;">
<TD WIDTH=79% VALIGN=TOP BGCOLOR="#e6e6e6">
<P><I><B>Sum additional steps</B></I></P>
<TD WIDTH=21% VALIGN=BOTTOM BGCOLOR="#e6e6e6" SDVAL="13" SDNUM="1023;">
<P ALIGN=RIGHT>13</P>