
\documentclass[a4paper]{jpconf}

\usepackage{graphicx}

\begin{document}

\title{MAUS: MICE Analysis User Software \jpcs}

\author{M. Jackson (EPSRC)\footnote{We would like to acknowledge the assistance of the Software Sustainability Institute. The work carried out by the SSI is supported by EPSRC through grant EP/H043160/1.}, D. Rajaram (IIT) and C.D. Tunnell (U. Oxford)}

\address{}

\ead{michaelj@epcc.ed.ac.uk, \{durga,tunnell\}@fnal.gov}

\begin{abstract}
The Muon Ionization Cooling Experiment (MICE) has developed the MICE Analysis User Software (MAUS) to simulate and analyze experimental data. It serves as the primary codebase for the experiment and provides, for example, online data quality checks and offline batch simulation and reconstruction. The code is structured in a framework inspired by Map-Reduce to allow parallelization, whether on a personal machine or in the control room. Software engineering practices from industry are used to ensure correct and maintainable physics code; these include unit, functional and integration tests, continuous integration and load testing, code reviews, and distributed version control. Lastly, several smaller design decisions may be of interest to other experiments: using JSON as the data structure, using SWIG to allow developers to write components in either Python or C++, and using the Python-based SCons build system.
\end{abstract}

%\section{Introduction}
%1. Overview of MICE goals\\
%2. MICE software requirements and goals\\
%3. Software framework\\
%4. Software components\\
%5. Data structure\\
%6. Lessons from Industry \\
%6. Conclusions\\

\section{The MICE Experiment}

The Muon Ionization Cooling Experiment (MICE) is based at the Rutherford Appleton Laboratory and aims to demonstrate the reduction of phase space for muon beams. This R\&D is important for ensuring efficient operation of proposed accelerators that use muon beams (see \cite{c:IDR} and references therein). The R\&D program has proceeded in phases: over the last decade, numerous construction phases have been interleaved with running periods. Because the experiment is constantly changing and such R\&D has inherently long time scales, the software used for the experiment must remain correct and maintainable over the long term. This goal is complicated by the five detector technologies used within the experiment.

\section{Software Requirements}

The MICE analysis software must serve simultaneously as particle physics code and accelerator physics code. The particle physics functionality includes features such as simulating the electronics response or reconstructing tracks, whilst the accelerator physics functionality requires the computation of transfer matrices and Twiss parameters. Both types of functionality require knowledge of, for example, the magnetic fields and geometry, and are therefore handled within a single software package.

The requirements imposed upon the software were previously addressed by the G4MICE package \cite{g4mice}, created in 2002. Test coverage and documentation were missing for much of the code base, making development, use and verification of the code challenging. This is a frequent problem in both physics and industry. The extraction principle outlined in \cite{refactoring} was used to refactor the code, where \emph{refactoring} is the systematic process of restructuring code to address changing specifications.

Since lost expertise made it inefficient to attempt to understand the previous code, that code was frozen so that no changes were allowed, and wrappers were written so that it could interface with a new Python framework. Small pieces of new code were then gradually written to replace the old frozen functionality. This new code had to meet quality requirements: good comments, adherence to a style guide, and tests. This made it possible to slowly improve the quality and maintainability of the code base while retaining existing functionality.

\section{MAUS}

The MICE Analysis User Software (MAUS) has been the official MICE software since 2010 and has prepared MICE for more complex data-taking scenarios with higher data rates \cite{maus}. The goal was to restructure the code into a data flow inspired by Map-Reduce \cite{mapreduce} in order to simplify the interfaces that developers have to follow and to aid running the code in parallel. It was felt that Map-Reduce parallelizes particle physics problems in a useful fashion, but the API was simplified to use \emph{transformers} and \emph{mergers} instead of maps and reduces.

The basic unit of information is the ``spill'', which corresponds to a single beam extraction. Spills are independent, which simplifies parallelization. For example, a ``transform'' could process a spill by converting the binary DAQ output to a processable data structure and then applying a track-fitting routine. The same can be done for a Monte Carlo (MC) simulation of the apparatus. A ``merger'' provides functionality that requires access to the entire data set of a single spill: evolution of parameters over time, making histograms, and so forth.
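
The division of labour between transformers and mergers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MAUS code: the function names \verb"transform_spill", \verb"merge_spills" and \verb"run" and the field \verb"n_particles" are hypothetical, and a thread pool from the Python standard library stands in for whatever parallel back end is actually used.

\begin{verbatim}
import json
# Thread-based pool from the standard library
from multiprocessing.dummy import Pool


def transform_spill(spill_text):
    """Transformer: acts on one spill only."""
    spill = json.loads(spill_text)
    particles = spill.get("mc_particles", [])
    # Hypothetical derived field
    spill["n_particles"] = len(particles)
    return json.dumps(spill)


def merge_spills(spill_texts):
    """Merger: accumulates over the spills."""
    total = 0
    for text in spill_texts:
        total += json.loads(text)["n_particles"]
    return total


def run(spill_texts, workers=2):
    # Spills are independent, so the transform
    # step can be shared between workers.
    pool = Pool(workers)
    try:
        out = pool.map(transform_spill,
                       spill_texts)
    finally:
        pool.close()
        pool.join()
    return merge_spills(out)


if __name__ == "__main__":
    spills = [
        json.dumps({"mc_particles": [{}, {}]}),
        json.dumps({"mc_particles": [{}]}),
    ]
    print(run(spills))
\end{verbatim}

Because each call to \verb"transform_spill" touches only its own spill, the order in which the workers finish does not affect the merged result.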

JSON is used as the data structure that represents a spill, which helps developers extend it and users understand it. An example spill input is:

\begin{verbatim}
{
  "mc_particles": [
    {
      "primary": {
        "energy": 210.0,
        "momentum": {
          "x": 0.0,
          "y": 0.0,
          "z": 1.0
        },
        "particle_id": 13,
        "position": {
          "x": 0.0,
          "y": -0.0,
          "z": -5000.0
        },
        "random_seed": 10,
        "time": 0.0
      }
    }
  ]
}
\end{verbatim}

\begin{figure*}[tb]
\centering
\includegraphics*[width=168mm]{outfile}
\caption{A visual representation of a MAUS control macro that illustrates the data flow.}
\label{dataflow}
\end{figure*}

\noindent
and an example macro that controls MAUS is:

\begin{verbatim}
import MAUS

# File with particles to simulate
my_input = MAUS.InputJSON("evts.json")

# Create an empty array of maps, then
# populate it with the functionality you
# want to use.
my_map = MAUS.MapGroup()

# Add Geant4 Monte Carlo simulation
my_map.append(MAUS.MapSimulation())

# Add electronics models
my_map.append(MAUS.MapTOFDigitization())
my_map.append(MAUS.MapTrackerDigitization())

# Create set of standard demo plots
my_reduce = MAUS.ReduceMakeDemoPlots()

# Where to save output?
filename = 'simulation.out'

# Create uncompressed file object.
# 'w' means write.
output_file = open(filename, 'w')

# Then construct a MAUS output component
my_output = MAUS.OutputJSON(output_file)

# Go() drives all the components you
# pass in; check the file defined
# above for the output
MAUS.Go(my_input, my_map,
        my_reduce, my_output)
\end{verbatim}

\noindent
where the data flow in MAUS is illustrated in Fig.~\ref{dataflow}. The macro language is Python, but components can be written in either Python or C++.
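
To give a feeling for what a component written in Python might look like, the sketch below defines a transformer that reads a spill in the JSON layout of the example input and attaches a derived quantity to each primary particle. It is a hedged illustration only: the class name \verb"MapAddRadius", the method name \verb"process" and the output field \verb"radius" are assumptions made for this example and are not taken from the MAUS API.

\begin{verbatim}
import json
import math


class MapAddRadius(object):
    """Hypothetical transformer, not part
    of MAUS."""

    def process(self, spill_text):
        # One spill in, one spill out
        spill = json.loads(spill_text)
        parts = spill.get("mc_particles", [])
        for particle in parts:
            primary = particle["primary"]
            pos = primary["position"]
            # Transverse distance from the
            # z axis, in the input's units
            primary["radius"] = math.hypot(
                pos["x"], pos["y"])
        return json.dumps(spill)
\end{verbatim}

Since the component only exchanges JSON text with its surroundings, it can be exercised on its own by passing it a string such as the example spill, without running the rest of the chain.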

SWIG \cite{swig} is used to automatically create Python bindings to C++ code. The experience with SWIG has been mixed: it works well for well-defined APIs such as the one defined by the transforms and merges, but it proved difficult to use SWIG to expose common C++ routines that would have been useful within Python code (e.g.\ magnetic fields). The Boost.Python libraries seem to be easier to use for the HEP use case, but the difficulty of installing them made them infeasible to use.

\section{Applying Lessons from Industry}

Knowledge gained within industry can be applied to help a physics software project run more smoothly. Various industry practices were tested in developing MAUS.

\subsection{Project Management and Issue Tracker}

A paradigm shift in how the code was written came naturally with the change in how the project was managed. Before any code was written, a project management website that included a wiki and issue tracker was set up using the Redmine software. This allowed people to keep track of task assignments, current bugs, and feature requests. It is a vital tool for establishing the status of various blocks of work while also providing a useful historical record and institutional memory. The decision to use Redmine appears to have been a good one: its ease of use allowed this project management tool to be extended to other parts of the MICE project.

\subsection{Code Reviews}

Code reviews are standard practice in industry but rare within physics, mostly due to the manpower limitations of software projects. In MAUS, new code requires an hour of review before entering the trunk with other communal code. In addition to tracking down bugs, the review process helps spread knowledge of the project between developers. It helps people learn from one another while decreasing the reliance on specific developers.

\subsection{Static Code Analysis}

Static code analyzers such as Coverity \cite{coverity} were used. The purpose of this type of tool is to inspect code to determine whether there are conditions the code can enter that may lead to unexpected behavior. For example, if a variable is not initialized and there is a code path in which its use would lead to a segmentation fault, the tool alerts the user.

The static code analyzer finds problems of varying degrees of severity, and human intervention is required to categorize them. It is therefore inefficient to use static code analyzers on legacy code, since processing the wide range of reported errors takes an unrealistic amount of time. The optimal use case is to rectify only the problems observed in new code, thus incrementally improving the code base.

However, it was decided to abandon static code analysis since the gains did not merit the time required.

\subsection{Tests and Continuous Integration}

Unit tests are small pieces of code that test other small pieces of code. They are meant to be granular, deterministic, and repeatable. Their purpose is to let developers know if they have broken preexisting code. These tests aid in creating releases since one can verify that the code is still functional. If a bug is found, a new test is added to make sure the bug never resurfaces. This type of development also allows one to quickly narrow down the source of a problem. The unit test coverage within MAUS has been useful for developers to know when a piece of code has broken but, most importantly, unit tests help remove the fear of changing code that does not ``belong'' to them.
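
As an illustration of what such a test can look like, the following sketch uses the \verb"unittest" module from the Python standard library. The function under test, \verb"transverse_momentum", is a hypothetical stand-in invented for this example and is not taken from the MAUS code base.

\begin{verbatim}
import math
import unittest


def transverse_momentum(px, py):
    """Hypothetical function under test."""
    return math.hypot(px, py)


class TransverseMomentumTest(
        unittest.TestCase):

    def test_known_value(self):
        # A 3-4-5 triangle gives an exact
        # reference value
        self.assertAlmostEqual(
            transverse_momentum(3.0, 4.0),
            5.0)

    def test_zero(self):
        self.assertEqual(
            transverse_momentum(0.0, 0.0),
            0.0)

    def test_sign_independent(self):
        # The result must not depend on the
        # signs of the components
        self.assertAlmostEqual(
            transverse_momentum(-3.0, 4.0),
            transverse_momentum(3.0, -4.0))
\end{verbatim}

Saved in a file, tests of this kind can be run with \verb"python -m unittest" followed by the module name. Each test is small, deterministic and independent of the others, so a failure points directly at the behavior that changed.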

The entire system is checked by integration tests that execute at the application level. For example, a high-statistics simulation could be used to verify that physics quantities have not changed within statistical uncertainties. These have proven to be the most useful tests since they help ensure that the physics does not change.
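
A check of this kind can be sketched as follows. This is a minimal example under stated assumptions rather than the procedure used in MAUS: the function names, the choice of the sample mean as the physics quantity, the Gaussian toy sample and the three-standard-error tolerance are all chosen here for illustration.

\begin{verbatim}
import math
import random


def mean_and_error(values):
    """Sample mean and its standard error."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2
              for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


def is_compatible(values, reference,
                  n_sigma=3.0):
    """True if the sample mean agrees with
    the reference within n_sigma standard
    errors."""
    mean, error = mean_and_error(values)
    return abs(mean - reference) <= \
        n_sigma * error


if __name__ == "__main__":
    # Fixed seed so that the toy sample is
    # repeatable from run to run
    rng = random.Random(10)
    sample = [rng.gauss(200.0, 5.0)
              for _ in range(10000)]
    print(is_compatible(sample, 200.0))
\end{verbatim}

In practice the reference value would come from a previous, validated release, and the tolerance would be chosen to balance false alarms against sensitivity to real changes.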

Jenkins \cite{jenkins} performs continuous integration tests of the code. This tool runs the test suite in a number of different installation environments every time code is committed. A distributed version control system called Bazaar \cite{bzr} is used and code from every user is tested before it becomes communal.

These tools have been vital to the project since developers are alerted to broken code. Jenkins compiles and tests the code on a wide range of Linux and Mac platforms to help ensure that the code can be deployed to any of these systems. Continuous integration complements unit and integration tests because running the tests frequently lets developers know quickly where and when a problem was introduced into the code base.

\subsection{Release Cycle}

Code that has been tested as described above is periodically released. Major releases occur every few months and minor releases are biweekly. The limiting factor on the time scale for minor releases is how long it takes to develop and test new code. This quick release cycle means that bugs are resolved quickly.

\section{Future}

The MAUS effort within the MICE experiment has proven to be a successful collaboration between physicists and software engineers. There are currently about ten active developers working as a team and using a wide range of tools and methods. A wealth of knowledge and experience exists in the software engineering community, and taking advantage of it has helped MICE.




\end{document}
