\label{chapter:globalpid}
The global PID framework uses sets of PID variables for two tasks: 1) creating PDFs of these variables from MC data for a range of particle hypotheses, and 2) using those PDFs in a log-likelihood method to determine the PID of global tracks reconstructed from data. The framework is designed so that new PID variables can be added as they are developed. Section 1 of this document explains how to use the PID framework to produce PDFs, and how to perform PID on spill data contained within a Json document. Section 2 details how these two actions are performed within the code. Section 3 discusses the PID variables: their structure, how new ones can be added to the framework, and details of those already in place. This document will be updated as the PID framework and variables continue to be developed.
\section{Using the PID scripts}
\subsection{Producing PDFs}
Although the PID framework comes with PDFs provided in PIDhists.root, a user can also produce PDFs for hypotheses not included in this file. The following describes how this should be done.
\includegraphics[width=2in]{reconstruction/globalpid/pdfprodflow.pdf}
\caption{Steps involved in producing a PDF from MC data}
\item Simulation: Production of MC data for a given particle.
\item Global Reconstruction: The MC data should then be passed through
the global reconstruction. At present, this step adds detector information to global tracks. Simulation and global reconstruction can be performed
by using the simulate\_global.py script in \\
\$\{MAUS\_ROOT\_DIR\}/bin/Global. As well as performing the simulation
and the TOF and Tracker reconstruction, this script also calls the mappers
MapCppGlobalReconImport and MapCppGlobalTrackMatching, which import the detector
information into the global event and then construct the global tracks required
for the calculation of PID variables. The control variable that specifies
the name of the output Json file containing the reconstructed tracks can be set at the command line, or by
using a datacard, as shown in listing~\ref{globaldatacard}. To run
the global reconstruction with the datacard, the following should be entered at the command line:
> ${MAUS_ROOT_DIR}/bin/Global/simulate_global.py \
--configuration_file <name_of_datacard>
which for the example in listing~\ref{globaldatacard} would be:
> ${MAUS_ROOT_DIR}/bin/Global/simulate_global.py \
--configuration_file ex_global_datacard.py
\item PDF Production: To produce the PDFs from the reconstructed MC
data, pid\_pdf\_generator.py in \$\{MAUS\_ROOT\_DIR\}/bin/Global
is then used. This script calls the reducer
ReduceCppGlobalPID. It is run with a datacard, such as that shown
in listing~\ref{pdfdatacard}, which includes the input Json filename, the global\_pid\_hypothesis for which the PDF(s) are to be produced, and a unique\_identifier (typically the time and date at which the script is run), by entering at the command line:
> ${MAUS_ROOT_DIR}/bin/Global/pid_pdf_generator.py \
--configuration_file example_pdf_datacard.py
This will create a directory within \$\{MAUS\_ROOT\_DIR\}/files/PID corresponding to the hypothesis and identifier given by the datacard. The directory will contain one file per PID variable, each holding the PDF for that hypothesis and variable.
\vspace*{1\baselineskip}
\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,breaklines=true,captionpos=b,caption={An example datacard (ex\_global\_datacard.py) for use with GlobalReconImport.py. Other configuration flags can be added to this datacard if they need to differ from those set in ConfigurationDefaults.py},label=globaldatacard]
# A json document containing spills from MC data
input_json_file_name = "example_hypothesis.json"
input_json_file_type = "text"

# The json document that the global tracks will be
# written to
output_json_file_name = \
    "example_hypothesis_Global_Recon.json"
output_json_file_type = "text"
\end{lstlisting}
\vspace*{2\baselineskip}
\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,captionpos=b,caption={An example datacard (example\_pdf\_datacard.py) for use with pid\_pdf\_generator.py},label=pdfdatacard]
import datetime

# Use the current time and date as a unique
# identifier when creating files to contain PDFs.
# A unique_identifier is required by the reducer,
# and PDF production will fail without one.
now = datetime.datetime.now()
unique_identifier = \
    now.strftime("%Y_%m_%dT%H_%M_%S_%f")

# A json document containing global tracks from MC
input_json_file_name = \
    "example_hypothesis_Global_Recon.json"
input_json_file_type = "text"

# The particle hypothesis that the PDF is being
# created for. A global_pid_hypothesis is required
# by the reducer, and PDF production will fail
# without one.
global_pid_hypothesis = "example"
\end{lstlisting}
\subsection{Performing PID with pre-existing hypotheses}
To perform PID on data, the steps shown in figure~\ref{pidperf} should be followed.
\includegraphics[width=2in]{reconstruction/globalpid/pidperfflow.pdf}
\caption{Steps involved in performing the PID for a data sample}
\item Data: This can be experimental or MC data, but the spill data must be passed to the PID in a Json document.
\item Global Reconstruction: In the same way as described above, the
data should then be passed through the global reconstruction,
currently using the GlobalReconImport.py script in \$\{MAUS\_ROOT\_DIR\}/bin/Global,
with a corresponding datacard containing the name of the input Json file and the name of the output file.
\item Global PID: To perform the PID on the reconstructed data, GlobalPID.py in \$\{MAUS\_ROOT\_DIR\}/bin/Global
is then used. This script calls the
MapCppGlobalPID mapper. It is run with a datacard, such as that
shown in listing~\ref{piddatacard}, which includes the input and output Json filenames, by entering the following at the command line:
> ${MAUS_ROOT_DIR}/bin/Global/GlobalPID.py \
--configuration_file example_pid_datacard.py
\begin{lstlisting}[language=Python,basicstyle=\ttfamily,breaklines=true,frame=single,captionpos=b,caption={An example datacard (example\_pid\_datacard.py) for use with GlobalPID.py},label=piddatacard]
# A json document containing spills from data
input_json_file_name = \
    "example_hypothesis_Global_Recon.json"
input_json_file_type = "text"

# The json document that the global tracks will be
# written to
output_json_file_name = \
    "example_hypothesis_Global_PID.json"
output_json_file_type = "text"
\end{lstlisting}
As the framework currently stands, the output document will contain the global tracks with the PID set (where it has been possible to do so) to whichever particle hypothesis has the highest log-likelihood. For tracks where the PID could not be determined, the track PID is left as 0.
\section{MapCppGlobalPID and ReduceCppGlobalPID}
\subsection{MapCppGlobalPID}
The steps taken in MapCppGlobalPID for a single track are shown in
figure~\ref{mapflow}. In more detail: the data, having passed through the global reconstruction, is passed to the PID. For each track, the value of each PID variable is calculated. Each value is then compared to the corresponding PDF for every particle hypothesis; the content of the bin in which the value falls provides the probability from which the log-likelihood is calculated. For each particle hypothesis, the log-likelihoods of all of the PID variables are summed to give a log-likelihood for that hypothesis. The PID of the track is then obtained by comparing the log-likelihoods of the hypotheses.
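
The per-track procedure described above can be sketched in Python as follows. This is a minimal illustration only, not the MapCppGlobalPID code: the function names (bin\_index, identify), the dictionary layout used for the PDFs, and the numerical values in the usage example are all assumptions made for this sketch. A variable whose value falls outside its allowed range is skipped, and a track with no usable variables is left unidentified (returned here as None).

\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,breaklines=true]
import math


def bin_index(value, lower, upper, n_bins):
    """Index of the bin holding value, or None if out of range."""
    if not lower <= value < upper:
        return None
    index = int((value - lower) / (upper - lower) * n_bins)
    return min(index, n_bins - 1)  # guard against rounding


def identify(pdfs, ranges, values):
    """Return the hypothesis with the highest summed log-likelihood.

    pdfs[hypothesis][variable] is a list of normalised bin contents,
    ranges[variable] is (lower, upper), values[variable] is the
    value calculated for the track (hypothetical layout).
    """
    totals = {hypothesis: 0.0 for hypothesis in pdfs}
    n_used = 0
    for variable, value in values.items():
        lower, upper = ranges[variable]
        n_bins = len(next(iter(pdfs.values()))[variable])
        index = bin_index(value, lower, upper, n_bins)
        if index is None:
            continue  # value outside allowed range: not used
        n_used += 1
        for hypothesis in pdfs:
            prob = pdfs[hypothesis][variable][index]
            if prob > 0.0:
                totals[hypothesis] += math.log(prob)
            else:
                totals[hypothesis] = float("-inf")
    if n_used == 0:
        return None  # PID could not be determined
    return max(totals, key=totals.get)


# Made-up three-bin PDFs of one variable for two hypotheses.
pdfs = {"muon": {"tof": [0.1, 0.7, 0.2]},
        "pion": {"tof": [0.6, 0.3, 0.1]}}
ranges = {"tof": (20.0, 50.0)}
print(identify(pdfs, ranges, {"tof": 35.0}))
\end{lstlisting}

Summing logarithms rather than multiplying probabilities keeps the comparison numerically stable when many variables are combined.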
\includegraphics[width=3in]{reconstruction/globalpid/PIDflow.pdf}
\caption{Flow chart detailing steps taken in MapCppGlobalPID}
\subsection{ReduceCppGlobalPID}
The steps taken in ReduceCppGlobalPID are shown in figure~\ref{reduceflow}. MC data for a given particle hypothesis, having passed through the global reconstruction, is passed to the PID. For each track, the value of each PID variable is calculated, and a histogram is filled with these values. If the behaviour has been turned on in the PID variable class, a single event is spread over all bins in the histogram, to ensure that when the PDF is used by the PID there will be no empty bins, thus avoiding cases where the log-likelihood would take the log of zero. The histogram is then normalised to create the PDF, which is written to file.
If an MC track returns a variable value outside the allowed range of the histogram (as defined within the variable class), the value for that track is not included.
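
The histogram handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the ReduceCppGlobalPID code: the function name make\_pdf, its arguments, and the use of a plain list in place of a histogram object are choices made for this sketch only.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,breaklines=true]
def make_pdf(values, lower, upper, n_bins, non_zero_entries=True):
    """Build a normalised, binned PDF from a list of values."""
    counts = [0.0] * n_bins
    if non_zero_entries:
        # One event spread evenly over all bins, so that no
        # bin of the final PDF is empty.
        counts = [1.0 / n_bins] * n_bins
    for value in values:
        if not lower <= value < upper:
            continue  # outside the allowed range: not included
        index = int((value - lower) / (upper - lower) * n_bins)
        counts[min(index, n_bins - 1)] += 1.0
    total = sum(counts)
    if total == 0.0:
        raise ValueError("no entries: cannot normalise")
    return [count / total for count in counts]


# Made-up variable values; -1.0 is out of range and is dropped.
pdf = make_pdf([1.5, 1.5, 2.5, -1.0], 0.0, 4.0, 4)
print(pdf)
\end{lstlisting}

With the spread event switched on, every bin of the result is strictly positive, so the logarithm taken later by the PID is always defined.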
\includegraphics[width=3in]{reconstruction/globalpid/PDFflow.pdf}
\caption{Flow chart detailing steps taken in ReduceCppGlobalPID}
\section{PID Variables}
Information from the MICE detectors will be incorporated into a set of
PID variables that can be used to distinguish between particle hypotheses.
The Global PID framework has been written such that any number of PID
variables can be developed and added as necessary, all represented by
their own class, derived from a base class.
\subsection{PID Base Class}
The base PID class (PIDBase.hh and .cc) contains the functions to:
\item Create the PDFs (and the files that contain them)
\item Use the PDFs with globally reconstructed tracks
\item Populate the PDFs with variable values (after checking that
each value falls within the allowed range)
\item Perform the log-likelihood calculation for an incoming globally reconstructed
track (after checking that the value of the variable for the track falls within
the allowed range)
\item Calculate the value of the PID variable (this is a virtual
function to be defined in the derived classes)
\subsection{PID Variable Classes}
Each PID variable will be implemented in a derived class of the base PID class. Because of how the framework is designed, new variables can be added as they are developed.
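
The relationship between the base class and a derived variable class can be sketched in Python as follows. The framework itself implements these classes in C++ (PIDBase.hh and .cc); the Python class names, attribute names, and the dictionary used to stand in for a track are hypothetical and serve only to illustrate the pattern of a base class with one function left for each derived class to define.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,breaklines=true]
from abc import ABC, abstractmethod


class PIDVariableBase(ABC):
    """Hypothetical stand-in for the base PID class."""

    name = None
    minimum = 0.0
    maximum = 1.0
    n_bins = 1
    non_zero_hist_entries = False

    @abstractmethod
    def calc_var(self, track):
        """Return the value of the PID variable for one track."""

    def in_range(self, value):
        """True if value lies within the allowed range."""
        return self.minimum <= value <= self.maximum


class ExampleVariable(PIDVariableBase):
    """Hypothetical derived class for one PID variable."""

    name = "example"
    minimum = 0.0
    maximum = 100.0
    n_bins = 50
    non_zero_hist_entries = True

    def calc_var(self, track):
        # Assumption: a track is a dict; a missing entry gives
        # -1, which lies outside the allowed range.
        return track.get("example_value", -1.0)


variable = ExampleVariable()
value = variable.calc_var({"example_value": 42.0})
print(variable.name, value, variable.in_range(value))
\end{lstlisting}

Keeping the range and binning as attributes of each derived class means the shared base-class code can build and read PDFs without knowing how any particular variable is calculated.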
\subsubsection{Adding PID Variables}
In each derived variable class, the following should be included:
\item The variable name should be set.
\item The function to calculate the PID variable should be defined.
\item The minimum, maximum, and number of bins for PDFs created using
the variable should be set. The values of the minimum and maximum
define the allowed range of values that the PID variable can take.
\item In some cases it may be necessary to ensure that all bins in a
PDF return non-zero entries. Setting the variable
\_nonZeroHistEntries to true causes a single event to be spread across all bins
of the histogram.
\subsubsection{PIDVarA}
PIDVarA (see PIDVarA.hh and .cc) uses the difference between the times measured at TOF1
and TOF0 as its variable. A valid value of the variable is returned only for tracks
where there is a single TOF0 and a single TOF1 time measurement, and for which the
time difference between the detectors falls within the minimum and maximum set within
the class. Otherwise,
the value of the variable is set to $-1$, so that it falls outside
the allowed range for the variable, and the variable for that track is
not used in PDF production or in the PID. The PDFs for pions, muons and positrons are
shown on the same plot in figure~\ref{tofplot}, allowing the separation between the peaks
for each particle to be seen. Note that PIDVarA is momentum dependent, and
so may not be included as a final PID variable.
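
The validity checks described above can be sketched in Python as follows. The function name, its arguments, and the range limits in the usage example are hypothetical; the actual limits are those set within the PIDVarA class.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,breaklines=true]
def tof_difference(tof0_times, tof1_times, minimum, maximum):
    """Return t(TOF1) - t(TOF0), or -1 if the track is rejected."""
    # Require exactly one time measurement in each detector.
    if len(tof0_times) != 1 or len(tof1_times) != 1:
        return -1.0
    dt = tof1_times[0] - tof0_times[0]
    # Require the difference to lie within the allowed range.
    if not minimum <= dt <= maximum:
        return -1.0
    return dt


# Made-up times and limits, for illustration only.
print(tof_difference([10.0], [38.5], 20.0, 40.0))
print(tof_difference([10.0, 11.0], [38.5], 20.0, 40.0))
\end{lstlisting}

Because the allowed range is positive, the value $-1$ can never be mistaken for a genuine time difference.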
\includegraphics[width=4.5in]{reconstruction/globalpid/tofplot.jpg}
\caption{Difference between times at TOF0 and TOF1, for 200 MeV/c muons, pions and positrons.}
\subsubsection{PIDVarB}
PIDVarB uses the correlation between the momentum measured in the upstream tracker
and the difference between the times measured at TOF1 and TOF0 as its variable, returning the two
values as a std::pair. It places the same constraints on accepted tracks as PIDVarA: a valid value
of the variable is returned only if there is a single TOF0 and a single TOF1 time measurement,
and the time difference between the detectors falls within the minimum and maximum set within
the class. The tracker must also return a
valid momentum measurement for a valid variable to be returned. Otherwise,
the value of the variable is set to $(-1,-1)$, so that it falls outside
the allowed range for the variable, and the variable for that track is
not used in PDF production or in the PID. The PDFs for pions, muons and positrons are
shown on the same plot in figure~\ref{toftrackerplot}, showing the separation between their distributions.
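
A two-component variable of this kind can be sketched in Python as follows. The function name, the ordering of the returned pair (momentum first), the use of None to represent a missing tracker momentum, and the numbers in the usage example are all assumptions made for this sketch rather than details of PIDVarB itself.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily,frame=single,breaklines=true]
INVALID = (-1.0, -1.0)


def tof_momentum_pair(tof0_times, tof1_times, momentum,
                      dt_min, dt_max):
    """Return (momentum, t(TOF1) - t(TOF0)), or (-1, -1)."""
    # Same TOF requirements as for the time-difference variable.
    if len(tof0_times) != 1 or len(tof1_times) != 1:
        return INVALID
    dt = tof1_times[0] - tof0_times[0]
    if not dt_min <= dt <= dt_max:
        return INVALID
    # Assumption: None marks a missing momentum measurement.
    if momentum is None:
        return INVALID
    return (momentum, dt)


# Made-up inputs, for illustration only.
print(tof_momentum_pair([10.0], [38.5], 200.0, 20.0, 40.0))
print(tof_momentum_pair([10.0], [38.5], None, 20.0, 40.0))
\end{lstlisting}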
\includegraphics[width=4.5in]{reconstruction/globalpid/toftrackerplot.jpg}
\caption{Difference between times at TOF0 and TOF1 vs momentum measured in upstream tracker, for 200 MeV/c muons, pions and positrons.}
\label{toftrackerplot}