
In addition to this way of instantiating layers, there is a more common factory API (see @ref dnnLayerFactory) that allows you to create layers dynamically (by name) and to register new ones.

You can use both APIs, but the factory API is less convenient for native C++ programming and is basically designed for use inside importers (see @ref readNetFromCaffe(), @ref readNetFromTorch(), @ref readNetFromTensorflow()).

Built-in layers partially reproduce functionality of corresponding Caffe and Torch7 layers.

In particular, the following layers and Caffe importer were tested to reproduce <a href="http://caffe.berkeleyvision.org/tutorial/layers.html">Caffe</a> functionality:

- Convolution

- Deconvolution

- Pooling

- InnerProduct

- TanH, ReLU, Sigmoid, BNLL, Power, AbsVal

- Softmax

- Reshape, Flatten, Slice, Split

- LRN

- MVN

- Dropout (since it does nothing on forward pass -))

class CV_EXPORTS BlankLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams &params);
};

/**
 * Constant layer produces the same data blob at every forward pass.
 */
class CV_EXPORTS ConstLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams &params);
};

//! LSTM recurrent layer
class CV_EXPORTS LSTMLayer : public Layer
{
public:
    /** Creates instance of LSTM layer */
    static Ptr<LSTMLayer> create(const LayerParams& params);

    /** @deprecated Use LayerParams::blobs instead.
    @brief Set trained weights for LSTM layer.

    LSTM behavior on each step is defined by current input, previous output, previous cell state and learned weights.

    Let @f$x_t@f$ be current input, @f$h_t@f$ be current output, @f$c_t@f$ be current state.
    Then current output and current cell state are computed as follows:
    @f{eqnarray*}{
    h_t &= o_t \odot tanh(c_t), \\
    c_t &= f_t \odot c_{t-1} + i_t \odot g_t, \\
    @f}
    where @f$\odot@f$ is per-element multiply operation and @f$i_t, f_t, o_t, g_t@f$ are internal gates that are computed using learned weights.

    Gates are computed as follows:
    @f{eqnarray*}{
    i_t &= sigmoid&(W_{xi} x_t + W_{hi} h_{t-1} + b_i), \\
    f_t &= sigmoid&(W_{xf} x_t + W_{hf} h_{t-1} + b_f), \\
    o_t &= sigmoid&(W_{xo} x_t + W_{ho} h_{t-1} + b_o), \\
    g_t &= tanh &(W_{xg} x_t + W_{hg} h_{t-1} + b_g), \\
    @f}
    where @f$W_{x?}@f$, @f$W_{h?}@f$ and @f$b_{?}@f$ are learned weights represented as matrices:
    @f$W_{x?} \in R^{N_h \times N_x}@f$, @f$W_{h?} \in R^{N_h \times N_h}@f$, @f$b_? \in R^{N_h}@f$.

    For simplicity and performance purposes we use @f$ W_x = [W_{xi}; W_{xf}; W_{xo}; W_{xg}] @f$
    (i.e. @f$W_x@f$ is vertical concatenation of @f$ W_{x?} @f$), @f$ W_x \in R^{4N_h \times N_x} @f$.
    The same for @f$ W_h = [W_{hi}; W_{hf}; W_{ho}; W_{hg}], W_h \in R^{4N_h \times N_h} @f$
    and for @f$ b = [b_i; b_f; b_o; b_g]@f$, @f$b \in R^{4N_h} @f$.

    @param Wh is matrix defining how previous output is transformed to internal gates (i.e. according to above mentioned notation is @f$ W_h @f$)
    @param Wx is matrix defining how current input is transformed to internal gates (i.e. according to above mentioned notation is @f$ W_x @f$)
    @param b  is bias vector (i.e. according to above mentioned notation is @f$ b @f$)
    */
    CV_DEPRECATED virtual void setWeights(const Mat &Wh, const Mat &Wx, const Mat &b) = 0;

    /** @brief Specifies shape of output blob which will be [[`T`], `N`] + @p outTailShape.
     * @details If this parameter is empty or unset then @p outTailShape = [`Wh`.size(0)] will be used,
     * where `Wh` is parameter from setWeights().
     */
    virtual void setOutShape(const MatShape &outTailShape = MatShape()) = 0;

    /** @deprecated Use flag `use_timestamp_dim` in LayerParams.
     * @brief Specifies whether to interpret the first dimension of input blob as timestamp dimension or as sample.
     *
     * If flag is set to true then shape of input blob will be interpreted as [`T`, `N`, `[data dims]`] where `T` specifies number of timestamps, `N` is number of independent streams.
     * In this case each forward() call will iterate through `T` timestamps and update layer's state `T` times.
     *
     * If flag is set to false then shape of input blob will be interpreted as [`N`, `[data dims]`].
     * In this case each forward() call will make one iteration and produce one timestamp with shape [`N`, `[out dims]`].
     */
    CV_DEPRECATED virtual void setUseTimstampsDim(bool use = true) = 0;

    /** @deprecated Use flag `produce_cell_output` in LayerParams.
     * @brief If this flag is set to true then layer will produce @f$ c_t @f$ as second output.
     * @details Shape of the second output is the same as first output.
     */
    CV_DEPRECATED virtual void setProduceCellOutput(bool produce = false) = 0;

    /* In the common case it uses a single input with @f$x_t@f$ values to compute output(s) @f$h_t@f$ (and @f$c_t@f$).
     * @param input should contain packed values @f$x_t@f$
     * @param output contains computed outputs: @f$h_t@f$ (and @f$c_t@f$ if setProduceCellOutput() flag was set to true).
     *
     * If setUseTimstampsDim() is set to true then @p input[0] should have at least two dimensions with the following shape: [`T`, `N`, `[data dims]`],
     * where `T` specifies number of timestamps, `N` is number of independent streams (i.e. @f$ x_{t_0 + t}^{stream} @f$ is stored inside @p input[0][t, stream, ...]).
     *
     * If setUseTimstampsDim() is set to false then @p input[0] should contain a single timestamp, its shape should have the form [`N`, `[data dims]`] with at least one dimension.
     * (i.e. @f$ x_{t}^{stream} @f$ is stored inside @p input[0][stream, ...]).
     */

    int inputNameToIndex(String inputName) CV_OVERRIDE;
    int outputNameToIndex(const String& outputName) CV_OVERRIDE;
};
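
The gate equations documented for LSTMLayer can be traced by hand with a small standalone sketch. The code below is only an illustration of the formulas in the comment above, not the OpenCV implementation: `Vec`, `Matrix`, `LstmState` and `lstmStep` are hypothetical names, plain `std::vector` stands in for `Mat`, and the stacked gate order i, f, o, g is taken from the documented layout of @f$ W_x @f$, @f$ W_h @f$ and @f$ b @f$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration only; not part of OpenCV.
using Vec = std::vector<double>;
using Matrix = std::vector<Vec>; // one inner vector per matrix row

struct LstmState { Vec h, c; }; // previous/current output and cell state

static double sigmoid(double v) { return 1.0 / (1.0 + std::exp(-v)); }

// One LSTM step. Wx is 4*Nh x Nx, Wh is 4*Nh x Nh, b has 4*Nh entries,
// stacked in the gate order i, f, o, g (assumed from the doc comment).
LstmState lstmStep(const Matrix& Wx, const Matrix& Wh, const Vec& b,
                   const Vec& x, const LstmState& prev)
{
    const std::size_t Nh = prev.h.size();
    assert(prev.c.size() == Nh);
    assert(Wx.size() == 4 * Nh && Wh.size() == 4 * Nh && b.size() == 4 * Nh);

    // Pre-activations of all four gates: z = Wx * x_t + Wh * h_{t-1} + b.
    Vec z(4 * Nh);
    for (std::size_t r = 0; r < 4 * Nh; ++r) {
        assert(Wx[r].size() == x.size() && Wh[r].size() == Nh);
        double acc = b[r];
        for (std::size_t k = 0; k < x.size(); ++k) acc += Wx[r][k] * x[k];
        for (std::size_t k = 0; k < Nh; ++k) acc += Wh[r][k] * prev.h[k];
        z[r] = acc;
    }

    LstmState next{Vec(Nh), Vec(Nh)};
    for (std::size_t j = 0; j < Nh; ++j) {
        const double i = sigmoid(z[j]);             // input gate
        const double f = sigmoid(z[Nh + j]);        // forget gate
        const double o = sigmoid(z[2 * Nh + j]);    // output gate
        const double g = std::tanh(z[3 * Nh + j]);  // candidate values
        next.c[j] = f * prev.c[j] + i * g;          // c_t
        next.h[j] = o * std::tanh(next.c[j]);       // h_t
    }
    return next;
}
```

With all weights and biases set to zero, every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved on each step, which makes a convenient sanity check.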

/** @brief Classical recurrent layer

Accepts two inputs @f$x_t@f$ and @f$h_{t-1}@f$ and computes two outputs @f$o_t@f$ and @f$h_t@f$.

- input: should contain packed input @f$x_t@f$.
- output: should contain output @f$o_t@f$ (and @f$h_t@f$ if setProduceHiddenOutput() is set to true).

input[0] should have shape [`T`, `N`, `data_dims`] where `T` and `N` are the number of timestamps and the number of independent samples of @f$x_t@f$ respectively.

output[0] will have shape [`T`, `N`, @f$N_o@f$], where @f$N_o@f$ is number of rows in @f$ W_{ho} @f$ matrix.

If setProduceHiddenOutput() is set to true then @p output[1] will contain a Mat with shape [`T`, `N`, @f$N_h@f$], where @f$N_h@f$ is number of rows in @f$ W_{hh} @f$ matrix.
*/
class CV_EXPORTS RNNLayer : public Layer
{
public:
    /** Creates instance of RNNLayer */
    static Ptr<RNNLayer> create(const LayerParams& params);

    /** Sets up learned weights.

    Recurrent-layer behavior on each step is defined by current input @f$ x_t @f$, previous state @f$ h_{t-1} @f$ and learned weights as follows:
    @f{eqnarray*}{
    h_t &= tanh&(W_{hh} h_{t-1} + W_{xh} x_t + b_h),  \\
    o_t &= tanh&(W_{ho} h_t + b_o),
    @f}

    @param Wxh is @f$ W_{xh} @f$ matrix
    @param bh  is @f$ b_{h}  @f$ vector
    @param Whh is @f$ W_{hh} @f$ matrix
    @param Who is @f$ W_{ho} @f$ matrix
    @param bo  is @f$ b_{o}  @f$ vector
    */
    virtual void setWeights(const Mat &Wxh, const Mat &bh, const Mat &Whh, const Mat &Who, const Mat &bo) = 0;

    /** @brief If this flag is set to true then layer will produce @f$ h_t @f$ as second output.
     * @details Shape of the second output is the same as first output.
     */
    virtual void setProduceHiddenOutput(bool produce = false) = 0;

};

class CV_EXPORTS BaseConvolutionLayer : public Layer
{
public:
    CV_DEPRECATED_EXTERNAL Size kernel, stride, pad, dilation, adjustPad;
    std::vector<size_t> adjust_pads;
    std::vector<size_t> kernel_size, strides, dilations;
    std::vector<size_t> pads_begin, pads_end;
    String padMode;
    int numOutput;
};

class CV_EXPORTS ConvolutionLayer : public BaseConvolutionLayer
{
public:
    static Ptr<BaseConvolutionLayer> create(const LayerParams& params);
};

class CV_EXPORTS DeconvolutionLayer : public BaseConvolutionLayer
{
public:
    static Ptr<BaseConvolutionLayer> create(const LayerParams& params);
};

class CV_EXPORTS LRNLayer : public Layer
{
public:
    int type;

    int size;
    float alpha, beta, bias;
    bool normBySize;

    static Ptr<LRNLayer> create(const LayerParams& params);
};

class CV_EXPORTS PoolingLayer : public Layer
{
public:
    int type;
    std::vector<size_t> kernel_size, strides;
    std::vector<size_t> pads_begin, pads_end;
    CV_DEPRECATED_EXTERNAL Size kernel, stride, pad;
    CV_DEPRECATED_EXTERNAL int pad_l, pad_t, pad_r, pad_b;
    bool globalPooling;
    bool computeMaxIdx;
    String padMode;
    bool ceilMode;
    // If true for average pooling with padding, divide every output region
    // by the whole kernel area. Otherwise exclude zero-padded values and divide
    // by the number of real values.
    bool avePoolPaddedArea;
    // ROIPooling parameters.
    Size pooledSize;
    float spatialScale;
    // PSROIPooling parameters.
    int psRoiOutChannels;

    static Ptr<PoolingLayer> create(const LayerParams& params);
};
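
The effect of `avePoolPaddedArea` is easiest to see on a single pooling window that overlaps the zero padding. The sketch below is an assumption-laden illustration of the comment on that field, not OpenCV code: `averageWindow` is a hypothetical helper that receives only the real (non-padded) values covered by the window plus the full kernel area.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: averages one pooling window.
// realValues holds only the input values the window actually covers;
// kernelArea is the full window size, including zero-padded positions.
double averageWindow(const std::vector<double>& realValues, int kernelArea,
                     bool avePoolPaddedArea)
{
    assert(!realValues.empty());
    assert(kernelArea >= static_cast<int>(realValues.size()));

    double sum = 0.0;
    for (double v : realValues) sum += v; // padded zeros add nothing to the sum

    // Either divide by the whole kernel area or only by the real value count.
    const double divisor = avePoolPaddedArea
        ? static_cast<double>(kernelArea)
        : static_cast<double>(realValues.size());
    return sum / divisor;
}
```

For a 2x2 window at a corner that covers the real values 2 and 4 plus two padded zeros, the first mode divides the sum 6 by 4 and the second divides it by 2.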

class CV_EXPORTS SoftmaxLayer : public Layer
{
public:
    bool logSoftMax;

    static Ptr<SoftmaxLayer> create(const LayerParams& params);
};

class CV_EXPORTS InnerProductLayer : public Layer
{
public:
    int axis;
    static Ptr<InnerProductLayer> create(const LayerParams& params);
};

class CV_EXPORTS MVNLayer : public Layer
{
public:
    float eps;
    bool normVariance, acrossChannels;

    static Ptr<MVNLayer> create(const LayerParams& params);
};

/* Reshaping */

class CV_EXPORTS ReshapeLayer : public Layer
{
public:
    MatShape newShapeDesc;
    Range newShapeRange;

    static Ptr<ReshapeLayer> create(const LayerParams& params);
};

class CV_EXPORTS FlattenLayer : public Layer
{
public:
    static Ptr<FlattenLayer> create(const LayerParams &params);
};

class CV_EXPORTS ConcatLayer : public Layer
{
public:
    int axis;
    /**
     * @brief Add zero padding in case of concatenation of blobs with different
     * spatial sizes.
     *
     * Details: https://github.com/torch/nn/blob/master/doc/containers.md#depthconcat
     */
    bool padding;

    static Ptr<ConcatLayer> create(const LayerParams &params);
};

class CV_EXPORTS SplitLayer : public Layer
{
public:
    int outputsCount; //!< Number of copies that will be produced (is ignored when negative).

    static Ptr<SplitLayer> create(const LayerParams &params);
};

/**
 * Slice layer has several modes:
 * 1. Caffe mode
 * @param[in] axis Axis of split operation
 * @param[in] slice_point Array of split points
 *
 * The number of output blobs equals the number of split points plus one. The
 * first blob is a slice of input from 0 to @p slice_point[0] - 1 by @p axis,
 * the second output blob is a slice of input from @p slice_point[0] to
 * @p slice_point[1] - 1 by @p axis and the last output blob is a slice of
 * input from @p slice_point[-1] up to the end of @p axis size.
 *
 * 2. TensorFlow mode
 * @param begin Vector of start indices
 * @param size Vector of sizes
 *
 * More convenient numpy-like slice. One and only output blob
 * is a slice `input[begin[0]:begin[0]+size[0], begin[1]:begin[1]+size[1], ...]`
 *
 * 3. Torch mode
 * @param axis Axis of split operation
 *
 * Split input blob into equal parts by @p axis.
 */
class CV_EXPORTS SliceLayer : public Layer
{
public:
    /**
     * @brief Vector of slice ranges.
     *
     * The first dimension equals number of output blobs.
     * Inner vector has slice ranges for the first number of input dimensions.
     */
    std::vector<std::vector<Range> > sliceRanges;
    int axis;

    static Ptr<SliceLayer> create(const LayerParams &params);
};
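
The Caffe mode described for SliceLayer maps a list of split points to one index range per output blob. A minimal standalone sketch of that mapping follows; `caffeSliceRanges` is a hypothetical helper rather than an OpenCV function, the ranges are half-open `[begin, end)` pairs, and the split points are assumed to be strictly increasing and inside the axis.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: converts Caffe-style slice points into half-open
// [begin, end) ranges along the sliced axis, one range per output blob.
std::vector<std::pair<int, int>> caffeSliceRanges(int axisSize,
                                                  const std::vector<int>& slicePoints)
{
    std::vector<std::pair<int, int>> ranges;
    int begin = 0;
    for (int point : slicePoints) {
        // Assumption: points are strictly increasing and lie inside the axis.
        assert(point > begin && point < axisSize);
        ranges.emplace_back(begin, point); // covers begin .. point - 1
        begin = point;
    }
    ranges.emplace_back(begin, axisSize);  // last blob runs to the end of the axis
    return ranges;
}
```

Two split points therefore always yield three ranges, matching the "number of split points plus one" rule in the comment.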

class CV_EXPORTS PermuteLayer : public Layer
{
public:
    static Ptr<PermuteLayer> create(const LayerParams& params);
};

/**
 * Permute channels of 4-dimensional input blob.
 * @param group Number of groups to split input channels and pick in turns
 *              into output blob.
 *
 * \f[ groupSize = \frac{number\ of\ channels}{group} \f]
 * \f[ output(n, c, h, w) = input(n, groupSize \times (c \% group) + \lfloor \frac{c}{group} \rfloor, h, w) \f]
 * Read more at https://arxiv.org/pdf/1707.01083.pdf
 */
class CV_EXPORTS ShuffleChannelLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams& params);

    int group;
};
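
The index formula in the ShuffleChannelLayer comment can be checked on a small case. The sketch below only evaluates that formula for every output channel; `shuffleSourceChannels` is a hypothetical name, and the channel count is assumed to be divisible by `group`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for each output channel c, returns the input channel
// it is read from, following
//   groupSize * (c % group) + floor(c / group).
std::vector<int> shuffleSourceChannels(int channels, int group)
{
    // Assumption: channels split evenly into groups.
    assert(group > 0 && channels > 0 && channels % group == 0);
    const int groupSize = channels / group;

    std::vector<int> source(channels);
    for (int c = 0; c < channels; ++c)
        source[c] = groupSize * (c % group) + c / group; // integer division is the floor here
    return source;
}
```

With 6 channels and 2 groups, the output channels read from input channels 0, 3, 1, 4, 2, 5, so the two groups are interleaved.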

/**
 * @brief Adds extra values for specific axes.
 * @param paddings Vector of paddings in format
 *                 @code
 *                 [ pad_before, pad_after,  // [0]th dimension
 *                   pad_before, pad_after,  // [1]st dimension
 *                   ...
 *                   pad_before, pad_after ] // [n]th dimension
 *                 @endcode
 *                 that represents number of padded values at every dimension
 *                 starting from the first one. The rest of dimensions won't
 *                 be padded.
 * @param value Value to be padded. Defaults to zero.
 * @param type Padding type: 'constant', 'reflect'
 * @param input_dims Torch's parameter. If @p input_dims is not equal to the
 *                   actual input dimensionality then the `[0]th` dimension
 *                   is considered as a batch dimension and @p paddings are shifted
 *                   by one dimension. Defaults to `-1` that means padding
 *                   corresponding to @p paddings.
 */
class CV_EXPORTS PaddingLayer : public Layer
{
public:
    static Ptr<PaddingLayer> create(const LayerParams& params);
};
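
The `paddings` layout documented for PaddingLayer (pairs of before/after counts, starting at the first dimension) determines the output shape directly. The sketch below shows that bookkeeping only; `paddedShape` is a hypothetical helper, it ignores the Torch-specific `input_dims` shift, and it does not touch any data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes the output shape for a paddings vector laid
// out as [before0, after0, before1, after1, ...]. Dimensions without a pair
// in the vector are left unpadded.
std::vector<int> paddedShape(const std::vector<int>& shape,
                             const std::vector<int>& paddings)
{
    assert(paddings.size() % 2 == 0);
    assert(paddings.size() / 2 <= shape.size());

    std::vector<int> out = shape;
    for (std::size_t d = 0; d < paddings.size() / 2; ++d)
        out[d] += paddings[2 * d] + paddings[2 * d + 1]; // before + after
    return out;
}
```

Padding only the spatial axes of an NCHW blob therefore needs explicit zero pairs for the batch and channel dimensions.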

/* Activations */
class CV_EXPORTS ActivationLayer : public Layer
{
public:
    virtual void forwardSlice(const float* src, float* dst, int len,
                              size_t outPlaneSize, int cn0, int cn1) const = 0;
};

class CV_EXPORTS ReLULayer : public ActivationLayer
{
public:
    float negativeSlope;

    static Ptr<ReLULayer> create(const LayerParams &params);
};

class CV_EXPORTS ReLU6Layer : public ActivationLayer
{
public:
    float minValue, maxValue;

    static Ptr<ReLU6Layer> create(const LayerParams &params);
};

class CV_EXPORTS ChannelsPReLULayer : public ActivationLayer
{
public:
    static Ptr<Layer> create(const LayerParams& params);
};

class CV_EXPORTS ELULayer : public ActivationLayer
{
public:
    static Ptr<ELULayer> create(const LayerParams &params);
};

class CV_EXPORTS TanHLayer : public ActivationLayer
{
public:
    static Ptr<TanHLayer> create(const LayerParams &params);
};

class CV_EXPORTS SigmoidLayer : public ActivationLayer
{
public:
    static Ptr<SigmoidLayer> create(const LayerParams &params);
};

class CV_EXPORTS BNLLLayer : public ActivationLayer
{
public:
    static Ptr<BNLLLayer> create(const LayerParams &params);
};

class CV_EXPORTS AbsLayer : public ActivationLayer
{
public:
    static Ptr<AbsLayer> create(const LayerParams &params);
};

class CV_EXPORTS PowerLayer : public ActivationLayer
{
public:
    float power, scale, shift;

    static Ptr<PowerLayer> create(const LayerParams &params);
};

/* Layers used in semantic segmentation */

class CV_EXPORTS CropLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams &params);
};

class CV_EXPORTS EltwiseLayer : public Layer
{
public:
    static Ptr<EltwiseLayer> create(const LayerParams &params);
};

class CV_EXPORTS BatchNormLayer : public ActivationLayer
{
public:
    bool hasWeights, hasBias;
    float epsilon;

    static Ptr<BatchNormLayer> create(const LayerParams &params);
};

class CV_EXPORTS MaxUnpoolLayer : public Layer
{
public:
    Size poolKernel;
    Size poolPad;
    Size poolStride;

    static Ptr<MaxUnpoolLayer> create(const LayerParams &params);
};

class CV_EXPORTS ScaleLayer : public Layer
{
public:
    bool hasBias;
    int axis;

    static Ptr<ScaleLayer> create(const LayerParams& params);
};

class CV_EXPORTS ShiftLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams& params);
};

class CV_EXPORTS PriorBoxLayer : public Layer
{
public:
    static Ptr<PriorBoxLayer> create(const LayerParams& params);
};

class CV_EXPORTS ReorgLayer : public Layer
{
public:
    static Ptr<ReorgLayer> create(const LayerParams& params);
};

class CV_EXPORTS RegionLayer : public Layer
{
public:
    static Ptr<RegionLayer> create(const LayerParams& params);
};

class CV_EXPORTS DetectionOutputLayer : public Layer
{
public:
    static Ptr<DetectionOutputLayer> create(const LayerParams& params);
};

/**
 * @brief \f$ L_p \f$ - normalization layer.
 * @param p Normalization factor. The most common `p = 1` for \f$ L_1 \f$ -
 *          normalization or `p = 2` for \f$ L_2 \f$ - normalization or a custom one.
 * @param eps Parameter \f$ \epsilon \f$ to prevent a division by zero.
 * @param across_spatial If true, normalize an input across all non-batch dimensions.
 *                       Otherwise normalize every channel separately.
 *
 * Across spatial:
 * @f[
 * norm = \sqrt[p]{\epsilon + \sum_{x, y, c} |src(x, y, c)|^p } \\
 * dst(x, y, c) = \frac{ src(x, y, c) }{norm}
 * @f]
 *
 * Channel wise normalization:
 * @f[
 * norm(c) = \sqrt[p]{\epsilon + \sum_{x, y} |src(x, y, c)|^p } \\
 * dst(x, y, c) = \frac{ src(x, y, c) }{norm(c)}
 * @f]
 *
 * Where `x, y` - spatial coordinates, `c` - channel.
 *
 * Every sample in the batch is normalized separately. Optionally,
 * output is scaled by the trained parameters.
 */
class CV_EXPORTS NormalizeBBoxLayer : public Layer
{
public:
    float pnorm, epsilon;
    CV_DEPRECATED_EXTERNAL bool acrossSpatial;

    static Ptr<NormalizeBBoxLayer> create(const LayerParams& params);
};
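
The "across spatial" formula in the NormalizeBBoxLayer comment reduces to one norm over all values of a sample. The sketch below applies it to a flat list of values; `lpNormalize` is a hypothetical helper, not the layer's implementation, and it leaves out the optional trained scaling. The channel-wise variant would run the same computation once per channel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: L_p normalization of one sample, all values treated
// as a flat list (the "across spatial" case).
//   norm = (eps + sum |src_i|^p)^(1/p),  dst_i = src_i / norm
std::vector<double> lpNormalize(const std::vector<double>& src, double p, double eps)
{
    assert(p > 0.0);

    double sum = eps; // eps guards against division by zero for all-zero input
    for (double v : src) sum += std::pow(std::fabs(v), p);

    const double norm = std::pow(sum, 1.0 / p);
    assert(norm > 0.0);

    std::vector<double> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i] / norm;
    return dst;
}
```

For the values 3 and 4 with `p = 2` and `eps = 0`, the norm is 5 and the result is approximately 0.6 and 0.8.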

/**
 * @brief Resize input 4-dimensional blob by nearest neighbor or bilinear strategy.
 *
 * Layer is used to support TensorFlow's resize_nearest_neighbor and resize_bilinear ops.
 */
class CV_EXPORTS ResizeLayer : public Layer
{
public:
    static Ptr<ResizeLayer> create(const LayerParams& params);
};

/**
 * @brief Bilinear resize layer from https://github.com/cdmh/deeplab-public-ver2
 *
 * It differs from @ref ResizeLayer in output shape and resize scales computations.
 */
class CV_EXPORTS InterpLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams& params);
};

class CV_EXPORTS ProposalLayer : public Layer
{
public:
    static Ptr<ProposalLayer> create(const LayerParams& params);
};

class CV_EXPORTS CropAndResizeLayer : public Layer
{
public:
    static Ptr<Layer> create(const LayerParams& params);
};

//! @}
//! @}
CV__DNN_INLINE_NS_END
}
}
#endif
