function outObj = GPUTypeFromToGL(cmd, inObj, glObjType, outObj, keepmapped, mapflags)
% outObj = GPUTypeFromToGL(cmd, inObj [, glObjType][, outObj][, keepmapped][, mapflags])
%
% Note: Calling this command requires calling the following command first
% to initialize Psychtoolbox GPU computing support:
%
% PsychImaging('AddTask', 'General', 'UseGPGPUCompute', 'GPUmat');
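%
% Example (a minimal setup sketch; 'screenid' and 'win' are illustrative
% names, and the black background color 0 is an arbitrary assumption):
%
%   screenid = max(Screen('Screens'));
%   PsychImaging('PrepareConfiguration');
%   PsychImaging('AddTask', 'General', 'UseGPGPUCompute', 'GPUmat');
%   win = PsychImaging('OpenWindow', screenid, 0);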
%
% Supported 'cmd' commands:
% -------------------------
%
% If cmd is zero, then convert an OpenGL object of type glObjType,
% referenced by handle inObj, into a GPU object and return it in outObj. If
% the optional outObj is provided as input argument, try to recycle it --
% just fill its content with the OpenGL object's content. Otherwise, create an
% outObj of matching format for content. If 'keepmapped' is set to 1, the OpenGL
% object will stay mapped for the GPU compute api, otherwise it gets immediately
% unmapped after the conversion. Keeping the object mapped is more efficient, but
% requires more careful management of objects to prevent malfunctions.
%
% If cmd is == 1, then convert GPU object inObj to OpenGL object of type
% glObjType and return it in outObj. Try to recycle a passed in outObj, if
% possible, otherwise create a new one. 'keepmapped' - see explanation for cmd zero.
%
% If cmd is == 2, then unmap the OpenGL object. You must do this if you previously
% set the optional 'keepmapped' flag to 1 during a copy operation and now want to
% use the object which was the source or the target of that copy again with OpenGL
% or Screen(), i.e., with a Psychtoolbox drawing or image processing function.
% Unmapping is necessary for proper OpenGL operation, but costs a fraction of a
% millisecond of overhead on well working operating systems like Linux. Clever use
% of the 'keepmapped' flag and this manual unmapping method sometimes allows you to
% save some redundant unmap calls.
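%
% Example (a sketch of a typical round trip for cmd 0, 1 and 2, assuming
% 'tex' is a 32 bpc float Psychtoolbox texture in normalized orientation
% and 'win' is an open onscreen window; both names are illustrative):
%
%   % OpenGL -> GPU: Copy texture into a new GPU variable, keep it mapped:
%   A = GPUTypeFromToGL(0, tex, 0, [], 1);
%
%   % ... process A with functions of the GPU compute toolkit ...
%
%   % GPU -> OpenGL: Write result back, recycling 'tex', keep it mapped:
%   tex = GPUTypeFromToGL(1, A, 0, tex, 1);
%
%   % Unmap 'tex' before using it with Screen() again:
%   GPUTypeFromToGL(2, tex);
%   Screen('DrawTexture', win, tex);
%   Screen('Flip', win);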
%
% If cmd is == 3, then remove the OpenGL object from use by the GPU compute toolkit.
% This must be done before destroying/deleting the OpenGL object, e.g., before
% a call to Screen('Close', x); for a window or texture handle x. This operation
% can be very expensive -- on the order of multiple milliseconds, so use sparingly.
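%
% Example (a sketch, assuming 'tex' is a texture handle that was
% previously used with cmd 0 or cmd 1; the name is illustrative):
%
%   % Remove the texture from use by the GPU compute toolkit first...
%   GPUTypeFromToGL(3, tex);
%
%   % ...and only then delete the underlying OpenGL object:
%   Screen('Close', tex);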
%
% If cmd is == 4, all OpenGL objects are removed. Usually used before closing (all)
% onscreen windows, e.g., via Screen('CloseAll') or sca. This cache flush is very
%
% If cmd is == 5, then the given OpenGL object 'inObj' of type glObjType is mapped
% and a CUDA memory pointer is returned, for use with external mex files, so these
% can directly access the mapped resource. The object is mapped read-only.
%
% If cmd is == 6, the same operation as cmd == 5 happens, but the object is mapped
%
% glObjType == 0 (default): Provided OpenGL object is a Psychtoolbox
% texture or offscreen window handle.
%
% glObjType == 1: Provided inObj is a struct which defines the low-level
% OpenGL object, which can be a texture or a renderbuffer. The struct must
% have the following fields:
%
% texstruct.glhandle == OpenGL object handle.
% texstruct.gltarget == Target: texture target or renderbuffer.
% texstruct.width == Width in texels/pixels.
% texstruct.height == Height in texels/pixels.
% texstruct.bpp == Bytes per texel/pixel/element.
% texstruct.nrchannels == Number of layers / color channels.
%
% -> glObjType 1 can be used in places where no calls to Screen() functions
% are allowed or possible, e.g., inside the imaging pipeline, or 3rd
% party low-level OpenGL code.
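%
% Example (a sketch of such a struct for a 512 x 512 RGBA32F rectangle
% texture; 'gltexid' is an assumed, already existing OpenGL texture
% handle, the size is arbitrary, and 'GL' is the global struct of OpenGL
% constants set up by InitializeMatlabOpenGL):
%
%   texstruct.glhandle   = gltexid;
%   texstruct.gltarget   = GL.TEXTURE_RECTANGLE_EXT;
%   texstruct.width      = 512;
%   texstruct.height     = 512;
%   texstruct.bpp        = 16;  % 4 channels * 4 bytes per float.
%   texstruct.nrchannels = 4;
%   A = GPUTypeFromToGL(0, texstruct, 1);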
%
% Note: If you pass in a Psychtoolbox texture, it should be already in
% normalized orientation (upright and in row-major format). This is a given
% if the texture was created via Screen('SetOpenGLTexture') or
% Screen('SetOpenGLTextureFromMemPointer'); or if your texture is actually
% an offscreen window created via Screen('OpenOffscreenWindow'). If your
% texture is created via Screen('MakeTexture') you usually need to set the
% optional 'textureOrientation' flag to 1, unless you've pretransposed the
% Matlab/Octave image matrix (setting of 2 is fine), or it is entirely
% isotropic (setting of 3 is fine). If you get your texture from a movie
% file, you need to pass the optional 'specialFlags1' parameter in
% Screen('OpenMovie') as 16.
%
% If you get the texture from the video capture engine, you need to set
% the optional 'recordingflags' to 2048 in a call to
% Screen('OpenVideoCapture').
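%
% Example (a sketch for the Screen('MakeTexture') case, assuming 'win' is
% an open onscreen window and 'img' is a regular, non-transposed image
% matrix; both names are illustrative):
%
%   % floatprecision = 2 requests a 32 bpc float texture,
%   % textureOrientation = 1 requests normalized orientation:
%   tex = Screen('MakeTexture', win, img, [], [], 2, 1);
%   A = GPUTypeFromToGL(0, tex);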
%
% glObjType == 2: Read or write from/to current virtual framebuffer for a
% given onscreen window handle passed as 'inObj', specifically the 1st
%
% glObjType == 3: Read or write from/to currently bound FBO, specifically
% the 1st color attachment. 'inObj' or 'outObj' doesn't really have a
% meaning here, as we always query the current binding.
%
% Current Limitations:
%
% Currently only supports the GPUmat toolbox as hard-coded backend:
% (http://sourceforge.net/projects/gpumat/)
%
% In the future it should support more GPGPU backends and allow dynamic
% detection and/or runtime selection of backends. Possible candidates are,
% e.g., AccelerEyes "Jacket", low-level CUDA or OpenCL, and other toolkits
% based on CUDA or OpenCL, as well as our own to-be-done backend.
%
% Only really supports 32 bpc floating point precision textures and
% renderbuffers. This is because this single precision float format is the
% only format common to both OpenGL and our one and only GPUmat backend.
% One can provide RGBA8 4-layer textures/renderbuffers, but these will be
% interpreted by the backend as single layer (luminance) single precision
% float matrix. Special CUDA kernels would be required in GPUmat to unpack
% each apparent float pixel into a RGBA8 interleaved pixel for meaningful
% processing. Otherwise hilarious results will ensue.
%
% CUDA-5.0 interop as used by GPUmat currently only supports 1-layer,
% 2-layer and 4-layer textures and renderbuffers, i.e., L, LA and RGBA, but
%
% 30.01.2013 mk Written.
% 15.04.2013 mk Require use of PsychImaging(..., 'UseGPGPUCompute', ...);
persistent initialized;
% This global variable signals if a GPGPU compute api is enabled, and which
% one. It gets initialized by PsychImaging() if usercode requests GPGPU
% compute support: 0 = None, 1 = GPUmat.
global psych_gpgpuapi;
if isempty(initialized)
% Make sure GPGPU computing got enabled by PsychImaging and GPU api
% type 1, the GPUmat toolbox, is in use:
if isempty(psych_gpgpuapi) || (psych_gpgpuapi ~= 1)
error('GPGPU computing via GPUmat toolbox not enabled! Aborted.');
InitializeMatlabOpenGL([], [], 1);
if nargin < 1 || isempty(cmd) || ~isscalar(cmd) || ~isnumeric(cmd)
error('Missing or invalid minimum required argument "cmd".');
% Copy from OpenGL to GPU backend:
% Copy from GPU backend to OpenGL:
% Unmap object from cache:
% Purge object from cache:
% Retrieve mapped pointers for reading from OpenGL:
% Retrieve mapped pointers for writing to OpenGL:
error('Invalid cmd specified.');
error('Missing required 2nd argument "inObj".');
if nargin < 3 || isempty(glObjType)
% Psychtoolbox "classic" texture handle or offscreen window handle:
% No outObj provided for recycling?
if nargin < 5 || isempty(keepmapped)
if nargin < 6 || isempty(mapflags)
% OpenGL -> GPU => gpu is outObj, if any.
% GPU -> OpenGL => gpu is inObj:
error('Empty GPUtype inObj variable provided, but update of OpenGL object requested. How is this supposed to work?!?');
% Impedance matching code. Try to massage input 'gpu' variable into a
% format that is compatible with CUDA-OpenGL interop and OpenGL itself:
% One dimensional vector: Turn into "row-vector"
% style single texel row luminance texture:
% Reshape into 3-D matrix with two singleton dimensions
% so checks further down don't fail due to
% size/dimension mismatch:
gpu = reshape(gpu, 1, d1, d2);
% Two dimensional matrix: Turn into luminance texture,
% the 2D size of the two non-singleton dimensions:
% Reshape gpu into a 3D matrix with the 1st dimension
% being a singleton dimension which represents the
% single luminance channel. We need this so that
% further checks in the code below don't fail:
gpu = reshape(gpu, 1, d2, d1);
% 3D matrix, hopefully a width x height x channels
% matrix with x = 1 to 4 channels.
% Reject any zero-channel textures or more than 4
% channel RGBA textures for now:
error('Provided 3D input GPU matrix "inObj" has less than 1 or more than 4 elements in 1st dimension, which would result in an unsupported color channel count of < 1 or > 4!');
% 3 channel input? This would translate into a 3
% channel RGB texture, but at least CUDA-5.0 does not
% support this. Extend it into a 4 channel format:
% We add a value 1.0 to the fourth channel,
% resulting in an alpha channel of 1 for fully
gpu = zeros(4, d2, d1, GPUsingle);
gpu(1:3, :, :) = oldgpu(1:3, :, :);
% Give a performance warning:
warning('GPUTYPEFROMTOGL:gputypeRGBtoRGBAautocast', 'Input GPU 3D matrix inObj has 3 elements in 1st dimension, which would result in an unsupported RGB texture! Extending to RGBA texture with A=1.');
error('Input argument N-d matrix inObj has more than 3 dimensions. This is unsupported for conversion to OpenGL objects!');
% Hopefully 'gpu' is now safe to convert into an OpenGL object.
% GL object is a Psychtoolbox texture/offscreen window handle?
% OpenGL -> GPUtype conversion:
error('No valid Psychtoolbox OpenGL input object provided!');
% GPUtype -> OpenGL conversion:
% No existing texture object provided as output destination.
% Create a 32 bpc float texture of matching format.
for win = Screen('Windows')
if Screen('WindowKind', win) == 1
if isempty(win) || Screen('WindowKind', win) ~= 1
error('No onscreen window opened. This does not work without at least one open onscreen window.');
% Create a 32 bpc float texture ('floatprecision' = 2), with no need for
% orientation swap (transpose) ('textureOrientation' = 3). We
% assume all buffers derived from the GPU backend are always in
% upright row-major format, like Offscreen windows. Input code
% therefore must do needed conversions.
texid = Screen('MakeTexture', win, zeros(d1, d2, cc), [], [], 2, 3);
% Get OpenGL texture handle and texture target of underlying OpenGL
% texture for given Psychtoolbox object handle:
[gltexid, gltextarget] = Screen('GetOpenGLTexture', texid, texid);
% Retrieve width and height of texture in texels:
[width, height] = Screen('Windowsize', texid);
glPushAttrib(GL.TEXTURE_BIT);
glBindTexture(gltextarget, gltexid);
% Query bits per pixel:
bpc = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_RED_SIZE);
bpc = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_LUMINANCE_SIZE);
bpp = bpp + glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_GREEN_SIZE);
bpp = bpp + glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_BLUE_SIZE);
bpp = bpp + glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_ALPHA_SIZE);
% Number of channels == Bits per pixel bpp / Bits per component, e.g., RED channel:
nrchannels = bpp / bpc;
% Translate to bytes per pixel:
glBindTexture(gltextarget, 0);
% fprintf('nrchannels = %i : Byteperpixel = %i : bpc = %i\n', nrchannels, bpp, bpc);
% Override a detected RGB32F format to become a 4-channel RGBA32F format. Why?
% Because at least NVidia on Linux silently allocates storage for RGBA32F when asked
% for RGB32F, i.e., it allocates essentially a RGBX32F format with padding. The problem
% is that the system lies about this and reports internal format as RGB32F and bpp
% bits per pixel as 96 bpp instead of the real 128 bpp. This would cause us to misallocate
% memory and copy the wrong amount, leading to incomplete, damaged data transfers. We try
% to work around this special case by faking the real format and just hope for the best...
if (bpc == 32 && nrchannels == 3)
% GL object is a struct with OpenGL object handle, target, and other info?
% OpenGL -> GPUtype conversion:
if isempty(texstruct)
error('No valid Psychtoolbox OpenGL input object provided!');
% GPUtype -> OpenGL conversion:
% No existing texture object provided as output destination.
error('Creating an OpenGL object from a given GPU object type is not yet supported.');
if ~isstruct(texstruct)
error('No OpenGL info struct for inObj provided! Must be a struct!');
gltexid = texstruct.glhandle;
gltextarget = texstruct.gltarget;
width = texstruct.width;
height = texstruct.height;
nrchannels = texstruct.nrchannels;
error('OpenGL info struct inObj is malformed or misses fields!');
% Use currently bound drawBufferFBO if imaging pipeline is active --
% accessing the regular onscreen window's virtual framebuffer. inObj is an
% onscreen window handle:
% OpenGL -> GPUtype conversion:
error('No valid Psychtoolbox onscreen window provided!');
% GPUtype -> OpenGL conversion:
% No existing onscreen window provided as output destination.
error('Creating a virtual framebuffer from a given GPU object type is not supported.');
% Make sure inObj is an onscreen window, with imaging pipeline active
% and in proper format:
if Screen('WindowKind', win) ~= 1
error('For glObjType 2, inObj must be a valid onscreen window handle. This is something else!');
% This queries window properties and binds the FBO for the onscreen
% window's virtual framebuffer if it isn't already bound:
winfo = Screen('GetWindowInfo', win);
if ~bitand(winfo.ImagingMode, kPsychNeedFastBackingStore) || winfo.BitsPerColorComponent < 32
error('For glObjType 2, onscreen window must have imaging pipeline enabled with a 32 bpc float framebuffer!');
% Proper FBO is bound. Query its color attachment zero, which is the
% OpenGL handle of the attached texture or renderbuffer:
gltexid = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_NAME_EXT);
% Query type of attachment:
gltextarget = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE_EXT);
if gltextarget == GL.TEXTURE
% Yes: We only support rectangle textures in the imaging pipeline,
% so this is our final target:
gltextarget = GL.TEXTURE_RECTANGLE_EXT;
% No: A renderbuffer:
gltextarget = GL.RENDERBUFFER;
% Only 4 channel RGBA32F supported, aka 16 Bytes per pixel:
[width, height] = Screen('Windowsize', win);
% Use currently bound OpenGL FBO, assuming the imaging pipeline is active
% and properly setup -- otherwise we'd crash or screw up.
% Proper FBO is hopefully bound. Query its color attachment zero, which
% is the OpenGL handle of the attached texture or renderbuffer:
gltexid = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_NAME_EXT);
% Query type of attachment:
gltextarget = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE_EXT);
if gltextarget == GL.FRAMEBUFFER_DEFAULT
error('For glObjType 3, an OpenGL FBO must be bound, not the system default framebuffer, as here!');
% Query bits per pixel:
bpc = glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_RED_SIZE);
bpp = bpp + glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_GREEN_SIZE);
bpp = bpp + glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_BLUE_SIZE);
bpp = bpp + glGetFramebufferAttachmentParameterivEXT(GL.FRAMEBUFFER_EXT, GL.COLOR_ATTACHMENT0_EXT, GL.FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE);
% Number of channels == Bits per pixel bpp / Bits per component, e.g.,
nrchannels = bpp / bpc;
% Translate to bytes per pixel:
if gltextarget == GL.TEXTURE
% Yes: We only support rectangle textures in the imaging pipeline,
% so this is our final target:
% TODO FIXME MK: Technically not quite correct, as at least
% Screen('TransformTexture') could also use a GL_TEXTURE_2D target
% instead of texture rectangle. However, this is a seldom used
% special case and I don't know at the moment how to find out which
% target is actually used.
gltextarget = GL.TEXTURE_RECTANGLE_EXT;
% Query size width x height of texture image:
glPushAttrib(GL.TEXTURE_BIT);
glBindTexture(gltextarget, gltexid);
width = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_WIDTH);
height = glGetTexLevelParameteriv(gltextarget, 0, GL.TEXTURE_HEIGHT);
glBindTexture(gltextarget, 0);
% No: A renderbuffer:
gltextarget = GL.RENDERBUFFER;
glBindRenderbuffer(gltextarget, gltexid);
width = glGetRenderbufferParameteriv(gltextarget, GL.RENDERBUFFER_WIDTH);
height = glGetRenderbufferParameteriv(gltextarget, GL.RENDERBUFFER_HEIGHT);
glBindRenderbuffer(gltextarget, 0);
% Unmap or Unregister object from cache?
if cmd == 2 || cmd == 3
memcpyCudaOpenGL(cmd - 1, gltexid, gltextarget);
% Map OpenGL resource, then return a memory pointer in a uint64 for it?
if cmd == 5 || cmd == 6
% This maps the resource and returns a pointer to it in uint64 outObj:
% cmd 5 and 6 are translated into direction values 0 and 1 via 'cmd - 5'. This
% is important to get the correct mapping flags for resource mapping (readonly vs.
outObj = memcpyCudaOpenGL(4, gltexid, gltextarget, 0, 0, cmd - 5, 1, mapflags);
if (nrchannels ~= 1) && (nrchannels ~= 2) && (nrchannels ~= 4)
error('Tried to convert a 3 layer RGB texture or framebuffer. This is not supported.');
% Number of bytes to copy = w * h * bpp:
nrbytes = width * height * bpp;
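% Worked example (illustrative numbers, not taken from any real run): For a
% 512 x 512 RGBA32F texture, bpp is 4 channels * 4 bytes = 16 bytes per
% pixel at this point, so nrbytes = 512 * 512 * 16 = 4194304 bytes (4 MB).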
error('Tried to convert an empty texture. Forbidden!');
% Is an already existing 'gpu' variable provided for "refill" ?
% Yes: Check for matching format. If it doesn't match, delete it, so it
% can be recreated with matching format:
if ~isa(gpu, 'GPUsingle') || (size(gpu, 1) ~= nrchannels) || (size(gpu, 2) ~= width) || (size(gpu, 3) ~= height) || (0 == getPtr(gpu))
% No match, or not allocated. Destroy:
% And create empty for recreation below:
% This must not happen in GPU -> OpenGL mode:
error('Incompatible GPUtype inObj variable provided for update of OpenGL object. How is this supposed to work?!?');
% Is gpu a complex matrix - which we can't handle?
% Yes: Only extract and convert real part:
% Give data-loss / performance warning:
warning('GPUTYPEFROMTOGL:gputypeComplexToRealcast', 'Input GPU matrix inObj stores complex numbers, which we cannot store! Throwing away the imaginary component of each complex number!');
% Need to create a new gpu variable?
% Yes: Create a new GPUsingle GPU variable:
% Set it to real format:
% Set its size: We *must* double-cast the size vector here, because the
% gpuType == 3 path delivers int32's and GPUmat doesn't like this at
% all, punishing us with GPUallocVector failure, if we don't cast to
setSize(gpu, double([nrchannels, width, height]));
% Allocate its CUDA backing memory:
% Retrieve CUDA memory pointer to it:
gpuptr = getPtr(gpu);
error('Memory allocation on GPU failed!');
% Perform copy of image content from OpenGL texture into CUDA backing store:
memcpyCudaOpenGL(3, gltexid, gltextarget, gpuptr, nrbytes, direction, keepmapped, mapflags);
outObj = outObj; %#ok<ASGSL>