What’s New in iOS 10, tvOS 10, and macOS 10.12
This chapter summarizes the new features introduced in iOS 10, tvOS 10, and macOS 10.12.
New Metal Feature Sets
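The new Metal feature sets, as referenced throughout this chapter, are listed as follows:
iOS_GPUFamily1_v3
iOS_GPUFamily2_v3
iOS_GPUFamily3_v2
tvOS_GPUFamily1_v2
OSX_GPUFamily1_v2
OSX_ReadWriteTextureTier2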
To determine whether a feature set is supported by a device, query the existing supportsFeatureSet: method of a MTLDevice object.
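As a minimal sketch of this query, the following Objective-C helper checks one of the new feature sets on each platform. The helper name SupportsNewFeatureSet is hypothetical, and the MTLFeatureSet_ enum spellings are assumed to mirror the feature set names used in this chapter.

```objc
#import <Metal/Metal.h>
#import <TargetConditionals.h>

// Hypothetical helper: returns YES if the device supports one of the
// feature sets introduced in this chapter for the current platform.
static BOOL SupportsNewFeatureSet(id<MTLDevice> device) {
#if TARGET_OS_OSX
    return [device supportsFeatureSet:MTLFeatureSet_OSX_GPUFamily1_v2];
#elif TARGET_OS_TV
    return [device supportsFeatureSet:MTLFeatureSet_tvOS_GPUFamily1_v2];
#else
    // Assumption: the lowest new iOS feature set is the one of interest.
    return [device supportsFeatureSet:MTLFeatureSet_iOS_GPUFamily1_v3];
#endif
}
```

A caller would typically run this check once after creating the device and choose a rendering path based on the result.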
Each new feature described in this chapter is annotated with its feature set availability. For further information about feature availability, implementation limits, and pixel format capabilities for all feature sets, see the Metal Feature Set Tables page.
All new Metal shading language features are available in version 1.2 (MTLLanguageVersion1_2). For further information, see the Metal Shading Language Guide.
Tessellation
For a complete overview of tessellation in Metal, see the Tessellation chapter.
Resource Heaps
For a complete overview of resource heaps in Metal, see the Resource Heaps chapter.
Memoryless Render Targets
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2
Memoryless render targets are render targets that exist only transiently in on-GPU tile memory, without any other CPU or GPU memory backing. Memoryless render targets satisfy the increased memory demands of high-resolution displays and MSAA data, allowing you to:
Prevent wasting memory that is only needed for temporary render targets (color, depth, and stencil).
Increase image quality by using higher MSAA levels in the same memory budget.
To create a memoryless render target, set the storageMode property of a MTLTextureDescriptor object to MTLStorageModeMemoryless. Then, use this descriptor to create a MTLTexture object.
Only MTLTexture objects can be created with a MTLStorageModeMemoryless storage mode. A memoryless texture can only be used as a temporary render target; to do so, set it as the texture property of a MTLRenderPassAttachmentDescriptor object.
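The steps above can be sketched as follows for a transient depth attachment. The helper names, the depth pixel format, and the load and clear values are illustrative assumptions rather than requirements stated in this chapter.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: describes a depth texture that exists only in tile memory.
static MTLTextureDescriptor *MakeMemorylessDepthDescriptor(NSUInteger width, NSUInteger height) {
    MTLTextureDescriptor *descriptor =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatDepth32Float
                                                           width:width
                                                          height:height
                                                       mipmapped:NO];
    descriptor.usage = MTLTextureUsageRenderTarget;
    descriptor.storageMode = MTLStorageModeMemoryless;
    return descriptor;
}

// Hypothetical helper: creates the texture and attaches it to a render pass.
static void AttachMemorylessDepth(id<MTLDevice> device,
                                  MTLRenderPassDescriptor *pass,
                                  NSUInteger width,
                                  NSUInteger height) {
    id<MTLTexture> depth =
        [device newTextureWithDescriptor:MakeMemorylessDepthDescriptor(width, height)];
    pass.depthAttachment.texture = depth;
    // Memoryless attachments cannot load or store their contents.
    pass.depthAttachment.loadAction = MTLLoadActionClear;
    pass.depthAttachment.storeAction = MTLStoreActionDontCare;
    pass.depthAttachment.clearDepth = 1.0;
}
```

Because the depth values are never needed after the pass, the DontCare store action lets the attachment live entirely in tile memory.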
Use Cases
The following use cases are just a few examples of temporary render targets that can be used as memoryless render targets. For each of these, the contents of the render targets are never used after the rendering operations are completed:
Traditional depth testing, using a depth render target with a MTLStoreActionDontCare store action.
Traditional stencil buffer operations, using a stencil render target with a MTLStoreActionDontCare store action.
MSAA rendering, using a MSAA color render target with a MTLStoreActionMultisampleResolve store action.
Deferred rendering, using two logical rendering passes within a single render command encoder:
    The first pass populates the temporary G-buffer render targets with albedo, normal, and other data.
    The second pass reads the temporary G-buffer render targets, then computes and accumulates lighting data to output the final color to a persistent render target.
    Each temporary render target is a memoryless render target and has a MTLStoreActionDontCare store action.
Deferred lighting, using three logical rendering passes within a single render command encoder:
    The first pass populates the temporary depth and normal render targets with geometry data.
    The second pass reads the previous temporary render targets, then computes and accumulates lighting data to populate the temporary diffuse and specular render targets.
    The third pass reads all previous temporary render targets, then computes the lighting equation to output the final color to a persistent render target.
    Each temporary render target is a memoryless render target and has a MTLStoreActionDontCare store action.
Rules and Restrictions
Memoryless render targets must adhere to the following rules and restrictions. Memoryless render targets...
Can be part of a heap, but they cannot be aliased. For further information, see the Resource Heaps chapter.
Must have a renderable color, depth, or stencil pixel format.
Can only be populated by a rendering pass.
Must have a MTLTextureType2D or MTLTextureType2DMultisample texture type.
Can only be used for the texture property of a MTLRenderPassAttachmentDescriptor object.
Cannot be used for the resolveTexture property of a MTLRenderPassAttachmentDescriptor object.
Must be used by a MTLRenderPassAttachmentDescriptor object with a MTLLoadActionDontCare or MTLLoadActionClear load action.
Must be used by a MTLRenderPassAttachmentDescriptor object with a MTLStoreActionDontCare or MTLStoreActionMultisampleResolve store action. The MTLRenderPassAttachmentDescriptor object can also have an initial MTLStoreActionUnknown store action, but this must be changed before its associated render command encoder ends encoding. For further information, see the Deferred Store Action section.
Cannot be read from or written to by any methods in the MTLTexture protocol.
Cannot be used as a parent texture to create a texture view.
Cannot be used by a MTLBlitCommandEncoder object. All blit operations are disallowed.
Cannot be used by a MTLComputeCommandEncoder object. All compute operations are disallowed.
Can be read by a fragment function using a framebuffer fetch. For further information, see the Programmable Blending section of the Metal Shading Language Guide.
Apps using memoryless render targets should carefully control the total amount of data passed for processing in a single rendering pass. Memoryless render targets use on-GPU tile memory for their temporary storage; as long as all data associated with the draw calls issued for a rendering pass can be cached, memoryless render targets will be processed one tile at a time. To ensure successful usage of memoryless render targets, make sure you follow these additional rules:
All resources referenced in the rendering pass shouldn’t consume more physical memory than available.
You should not submit more than 64K unique viewports, scissors, and depth-bias values.
The command buffer will report any memory errors via the MTLCommandBufferErrorMemoryless error code.
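One way to detect this error is a completion handler on the command buffer, sketched below. The helper names are hypothetical, and the handler is assumed to be registered before the command buffer is committed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: returns YES if the error is the memoryless error code.
static BOOL IsMemorylessError(NSError *error) {
    return error != nil &&
           [error.domain isEqualToString:MTLCommandBufferErrorDomain] &&
           error.code == MTLCommandBufferErrorMemoryless;
}

// Hypothetical helper: logs memoryless failures once the command buffer completes.
// Call this before committing the command buffer.
static void WatchForMemorylessErrors(id<MTLCommandBuffer> commandBuffer) {
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
        if (IsMemorylessError(buffer.error)) {
            NSLog(@"Rendering pass exceeded memoryless limits: %@", buffer.error);
        }
    }];
}
```

An app that sees this error might respond by splitting the work across more rendering passes, although the right recovery strategy depends on the renderer.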
Function Specialization
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2
Function specialization uses function constants to create specialized versions of your graphics and compute functions. Function constants are compile-time constants declared in your Metal shading language source; function constant values are assigned in your Metal app before the specialized function is compiled.
Note: Function constants are an improved replacement for pre-processor macros, as described in the Program Scope Function Constants section of the Metal Shading Language Guide.
Declaring Function Constants
The Metal shading language provides the [[ function_constant(index) ]] attribute to declare function constants, as shown in Listing 11-1.
Listing 11-1 Declaring a function constant
    constant bool a [[ function_constant(0) ]];
Function constants can be used to:
Control which function code paths get compiled
Specify optional arguments of a function
Specify optional elements of a struct declared with the [[ stage_in ]] qualifier
Setting Constant Values
The Metal framework provides the MTLFunctionConstantValues class to set the constant values for a specialized function. Constant values can be set by index, index range, or name. Listing 11-2 shows how to set a constant value by index.
Listing 11-2 Setting a constant value by index
    const bool a = true;
    MTLFunctionConstantValues* constantValues = [MTLFunctionConstantValues new];
    [constantValues setConstantValue:&a type:MTLDataTypeBool atIndex:0];
A single MTLFunctionConstantValues object can be applied to multiple MTLFunction objects (for example, a vertex function and a fragment function). After a specialized function has been created, any changes to its constant values have no further effect on it. However, you can reset, add, or modify any constant values in the MTLFunctionConstantValues object and reuse it to create another MTLFunction object.
Compiling Specialized Functions
Specialized functions are MTLFunction objects created from a MTLLibrary object by calling one of these methods:
newFunctionWithName:constantValues:error:
newFunctionWithName:constantValues:completionHandler:
These methods invoke the Metal compiler to evaluate your function constants and their constant values. The compiler specializes the named function by omitting function constant code paths, arguments, and elements that are not enabled by the constant values provided. Function constant values are first looked up by their index, then by their name. Any values that do not correspond to a function constant in the named function are ignored (without generating errors or warnings).
Note: The newFunctionWithName:constantValues:completionHandler: method compiles a specialized function asynchronously. Use this method to maximize performance and parallelism if your app needs to create multiple specialized functions.
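Putting the declaration, the constant values, and the compilation step together, a synchronous specialization might look like the sketch below. The helper name and its parameters are hypothetical; the function constant is assumed to be a bool at index 0, as in Listing 11-1.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: compiles a variant of `name` with the bool function
// constant at index 0 set to `enableA`. Returns nil and fills `error` on failure.
static id<MTLFunction> MakeSpecializedFunction(id<MTLLibrary> library,
                                               NSString *name,
                                               BOOL enableA,
                                               NSError **error) {
    const bool a = enableA;
    MTLFunctionConstantValues *values = [MTLFunctionConstantValues new];
    [values setConstantValue:&a type:MTLDataTypeBool atIndex:0];
    return [library newFunctionWithName:name constantValues:values error:error];
}
```

Calling the helper once with YES and once with NO would produce two independent MTLFunction objects from the same source function.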
Obtaining Reflection Data
A MTLFunctionConstant object provides reflection data for a function constant. This object should only be obtained if you need the reflection data to set constant values for a specialized function (for example, determining if a function constant is optional or required before setting its constant value). To obtain reflection data, fetch a MTLFunction object by calling the newFunctionWithName: method and query its functionConstantsDictionary property. Use this reflection data to set your constant values and compile your specialized function.
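A minimal sketch of this reflection step follows. It assumes the shipping property name functionConstantsDictionary on MTLFunction, and the helper name RequiredConstantNames is hypothetical.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: lists the names of function constants that must be
// given a value before `name` can be specialized.
static NSArray<NSString *> *RequiredConstantNames(id<MTLLibrary> library, NSString *name) {
    // The unspecialized function is fetched only for its reflection data.
    id<MTLFunction> function = [library newFunctionWithName:name];
    NSMutableArray<NSString *> *required = [NSMutableArray array];
    [function.functionConstantsDictionary
        enumerateKeysAndObjectsUsingBlock:^(NSString *key,
                                            MTLFunctionConstant *constant,
                                            BOOL *stop) {
            if (constant.required) {
                [required addObject:constant.name];
            }
        }];
    return required;
}
```

The returned names can then drive calls that set each value on a MTLFunctionConstantValues object before compiling the specialized function.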
Function Resource Read-Writes
Function Buffer Read-Writes
Available in: iOS_GPUFamily3_v2, OSX_GPUFamily1_v2
Fragment functions can now write to buffers. Writable buffers must be declared in the device address space and must not be const. Use dynamic indexing to write to a buffer.
Atomic Functions
Vertex and fragment functions now support atomic functions for buffers (in the device address space). For further information, see the Atomic Functions section of the Metal Shading Language Guide.
Function Texture Read-Writes
Available in: OSX_GPUFamily1_v2
Both vertex and fragment functions can now write to textures. Writable textures must be declared with the access::write or access::read_write qualifier. Use an appropriate variant of the write() function to write to a texture (where lod is always constant and equal to zero).
Read-Write Textures
A read-write texture is a texture that can be both read from and written to by the same vertex, fragment, or kernel function.
Access
A read-write texture is declared in the Metal shading language as a texture with the access::read_write qualifier. Listing 11-3 shows a simple read and write operation on a read-write texture, within the same function.
Listing 11-3 Using a read-write texture
    kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]],
                          ushort2 gid [[ thread_position_in_grid ]])
    {
        float4 color = texA.read(gid);
        color = processColor(color);
        texA.write(color, gid);
    }
Note: A read-write texture cannot call a sample(), sample_compare(), gather(), or gather_compare() function.
A read-write texture cannot be declared with a texture2d_ms, depth2d, depth2d_array, depthcube, depthcube_array, or depth2d_ms texture type.
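To set a read-write texture in the graphics or compute function argument table, use one of the existing texture-setting methods in the Metal framework API, such as the setVertexTexture:atIndex: or setFragmentTexture:atIndex: method of a MTLRenderCommandEncoder object, or the setTexture:atIndex: method of a MTLComputeCommandEncoder object. Listing 11-4 and Listing 11-5 show a kernel function example.
Listing 11-4 Setting a read-write texture (Metal Shading Language)
    // kernel function signature
    kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]], ...)
Listing 11-5 Setting a read-write texture (Metal Framework)
    // kernel function argument table
    [computeCommandEncoder setTexture:texA atIndex:0];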
Note: It is invalid to declare two separate texture arguments (one read, one write) in a function signature and then set the same texture for both.
Listing 11-6 Invalid read and write texture configuration (Metal Shading Language)
    // kernel function signature
    kernel void my_kernel(texture2d<float, access::read> texARead [[ texture(0) ]],
                          texture2d<float, access::write> texAWrite [[ texture(1) ]],
                          ...)
Listing 11-7 Invalid read and write texture configuration (Metal Framework)
    // kernel function argument table
    [computeCommandEncoder setTexture:texA atIndex:0]; // Read
    [computeCommandEncoder setTexture:texA atIndex:1]; // Write
Synchronization
The fence() function allows you to control the order of a texture’s write and read operations within a thread, as shown in Listing 11-8. The fence() function ensures that writes to the texture by a thread become visible to subsequent reads from that texture by the same thread.
Listing 11-8 Using a read-write texture with a fence
    kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]],
                          ushort2 gid [[ thread_position_in_grid ]])
    {
        float4 color = generateColor();
        texA.write(color, gid);
        // add a fence to ensure the correct ordering of write and read operations within the thread
        texA.fence();
        float4 readColor = texA.read(gid);
    }
Pixel Formats
The set of pixel formats supported by read-write textures is separated into two tiers, each defined by a specific feature set.
Tier 1 Pixel Formats
Available in: OSX_GPUFamily1_v2
R32Float
R32Uint
R32Sint
Tier 2 Pixel Formats
Available in: OSX_ReadWriteTextureTier2
RGBA32Float
RGBA32Uint
RGBA32Sint
RGBA16Float
RGBA16Uint
RGBA16Sint
RGBA8Unorm
RGBA8Uint
RGBA8Sint
R16Float
R16Uint
R16Sint
R8Unorm
R8Uint
R8Sint
Rules and Restrictions
Memory Barriers
Between Command Encoders
All resource writes performed in a given command encoder are visible in the next command encoder. This is true for both render and compute command encoders.
Within a Render Command Encoder
For buffers, atomic writes are visible to subsequent atomic reads across multiple threads.
For textures, the textureBarrier method ensures that writes performed in a given draw call are visible to subsequent reads in the next draw call.
Within a Compute Command Encoder
All resource writes performed in a given kernel function are visible in the next kernel function.
Fragment Functions
Discard
If the discard_fragment() function is called, all resource writes that occurred before the call are actually committed to memory. After the call, the execution of the fragment function either stops entirely or continues, but future resource writes are completely ignored.
Scissor Test
The scissor test is always performed before executing the fragment function. If the scissor test fails, the fragment function is not executed and no resource writes take place.
Early and Late Fragment Tests
The [[early_fragment_tests]] qualifier can be added to a fragment function to request that depth, stencil, and occlusion query tests be performed before executing the fragment function. If any of the tests fail, the fragment function is not executed and no resource writes take place.
If the [[early_fragment_tests]] qualifier is not added, all fragment function resource writes are committed to memory whether the tests pass or not.
MSAA
The number of times a resource write operation is executed depends on how many times the fragment function is executed. By default, the fragment function is executed between 1 and N times, where N is the number of samples covered by the fragment. However, if the fragment function uses inputs with the color, sample_id, sample_perspective, or sample_no_perspective qualifiers, it is always executed at the sample rate.
If the fragment function result is discarded due to either setting a coverage mask of 0 or returning a low alpha value when the alphaToCoverageEnabled property is set to YES, then all fragment function resource writes are still committed to memory.
Helper Threads
Helper threads are fragment function invocations for pixels near primitive edges that produce no output, typically used to calculate derivatives. Resource writes performed by helper threads are ignored; atomics will not update memory and the values returned by atomics will be undefined.
For further information, see the Fragment Functions section of the Metal Shading Language Guide.
Array of Textures
Available in: iOS_GPUFamily3_v2
An array of textures is a data structure for storing homogeneous textures, allowing you to dynamically index into a texture with ease. An array of textures is declared in the Metal shading language as either:
array<typename T, size_t N>, or
const array<typename T, size_t N>
T is a texture type declared with the access::read or access::sample qualifier and N is the number of textures in the array.
An array of textures can be passed as an argument to graphics, compute, or user functions, or it can be declared as a local variable inside functions. Listing 11-9 shows how to pass an array of 10 textures as an argument to a kernel function.
Listing 11-9 Passing an array of textures as an argument to a function
    kernel void my_kernel(
        const array<texture2d<float>, 10> src [[ texture(0) ]],
        texture2d<float, access::write> dst [[ texture(10) ]])
There is no new Metal framework API to set an array of textures; use the existing methods in the MTLRenderCommandEncoder protocol or MTLComputeCommandEncoder protocol.
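For the kernel signature in Listing 11-9, the host side could bind the textures one index at a time, as in the sketch below. The helper name is hypothetical, and it assumes the array occupies consecutive texture indices starting at 0, with the destination texture at the next index.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: binds `sources` to texture indices 0..count-1 and
// `destination` to index count. Returns the index used for the destination.
static NSUInteger BindTextureArray(id<MTLComputeCommandEncoder> encoder,
                                   NSArray<id<MTLTexture>> *sources,
                                   id<MTLTexture> destination) {
    NSUInteger count = sources.count;
    for (NSUInteger i = 0; i < count; i++) {
        [encoder setTexture:sources[i] atIndex:i];
    }
    [encoder setTexture:destination atIndex:count];
    return count;
}
```

With 10 source textures, the destination lands at index 10, which matches the [[ texture(10) ]] attribute in the listing.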
The Metal shading language also adds support for a reference to an immutable array of textures, declared as array_ref<T> where T is a texture type and size() provides the number of textures in the array.
The storage for an array of textures is not owned by an array_ref<T> object. Implicit conversion operations are provided from types with contiguous iterators like metal::array. The array_ref<T> type can be passed as an argument to user functions only. A common use for the array_ref<T> type is when passing an array of textures as an argument to functions where you want to be able to accept a variety of array types, as shown in Listing 11-10.
Listing 11-10 Passing an array_ref<T> type as an argument to a function
    float4 foo(array_ref<texture2d<float>> src)
    {
        float4 clr(0.0f);
        for(int i=0; i<src.size(); i++)
        {
            clr += process_texture(src[i]);
        }
        return clr;
    }

    kernel void my_kernel_A(
        const array<texture2d<float>, 10> srcA [[ texture(0) ]],
        texture2d<float, access::write> dstB [[ texture(10) ]])
    {
        float4 clrA = foo(srcA);
        /* ... */
    }

    kernel void my_kernel_B(
        const array<texture2d<float>, 20> srcB [[ texture(0) ]],
        texture2d<float, access::write> dstB [[ texture(20) ]])
    {
        float4 clrB = foo(srcB);
        /* ... */
    }
Stencil Texture Views
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2
MTLPixelFormatX32_Stencil8 and MTLPixelFormatX24_Stencil8 are new stencil pixel formats that allow you to easily access stencil texture data in a graphics or compute function by using a stencil texture view.
A stencil texture view allows you to create a stencil-only texture from a combined depth and stencil texture. Both textures can then be set as texture arguments and sampled separately in a graphics or compute function. The stencil-only texture returns 8-bit unsigned integer values for each pixel. The format of the depth and stencil parent texture dictates the format of the stencil texture view, as listed in Table 11-1.
Table 11-1 Parent texture and stencil texture view pixel formats
Parent texture pixel format | Stencil texture view pixel format
MTLPixelFormatDepth32Float_Stencil8 | MTLPixelFormatX32_Stencil8
MTLPixelFormatDepth24Unorm_Stencil8* | MTLPixelFormatX24_Stencil8*
*These pixel formats are only supported in certain devices; query the depth24Stencil8PixelFormatSupported property of a MTLDevice object to check for support.
A stencil texture view can be created by using the existing Metal framework API in the MTLTexture protocol, as shown in Listing 11-11.
Listing 11-11 Creating a stencil texture view
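A minimal sketch of creating such a view is shown here. The helper name is hypothetical, the pairing of parent and view formats is assumed from the format names, and the parent texture is assumed to have been created with the MTLTextureUsagePixelFormatView usage flag.

```objc
#import <Metal/Metal.h>
#import <TargetConditionals.h>

// Hypothetical helper: returns a stencil-only view of a combined depth and
// stencil texture, or nil if the parent format has no stencil view format.
static id<MTLTexture> MakeStencilView(id<MTLTexture> depthStencilTexture) {
    switch (depthStencilTexture.pixelFormat) {
        case MTLPixelFormatDepth32Float_Stencil8:
            return [depthStencilTexture newTextureViewWithPixelFormat:MTLPixelFormatX32_Stencil8];
#if TARGET_OS_OSX
        // The 24-bit depth formats exist only on macOS, and only on some devices.
        case MTLPixelFormatDepth24Unorm_Stencil8:
            return [depthStencilTexture newTextureViewWithPixelFormat:MTLPixelFormatX24_Stencil8];
#endif
        default:
            return nil;
    }
}
```

The returned view shares storage with its parent, so both can be bound to the same function as separate texture arguments.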
Note: Stencil texture views are an improved replacement for blitting stencil data from a combined depth and stencil texture into a buffer, then blitting from that buffer to a stencil-only texture.
Depth-16 Pixel Format
Available in: OSX_GPUFamily1_v2
MTLPixelFormatDepth16Unorm is a new 16-bit depth pixel format with one normalized unsigned integer component.
Extended Range Pixel Formats
Available in: iOS_GPUFamily3_v2
Table 11-2 lists the new extended range pixel formats. These pixel formats are intended to be used as displayable render targets for devices with a wide gamut display.
*The alpha component of the MTLPixelFormatBGRA10_XR and MTLPixelFormatBGRA10_XR_sRGB pixel formats is always clamped to the [0.0, 1.0] range on sampling, rendering, and writing (despite supporting values outside this range).
All extended range formats are color-renderable and can be set in the pixelFormat property of a CAMetalLayer object or the colorPixelFormat property of a MTKView object. Only devices with a wide gamut display will display values outside the [0.0, 1.0] range; all other devices will clamp values to the [0.0, 1.0] range.
Note: The 32bpp MTLPixelFormatBGR10_XR and MTLPixelFormatBGR10_XR_sRGB extended range pixel formats have the same speed and memory characteristics as the 32bpp MTLPixelFormatRGBA8Unorm and MTLPixelFormatRGBA8Unorm_sRGB ordinary pixel formats.
The 64bpp MTLPixelFormatBGRA10_XR and MTLPixelFormatBGRA10_XR_sRGB extended range pixel formats require more memory bandwidth and do not perform as well. Although they can be used for source textures or intermediate render targets, it is recommended that you use these formats only for the texture of a CAMetalDrawable object. If you want to represent wide-gamut values and require an alpha component, you should use a color texture with either of the 32bpp MTLPixelFormatBGR10_XR or MTLPixelFormatBGR10_XR_sRGB pixel formats in conjunction with a separate alpha texture with the 8bpp MTLPixelFormatA8Unorm pixel format.
Combined MSAA Store and Resolve Action
Available in: iOS_GPUFamily3_v2, OSX_GPUFamily1_v2
The MTLStoreActionStoreAndMultisampleResolve store action allows you to store and resolve MSAA data using a single render command encoder. The unresolved MSAA data is stored in the texture specified by the texture property and the resolved MSAA data is stored in the texture specified by the resolveTexture property, as shown in Listing 11-12.
Listing 11-12 Performing a combined MSAA store and resolve operation using a single render command encoder
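A minimal sketch of such a render pass configuration follows. The helper name, the use of color attachment 0, and the clear load action are assumptions made for illustration.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: builds a render pass that both keeps the unresolved
// MSAA samples and writes the resolved image in a single pass.
static MTLRenderPassDescriptor *MakeStoreAndResolvePass(id<MTLTexture> msaaTexture,
                                                        id<MTLTexture> resolvedTexture) {
    MTLRenderPassDescriptor *pass = [MTLRenderPassDescriptor renderPassDescriptor];
    pass.colorAttachments[0].texture = msaaTexture;            // unresolved MSAA data
    pass.colorAttachments[0].resolveTexture = resolvedTexture; // resolved data
    pass.colorAttachments[0].loadAction = MTLLoadActionClear;
    pass.colorAttachments[0].storeAction = MTLStoreActionStoreAndMultisampleResolve;
    return pass;
}
```

A render command encoder created from this descriptor produces both textures when it ends encoding, so no second resolve pass is needed.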
Note: The combined MSAA store and resolve action is an improved replacement for performing a store operation in one render command encoder and then performing a resolve operation in another render command encoder.
Deferred Store Action
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2
The MTLStoreActionUnknown store action allows you to defer specifying a store action when configuring a MTLRenderPassAttachmentDescriptor object. The store action must be specified after the render command encoder is created but before you call the endEncoding method, as shown in Listing 11-13. Call one of the following methods on a MTLRenderCommandEncoder object to specify a store action other than MTLStoreActionUnknown:
setColorStoreAction:atIndex:
setDepthStoreAction:
setStencilStoreAction:
Equivalent methods are provided for a MTLParallelRenderCommandEncoder object, which you may only call on the parent encoder and must not call on any child encoders.
Listing 11-13 Deferring the decision of specifying a store action
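A minimal sketch of a deferred store action follows. The helper names and the contentsNeededLater flag are hypothetical stand-ins for whatever condition an app evaluates while encoding.

```objc
#import <Metal/Metal.h>

// Hypothetical policy: store the contents only if a later pass will read them.
static MTLStoreAction ChooseStoreAction(BOOL contentsNeededLater) {
    return contentsNeededLater ? MTLStoreActionStore : MTLStoreActionDontCare;
}

// Hypothetical helper: encodes a pass whose color store action is decided
// after the encoder has been created.
static void EncodePassWithDeferredStore(id<MTLCommandBuffer> commandBuffer,
                                        MTLRenderPassDescriptor *pass,
                                        BOOL contentsNeededLater) {
    // Defer the decision when configuring the attachment.
    pass.colorAttachments[0].storeAction = MTLStoreActionUnknown;
    id<MTLRenderCommandEncoder> encoder =
        [commandBuffer renderCommandEncoderWithDescriptor:pass];

    // ... encode draw calls here ...

    // The store action must be resolved before endEncoding is called.
    [encoder setColorStoreAction:ChooseStoreAction(contentsNeededLater) atIndex:0];
    [encoder endEncoding];
}
```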
A render command encoder created with a MTLStoreActionUnknown store action can have its store action changed as many times as desired before a call to the endEncoding method. A render command encoder created with a non-MTLStoreActionUnknown store action cannot have its store action changed after creation.
Note: The MTLStoreActionUnknown store action may help you avoid potential bandwidth costs incurred by selecting the MTLStoreActionStore action prematurely.
Dual-Source Blending
Available in: OSX_GPUFamily1_v2
Dual-source blending allows a fragment function to output two source colors, Source0 and Source1, into the GPU’s blend unit for a single render target.
Producing a second source color
To produce two output colors, Source0 and Source1, the Metal Shading Language extends the color(m) attribute qualifier with an index(i) attribute qualifier, where:
m is the color attachment index.
i is the color output index.
Listing 11-14 Enabling dual-source blending in a fragment function
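A minimal sketch of a fragment function with two color outputs is shown below, embedded as shader source in Objective-C so that it can be compiled at run time. The struct, function, and helper names and the constant output colors are all hypothetical.

```objc
#import <Metal/Metal.h>

// Hypothetical Metal shading language source: both outputs target color
// attachment 0, distinguished by their index() qualifier.
static NSString *const kDualSourceFragmentSource =
    @"#include <metal_stdlib>\n"
     "using namespace metal;\n"
     "struct DualSourceOutput {\n"
     "    float4 source0 [[ color(0), index(0) ]];\n"
     "    float4 source1 [[ color(0), index(1) ]];\n"
     "};\n"
     "fragment DualSourceOutput dual_source_fragment() {\n"
     "    DualSourceOutput out;\n"
     "    out.source0 = float4(1.0, 0.0, 0.0, 1.0);\n"
     "    out.source1 = float4(0.5, 0.5, 0.5, 1.0);\n"
     "    return out;\n"
     "}\n";

// Hypothetical helper: compiles the source above and returns the fragment function.
static id<MTLFunction> MakeDualSourceFragment(id<MTLDevice> device, NSError **error) {
    id<MTLLibrary> library = [device newLibraryWithSource:kDualSourceFragmentSource
                                                  options:nil
                                                    error:error];
    return [library newFunctionWithName:@"dual_source_fragment"];
}
```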
Note: The values of both m and i must be known at compile time. If index(i) is not specified, an index of 0 is assumed.
Referencing a second source color
The second output color, Source1, is referenced as a source or destination blend factor in the following fixed-function blending equations:
Output.rgb = (Source0.rgb * SBF) {BO} (Destination.rgb * DBF)
Output.a = (Source0.a * SBF) {BO} (Destination.a * DBF)
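Where BO is a MTLBlendOperation operator, SBF is the source blend factor, DBF is the destination blend factor, and Source1 can be referenced as one of the following MTLBlendFactor enums:
MTLBlendFactorSource1Color
MTLBlendFactorOneMinusSource1Color
MTLBlendFactorSource1Alpha
MTLBlendFactorOneMinusSource1Alpha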
For example, Listing 11-15 shows the render pipeline configuration that yields the following dual-source blend equation:
Output.rgb = (Source0.rgb * 1) + (Destination.rgb * Source1.rgb)
Listing 11-15 Configuring dual-source blending in a render pipeline descriptor
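A minimal sketch of a pipeline descriptor that yields the blend equation above follows. The helper name and its parameters are hypothetical; only the RGB blend state is configured because the equation covers only the RGB channels.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: configures
//   Output.rgb = (Source0.rgb * 1) + (Destination.rgb * Source1.rgb)
// on color attachment 0.
static MTLRenderPipelineDescriptor *MakeDualSourcePipelineDescriptor(id<MTLFunction> vertexFunction,
                                                                     id<MTLFunction> fragmentFunction,
                                                                     MTLPixelFormat pixelFormat) {
    MTLRenderPipelineDescriptor *descriptor = [MTLRenderPipelineDescriptor new];
    descriptor.vertexFunction = vertexFunction;
    descriptor.fragmentFunction = fragmentFunction;

    MTLRenderPipelineColorAttachmentDescriptor *color = descriptor.colorAttachments[0];
    color.pixelFormat = pixelFormat;
    color.blendingEnabled = YES;
    color.rgbBlendOperation = MTLBlendOperationAdd;               // BO
    color.sourceRGBBlendFactor = MTLBlendFactorOne;               // SBF
    color.destinationRGBBlendFactor = MTLBlendFactorSource1Color; // DBF
    return descriptor;
}
```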
Rules and Restrictions
Dual-source blending configurations must adhere to the following rules and restrictions:
Dual-source blending is not compatible with multiple render targets.
Fragment functions can only output to color(0).
index(0) always refers to Source0 and index(1) always refers to Source1.
Source1 blend factors can only be set on colorAttachments[0].
MSAA Blits
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v1, OSX_GPUFamily1_v2
MSAA blits from textures to buffers or from buffers to textures are now supported in the new iOS and tvOS feature sets. Blit destinations must be of an adequate size to store the MSAA data.
(MSAA blits were already supported in the OSX_GPUFamily1_v1 feature set and continue to be supported in the OSX_GPUFamily1_v2 feature set.)
sRGB Writes
Available in: iOS_GPUFamily2_v3, iOS_GPUFamily3_v1, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2
Writes to sRGB textures are now supported in additional iOS and tvOS feature sets.
(Writes to sRGB textures were already supported in the iOS_GPUFamily3_v1 feature set.)
Additional Shading Language Features
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2 (unless otherwise stated)
This section summarizes additional features introduced in version 1.2 of the Metal shading language. For further information, see the Metal Shading Language Guide.
Integer Functions
New integer functions to extract, insert, and reverse bits, as described in Integer Functions.
Texture Functions
Texture read and write functions can now be used with 16-bit unsigned integer coordinates (ushort type), as described in Texture Functions.
Compute Functions
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2
A new synchronization function for SIMD-group threads, as described in threadgroup Synchronization Functions.
Sampler Qualifiers
Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2
New sampler qualifiers to specify maximum anisotropy and LOD clamp range, as described in Samplers.
Struct of Buffers, Textures, and Samplers
A struct of resources can now be passed by value as an argument to a graphics or compute function, as described in Function Arguments and Variables.
Convenience Constants
New convenience constants of type float and half, as described in Math Functions.
Behavior Changes
For all MTLRenderCommandEncoder objects, a MTLStoreActionStore store action is required to store the contents of a render target for subsequent render command encoders or for the display. Prior to iOS 10 and tvOS 10, if all color render targets had a MTLStoreActionDontCare store action, the driver would choose to store the rendered contents to the first enabled render target. In iOS 10 and tvOS 10, the driver no longer performs this unnecessary action.
In iOS 10 and tvOS 10, buffer alignments have been relaxed for all methods in the MTLBlitCommandEncoder protocol.