What’s New in iOS 10, tvOS 10, and macOS 10.12

This chapter summarizes the new features introduced in iOS 10, tvOS 10, and macOS 10.12.

New Metal Feature Sets

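The new Metal feature sets are listed as follows:

  • iOS_GPUFamily1_v3

  • iOS_GPUFamily2_v3

  • iOS_GPUFamily3_v2

  • tvOS_GPUFamily1_v2

  • OSX_GPUFamily1_v2

  • OSX_ReadWriteTextureTier2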

To determine whether a feature set is supported by a device, query the existing supportsFeatureSet: method of a MTLDevice object.
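The query described above might look like the following minimal sketch. The helper names ChapterFeatureSet and DeviceSupportsChapterFeatures are hypothetical, and the choice of one feature set per platform is an assumption made only for illustration; ARC is assumed.

```objc
#import <Metal/Metal.h>
#import <TargetConditionals.h>

// Hypothetical helper (not part of Metal): picks one of the feature sets
// named in this chapter for the platform being compiled.
static MTLFeatureSet ChapterFeatureSet(void) {
#if TARGET_OS_TV
    return MTLFeatureSet_tvOS_GPUFamily1_v2;
#elif TARGET_OS_IOS
    return MTLFeatureSet_iOS_GPUFamily3_v2;
#else
    return MTLFeatureSet_OSX_GPUFamily1_v2;
#endif
}

// Returns YES only if the device reports support for that feature set.
// Messaging a nil device yields NO.
static BOOL DeviceSupportsChapterFeatures(id<MTLDevice> device) {
    return [device supportsFeatureSet:ChapterFeatureSet()];
}
```

A caller would typically pass the result of MTLCreateSystemDefaultDevice() and fall back to an older code path when the function returns NO.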

Each new feature described in this chapter is annotated with its feature set availability. For further information about feature availability, implementation limits, and pixel format capabilities for all feature sets, see the Metal Feature Set Tables page.

All new Metal shading language features are available in version 1.2 (MTLLanguageVersion1_2). For further information, see the Metal Shading Language Guide.

Tessellation

For a complete overview of tessellation in Metal, see the Tessellation chapter.

Resource Heaps

For a complete overview of resource heaps in Metal, see the Resource Heaps chapter.

Memoryless Render Targets

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2

Memoryless render targets are render targets that exist only transiently in on-GPU tile memory, without any other CPU or GPU memory backing. Memoryless render targets satisfy the increased memory demands of high-resolution displays and MSAA data, allowing you to:

  • Prevent wasting memory that is only needed for temporary render targets (color, depth, and stencil).

  • Increase image quality by using higher MSAA levels in the same memory budget.

To create a memoryless render target, set the storageMode property of a MTLTextureDescriptor object to MTLStorageModeMemoryless. Then, use this descriptor to create a MTLTexture object.

Only MTLTexture objects can be created with the MTLStorageModeMemoryless storage mode; buffers cannot. A memoryless texture can only be used as a temporary render target; to do so, set it as the texture property of a MTLRenderPassAttachmentDescriptor object.
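The two steps above can be sketched as follows. The helper names MemorylessDepthDescriptor and AttachTemporaryDepth are hypothetical, and the depth format, clear value, and load action are assumptions chosen for illustration; ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: describes a depth texture that lives only in tile memory.
static MTLTextureDescriptor *MemorylessDepthDescriptor(NSUInteger width, NSUInteger height) {
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatDepth32Float
                                                           width:width
                                                          height:height
                                                       mipmapped:NO];
    desc.usage = MTLTextureUsageRenderTarget;     // only ever used as an attachment
    desc.storageMode = MTLStorageModeMemoryless;  // no CPU or GPU memory backing
    return desc;
}

// Hypothetical helper: uses the texture as a temporary depth attachment.
// The contents are cleared on load and never stored.
static void AttachTemporaryDepth(MTLRenderPassDescriptor *pass, id<MTLTexture> depth) {
    pass.depthAttachment.texture = depth;
    pass.depthAttachment.loadAction = MTLLoadActionClear;
    pass.depthAttachment.clearDepth = 1.0;
    pass.depthAttachment.storeAction = MTLStoreActionDontCare;
}
```

The texture itself would be created with [device newTextureWithDescriptor:MemorylessDepthDescriptor(width, height)] on a device that supports one of the feature sets listed for this section.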

Use Cases

The following use cases are just a few examples of temporary render targets that can be used as memoryless render targets. For each of these, the contents of the render targets are never used after the rendering operations are completed:

  • Traditional depth testing, using a depth render target with a MTLStoreActionDontCare store action.

  • Traditional stencil buffer operations, using a stencil render target with a MTLStoreActionDontCare store action.

  • MSAA rendering, using an MSAA color render target with a MTLStoreActionMultisampleResolve store action.

  • Deferred rendering, using two logical rendering passes within a single render command encoder:

    1. The first pass populates the temporary G-buffer render targets with albedo, normal, and other data.

    2. The second pass reads the temporary G-buffer render targets, then computes and accumulates lighting data to output the final color to a persistent render target.

    Each temporary render target is a memoryless render target and has a MTLStoreActionDontCare store action.

  • Deferred lighting, using three logical rendering passes within a single render command encoder:

    1. The first pass populates the temporary depth and normal render targets with geometry data.

    2. The second pass reads the previous temporary render targets, then computes and accumulates lighting data to populate the temporary diffuse and specular render targets.

    3. The third pass reads all previous temporary render targets, then computes the lighting equation to output the final color to a persistent render target.

    Each temporary render target is a memoryless render target and has a MTLStoreActionDontCare store action.

Rules and Restrictions

Memoryless render targets must adhere to the following rules and restrictions. Memoryless render targets...

Apps using memoryless render targets should carefully control the total amount of data passed for processing in a single rendering pass. Memoryless render targets use on-GPU tile memory for their temporary storage; as long as all data associated with the draw calls issued for a rendering pass can be cached, memoryless render targets will be processed one tile at a time. To ensure successful usage of memoryless render targets, make sure you follow these additional rules:

  • All resources referenced in the rendering pass shouldn’t consume more physical memory than available.

  • You should not submit more than 64K unique viewports, scissors, and depth-bias values.

The command buffer will report any memory errors via the MTLCommandBufferErrorMemoryless error code.

Function Specialization

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2

Function specialization uses function constants to create specialized versions of your graphics and compute functions. Function constants are compile-time constants declared in your Metal shading language source; function constant values are assigned in your Metal app before the specialized function is compiled.

Note: Function constants are an improved replacement for pre-processor macros, as described in the Program Scope Function Constants section of the Metal Shading Language Guide.

Declaring Function Constants

The Metal shading language provides the [[ function_constant(index) ]] attribute to declare function constants, as shown in Listing 11-1.

Listing 11-1 Declaring a function constant

constant bool a [[ function_constant(0) ]];

Function constants can be used to:

  • Control which function code paths get compiled

  • Specify optional arguments of a function

  • Specify optional elements of a struct declared with the [[ stage_in ]] qualifier

Setting Constant Values

The Metal framework provides the MTLFunctionConstantValues class to set the constant values for a specialized function. Constant values can be set by index, index range, or name. Listing 11-2 shows how to set a constant value by index.

Listing 11-2 Setting a constant value by index

const bool a = true;
MTLFunctionConstantValues* constantValues = [MTLFunctionConstantValues new];
[constantValues setConstantValue:&a type:MTLDataTypeBool atIndex:0];

A single MTLFunctionConstantValues object can be applied to multiple MTLFunction objects (for example, a vertex function and a fragment function). After a specialized function has been created, any changes to its constant values have no further effect on it. However, you can reset, add, or modify any constant values in the MTLFunctionConstantValues object and reuse it to create another MTLFunction object.
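Setting values by name or by index range might look like the sketch below. The constant name "a", the float values, and the index range 1 to 2 are hypothetical and would need to match declarations in your own shader source; MakeConstantValues is a hypothetical helper name. ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: fills one MTLFunctionConstantValues object that can be
// reused for several specialized functions.
static MTLFunctionConstantValues *MakeConstantValues(void) {
    MTLFunctionConstantValues *values = [MTLFunctionConstantValues new];

    // By name: assumes the shader declares a bool function constant named "a".
    const bool a = true;
    [values setConstantValue:&a type:MTLDataTypeBool withName:@"a"];

    // By index range: assumes float function constants at indices 1 and 2.
    const float scales[2] = {0.5f, 2.0f};
    [values setConstantValues:scales type:MTLDataTypeFloat withRange:NSMakeRange(1, 2)];

    return values;
}
```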

Compiling Specialized Functions

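Specialized functions are MTLFunction objects created from a MTLLibrary object by calling one of these methods:

  • newFunctionWithName:constantValues:error:

  • newFunctionWithName:constantValues:completionHandler: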

These methods invoke the Metal compiler to evaluate your function constants and their constant values. The compiler specializes the named function by omitting function constant code paths, arguments, and elements that are not enabled by the constant values provided. Function constant values are first looked up by their index, then by their name. Any values that do not correspond to a function constant in the named function are ignored (without generating errors or warnings).

Note: The newFunctionWithName:constantValues:completionHandler: method compiles a specialized function asynchronously. Use this method to maximize performance and parallelism if your app needs to create multiple specialized functions.
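A synchronous compile could be sketched as follows, assuming the library contains a function that declares a bool function constant at index 0, as in Listing 11-1. The helper name NewSpecializedFunction and its parameters are hypothetical; ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: specializes the named function with a single bool
// constant at index 0 and returns nil (with *error set) on failure.
static id<MTLFunction> NewSpecializedFunction(id<MTLLibrary> library,
                                              NSString *name,
                                              BOOL enabled,
                                              NSError **error) {
    const bool a = enabled;
    MTLFunctionConstantValues *values = [MTLFunctionConstantValues new];
    [values setConstantValue:&a type:MTLDataTypeBool atIndex:0];

    // Invokes the Metal compiler; code paths disabled by `a` are omitted.
    return [library newFunctionWithName:name constantValues:values error:error];
}
```

Calling the helper twice with YES and NO yields two independent MTLFunction objects from the same source function.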

Obtaining Reflection Data

A MTLFunctionConstant object provides reflection data for function constants. This object should only be obtained if you need the reflection data to set constant values for a specialized function (for example, determining if a function constant is optional or required before setting its constant value). To obtain reflection data, fetch a MTLFunction object by calling the newFunctionWithName: method and querying the functionConstants property. Use this reflection data to set your constant values and compile your specialized function.
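As a rough sketch of using this reflection data, the helper below collects the names of required function constants. It assumes the dictionary-valued functionConstantsDictionary property found in the shipping Metal headers; the helper name RequiredConstantNames is hypothetical, and ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: returns the names of function constants that must be
// given a value before the function can be specialized.
static NSArray<NSString *> *RequiredConstantNames(id<MTLFunction> function) {
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    [function.functionConstantsDictionary
        enumerateKeysAndObjectsUsingBlock:^(NSString *name,
                                            MTLFunctionConstant *constant,
                                            BOOL *stop) {
            if (constant.required) {
                [names addObject:name];
            }
        }];
    return names;
}
```

The function passed in would be the unspecialized one returned by newFunctionWithName:, which is fetched only to inspect its constants.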

Function Resource Read-Writes

Function Buffer Read-Writes

Available in: iOS_GPUFamily3_v2, OSX_GPUFamily1_v2

Fragment functions can now write to buffers. Writable buffers must be declared in the device address space and must not be const. Use dynamic indexing to write to a buffer.

Atomic Functions

Vertex and fragment functions now support atomic functions for buffers (in the device address space). For further information, see the Atomic Functions section of the Metal Shading Language Guide.

Function Texture Read-Writes

Available in: OSX_GPUFamily1_v2

Both vertex and fragment functions can now write to textures. Writable textures must be declared with the access::write or access::read_write qualifier. Use an appropriate variant of the write() function to write to a texture (where lod is always constant and equal to zero).

Read-Write Textures

A read-write texture is a texture that can be both read from and written to by the same vertex, fragment, or kernel function.

Access

A read-write texture is declared in the Metal shading language as a texture with the access::read_write qualifier. Listing 11-3 shows a simple read and write operation on a read-write texture, within the same function.

Listing 11-3 Using a read-write texture

kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]],
                      ushort2 gid [[ thread_position_in_grid ]])
{
    float4 color = texA.read(gid);
    color = processColor(color);
    texA.write(color, gid);
}

Note: A read-write texture cannot call a sample(), sample_compare(), gather(), or gather_compare() function.

A read-write texture cannot be declared with a texture2d_ms, depth2d, depth2d_array, depthcube, depthcube_array, or depth2d_ms texture type.

To set a read-write texture in the graphics or compute function argument table, use one of the existing methods in the Metal framework API:

Listing 11-4 Setting a read-write texture (Metal Shading Language)

// kernel function signature
kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]], ...)

Listing 11-5 Setting a read-write texture (Metal Framework)

// kernel function argument table
[computeCommandEncoder setTexture:texA atIndex:0];

Note: It is invalid to declare two separate texture arguments (one read, one write) in a function signature and then set the same texture for both.

Listing 11-6 Invalid read and write texture configuration (Metal Shading Language)

// kernel function signature
kernel void my_kernel(texture2d<float, access::read> texARead [[ texture(0) ]],
                      texture2d<float, access::write> texAWrite [[ texture(1) ]],
                      ...)

Listing 11-7 Invalid read and write texture configuration (Metal Framework)

// kernel function argument table
[computeCommandEncoder setTexture:texA atIndex:0]; // Read
[computeCommandEncoder setTexture:texA atIndex:1]; // Write

Synchronization

The fence() function allows you to control the order of a texture’s write and read operations within a thread, as shown in Listing 11-8. The fence() function ensures that writes to the texture by a thread become visible to subsequent reads from that texture by the same thread.

Listing 11-8 Using a read-write texture with a fence

kernel void my_kernel(texture2d<float, access::read_write> texA [[ texture(0) ]],
                      ushort2 gid [[ thread_position_in_grid ]])
{
    float4 color = generateColor();
    texA.write(color, gid);
    // add a fence to ensure the correct ordering of write and read operations within the thread
    texA.fence();
    float4 readColor = texA.read(gid);
}

Pixel Formats

The set of pixel formats supported by read-write textures is separated into two tiers, each defined by a specific feature set.

Tier 1 Pixel Formats

Available in: OSX_GPUFamily1_v2

  • R32Float

  • R32Uint

  • R32Sint

Tier 2 Pixel Formats

Available in: OSX_ReadWriteTextureTier2

  • RGBA32Float

  • RGBA32Uint

  • RGBA32Sint

  • RGBA16Float

  • RGBA16Uint

  • RGBA16Sint

  • RGBA8Unorm

  • RGBA8Uint

  • RGBA8Sint

  • R16Float

  • R16Uint

  • R16Sint

  • R8Unorm

  • R8Uint

  • R8Sint

Rules and Restrictions

Memory Barriers

Between Command Encoders

All resource writes performed in a given command encoder are visible in the next command encoder. This is true for both render and compute command encoders.

Within a Render Command Encoder

For buffers, atomic writes are visible to subsequent atomic reads across multiple threads.

For textures, the textureBarrier method ensures that writes performed in a given draw call are visible to subsequent reads in the next draw call.

Within a Compute Command Encoder

All resource writes performed in a given kernel function are visible in the next kernel function.

Fragment Functions

Discard

If the discard_fragment() function is called, all resource writes that occurred before the call are committed to memory. After the call, the execution of the fragment function either stops entirely or continues, but any later resource writes are ignored.

Scissor Test

The scissor test is always performed before executing the fragment function. If the scissor test fails, the fragment function is not executed and no resource writes take place.

Early and Late Fragment Tests

The [[early_fragment_tests]] qualifier can be added to a fragment function to request that depth, stencil, and occlusion query tests be performed before executing the fragment function. If any of the tests fail, the fragment function is not executed and no resource writes take place.

If the [[early_fragment_tests]] qualifier is not added, all fragment function resource writes are committed to memory whether the tests pass or not.

MSAA

The number of times a resource write operation is executed depends on how many times the fragment function is executed. By default, the fragment function is executed between 1 and N times, where N is the number of samples covered by the fragment. However, if the fragment function uses inputs with the color, sample_id, sample_perspective, or sample_no_perspective qualifiers, it is always executed at the sample rate.

If the fragment function result is discarded due to either setting a coverage mask of 0 or returning a low alpha value when the alphaToCoverageEnabled property is set to YES, then all fragment function resource writes are still committed to memory.

Helper Threads

Helper threads are fragment function invocations for pixels near primitive edges that produce no output, typically used to calculate derivatives. Resource writes performed by helper threads are ignored; atomics will not update memory and the values returned by atomics will be undefined.

For further information, see the Fragment Functions section of the Metal Shading Language Guide.

Array of Textures

Available in: iOS_GPUFamily3_v2

An array of textures is a data structure for storing homogeneous textures, allowing you to dynamically index into a texture with ease. An array of textures is declared in the Metal shading language as either:

  • array<typename T, size_t N>, or

  • const array<typename T, size_t N>

T is a texture type declared with the access::read or access::sample qualifier and N is the number of textures in the array.

An array of textures can be passed as an argument to graphics, compute, or user functions, or it can be declared as a local variable inside functions. Listing 11-9 shows how to pass an array of 10 textures as an argument to a kernel function.

Listing 11-9 Passing an array of textures as an argument to a function

kernel void my_kernel(
    const array<texture2d<float>, 10> src [[ texture(0) ]],
    texture2d<float, access::write> dst [[ texture(10) ]])

There is no new Metal framework API to set an array of textures; use the existing methods in the MTLRenderCommandEncoder protocol or MTLComputeCommandEncoder protocol.
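For a kernel shaped like Listing 11-9, the host side might bind the textures as in this sketch. It assumes the array occupies consecutive texture indices starting at 0, with the destination texture in the next slot; the helper name BindTextureArray is hypothetical, and ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: binds `sources` to texture(0)..texture(N-1) and
// `destination` to texture(N). Returns the next unused texture index.
static NSUInteger BindTextureArray(id<MTLComputeCommandEncoder> encoder,
                                   NSArray<id<MTLTexture>> *sources,
                                   id<MTLTexture> destination) {
    NSUInteger index = 0;
    for (id<MTLTexture> texture in sources) {
        // Each array element uses the existing per-index setter.
        [encoder setTexture:texture atIndex:index];
        index += 1;
    }
    [encoder setTexture:destination atIndex:index];
    return index + 1;
}
```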

The Metal shading language also adds support for a reference to an immutable array of textures, declared as array_ref<T> where T is a texture type and size() provides the number of textures in the array.

The storage for an array of textures is not owned by an array_ref<T> object. Implicit conversion operations are provided from types with contiguous iterators like metal::array. The array_ref<T> type can be passed as an argument to user functions only. A common use for the array_ref<T> type is when passing an array of textures as an argument to functions where you want to be able to accept a variety of array types, as shown in Listing 11-10.

Listing 11-10 Passing an array_ref<T> type as an argument to a function

// Accepts texture arrays of any length through array_ref
float4 foo(array_ref<texture2d<float>> src)
{
    float4 clr(0.0f);
    for (int i = 0; i < src.size(); i++)
    {
        clr += process_texture(src[i]);
    }
    return clr;
}

// Passes an array of 10 textures
kernel void my_kernel_A(
    const array<texture2d<float>, 10> srcA [[ texture(0) ]],
    texture2d<float, access::write> dstA [[ texture(10) ]])
{
    float4 clrA = foo(srcA);
    /* ... */
}

// Passes an array of 20 textures to the same user function
kernel void my_kernel_B(
    const array<texture2d<float>, 20> srcB [[ texture(0) ]],
    texture2d<float, access::write> dstB [[ texture(20) ]])
{
    float4 clrB = foo(srcB);
    /* ... */
}

Stencil Texture Views

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2

MTLPixelFormatX32_Stencil8 and MTLPixelFormatX24_Stencil8 are new stencil pixel formats that allow you to easily access stencil texture data in a graphics or compute function by using a stencil texture view.

A stencil texture view allows you to create a stencil-only texture from a combined depth and stencil texture. Both textures can then be set as texture arguments and sampled separately in a graphics or compute function. The stencil-only texture returns 8-bit unsigned integer values for each pixel. The format of the depth and stencil parent texture dictates the format of the stencil texture view, as listed in Table 11-1.

*These pixel formats are only supported in certain devices; query the depth24Stencil8PixelFormatSupported property of a MTLDevice object to check for support.

A stencil texture view can be created by using the existing Metal framework API in the MTLTexture protocol, as shown in Listing 11-11.

Listing 11-11 Creating a stencil texture view
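A minimal sketch of creating such a view, assuming a MTLPixelFormatDepth32Float_Stencil8 parent texture paired with a MTLPixelFormatX32_Stencil8 view. The helper names and the usage flags chosen for the parent texture are assumptions for illustration; ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: creates a combined depth and stencil texture that
// permits views with a different pixel format.
static id<MTLTexture> NewDepthStencilTexture(id<MTLDevice> device,
                                             NSUInteger width,
                                             NSUInteger height) {
    MTLTextureDescriptor *desc =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatDepth32Float_Stencil8
                                                           width:width
                                                          height:height
                                                       mipmapped:NO];
    desc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead |
                 MTLTextureUsagePixelFormatView;
    desc.storageMode = MTLStorageModePrivate;
    return [device newTextureWithDescriptor:desc];
}

// Hypothetical helper: returns a stencil-only view of the parent texture,
// or nil if the parent does not use the format assumed by this sketch.
static id<MTLTexture> NewStencilView(id<MTLTexture> depthStencil) {
    if (depthStencil.pixelFormat != MTLPixelFormatDepth32Float_Stencil8) {
        return nil;
    }
    return [depthStencil newTextureViewWithPixelFormat:MTLPixelFormatX32_Stencil8];
}
```

Both the parent texture and the view can then be bound as separate texture arguments of the same function.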

Note: Stencil texture views are an improved replacement for blitting stencil data from a combined depth and stencil texture into a buffer, then blitting from that buffer to a stencil-only texture.

Depth-16 Pixel Format

Available in: OSX_GPUFamily1_v2

MTLPixelFormatDepth16Unorm is a new 16-bit depth pixel format with one normalized unsigned integer component.

Extended Range Pixel Formats

Available in: iOS_GPUFamily3_v2

Table 11-2 lists the new extended range pixel formats. These pixel formats are intended to be used as displayable render targets for devices with a wide gamut display.

*The alpha component of the MTLPixelFormatBGRA10_XR and MTLPixelFormatBGRA10_XR_sRGB pixel formats is always clamped to the [0.0, 1.0] range on sampling, rendering, and writing (despite supporting values outside this range).

All extended range formats are color-renderable and can be set in the pixelFormat property of a CAMetalLayer object or the colorPixelFormat property of a MTKView object. Only devices with a wide gamut display will display values outside the [0.0, 1.0] range; all other devices will clamp values to the [0.0, 1.0] range.

Note: The 32bpp MTLPixelFormatBGR10_XR and MTLPixelFormatBGR10_XR_sRGB extended range pixel formats have the same speed and memory characteristics as the 32bpp MTLPixelFormatRGBA8Unorm and MTLPixelFormatRGBA8Unorm_sRGB ordinary pixel formats.

The 64bpp MTLPixelFormatBGRA10_XR and MTLPixelFormatBGRA10_XR_sRGB extended range pixel formats require more memory bandwidth and do not perform as well. Although they can be used for source textures or intermediate render targets, it is recommended that you use these formats only for the texture of a CAMetalDrawable object. If you want to represent wide-gamut values and require an alpha component, you should use a color texture with either of the 32bpp MTLPixelFormatBGR10_XR or MTLPixelFormatBGR10_XR_sRGB pixel formats in conjunction with a separate alpha texture with the 8bpp MTLPixelFormatA8Unorm pixel format.

Combined MSAA Store and Resolve Action

Available in: iOS_GPUFamily3_v2, OSX_GPUFamily1_v2

The MTLStoreActionStoreAndMultisampleResolve store action allows you to store and resolve MSAA data using a single render command encoder. The unresolved MSAA data is stored in the texture specified by the texture property and the resolved MSAA data is stored in the texture specified by the resolveTexture property, as shown in Listing 11-12.

Listing 11-12 Performing a combined MSAA store and resolve operation using a single render command encoder
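A minimal sketch of the configuration described above, assuming color attachment 0 and a clear load action. The helper name ConfigureStoreAndResolve and its parameters are hypothetical; ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: keeps the unresolved MSAA samples in `msaaTexture`
// and writes the resolved image to `resolvedTexture` in the same pass.
static void ConfigureStoreAndResolve(MTLRenderPassDescriptor *pass,
                                     id<MTLTexture> msaaTexture,
                                     id<MTLTexture> resolvedTexture) {
    MTLRenderPassColorAttachmentDescriptor *color = pass.colorAttachments[0];
    color.texture = msaaTexture;             // multisample render target
    color.resolveTexture = resolvedTexture;  // single-sample resolve destination
    color.loadAction = MTLLoadActionClear;
    color.storeAction = MTLStoreActionStoreAndMultisampleResolve;
}
```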

Note: The combined MSAA store and resolve action is an improved replacement for performing a store operation in one render command encoder and then performing a resolve operation in another render command encoder.

Deferred Store Action

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2

The MTLStoreActionUnknown store action allows you to defer specifying a store action when configuring a MTLRenderPassAttachmentDescriptor object. The store action must be specified after the render command encoder is created but before you call the endEncoding method, as shown in Listing 11-13. Call one of the following methods on a MTLRenderCommandEncoder object to specify a store action other than MTLStoreActionUnknown:

  • setColorStoreAction:atIndex:

  • setDepthStoreAction:

  • setStencilStoreAction:

Equivalent methods are provided for a MTLParallelRenderCommandEncoder object, which you may only call on the parent encoder and must not call on any child encoders.

Listing 11-13 Deferring the decision of specifying a store action
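The deferral might be sketched as follows, assuming color attachment 0 and a decision that is only known after the draw calls have been encoded. The helper names DeferColorStoreAction and ResolveColorStoreAction, and the keepContents flag, are hypothetical; ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: marks the store action as undecided before the
// render command encoder is created.
static void DeferColorStoreAction(MTLRenderPassDescriptor *pass) {
    pass.colorAttachments[0].storeAction = MTLStoreActionUnknown;
}

// Hypothetical helper: picks the final store action once it is known, then
// ends encoding. Returns the action that was chosen.
static MTLStoreAction ResolveColorStoreAction(id<MTLRenderCommandEncoder> encoder,
                                              BOOL keepContents) {
    MTLStoreAction action = keepContents ? MTLStoreActionStore
                                         : MTLStoreActionDontCare;
    [encoder setColorStoreAction:action atIndex:0];
    [encoder endEncoding];
    return action;
}
```

Between the two calls, the app creates the encoder from the descriptor and issues its draw calls as usual.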

A render command encoder created with a MTLStoreActionUnknown store action can have its store action changed as many times as desired before a call to the endEncoding method. A render command encoder created with a non-MTLStoreActionUnknown store action cannot have its store action changed after creation.

Note: The MTLStoreActionUnknown store action may help you avoid potential bandwidth costs incurred by selecting the MTLStoreActionStore action prematurely.

Dual-Source Blending

Available in: OSX_GPUFamily1_v2

Dual-source blending allows a fragment function to output two source colors, Source0 and Source1, into the GPU’s blend unit for a single render target.

Producing a second source color

To produce two output colors, Source0 and Source1, the Metal Shading Language extends the color(m) attribute qualifier with an index(i) attribute qualifier, where:

  • m is the color attachment index.

  • i is the color output index.

Listing 11-14 Enabling dual-source blending in a fragment function
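A minimal sketch of a fragment function that produces both source colors, shown as Metal shading language source embedded in an Objective-C string and compiled at run time. The struct, function name, and color values are hypothetical; NewDualSourceFragment is a hypothetical helper, and ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical shader source: both outputs target color attachment 0,
// distinguished by index(0) (Source0) and index(1) (Source1).
static NSString *const kDualSourceShader =
    @"#include <metal_stdlib>\n"
    @"using namespace metal;\n"
    @"struct FragmentOutput {\n"
    @"    float4 source0 [[ color(0), index(0) ]];\n"
    @"    float4 source1 [[ color(0), index(1) ]];\n"
    @"};\n"
    @"fragment FragmentOutput dual_source_fragment() {\n"
    @"    FragmentOutput out;\n"
    @"    out.source0 = float4(1.0, 0.0, 0.0, 1.0);\n"
    @"    out.source1 = float4(0.5, 0.5, 0.5, 1.0);\n"
    @"    return out;\n"
    @"}\n";

// Hypothetical helper: compiles the source above and returns the fragment
// function, or nil (with *error set) if compilation fails.
static id<MTLFunction> NewDualSourceFragment(id<MTLDevice> device, NSError **error) {
    id<MTLLibrary> library = [device newLibraryWithSource:kDualSourceShader
                                                  options:nil
                                                    error:error];
    return [library newFunctionWithName:@"dual_source_fragment"];
}
```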

Note: The values of both m and i must be known at compile time. If index(i) is not specified, an index of 0 is assumed.

Referencing a second source color

The second output color, Source1, is referenced as a source or destination blend factor in the following fixed-function blending equations:

Output.rgb = (Source0.rgb * SBF) {BO} (Destination.rgb * DBF)

Output.a = (Source0.a * SBF) {BO} (Destination.a * DBF)

Where BO is a MTLBlendOperation operator, SBF is the source blend factor, DBF is the destination blend factor, and Source1 can be referenced as one of the following MTLBlendFactor enums:

  • MTLBlendFactorSource1Color

  • MTLBlendFactorOneMinusSource1Color

  • MTLBlendFactorSource1Alpha

  • MTLBlendFactorOneMinusSource1Alpha

For example, Listing 11-15 shows the render pipeline configuration that yields the following dual-source blend equation:

Output.rgb = (Source0.rgb * 1) + (Destination.rgb * Source1.rgb)

Listing 11-15 Configuring dual-source blending in a render pipeline descriptor
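A minimal sketch of a pipeline configuration for the equation above, where BO is an add operation, SBF is one, and DBF is the Source1 color. The helper name ConfigureDualSourceBlend is hypothetical, and only the RGB blend state is set; ARC is assumed.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: configures colorAttachments[0] so that
// Output.rgb = (Source0.rgb * 1) + (Destination.rgb * Source1.rgb).
static void ConfigureDualSourceBlend(MTLRenderPipelineDescriptor *pipeline) {
    MTLRenderPipelineColorAttachmentDescriptor *color = pipeline.colorAttachments[0];
    color.blendingEnabled = YES;
    color.rgbBlendOperation = MTLBlendOperationAdd;               // BO
    color.sourceRGBBlendFactor = MTLBlendFactorOne;               // SBF
    color.destinationRGBBlendFactor = MTLBlendFactorSource1Color; // DBF
}
```

The descriptor still needs its vertex function, fragment function, and attachment pixel format before a pipeline state can be created from it.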

Rules and Restrictions

Dual-source blending configurations must adhere to the following rules and restrictions:

  • Dual-source blending is not compatible with multiple render targets.

  • Fragment functions can only output to color(0).

  • index(0) always refers to Source0 and index(1) always refers to Source1.

  • Source1 blend factors can only be set on colorAttachments[0].

MSAA Blits

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v1, OSX_GPUFamily1_v2

MSAA blits from textures to buffers or from buffers to textures are now supported in the new iOS and tvOS feature sets. Blit destinations must be of an adequate size to store the MSAA data.

(MSAA blits were already supported in the OSX_GPUFamily1_v1 feature set and continue to be supported in the OSX_GPUFamily1_v2 feature set.)

sRGB Writes

Available in: iOS_GPUFamily2_v3, iOS_GPUFamily3_v1, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2

Writes to sRGB textures are now supported in additional iOS and tvOS feature sets.

(Writes to sRGB textures were already supported in the iOS_GPUFamily3_v1 feature set.)

Additional Shading Language Features

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2, OSX_GPUFamily1_v2 (unless otherwise stated)

This section summarizes additional features introduced in version 1.2 of the Metal shading language. For further information, see the Metal Shading Language Guide.

Integer Functions

New integer functions to extract, insert, and reverse bits, as described in Integer Functions.

Texture Functions

Texture read and write functions can now be used with 16-bit unsigned integer coordinates (ushort type), as described in Texture Functions.

Compute Functions

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2

A new synchronization function for SIMD-group threads, as described in threadgroup Synchronization Functions.

Sampler Qualifiers

Available in: iOS_GPUFamily1_v3, iOS_GPUFamily2_v3, iOS_GPUFamily3_v2, tvOS_GPUFamily1_v2

New sampler qualifiers to specify maximum anisotropy and LOD clamp range, as described in Samplers.

Struct of Buffers, Textures, and Samplers

A struct of resources can now be passed by value as an argument to a graphics or compute function, as described in Function Arguments and Variables.

Convenience Constants

New convenience constants of type float and half, as described in Math Functions.

Behavior Changes

  • For all MTLRenderCommandEncoder objects, a MTLStoreActionStore store action is required to store the contents of a render target for subsequent render command encoders or for the display. Prior to iOS 10 and tvOS 10, if all color render targets had a MTLStoreActionDontCare store action, the driver would choose to store the rendered contents to the first enabled render target. In iOS 10 and tvOS 10, the driver no longer performs this unnecessary action.

  • In iOS 10 and tvOS 10, buffer alignments have been relaxed for all methods in the MTLBlitCommandEncoder protocol.
