Rendering chunks in Metal

Question

I am trying to create a procedural game using Metal, and I am using an octree-based chunk approach to implement level of detail.

In the method I use, the CPU creates the octree nodes for the terrain, and each node then has its mesh generated on the GPU by a compute shader. This mesh is stored in a vertex buffer and an index buffer on the chunk object for rendering.

All of this seems to work quite well; however, when it comes to rendering the chunks, I hit performance issues at an early stage. Currently I gather an array of chunks to draw and submit it to my renderer, which creates an MTLParallelRenderCommandEncoder and then an MTLRenderCommandEncoder for each chunk, which is then sent to the GPU.

According to profiling, about 50% of the CPU time is spent creating the MTLRenderCommandEncoder for each chunk. Currently I just create a simple 8 cc mesh for each chunk, I have a 4x4x4 array of chunks, and I am already down to about 50 frames per second at this early stage. (In fact, it seems that each MTLParallelRenderCommandEncoder can hold at most 63 MTLRenderCommandEncoder instances, so it is not quite the full 4x4x4.)

I read that the point of MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread, but I have not had much luck getting this to work. In addition, multithreading would not get around the cap of 63 chunks rendered at most.

I feel that somehow consolidating the vertex and index buffers of every chunk into one or two larger buffers would help, but I'm not sure how to do this without lots of calls to memcpy(), or whether it would even improve efficiency.

Here is my code that takes in an array of nodes and draws them:

    func drawNodes(nodes: [OctreeNode], inView view: AHMetalView){
        // For control of several rotating buffers
        dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)

        makeDepthTexture()

        updateUniformsForView(view, duration: view.frameDuration)

        let commandBuffer = commandQueue.commandBuffer()

        let optDrawable = layer.nextDrawable()

        guard let drawable = optDrawable else{
            return
        }

        let passDescriptor = MTLRenderPassDescriptor()

        passDescriptor.colorAttachments[0].texture = drawable.texture
        passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
        passDescriptor.colorAttachments[0].storeAction = .Store
        passDescriptor.colorAttachments[0].loadAction = .Clear

        passDescriptor.depthAttachment.texture = depthTexture
        passDescriptor.depthAttachment.clearDepth = 1
        passDescriptor.depthAttachment.loadAction = .Clear
        passDescriptor.depthAttachment.storeAction = .Store

        let parallelRenderPass = commandBuffer.parallelRenderCommandEncoderWithDescriptor(passDescriptor)

        // Currently 63 nodes as a maximum
        for node in nodes{
            // This line is taking up around 50% of the CPU time
            let renderPass = parallelRenderPass.renderCommandEncoder()

            renderPass.setRenderPipelineState(renderPipelineState)
            renderPass.setDepthStencilState(depthStencilState)
            renderPass.setFrontFacingWinding(.CounterClockwise)
            renderPass.setCullMode(.Back)

            let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex

            renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
            renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)

            renderPass.setTriangleFillMode(.Lines)

            renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)

            renderPass.endEncoding()
        }

        parallelRenderPass.endEncoding()

        commandBuffer.presentDrawable(drawable)

        commandBuffer.addCompletedHandler { (commandBuffer) -> Void in
            self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
            dispatch_semaphore_signal(self.displaySemaphore)
        }

        commandBuffer.commit()
    }

swift metal

Andy heeard Dec 02 '15 at 16:09

1 answer

rickster · Accepted Answer · 2015-12-02T20:47:10+0000

You note:

I read that the point of MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread ...

And you are right. What you are doing is sequentially creating, encoding, and ending command encoders. Nothing happens in parallel here, so MTLParallelRenderCommandEncoder does nothing for you. You would get roughly the same performance if you eliminated the parallel encoder and simply created encoders with renderCommandEncoderWithDescriptor(_:) on each pass through your for loop; that is, you would still have the same performance problem because of the overhead of creating all of those encoders.
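For the parallel encoder to pay off, the nodes would have to be split into a few batches, with one sub-encoder per batch and each batch encoded on its own thread, rather than one encoder per node. The partitioning step might look like the minimal sketch below. It is written in current Swift syntax and calls no Metal API; `batchRanges` is a hypothetical helper, not part of Metal, and the batch count of 4 is an assumption.

```swift
// Hypothetical helper (not Metal API): split `count` items into at most
// `batches` contiguous ranges whose sizes differ by no more than one.
func batchRanges(count: Int, batches: Int) -> [Range<Int>] {
    guard count > 0 && batches > 0 else { return [] }
    let used = min(count, batches)   // never create empty batches
    let base = count / used          // minimum size of every batch
    let extra = count % used         // the first `extra` batches get one more item
    var ranges: [Range<Int>] = []
    var start = 0
    for i in 0..<used {
        let size = base + (i < extra ? 1 : 0)
        ranges.append(start..<(start + size))
        start += size
    }
    return ranges
}

// Assumed setup: 64 chunks (the 4x4x4 array) encoded by 4 worker threads.
let ranges = batchRanges(count: 64, batches: 4)
for range in ranges {
    // Each range would get ONE sub-encoder, used on one thread to encode
    // the draw calls for nodes[range].
    print("encode nodes \(range.lowerBound)..<\(range.upperBound)")
}
```

With this split the number of sub-encoders equals the number of batches (4 here) rather than the number of nodes, which would also keep it well below the limit of 63 observed in the question.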

So, if you are going to encode sequentially, just reuse the same encoder. In addition, you should reuse as much of your other shared state as possible. Here's a quick pass at a possible refactoring (untested):

    let passDescriptor = MTLRenderPassDescriptor()

    // call this once before your render loop
    func setup() {
        makeDepthTexture()

        passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
        passDescriptor.colorAttachments[0].storeAction = .Store
        passDescriptor.colorAttachments[0].loadAction = .Clear

        passDescriptor.depthAttachment.texture = depthTexture
        passDescriptor.depthAttachment.clearDepth = 1
        passDescriptor.depthAttachment.loadAction = .Clear
        passDescriptor.depthAttachment.storeAction = .Store

        // set up render pipeline state and depthStencil state
    }

    func drawNodes(nodes: [OctreeNode], inView view: AHMetalView) {
        updateUniformsForView(view, duration: view.frameDuration)

        // Set up completed handler ahead of time
        let commandBuffer = commandQueue.commandBuffer()
        commandBuffer.addCompletedHandler { _ in // unused parameter
            self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
            dispatch_semaphore_signal(self.displaySemaphore)
        }

        // Semaphore should be tied to drawable acquisition
        dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)
        guard let drawable = layer.nextDrawable() else {
            // Nothing is committed on this path, so the completed handler never
            // runs; signal here so later frames are not blocked
            dispatch_semaphore_signal(displaySemaphore)
            return
        }

        // Set up the one part of the pass descriptor that changes per-frame
        passDescriptor.colorAttachments[0].texture = drawable.texture

        // Get one render command encoder and reuse it for every node
        let renderPass = commandBuffer.renderCommandEncoderWithDescriptor(passDescriptor)
        renderPass.setTriangleFillMode(.Lines)
        renderPass.setRenderPipelineState(renderPipelineState)
        renderPass.setDepthStencilState(depthStencilState)

        for node in nodes {
            // Update offsets and draw
            let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex
            renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
            renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)
            renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)
        }

        renderPass.endEncoding()
        commandBuffer.presentDrawable(drawable)
        commandBuffer.commit()
    }

Then profile with Instruments to find out what further performance problems, if any, you might have. There is a great WWDC 2015 session that shows several common pitfalls, how to diagnose them while profiling, and how to fix them.
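On the question's idea of consolidating the per-chunk buffers: if every chunk reserves the same maximum size, each chunk's region in one large vertex buffer and one large index buffer is just a fixed stride times its slot number. The compute shader could then write straight into that region, so no memcpy() would be needed. The sketch below shows only that bookkeeping, in current Swift syntax with no Metal calls. `ChunkSlot`, `alignUp`, `chunkSlots`, the byte sizes, and the 256-byte alignment are all assumptions for illustration; the real alignment requirement depends on the platform and on how the buffer is bound. Whether this actually improves frame time would need to be confirmed by profiling.

```swift
// Hypothetical description of one chunk's region inside two shared buffers.
struct ChunkSlot {
    let vertexOffset: Int  // byte offset of this chunk's vertices
    let indexOffset: Int   // byte offset of this chunk's indices
}

// Round `value` up to the next multiple of `alignment` (alignment > 0).
func alignUp(_ value: Int, to alignment: Int) -> Int {
    return (value + alignment - 1) / alignment * alignment
}

// Compute fixed-stride offsets for `count` chunks that all reserve the same
// maximum number of vertex and index bytes.
func chunkSlots(count: Int,
                vertexBytesPerChunk: Int,
                indexBytesPerChunk: Int,
                alignment: Int) -> [ChunkSlot] {
    let vertexStride = alignUp(vertexBytesPerChunk, to: alignment)
    let indexStride = alignUp(indexBytesPerChunk, to: alignment)
    return (0..<count).map { slot in
        ChunkSlot(vertexOffset: slot * vertexStride,
                  indexOffset: slot * indexStride)
    }
}

// Assumed sizes: 1000 bytes of vertices and 300 bytes of indices per chunk.
let slots = chunkSlots(count: 3,
                       vertexBytesPerChunk: 1000,
                       indexBytesPerChunk: 300,
                       alignment: 256)
for slot in slots {
    print("vertices at \(slot.vertexOffset), indices at \(slot.indexOffset)")
}
```

In the draw loop, these offsets would take the place of the zeros currently passed as `offset:` in setVertexBuffer and as `indexBufferOffset:` in drawIndexedPrimitives, with the same two shared buffers bound for every chunk.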