I am trying to create a procedural game using Metal, and I am using an octree-based chunk approach to implement level of detail.
The method I use has the CPU create the octree nodes for the terrain, and each node then has its mesh generated on the GPU by a compute shader. This mesh is stored in a vertex buffer and an index buffer in the chunk object for rendering.
All of this seems to work quite well. However, when it comes to rendering the chunks, I am hitting performance issues at an early stage. Currently I gather an array of chunks to draw and submit it to my renderer, which creates an MTLParallelRenderCommandEncoder and then creates an MTLRenderCommandEncoder for each chunk, which is then sent to the GPU.
When profiling, about 50% of the CPU time is spent creating the MTLRenderCommandEncoder for each chunk. Currently I just create a simple 8x8x8 mesh for each chunk, I have a 4x4x4 array of chunks, and I am already down to about 50 frames per second at this early stage. (In fact, it seems that each MTLParallelRenderCommandEncoder can hold at most 63 MTLRenderCommandEncoder instances, so it is not quite the full 4x4x4.)
I have read that the point of MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder on a separate thread, but I have not had much luck getting this to work. In addition, multi-threading would not get around the cap of 63 chunks, which is the most that can be rendered this way.
I feel that somehow consolidating the vertex and index buffers of all chunks into one or two large shared buffers would help, but I'm not sure how to do this without many calls to memcpy(), or whether it would even improve performance.
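To make the consolidation idea concrete, this is roughly the layout I have in mind. It is only a sketch, not code from my project: `ChunkSlotLayout` and `SlotAllocator` are hypothetical names, and the byte sizes in the usage lines are placeholders. It assumes every chunk uses the same amount of vertex and index data (which is the case here, since each chunk draws `AHMaxIndicesPerChunk` indices), so each chunk can own a fixed-size slot in one shared vertex buffer and one shared index buffer.

```swift
// Hypothetical sketch: fixed-size slots inside one shared vertex buffer
// and one shared index buffer, instead of one buffer pair per chunk.
struct ChunkSlotLayout {
    let vertexBytesPerChunk: Int   // assumed identical for every chunk
    let indexBytesPerChunk: Int    // assumed identical for every chunk
    let capacity: Int              // number of chunks the shared buffers can hold

    // Total sizes to allocate for the two shared buffers.
    var totalVertexBytes: Int { return vertexBytesPerChunk * capacity }
    var totalIndexBytes: Int { return indexBytesPerChunk * capacity }

    // Byte offset of a chunk's vertices inside the shared vertex buffer.
    func vertexOffset(forSlot slot: Int) -> Int {
        precondition(slot >= 0 && slot < capacity, "slot out of range")
        return slot * vertexBytesPerChunk
    }

    // Byte offset of a chunk's indices inside the shared index buffer.
    func indexOffset(forSlot slot: Int) -> Int {
        precondition(slot >= 0 && slot < capacity, "slot out of range")
        return slot * indexBytesPerChunk
    }
}

// Hypothetical free list so slots can be reused when octree nodes are
// split or merged.
struct SlotAllocator {
    private var freeSlots: [Int]

    init(capacity: Int) {
        // Reversed so that popLast() hands out slot 0 first.
        freeSlots = Array((0..<capacity).reversed())
    }

    // Returns nil when every slot is in use.
    mutating func allocate() -> Int? {
        return freeSlots.popLast()
    }

    mutating func release(_ slot: Int) {
        freeSlots.append(slot)
    }
}

// Usage with placeholder sizes:
let layout = ChunkSlotLayout(vertexBytesPerChunk: 1024,
                             indexBytesPerChunk: 512,
                             capacity: 64)
var allocator = SlotAllocator(capacity: layout.capacity)
if let slot = allocator.allocate() {
    print(layout.vertexOffset(forSlot: slot), layout.indexOffset(forSlot: slot))
}
```

If the compute shader wrote each chunk's mesh directly into its slot, no memcpy() should be needed, and the per-chunk work at draw time would presumably shrink to passing these offsets through the `offset:` and `indexBufferOffset:` parameters that my draw code already sets to 0. Whether that actually removes the bottleneck is exactly what I am unsure about.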
Here is my code that takes in an array of nodes and draws them:
    func drawNodes(nodes: [OctreeNode], inView view: AHMetalView) {
        // Wait for a free slot among the rotating (in-flight) uniform buffers
        dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)

        makeDepthTexture()
        updateUniformsForView(view, duration: view.frameDuration)

        let commandBuffer = commandQueue.commandBuffer()

        guard let drawable = layer.nextDrawable() else {
            // Release the slot taken above so an early exit cannot stall later frames
            dispatch_semaphore_signal(displaySemaphore)
            return
        }

        let passDescriptor = MTLRenderPassDescriptor()

        passDescriptor.colorAttachments[0].texture = drawable.texture
        passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
        passDescriptor.colorAttachments[0].storeAction = .Store
        passDescriptor.colorAttachments[0].loadAction = .Clear

        passDescriptor.depthAttachment.texture = depthTexture
        passDescriptor.depthAttachment.clearDepth = 1
        passDescriptor.depthAttachment.loadAction = .Clear
        passDescriptor.depthAttachment.storeAction = .Store

        let parallelRenderPass = commandBuffer.parallelRenderCommandEncoderWithDescriptor(passDescriptor)

        // Currently 63 nodes as a maximum
        for node in nodes {
            // This line is taking up around 50% of the CPU time
            let renderPass = parallelRenderPass.renderCommandEncoder()

            renderPass.setRenderPipelineState(renderPipelineState)
            renderPass.setDepthStencilState(depthStencilState)
            renderPass.setFrontFacingWinding(.CounterClockwise)
            renderPass.setCullMode(.Back)

            let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex

            renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
            renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)

            renderPass.setTriangleFillMode(.Lines)

            renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)

            renderPass.endEncoding()
        }

        parallelRenderPass.endEncoding()

        commandBuffer.presentDrawable(drawable)

        commandBuffer.addCompletedHandler { (commandBuffer) -> Void in
            // Advance to the next uniform buffer and free the slot for the next frame
            self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
            dispatch_semaphore_signal(self.displaySemaphore)
        }

        commandBuffer.commit()
    }