Are you sure you do not mean OpenGL ES 2.0? You cannot make shaders of any type using OpenGL ES 1.1. I'll take the first one.
In my experience, the fastest way I've found this is your second list. I am doing several types of 3x3 convolutions in my GPUImage structure (which you could just use instead of trying to roll on your own), and for those I reproach in texture offset for horizontal and vertical directions and calculate the nine texture coordinates needed in the vertex shader. From there I pass them as changes to the fragment shader.
This (for the most part) avoids the dependent texture readings in the shader of fragments that are terribly expensive on iOS PowerVR GPUs. I say โfor the most partโ because on older devices such as the iPhone 4, only eight of these changes are used to avoid dependent texture. As I found out last week, the ninth causes a dependent texture read on older devices, so this slows down a bit. However, the iPhone 4S does not have this problem, since it supports more changes used in this way.
For the vertex shader, I use the following:
attribute vec4 position; attribute vec4 inputTextureCoordinate; uniform highp float texelWidth; uniform highp float texelHeight; varying vec2 textureCoordinate; varying vec2 leftTextureCoordinate; varying vec2 rightTextureCoordinate; varying vec2 topTextureCoordinate; varying vec2 topLeftTextureCoordinate; varying vec2 topRightTextureCoordinate; varying vec2 bottomTextureCoordinate; varying vec2 bottomLeftTextureCoordinate; varying vec2 bottomRightTextureCoordinate; void main() { gl_Position = position; vec2 widthStep = vec2(texelWidth, 0.0); vec2 heightStep = vec2(0.0, texelHeight); vec2 widthHeightStep = vec2(texelWidth, texelHeight); vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight); textureCoordinate = inputTextureCoordinate.xy; leftTextureCoordinate = inputTextureCoordinate.xy - widthStep; rightTextureCoordinate = inputTextureCoordinate.xy + widthStep; topTextureCoordinate = inputTextureCoordinate.xy - heightStep; topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep; topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep; bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep; bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep; bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep; }
and fragment shader:
precision highp float; uniform sampler2D inputImageTexture; uniform mediump mat3 convolutionMatrix; varying vec2 textureCoordinate; varying vec2 leftTextureCoordinate; varying vec2 rightTextureCoordinate; varying vec2 topTextureCoordinate; varying vec2 topLeftTextureCoordinate; varying vec2 topRightTextureCoordinate; varying vec2 bottomTextureCoordinate; varying vec2 bottomLeftTextureCoordinate; varying vec2 bottomRightTextureCoordinate; void main() { mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate); mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate); mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate); mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate); mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate); mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate); mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate); mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate); mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate); mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2]; resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2]; resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2]; gl_FragColor = resultColor; }
Even with the caveats mentioned above, this shader works in ~ 2 ms for a video clip with a resolution of 640x480 on iPhone 4, and 4S can handle 1080p video at 30 FPS easily using such a shader.