I currently use a 3x3 or 5x5 box filter and am thinking about separating the filter into two passes: first along x, then along y. This would save some calculation time.
To achieve this I need to render the first pass to a texture attached to another framebuffer and then draw another full-screen quad to get the full blur (x and y combined).
From a performance point of view, is there any noticeable improvement from separating the filter into two passes versus blurring x and y in a single pass?
In general, an NxN blur filter needs N^2 texture reads per pixel in the shader.
With a separable filter this drops to N + N reads + some_const,
where some_const is the cost of rendering twice: switching framebuffers, driver overhead, and so on.
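To make the read counts concrete, here is a minimal CPU-side sketch in Python (not shader code) that runs a box blur both ways, counts the reads, and checks that the two results match. The function names, the clamp-to-edge border handling, and the small test image are all assumptions made for illustration; the intermediate list `tmp` only stands in for the framebuffer-attached texture, and some_const is not modelled at all.

```python
def clamp(i, lo, hi):
    # Mimics clamp-to-edge addressing at the image border (an assumption).
    return max(lo, min(i, hi))

def blur_single_pass(img, n):
    """n x n box blur in one pass; n is assumed odd. Returns (image, reads)."""
    h, w, r = len(img), len(img[0]), n // 2
    out = [[0.0] * w for _ in range(h)]
    reads = 0
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    total += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
                    reads += 1
            out[y][x] = total / (n * n)
    return out, reads

def blur_one_axis(img, n, horizontal):
    """1D box blur of width n along x (horizontal) or y. Returns (image, reads)."""
    h, w, r = len(img), len(img[0]), n // 2
    out = [[0.0] * w for _ in range(h)]
    reads = 0
    for y in range(h):
        for x in range(w):
            total = 0.0
            for d in range(-r, r + 1):
                if horizontal:
                    total += img[y][clamp(x + d, 0, w - 1)]
                else:
                    total += img[clamp(y + d, 0, h - 1)][x]
                reads += 1
            out[y][x] = total / n
    return out, reads

def blur_two_pass(img, n):
    """x pass into an intermediate image, then y pass over that result."""
    tmp, reads_x = blur_one_axis(img, n, horizontal=True)
    out, reads_y = blur_one_axis(tmp, n, horizontal=False)
    return out, reads_x + reads_y

# Small made-up test image, 8 wide and 6 high.
image = [[float((3 * x + 7 * y) % 11) for x in range(8)] for y in range(6)]
n = 5
pixels = len(image) * len(image[0])

a, reads_a = blur_single_pass(image, n)
b, reads_b = blur_two_pass(image, n)
max_diff = max(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))

# Reads per pixel for each approach, and whether the outputs agree.
print(reads_a // pixels, reads_b // pixels, max_diff < 1e-9)  # prints: 25 10 True
```

For N = 5 that is 25 reads per pixel against 10, with the same output up to floating-point rounding. It says nothing about real GPU timings, where texture caching and the extra pass also matter.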
For a 3x3 blur I think there will be no difference, for 5x5 possibly, but for larger kernels there should be a visible difference. It would be nice to measure the performance of both approaches.
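As a rough way to reason about where the crossover might sit, the sketch below finds the smallest odd N for which N + N + some_const is below N^2. Expressing some_const in units of "equivalent texture reads per pixel" is an assumption made purely for illustration, and the sample values are hypothetical, not measurements; only profiling on the target hardware gives the real answer.

```python
def break_even_kernel_size(some_const, max_n=99):
    """Smallest odd N where the two-pass cost (2N + some_const) beats N^2.

    some_const is a hypothetical per-pixel overhead of the extra pass,
    expressed in equivalent texture reads. Returns None if no N up to
    max_n qualifies.
    """
    for n in range(3, max_n + 1, 2):
        if 2 * n + some_const < n * n:
            return n
    return None

# Hypothetical overhead values, chosen only to show the trend.
for c in (0, 3, 15):
    print(c, break_even_kernel_size(c))
# prints:
# 0 3
# 3 5
# 15 7
```

Under this simple model, a larger fixed overhead pushes the break-even point toward larger kernels, which fits the intuition that 3x3 gains little while bigger kernels benefit clearly.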