I have gone through many forums and the NVIDIA manual, but I couldn't understand what __threadfence() is and what it is used for.
Normally, there is no guarantee that if one block writes something to global memory, another block will "see" it. There is also no guarantee regarding the ordering of writes to global memory, except from the point of view of the block that issued them.
There are two exceptions: atomic operations and __threadfence().
Imagine that one block produces some data and then uses an atomic operation to set a flag indicating that the data is there. It is still possible that another block will see the flag but read incorrect or incomplete data.
The __threadfence() function stalls the current thread until its writes to global memory are guaranteed to be visible to all other threads in the grid. So, if you store your data, then call __threadfence(), and only then atomically mark a flag, it is guaranteed that if the other block sees the flag, it will also see the data.
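The store / fence / flag pattern described above can be sketched roughly as follows. This is only an illustration, not code from the Programming Guide: the names `sum_blocks`, `partial`, `done_count` and `sum_on_gpu` are made up, the per-block sum is deliberately naive, and the second fence on the reading side is a conservative addition. Each block publishes a partial sum, and the last block to bump the counter adds them all up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical "flag": counts how many blocks have published their data.
__device__ unsigned int done_count = 0;

__global__ void sum_blocks(const float* in, float* partial, float* total, int n)
{
    // Only thread 0 of each block works; naive on purpose to keep the focus on the fence.
    if (threadIdx.x != 0) return;

    int begin = blockIdx.x * blockDim.x;
    int end   = min(begin + (int)blockDim.x, n);
    float s = 0.0f;
    for (int i = begin; i < end; ++i) s += in[i];

    partial[blockIdx.x] = s;                          // 1. store your data
    __threadfence();                                  // 2. fence: the store above must be visible first
    unsigned int ticket = atomicAdd(&done_count, 1u); // 3. atomically mark the flag

    // The block that takes the last ticket knows every other block has published.
    if (ticket == gridDim.x - 1) {
        __threadfence();                    // conservative: order the reads below after the atomic
        volatile float* p = partial;        // volatile so the reads really go to memory
        float t = 0.0f;
        for (unsigned int b = 0; b < gridDim.x; ++b) t += p[b];
        *total = t;
        done_count = 0;                     // reset so the kernel can be launched again
    }
}

// Hypothetical host helper: copies the input over, runs the kernel, returns the sum.
float sum_on_gpu(const float* h_in, int n)
{
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    float *d_in, *d_partial, *d_total;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    sum_blocks<<<blocks, threads>>>(d_in, d_partial, d_total, n);

    float h_total = 0.0f;
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost); // waits for the kernel
    cudaFree(d_in); cudaFree(d_partial); cudaFree(d_total);
    return h_total;
}

int main()
{
    const int n = 1024;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;
    printf("total = %f\n", sum_on_gpu(h_in, n));
    return 0;
}
```

Without the __threadfence() in step 2, the last block could observe the incremented counter while some entry of `partial` is still stale, which is exactly the incomplete-data problem described above. Note that no block ever spins waiting for another one, so the sketch does not depend on all blocks being resident at the same time.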
Further reading: CUDA Programming Guide, Chapters B.2.4 and B.5