I wondered what sort of operations the CPU can perform while a device's DMA controller is carrying out a memory operation, to increase the level of concurrency. And if the CPU cache/registers are empty, how can another instruction be fetched without interleaving with the DMA in progress?
In general, on big¹ hardware, the CPU can do more or less anything while a DMA is in progress. It simply continues with normal execution of running processes or kernel tasks under the control of the OS.
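To make this concrete, here is a toy, tick-based model. All names (`ToyDMAEngine`, `simulate`) are hypothetical, and the model deliberately ignores bus contention. The DMA engine moves one word per tick on its own, while the "CPU" keeps executing unrelated work. The completion check stands in for the interrupt a real DMA controller would raise when the transfer finishes.

```python
class ToyDMAEngine:
    """Hypothetical DMA engine that copies one word per tick, with no CPU help."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.position = 0

    def tick(self):
        # Move the next word from the device buffer into "RAM".
        if self.position < len(self.source):
            self.destination[self.position] = self.source[self.position]
            self.position += 1

    def done(self):
        # Stands in for the completion interrupt a real controller would raise.
        return self.position == len(self.source)


def simulate(device_buffer):
    ram = [0] * len(device_buffer)
    dma = ToyDMAEngine(device_buffer, ram)
    cpu_steps = 0
    running_total = 0
    while not dma.done():
        dma.tick()                  # the transfer makes progress...
        running_total += cpu_steps  # ...while the CPU keeps doing unrelated work
        cpu_steps += 1
    return ram, cpu_steps, running_total
```

For a 4-word buffer, `simulate([10, 20, 30, 40])` ends with the data in `ram` and the CPU having executed 4 steps of its own work in the meantime. The CPU never had to stop and copy words itself.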
... [a]nd if the CPU cache/registers is empty, how another instruction can be fetched without interleaving DMA in progress[?]
As I understand it, you are asking what happens if the CPU needs to access memory. The CPU is usually accessing memory frequently, not only when "registers or caches are empty", and this activity can proceed more or less normally² while a DMA is in progress. The memory bus is already generally shared by several devices, including multiple DMA-capable devices, PCI cards, multiple cores or multiple CPUs. The memory controller is responsible for accepting and fulfilling all of these requests, which includes arbitrating between them.
So you are correct that there may be some kind of "interleaving" when both DMA and the CPU access memory, just as there may be when two cores (or even two logical threads running on the same core) access memory. How this works out in practice depends on how the DRAM is organized, how the memory controller works (and how many are present) and many other details, but in general you can expect modern memory systems to be highly parallel: capable of sustaining multiple streams of accesses and often approaching the bandwidth limits imposed by the RAM.
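A minimal sketch of that interleaving, assuming a single memory channel that serves one access per cycle and a simple fair round-robin policy. Real memory controllers use more elaborate scheduling, bank-level parallelism and multiple channels, and `round_robin` is a hypothetical name. The point is only that the CPU's requests are interleaved with the DMA stream rather than queued behind the whole transfer.

```python
def round_robin(requests):
    """Toy arbiter: requests maps requester name -> number of pending accesses.

    Each cycle, the bus is granted to the next requester (in rotating order)
    that still has work. Returns requester name -> list of completion cycles.
    """
    names = list(requests)
    remaining = dict(requests)
    finished = {name: [] for name in names}
    cycle, turn = 0, 0
    while any(remaining.values()):
        for offset in range(len(names)):
            name = names[(turn + offset) % len(names)]
            if remaining[name] > 0:
                cycle += 1                      # one access takes one cycle
                remaining[name] -= 1
                finished[name].append(cycle)
                turn = (turn + offset + 1) % len(names)  # rotate priority
                break
    return finished


contended = round_robin({"dma": 8, "cpu": 3})
idle_bus = round_robin({"cpu": 3})
```

Here the CPU's three accesses complete at cycles 2, 4 and 6 instead of 1, 2 and 3 on an idle bus, so they are slower, but they finish well before the DMA transfer completes at cycle 11.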
¹ These days that pretty much means anything bigger than an embedded microcontroller; e.g., even mobile CPUs qualify.
² By normally I mean that the normal mechanisms are used and you can expect memory accesses to work, not that performance won't be affected. The CPU's memory accesses will be competing with the DMA accesses (and perhaps with accesses by other CPUs, PCI devices such as video cards, etc.) and will likely be slower, but on reasonable hardware they certainly won't have to wait until the DMA finishes!