I'm currently implementing YOLO in TensorFlow and I'm a little surprised on how much memory that is taking. On my GPU I can train YOLO using their Darknet framework with batch size 64. On TensorFlow I can only do it with batch size 6, with 8 I already run out of memory. For the test phase I can run with batch size 64 without running out of memory.
I am wondering how I can calculate how much memory is being consumed by each tensor? Are all tensors by default saved in the GPU? Can I simply calculate the total memory consumption as the shape * 32 bits?
I noticed that since I'm using momentum, all my tensors also have a
/Momentum tensor. Could that also be using a lot of memory?
I am augmenting my dataset with a method
distorted_inputs, very similar to the one defined in the CIFAR-10 tutorial. Could it be that this part is occupying a huge chunk of memory? I believe Darknet does the modifications in the CPU.
Sorry for the slow reply. Unfortunately right now the only way to set the log level is to edit tensorflow/core/platform/logging.h and recompile with e.g.
#define VLOG_IS_ON(lvl) ((lvl) <= 1)
There is a bug open 1258 to control logging more elegantly.
MemoryLogTensorOutput entries are logged at the end of each Op execution, and indicate the tensors that hold the outputs of the Op. It's useful to know these tensors since the memory is not released until the downstream Op consumes the tensors, which may be much later on in a large graph.
See the description in this (commit). The memory allocation is raw info is there although it needs a script to collect the information in an easy to read form.