I am exporting a massive data set from Dynamics to Elasticsearch.
Below are the steps:
I am doing extensive logging for the time it takes and any errors.
It all works, and the export completes in an hour.
That said, I have observed that the HttpPost's response time keeps increasing. I have looked for memory leaks and for anything I should be disposing, and haven't found any. I want to make sure it will not haunt me later.
So, what are the possible reasons for the increase in response times?
How should I go about investigating the issue?
I use ES 1.7 and I index about 10 million documents using a similar scenario. From my experience, if you push ES too hard it will slow down and sometimes fail with OutOfMemory exceptions. I don't know if it is still an issue with newer versions.
IMHO it is because ES needs some time to process bulks: it accepts the data and indexes it, but after that it does some background work to optimize the index.
To overcome the issue I experimented with these parameters: the size of a single bulk (N), the sleep time between indexing bulks (S1), and a much longer sleep after every few (M) bulks (S2). For my dataset and my hardware I ended up with N=5000, S1=1s, M=10, S2=10s. To choose safe values I watch CPU, memory and I/O usage. For example, increased I/O usage over an extended period may suggest that ES will break soon.
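The throttling scheme above could be sketched roughly like this in Python. This is only an illustration, not my actual code: `throttled_bulk_index`, `chunked` and the `send_bulk` callback are made-up names, and `send_bulk` stands for whatever posts one batch to the ES bulk endpoint. I also assume here that the long sleep S2 replaces the short sleep S1 after every M-th bulk rather than being added to it.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def throttled_bulk_index(
    docs: Iterable[dict],
    send_bulk: Callable[[List[dict]], None],  # hypothetical: posts one bulk to ES
    n: int = 5000,      # N: documents per bulk
    s1: float = 1.0,    # S1: short sleep after an ordinary bulk, in seconds
    m: int = 10,        # M: take the long sleep after every M-th bulk
    s2: float = 10.0,   # S2: long sleep, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs in bulks of n, pausing after each bulk. Returns the bulk count."""
    sent = 0
    for sent, batch in enumerate(chunked(docs, n), start=1):
        send_bulk(batch)
        # Assumption: the long pause replaces the short one on every M-th bulk.
        sleep(s2 if sent % m == 0 else s1)
    return sent
```

Passing `sleep` as a parameter is only there so the pauses can be replaced with a stub when trying the function out. The defaults are the values that worked for my dataset and hardware, so treat them as a starting point.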
I'm sure it depends heavily on the hardware you have. In particular, give ES as much memory as you can!