
multithreading - How to make data reading and preprocessing faster in C#

Question:

I have an algorithm for preprocessing data stored in ARFF (*.arff) files. I have an attribute class that holds the structure of the ARFF file. For a string attribute I record its name, the values it can take, and how often each value occurs. For a numeric attribute I record the minimum, maximum, average, and standard deviation. It works well for small files, but it is very slow for large ones, and a file can be larger than 10 GB.

I've tried several options, including memory-mapped files and BufferedStream. I think the bottleneck is the lengthy preprocessing, but I don't know how to make it faster.

I also tried using threads, but I don't know how to apply them here.

private void readDataArff()
{
    string line = "";
    using (StreamReader file = new StreamReader(openFileDialog1.FileName))
    {
        string[] data;
        while ((line = file.ReadLine()) != null)
        {
            if ((line.Contains('%')) || (line.Contains('@')) || (line.Contains("") && (!line.Contains(','))))
                continue; //skip header
            data = line.Split(',');
            for (int j = 0; j < attrList.Count; j++)
            {
                attrList[j].FilePath = openFileDialog1.FileName;
                attrList[j].Index = j;
                if (attrList[j].Type1 == "STRING")
                {
                    foreach (var item in attrList[j].Values)
                    {
                        if (item.Name == data[j])
                        {
                            item.Count += 1;
                            break;
                        }
                    }
                }
                else if ((attrList[j].Type1 == "REAL" && (line != "") && (!line.Contains('@'))) ||
                         (attrList[j].Type1 == "REAL" && (line != "") && (!line.Contains('@'))))
                {
                    if ((data[j] == "?") || (data[j] == "") || (data[j] == " "))
                        continue;
                    attrList[j].Count += 1;
                    attrList[j].Sum = double.Parse(data[j]) + attrList[j].Sum;
                    double tmp = double.Parse(data[j]);
                    if (attrList[j].Max < tmp)
                        attrList[j].Max = tmp;
                    if (attrList[j].Min > tmp)
                        attrList[j].Min = tmp;
                }
            }
        }
    }
}

Answers:
  1. You can try the BufferedStream class, which adds a buffering layer on top of another Stream.
  2. You can optimise some of your code by precomputing values that are evaluated more than once, e.g. double.Parse(data[j]), which is called twice per field.
  3. You can also write your own method along the lines of Contains(char[] charsToSeek) that scans the string just once instead of once per character.
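
Suggestions 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code. AttrStats, ArffScan, and Demo are hypothetical names standing in for the attribute class from the question. The sketch makes three assumptions: numbers use the invariant culture (a '.' decimal separator), fields may be trimmed before comparison, and nominal values are pre-registered from the header so that undeclared values are ignored, as in the original loop. It uses the built-in string.IndexOfAny for the single-pass marker check rather than a hand-written method, and it swaps the linear search over values for a dictionary lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical stand-in for the asker's attribute class.
public sealed class AttrStats
{
    public bool IsNumeric;
    public long Count;
    public double Sum;
    public double Min = double.PositiveInfinity;
    public double Max = double.NegativeInfinity;
    // Declared nominal value -> occurrence count (replaces the linear search).
    public Dictionary<string, long> ValueCounts = new Dictionary<string, long>();
}

public static class ArffScan
{
    private static readonly char[] HeaderMarkers = { '%', '@' };

    public static void Accumulate(TextReader reader, IList<AttrStats> attrs)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // One scan finds either marker; lines without a comma are skipped too.
            if (line.IndexOfAny(HeaderMarkers) >= 0 || line.IndexOf(',') < 0)
                continue;

            string[] data = line.Split(',');
            int n = Math.Min(data.Length, attrs.Count);
            for (int j = 0; j < n; j++)
            {
                AttrStats a = attrs[j];
                string field = data[j].Trim(); // assumption: surrounding spaces are insignificant
                if (a.IsNumeric)
                {
                    double value;
                    // Parse each field once; "?" and empty fields fail to parse and are skipped.
                    if (!double.TryParse(field, NumberStyles.Float,
                                         CultureInfo.InvariantCulture, out value))
                        continue;
                    a.Count++;
                    a.Sum += value;
                    if (value > a.Max) a.Max = value;
                    if (value < a.Min) a.Min = value;
                }
                else
                {
                    long seen;
                    // Only values declared up front are counted, as in the original loop.
                    if (a.ValueCounts.TryGetValue(field, out seen))
                        a.ValueCounts[field] = seen + 1;
                }
            }
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var color = new AttrStats();
        color.ValueCounts["red"] = 0;
        color.ValueCounts["blue"] = 0;
        var size = new AttrStats { IsNumeric = true };
        var attrs = new List<AttrStats> { color, size };

        string sample = "% comment\n@relation demo\n@data\nred,1.5\nblue,?\nred,2.5\n";
        ArffScan.Accumulate(new StringReader(sample), attrs);

        Console.WriteLine("red: " + color.ValueCounts["red"]);
        Console.WriteLine("mean size: " + (size.Sum / size.Count));
    }
}
```

For a real file, the StringReader would be replaced by a StreamReader over the file path. StreamReader already buffers its input and has a constructor overload that accepts a buffer size, so wrapping it in a BufferedStream may gain little. It is worth profiling before and after each change, since it is not known here whether parsing or disk I/O dominates for a 10 GB input.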