I have relatively small data set (circa 5mb) which I need to download (paid subscription) and sort on a monthly basis. I am hoping that someone can indicate a more efficient way to tackle this task.
The data is a record of planning applications within the last 30 days and whist the data is very informative, it is not organised very well - i.e. they do not provide the use type (e.g. residential, commercial, industrial etc.) nor do they provide the number of units. But they do provide a title for the application as well as a description.
An example title as follows:
STACKVILLE - 1680 APARTMENTS/RESTAURANT/RETAIL UNITS
An example description as follows:
Independent living scheme for older persons comprising 1100 one and 580 two bedroom apartments with support facilities a restaurant and bar retail units a cafe 20 reablement suites parking access landscaping and demolition works. 1 568 sq.m
At present, I am using Excel which is somewhat time consuming. I have an ever-growing list of circa 100 use classes, as I need to determine the exact usage, which I search the title and description for. If both the title and description return the same value then I am reasonably certain that is the use classification, if it is not found it is unlikely to be of interest - although I am not 100% certain my method catches everything that I am asking it to. If they return different values then I have to do a bit of manual sift. The formula I am using as follows:
After I have determined them I then need to determine the number of units and put this figure into a column to the right of the data for analysis.
Can anyone recommend a more efficient way to do this? I'm currently learning Python if that helps. Would also be willing to learn almost anything to speed this up as it's very time consuming.