I have a one-dimensional data set in which the no-data values are set to 9999. Here is an extract, as it is quite long:
this_array = [ 4, 4, 1, 9999, 9999, 9999, -5, -4, ... ]
I would like to replace each no-data value with the average of the closest valid values on either side. This is a little harder than it sounds, because the closest neighbours of some no-data values are themselves no-data values.
In the extract above, for example, the three no-data values should all be replaced with -2. I have written a loop that goes through each of the scalars in the array and tests for no data:
    for k in this_array:
        if k == 9999:
            temp = np.where(k == 9999, (abs(this_array[k-1]-this_array[k+1])/2), this_array[k])
            this_array[k] = temp
However, I need to add an if statement, or some other way to take the value before k-1 or after k+1 when that neighbour is also equal to 9999, e.g.:
    if np.logical_or(k+1 == 9999, k-1 == 9999):
        temp = np.where(k == 9999, (abs(this_array[k-2]-this_array[k+2])/2), this_array[k])
As one can tell, this code gets messy: one may end up taking the wrong value, or with loads of nested if statements.
Does anyone know of a cleaner way to implement this as it's pretty variable throughout the dataset?
As requested: If the first and/or last points are no data, they would preferably be replaced with the closest data point.
There may be a more efficient way to do this with numpy functions, but here is a solution using the itertools module:
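    from itertools import groupby

    # Group consecutive indices by whether they hold the no-data value.
    for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
        if k:  # this group is a run of 9999 values
            indices = list(g)
            # Average the valid values just before and just after the run.
            new_v = (this_array[indices[0]-1] + this_array[indices[-1]+1]) / 2
            this_array[indices[0]:indices[-1]+1].fill(new_v)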
If the first or last element can be 9999, you can use the following:
    from itertools import groupby

    for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
        if k:  # this group is a run of 9999 values
            indices = list(g)
            prev_i, next_i = indices[0]-1, indices[-1]+1
            # If the run touches either end, reuse the neighbour from the other side.
            before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
            after = this_array[next_i] if next_i != len(this_array) else before
            this_array[indices[0]:next_i].fill((before + after) / 2)
Example using second version:
    >>> import numpy as np
    >>> from itertools import groupby
    >>> this_array = np.array([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999])
    >>> for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    ...     if k:
    ...         indices = list(g)
    ...         prev_i, next_i = indices[0]-1, indices[-1]+1
    ...         before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
    ...         after = this_array[next_i] if next_i != len(this_array) else before
    ...         this_array[indices[0]:next_i].fill((before + after) / 2)
    ...
    >>> this_array
    array([ 4,  4,  1, -2, -2, -2, -5, -4, -4])
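As for the numpy-only route mentioned at the top of this answer, here is a minimal sketch of one way it might look. The function name `fill_with_neighbour_mean` and its `nodata` parameter are made up for illustration, and the result is returned as a float array rather than filled in place:

```python
import numpy as np

def fill_with_neighbour_mean(arr, nodata=9999):
    """Replace each no-data run with the mean of the nearest valid values on either side."""
    arr = np.asarray(arr, dtype=float)
    valid = arr != nodata
    if not valid.any():
        raise ValueError("no valid data to fill from")
    idx = np.arange(len(arr))
    # Index of the nearest valid element at or before each position (-1 if none).
    prev_idx = np.maximum.accumulate(np.where(valid, idx, -1))
    # Index of the nearest valid element at or after each position (len(arr) if none).
    next_idx = np.minimum.accumulate(np.where(valid, idx, len(arr))[::-1])[::-1]
    # At the edges, fall back to the only neighbour that exists.
    first_valid, last_valid = idx[valid][0], idx[valid][-1]
    prev_idx = np.where(prev_idx == -1, first_valid, prev_idx)
    next_idx = np.where(next_idx == len(arr), last_valid, next_idx)
    # Keep valid entries as they are; average the two neighbours elsewhere.
    return np.where(valid, arr, (arr[prev_idx] + arr[next_idx]) / 2)

print(fill_with_neighbour_mean([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999]))
```

The running maximum and minimum stand in for the explicit grouping loop, so no Python-level iteration over the array is needed.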
I'd do something along the following lines:
    import numpy as np

    def fill(arr, fwd_fill):
        # Replace each NaN with the last non-NaN value seen in the scan direction.
        out = arr.copy()
        if fwd_fill:
            start, end, step = 0, len(out), 1
        else:
            start, end, step = len(out)-1, -1, -1
        cur = out[start]
        for i in range(start, end, step):
            if np.isnan(out[i]):
                out[i] = cur
            else:
                cur = out[i]
        return out

    def avg(arr):
        fwd = fill(arr, True)
        back = fill(arr, False)
        # Pairs the forward fill one step to the left with the backward fill one
        # step to the right, so the result has len(arr) - 2 entries.
        return (fwd[:-2] + back[2:]) / 2.

    arr = np.array([ 4, 4, 1, np.nan, np.nan, np.nan, -5, -4])
    print(arr)
    print(avg(arr))
The first function can do either a forward or a backward fill, replacing every NaN with the nearest non-NaN.
Once you have that, computing the average is trivial, and is done by the second function.
You don't say how you want the first and the last element handled, so the code just chops them off.
Finally, it is worth noting that the function can return NaNs if the first or the last element of the input array is missing (in which case there's no data to compute some of the averages).
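If you would rather keep the array at its full length, change only the NaN entries, and fill a missing first or last element from the nearest data point, a variant along the following lines might do. This is a sketch only; `fill_nan_with_neighbour_mean` is a name invented here, not part of the code above:

```python
import numpy as np

def fill_nan_with_neighbour_mean(arr):
    """Replace NaNs with the mean of the nearest non-NaN values on either side."""
    arr = np.asarray(arr, dtype=float)
    # Forward fill: carry the last seen value to the right.
    fwd = arr.copy()
    for i in range(1, len(fwd)):
        if np.isnan(fwd[i]):
            fwd[i] = fwd[i - 1]
    # Backward fill: carry the next seen value to the left.
    back = arr.copy()
    for i in range(len(back) - 2, -1, -1):
        if np.isnan(back[i]):
            back[i] = back[i + 1]
    # Leading NaNs have no forward value and trailing NaNs no backward value,
    # so borrow from the other direction at the edges.
    fwd = np.where(np.isnan(fwd), back, fwd)
    back = np.where(np.isnan(back), fwd, back)
    # Only the originally missing entries are replaced.
    return np.where(np.isnan(arr), (fwd + back) / 2, arr)

arr = np.array([np.nan, 4, 1, np.nan, np.nan, np.nan, -5, -4, np.nan])
print(fill_nan_with_neighbour_mean(arr))
```

An input made up entirely of NaNs still comes back as all NaNs, since there is nothing to fill from.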
OK, it seems I have to write this one myself: you can use np.interp, or the equivalent (maybe somewhat nicer and more fully featured) SciPy functions that you can find in scipy.interpolate.
Ok, rereading... I guess you don't want linear interpolation? In which case of course this doesn't quite work... Though I am sure there are some vectorized methods.
    import numpy as np

    # data is the given array.
    data = data.astype(float)  # I cast to float; change this if you really don't want that.
    valid = data != 9999
    x = np.nonzero(valid)[0]         # indices of the valid entries
    replace = np.nonzero(~valid)[0]  # indices of the entries to fill
    valid_data = data[x]
    # Using np.interp, but I think you will find better things in
    # scipy.interpolate if you don't mind using scipy.
    data[replace] = np.interp(replace, x, valid_data,
                              left=valid_data[0], right=valid_data[-1])
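To make the difference from the flat average concrete, here is a small worked run on the extract from the question, using only np.interp with its default edge handling. The gap is filled along a straight line between 1 and -5 instead of with three copies of -2:

```python
import numpy as np

data = np.array([4, 4, 1, 9999, 9999, 9999, -5, -4], dtype=float)
valid = data != 9999
x = np.nonzero(valid)[0]         # indices of the valid entries
replace = np.nonzero(~valid)[0]  # indices of the entries to fill
# Linear interpolation between the valid points on either side of the gap.
data[replace] = np.interp(replace, x, data[x])
print(data)  # the three gap values become -0.5, -2.0 and -3.5
```

Only the middle of the gap matches the -2 asked for, which is why this approach fits only if linear interpolation is acceptable.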
Here's a recursive solution that assumes the first and last elements aren't 9999. You could probably clean it up with a generator, as the recursion could get kind of deep. It's a reasonable start.
    def a(values, first, depth):
        # first: last valid value seen; depth: number of 9999s skipped since then
        if [] == values:
            return []
        car = values[0]
        cdr = values[1:]
        if 9999 == car:
            return a(cdr, first, depth+1)
        if depth != 0:
            # Fill the skipped run with the mean of its two neighbours.
            avg = [((first + car) / 2)] * depth
            return avg + [car] + a(cdr, car, 0)
        else:
            return [car] + a(cdr, car, 0)

    print(a([1, 2, 9999, 4, 9999, 9999, 12], 0, 0))
    # => [1, 2, 3.0, 4, 8.0, 8.0, 12]
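The generator clean-up suggested above could look roughly like this. It is a sketch under the same assumption that the first and last elements are valid, and the names `fill_runs` and `nodata` are invented for the example:

```python
def fill_runs(values, nodata=9999):
    """Yield values, replacing each no-data run with the mean of its two neighbours.

    Assumes the first and last elements are valid: a leading no-data value
    raises TypeError, and a trailing run is silently dropped.
    """
    previous = None  # last valid value seen
    pending = 0      # number of no-data values seen since `previous`
    for value in values:
        if value == nodata:
            pending += 1
            continue
        if pending:
            mean = (previous + value) / 2
            for _ in range(pending):
                yield mean
            pending = 0
        yield value
        previous = value

print(list(fill_runs([1, 2, 9999, 4, 9999, 9999, 12])))
# => [1, 2, 3.0, 4, 8.0, 8.0, 12]
```

Because it walks the input once and keeps only a counter, it avoids the recursion depth limit on long arrays.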