
sql - Get latest forecast data with multi-column group identifier

Problem description:

I have many weather locations with wind forecast data. I need the most recent as_of before 10:00 of the previous day. I need this for each hour, of each day, for each location.

A location is defined as a unique lat and lon pair.

Full table schema with pertinent sample data:

CREATE SCHEMA weather;

CREATE TABLE weather.forecast
(
    foretime timestamp without time zone NOT NULL,
    as_of timestamp without time zone NOT NULL, -- in UTC
    summary text,
    precipintensity numeric(8,4),
    precipprob numeric(2,2),
    temperature numeric(5,2),
    apptemp numeric(5,2),
    dewpoint numeric(5,2),
    humidity numeric(2,2),
    windspeed numeric(5,2),
    windbearing numeric(4,1),
    visibility numeric(5,2),
    cloudcover numeric(4,2),
    pressure numeric(6,2),
    ozone numeric(5,2),
    preciptype text,
    lat numeric(8,6) NOT NULL,
    lon numeric(9,6) NOT NULL,
    CONSTRAINT forecast_pkey PRIMARY KEY (foretime, as_of, lat, lon)
);

INSERT INTO weather.forecast
    (windspeed, foretime, as_of, lat, lon)
VALUES
    (11.19, '2/1/2016 8:00', '1/30/2016 23:00', 34.556, 28.345),
    (10.98, '2/1/2016 8:00', '1/31/2016 5:00',  34.556, 28.345),
    (10.64, '2/1/2016 8:00', '1/31/2016 11:00', 34.556, 28.345),
    (10.95, '2/1/2016 8:00', '1/31/2016 8:00',  29.114, 16.277),
    (10.39, '2/1/2016 8:00', '1/31/2016 23:00', 29.114, 16.277),
    (9.22,  '2/1/2016 8:00', '1/31/2016 5:00',  29.114, 16.277),
    (10,    '2/1/2016 9:00', '1/30/2016 04:00', 34.556, 28.345),
    (9.88,  '2/1/2016 9:00', '1/31/2016 09:00', 34.556, 28.345),
    (10.79, '2/1/2016 9:00', '1/30/2016 23:00', 34.556, 28.345),
    (10.8,  '2/1/2016 9:00', '1/31/2016 5:00',  29.114, 16.277),
    (10.35, '2/1/2016 9:00', '1/31/2016 11:00', 29.114, 16.277),
    (10.07, '2/1/2016 9:00', '1/31/2016 17:00', 29.114, 16.277);

Desired result format:

lat     lon     Foredate  foreHE  windspeed  as_of
34.556  28.345  2/1/2016  8       10.98      1/31/2016 5:00
34.556  28.345  2/1/2016  9       9.88       1/31/2016 9:00
29.114  16.277  2/1/2016  8       10.95      1/31/2016 8:00
29.114  16.277  2/1/2016  9       10.80      1/31/2016 5:00

Here is my code to get the correct as_of. Things go wrong when I try to add the windspeed back in.

SELECT
    date_trunc('day', a.foretime)::date AS Foredate,
    extract(HOUR FROM a.foretime) AS foreHE,
    a.lat,
    a.lon,
    max(a.as_of) - interval '5 hours' AS latest_as_of
FROM weather.forecast a
WHERE date_trunc('day', foretime)::date - as_of >= INTERVAL '14 hours'
GROUP BY Foredate, foreHE, a.lat, a.lon;

Answer:

Your error, when adding back windspeed, is this:

[42803] ERROR: column "a.windspeed" must appear in the GROUP BY clause or be used in an aggregate function
  Position: 184

I can't really improve on PostgreSQL's error message, except maybe to get into the theory a bit. Basically, when you do GROUP BY you're affording yourself the luxury of acting over subsets within the larger set, which is the table produced by the rest of the query. But SQL doesn't hand you those subsets to iterate over; you must describe your computations to the database and let it give you back another flat list.

Of the two options Postgres presents, one is usually the obvious choice. For instance, if you had left out a.lon, it would be obvious that you were grouping by latitude but not longitude, and you would agree it should be added to the GROUP BY clause. But in this case, if you group by the actual measurements, every subset will have just one row, and that isn't useful either. So at first blush it seems like you need an aggregate. Problem number two is that there is no built-in aggregate for this: you want the windspeed from the row holding max(as_of), and max(windspeed) is not the same thing. Sigh!
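To make the "subsets" intuition concrete, here is a rough Python analogue (my addition, not part of the original answer): GROUP BY behaves like itertools.groupby, where every group must be collapsed back down to a single row. Keeping the whole row that holds max(as_of), rather than just the max(as_of) value, is exactly the step plain GROUP BY makes awkward.

```python
from itertools import groupby
from operator import itemgetter

# A few sample rows: (foretime, as_of, windspeed, lat, lon)
rows = [
    ('2016-02-01 08:00', '2016-01-30 23:00', 11.19, 34.556, 28.345),
    ('2016-02-01 08:00', '2016-01-31 05:00', 10.98, 34.556, 28.345),
    ('2016-02-01 08:00', '2016-01-31 08:00', 10.95, 29.114, 16.277),
]

key = itemgetter(0, 3, 4)  # group on (foretime, lat, lon)
latest = {}
for k, group in groupby(sorted(rows, key=key), key=key):
    # SQL's GROUP BY forces each subset down to one value per column via
    # aggregates; in Python we can simply keep the entire row whose as_of
    # is greatest, so windspeed rides along for free.
    latest[k] = max(group, key=itemgetter(1))
```

Each dictionary value is a full row, so `latest[...][2]` gives the windspeed of the most recent forecast for that group.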

So here's my thinking. The primary key you need to look up by is (foretime, as_of, lat, lon), and you can get that straightforwardly with this query:

select
  foretime,
  max(as_of) as as_of,
  lat, lon
from weather.forecast
group by foretime, lat, lon;

Now you could join this back onto the same table, forecast, to get the most recent forecasts:

select
  date_trunc('day', a.foretime)::date as forecast_day,
  extract(hour from a.foretime) as forecast_hour,
  a.lat, a.lon,
  f.windspeed,
  a.as_of - interval '5 hours' as latest_as_of
from weather.forecast f
join (select
        foretime,
        max(as_of) as as_of,
        lat, lon
      from weather.forecast
      group by foretime, lat, lon) a using (foretime, as_of, lat, lon);

This produces the following report:

 forecast_day | forecast_hour |    lat    |    lon    | windspeed |    latest_as_of
--------------+---------------+-----------+-----------+-----------+---------------------
 2016-02-01   |             8 | 34.556000 | 28.345000 |     10.64 | 2016-01-31 06:00:00
 2016-02-01   |             8 | 29.114000 | 16.277000 |     10.39 | 2016-01-31 18:00:00
 2016-02-01   |             9 | 34.556000 | 28.345000 |      9.88 | 2016-01-31 04:00:00
 2016-02-01   |             9 | 29.114000 | 16.277000 |     10.07 | 2016-01-31 12:00:00
(4 rows)

There is probably a more efficient way to do this with correlated subqueries but I'm not sure how to accomplish it.
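One such alternative (my addition, not from the original answer) is a window function: rank the rows within each (foretime, lat, lon) partition by as_of descending and keep rank 1, which avoids the self-join entirely. The sketch below uses SQLite so it is self-contained and runnable; the ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) pattern works the same way in PostgreSQL, which additionally offers DISTINCT ON for this kind of groupwise-latest query.

```python
import sqlite3

# In-memory demo table with a subset of the sample data. Timestamps are
# stored as ISO-8601 text so lexicographic order matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE forecast"
    " (foretime TEXT, as_of TEXT, windspeed REAL, lat REAL, lon REAL)"
)
conn.executemany(
    "INSERT INTO forecast VALUES (?, ?, ?, ?, ?)",
    [
        ('2016-02-01 08:00', '2016-01-30 23:00', 11.19, 34.556, 28.345),
        ('2016-02-01 08:00', '2016-01-31 05:00', 10.98, 34.556, 28.345),
        ('2016-02-01 08:00', '2016-01-31 08:00', 10.95, 29.114, 16.277),
        ('2016-02-01 09:00', '2016-01-31 09:00', 9.88,  34.556, 28.345),
        ('2016-02-01 09:00', '2016-01-30 23:00', 10.79, 34.556, 28.345),
    ],
)

# Keep only the most recent as_of per (foretime, lat, lon). windspeed (and
# every other column) rides along because no row is ever collapsed.
latest = conn.execute("""
    SELECT foretime, lat, lon, windspeed, as_of
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY foretime, lat, lon
                   ORDER BY as_of DESC
               ) AS rn
        FROM forecast
    ) AS ranked
    WHERE rn = 1
    ORDER BY foretime, lat
""").fetchall()
```

Requires SQLite 3.25+ for window-function support; the planner can often satisfy this with a single pass over an index on (foretime, lat, lon, as_of).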

Edit: Matching your output format:

select
  a.lat, a.lon,
  date_trunc('day', a.foretime)::date as forecast_day,
  extract(hour from a.foretime) as forecast_hour,
  f.windspeed,
  a.as_of - interval '5 hours' as latest_as_of
from weather.forecast f
  join (select
          foretime,
          max(as_of) as as_of,
          lat, lon
        from weather.forecast
        where date_trunc('day', foretime)::date - as_of >= interval '14 hours'
        group by foretime, lat, lon) a using (foretime, as_of, lat, lon)
order by lat desc, lon;

Result:

    lat    |    lon    | forecast_day | forecast_hour | windspeed |    latest_as_of
-----------+-----------+--------------+---------------+-----------+---------------------
 34.556000 | 28.345000 | 2016-02-01   |             8 |     10.98 | 2016-01-31 00:00:00
 34.556000 | 28.345000 | 2016-02-01   |             9 |      9.88 | 2016-01-31 04:00:00
 29.114000 | 16.277000 | 2016-02-01   |             8 |     10.95 | 2016-01-31 03:00:00
 29.114000 | 16.277000 | 2016-02-01   |             9 |     10.80 | 2016-01-31 00:00:00
(4 rows)