I have a set of day-sharded data where individual entries do not contain the day. I would like to use table wildcards to select all available data and get back data that is grouped by both the column I am interested in and the day that it was captured. Something, in other words, like this:
SELECT table_id, identifier, Sum(AppAnalytic) as AppAnalyticCount
FROM (TABLE_QUERY(database_main,'table_id CONTAINS "Title_" AND length(table_id) >= 4'))
GROUP BY identifier, table_id order by AppAnalyticCount DESC LIMIT 10
Of course, this does not actually work because table_id is not visible in the table aggregation resulting from the TABLE_QUERY function. Is there any way to accomplish this? Some sort of join on table metadata perhaps?
This functionality is available now in BigQuery through
_TABLE_SUFFIX pseudocolumn. Full documentation is at https://cloud.google.com/bigquery/docs/querying-wildcard-tables.
Couple of things to note:
You will have to rename
_TABLE_SUFFIX into something else in your
SELECT list, i.e. following example illustrates it
SELECT _TABLE_SUFFIX as table_id, ... FROM `MyDataset.MyTablePrefix_*`
Not available today, but something I'd love to have too. The team takes feature requests seriously, so thanks for adding support for this one :).
In the meantime, a workaround is doing a manual union of a SELECT of each table, plus an additional column with the date data.
For example, instead of:
SELECT x, #TABLE_ID FROM table201401, table201402, table201303
You could do:
SELECT x, month FROM (SELECT x, '201401' AS month FROM table201401), (SELECT x, '201402' AS month FROM table201402), (SELECT x, '201403' AS month FROM table201403)