First I must declare that I'm an extreme newbie. Have just been using PHP and MySQL for about 4 weeks now. Please accept my apologies in advance if I haven't formatted this question correctly or am not using proper terms of art.
I am building a store locator application. For testing, I have a table named 'locations' that contains name, address and latitude/longitude data for 5 different restaurant chains with 1500 total (location) records.
I got the app running fine as a standard store locator where the user inputs their address and a distance in miles to search. The code below properly returns those results when the GROUP BY statement is removed. For example, when the user inputs their address and distance to search, the SELECT statement returns ALL of the restaurants within that distance just fine.
My app requires that only the nearest location of each restaurant chain within the user specified distance be returned and displayed. I added the GROUP BY statement to accomplish this. The correct number of records are returned with the correct loc_name and distance from the user. However, all of the other fields are never correct. They seem randomly selected from other records that are outside of the MIN value. For example, the 1st record returned is for DAIRY QUEEN at a distance of 4.38 mile - that's correct. However, the address, state, city, etc. for the DAIRY QUEEN at 4.38 miles are not correct.
I have read extensively about issues with GROUP BY and the requirement to use an INNER JOIN perhaps to solve my problem? A recent question and answer in stackoverflow addresses this very specifically, see MySQL Selecting wrong column value in Group By query. All the solutions I've read so far would have me using the computed distance as a key to perform the JOIN and I don't see how that's possible.
Question 1: How do I construct the SELECT statement to get my desired result: the complete row of data fields for only one each of the restaurant chains in the locations table?
Notes about my code that is not as scary as it looks and not necessary to understand to solve my problem:
The trig formula within MIN() computes the distance in miles between the user's address (translated to latitude/longitude) and the lat/lon of each location record. Trust me, this works OK.
ORDER BY 13 statement: means to ORDER by the 13th field listed in the SELECT, in this case it is the alias 'distance'. I'm mentioning this because I've noted that this syntax is not well known.
The code following the WHERE statement checks to see if the user's address (in lat/lon) is within a box with lat/lon corners that are the user specified distance of locations to search. This is called a 'Bounding Box'. It's used to optimize the search time. One could simply test to see if 'distance' is <= than user input distance but that would require reading the entire locations file. The production version will contain approximately a million records. An index is in place for the locations table: (loc_lat,loc_lon,loc_id). My understanding is that using the Bounding Box in the WHERE statement will limit the range of the index that needs to be read. Question 2: Is this true the way I have it implemented, will it be processed as I described? Will the solution to Question 1 retain the optimization?
Thank all of you in advance for your help. I'm really just 4 weeks into mySQL and PHP and as you can see, in over my head?
My question boils down to this. How should this SELECT be modified to return only 1 locations table record with corresponding fields for each loc_name that is the minimum distance from the user's input address??
MIN( ((ACOS( SIN( $lat * PI( ) /180 ) * SIN( loc_lat * PI( ) /180 ) +
COS( $lat * PI( ) /180 ) * COS( loc_lat * PI( ) /180 ) *
COS( ($long - loc_lon) * PI( ) /180 ) ) *180 / PI( )) *60 * 1.1515) )
FROM locations WHERE (loc_lat between $lat1 and $lat2
AND loc_lon between $lon1 and $lon2)
AND loc_geocoded_status = 1
GROUP BY loc_name
ORDER BY 13
You've come a long way in four weeks. It helps to include minimal DDL and INSERT statements to encourage more people to respond.
I added the GROUP BY statement to accomplish this. The correct number of records are returned with the correct loc_name and distance from the user. However, all of the other fields are never correct. They seem randomly selected from other records that are outside of the MIN value.
Yes, that's normal for MySQL. The article MySQL Standard Group By explains that behavior.
An indeterminate result set is returned when one or more non-aggregated columns in a SELECT clause aren’t listed in the GROUP BY clause. The columns listed in the SELECT clause but excluded from the GROUP BY clause return meaningless values because they’re column values chosen indeterminately from all pre-aggregated rows.
You need a determinate result set, not an indeterminate one. This statement should give you a two-column result set containing one row per location name.
SELECT loc_name, MIN( ((ACOS( . . . ) AS distance FROM locations GROUP BY loc_name
And you should be able to use that statement and a JOIN expression on the location name and distance to get the other columns you need.
I wrapped the arithmetic in a function named "distance", then
SELECT L1.*, C.* FROM locations L1 INNER JOIN (SELECT L2.loc_name, MIN(distance($lat, $lon, L2.loc_lat, L2.loc_lon)) AS distance FROM locations L2 GROUP BY L2.loc_name) C ON L1.loc_name = C.loc_name AND C.distance = distance($lat, $lon, L1.loc_lat, L1.loc_lon)
You'll need to add your bounding box information. I left it out while I was trying to make sure the JOIN worked right. I had an unnecessary ORDER BY in the inner SELECT clause, but that was a pre-caffeine clause, so I deleted it.
You will probably need an index on loc_name, too, because it's used in the
GROUP BY. See MySQL's docs for EXPLAIN syntax.