# Query to calculate the sum of distances (longitude, latitude) between consecutive rows in MySQL

I am very new to SQL and I am stuck. I am trying to calculate the total distance each user has traveled per year. I have a table (let's call it dist_table) with the following structure:

```
rowid  user_name  date        LAT     LONG
1      maria      2005-01-01  51.555  5.014
2      maria      2005-01-01  51.437  5.474
3      peter      2005-02-03  51.437  5.474
4      john       2005-02-03  51.858  5.864
5      maria      2005-02-04  51.858  5.864
6      john       2005-02-03  51.437  5.474
7      john       2006-02-04  0       0
8      john       2006-02-04  51.858  5.864
9      john       2006-02-04  51.858  5.864
10     john       2006-02-04  51.437  5.474
```

This is the intermediate step in the calculation (just to clarify what I mean):

```
rowid  user_name  date        LAT     LONG   distance
1      maria      2005-01-01  51.555  5.014  0
2      maria      2005-01-01  51.437  5.474  34.452
3      peter      2005-02-03  51.437  5.474  0
4      john       2005-02-03  51.858  5.864  0
5      maria      2005-02-04  51.858  5.864  54.012
6      john       2005-03-03  51.437  5.474  54.012
7      john       2006-02-04  0       0
8      john       2006-02-04  51.858  5.864  54.012
9      john       2006-02-04  51.858  5.864  0
10     john       2006-02-04  51.437  5.474  54.012
```

And this is the final result I need:

```
user_name  date  sum(distance)
maria      2005  88.464
peter      2005  0
john       2005  54.012
john       2006  108.024
```

I was thinking of using this formula (Haversine) to calculate the distance between consecutive rows and then summing it up:

```
SELECT user_name, date, dist_table.LAT, dist_table.LONG,
       6373 * 2 * ASIN(SQRT(
           POWER(SIN((orig_latitude - abs(next_latitude)) * pi()/180 / 2), 2)
           + COS(orig_latitude * pi()/180) * COS(abs(next_latitude) * pi()/180)
           * POWER(SIN((orig_longitude - next_longitude) * pi()/180 / 2), 2)
       )) AS distance
FROM dist_table
WHERE dist_table.LAT != 0 AND dist_table.LONG != 0;
```

However, I cannot figure out how to reference the consecutive row. This is what I have so far for connecting the rows:

```
SELECT user_name, date, LAT,
       IFNULL(
           (SELECT MAX(LAT)
              FROM dist_table
             WHERE user_name = t1.user_name
               AND (date < t1.date)),
           0) AS next_latitude
FROM dist_table AS t1
ORDER BY user_name, date;
```

The problem is that, for each user, several rows can satisfy this condition, and the subquery picks the maximum value instead of the value from the previous row. Furthermore, the longitude and/or latitude is sometimes 0, and I need to ignore those rows.

I was thinking that this could probably be solved if I first created a column with a row order based on user_name and date, and then put something like date+1 = t1.date in the condition. Unfortunately, I have very limited permissions on the server I am using, so this would probably have to be handled with user-defined variables, and I do not know how to do that.

I am using MySQL 5.6.19-log.

Could anybody help me out?

So here's a solution to the first part of the problem...

``````
DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id     INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,user_name   VARCHAR(12) NOT NULL
,date             DATE NOT NULL
,LAT        DECIMAL(5,3) NOT NULL
,LON        DECIMAL(6,3) NOT NULL -- three decimals, so values such as 5.014 are stored unrounded
);

INSERT INTO my_table VALUES
( 1,'maria','2005-01-01',51.555 ,5.014),
( 2,'maria','2005-01-01',51.437 ,5.474),
( 3,'peter','2005-02-03',51.437 ,5.474),
( 4,'john' ,'2005-02-03',51.858 ,5.864),
( 5,'maria','2005-02-04',51.858 ,5.864),
( 6,'john' ,'2005-02-03',51.437 ,5.474),
( 7,'john' ,'2006-02-04',0      ,0),
( 8,'john' ,'2006-02-04',51.858 ,5.864),
( 9,'john' ,'2006-02-04',51.858 ,5.864),
(10,'john' ,'2006-02-04',51.437 ,5.474);

SELECT x.user_name
, x.id from_id
, MIN(y.id) to_id
FROM my_table x
JOIN my_table y
ON y.user_name = x.user_name
AND y.id > x.id
WHERE (y.lat <> 0 AND y.lon <> 0)
AND (x.lat <> 0 AND x.lon <> 0)
GROUP
BY x.id;

+-----------+---------+-------+
| user_name | from_id | to_id |
+-----------+---------+-------+
| maria     |       1 |     2 |
| maria     |       2 |     5 |
| john      |       4 |     6 |
| john      |       6 |     8 |
| john      |       8 |     9 |
| john      |       9 |    10 |
+-----------+---------+-------+
``````
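As a rough sketch of where this is heading, the pairs can already be turned into per-leg distances without creating anything on the server, which may matter given the limited permissions mentioned in the question. The query below wraps the pairing query in a derived table, joins `my_table` back in twice, and inlines the haversine expression. The aliases `p`, `f` and `t` are names I picked for illustration, and the radius of 6373 km is an assumption taken from the formula in the question.

```sql
-- Per-leg distances with the haversine expression inlined (no stored function needed).
-- Aliases p, f and t are illustrative; 6373 is the Earth radius in km from the question.
SELECT p.user_name
     , p.from_id
     , p.to_id
     , ROUND(6373 * 2 * ASIN(SQRT(
             POWER(SIN(RADIANS(t.lat - f.lat) / 2), 2)
           + COS(RADIANS(f.lat)) * COS(RADIANS(t.lat))
           * POWER(SIN(RADIANS(t.lon - f.lon) / 2), 2)
       )), 3) AS distance
  FROM ( SELECT x.user_name
              , x.id      AS from_id
              , MIN(y.id) AS to_id          -- next valid row of the same user
           FROM my_table x
           JOIN my_table y
             ON y.user_name = x.user_name
            AND y.id > x.id
          WHERE y.lat <> 0 AND y.lon <> 0
            AND x.lat <> 0 AND x.lon <> 0
          GROUP BY x.id
       ) p
  JOIN my_table f ON f.id = p.from_id       -- start of the leg
  JOIN my_table t ON t.id = p.to_id         -- end of the leg
 ORDER BY p.from_id;
```

Provided the coordinates are stored with all three decimals, the distances should line up with the intermediate table in the question (34.452 for maria's first leg, 54.012 for the legs between the other two locations, and 0 where the position does not change).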

For the rest of the problem, something like the following should work.

I have a function in my database called geo_distance_km. It looks like this, and saves typing out the haversine formula each time:

``````
delimiter //
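create DEFINER = CURRENT_USER function geo_distance_km (lat1 double, lon1 double, lat2 double, lon2 double) returns double
deterministic -- with binary logging enabled, MySQL rejects functions not declared DETERMINISTIC, NO SQL or READS SQL DATA
begin
declare R double DEFAULT 6373; -- Earth radius in km, as in the question's formula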
declare phi1 double;
declare phi2 double;
declare d_phi double;
declare d_lambda double;
declare a double;
declare c double;
declare d double;
set phi1 = radians(lat1);
set phi2 = radians(lat2);
set d_phi = radians(lat2-lat1);
set d_lambda = radians(lon2-lon1);
set a = sin(d_phi/2) * sin(d_phi/2) +
cos(phi1) * cos(phi2) *
sin(d_lambda/2) * sin(d_lambda/2);
set c = 2 * atan2(sqrt(a), sqrt(1-a));
set d = R * c;
return d;
end;
//
delimiter ;
``````
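A quick way to check the function, assuming it was created as above, is to feed it two consecutive points from the sample data and compare the result with the intermediate table in the question:

```sql
-- maria, row 1 -> row 2; the question's intermediate table lists 34.452 for this leg
SELECT ROUND(geo_distance_km(51.555, 5.014, 51.437, 5.474), 3) AS km;

-- identical points, so the distance should be 0
SELECT ROUND(geo_distance_km(51.858, 5.864, 51.858, 5.864), 3) AS km;
```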

We can combine that with what we have already...

``````
SELECT user_name
, YEAR(date) year
, COALESCE(SUM(distance),0) total
FROM
( SELECT a.*
, b.lat to_lat
, b.lon to_lon
, ROUND(geo_distance_km(from_lat,from_lon,b.lat,b.lon),3) distance
FROM
( SELECT x.user_name
, x.date
, x.id from_id
, x.lat from_lat
, x.lon from_lon
, MIN(y.id) to_id
FROM my_table x
LEFT
JOIN my_table y
ON y.user_name = x.user_name
AND y.id > x.id
AND (y.lat <> 0 AND y.lon <> 0)
WHERE (x.lat <> 0 AND x.lon <> 0)
GROUP
BY x.id
) a
LEFT
JOIN my_table b
ON b.id = a.to_id
) n
GROUP
BY user_name
, year;

+-----------+------+---------+
| user_name | year | total   |
+-----------+------+---------+
| john      | 2005 | 108.024 |
| john      | 2006 |  54.012 |
| maria     | 2005 |  88.464 |
| peter     | 2005 |   0.000 |
+-----------+------+---------+
``````

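Note that this query counts each leg in the year of the row it starts from, which is why john's totals differ from the ones in your question. I don't quite understand how you handle distances that overlap years, but this should get you close to what you're after.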
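If each leg is instead meant to count toward the year of the row where it ends, which is how I read the intermediate table in the question, a variation along these lines might be closer. It is only a sketch under that assumption: it looks backwards with MAX(y.id) rather than forwards with MIN(y.id), it reuses geo_distance_km from above, and the aliases `a`, `p` and `prev_id` are my own.

```sql
-- Attribute each leg to the year of the row where it ends (assumed interpretation).
SELECT a.user_name
     , YEAR(a.date) AS year
     , COALESCE(SUM(ROUND(geo_distance_km(p.lat, p.lon, a.lat, a.lon), 3)), 0) AS total
  FROM ( SELECT x.id
              , x.user_name
              , x.date
              , x.lat
              , x.lon
              , MAX(y.id) AS prev_id        -- closest earlier valid row of the same user
           FROM my_table x
           LEFT JOIN my_table y
             ON y.user_name = x.user_name
            AND y.id < x.id
            AND y.lat <> 0 AND y.lon <> 0
          WHERE x.lat <> 0 AND x.lon <> 0
          GROUP BY x.id
       ) a
  LEFT JOIN my_table p
    ON p.id = a.prev_id                     -- NULL for a user's first row, so no distance is added
 GROUP BY a.user_name, YEAR(a.date);
```

On the sample data, with the coordinates stored to three decimals, this should give 54.012 for john in 2005 and 108.024 in 2006, while maria stays at 88.464 and peter at 0.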