how to calculate a difference between 2 tables ?

Copper Contributor

Hello,

I have a table TableA that looks like this :

ViewDate ID prime otherfields

31/07/2023 1 10 titi

31/07/2023 1 10 titi

31/07/2023 1 10 toto

31/07/2023 2 10 tata

31/07/2023 2 10 tata

31/07/2023 2 10 tutu

And a table TableB that looks like this :

ViewDate ID prime otherfields

31/08/2023 2 10 tata

31/08/2023 2 30 tata

31/08/2023 2 30 tutu

31/08/2023 3 30 tata

31/08/2023 3 30 tata

31/08/2023 3 30 tutu

I want to calculate the variation line by line between the 2 tables with the following rules :

If a same ID is found, then the value of the field prime is equal to TableB prime - TableA prime

If an ID is not present in TableA, then I still want a line with the value of the field prime equal to TableB prime - 0

If an ID is not present in TableB, then I still want a line with the value of the field prime equal to 0 - TableA prime

If a result of variation is equal to 0, then I delete the line

I am then expecting to get this as a result (minus the line where prime = 0) :

ViewDate ID prime otherfields

31/08/2023 1 -10 titi

31/08/2023 1 -10 titi

31/08/2023 1 -10 toto

31/08/2023 2 0 tata

31/08/2023 2 20 tata

31/08/2023 2 20 tutu

31/08/2023 3 30 tata

31/08/2023 3 30 tata

31/08/2023 3 30 tutu

My query at the moment, in spark SQL that must run on databricks (so if it uses common enough sql clauses, it will be fine), is like this :

create table rmop.TableA (ViewDate date, ID integer, prime integer, otherfield string);

create table rmop.TableB (ViewDate date, ID integer, prime integer, otherfield string);

create table rmop.TableVAR (ViewDate date, ID integer, prime integer, otherfield string);

 

insert into rmop.TableA select '2023-07-31', 1, 10, 'titi';

insert into rmop.TableA select '2023-07-31', 1, 10, 'titi';

insert into rmop.TableA select '2023-07-31', 1, 10, 'toto';

insert into rmop.TableA select '2023-07-31', 2, 10, 'tata';

insert into rmop.TableA select '2023-07-31', 2, 10, 'tata';

insert into rmop.TableA select '2023-07-31', 2, 10, 'tutu';

 

insert into rmop.TableB select '2023-08-31', 2, 10, 'tata';

insert into rmop.TableB select '2023-08-31', 2, 30, 'tata';

insert into rmop.TableB select '2023-08-31', 2, 30, 'tutu';

insert into rmop.TableB select '2023-08-31', 3, 30, 'tata';

insert into rmop.TableB select '2023-08-31', 3, 30, 'tata';

insert into rmop.TableB select '2023-08-31', 3, 30, 'tutu';

 

insert into rmop.TableVAR (ViewDate, ID, prime, otherfield)

select

B.ViewDate,

COALESCE(A.ID, B.ID),

COALESCE(B.prime, 0) - COALESCE(A.prime, 0),

COALESCE(A.otherfield, B.otherfield)

from rmop.TableA A full outer join rmop.TableB B on A.ID = B.ID

where A.ViewDate ='2023-07-31' and B.ViewDate ='2023-08-31';

 

select * from rmop.TableVAR;

delete from rmop.TableVAR where prime = 0;

 

drop table rmop.TableA;

drop table rmop.TableB;

drop table rmop.TableVAR;

The problem is that this returns the following result :

ViewDate ID prime otherfields

31/08/2023 2 0 tata

31/08/2023 2 0 tata

31/08/2023 2 0 tutu

31/08/2023 2 20 tata

31/08/2023 2 20 tata

31/08/2023 2 20 tutu

31/08/2023 2 20 tata

31/08/2023 2 20 tata

31/08/2023 2 20 tutu

What is the issue in my query and how to correct it please in order to get the expected result ?

I know that I am not doing a join on otherfields because these are not identifiers

0 Replies