Align the column values with each other so you can easily see what's missing

Brass Contributor

I have some data on the attached worksheet. This is a side by side comparison of file contents. Column1 is the control, it has everything. The desire is to automate aligning the data so you can easily see what's missing in each file. There is an illustration in the workbook.

 

The idea I had for solving this is to start at the top and read each column a row at a time. If the cell matches the control cell leave it alone, if it doesn't push it down a row. And keep doing this until everything lines up.

 

 

 

 

6 Replies

@rodsan724 

Uh - the control does NOT have all values. You can see in the desired output that columns B, C and D extend further down than column A. Field 2 and Field 3 do not line up.

What exactly do you want?

@HansVogelaar sorry for any confusion. Column A is just the control. It doesn't have to line up with the others. The asterisk values could vary but hopefully you can see their relation because the related value is prefixed. The illustration shows the desired outcome.

 

I'm trying to line up things based on sections and fields. But the data coming in sometimes gives extra info about a field (hence, the single asterisk next to some of the fields).

@rodsan724 

How about this?

cli_one_dim~full_join_columns~0;

create temp table aa0 as 

select 数量 content, group_concat(属性) file from full_join_columnsunion group by 数量;

 create temp table aa as 

select f01 from full_join_columns group by f01;

 create temp table bb as 

select f02 from full_join_columns group by f02;

 create temp table cc as 

select f03 from full_join_columns group by f03;

 create temp table dd as 

select f04 from full_join_columns group by f04;

select * from aa0 left join aa on content=f01 left join bb on content=f02 left join cc on content=f03 left join dd on content=f04

 

content file f01 f02 f03 f04

*** Section1f01,f02,f03*** Section1*** Section1*** Section1 
*** Section2f01,f02,f03,f04*** Section2*** Section2*** Section2*** Section2
****** ZZZControl.txtf01****** ZZZControl.txt   
****** ZZZFile1.txtf02 ****** ZZZFile1.txt  
****** ZZZFile2.txtf03  ****** ZZZFile2.txt 
****** ZZZFile3.txtf04   ****** ZZZFile3.txt
*Field 1 Data1f03,f04  *Field 1 Data1*Field 1 Data1
*Field 2 Data1, Data2, Data3f02 *Field 2 Data1, Data2, Data3  
Field 1f01,f01,f02,f02,f03,f03,f04Field 1Field 1Field 1Field 1
Field 2f01,f01,f02,f03,f03,f04Field 2Field 2Field 2Field 2
Field 3f01,f01,f02,f02,f03,f04Field 3Field 3Field 3Field 3

 

@rodsan724 

I sort of got there with a 365 formula but its not what I would call 'pretty'!

image.png

I used a Lambda function to fill the section id down and created an identifier for each data item that could be filtered and sorted.

SetCodeλ(code, line)
=IF(
    ISNUMBER(SEARCH("Section", line)),
    "§" & RIGHT(line, 1) & ".0",
    IF(
        ISNUMBER(SEARCH("Field", line)),
        IF(ISERROR(SEARCH("Data", line)), LEFT(code, 3) & RIGHT(line, 1), LEFT(code, 4) & "xxx"),
        NA()
    )
)

Having set up the identifiers, I VSTACKed them and created the control list by filtering and sorting.  The final step was to look up these controls in each file column and return the original content using XLOOKUP.

Worksheet formula
= LET(
    identifier, DROP(REDUCE("", {1,2,3}, LAMBDA(acc,f,
      HSTACK(acc, SCAN("100", CHOOSECOLS(data, f), SetCodeλ))
    )),,1),
    control, SORT(UNIQUE(TOCOL(identifier,3))),
    result,  DROP(REDUCE("", {1,2,3}, LAMBDA(acc,f,
      LET(
        file,     CHOOSECOLS(identifier, f),
        dataset,  CHOOSECOLS(data, f),
        HSTACK(acc, XLOOKUP(control, file, dataset, "x"))
      )
    )),,1),
    HSTACK(control, result)
  )

I should be able to tidy up the REDUCE / HSTACK which are simply ugly workarounds for a MAP version that returns arrays of arrays, but I have rather run out of steam!

@peiyezhu 

if with hierarchical structure,maybe like this:

 

select * from full_join_columns;
create temp table aa as
select * from full_join_columns where rowid>1;
cli_one_dim~temp.aa~0;
create temp table bb as
select * from aaunion;
//select * from bb;
create temp table cc as
with recursive under_alice as (
select rowid old_rowid,属性,数量 s,0 f,0 d,数量
text from bb where rowid=1 union all
select a.rowid,a.属性,
case when instr(a.数量,'**')=1 then a.数量 else under_alice.s end,
case when instr(a.数量,'**')=1 then 0
when instr(a.数量,'Field')=1 then a.数量
else under_alice.f end f,
case
when instr(a.数量,'**')=1 then 0
when instr(a.数量,'Field')=1 then 0
when instr(a.数量,'*Field')=1 then a.数量 else 0 end,
a.数量 from bb a join under_alice on a.rowid=under_alice.old_rowid+1)
select 属性,s,f,d,text from under_alice;
//select * from cc;
cli_create_two_dim~cc~属性~text;
select colIdxf[3:]{iif(%s is null,highlight('x','yellow'),%s) %s} from cc_two_dim;

Screenshot_2023-10-29-10-38-25-114_cn.uujian.browser.jpg

@rodsan724 

 

Assuming there's a reliable/consistent way to distinguish Fields and SubFields (Artist: & *Track: in below example):

 

From.png

 

this looks doable with Get & Transform aka Power Query:

To.png