Oct 27 2023 12:00 PM
I have some data on the attached worksheet. This is a side by side comparison of file contents. Column1 is the control, it has everything. The desire is to automate aligning the data so you can easily see what's missing in each file. There is an illustration in the workbook.
The idea I had for solving this is to start at the top and read each column a row at a time. If the cell matches the control cell leave it alone, if it doesn't push it down a row. And keep doing this until everything lines up.
Oct 27 2023 12:30 PM
Uh - the control does NOT have all values. You can see in the desired output that columns B, C and D extend further down than column A. Field 2 and Field 3 do not line up.
What exactly do you want?
Oct 27 2023 12:56 PM - edited Oct 27 2023 01:03 PM
@HansVogelaar sorry for any confusion. Column A is just the control. It doesn't have to line up with the others. The asterisk values could vary but hopefully you can see their relation because the related value is prefixed. The illustration shows the desired outcome.
I'm trying to line up things based on sections and fields. But the data coming in sometimes gives extra info about a field (hence, the single asterisk next to some of the fields).
Oct 28 2023 01:00 AM
How about this?
cli_one_dim~full_join_columns~0;
create temp table aa0 as
select 数量 content, group_concat(属性) file from full_join_columnsunion group by 数量;
create temp table aa as
select f01 from full_join_columns group by f01;
create temp table bb as
select f02 from full_join_columns group by f02;
create temp table cc as
select f03 from full_join_columns group by f03;
create temp table dd as
select f04 from full_join_columns group by f04;
select * from aa0 left join aa on content=f01 left join bb on content=f02 left join cc on content=f03 left join dd on content=f04
content file f01 f02 f03 f04
*** Section1 | f01,f02,f03 | *** Section1 | *** Section1 | *** Section1 | |
*** Section2 | f01,f02,f03,f04 | *** Section2 | *** Section2 | *** Section2 | *** Section2 |
****** ZZZControl.txt | f01 | ****** ZZZControl.txt | |||
****** ZZZFile1.txt | f02 | ****** ZZZFile1.txt | |||
****** ZZZFile2.txt | f03 | ****** ZZZFile2.txt | |||
****** ZZZFile3.txt | f04 | ****** ZZZFile3.txt | |||
*Field 1 Data1 | f03,f04 | *Field 1 Data1 | *Field 1 Data1 | ||
*Field 2 Data1, Data2, Data3 | f02 | *Field 2 Data1, Data2, Data3 | |||
Field 1 | f01,f01,f02,f02,f03,f03,f04 | Field 1 | Field 1 | Field 1 | Field 1 |
Field 2 | f01,f01,f02,f03,f03,f04 | Field 2 | Field 2 | Field 2 | Field 2 |
Field 3 | f01,f01,f02,f02,f03,f04 | Field 3 | Field 3 | Field 3 | Field 3 |
Oct 28 2023 03:12 PM
I sort of got there with a 365 formula but its not what I would call 'pretty'!
I used a Lambda function to fill the section id down and created an identifier for each data item that could be filtered and sorted.
SetCodeλ(code, line)
=IF(
ISNUMBER(SEARCH("Section", line)),
"§" & RIGHT(line, 1) & ".0",
IF(
ISNUMBER(SEARCH("Field", line)),
IF(ISERROR(SEARCH("Data", line)), LEFT(code, 3) & RIGHT(line, 1), LEFT(code, 4) & "xxx"),
NA()
)
)
Having set up the identifiers, I VSTACKed them and created the control list by filtering and sorting. The final step was to look up these controls in each file column and return the original content using XLOOKUP.
Worksheet formula
= LET(
identifier, DROP(REDUCE("", {1,2,3}, LAMBDA(acc,f,
HSTACK(acc, SCAN("100", CHOOSECOLS(data, f), SetCodeλ))
)),,1),
control, SORT(UNIQUE(TOCOL(identifier,3))),
result, DROP(REDUCE("", {1,2,3}, LAMBDA(acc,f,
LET(
file, CHOOSECOLS(identifier, f),
dataset, CHOOSECOLS(data, f),
HSTACK(acc, XLOOKUP(control, file, dataset, "x"))
)
)),,1),
HSTACK(control, result)
)
I should be able to tidy up the REDUCE / HSTACK which are simply ugly workarounds for a MAP version that returns arrays of arrays, but I have rather run out of steam!
Oct 28 2023 07:43 PM
if with hierarchical structure,maybe like this:
select * from full_join_columns;
create temp table aa as
select * from full_join_columns where rowid>1;
cli_one_dim~temp.aa~0;
create temp table bb as
select * from aaunion;
//select * from bb;
create temp table cc as
with recursive under_alice as (
select rowid old_rowid,属性,数量 s,0 f,0 d,数量
text from bb where rowid=1 union all
select a.rowid,a.属性,
case when instr(a.数量,'**')=1 then a.数量 else under_alice.s end,
case when instr(a.数量,'**')=1 then 0
when instr(a.数量,'Field')=1 then a.数量
else under_alice.f end f,
case
when instr(a.数量,'**')=1 then 0
when instr(a.数量,'Field')=1 then 0
when instr(a.数量,'*Field')=1 then a.数量 else 0 end,
a.数量 from bb a join under_alice on a.rowid=under_alice.old_rowid+1)
select 属性,s,f,d,text from under_alice;
//select * from cc;
cli_create_two_dim~cc~属性~text;
select colIdxf[3:]{iif(%s is null,highlight('x','yellow'),%s) %s} from cc_two_dim;
Nov 02 2023 03:59 PM - edited Nov 11 2023 09:53 PM
Assuming there's a reliable/consistent way to distinguish Fields and SubFields (Artist: & *Track: in below example):
this looks doable with Get & Transform aka Power Query: