Table of Data: Is Countifs my best solution?

peiyezhu
Bronze Contributor
Nov 17, 2024
select * from Sheet1 limit 20;
create temp table aa as
select regexp2('^(.*):\s*(.*)$',f01,1) model,regexp2('^(.*):\s*(\S*)$',f01,2) pn from Sheet1;
//select *,instr('PN1,PN3',pn) ,instr('PN',pn) from aa;

select *,group_concat(pn) from aa where instr('PN1,PN3',pn)>0 group by model having(count(distinct(pn))>1);
- SherriF
  Copper Contributor
  Nov 18, 2024
  Thank you. What does the 'limit 20' do? I have over 30 part numbers and close to 200k models...
  - peiyezhu
    Bronze Contributor
    Nov 18, 2024
    select * from Sheet1 limit 20;
    This is only for show raw data sturcture.
    After you split one column to 2 column model and pn,you just need below sql to show what models have PN1 and PN3.
    select *,group_concat(pn) from aa where instr('PN1,PN3',pn)>0 group by model having(count(distinct(pn))>1);
    
    This SQL query is designed to retrieve records from a table, specifically focusing on grouping and filtering rows based on certain conditions. Let's break down the query step by step:
    ### Query Breakdown:
```sql
SELECT *, GROUP_CONCAT(pn)
FROM aa
WHERE INSTR('PN1,PN3', pn) > 0
GROUP BY model
HAVING COUNT(DISTINCT pn) > 1;
```
    ### 1. **`SELECT *, GROUP_CONCAT(pn)`**
    - `SELECT *`: This selects all columns from the `aa` table.
    - `GROUP_CONCAT(pn)`: This is an aggregate function that concatenates the values in the `pn` column (the primary key or identifier you're interested in) for each group formed by the `GROUP BY` clause. The result is a comma-separated string of all `pn` values within each group.
    ### 2. **`FROM aa`**
    - Specifies the table `aa` from which to retrieve the data.
    ### 3. **`WHERE INSTR('PN1,PN3', pn) > 0`**
    - `INSTR('PN1,PN3', pn)` is a string function that checks if the value of `pn` is present within the string `'PN1,PN3'`.
    - The `INSTR()` function returns the position of the first occurrence of `pn` within `'PN1,PN3'`. If it finds the `pn` in the list, the function returns a positive number (the position where it was found). If it doesn't find it, it returns `0`.
    - The condition `INSTR('PN1,PN3', pn) > 0` ensures that only rows with `pn` values equal to `'PN1'` or `'PN3'` will be included in the result. So, the query filters out any rows where `pn` is not `'PN1'` or `'PN3'`.
    ### 4. **`GROUP BY model`**
    - This clause groups the rows by the `model` column. After grouping, the aggregate functions (like `GROUP_CONCAT(pn)`) will be applied to each group.
    ### 5. **`HAVING COUNT(DISTINCT pn) > 1`**
    - `HAVING` is used to filter groups after they have been formed by the `GROUP BY` clause.
    - `COUNT(DISTINCT pn) > 1` checks that there is more than one unique `pn` in each group. This means that, for each group, there must be at least two distinct `pn` values (either `'PN1'` and `'PN3'` in this case) for the group to be included in the result.
    ### Putting It All Together:
    - **Step 1**: The query selects all rows from the table `aa`, but only keeps those where `pn` is either `'PN1'` or `'PN3'`.
    - **Step 2**: It groups the selected rows by the `model` column.
    - **Step 3**: For each group, it checks that there are at least two distinct `pn` values (either `'PN1'` and `'PN3'`), and only those groups that satisfy this condition are included in the result.
    - **Step 4**: The query includes all columns of the table (due to `SELECT *`) and a concatenated list of the `pn` values (using `GROUP_CONCAT(pn)`) for each group.
    ### Example:
    Suppose your table `aa` has the following data:
    | model | pn |
    |-------|-----|
    | A | PN1 |
    | A | PN3 |
    | A | PN1 |
    | B | PN2 |
    | B | PN1 |
    | C | PN3 |
    | C | PN3 |
    - After applying `WHERE INSTR('PN1,PN3', pn) > 0`, the data will be filtered to:
    | model | pn |
    |-------|-----|
    | A | PN1 |
    | A | PN3 |
    | A | PN1 |
    | C | PN3 |
    | C | PN3 |
    - The `GROUP BY model` step will group the rows by `model`, resulting in two groups:
    - Group 1: `model = A`
    - Group 2: `model = C`
    - The `HAVING COUNT(DISTINCT pn) > 1` condition will filter out the group where there is only one unique `pn` value. Since `model C` has only `PN3` values, it will be excluded.
    - Finally, `GROUP_CONCAT(pn)` will concatenate the `pn` values for the `model A` group.
    The result would look like:
    | model | pn | GROUP_CONCAT(pn) |
    |-------|------|------------------|
    | A | PN1 | PN1,PN3,PN1 |
    In this example:
    - Only the `model A` group remains, because it contains both `PN1` and `PN3`.
    - The `pn` values are concatenated into a single string: `'PN1,PN3,PN1'`.
    ### Summary:
    This SQL query selects groups of rows from the `aa` table, where the `pn` values are either `'PN1'` or `'PN3'`. It then groups these rows by the `model` column and filters out groups where there is not at least one distinct value of `pn` from the set `('PN1', 'PN3')`. For the remaining groups, it concatenates the `pn` values into a comma-separated string and returns the result.

Forum Discussion

Table of Data: Is Countifs my best solution?