Table of Data: Is Countifs my best solution?

SherriF
Copper Contributor
Nov 18, 2024
Thank you. What does the 'limit 20' do? I have over 30 part numbers and close to 200k models...
- peiyezhu
  Bronze Contributor
  Nov 18, 2024
  select * from Sheet1 limit 20;
  That query only shows the first 20 rows, so you can see the raw data structure.
  After you split the one column into two columns, model and pn, you only need the SQL below to show which models have both PN1 and PN3.
  select *,group_concat(pn) from aa where instr('PN1,PN3',pn)>0 group by model having(count(distinct(pn))>1);
  
  This SQL query is designed to retrieve records from a table, specifically focusing on grouping and filtering rows based on certain conditions. Let's break down the query step by step:
  ### Query Breakdown:
```sql
SELECT *, GROUP_CONCAT(pn)
FROM aa
WHERE INSTR('PN1,PN3', pn) > 0
GROUP BY model
HAVING COUNT(DISTINCT pn) > 1;
```
  ### 1. **`SELECT *, GROUP_CONCAT(pn)`**
  - `SELECT *`: This selects all columns from the `aa` table. Combined with `GROUP BY`, SQLite fills the non-grouped columns (here `pn`) from an arbitrary row of each group; stricter databases reject this form.
  - `GROUP_CONCAT(pn)`: This is an aggregate function that concatenates the values in the `pn` column (the part number) for each group formed by the `GROUP BY` clause. The result is a comma-separated string of all `pn` values within each group, duplicates included.
  ### 2. **`FROM aa`**
  - Specifies the table `aa` from which to retrieve the data.
  ### 3. **`WHERE INSTR('PN1,PN3', pn) > 0`**
  - `INSTR('PN1,PN3', pn)` is a string function that checks if the value of `pn` is present within the string `'PN1,PN3'`.
  - The `INSTR()` function returns the position of the first occurrence of `pn` within `'PN1,PN3'`. If it finds the `pn` in the list, the function returns a positive number (the position where it was found). If it doesn't find it, it returns `0`.
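  - The condition `INSTR('PN1,PN3', pn) > 0` keeps only rows whose `pn` value appears somewhere inside `'PN1,PN3'`. With part numbers such as `PN1`, `PN2` and `PN3`, that leaves only the `'PN1'` and `'PN3'` rows. Because this is a substring test rather than an equality test, a shorter value such as `'PN'` or `'1'` would also match; `pn IN ('PN1', 'PN3')` is the exact-match form.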
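  The substring behaviour is easy to check. Below is a minimal sketch using Python's built-in `sqlite3` module, assuming the SQL dialect in this thread is SQLite-compatible. The rows (including the odd values `PN` and `N3`) are made up for illustration, and the `IN` variant is a suggested alternative, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aa (model TEXT, pn TEXT)")
# Hypothetical rows: 'PN' and 'N3' are fragments of the string 'PN1,PN3'.
conn.executemany(
    "INSERT INTO aa VALUES (?, ?)",
    [("A", "PN1"), ("A", "PN3"), ("B", "PN"), ("B", "N3"), ("C", "PN2")],
)

# Substring test: fragments of 'PN1,PN3' pass as well as the full part numbers.
instr_rows = conn.execute(
    "SELECT pn FROM aa WHERE INSTR('PN1,PN3', pn) > 0 ORDER BY pn"
).fetchall()

# Exact match: only the two listed part numbers pass.
in_rows = conn.execute(
    "SELECT pn FROM aa WHERE pn IN ('PN1', 'PN3') ORDER BY pn"
).fetchall()

instr_matches = [r[0] for r in instr_rows]
in_matches = [r[0] for r in in_rows]
print(instr_matches)  # ['N3', 'PN', 'PN1', 'PN3']
print(in_matches)     # ['PN1', 'PN3']
```

  If every part number in the real data has the same fixed format, both filters return the same rows; the difference only matters when one part number can be a fragment of another.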
  ### 4. **`GROUP BY model`**
  - This clause groups the rows by the `model` column. After grouping, the aggregate functions (like `GROUP_CONCAT(pn)`) will be applied to each group.
  ### 5. **`HAVING COUNT(DISTINCT pn) > 1`**
  - `HAVING` is used to filter groups after they have been formed by the `GROUP BY` clause.
  - `COUNT(DISTINCT pn) > 1` checks that there is more than one unique `pn` in each group. Because the `WHERE` clause leaves only `'PN1'` and `'PN3'` rows, a group passes only if it contains both of them.
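  The `DISTINCT` matters when a model lists the same part number more than once. A small sketch with Python's built-in `sqlite3` module (assuming a SQLite-compatible dialect; the four rows are illustrative) compares the plain row count with the distinct count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aa (model TEXT, pn TEXT)")
conn.executemany(
    "INSERT INTO aa VALUES (?, ?)",
    [("A", "PN1"), ("A", "PN3"), ("C", "PN3"), ("C", "PN3")],
)

# Per model: total row count versus number of different part numbers.
counts = conn.execute(
    "SELECT model, COUNT(*), COUNT(DISTINCT pn) FROM aa "
    "GROUP BY model ORDER BY model"
).fetchall()
print(counts)  # [('A', 2, 2), ('C', 2, 1)]
```

  Model C has two rows but only one distinct part number, so `COUNT(*) > 1` would wrongly keep it while `COUNT(DISTINCT pn) > 1` drops it.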
  ### Putting It All Together:
  - **Step 1**: The query selects all rows from the table `aa`, but only keeps those where `pn` is either `'PN1'` or `'PN3'`.
  - **Step 2**: It groups the selected rows by the `model` column.
  - **Step 3**: For each group, it checks that there are at least two distinct `pn` values (that is, both `'PN1'` and `'PN3'`), and only those groups that satisfy this condition are included in the result.
  - **Step 4**: The query includes all columns of the table (due to `SELECT *`) and a concatenated list of the `pn` values (using `GROUP_CONCAT(pn)`) for each group.
  ### Example:
  Suppose your table `aa` has the following data:
  | model | pn |
  |-------|-----|
  | A | PN1 |
  | A | PN3 |
  | A | PN1 |
  | B | PN2 |
  | B | PN1 |
  | C | PN3 |
  | C | PN3 |
  - After applying `WHERE INSTR('PN1,PN3', pn) > 0`, only the `B | PN2` row is dropped, and the data is filtered to:
  | model | pn |
  |-------|-----|
  | A | PN1 |
  | A | PN3 |
  | A | PN1 |
  | B | PN1 |
  | C | PN3 |
  | C | PN3 |
  - The `GROUP BY model` step will group the rows by `model`, resulting in three groups:
  - Group 1: `model = A`
  - Group 2: `model = B`
  - Group 3: `model = C`
  - The `HAVING COUNT(DISTINCT pn) > 1` condition will filter out the groups with only one unique `pn` value. `model B` has only `PN1` left and `model C` has only `PN3`, so both are excluded.
  - Finally, `GROUP_CONCAT(pn)` will concatenate the `pn` values for the `model A` group.
  The result would look like:
  | model | pn | GROUP_CONCAT(pn) |
  |-------|------|------------------|
  | A | PN1 | PN1,PN3,PN1 |
  In this example:
  - Only the `model A` group remains, because it contains both `PN1` and `PN3`.
  - The `pn` values are concatenated into a single string: `'PN1,PN3,PN1'`.
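  The example can be reproduced with Python's built-in `sqlite3` module. This is a sketch that assumes a SQLite-compatible dialect and loads the sample table above into an in-memory database; the query itself is the one from this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aa (model TEXT, pn TEXT)")
conn.executemany(
    "INSERT INTO aa VALUES (?, ?)",
    [
        ("A", "PN1"), ("A", "PN3"), ("A", "PN1"),
        ("B", "PN2"), ("B", "PN1"),
        ("C", "PN3"), ("C", "PN3"),
    ],
)

rows = conn.execute(
    "SELECT *, GROUP_CONCAT(pn) FROM aa "
    "WHERE INSTR('PN1,PN3', pn) > 0 "
    "GROUP BY model "
    "HAVING COUNT(DISTINCT pn) > 1"
).fetchall()

# Each row is (model, pn, concatenated pn list); only model A qualifies.
for model, pn, pn_list in rows:
    print(model, pn, pn_list)
```

  SQLite does not guarantee the order of values inside `GROUP_CONCAT`, and the bare `pn` column comes from an arbitrary row of the group, so only the `model` column and the set of concatenated values should be relied on.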
  ### Summary:
  This SQL query keeps only the rows of the `aa` table whose `pn` value is `'PN1'` or `'PN3'`, groups those rows by the `model` column, and filters out every group that does not contain at least two distinct `pn` values, that is, both `'PN1'` and `'PN3'`. For the remaining groups, it concatenates the `pn` values into a comma-separated string and returns the result.
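  With more than two part numbers, `> 1` would accept any two of them, so the count has to equal the number of part numbers you ask for. Below is a hedged sketch of that generalisation using Python's built-in `sqlite3` module. The helper name `models_with_all_parts` and the sample rows are hypothetical, the dialect is assumed to be SQLite-compatible, and it uses `IN` with bound parameters instead of `INSTR`.

```python
import sqlite3


def models_with_all_parts(conn, parts):
    """Return the models that contain every part number in `parts`.

    Hypothetical helper for illustration; assumes a table aa(model, pn).
    """
    wanted = sorted(set(parts))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = (
        f"SELECT model FROM aa WHERE pn IN ({placeholders}) "
        "GROUP BY model HAVING COUNT(DISTINCT pn) = ? ORDER BY model"
    )
    # Bind the part numbers first, then the required distinct count.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aa (model TEXT, pn TEXT)")
conn.executemany(
    "INSERT INTO aa VALUES (?, ?)",
    [
        ("A", "PN1"), ("A", "PN3"), ("A", "PN1"),
        ("B", "PN2"), ("B", "PN1"),
        ("C", "PN3"), ("C", "PN3"),
        ("D", "PN1"), ("D", "PN2"), ("D", "PN3"),
    ],
)

two_parts = models_with_all_parts(conn, ["PN1", "PN3"])
three_parts = models_with_all_parts(conn, ["PN1", "PN2", "PN3"])
print(two_parts)    # ['A', 'D']
print(three_parts)  # ['D']
```

  For a table with a couple of hundred thousand rows, an index on `(pn, model)` would likely help this kind of query, though that is untested against the data in this thread.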
