Forum Discussion
Extract several values from a column
Hi, Does anyone have any suggestions for this?
Chris
---------------------------------------
Hi I
I have already posted this on the SQL community forum, but I am now thinking this may also be good place to post this question. I use PowerQuery however the job I am trying to do below just causes the application to crash, it seems to be memory related So I am hoping doing this in MS Access will be much easier and less strain on excel.
So my rawdata comes from an SQL View, I want to strip out all the corresponding codes in the column called 'IN_CODE', and create a new record for each corresponding code I find. Some columns can have 10 or more codes, so the routine needs to know when there are no more values to check or extract. I want to strip out any unnecessary text, for example, any commas, periods, names or words and even brackets used around any text etc. Each code is only 6 digits long, I thought if I could remove unwanted text, and try and remove every space, I could extract the code in 6 digits chunks; this roughly what I was doing in power query but the process or task was too memory intensive.
Step 1; my raw data, I am needing to extract all the 6 digit codes from the column called 'IN_CODE'.
| Step1 | Codes are all 6 digits length, there also extra spaces between codes | |
| PROJECT_ID | PROJECT_TITLE | IN_CODE |
| 1246 | NAME2 | R12345 and R45647, RA4564 & RB4561 |
| 1261 | NAME3 | R12478 |
| 1276 | NAME4 | R12457, |
| 1291 | NAME5 | RA4789 |
| 1306 | NAME6 | TY7894 |
| 1321 | NAME7 | FG5674 |
| 1336 | NAME8 | R45789 , TY7894 |
| 1351 | NAME9 | R13445 R78456 & R96325 (SCCA) R44557 (Dept) R77889 R11122 |
| 1366 | NAME10 | RT7845 |
| 1381 | NAME11 | RT1234 |
Step 2, I then want to remove all unwanted text, e.g. 'and' ',', '&' , '(SCCA)' etc There are also extra spaces just for a good measure.
| Step2 | Codes are all 6 digits length, there also extra spaces between codes | |
| PROJECT_ID | PROJECT_TITLE | IN_CODE |
| 1246 | NAME2 | R12345 R45647 RA4564 RB4561 |
| 1261 | NAME3 | R12478 |
| 1276 | NAME4 | R12457 |
| 1291 | NAME5 | RA4789 |
| 1306 | NAME6 | TY7894 |
| 1321 | NAME7 | FG5674 |
| 1336 | NAME8 | R45789 TY7894 |
| 1351 | NAME9 | R13445 R78456 R96325 R44557 R77889 R11122 |
| 1366 | NAME10 | RT7845 |
| 1381 | NAME11 | RT1234 |
Step 3, so I want to get the data from step 1 to look like this -
| Step3 | ||
| PROJECT_ID | PROJECT_TITLE | IN_CODE |
| 1246 | NAME2 | R12345 |
| 1246 | NAME2 | R45647 |
| 1246 | NAME2 | RA4564 |
| 1246 | NAME2 | RB4561 |
| 1261 | NAME3 | R12478 |
| 1276 | NAME4 | R12457 |
| 1291 | NAME5 | RA4789 |
| 1306 | NAME6 | TY7894 |
| 1321 | NAME7 | FG5674 |
| 1336 | NAME8 | R45789 |
| 1336 | NAME8 | TY7894 |
| 1351 | NAME9 | R13445 |
| 1351 | NAME9 | R78456 |
| 1351 | NAME9 | R96325 |
| 1351 | NAME9 | R44557 |
| 1351 | NAME9 | R77889 |
| 1351 | NAME9 | R11122 |
| 1366 | NAME10 | RT7845 |
| 1381 | NAME11 | RT1234 |
Any help appreciated.
Chris C
You could solve this with a helper table and VBA in Access. The helper table is called UnwantedText and contains the (pieces of) text you want to remove from IN_CODE:
You can extend this table with any text you want/need.
Next, we define a table (ProjectParsed) for the parsed result:
Note that the three fields together form the primary key.
Assuming that the raw project data is in a (linked) table which is called ProjectRaw (you can name it anything you like, but you have to make a slight modification in the coming VBA code if you name it different), this VBA subroutine removes the unwanted text and parses the 6 digit occurrences in IN_CODE into separate rows/records in the table ProjectParsed:
Sub RemoveAndParse() 'Declare Dim sInCode As String, sSQL As String Dim i As Integer 'Switch off warnings DoCmd.SetWarnings False 'Remove all earlier records from the result table sSQL = "DELETE FROM ProjectParsed" DoCmd.RunSQL sSQL 'Open tables Project and UnwantedText as recordsets to loop over Dim rsProject As DAO.Recordset Set rsProject = CurrentDb.OpenRecordset("ProjectRaw", dbOpenTable) Dim rsUnwantedText As DAO.Recordset Set rsUnwantedText = CurrentDb.OpenRecordset("UnwantedText", dbOpenTable) 'Loop over the (raw) projects Do Until rsProject.EOF sInCode = rsProject!IN_CODE 'Remove all the spaces from IN_CODE sInCode = Replace(sInCode, " ", "") 'Remove the unwanted text from IN_CODE rsUnwantedText.MoveFirst 'go back to first unwanted text for every cycle 'Loop over the unwanted text Do Until rsUnwantedText.EOF sInCode = Replace(sInCode, rsUnwantedText!Unwanted, "") 'remove text (replace it by a blank) rsUnwantedText.MoveNext 'move to next unwanted text Loop 'Parse the 6 digit codes For i = 1 To Len(sInCode) Step 6 sSQL = "INSERT INTO ProjectParsed VALUES('" & rsProject!PROJECT_ID & "', '" & _ rsProject!PROJECT_TITLE & "', '" & Mid(sInCode, i, 6) & "')" DoCmd.RunSQL sSQL Next rsProject.MoveNext 'move to next project Loop rsUnwantedText.Close rsProject.Close 'Switch on warnings again DoCmd.SetWarnings True End SubI assume and hope that the comments in the code explain what the subroutine does. You can run the routine from the VBA editor or you can call it from a button in one of your Access forms.
After running it you'll get something in the table ProjectParsed like this:
Hope this does the job. And if you need more clarification, please let me know.
Best wishes,
Tieme
2 Replies
- WoldmanIron Contributor
You could solve this with a helper table and VBA in Access. The helper table is called UnwantedText and contains the (pieces of) text you want to remove from IN_CODE:
You can extend this table with any text you want/need.
Next, we define a table (ProjectParsed) for the parsed result:
Note that the three fields together form the primary key.
Assuming that the raw project data is in a (linked) table which is called ProjectRaw (you can name it anything you like, but you have to make a slight modification in the coming VBA code if you name it different), this VBA subroutine removes the unwanted text and parses the 6 digit occurrences in IN_CODE into separate rows/records in the table ProjectParsed:
Sub RemoveAndParse() 'Declare Dim sInCode As String, sSQL As String Dim i As Integer 'Switch off warnings DoCmd.SetWarnings False 'Remove all earlier records from the result table sSQL = "DELETE FROM ProjectParsed" DoCmd.RunSQL sSQL 'Open tables Project and UnwantedText as recordsets to loop over Dim rsProject As DAO.Recordset Set rsProject = CurrentDb.OpenRecordset("ProjectRaw", dbOpenTable) Dim rsUnwantedText As DAO.Recordset Set rsUnwantedText = CurrentDb.OpenRecordset("UnwantedText", dbOpenTable) 'Loop over the (raw) projects Do Until rsProject.EOF sInCode = rsProject!IN_CODE 'Remove all the spaces from IN_CODE sInCode = Replace(sInCode, " ", "") 'Remove the unwanted text from IN_CODE rsUnwantedText.MoveFirst 'go back to first unwanted text for every cycle 'Loop over the unwanted text Do Until rsUnwantedText.EOF sInCode = Replace(sInCode, rsUnwantedText!Unwanted, "") 'remove text (replace it by a blank) rsUnwantedText.MoveNext 'move to next unwanted text Loop 'Parse the 6 digit codes For i = 1 To Len(sInCode) Step 6 sSQL = "INSERT INTO ProjectParsed VALUES('" & rsProject!PROJECT_ID & "', '" & _ rsProject!PROJECT_TITLE & "', '" & Mid(sInCode, i, 6) & "')" DoCmd.RunSQL sSQL Next rsProject.MoveNext 'move to next project Loop rsUnwantedText.Close rsProject.Close 'Switch on warnings again DoCmd.SetWarnings True End SubI assume and hope that the comments in the code explain what the subroutine does. You can run the routine from the VBA editor or you can call it from a button in one of your Access forms.
After running it you'll get something in the table ProjectParsed like this:
Hope this does the job. And if you need more clarification, please let me know.
Best wishes,
Tieme
- ChrisC365Brass Contributor