SOLVED

Extract several values from a column

Brass Contributor

Hi, Does anyone have any suggestions for this?

 

Chris

---------------------------------------

 

Hi  I

 

I have already posted this on the SQL community forum, but I am now thinking this may also be good place to post this question.   I use PowerQuery however the job I am trying to do below just causes the application to crash, it seems to be memory related  So I am hoping doing this in MS Access will be much easier and less strain on excel.  

 

So my rawdata comes from an SQL View, I want to strip out all the corresponding codes in the column called 'IN_CODE', and create a new record for each corresponding code I find.  Some columns can have 10 or more codes, so the routine needs to know when there are no more values to check or extract.  I want to strip out any unnecessary text, for example, any commas, periods, names or words and even brackets used around any text etc.  Each code is only 6 digits long, I thought if I could remove unwanted text, and try and remove every space, I could extract the code in 6 digits chunks; this roughly what I was doing in power query but the process or task was too memory intensive.

 

Step 1; my raw data, I am needing to extract all the 6 digit codes from the column called 'IN_CODE'.  

Step1 Codes are all 6 digits length, there also extra spaces between codes
PROJECT_IDPROJECT_TITLEIN_CODE
1246NAME2R12345 and    R45647, RA4564 & RB4561
1261NAME3R12478
1276NAME4R12457,
1291NAME5RA4789
1306NAME6TY7894
1321NAME7FG5674
1336NAME8R45789 ,   TY7894
1351NAME9R13445 R78456 & R96325 (SCCA) R44557 (Dept) R77889 R11122
1366NAME10RT7845
1381NAME11RT1234

 

Step 2, I then want to remove all unwanted text, e.g. 'and' ',', '&' '(SCCA)' etc  There are also extra spaces just for a good measure. 

Step2 Codes are all 6 digits length, there also extra spaces between codes
PROJECT_IDPROJECT_TITLEIN_CODE
1246NAME2R12345 R45647    RA4564 RB4561
1261NAME3R12478
1276NAME4R12457
1291NAME5RA4789
1306NAME6TY7894
1321NAME7FG5674
1336NAME8R45789 TY7894
1351NAME9R13445 R78456 R96325 R44557 R77889 R11122
1366NAME10RT7845
1381NAME11RT1234

 

Step 3, so I want to get the data from step 1 to look like this - 

Step3  
PROJECT_IDPROJECT_TITLEIN_CODE
1246NAME2R12345
1246NAME2R45647
1246NAME2RA4564
1246NAME2RB4561
1261NAME3R12478
1276NAME4R12457
1291NAME5RA4789
1306NAME6TY7894
1321NAME7FG5674
1336NAME8R45789
1336NAME8TY7894
1351NAME9R13445
1351NAME9R78456
1351NAME9R96325
1351NAME9R44557
1351NAME9R77889
1351NAME9R11122
1366NAME10RT7845
1381NAME11RT1234

 

Any help appreciated.

 

Chris C

2 Replies
best response confirmed by ChrisC365 (Brass Contributor)
Solution

@ChrisC365 

You could solve this with a helper table and VBA in Access. The helper table is called UnwantedText and contains the (pieces of) text you want to remove from IN_CODE:

incode1.png

You can extend this table with any text you want/need.

 

Next, we define a table (ProjectParsed) for the parsed result:

incode2.png

Note that the three fields together form the primary key.

 

Assuming that the raw project data is in a (linked) table which is called ProjectRaw (you can name it anything you like, but you have to make a slight modification in the coming VBA code if you name it different), this VBA subroutine removes the unwanted text and parses the 6 digit occurrences in IN_CODE into separate rows/records in the table ProjectParsed:

Sub RemoveAndParse()
  'Declare
  Dim sInCode As String, sSQL As String
  Dim i As Integer
    
  'Switch off warnings
  DoCmd.SetWarnings False
    
  'Remove all earlier records from the result table
  sSQL = "DELETE FROM ProjectParsed"
  DoCmd.RunSQL sSQL
      
  'Open tables Project and UnwantedText as recordsets to loop over
  Dim rsProject As DAO.Recordset
  Set rsProject = CurrentDb.OpenRecordset("ProjectRaw", dbOpenTable)

  Dim rsUnwantedText As DAO.Recordset
  Set rsUnwantedText = CurrentDb.OpenRecordset("UnwantedText", dbOpenTable)

  'Loop over the (raw) projects
  Do Until rsProject.EOF
    sInCode = rsProject!IN_CODE
    
    'Remove all the spaces from IN_CODE
    sInCode = Replace(sInCode, " ", "")
    
    'Remove the unwanted text from IN_CODE
    rsUnwantedText.MoveFirst 'go back to first unwanted text for every cycle
    'Loop over the unwanted text
    Do Until rsUnwantedText.EOF
      sInCode = Replace(sInCode, rsUnwantedText!Unwanted, "") 'remove text (replace it by a blank)
      rsUnwantedText.MoveNext 'move to next unwanted text
    Loop
    
    'Parse the 6 digit codes
    For i = 1 To Len(sInCode) Step 6
      sSQL = "INSERT INTO ProjectParsed VALUES('" & rsProject!PROJECT_ID & "', '" & _
             rsProject!PROJECT_TITLE & "', '" & Mid(sInCode, i, 6) & "')"
      DoCmd.RunSQL sSQL
    Next
    
    rsProject.MoveNext 'move to next project
  Loop
  
  rsUnwantedText.Close
  rsProject.Close
  
  'Switch on warnings again
  DoCmd.SetWarnings True
End Sub

 

I assume and hope that the comments in the code explain what the subroutine does. You can run the routine from the VBA editor or you can call it from a button in one of your Access forms.

 

After running it you'll get something in the table ProjectParsed like this:

incode3.png

 

Hope this does the job. And if you need more clarification, please let me know.

 

Best wishes,

Tieme

 

@Tieme Woldman 

 

That looks really good, I will give it a whirl.

 

thanks.

 

1 best response

Accepted Solutions
best response confirmed by ChrisC365 (Brass Contributor)
Solution

@ChrisC365 

You could solve this with a helper table and VBA in Access. The helper table is called UnwantedText and contains the (pieces of) text you want to remove from IN_CODE:

incode1.png

You can extend this table with any text you want/need.

 

Next, we define a table (ProjectParsed) for the parsed result:

incode2.png

Note that the three fields together form the primary key.

 

Assuming that the raw project data is in a (linked) table which is called ProjectRaw (you can name it anything you like, but you have to make a slight modification in the coming VBA code if you name it different), this VBA subroutine removes the unwanted text and parses the 6 digit occurrences in IN_CODE into separate rows/records in the table ProjectParsed:

Sub RemoveAndParse()
  'Declare
  Dim sInCode As String, sSQL As String
  Dim i As Integer
    
  'Switch off warnings
  DoCmd.SetWarnings False
    
  'Remove all earlier records from the result table
  sSQL = "DELETE FROM ProjectParsed"
  DoCmd.RunSQL sSQL
      
  'Open tables Project and UnwantedText as recordsets to loop over
  Dim rsProject As DAO.Recordset
  Set rsProject = CurrentDb.OpenRecordset("ProjectRaw", dbOpenTable)

  Dim rsUnwantedText As DAO.Recordset
  Set rsUnwantedText = CurrentDb.OpenRecordset("UnwantedText", dbOpenTable)

  'Loop over the (raw) projects
  Do Until rsProject.EOF
    sInCode = rsProject!IN_CODE
    
    'Remove all the spaces from IN_CODE
    sInCode = Replace(sInCode, " ", "")
    
    'Remove the unwanted text from IN_CODE
    rsUnwantedText.MoveFirst 'go back to first unwanted text for every cycle
    'Loop over the unwanted text
    Do Until rsUnwantedText.EOF
      sInCode = Replace(sInCode, rsUnwantedText!Unwanted, "") 'remove text (replace it by a blank)
      rsUnwantedText.MoveNext 'move to next unwanted text
    Loop
    
    'Parse the 6 digit codes
    For i = 1 To Len(sInCode) Step 6
      sSQL = "INSERT INTO ProjectParsed VALUES('" & rsProject!PROJECT_ID & "', '" & _
             rsProject!PROJECT_TITLE & "', '" & Mid(sInCode, i, 6) & "')"
      DoCmd.RunSQL sSQL
    Next
    
    rsProject.MoveNext 'move to next project
  Loop
  
  rsUnwantedText.Close
  rsProject.Close
  
  'Switch on warnings again
  DoCmd.SetWarnings True
End Sub

 

I assume and hope that the comments in the code explain what the subroutine does. You can run the routine from the VBA editor or you can call it from a button in one of your Access forms.

 

After running it you'll get something in the table ProjectParsed like this:

incode3.png

 

Hope this does the job. And if you need more clarification, please let me know.

 

Best wishes,

Tieme

 

View solution in original post