Forum Discussion

ChrisC365's avatar
ChrisC365
Brass Contributor
Mar 05, 2020
Solved

Extract several values from a column

Hi, Does anyone have any suggestions for this?

 

Chris

---------------------------------------

 

Hi  I

 

I have already posted this on the SQL community forum, but I am now thinking this may also be good place to post this question.   I use PowerQuery however the job I am trying to do below just causes the application to crash, it seems to be memory related  So I am hoping doing this in MS Access will be much easier and less strain on excel.  

 

So my rawdata comes from an SQL View, I want to strip out all the corresponding codes in the column called 'IN_CODE', and create a new record for each corresponding code I find.  Some columns can have 10 or more codes, so the routine needs to know when there are no more values to check or extract.  I want to strip out any unnecessary text, for example, any commas, periods, names or words and even brackets used around any text etc.  Each code is only 6 digits long, I thought if I could remove unwanted text, and try and remove every space, I could extract the code in 6 digits chunks; this roughly what I was doing in power query but the process or task was too memory intensive.

 

Step 1; my raw data, I am needing to extract all the 6 digit codes from the column called 'IN_CODE'.  

Step1 Codes are all 6 digits length, there also extra spaces between codes
PROJECT_IDPROJECT_TITLEIN_CODE
1246NAME2R12345 and    R45647, RA4564 & RB4561
1261NAME3R12478
1276NAME4R12457,
1291NAME5RA4789
1306NAME6TY7894
1321NAME7FG5674
1336NAME8R45789 ,   TY7894
1351NAME9R13445 R78456 & R96325 (SCCA) R44557 (Dept) R77889 R11122
1366NAME10RT7845
1381NAME11RT1234

 

Step 2, I then want to remove all unwanted text, e.g. 'and' ',', '&' '(SCCA)' etc  There are also extra spaces just for a good measure. 

Step2 Codes are all 6 digits length, there also extra spaces between codes
PROJECT_IDPROJECT_TITLEIN_CODE
1246NAME2R12345 R45647    RA4564 RB4561
1261NAME3R12478
1276NAME4R12457
1291NAME5RA4789
1306NAME6TY7894
1321NAME7FG5674
1336NAME8R45789 TY7894
1351NAME9R13445 R78456 R96325 R44557 R77889 R11122
1366NAME10RT7845
1381NAME11RT1234

 

Step 3, so I want to get the data from step 1 to look like this - 

Step3  
PROJECT_IDPROJECT_TITLEIN_CODE
1246NAME2R12345
1246NAME2R45647
1246NAME2RA4564
1246NAME2RB4561
1261NAME3R12478
1276NAME4R12457
1291NAME5RA4789
1306NAME6TY7894
1321NAME7FG5674
1336NAME8R45789
1336NAME8TY7894
1351NAME9R13445
1351NAME9R78456
1351NAME9R96325
1351NAME9R44557
1351NAME9R77889
1351NAME9R11122
1366NAME10RT7845
1381NAME11RT1234

 

Any help appreciated.

 

Chris C

  • ChrisC365 

    You could solve this with a helper table and VBA in Access. The helper table is called UnwantedText and contains the (pieces of) text you want to remove from IN_CODE:

    You can extend this table with any text you want/need.

     

    Next, we define a table (ProjectParsed) for the parsed result:

    Note that the three fields together form the primary key.

     

    Assuming that the raw project data is in a (linked) table which is called ProjectRaw (you can name it anything you like, but you have to make a slight modification in the coming VBA code if you name it different), this VBA subroutine removes the unwanted text and parses the 6 digit occurrences in IN_CODE into separate rows/records in the table ProjectParsed:

    Sub RemoveAndParse()
      'Declare
      Dim sInCode As String, sSQL As String
      Dim i As Integer
        
      'Switch off warnings
      DoCmd.SetWarnings False
        
      'Remove all earlier records from the result table
      sSQL = "DELETE FROM ProjectParsed"
      DoCmd.RunSQL sSQL
          
      'Open tables Project and UnwantedText as recordsets to loop over
      Dim rsProject As DAO.Recordset
      Set rsProject = CurrentDb.OpenRecordset("ProjectRaw", dbOpenTable)
    
      Dim rsUnwantedText As DAO.Recordset
      Set rsUnwantedText = CurrentDb.OpenRecordset("UnwantedText", dbOpenTable)
    
      'Loop over the (raw) projects
      Do Until rsProject.EOF
        sInCode = rsProject!IN_CODE
        
        'Remove all the spaces from IN_CODE
        sInCode = Replace(sInCode, " ", "")
        
        'Remove the unwanted text from IN_CODE
        rsUnwantedText.MoveFirst 'go back to first unwanted text for every cycle
        'Loop over the unwanted text
        Do Until rsUnwantedText.EOF
          sInCode = Replace(sInCode, rsUnwantedText!Unwanted, "") 'remove text (replace it by a blank)
          rsUnwantedText.MoveNext 'move to next unwanted text
        Loop
        
        'Parse the 6 digit codes
        For i = 1 To Len(sInCode) Step 6
          sSQL = "INSERT INTO ProjectParsed VALUES('" & rsProject!PROJECT_ID & "', '" & _
                 rsProject!PROJECT_TITLE & "', '" & Mid(sInCode, i, 6) & "')"
          DoCmd.RunSQL sSQL
        Next
        
        rsProject.MoveNext 'move to next project
      Loop
      
      rsUnwantedText.Close
      rsProject.Close
      
      'Switch on warnings again
      DoCmd.SetWarnings True
    End Sub

     

    I assume and hope that the comments in the code explain what the subroutine does. You can run the routine from the VBA editor or you can call it from a button in one of your Access forms.

     

    After running it you'll get something in the table ProjectParsed like this:

     

    Hope this does the job. And if you need more clarification, please let me know.

     

    Best wishes,

    Tieme

     

2 Replies

  • Woldman's avatar
    Woldman
    Iron Contributor

    ChrisC365 

    You could solve this with a helper table and VBA in Access. The helper table is called UnwantedText and contains the (pieces of) text you want to remove from IN_CODE:

    You can extend this table with any text you want/need.

     

    Next, we define a table (ProjectParsed) for the parsed result:

    Note that the three fields together form the primary key.

     

    Assuming that the raw project data is in a (linked) table which is called ProjectRaw (you can name it anything you like, but you have to make a slight modification in the coming VBA code if you name it different), this VBA subroutine removes the unwanted text and parses the 6 digit occurrences in IN_CODE into separate rows/records in the table ProjectParsed:

    Sub RemoveAndParse()
      'Declare
      Dim sInCode As String, sSQL As String
      Dim i As Integer
        
      'Switch off warnings
      DoCmd.SetWarnings False
        
      'Remove all earlier records from the result table
      sSQL = "DELETE FROM ProjectParsed"
      DoCmd.RunSQL sSQL
          
      'Open tables Project and UnwantedText as recordsets to loop over
      Dim rsProject As DAO.Recordset
      Set rsProject = CurrentDb.OpenRecordset("ProjectRaw", dbOpenTable)
    
      Dim rsUnwantedText As DAO.Recordset
      Set rsUnwantedText = CurrentDb.OpenRecordset("UnwantedText", dbOpenTable)
    
      'Loop over the (raw) projects
      Do Until rsProject.EOF
        sInCode = rsProject!IN_CODE
        
        'Remove all the spaces from IN_CODE
        sInCode = Replace(sInCode, " ", "")
        
        'Remove the unwanted text from IN_CODE
        rsUnwantedText.MoveFirst 'go back to first unwanted text for every cycle
        'Loop over the unwanted text
        Do Until rsUnwantedText.EOF
          sInCode = Replace(sInCode, rsUnwantedText!Unwanted, "") 'remove text (replace it by a blank)
          rsUnwantedText.MoveNext 'move to next unwanted text
        Loop
        
        'Parse the 6 digit codes
        For i = 1 To Len(sInCode) Step 6
          sSQL = "INSERT INTO ProjectParsed VALUES('" & rsProject!PROJECT_ID & "', '" & _
                 rsProject!PROJECT_TITLE & "', '" & Mid(sInCode, i, 6) & "')"
          DoCmd.RunSQL sSQL
        Next
        
        rsProject.MoveNext 'move to next project
      Loop
      
      rsUnwantedText.Close
      rsProject.Close
      
      'Switch on warnings again
      DoCmd.SetWarnings True
    End Sub

     

    I assume and hope that the comments in the code explain what the subroutine does. You can run the routine from the VBA editor or you can call it from a button in one of your Access forms.

     

    After running it you'll get something in the table ProjectParsed like this:

     

    Hope this does the job. And if you need more clarification, please let me know.

     

    Best wishes,

    Tieme

     

Resources