SOLVED

Convert Json With Nested Arrays to CSV

Copper Contributor

We are trying to get data that is exported through an API converted to a CSV. Each "columns" array is the data we need, and ALL the "columns" arrays are nested in a single "rows" array.

 

To give you the entire workflow, we call for the API, the JSON output is saved to a JSON file. That file is then recalled to process the CSV export.

 

Here is a code sample. Keep in mind that this is just one "columns" worth of data, and we can expect there to be ~1000 "columns" arrays. The "name" represents the headers and "value" represents the data, and each "columns" array should be a new row.

 

 

"rows": [
            {
                "columns":  [
                                {
                                "name":  "student_user_id",
                                "value":  "123456"
                                },
                                {
                                "name":  "ec_firstname"
                                },
                                {
                                "name":  "ec_lastname"
                                },
                                {
                                "name":  "ec_phonetype"
                                },
                                {
                                "name":  "ec_phone"
                                },
                                {
                                "name":  "ec_sortorder"
                                }
                            ]
            }
        ]

 

 

Some other details that might be relevant: "name is the column header. "student_user_id" will always be populated with a "value" however not all "name" headers will have a "value".

 

Thanks for the input!

5 Replies

@MikeRD 

 

Here's a sample based on only caring about the contents of the "columns" array rather than the parent JSON structures.

 

Hopefully this gives you some ideas on how to get what's effectively a hash table into object notation.

 

$jsontext = '
{
    "rows": [
            {
                "columns":  [
                                {
                                "name":  "student_user_id",
                                "value":  "123456"
                                },
                                {
                                "name":  "ec_firstname"
                                },
                                {
                                "name":  "ec_lastname"
                                },
                                {
                                "name":  "ec_phonetype"
                                },
                                {
                                "name":  "ec_phone"
                                },
                                {
                                "name":  "ec_sortorder"
                                }
                            ]
            }
        ]
}
';

$json = ConvertFrom-Json -InputObject $jsontext;
$ht = [System.Collections.Hashtable]::new();
$json.rows.columns | ForEach-Object { $ht.Add($_.name, $_.value); }
[PSCustomObject]$ht;

 

The output produced from using your sample looks like this (Format-List presentation.)

 

ec_phonetype :
ec_lastname :
ec_phone :
ec_sortorder :
ec_firstname :
student_user_id : 123456

 

Cheers,

Lain

change the last line to accomplish the CSV output you mentioned in your original post.

 

 

[PSCustomObject]$ht | ConvertTo-Csv -NoTypeInformation

 

 

 

 

The output should look like this:

 

"ec_phonetype","ec_lastname","ec_phone","ec_sortorder","ec_firstname","student_user_id"
,,,,,"123456"

 

@LainRobertson

Great job with your conversion process.

 

Thanks,

tumtum

 

 

@LainRobertson Thanks! This certainly did help. However getting this into production has some other issues.  have tried the last few days to come up with a solution, but am not successful. 

 

The issue is, the file we want to convert actually has around 500 of the nested arrays. From what I understand about hashtables, you cant have duplicate keys. I have tried to loop the hash to object portion of the script, but has not been successful. Here is a sample:

 

 "results":  {
        "rows":  [
                     {
                         "columns":  [
                                         {
                                             "name":  "student_user_id",
                                             "value":  "1234"
                                         },
                                         {
                                             "name":  "ec_firstname"
                                         },
                                         {
                                             "name":  "ec_lastname"
                                         },
                                         {
                                             "name":  "ec_phonetype"
                                         },
                                         {
                                             "name":  "ec_phone"
                                         },
                                         {
                                             "name":  "ec_sortorder"
                                         }
                                     ]
                     },
                     {
                         "columns":  [
                                         {
                                             "name":  "student_user_id",
                                             "value":  "5678"
                                         },
                                         {
                                             "name":  "ec_firstname"
                                         },
                                         {
                                             "name":  "ec_lastname"
                                         },
                                         {
                                             "name":  "ec_phonetype"
                                         },
                                         {
                                             "name":  "ec_phone"
                                         },
                                         {
                                             "name":  "ec_sortorder"
                                         }
                                     ]
                     }
                ]
            }

 

With all the resource I found, I either got errors or it returns a blank object.

best response confirmed by MikeRD (Copper Contributor)
Solution

@MikeRD 

 

It's pretty much in line with what I had above.

 

Here's a more complete, yet still basic script. I simply named this Get-StudentsFromJson.ps1 so I could provide some use examples below but you can call it what you like.

 

Use example 1: Providing the path to a properly-formatted JSON file.

 

 

.\Get-StudentsFromJson.ps1 -Path .\json1.txt

 

 

 

Use example 2: Piping in the text.

Here, I'm taking it from a file but it could just as easily come from any command that returns peoperly-formatted JSON, such as Invoke-Method.

 

 

Get-Content -Path .\json.txt -Raw | .\Get-StudentsFromJson.ps1

 

 

 

The script itself (again, this is nothing fancy and I'm stopping at converting it to object notation as I fully expect you know how to then further convert it to CSV or whatever else you'd like to do with the object(s).)

 

 

 

[cmdletbinding(DefaultParameterSetName="ByPath")]
param(
    [parameter(Mandatory=$true, ParameterSetName="ByPath")][ValidateNotNullOrEmpty()][string] $Path
    , [parameter(Mandatory=$true, ParameterSetName="ByPipe", ValueFromPipeline=$true, Position=0)][ValidateNotNullOrEmpty()][string[]] $RawText
)

# If ByPath is the entry point, check that the specified file actually exists.
if ($Path)
{
    if (Test-Path -Path $Path)
    {
        (Get-Content -Path $Path -Raw | ConvertFrom-Json).results.rows | ForEach-Object {
            $ht = [System.Collections.Hashtable]::new();
            $_.columns | ForEach-Object { $ht.Add($_.name, $_.value); }
            [PSCustomObject]$ht;
        };
    }
    else
    {
        throw [System.IO.FileNotFoundException]::new();
    }

}
else
{
    ($RawText | ConvertFrom-Json).results.rows | ForEach-Object {
        $ht = [System.Collections.Hashtable]::new();
        $_.columns | ForEach-Object { $ht.Add($_.name, $_.value); }
        [PSCustomObject]$ht;
    };
}

 

 

 

It's worth noting I've tried to chain pipes together as much as possible to promote early object release. You can store conversions in things like variables but that can manifest as memory issues where very large input sources are used.

 

That's not relevant where you're only talking about 500 - or even thousands - of rows, but if you get into the hundreds of thousands, it's a different story. Anyway, I digress.

 

Note

With your JSON example, it's missing the outer prefixed "{" and closing "}", which in turn causes ConvertFrom-Json to throw a hissy-fit.

 

If you add that in to get a properly formatted response then you get this as example output.

 

 

 

ec_phonetype    :
ec_lastname     :
ec_phone        :
ec_sortorder    :
ec_firstname    :
student_user_id : 1234

ec_phonetype    :
ec_lastname     :
ec_phone        :
ec_sortorder    :
ec_firstname    :
student_user_id : 5678

 

 

 

Cheers,

Lain

@LainRobertson 

 

I truly appreciate you taking the time to explain your work. It has helped tremendously! I was able to export a CSV by assigning your script to an $output object and defining that in the export parameters.

 

Thanks again for your help!

 

1 best response

Accepted Solutions
best response confirmed by MikeRD (Copper Contributor)
Solution

@MikeRD 

 

It's pretty much in line with what I had above.

 

Here's a more complete, yet still basic script. I simply named this Get-StudentsFromJson.ps1 so I could provide some use examples below but you can call it what you like.

 

Use example 1: Providing the path to a properly-formatted JSON file.

 

 

.\Get-StudentsFromJson.ps1 -Path .\json1.txt

 

 

 

Use example 2: Piping in the text.

Here, I'm taking it from a file but it could just as easily come from any command that returns peoperly-formatted JSON, such as Invoke-Method.

 

 

Get-Content -Path .\json.txt -Raw | .\Get-StudentsFromJson.ps1

 

 

 

The script itself (again, this is nothing fancy and I'm stopping at converting it to object notation as I fully expect you know how to then further convert it to CSV or whatever else you'd like to do with the object(s).)

 

 

 

[cmdletbinding(DefaultParameterSetName="ByPath")]
param(
    [parameter(Mandatory=$true, ParameterSetName="ByPath")][ValidateNotNullOrEmpty()][string] $Path
    , [parameter(Mandatory=$true, ParameterSetName="ByPipe", ValueFromPipeline=$true, Position=0)][ValidateNotNullOrEmpty()][string[]] $RawText
)

# If ByPath is the entry point, check that the specified file actually exists.
if ($Path)
{
    if (Test-Path -Path $Path)
    {
        (Get-Content -Path $Path -Raw | ConvertFrom-Json).results.rows | ForEach-Object {
            $ht = [System.Collections.Hashtable]::new();
            $_.columns | ForEach-Object { $ht.Add($_.name, $_.value); }
            [PSCustomObject]$ht;
        };
    }
    else
    {
        throw [System.IO.FileNotFoundException]::new();
    }

}
else
{
    ($RawText | ConvertFrom-Json).results.rows | ForEach-Object {
        $ht = [System.Collections.Hashtable]::new();
        $_.columns | ForEach-Object { $ht.Add($_.name, $_.value); }
        [PSCustomObject]$ht;
    };
}

 

 

 

It's worth noting I've tried to chain pipes together as much as possible to promote early object release. You can store conversions in things like variables but that can manifest as memory issues where very large input sources are used.

 

That's not relevant where you're only talking about 500 - or even thousands - of rows, but if you get into the hundreds of thousands, it's a different story. Anyway, I digress.

 

Note

With your JSON example, it's missing the outer prefixed "{" and closing "}", which in turn causes ConvertFrom-Json to throw a hissy-fit.

 

If you add that in to get a properly formatted response then you get this as example output.

 

 

 

ec_phonetype    :
ec_lastname     :
ec_phone        :
ec_sortorder    :
ec_firstname    :
student_user_id : 1234

ec_phonetype    :
ec_lastname     :
ec_phone        :
ec_sortorder    :
ec_firstname    :
student_user_id : 5678

 

 

 

Cheers,

Lain

View solution in original post