Forum Discussion
Column-Level Lineage Visualization Issue for Custom Entities and Processes in Azure Purview
I’m trying to implement column-level lineage for data assets and custom transformation processes in Azure Purview using the Atlas API. I have defined custom typedefs for tables (yellowbrick_table), columns (column), and a process type (custom_data_transformation_process), and I’m uploading entities with composition relationships and detailed column mappings.
Although the generated JSON and typedefs appear to be correct according to the Apache Atlas documentation, I’m unable to get the column-level lineage to display properly in the Purview UI.
Specific Issues:
1. Missing 'Schema' tab in custom table entities (yellowbrick_table): When navigating to the detail page of a yellowbrick_table entity, the 'Schema' tab — where the contained columns should be listed — does not appear.
2. Missing 'Switch to column lineage' button in the lineage view of custom processes: In the 'Lineage' tab of a custom_data_transformation_process entity, the side panel titled 'Columns' displays "No mapped columns found," and there is no button or option to switch to column-level lineage view. This happens even when the columnMapping attribute is correctly populated in the process entity.
3. Error message when trying to edit column mappings in the process lineage panel: If I attempt to edit the column mapping in the lineage side panel of the process, I receive the error: "Unable to map columns for this asset. It's a process type asset that doesn't have a schema." (This is expected, as processes don’t have schemas, but it confirms that the UI is not interpreting the columnMapping for visualization purposes.)
Context and Steps Taken:
I have followed the Apache Atlas documentation and modeling patterns for lineage:
* Typedef column: Defined with superTypes: ["Referenceable"].
* RelationshipDef table_columns: Defined as a COMPOSITION between DataSet (extended by yellowbrick_table) and column, with cardinality: SET on the column side.
* Typedef yellowbrick_table: Contains an attribute columns with typeName: "array<column>" and relationshipTypeName: "table_columns".
* Typedef custom_data_transformation_process: Extends Process and includes a columnMapping attribute of typeName: "array<string>".
Entities Uploaded (JSON):
* Table entities include complete definitions of their nested columns in the columns attribute.
* Process entities include the columnMapping attribute as a list of JSON strings, where each string represents a DatasetMapping with a nested ColumnMapping that uses only the column names (e.g., "Source": "COLUMN_NAME").
* I’ve also tested with different browsers.
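The columnMapping value described above (a list of JSON strings, each holding one DatasetMapping plus a nested ColumnMapping keyed by bare column names) can be generated instead of hand-written, so it cannot drift from the table qualifiedNames. This is a minimal sketch that mirrors the shape used in the example entity in this post; `build_column_mapping` is a hypothetical helper, and the sketch makes no claim about which shape the Purview UI actually expects.

```python
import json
from typing import Iterable, List, Tuple


def build_column_mapping(
    source_qn: str,
    sink_qn: str,
    column_pairs: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return a columnMapping value shaped like the one in this post.

    Hypothetical helper: the single list element is a JSON *string* with a
    DatasetMapping (table qualifiedNames) and a ColumnMapping that refers
    to columns by bare name only.
    """
    mapping = {
        "DatasetMapping": {"Source": source_qn, "Sink": sink_qn},
        "ColumnMapping": [
            {"Source": src, "Sink": dst} for src, dst in column_pairs
        ],
    }
    # json.dumps yields the escaped string form seen in the entity JSON.
    return [json.dumps(mapping)]


if __name__ == "__main__":
    cols = ["CUENTA", "TIPO_DOCUMENTO", "IDENTIFICACION"]
    value = build_column_mapping(
        "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn",
        "DB_DWH_STAG.CLIENTES.TBL_TMP_CLIEN_CONOCIMIENTO_CLIENTE_C@yellowbrick_conn",
        [(c, c) for c in cols],
    )
    print(value[0])
```

Reusing the same qualifiedName strings for both the table entities and the DatasetMapping removes one source of typos when comparing the two.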
Despite these efforts, the issue persists. I would like to know if there are any additional requirements or known behaviors in the Purview UI regarding column lineage visualization for custom types.
Specific Questions:
1. Is there any additional attribute or configuration required in the typedefs or entities to make the 'Schema' tab appear in my custom table entities?
2. Are there any specific requirements for the qualifiedName of tables or columns that could be preventing the column-level lineage from being visualized?
3. Could there be a known issue or limitation in the Purview UI regarding column-level lineage rendering for user-defined asset types?
4. Is there any way to verify on the Purview backend that the column composition relationships and the columnMapping of processes have been correctly indexed?
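For question 4, one partial check is to read the stored entities back through the Atlas v2 REST API (`GET /entity/guid/{guid}` and `GET /lineage/{guid}`) and look at what was actually persisted. This is a sketch under assumptions: the base URL below is assumed to be the classic Purview `/catalog/api/atlas/v2` endpoint (newer Data Map endpoints may differ and need an `api-version`), `session` is anything that behaves like a `requests.Session` with a bearer token already set, and the helper names are hypothetical. It shows what the backend stored, not whether the UI will render it.

```python
import json
from typing import Any, Dict, List

# Assumption: classic Purview Atlas endpoint; substitute your account name.
BASE_URL = "https://<account>.purview.azure.com/catalog/api/atlas/v2"


def fetch_entity(session: Any, base_url: str, guid: str) -> Dict[str, Any]:
    """GET /entity/guid/{guid}; session is assumed to carry auth headers."""
    resp = session.get(f"{base_url}/entity/guid/{guid}")
    resp.raise_for_status()
    return resp.json()


def fetch_lineage(session: Any, base_url: str, guid: str, depth: int = 3) -> Dict[str, Any]:
    """GET /lineage/{guid}: the lineage graph Atlas returns for this entity."""
    resp = session.get(f"{base_url}/lineage/{guid}",
                       params={"direction": "BOTH", "depth": depth})
    resp.raise_for_status()
    return resp.json()


def _ref_name(ref: Dict[str, Any]) -> str:
    # Prefer displayText, then uniqueAttributes.qualifiedName, then the GUID.
    return (ref.get("displayText")
            or (ref.get("uniqueAttributes") or {}).get("qualifiedName")
            or ref.get("guid", "?"))


def _parse_mapping(raw: Any) -> List[Dict[str, Any]]:
    # Accept one JSON string (object or list) or a list of JSON strings,
    # so the check works whichever shape was stored.
    if raw is None:
        return []
    items = raw if isinstance(raw, list) else [raw]
    parsed: List[Dict[str, Any]] = []
    for item in items:
        value = json.loads(item) if isinstance(item, str) else item
        parsed.extend(value if isinstance(value, list) else [value])
    return parsed


def summarize_entity(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields relevant to the questions above."""
    entity = payload.get("entity", payload)
    attrs = entity.get("attributes", {})
    rel_attrs = entity.get("relationshipAttributes", {})
    # Relationship-backed columns usually appear under relationshipAttributes.
    column_refs = rel_attrs.get("columns") or attrs.get("columns") or []
    return {
        "typeName": entity.get("typeName"),
        "qualifiedName": attrs.get("qualifiedName"),
        "columns": [_ref_name(r) for r in column_refs],
        "columnMapping": _parse_mapping(attrs.get("columnMapping")),
    }
```

With a real session, `summarize_entity(fetch_entity(session, BASE_URL, table_guid))` should list the table's columns if the composition relationship was stored, and the same call on the process GUID shows whether columnMapping was persisted and parses cleanly.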
Annexes:
# 1. Define the 'column' type
# Ensure the superType for 'column' is 'Referenceable'
typedef_payload_column = {
    "entityDefs": [{
        "category": "ENTITY",
        "name": "column",
        "description": "Logical column for column-to-column lineage",
        "typeVersion": "1.0",
        "superTypes": ["Referenceable"],
        "attributeDefs": []
    }]
}
# response = requests.post(typedef_url, headers=headers, json=typedef_payload_column)
# print(f"Typedef column status: {response.status_code} ({response.text.strip()[:100]}...)")
# 2. Define the explicit relationship 'table_columns'
# Ensure that the cardinality of 'endDef2' (columns) is 'SET'
typedef_payload_table_columns_relationship = {
    "relationshipDefs": [
        {
            "category": "RELATIONSHIP",
            "name": "table_columns",
            "description": "Relationship between a table and its columns",
            "typeVersion": "1.0",
            "superTypes": ["AtlasRelationship"],
            "endDef1": {
                "type": "DataSet",
                "name": "parentTable",
                "isContainer": True,
                "cardinality": "SINGLE",
                "isLegacyAttribute": False
            },
            "endDef2": {
                "type": "column",
                "name": "columns",
                "isContainer": False,
                "cardinality": "SET",
                "isLegacyAttribute": False
            },
            "relationshipCategory": "COMPOSITION",
            "attributeDefs": []
        }
    ]
}
# response = requests.post(typedef_url, headers=headers, json=typedef_payload_table_columns_relationship)
# print(f"RelationshipDef table_columns status: {response.status_code} ({response.text.strip()[:100]}...)")
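To see exactly which attribute each end of `table_columns` contributes, the relationshipDef can be read back programmatically. This sketch assumes the Apache Atlas convention that an endDef's `name` becomes an attribute on that same endDef's `type`, pointing at the other end (the built-in Hive model's `hive_table_columns`, for example, puts `name: "columns"`, `cardinality: SET` and `isContainer: true` on the `hive_table` end). `describe_relationship_ends` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, List


def describe_relationship_ends(rel_def: Dict[str, Any]) -> List[str]:
    """List the attribute that each relationship end adds to its own type.

    Assumes the Atlas convention: endDefN["name"] becomes an attribute on
    endDefN["type"] that refers to the *other* end's type.
    """
    ends = (rel_def["endDef1"], rel_def["endDef2"])
    lines = []
    # Visit (endDef1, endDef2) and then (endDef2, endDef1).
    for this_end, other_end in (ends, ends[::-1]):
        role = "container" if this_end.get("isContainer") else "member"
        lines.append(
            f"{this_end['type']}.{this_end['name']} -> {other_end['type']} "
            f"({this_end['cardinality']}, {role})"
        )
    return lines


if __name__ == "__main__":
    # Ends copied from typedef_payload_table_columns_relationship above.
    rel = {
        "endDef1": {"type": "DataSet", "name": "parentTable",
                    "isContainer": True, "cardinality": "SINGLE"},
        "endDef2": {"type": "column", "name": "columns",
                    "isContainer": False, "cardinality": "SET"},
    }
    for line in describe_relationship_ends(rel):
        print(line)
```

Under that convention, the output for the definition above reads `DataSet.parentTable -> column (SINGLE, container)` and `column.columns -> DataSet (SET, member)`, which can be compared with the intent stated in the comment above ('columns' as a SET on the table side).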
# 3. Modify the table's typedef to use this relationship
# Ensure the 'columns' attribute on the table points to 'table_columns'
typedef_payload_yellowbrick_table = {
    "entityDefs": [{
        "category": "ENTITY",
        "name": "yellowbrick_table",
        "description": "Table in Yellowbrick",
        "typeVersion": "1.0",
        "superTypes": ["DataSet"],
        "attributeDefs": [
            {
                "name": "columns",
                "typeName": "array<column>",
                "isOptional": True,
                "cardinality": "LIST",
                "valuesMinCount": 0,
                "valuesMaxCount": -1,
                "isUnique": False,
                "isIndexable": False,
                "includeInNotification": True,
                "relationshipTypeName": "table_columns"
            }
        ]
    }]
}
# response = requests.post(typedef_url, headers=headers, json=typedef_payload_yellowbrick_table)
# print(f"Typedef yellowbrick_table status: {response.status_code} ({response.text.strip()[:100]}...)")
# 4. Define the custom process type
# Ensure the 'columnMapping' attribute is a string array
typedef_payload_process = {
    "entityDefs": [{
        "category": "ENTITY",
        "name": "custom_data_transformation_process",
        "description": "Data transformation process with column lineage (custom)",
        "typeVersion": "1.0",
        "superTypes": ["Process"],
        "attributeDefs": [
            {
                "name": "columnMapping",
                "typeName": "array<string>",
                "isOptional": True,
                "cardinality": "LIST",
                "valuesMinCount": 0,
                "valuesMaxCount": -1
            }
        ]
    }]
}
# response = requests.post(typedef_url, headers=headers, json=typedef_payload_process)
# print(f"Typedef custom_data_transformation_process status: {response.status_code} ({response.text.strip()[:100]}...)")
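The four separate `requests.post` calls can also be collapsed into one request: Atlas's `POST /types/typedefs` takes a single AtlasTypesDef body that may carry `entityDefs` and `relationshipDefs` together, which should let the relationship and the types that reference it be registered in one batch. Types that already exist are rejected on `POST`; changing them goes through `PUT` on the same path. This is a sketch: `typedef_url` and `headers` are assumed to be the same values used above (not shown in the post), and `merge_typedefs` / `post_typedefs` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable

# Top-level sections of an Atlas AtlasTypesDef request body.
TYPEDEF_KEYS = ("enumDefs", "structDefs", "classificationDefs",
                "entityDefs", "relationshipDefs", "businessMetadataDefs")


def merge_typedefs(payloads: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine several AtlasTypesDef payloads into one request body."""
    merged: Dict[str, list] = {}
    for payload in payloads:
        for key in TYPEDEF_KEYS:
            merged.setdefault(key, []).extend(payload.get(key, []))
    # Drop empty sections so the body only contains what was supplied.
    return {k: v for k, v in merged.items() if v}


def post_typedefs(session: Any, typedef_url: str, headers: Dict[str, str],
                  body: Dict[str, Any], update: bool = False) -> Any:
    """POST creates new types; PUT (update=True) modifies existing ones."""
    send = session.put if update else session.post
    resp = send(typedef_url, headers=headers, json=body)
    print(f"Typedefs status: {resp.status_code} ({resp.text.strip()[:100]}...)")
    return resp
```

Usage with the payloads above might look like `post_typedefs(requests, typedef_url, headers, merge_typedefs([typedef_payload_column, typedef_payload_table_columns_relationship, typedef_payload_yellowbrick_table, typedef_payload_process]))`; passing the `requests` module works because it exposes module-level `post` and `put` functions.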
Example of the generated JSON:
[
  {
    "typeName": "yellowbrick_table",
    "guid": "-105",
    "attributes": {
      "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn",
      "name": "TBL8_CONOCIMIENTO_CLIENTE",
      "description": "Source table: TBL8_CONOCIMIENTO_CLIENTE",
      "columns": [
        {
          "typeName": "column",
          "guid": "-336",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#CUENTA",
            "name": "CUENTA",
            "description": "Column CUENTA of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-338",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#TIPO_DOCUMENTO",
            "name": "TIPO_DOCUMENTO",
            "description": "Column TIPO_DOCUMENTO of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-340",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#IDENTIFICACION",
            "name": "IDENTIFICACION",
            "description": "Column IDENTIFICACION of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-342",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#NOMBRE_1",
            "name": "NOMBRE_1",
            "description": "Column NOMBRE_1 of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-344",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#APELLIDO_1",
            "name": "APELLIDO_1",
            "description": "Column APELLIDO_1 of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-346",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#APELLIDO_2",
            "name": "APELLIDO_2",
            "description": "Column APELLIDO_2 of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-348",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#GENERO",
            "name": "GENERO",
            "description": "Column GENERO of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-349",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#EDAD",
            "name": "EDAD",
            "description": "Column EDAD of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        },
        {
          "typeName": "column",
          "guid": "-350",
          "attributes": {
            "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#GENERACION",
            "name": "GENERACION",
            "description": "Column GENERACION of table tbl8_conocimiento_cliente",
            "type": "string",
            "dataType": "string"
          }
        }
      ]
    }
  },
  {
    "typeName": "custom_data_transformation_process",
    "guid": "-107",
    "attributes": {
      "qualifiedName": "linaje_process_from_tbl8_conocimiento_cliente_to_tbl_tmp_clien_conocimiento_cliente_c@yellowbrick_conn",
      "name": "linaje_tbl8_conocimiento_cliente_to_tbl_tmp_clien_conocimiento_cliente_c",
      "description": "Process that connects TBL8_CONOCIMIENTO_CLIENTE to TBL_TMP_CLIEN_CONOCIMIENTO_CLIENTE_C",
      "inputs": [
        {
          "guid": "-105"
        }
      ],
      "outputs": [
        {
          "guid": "-106"
        }
      ],
      "columnMapping": [
        "{\"DatasetMapping\": {\"Source\": \"DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn\", \"Sink\": \"DB_DWH_STAG.CLIENTES.TBL_TMP_CLIEN_CONOCIMIENTO_CLIENTE_C@yellowbrick_conn\"}, \"ColumnMapping\": [{\"Source\": \"CUENTA\", \"Sink\": \"CUENTA\"}, {\"Source\": \"TIPO_DOCUMENTO\", \"Sink\": \"TIPO_DOCUMENTO\"}, {\"Source\": \"IDENTIFICACION\", \"Sink\": \"IDENTIFICACION\"}, {\"Source\": \"NOMBRE_1\", \"Sink\": \"NOMBRE_1\"}, {\"Source\": \"APELLIDO_1\", \"Sink\": \"APELLIDO_1\"}, {\"Source\": \"APELLIDO_2\", \"Sink\": \"APELLIDO_2\"}, {\"Source\": \"GENERO\", \"Sink\": \"GENERO\"}, {\"Source\": \"EDAD\", \"Sink\": \"EDAD\"}, {\"Source\": \"GENERACION\", \"Sink\": \"GENERACION\"}]}"
      ]
    }
  }
]
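Since question 2 concerns qualifiedNames, a local consistency check before upload can rule out simple mismatches: every DatasetMapping Source/Sink should equal the qualifiedName of a table in the payload, and every mapped column should exist in that table. This sketch assumes the list-of-JSON-strings columnMapping format used here and takes the column name as the text after `#` in the column qualifiedName, which is this payload's convention. It checks only the internal consistency of the JSON, not what the Purview UI requires; `check_payload` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Set


def _table_columns(entities: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each table qualifiedName to the set of its column names."""
    tables: Dict[str, Set[str]] = {}
    for ent in entities:
        attrs = ent.get("attributes", {})
        cols = attrs.get("columns")
        if cols is None:
            continue  # not a table-like entity (e.g. the process)
        names = set()
        for col in cols:
            qn = col.get("attributes", {}).get("qualifiedName", "")
            # Column name = text after '#', matching this payload's convention.
            names.add(qn.rsplit("#", 1)[-1])
        tables[attrs["qualifiedName"]] = names
    return tables


def check_payload(entities: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    tables = _table_columns(entities)
    problems = []
    for ent in entities:
        attrs = ent.get("attributes", {})
        name = attrs.get("name", "?")
        # Assumes columnMapping is a list of JSON strings, as in this post.
        for raw in attrs.get("columnMapping", []):
            mapping = json.loads(raw)
            ds = mapping["DatasetMapping"]
            for side in ("Source", "Sink"):
                if ds[side] not in tables:
                    problems.append(f"{name}: {side} table {ds[side]} not in payload")
            for pair in mapping.get("ColumnMapping", []):
                for side in ("Source", "Sink"):
                    cols = tables.get(ds[side])
                    if cols is not None and pair[side] not in cols:
                        problems.append(
                            f"{name}: {side} column {pair[side]} not found in {ds[side]}")
    return problems
```

Run on only the excerpt above, it would report the Sink table as missing, because the entity with guid -106 is not part of the excerpt; on the full generated list it should return an empty list if names line up.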
1 Reply
- AS1522
Microsoft
I too have this query - testing