Help explaining why you don't just enter whatever you want in a DB (This should not be necessary...)

Copper Contributor

Hi everyone. I am not a DB admin, I'm a cybersecurity and risk management professional with a broad technical background and one college DB class a couple years ago. Let me start by saying I'm sorry I even have to ask this, I've never had this much trouble explaining what I thought was a basic concept.

 

Quick version: How do I explain to a Program Manager(ex-programmer 20 years ago) and managers with a mechanical engineering background, that you can't just use the hierarchical Assemblies & Parts tables, specifically AssemblyName and PartID columns, for employee names, location names, or other random data you want to store, instead of the tables & columns that exist for that purpose and expect to be able to use the data in any meaningful way later?

 

It's not like any of these people would look at a web form and see "Name" and just put in their favorite color or something...I hope.) My honest initial response to them was, "If you don't understand why you shouldn't do that, I'm not sure I know how to explain it;" which apparently hurt the PM's feelings :cry:

 

 

More detail if you want it:

The company I started working for has an manufacturing/inventory management DB (eci M1) that they have been using for less than a year. The person "admin" who is actually a Project Manager who just happens to be the one who pushed for this program and the only one who had experience using it, has been using columns within the Jobs and Parts tables to add his own hierarchies by the engineer's name, site the part is for, and other random information he wants to store that has nothing to do with the column-name it is stored in. He's also been teaching other engineers to do this when preparing their proposals and parts lists.

 

I learned all of this working on a side task to help one of our other PMs develop a procurement request process using the same DB, at which time I suggested that this was a bad idea if we planned to expand our use of the DB to accurately track inventory (or anything really), take advantage of business analytics built into the program, and be able to separate tracking and permissions for different customers' data. But I can't seem to get through to the PM who runs it, my manager, or his boss. They just keep asking for specific examples of what issues it could cause; but, I don't know because we haven't started using it enough yet to see the impact on resources or reporting. I've tried to tell them about Normalization and explain that my recommendations are based on DB standards and Best Practices from multiple sources, but they just ask "well why is it Best Practice" or one of my least favorite saying ever that the PM used, "I did it this way for 15 years and never had an issue (he was also a programmer...20 years ago)."

 

How, what the, I, um, no, aaargh??? (Start looking for a new job would be an acceptable response if you've got nothing else.)

 

1 Reply

@Startzc if it's personal data including employee names etc, then data protection law applies.  Personal identity data is to be used only when and how it is necessary - not just convenient but absolutely required - for the purpose of the business.  There's also stuff about fairness which also has the force of law.  The business also may have its own rules about how to manage personnel records, that are broken by putting the data in this database, where it doesn't really belong.

 

More generally, article https://en.wikipedia.org/wiki/Column_(database) is an elementary description of how and why a database field contains a particular type of data value, such as an address "zipcode".  You might want to look up something that goes into more detail on the subject.  Basically the point is that every person and every program would expect that data column to contain that type of data value and nothing else, and would expect to find that data in that column and not elsewhere, and if that isn't respected, your data has departed from being valid and consistent.

 

In actual practice, it may be tolerable for a certain data field to be used to store more than one kind of information, when software can be used to distinguish one thing from another.  But when doing things properly, a data table should have two separate columns for different types of data.  Or else, why have a database at all, and not just write things out anyhow?

 

Some of the greatest "database" disasters on a personal scale have been created by storing data in a spreadsheet calculation program like Microsoft Excel, and then not looking after it.  You would find terrifying stories about that.  One that happens over and over again is to carefully build a table of data rows in the spreadsheet, and then someone decides to sort the data in order which Excel does automatically...  but some of the columns are included in the sorting and some are not, so that the column values in one row of the spreadsheet now don't belong together.  This comes in two versions - when somebody notices that they have just irreversibly vandalised their own database...  or, they don't notice it, until much later.

 

As another example, our contacts database had a field for telephone number, so we expected that to be a sequence of digits.  But users put numbers in with place names, or they put in two telephone numbers for a contact "123122 or 125414", or they put an e-mail address in instead.  When we tried to send telephone messages automatically from the database, the process was chaotic.

 

So we gave them another field to store an e-mail address properly, and then we found it was being used for web site addresses instead.  Chaos again.