The Importance of Comments in Data Projects
Published Jul 21 2020 06:26 AM

Project management, scientific experimentation and software engineering all have at least one component in common: documentation. Without the basic concept of transferring the knowledge of a given operation from the author to the reader, projects of any nature are doomed to become a maintenance issue, with potentially devastating results. 


In Data Projects, we have an interesting issue with this documentation. Whilst the project plans, software specifications and so on are well-defined and mostly consistent in nature and delivery method (such as a Microsoft Word document), comments within the code for a given component are not. Different languages, platforms and other constructs make consistency more challenging. This can become a huge issue when the calling or receiving component needs to rely on the operation of the other component. 


To state the obvious: at the very least, you should comment your code completely and informatively. It's up to you to understand how your language or compiler handles comments, and you will also have to learn how other popular languages handle them, since you may need to read source code written by others on your team.


When I learned to program (on a Mainframe, several hundred years ago) I was taught to write comments detailing the flow of the program first, and then go lay in my code underneath the comments I wrote. "Comment-First" coding. 
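As a minimal sketch of "Comment-First" coding in Python (the function name, the three steps and the sample rows are hypothetical, chosen only for illustration), the comments below were written first as an outline, and each piece of code was then laid in underneath the comment it satisfies:

```python
def summarize_sales(rows):
    """Return total sales per region from (region, amount) pairs."""
    # Step 1: Start with an empty running total for every region.
    totals = {}

    # Step 2: Walk the rows and add each amount to its region's total.
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount

    # Step 3: Return the totals sorted by region so the output is stable.
    return dict(sorted(totals.items()))


# Hypothetical sample data, used only to exercise the sketch.
sample_rows = [("West", 10), ("East", 5), ("West", 2)]
print(summarize_sales(sample_rows))
```

If the three "Step" comments are read on their own, they still describe the flow of the program, which is the point of writing them first.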


Depending on the language/interpreter, there are (usually) two types of comments: Line and Block. A Line comment is indicated by some set of symbols (such as -- in T-SQL), and is terminated with the end of the line. A Block comment uses different symbols to "start" and "stop" comment text (such as /* and */ in T-SQL), and can span multiple lines. 


In general, prefer Block comments to Line comments. The reason is that different software/hardware environments use different ASCII characters to signal the "EOL" or End of Line: Linux ends a line with a line feed, while Windows uses a carriage return followed by a line feed. A Line comment is only as reliable as its line endings and its leading symbols. Take, for example, this unfortunate comment: 


-- Whatever you do, do not run 

-- TRUNCATE TABLE

-- On this code!


(Yes, I've seen something just like this.) If the -- at the start of the middle line is removed by some accident, you can see that it would have a tragic result. I recommend the comment be changed to this: 


/* Whatever you do, do not run TRUNCATE TABLE 

 On this code! */



Or even


/* Whatever you do, do not run TRUNCATE TABLE  On this code! */


That way you'll get a syntax error alerting you to an issue if you leave out the start or end comment symbols. 
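The difference can be reproduced with a rough sketch using Python's built-in sqlite3 module. Several things here are assumptions made for the demonstration: SQLite stands in for the database engine, the orders table and its rows are hypothetical, and DELETE FROM stands in for TRUNCATE TABLE, which SQLite does not support. Only the lost opening symbol is shown, because SQLite, unlike the behavior described above, lets an unterminated block comment run to the end of the input.

```python
import sqlite3

# Hypothetical table and rows, created in memory purely for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

# Line-comment version after the "--" on the middle line was lost by accident.
# DELETE FROM stands in for TRUNCATE TABLE, which SQLite does not support.
damaged_line_comment = (
    "-- Whatever you do, do not run\n"
    "DELETE FROM orders\n"
    "-- On this code!"
)
conn.execute(damaged_line_comment)  # the middle line is now live code
rows_after_line_comment = conn.execute(
    "SELECT COUNT(*) FROM orders"
).fetchone()[0]

# Put the rows back, then try the block-comment version with its "/*" lost.
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
damaged_block_comment = (
    "Whatever you do, do not run DELETE FROM orders On this code! */"
)
try:
    conn.execute(damaged_block_comment)
    block_comment_error = None
except sqlite3.OperationalError as exc:
    # The leftover comment text is not valid SQL, so nothing is executed.
    block_comment_error = str(exc)
rows_after_block_comment = conn.execute(
    "SELECT COUNT(*) FROM orders"
).fetchone()[0]

print("Rows left after damaged line comment: ", rows_after_line_comment)
print("Error from damaged block comment:     ", block_comment_error)
print("Rows left after damaged block comment:", rows_after_block_comment)
```

In this sketch the damaged Line comment silently empties the table, while the damaged Block comment is rejected before any statement runs.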


As an aside, each language may handle these comments differently, so make sure you understand how they work, and even how they are stored. In some SQL dialects, for example, a comment placed before the start of a Stored Procedure may not be saved in the Stored Procedure definition (although if you keep the source code it's there, of course). So this: 


/* Let's Create a Procedure to deal with that return data: */
CREATE PROCEDURE <ProcedureName> ...


might look different when you view the text of the Stored Procedure than this: 


CREATE PROCEDURE <ProcedureName> ...
/* Let's Create a Procedure to deal with that return data: */
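One way to see this kind of behavior for yourself is a small sketch with Python's built-in sqlite3 module. This is an illustration under assumptions, not a statement about any particular server: SQLite has no Stored Procedures, so a table definition stands in for one, and the return_data name is hypothetical. SQLite keeps the text of a definition in its sqlite_master table, which makes it easy to check which comments survived.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One comment sits before CREATE; the other sits inside the definition.
# The table name return_data is hypothetical.
conn.execute(
    "/* leading comment */\n"
    "CREATE TABLE return_data (\n"
    "    id INTEGER /* inner comment */\n"
    ")"
)

# sqlite_master holds the definition text that SQLite actually saved.
stored_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'return_data'"
).fetchone()[0]

print(stored_sql)
print("leading comment kept:", "leading comment" in stored_sql)
print("inner comment kept:  ", "inner comment" in stored_sql)
```

Whatever your platform does, checking the stored definition once, as here, tells you where a comment has to live if it is meant to travel with the object.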



So what is a "Good" Comment? Well, since I am "old-school", my comments at the start of the code look like this: 


/* <MyObjectOrFileName>

Purpose: <PurposeOfCode>

Author: <AuthorName>

Date Created: <DateCodeOriginallyCreated>

*/


/* <CodeSegmentComment> */


/* EOF <MyObjectOrFileName> */


In fact, for Transact-SQL code, I use this handy tip from my friend Dr. Greg Low to make that text the default for a new Query Window in SQL Server Management Studio.


Other tools have similar constructs, or you can just paste that in OneNote to use.
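Where a tool has no template feature, the same header can be generated by a few lines of script. The following is a minimal Python sketch, not a standard: the function names, the file name and the author value are hypothetical, and the layout simply mirrors the Purpose, Author and Date Created fields of the header shown earlier.

```python
from datetime import date


def build_header(name, purpose, author, created):
    """Return a block-comment header with the fields described above."""
    return (
        f"/* {name}\n"
        f"Purpose: {purpose}\n"
        f"Author: {author}\n"
        f"Date Created: {created.isoformat()}\n"
        "*/"
    )


def build_footer(name):
    """Return the matching end-of-file marker comment."""
    return f"/* EOF {name} */"


# Hypothetical values, used only to show the generated text.
header = build_header(
    name="load_orders.sql",
    purpose="Load the daily orders extract",
    author="A. Author",
    created=date(2020, 7, 21),
)
footer = build_footer("load_orders.sql")

print(header)
print()
print(footer)
```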


Is all this a bit much? Yes. Until you need it. Also, coding my comments makes me think more about what I am doing, and slows me down a bit to put higher quality into my work.


There is an interesting new development in Data Projects: Notebooks. I use Jupyter Notebooks quite a bit in Data Science work. Jupyter Notebooks have "Cells" that allow you to enter either Code or Text. The text is usually longer, can be formatted, can include links and graphics, and can be quite descriptive. In a way, it's like a hyper set of comments. So are comments still needed in the Code cells? 


Like most Data Project questions, the answer is "it depends". If the Notebook itself is a code artifact, the Code Cells do not need to be further annotated - that's the point of the text. If, however, the code in a Cell can be "extracted" for use in some other way, or the Text Cell is used to explain the purpose but not the code flow, then yes, comments are still needed. 
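The "extracted" case can be sketched in a few lines of Python. A Jupyter Notebook file is JSON with a list of cells, each marked as "markdown" or "code". The notebook content below is hypothetical and built inline so the example is self-contained, and extract_code is an illustrative helper rather than part of any Jupyter tooling.

```python
import json

# Hypothetical notebook content in the Jupyter .ipynb JSON layout.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["Load the data and keep only the valid rows."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": [
                "# Keep rows with a positive amount.\n",
                "valid = [r for r in rows if r > 0]\n",
            ],
        },
    ],
})


def extract_code(text):
    """Return the source of every Code cell, joined into one script."""
    notebook = json.loads(text)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # Text cells are left behind on extraction.
        source = cell["source"]
        # The source may be one string or a list of line strings.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n".join(chunks)


script = extract_code(notebook_text)
print(script)
```

The Text Cell's explanation does not appear in the extracted script; only the comment written inside the Code cell travels with the code.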


So stick to the basics in your software engineering and Data Science work, and ensure you comment your code. As I was taught early on, "Pretend that the person that will maintain your code is a very easily triggered person, and knows where you live." That's good advice.  

Last update: Jul 21 2020 06:56 AM