Code Analyzer to analyze coding standards beyond default rulesets

By Daiva Vara Kumar P

Recently, we were asked to write our own code analyzer (a Visual Studio extension) to enforce coding standards for our C# code that the default Visual Studio rules do not cover. These standards are needed to keep our projects readable and maintainable.

Why do we need Coding Standards? Coding standards matter for the readability and maintainability of any project. The difference may be hard to notice in a small program or module, but it becomes obvious in a large project. If every developer writes in a different style, the codebase becomes inconsistent, and after a few months there is a real chance that nobody can tell exactly what a given piece of code does.

So every organization needs coding standards that are documented, circulated, and encouraged among all developers. But if developers ignore the document, enforcing the standards becomes a heavy burden on the reviewer. To avoid this, we need a reviewer that checks the code automatically and alerts the developer immediately, which is exactly what a “Code Analyzer” does.

Research and Evolution: We looked for the best technology or tool for plugging in our custom rules and alerting the user immediately when a coding standard is violated. Nowadays many code analyzers are available for different languages. For .NET, the major code analyzer is FxCop, developed by Microsoft, which provides analysis at build time. “Roslyn”, however, provides code analysis APIs for writing our own real-time analyzer, which tells the developer about a problem or coding standard violation immediately, while the code is being written.

.NET Compiler Platform (Roslyn):

Roslyn started as Microsoft's project to rewrite the Visual Basic and C# compilers and their language services in their own managed languages (the Visual Basic compiler in Visual Basic and the C# compiler in C#). The .NET Compiler Platform (“Roslyn”) provides a rich set of code analysis APIs. These public APIs let developers write their own code analysis extensions. In other words, we can build our extensions with the same APIs that Microsoft uses to implement the Visual Studio language services.

The .NET Compiler Platform (Roslyn) consists mainly of two layers of APIs: 1) Compiler APIs and 2) Workspaces APIs.

Compiler APIs: The compiler layer exposes the information produced at each phase of the compiler pipeline. It also contains an immutable snapshot of a single compiler invocation, including assembly references, compiler options, and source code files. There are two distinct Compiler APIs, one representing the C# language and one representing the Visual Basic language. The layer also includes the diagnostic and scripting APIs described below.

Diagnostic APIs: A compiler analyzes the code as part of compilation and produces diagnostics covering syntax and semantic errors, warnings, and informational messages. The compiler layer exposes these diagnostics through an extensible API that lets us plug in our own user-defined analyzers and diagnostics.

Scripting APIs: These APIs execute code snippets dynamically. For example:

    // Requires: using Microsoft.CodeAnalysis.CSharp.Scripting;
    var sum = await CSharpScript.EvaluateAsync<int>("5 + 2");
    Console.WriteLine(sum); // sum: 7

 

We use the Compiler APIs to write our custom analyzers. The most important data structure they expose is the syntax tree. The following elements are the ones that matter when writing a custom analyzer.

 

Syntax tree: The syntax tree represents the syntactic structure of the source code. It holds every piece of information in the source: every grammatical construct, plus white space, comments, and preprocessor directives.

Syntax node: A syntax node is an element of the syntax tree. Each kind of node is represented by a separate class derived from the base class “SyntaxNode”, with properties that hold its information. For example, the node for a method declaration exposes the method name through its “Identifier” property. Nodes represent syntactic constructs such as declarations, statements, and expressions (for example method declarations and property declarations).

Syntax token: Syntax tokens are the terminals of the language grammar and represent the smallest syntactic fragments of the code. They consist of keywords, identifiers, literals, and punctuation.
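To make the relationship between tree, node, and token concrete, here is a minimal sketch that parses a small piece of source text and lists the method names it finds. The class name SyntaxTreeDemo, the helper MethodNames, and the sample source are hypothetical, and the sketch assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SyntaxTreeDemo
{
    // Hypothetical helper: returns the names of all methods declared in the source text.
    public static List<string> MethodNames(string source)
    {
        // Parse the text into a syntax tree and take its root node.
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        SyntaxNode root = tree.GetRoot();

        // Each method declaration is a syntax node; its Identifier is a syntax token.
        return root.DescendantNodes()
                   .OfType<MethodDeclarationSyntax>()
                   .Select(method => method.Identifier.Text)
                   .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample source, used only for illustration.
        const string source = "class Sample { void Do_Work() { } void Run() { } }";
        foreach (string name in MethodNames(source))
        {
            Console.WriteLine(name);
        }
    }
}
```

The same pattern (cast the node to MethodDeclarationSyntax, then read its Identifier token) is what an analyzer relies on when it inspects method names.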

Workspaces APIs: The Workspaces APIs provide access to the projects and documents in a solution and to their associated syntax trees, compilations, and semantic models. A workspace holds an immutable snapshot of the projects and documents of a solution in Visual Studio.
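As a rough illustration of the solution, project, and document model outside Visual Studio, the sketch below builds an in-memory workspace and walks its contents. It is not part of the analyzer itself. The names WorkspaceDemo, DocumentPaths, "DemoProject", and "Sample.cs" are hypothetical, and the sketch assumes the Microsoft.CodeAnalysis.CSharp.Workspaces NuGet package is referenced so that C# projects are supported.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

public static class WorkspaceDemo
{
    // Hypothetical helper: builds a one-document solution and lists "project/document" pairs.
    public static List<string> DocumentPaths()
    {
        // AdhocWorkspace is an in-memory workspace that needs no Visual Studio instance.
        var workspace = new AdhocWorkspace();
        Project project = workspace.AddProject("DemoProject", LanguageNames.CSharp);
        Document document = workspace.AddDocument(
            project.Id, "Sample.cs", SourceText.From("class Sample { void Do_Work() { } }"));

        // CurrentSolution is an immutable snapshot of the projects and documents.
        Solution snapshot = workspace.CurrentSolution;

        // "Changing" a document returns a new Solution; the snapshot itself is not modified.
        Solution changed = snapshot.WithDocumentText(document.Id, SourceText.From("class Sample { }"));
        Console.WriteLine("New snapshot created: " + !ReferenceEquals(snapshot, changed));

        var paths = new List<string>();
        foreach (Project p in snapshot.Projects)
        {
            foreach (Document d in p.Documents)
            {
                paths.Add(p.Name + "/" + d.Name);
            }
        }
        return paths;
    }

    public static void Main()
    {
        foreach (string path in DocumentPaths())
        {
            Console.WriteLine(path);
        }
    }
}
```

A code fix works against the same model: it receives a Document and returns a new Solution rather than editing files in place.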

Creating custom analyzer:

As mentioned, we were asked to write our own code analyzer (a Visual Studio extension) to enforce the coding standards defined for our project. One of the custom rules is that a method name must not contain an underscore. Writing such an extension with the Roslyn APIs is easy. Below are the steps we used to create it and how they work.

First step: Select New Project, Visual C#, Extensibility, then select “Analyzer with Code Fix (NuGet + VSIX)”. This creates three projects in a single solution: the main project, where we implement the custom analyzer; a project for tests; and a setup (VSIX) project. In the main project, the template creates two important class files: 1) DiagnosticAnalyzer and 2) CodeFixProvider. The class in the DiagnosticAnalyzer file is named with the suffix “Analyzer” (like {ProjectName}Analyzer); it derives from the DiagnosticAnalyzer base class and carries the [DiagnosticAnalyzer] attribute. The class in the CodeFixProvider file is named with the suffix “CodeFixProvider” (like {ProjectName}CodeFixProvider); it derives from the CodeFixProvider base class and carries the [ExportCodeFixProvider] attribute.

Second step: In the DiagnosticAnalyzer class, we define the DiagnosticId, the DiagnosticDescriptor (i.e. the custom rule), and supporting values such as the title, message, and description. The IDE displays these values when the rule fires on a code snippet. We also need to include the rule in an immutable array (the “SupportedDiagnostics” array in the screen below).
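The definitions in this step might look roughly like the following sketch. The class name UnderscoreRuleAnalyzer, the id "DEMO0001", the category "Naming", and all of the strings are assumptions made for illustration, not the values used in our project. Only the rule name UnderscoreRule matches the identifier used in the analysis code later in this article.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class UnderscoreRuleAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical id; the code fix provider refers to this constant.
    public const string DiagnosticId = "DEMO0001";

    // The custom rule: the title, message, and description are what the IDE displays.
    private static readonly DiagnosticDescriptor UnderscoreRule = new DiagnosticDescriptor(
        id: DiagnosticId,
        title: "Method name contains underscore",
        messageFormat: "Method name '{0}' contains an underscore",
        category: "Naming",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true,
        description: "Method names should not contain underscores.");

    // Every rule the analyzer can report must be listed in this immutable array.
    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
    {
        get { return ImmutableArray.Create(UnderscoreRule); }
    }

    public override void Initialize(AnalysisContext context)
    {
        // Analysis actions are registered here (see the third step).
    }
}
```

The "{0}" placeholder in the message format is filled with the arguments passed to Diagnostic.Create when the rule is reported, and a Warning severity is what produces the green squiggle in the editor.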

Third step: Register an action to be executed when the analysis of a SyntaxNode of the appropriate kind completes (the kind identifies the construct, for example a method declaration, property declaration, or class declaration). See the code below for an example.

 

    public override void Initialize(AnalysisContext context)
    {
        context.RegisterSyntaxNodeAction(AnalyzeMethodSyntax, SyntaxKind.MethodDeclaration);
    }

    private void AnalyzeMethodSyntax(SyntaxNodeAnalysisContext context)
    {
        // The registered kind guarantees that the node is a method declaration.
        var methodDeclaration = (MethodDeclarationSyntax)context.Node;
        string methodName = methodDeclaration.Identifier.Text;

        // Report the rule at the method name when it contains an underscore.
        if (methodName.Contains("_"))
        {
            context.ReportDiagnostic(Diagnostic.Create(UnderscoreRule, methodDeclaration.Identifier.GetLocation(), methodName));
        }
    }

 

As shown above, the SyntaxNode “methodDeclaration” holds all the information about a method, such as its name and the span of its body. We check whether the method name contains “_” and, if so, report a diagnostic. Once the extension is installed, a green squiggly line appears under any violation, and hovering the mouse over it shows the violation message, as in the screen below.
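The rule can also be exercised without installing the extension. The sketch below is one possible way to do that: it compiles a small source string in memory and runs a compact copy of the analyzer over it. The names UnderscoreAnalyzerUnderTest, AnalyzerHarness, and FindViolationsAsync, the id "DEMO0001", and the sample source are hypothetical, and the sketch assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

// Compact, hypothetical copy of the underscore analyzer so the sketch is self-contained.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class UnderscoreAnalyzerUnderTest : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor UnderscoreRule = new DiagnosticDescriptor(
        "DEMO0001", "Method name contains underscore",
        "Method name '{0}' contains an underscore", "Naming",
        DiagnosticSeverity.Warning, isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
    {
        get { return ImmutableArray.Create(UnderscoreRule); }
    }

    public override void Initialize(AnalysisContext context)
    {
        context.RegisterSyntaxNodeAction(AnalyzeMethodSyntax, SyntaxKind.MethodDeclaration);
    }

    private static void AnalyzeMethodSyntax(SyntaxNodeAnalysisContext context)
    {
        var method = (MethodDeclarationSyntax)context.Node;
        if (method.Identifier.Text.Contains("_"))
        {
            context.ReportDiagnostic(Diagnostic.Create(
                UnderscoreRule, method.Identifier.GetLocation(), method.Identifier.Text));
        }
    }
}

public static class AnalyzerHarness
{
    // Runs the analyzer over the source and returns the reported messages.
    public static async Task<List<string>> FindViolationsAsync(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Attach the analyzer to the compilation and collect only its diagnostics.
        CompilationWithAnalyzers withAnalyzers = compilation.WithAnalyzers(
            ImmutableArray.Create<DiagnosticAnalyzer>(new UnderscoreAnalyzerUnderTest()));
        ImmutableArray<Diagnostic> diagnostics = await withAnalyzers.GetAnalyzerDiagnosticsAsync();
        return diagnostics.Select(d => d.GetMessage()).ToList();
    }

    public static async Task Main()
    {
        // Hypothetical sample: one method violates the rule, one does not.
        var messages = await FindViolationsAsync("class Sample { void Do_Work() { } void Run() { } }");
        foreach (string message in messages)
        {
            Console.WriteLine(message);
        }
    }
}
```

A harness like this is convenient in the test project that the template generates, because it checks the rule without launching a second Visual Studio instance.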

Writing code fix:

First step: In the code fix provider class, we need to expose the “DiagnosticId” defined in the analyzer above through an immutable array and register the code fix action, as shown below.

 

    private const string title = "Remove underscore";

    public sealed override ImmutableArray<string> FixableDiagnosticIds
    {
        get { return ImmutableArray.Create(roslyn_testAnalyzer.DiagnosticId); }
    }

    public sealed override FixAllProvider GetFixAllProvider()
    {
        return WellKnownFixAllProviders.BatchFixer;
    }

    public sealed override async Task RegisterCodeFixesAsync(CodeFixContext context)
    {
        var root = await context.Document.GetSyntaxRootAsync(context.CancellationToken).ConfigureAwait(false);
        var diagnostic = context.Diagnostics.First();
        var diagnosticSpan = diagnostic.Location.SourceSpan;
        var declaration = root.FindToken(diagnosticSpan.Start).Parent.AncestorsAndSelf().OfType<MethodDeclarationSyntax>().First();

        // Register a code action that will invoke the fix.
        context.RegisterCodeFix(
            CodeAction.Create(
                title: title,
                createChangedSolution: c => RemoveUnderscoreAsync(context.Document, declaration, c),
                equivalenceKey: title),
            diagnostic);
    }

 

Second step: Now we implement the asynchronous method “RemoveUnderscoreAsync”, which computes the new method name and returns a new solution in which the method is renamed, as shown below.

 

    private async Task<Solution> RemoveUnderscoreAsync(Document document, MethodDeclarationSyntax methodDecl, CancellationToken cancellationToken)
    {
        // Compute the new name by removing every underscore.
        var identifierToken = methodDecl.Identifier;
        var newName = identifierToken.Text.Replace("_", "");

        // Get the symbol representing the method to be renamed.
        var semanticModel = await document.GetSemanticModelAsync(cancellationToken).ConfigureAwait(false);
        var methodSymbol = semanticModel.GetDeclaredSymbol(methodDecl, cancellationToken);

        // Produce a new solution that has all references to the method renamed, including the declaration.
        var originalSolution = document.Project.Solution;
        var optionSet = originalSolution.Workspace.Options;
        var newSolution = await Renamer.RenameSymbolAsync(originalSolution, methodSymbol, newName, optionSet, cancellationToken).ConfigureAwait(false);

        // Return the new solution with the renamed method.
        return newSolution;
    }

 

When we click “Show potential fixes”, the screen below appears.

If we click “Preview changes”, the editor shows what the code will look like after the change (as below). Clicking “Apply” makes the editor apply the change immediately.

This is a simple way to fix code violations without waiting for build time or for a reviewer to review the code.

Code Review Statistics:

Some industry case studies offer guidelines for code reviews. They indicate that a review should cover no more than 200 to 400 lines of code and last no more than 60 to 90 minutes at a stretch; beyond that, defect density drops off (defect density is the number of defects a reviewer finds per 1000 lines of code). Applying the lines-of-code guideline to our projects, our observation is that, given our project complexity, a fully manual review of 200 to 400 lines takes 30 to 40 minutes. With these code analysis extensions we could cut roughly 50% of that manual effort (15 to 20 minutes).

As for code quality, honestly not every custom rule can be automated, because some are too complex. If we assume that custom rules up to a medium level of complexity are automated and the remaining, very complex rules are left to manual code review, we estimate that we can reach approximately 80 to 85% of the desired code quality.

Links referred:

https://joshvarty.wordpress.com/2014/07/11/learn-roslyn-now-part-3-syntax-nodes-and-syntax-tokens/

https://visualstudiomagazine.com/articles/2012/03/20/10-questions-10-answers-on-roslyn.aspx

https://roslyn.codeplex.com/wikipage?title=Overview&referringTitle=Documentation#Compiler

https://github.com/dotnet/roslyn/wiki/How-To-Write-a-C%23-Analyzer-and-Code-Fix

http://www.coderesx.com/roslyn/html/8CA60DCB.htm

https://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/
