JavalibSawja

Nit: Nullability Inference Tool

Presentation

Nit is infers nullness properties of variables and can prove that a program is NullPointerException free. It allows to find suitable nullability annotations for fields, method parameters, return values and local variables. It works at the bytecode level (on .class files or .jar files) so it can be used on programs where the source is not available. While this can look strange for a programmer, this tool can also be used by other static analyses to improve their precision (through annotation). The main improvement over the last version of Nit (website of old version) is that instead of using directly the Bytecode representation, it benefits of the Sawja JBir representation. It leads to more simple and easy to maintain code. Moreover it drastically reduces the time and memory consumption of an analysis.

Licence

This software is distributed under the GNU General Public License.

News

October 2013
Nit 0.6 is out. This is a rewriting of Nit with the facilities of Sawja
September 2009
Nit 0.5d is out. This is a bugfix release. Check the CHANGELOG file for all the details.
June 2009
Nit and the brand new Eclipse plugin are at JavaOne, at the INRIA booth.
June 2009
An Eclipse plugin is out. You can find it in the Nit/Eclipse section of the website.
May 2009
Nit 0.5 is out. The major improvements are:
  • the use of JavaLib 1.8 which improves the performances and the precision (call graph computed with RTA instead of CHA);
  • the addition of an output format compatible with the Annotation File Utilities (http://types.cs.washington.edu/annotation-file-utilities/), which should allow to export computed annotations back into the Java source code;
  • the compatibility with Cygwin, which allows to compile and use Nit on Microsoft Windows.
Check the CHANGELOG file for all the details.
October 2008
Nit 0.4 is out. The major improvement is in the use of JavaLib 1.7 which improves performances by more than 7% and reduce memory consumption. Some improvements have also been made on precision. Check the CHANGELOG file for all the details.
August 2008
Nit 0.3 is out. The precision has been greatly improved: the new must-alias analysis is more powerful and a new additional analysis allows to recover information from the instranceof instruction (do not worry if you do not understand: it just means it is more precise). The other good news: it uses even less memory in the general case.
June 2008
A brand new web site with a lot more than before in the documentation section.
June 2008
Nit 0.2 is out. This version contains much more documentation (which is easy as the previous ones were almost undocumented) and a bug found by Fausto Spoto in the statistics computation has been fixed.
May 2008
Nit 0.1.2 is out. It is a bug-fix version.
May 2008
Nit 0.1 is out. It is the first version publicly available.

Installation

Using OPAM

A package for OPAM is provided, you should only have to get it using:

opam install nit

Compilation from source

User documentation

Command line options

Usage: nit [options ... ] --main <string>

nit infers nullness properties for the reference variables of a Java program and can therefore be used to infer null-ability annotations for fields, local variables, methods parameters and return values.

It takes options but no argument and analyzes the program starting from the given main method (with --main option). At least one output option must be given.

Options:

--classpath <string> It takes a list of directories and .jar files concatenated with ':' (or ';' under Windows) where needed classes (including boot strap classes) will be looked for. It should therefore contain the path of your runtime (or bootclasspath in Java). If no --classpath option is given, then the environment variable CLASSPATH is used.
--main <string> Set the main method of the program. It takes as argument a class name (such as foo.Bar) optionally followed by a method signature in the JVM (internal) syntax (such as '.m(IIL)V'). If no method signature is given, the default is '.main([Ljava/lang/String;)V', which corresponds to the void main(String[]) method. This option must be specified at least once and can be specified more than once.
--heap-input <file> Try to use as an heap dump file and init the Nit initial state with the information from this file.
--html-output <directory> Outputs the result as HTML code in the given directory (including annotated opcodes).
--xml-output <f.xml> Output the result as XML in the given file (excluding opcodes, i.e. only method and field annotations).
--anno-file-output <f.jaif> Output the result (only method and field annotations and only Nullable/NonNull, ie no raw annotations) in the given file in a format suitable for use with the Annotation File Utilities (http://types.cs.washington.edu/annotation-file-util ities/).
--warnings Output warning for each unsafe dereference on the standard output (stdout).
--version Print version information and exit.
--no-others-entryPoints Dont add any implicit entrypoints (internal to the JVM).
--debug Print every debugging information.
--verbose Print some debugging information.
-v Print some debugging information.
--no-compact Avoid compacting the result of intermediate analyses (It can sometimes save time but always costs memory).
--remove-package <string> It takes a comma separated list of package names (e.g. "java,sun" or "java.lang") and removes all classes in these packages from the output. This option can be specified multiple times. Note: these packages are still analyzed, the option just removes the information concerning these packages from the output.
--class-analysis {RTA|CRA} Either RTA or CRA (RTA is the default option). It selects the analysis used to load the program (RTA is much more precise than CRA but is also very sensitive to native methods and may therefore produce incomplete programs) (the type system then adds co-variant and contra-variant constraints on types so it does not change the analysis).
--no-field-annotation Deactivate field annotations. It reduces the precision of the analysis without significantly improving time and memory consumption.
--no-method-annotation Deactivate method annotations. It reduces the precision of the analysis without significantly improving time and memory consumption.
--unsafe-exception Annotate caught exception object as if they were always initialized (NonNull) objects. It improves the results but it is incorrect (as it is possible to throw the current object this in a constructor).
--unsafe-native Annotate native methods as if they were always returning initialized objects (NullableInit). It improves the results but it is incorrect.
--unsafe-static Annotate static fields as if no class initializer could initialize a static field to not-fully initialized value, i.e to a Raw value. It improves the results but it is incorrect.
--unsafe-array Annotate arrays as if none of their cells were ever assigned a Raw value. It improves the results but it is incorrect.
--aggressive-native Annotate native methods as if their return value were always NonNull. It improves the results but it is incorrect.
--aggressive-array Annotate arrays as if all their cell were cells always NonNull. It improves the results but it is incorrect.
--aggressive-static Annotate static fields as if they were always initialized by their class initializer with a NonNull value. It improves the results but it is incorrect.
--aggressive-field Annotate instance fields as if they were always NonNull. It improves the results but it is incorrect.
--aggressive-parameter Annotate method parameters as if they were always NonNull. It improves the results but it is incorrect.
--aggressive-return Annotate return value of methods as if they were always NonNull. It improves the results but it is incorrect.
-help Display this list of options
--help Display this list of options

Output formats

XML output format

In the following grammar, the star (*) means that the previous token is present any number of times (including zero). The question mark (?) means that the previous token is optional. The plus sign (+) means that the previous token is present any number of times and at least once. The parentheses are not tokens, they are here to group tokens together so the following star, question mark or plus applies to the list of tokens.

PROGRAM ::= <nit>CLASS*</nit>
CLASS ::= <class name="NAME">FIELD* METHOD* WARNING*</class>
FIELD ::= <field descriptor="NAME:TYPE">VALUE</field>
WARNING ::= <warning line="INTEGER">WARNTYPE</warning>
WARNTYPE ::= <getfield signature="NAME:TYPE" info="STRING" />
| <putfield signature="NAME:TYPE" info="STRING" />
| <invoke signature="NAME:(TYPE*)RETURN" info="STRING" />
| <arraylength info="STRING" />
| <vaload signature="TYPE" info="STRING" />
| <vastore signature="TYPE" info="STRING" />
METHOD ::= <method descriptor="name:(TYPE*)RETURN">
(<arguments>(VALUE*)</arguments>)?
(<return>VALUE</return>)?
(<this>THIS</this><post>THIS</post)?
THIS ::= <AllDef /> | <Top /> | <Def field="package.class.fs:type" />+
VALUE ::= <NonNull /> | <NullableInit /> | <Nullable /> | <Raw class="package.class" /> | <Raw>

< and > in method names are respectively replaced by ^ and $ in the XML output. Method and field descriptors are composed of the name of the method or field, respectively, and their "type" (or descriptor) as documented in the JVM Specification (Field Descriptors, Method Descriptors).

Annotation File format (produced with option --anno-file-output))

Jaime Quinonez has developed an annotation file format and a tool suite (which can be found at http://types.cs.washington.edu/annotation-file-utilities/) that allows to extract annotations from Java source file or Java bytecode (.class) files to this file format and to insert annotations for a such file into Java source or Java bytecode files. This tool suite is supposed to work on the annotations defined in the JSR 308, which should be part of Java 7. It now seems to be maintained by Michael Ernst.

HTML output formats

The text output is in the JBir intermediate language and should be human readable (it is easier than the bytecode).

Field declarations are annotated with annotation starting with @ written just after the field name. Method annotations are written after the field signature. They include the list of parameters, including the self reference in the case of a virtual method. One-word non-reference type are annotated with NonNull, two-word non-reference type are annotated with two NonNull annotations.

Instruction are also annotated with the flow-sensitive information computed by the analysis:

In addition to those annotations computed by the analysis (the real purpose of the program) and to the information javap -verbose -private usually outputs, there are also some other information outputted to help to navigate in the program and to understand the computed annotations. Their name should be sufficiently meaningful to be understandable without further documentation. Here is the kind of information added for each kind of data.

DataInformation added
Class The direct sub-classes
Interface The direct implementations (sub-classes) and direct sub-interfaces.
Non-abstract method The interfaces that define this method and the classes in which this method is overridden.
Abstract method The class or interface that define a method that this method definition overrides, the class or interface that overrides this method definition and the class that implements this method.
Statistics output format

Statistics can be made on the result of the analysis via option --stats. It counts dereferences in field reads (getfield), field writes (putfield) and virtual method calls (invokevirtual, invokeinterface and invokespecial) of each type (@Nullable, @NonNull, @Raw). It also counts annotations of each type for fields, method formal parameters and return values (including static methods) if the Java type is of type reference (object or array) and if the code is not proved dead (field actually read, method actually called).

The output format is the CSV format where the first line is the caption and the second line contains the data. The column separator is the ';' character. The title of each column is explained here.

Field nameDescription
Name Gives the name of the main class of the program that has been analyzed.
Free Unused.
RAW Fields Number of fields annotated as Raw.
MBN fields Number of fields annotated as Nullable.
NN fields Number of fields annotated as NonNull.
total invokes Total number of method calls taken into account (reachable invokevirtual, invokespecial, invokeinterface and invokestatic).
Raw invokes Number of virtual method calls on reference annotated as Raw.
NN invokes Number of virtual method calls on reference annotated as NonNull.
MBN invokes Number of virtual method calls on reference annotated as Nullable.
invokes Raw Number of method calls that return values annotated as Raw.
invokes NN Number of method calls that return values annotated as NonNull.
invokes MBN Number of method calls that return values annotated as Nullable.
total arguments Total number of actual arguments taken into account (i.e. value of type reference that have been passed as arguments of method calls taken in account). Note: Total arguments = Raw arguments + MBN arguments)
Raw arguments Number of arguments that are annotated as Raw.
MBN arguments Number of arguments that are annotated as Nullable.
total getfields Total number of field readings taken into account. Note: Total getfields = getfields on MBN + getfields on Raw + getfields on NN.
getfields on MBN Number of field reading on values annotated as Nullable.
getfields on Raw Number of field readings on values annotated as Raw.
getfields of MBN Number of field readings that have return values annotated as Nullable.
getfields of RAW Number of field readings that have return values annotated as Raw.
getfields of NN Number of field readings that have return values annotated as NonNull.
total putfields Total number of field writings that have been taken into account. Note: Total putfields = putfields on MBN + putfields on Raw + putfields on NN.
putfields on MBN Number of field writings on values annotated as Nullable.
putfields on RAW Number of field writings on values annotated as Raw.
putfields of MBN Number of field writings where written values are annotated as Nullable.
putfields of RAW Number of field writings where written values are annotated as Raw.
putfields of NN Number of field writings where written values are annotated as NonNull.
total parameters Total number of formal parameters taken into account (i.e. formal parameters of type reference of methods (virtual or static) that have been taken into account).Note: Total parameters = MBN parameters + Raw parameters + NN parameters.
MBN parameters Number of formal parameters annotated as Nullable.
Raw parameters Number of formal parameters annotated as Raw.
total return Total number of return annotations taken into account (i.e. one for each method taken in account if the return type of the method is of type reference). Note: Total return = MBN return + NN return.
MBN return Number of returns annotated as Nullable.
Raw return Number of returns annotated as Raw.
MBN ArrayLoad Number of array loads on array references annotated as Nullable.
NN ArrayLoad Number of array loads on array references annotated as NonNull or Raw.
MBN ArrayStore Number of array stores on array references annotated as Nullable.
NN ArrayStore Number of array stores on array references annotated as NonNull or Raw.
MBN ArrayLength Number of arraylength instructions on array references annotated as Nullable.
NN ArrayLoad Number of arraylength instructions on array references annotated as NonNull or Raw.
Mem usage (MB) Maximum quantity of memory used by the process during its whole life in megabytes.
Time (s) Quantity of CPU time used by or for the process (user time and system time) in seconds.
Version A string with the compile-time options that were used when building the program and the command line arguments given for that execution. (The version is to identify the statistics, not the software version.)
Last modification: