Auditor’s Notes: Semantic Grep & Solidity

Today, we’d like to kick off a series of articles in which we’ll look at promising new tools, learn exactly how to use them, and weigh their benefits and drawbacks.

In this series, we will only cover aspects that are relevant to auditing and bug bounty hacking but are not covered elsewhere!

Authors: Nikita Kirillov, officercia.eth

Image: Sudoku + Photomosh

Greetings, dear readers!

First and foremost, we want to sincerely thank the people who made the SolGrep and SemGrep tools, everyone who supports them, and the authors of all the reference materials! Today, dear readers, we are passing that knowledge on to you!

We believe few would doubt that every secure implementation starts with a disciplined approach to writing code. This article therefore focuses only on the aspects that are genuinely useful for making your code safe and secure!

So what follows is not a typical article but a systematization of knowledge (SoK), in which I rely on authors I personally trust in this matter and, of course, on our auditors!

A few words about our tool, SmartCheck, which can reinforce what we will talk about next. Even in its raw form it shows good results, and second place in ToB’s article is not bad for a tool we stopped supporting three years ago. But let’s move on to today’s topic. Stay tuned!

Make sure to read the rest of the series:

Basics: What is Semantics?

In order to fully understand what we are going to talk about next, I suggest that we begin by explaining some of the terms that we will often use later in the text!

We understand and respect your limited time, so we have prepared a cheat sheet with nothing superfluous just for you (at the end of the article)!

Semantics: In-Depth

Semantics is the branch of linguistics and logic concerned with meaning: the study of reference, meaning, or truth. The term can refer to subfields of several distinct disciplines, including philosophy, linguistics and computer science.

In computer science, semantics describes the processes a computer follows when executing a program written in a specific language. This can be expressed by describing the relationship between a program’s input and output, or by explaining how the program will be executed on a certain platform, thereby creating a model of computation.

Grep (Global Regular Expression Print) Explained

Grep (Global Regular Expression Print) is a command-line utility for searching plain-text data sets for lines that match a regular expression.

To put it another way, it is a tool (in a broad sense, a method) for finding certain patterns in given data using regular expressions (RegEx).

The word grep is also used as a verb meaning to find or search for a string. Often this implies searching for a fixed string, but the grep command, as its acronym indicates, searches for a regular expression.

It’s a highly versatile *nix command for matching strings. For a brief history, see Computerphile: Where GREP Came From.

A Regular Expression (or Regex) Explained

A Regular Expression (or Regex) is a pattern (or filter) that describes a set of strings; a string matches the pattern if it belongs to that set. In a broad sense, it is a language for describing text patterns.

Web2 Example

For a better understanding, consider how we would proceed if we had to solve the following problem: create a rule that searches for specific emails in a given database.

1 — Get input data (.txt for example)

2 — Get a proper grep utility (which can search in .txt)

3 — Write a rule (Regular expression).

4 — Search!
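The four steps above can be sketched in a few lines of Python. This is only an illustration: the sample text, the find_emails helper, and the deliberately simple email pattern are our own assumptions, and real-world email matching is more involved.

```python
import re
from typing import List, Optional

# Step 3: the rule. A deliberately simple email pattern for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str, domain: Optional[str] = None) -> List[str]:
    """Step 4: return every email-like string, optionally filtered by domain."""
    found = EMAIL_RE.findall(text)
    if domain is not None:
        suffix = "@" + domain.lower()
        found = [email for email in found if email.lower().endswith(suffix)]
    return found


# Step 1: the input data (inlined here instead of being read from a .txt file).
SAMPLE = """\
alice@example.com signed up on Monday
no address on this line
bob.smith@mail.example.org, carol@example.com
"""

all_emails = find_emails(SAMPLE)
example_only = find_emails(SAMPLE, domain="example.com")
print(all_emails)
print(example_only)
```

Here Python's re module plays the role of the grep utility from step 2; on the command line, grep itself would do the same job.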

Web3 (Solidity) Example

For a better understanding of this topic, consider how we would proceed if we had to solve another problem: create a rule that searches for specific patterns in some Solidity code.

1 — Take the Solidity contract and make sure that it compiles correctly.

2 — Take a suitable grep utility that can work with Solidity code. Keep in mind that such a utility usually matches against the AST rather than the raw text. As a result, the solution you pick should make writing rules easier and more adaptable.

3 — Write a rule (Regular expression).

4 — Search!
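To see why a structure-aware search matters, here is a rough sketch that compares a plain text search with a slightly smarter one. Everything in it is an assumption made for illustration: the Vault contract, the helper names, and the shortcut of stripping comments and strings with regular expressions. Real tools build a proper AST instead.

```python
import re

# A made-up contract: the name appears in a comment, a string, and two functions.
SOURCE = '''\
contract Vault {
    // TODO: rename withdrawEth before release
    string constant NOTE = "withdrawEth is paused";

    function withdrawEth(uint256 amount) public {
        payable(msg.sender).transfer(amount);
    }

    function _withdrawEthInternal(uint256 amount) private {}
}
'''


def lexical_grep(source, needle):
    """Plain text search: 1-based numbers of the lines containing needle."""
    return [number for number, line in enumerate(source.splitlines(), 1)
            if needle in line]


# Toy stand-in for parsing: drop comments and string literals, then look at
# function headers only. This is NOT an AST, just enough structure for a demo.
COMMENT_OR_STRING = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"', re.DOTALL)
FUNCTION_HEADER = re.compile(r"\bfunction\s+(\w+)\s*\([^)]*\)([^{;]*)")


def public_functions(source):
    """Names of functions whose header carries public or external visibility."""
    cleaned = COMMENT_OR_STRING.sub(" ", source)
    names = []
    for match in FUNCTION_HEADER.finditer(cleaned):
        modifiers = match.group(2).split()
        if "public" in modifiers or "external" in modifiers:
            names.append(match.group(1))
    return names


lexical_hits = lexical_grep(SOURCE, "withdrawEth")
structural_hits = public_functions(SOURCE)
print(lexical_hits)      # four lines mention the name
print(structural_hits)   # only one of them is a public function
```

The text search reports the comment, the string constant, and the private helper alongside the one function we actually care about, which is exactly the noise a semantic grep is meant to remove.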

How does a compiler actually work?

First of all, note that compilers essentially take text, parse and process it, and then turn it into binary for your computer to read. This saves you from having to write binary by hand and, furthermore, lets you write complex programs more easily.

In other words, the compiler converts high-level source code to low-level code. Then, the target machine executes low-level code. The compilation process consists of several phases:

  1. Lexical analysis

  2. Syntax analysis

  3. Semantic analysis

  4. Intermediate code generation

  5. Optimization

  6. Machine Code generation

To proceed to the next topic, we must first understand how this occurs in Solidity and, as a result, how we can use it to audit and write code safely. It’s critical to understand the specifics of the process we’ll go over below.

What does the compiler actually do to Solidity code?

We can roughly break down the entire process into the following three phases:

a) Splits code into tokens

b) Analyzes syntax

c) Constructs an AST (abstract syntax tree)
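The three phases above can be illustrated with a toy front end. This is a minimal sketch under our own assumptions: it handles a single Solidity-like declaration, and the token kinds, the Parser class, and the tuple-based AST shape are invented for the example and have nothing to do with how solc is implemented.

```python
import re

# a) Lexing: numbers, identifiers, and single-character symbols.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(source):
    tokens = []
    for number, ident, symbol in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUM", number))
        elif ident:
            tokens.append(("ID", ident))
        else:
            tokens.append(("SYM", symbol))
    return tokens


class Parser:
    """b) + c) Checks the syntax and builds the AST as nested tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def take(self, kind, text=None):
        token = self.peek()
        if token[0] != kind or (text is not None and token[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {token[1]!r}")
        self.pos += 1
        return token[1]

    def declaration(self):  # <type> <name> = <expr> ;
        type_name = self.take("ID")
        var_name = self.take("ID")
        self.take("SYM", "=")
        value = self.expr()
        self.take("SYM", ";")
        return ("VarDecl", type_name, var_name, value)

    def expr(self):  # lowest precedence: + and -
        node = self.term()
        while self.peek() in (("SYM", "+"), ("SYM", "-")):
            op = self.take("SYM")
            node = ("BinOp", op, node, self.term())
        return node

    def term(self):  # higher precedence: * and /
        node = self.factor()
        while self.peek() in (("SYM", "*"), ("SYM", "/")):
            op = self.take("SYM")
            node = ("BinOp", op, node, self.factor())
        return node

    def factor(self):
        kind, text = self.peek()
        if kind == "NUM":
            self.pos += 1
            return ("Num", int(text))
        if kind == "ID":
            self.pos += 1
            return ("Id", text)
        self.take("SYM", "(")
        node = self.expr()
        self.take("SYM", ")")
        return node


tokens = tokenize("uint x = 1 + 2 * y;")
ast = Parser(tokens).declaration()
print(tokens)
print(ast)
```

Note how the tree already encodes operator precedence: the multiplication ends up nested under the addition, something a plain text search has no notion of.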

What does the compiler do with the AST?

Here we can also divide the entire process into three phases, which are listed below:

a) Analyzes semantics (at this point compiler errors are exposed/derived!)

b) Optimizes the AST (this is what the optimizer runs value specified in the project’s config is for; check out this example!)

c) Generates bytecode!
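The three phases above can also be sketched on a toy scale. In this sketch the AST is written by hand as nested tuples, the optimization is simple constant folding, and the output is a list of made-up stack instructions (PUSH, LOAD, ADD, MUL). All of these are assumptions for illustration; this is not real EVM bytecode and not how solc works internally.

```python
import operator

# Hand-written AST for the expression: y + 2 * 3
AST = ("BinOp", "+", ("Id", "y"), ("BinOp", "*", ("Num", 2), ("Num", 3)))


def check(node, declared):
    """a) Semantic analysis: every identifier must be declared before use."""
    kind = node[0]
    if kind == "Id" and node[1] not in declared:
        raise NameError(f"undeclared identifier: {node[1]}")
    if kind == "BinOp":
        check(node[2], declared)
        check(node[3], declared)


ARITHMETIC = {"+": operator.add, "*": operator.mul}


def fold(node):
    """b) Optimization: replace constant sub-expressions with their value."""
    if node[0] != "BinOp":
        return node
    _, op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "Num" and right[0] == "Num":
        return ("Num", ARITHMETIC[op](left[1], right[1]))
    return ("BinOp", op, left, right)


INSTRUCTIONS = {"+": "ADD", "*": "MUL"}


def emit(node):
    """c) Code generation: post-order walk producing toy stack instructions."""
    kind = node[0]
    if kind == "Num":
        return [f"PUSH {node[1]}"]
    if kind == "Id":
        return [f"LOAD {node[1]}"]
    _, op, left, right = node
    return emit(left) + emit(right) + [INSTRUCTIONS[op]]


check(AST, declared={"y"})   # passes; an empty set would raise NameError
folded = fold(AST)
code = emit(folded)
print(folded)
print(code)
```

The error for an undeclared identifier surfaces during the semantic check, before any code is generated, which mirrors the point made in step a).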

Compiler Performance Examples

Here is a simple example of some of the compiler’s phases:

Greps & Solidity: Solutions

Auditors were never going to overlook such an intriguing approach for long... Its good results in bug hunting also work in its favor.

So, what kind of grep utilities are available for Solidity right now? Let’s check them out:

SolGrep Review

So you have a set of smart contracts and want to find all contracts that have a public method named withdrawEth, but lexical grep yields a lot of false positives? Here's where solgrep can help:

  • Advantages: Written specifically for Solidity; Easy to use “out of the box”; Has a list of basic rules for contracts.

  • Disadvantages: This tool is not that useful by default; You also need to write your own rules in JS; Repository hasn’t been updated for a long time.

Solgrep recursively finds smart contracts in a target directory, parses the source units to understand the language semantics, and makes this information available to a powerful JavaScript-based filter function.

This way, you can:

  • Extract semantic information from Solidity source code based on custom filter functions;

  • Find target contracts based on a custom filter script you define;

  • Create & run your own or built-in rules (e.g. for CI checks);

  • Crunch numbers and generate statistics from a code base;

  • Find doppelgangers! i.e. duplicate contracts sharing the same code structure (AST_EXACT and AST_FUZZY matching)
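The doppelganger idea from the last bullet can be illustrated with a small sketch. Note the assumptions: the hand-written toy trees, the fingerprint helper, and our reading of "exact" as comparing structure including names and "fuzzy" as comparing structure only are ours. SolGrep's actual AST_EXACT and AST_FUZZY matching may work differently.

```python
import hashlib

# Toy AST nodes: (node_type, name, children). Written by hand for the demo.
VAULT = ("Contract", "Vault", [
    ("Function", "withdraw", [("Call", "transfer", [])]),
])
RENAMED_COPY = ("Contract", "Safe", [
    ("Function", "cashOut", [("Call", "send", [])]),
])
DIFFERENT = ("Contract", "Vault", [
    ("Function", "withdraw", []),
    ("Function", "deposit", []),
])


def shape(node, keep_names):
    """Reduce a tree to a hashable shape, optionally dropping all names."""
    node_type, name, children = node
    return (node_type,
            name if keep_names else None,
            tuple(shape(child, keep_names) for child in children))


def fingerprint(node, keep_names):
    """Hash of the tree shape; equal fingerprints mean equal shapes."""
    return hashlib.sha256(repr(shape(node, keep_names)).encode()).hexdigest()


# Assumed "exact" comparison: structure and names must both match.
same_exact = fingerprint(VAULT, True) == fingerprint(RENAMED_COPY, True)
# Assumed "fuzzy" comparison: only the structure has to match.
same_fuzzy = fingerprint(VAULT, False) == fingerprint(RENAMED_COPY, False)
other_fuzzy = fingerprint(VAULT, False) == fingerprint(DIFFERENT, False)
print(same_exact, same_fuzzy, other_fuzzy)
```

A renamed copy slips past the name-sensitive comparison but is caught by the structure-only one, while a contract with a genuinely different layout matches neither.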

It can search in the following ways:

  • By function name;

  • By function calls;

  • By contract name;

  • By anything located in the AST. It also searches via custom patterns (by creating a separate library).

Author: tintinweb

SemGrep Review

  • Advantages: Being updated frequently; It has a lot of interfaces, a lot of languages, a lot of features.

  • Disadvantages: Partial (limited) Solidity support (although we have not run into any restrictions so far); No basic (built-in) rules.

In short, SemGrep is a lightweight static analysis tool for many languages. With it, you can find bug variants using patterns that look like source code.

Here’s where SemGrep can help:

  • It performs a grep search (by writing custom patterns);

  • In essence, it is similar to SolGrep, the tool we discussed above. Roughly speaking, it’s the same thing, only with ready-made (built-in) rule-creation patterns.

For beginners, it is recommended to start with the Semgrep Cloud Platform because it provides a visual interface, a demo project, result triaging and exploration workflows, and makes setup in CI/CD fast. Scans are still local and code isn’t uploaded.

Alternatively, you can also start with the CLI without logging in and navigate the terminal output to run one-off searches.
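To give a feel for patterns that look like source code, here is a small sketch that writes a Semgrep rule file from Python. The rule itself is a hypothetical example of ours (its id, message, and the choice of tx.origin as the pattern are assumptions), and whether it behaves as intended depends on the state of Semgrep's Solidity support.

```python
import tempfile
from pathlib import Path

# A hypothetical rule: flag every use of tx.origin in Solidity code.
# The keys (id, languages, severity, message, pattern) follow Semgrep's
# rule format; the values are made up for this example.
RULE_YAML = """\
rules:
  - id: hypothetical-tx-origin-use
    languages: [solidity]
    severity: WARNING
    message: tx.origin is used here, check whether msg.sender was intended.
    pattern: tx.origin
"""


def write_rule(directory):
    """Write the rule into the given directory and return the file path."""
    path = Path(directory) / "tx-origin.yaml"
    path.write_text(RULE_YAML, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    rule_path = write_rule(tmp)
    written = rule_path.read_text(encoding="utf-8")
    print(rule_path.name)
    print(written)
```

With Semgrep installed, a one-off search along the lines of semgrep --config tx-origin.yaml followed by the path to your contracts should then report the matches in the terminal.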

How can all this be useful to an auditor?

First of all, it should be noted that this is just one more tool in our toolbox of working methods.

That said, we are skeptical about relying on AI at any stage of development (such tools cannot be 100% trusted), particularly when working with RegEx, although early feedback indicates that it can make life significantly easier!

We haven’t yet found any clear-cut applications for grep tools in our own audits, so if you have, please share your experience in the article’s comments! Based on our research, it appears that these tools can be used in the following ways:

  • To write various detectors (ones that cannot be implemented with Slither);

  • To perform an in-depth search for syntactic constructions (You can use this when performing an initial project inspection. For instance, to quickly determine how many external functions, libraries, and so on are used in a given project).
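The initial-inspection use case from the list above can be approximated even without a dedicated tool. The sketch below is a rough, purely lexical take on it: the Token contract, the helper names, and the regular expressions are our own assumptions, and because nothing is parsed, comments or unusual formatting can skew the numbers.

```python
import re
from collections import Counter

# A made-up contract used as input for the demo.
SOURCE = """\
library SafeMath {}

contract Token {
    using SafeMath for uint256;

    function transfer(address to, uint256 amount) external returns (bool) {}
    function balanceOf(address who) external view returns (uint256) {}
    function mint(address to, uint256 amount) public {}
    function _move(address from, address to) internal {}
}
"""

FUNCTION_HEADER = re.compile(r"\bfunction\s+\w+\s*\([^)]*\)([^{;]*)")
VISIBILITIES = ("external", "public", "internal", "private")


def visibility_counts(source):
    """Count function definitions per visibility keyword."""
    counts = Counter()
    for match in FUNCTION_HEADER.finditer(source):
        words = match.group(1).split()
        visibility = next((w for w in words if w in VISIBILITIES), "unspecified")
        counts[visibility] += 1
    return counts


def library_usages(source):
    """Library names that appear in 'using <Library> for ...' directives."""
    return re.findall(r"\busing\s+(\w+)\s+for\b", source)


counts = visibility_counts(SOURCE)
libraries = library_usages(SOURCE)
print(dict(counts))
print(libraries)
```

For anything beyond a quick first look, the same questions are better asked of the AST through a tool such as SolGrep or SemGrep.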

We recently conducted some research on this subject and came to some interesting conclusions… So, in the coming articles, we’ll look into this as well; stay tuned!

We hope that this article was informative and useful for you! Thank you for reading! Which tools should we review next? What would you be interested in reading about?

We promise you that we’ll be posting a lot of interesting stuff soon!

Support is very important to me; with it, I can do what I love: educating users!

If you want to support my work, you can send me a donation to the address:

Stay Safe!
