Today, we’d like to kick off a series of articles in which we’ll look at promising new tools, learn exactly how to use them, and weigh their benefits and drawbacks.
In this series, we will only cover aspects that are relevant to auditing and bug bounty hacking but are not covered elsewhere!
First and foremost, we want to sincerely thank the people who made the SolGrep and SemGrep tools, everyone who supports them, and the authors of all the reference materials! Today, dear readers, we will share what we have learned from them with you!
We believe nobody doubts that the basis of any secure implementation is a special approach to writing code. Consequently, this article focuses only on those aspects that are really useful for making your code safe and secure!
Therefore, below you will see not a typical article but a systematization of knowledge (SoK), in which I will rely on authors that I myself trust in this matter and, of course, our pessimistic.io auditors!
First, a few words about our tool, SmartCheck, which reinforces what we will talk about next. Even in its raw form, it shows good results, and second place in ToB’s article is not bad for a tool we stopped supporting three years ago. However, let’s move on to today’s topic. Stay tuned!
Make sure to read the rest of the series:
In order to fully understand what we are going to talk about next, I suggest that we begin by explaining some of the terms that we will often use later in the text!
We understand and respect your limited time, so we have created a cheat sheet with nothing superfluous just for you (at the end of the article)!
Semantics — the branch of linguistics and logic concerned with meaning. Semantics is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and computer science.
In computer science, semantics describes the processes a computer follows when executing a program written in a specific language. This can be shown by describing the relationship between the input and output of a program, or by explaining how the program will be executed on a certain platform, hence creating a model of computation.
Grep (Global Regular Expression Print) is a command-line utility for searching plain-text data sets for lines that match a regular expression.
To put it another way, it is a tool (in a broad sense, a method) for identifying certain patterns in given data using regular expressions (RegEx).
The word grep is also used as a verb meaning to find or search for a string. Often this implies searching for a fixed string, but grep, as the command's acronym indicates, searches for a regular expression.
It’s a highly versatile *nix command for matching strings. See the Computerphile video "Where GREP Came From" for a brief history.
For a better understanding, consider how we would proceed if we were to solve the following problem: Create a rule that searches for specific emails in the given database.
1 — Get input data (.txt for example)
2 — Get a proper grep utility (which can search in .txt)
3 — Write a rule (Regular expression).
4 — Search!
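The four steps above can be sketched in Python roughly as follows. This is only an illustration: the sample data, the `example.com` domain, and the helper name `grep_lines` are our own assumptions, and the pattern is deliberately simple rather than a complete email validator.

```python
import re

# Step 3: the "rule" is a regular expression. This one only looks for
# addresses at the (hypothetical) example.com domain.
EMAIL_RULE = re.compile(r"[A-Za-z0-9._%+-]+@example\.com\b")


def grep_lines(text: str, rule: re.Pattern) -> list:
    """Step 4: return (line_number, line) pairs for every matching line."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if rule.search(line)
    ]


# Step 1: input data. A real run would read this from a .txt file.
SAMPLE_DATA = """\
alice@example.com signed up on Monday
bob@another.org signed up on Tuesday
contact: carol.smith@example.com
no address on this line
"""

# Step 2: our own tiny "grep utility" is the function above.
matches = grep_lines(SAMPLE_DATA, EMAIL_RULE)
for number, line in matches:
    print(f"{number}: {line}")
```

Like the real grep, the sketch works line by line and reports the whole matching line, not just the matched fragment.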
To take this one step further, consider how we would proceed if we had to solve another problem: create a rule that searches for specific patterns in a given piece of Solidity code.
1 — Take the Solidity contract and make sure that it compiles correctly.
2 — Take a suitable grep utility that can work with Solidity code. Keep in mind that the grep utility usually checks the AST. Do not forget that, as a result, your solution should make writing rules easier and more adaptable.
3 — Write a rule (Regular expression).
4 — Search!
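As a rough illustration of these steps, here is a deliberately naive, text-level sketch in Python. It does not inspect the AST the way dedicated tools do, so it can be fooled by strings or multi-line comments. The rule names, the sample contract, and the `scan` helper are our own hypothetical choices.

```python
import re

# Hypothetical rule set: rule id -> regular expression.
RULES = {
    "tx-origin": re.compile(r"\btx\.origin\b"),
    "selfdestruct": re.compile(r"\bselfdestruct\s*\("),
}


def scan(source: str) -> list:
    """Return (rule_id, line_number) for every rule hit in the source."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Naively drop single-line comments so they do not trigger rules.
        code = line.split("//", 1)[0]
        for rule_id, pattern in RULES.items():
            if pattern.search(code):
                findings.append((rule_id, number))
    return findings


SAMPLE_CONTRACT = """\
contract Wallet {
    address owner;
    // tx.origin is mentioned here only in a comment
    function withdraw() external {
        require(tx.origin == owner);
    }
    function kill() external {
        selfdestruct(payable(owner));
    }
}
"""

findings = scan(SAMPLE_CONTRACT)
for rule_id, number in findings:
    print(f"line {number}: {rule_id}")
```

The comment on line 3 is skipped only because of the crude `//` handling. An AST-based tool gets this for free, which is one reason such tools tend to be more reliable than plain regular expressions.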
First of all, we must note that compilers essentially take text, parse and process it, then turn it into binary for your computer to read. This saves you from having to write binary by hand and, furthermore, allows you to write complex programs more easily.
In other words, the compiler converts high-level source code to low-level code. Then, the target machine executes low-level code. The compilation process consists of several phases:
Intermediate code generation
Machine Code generation
To proceed to the next topic, we must first understand how this occurs in Solidity and, as a result, how we can use it to audit and write code safely. It’s critical to understand the specifics of the process we’ll go over below.
We can roughly break down the entire process into the following three phases:
a) Splits code into tokens
b) Analyzes syntax
c) Constructs an AST (abstract syntax tree)
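To make phases (a) to (c) more tangible, here is a toy sketch in Python. It handles only a tiny, made-up subset of Solidity (a contract containing functions with empty parameter lists and empty bodies) and is in no way how solc is implemented. The token names and the dict-based tree shape are our own assumptions.

```python
import re

# (a) Token definitions for a tiny subset of Solidity.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:contract|function|public|external)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[{}()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(source: str) -> list:
    """Split the source text into (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_contract(tokens: list) -> dict:
    """(b) Check the token order and (c) build a small dict-based AST."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    expect("KEYWORD", "contract")
    contract = {"type": "Contract", "name": expect("IDENT"), "functions": []}
    expect("PUNCT", "{")
    while pos < len(tokens) and tokens[pos] == ("KEYWORD", "function"):
        expect("KEYWORD", "function")
        function = {"type": "Function", "name": expect("IDENT"), "visibility": None}
        expect("PUNCT", "(")
        expect("PUNCT", ")")
        if pos < len(tokens) and tokens[pos][1] in ("public", "external"):
            function["visibility"] = expect("KEYWORD")
        expect("PUNCT", "{")
        expect("PUNCT", "}")
        contract["functions"].append(function)
    expect("PUNCT", "}")
    return contract


SOURCE = "contract Vault { function deposit() external {} function ping() public {} }"
tokens = tokenize(SOURCE)
ast = parse_contract(tokens)
print(ast)
```

Once the code is in tree form, a question such as "which functions are external?" becomes a simple lookup instead of a text search.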
Here we can also divide the entire process into three phases, which are listed below:
a) Analyzes semantics (at this point compiler errors are exposed/derived!)
c) Generates bytecode!
Here is a simple example of some of the compiler’s phases:
Auditors could not have missed such an intriguing approach for long... The good results in the search for bugs work in its favor as well.
So, what kind of grep utilities are available for Solidity right now? Let’s check them out:
SolGrep advantages: Written specifically for Solidity; Easy to use “out of the box”; Has a list of basic rules for contracts.
SolGrep disadvantages: Not that useful by default; You need to write your own rules in JS; The repository hasn’t been updated for a long time.
With this tool, you can:
Extract semantic information from Solidity source code based on custom filter functions;
Find target contracts based on a custom filter script you define;
Create & run your own or built-in rules (e.g. for CI checks);
Crunch numbers and generate statistics from a code base;
Find doppelgangers, i.e. duplicate contracts sharing the same code structure (AST_EXACT and AST_FUZZY matching).
It can search in the following ways:
By function name;
By contract name;
By everything located in the AST. It also searches via custom patterns (by creating a separate library).
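The idea of searching the AST by contract or function name can be sketched in Python as follows. This is not SolGrep's API. The sample tree is hand-written and only mimics the general shape of the compiler's JSON AST (`nodeType`, `name`, `nodes`), and the helpers `walk` and `find` are our own hypothetical names.

```python
def walk(node):
    """Yield every dict in the tree that looks like an AST node."""
    if isinstance(node, dict):
        if "nodeType" in node:
            yield node
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)


def find(ast, node_type, name=None):
    """Return nodes of the given type, optionally filtered by name."""
    return [
        node
        for node in walk(ast)
        if node["nodeType"] == node_type
        and (name is None or node.get("name") == name)
    ]


# Hand-written sample tree (an assumption, not real compiler output).
SAMPLE_AST = {
    "nodeType": "SourceUnit",
    "nodes": [
        {
            "nodeType": "ContractDefinition",
            "name": "Vault",
            "nodes": [
                {"nodeType": "FunctionDefinition", "name": "deposit", "visibility": "external"},
                {"nodeType": "FunctionDefinition", "name": "withdraw", "visibility": "external"},
            ],
        },
        {
            "nodeType": "ContractDefinition",
            "name": "Token",
            "nodes": [
                {"nodeType": "FunctionDefinition", "name": "transfer", "visibility": "public"},
            ],
        },
    ],
}

contract_names = [c["name"] for c in find(SAMPLE_AST, "ContractDefinition")]
withdraw_nodes = find(SAMPLE_AST, "FunctionDefinition", name="withdraw")
print(contract_names)
print(withdraw_nodes)
```

Because `find` filters on arbitrary node types, the same helper covers the "everything located in the AST" case as well as the name-based lookups.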
SemGrep advantages: Updated frequently; Has a lot of interfaces, a lot of supported languages, and a lot of features.
SemGrep disadvantages: Partial (limited) Solidity support (though we have not run into any restrictions so far); No basic (built-in) rules.
In short, SemGrep is a lightweight static analysis tool for many languages. With it, you can find bug variants using patterns that look like source code.
Here’s where SemGrep can help:
It performs a grep-like search based on custom patterns that you write;
For beginners, it is recommended to start with the Semgrep Cloud Platform because it provides a visual interface, a demo project, result triaging and exploration workflows, and makes setup in CI/CD fast. Scans are still local and code isn’t uploaded.
Alternatively, you can also start with the CLI without logging in and navigate the terminal output to run one-off searches.
However, we are pessimistic about the use of AI at any stage of development (such tools cannot be 100% trusted), particularly when working with RegEx, although early feedback indicates that it can make life significantly easier!
So far, we haven’t found any clear applications for grep tools in our audits, but if you have, please share your experience in the article’s comments! Based on our research, it appears that such tools can be used for the following:
For writing various detectors (ones that cannot be implemented with Slither);
To perform an in-depth search for syntactic constructions (You can use this when performing an initial project inspection. For instance, to quickly determine how many external functions, libraries, and so on are used in a given project).
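For the initial inspection use case, a quick count of external functions and libraries could look like the Python sketch below. It is a text-level approximation with hypothetical helper names and a made-up sample. It ignores comments and strings, so treat the numbers as a first impression rather than an exact result.

```python
import re

# A function header whose modifiers (before the body or ';') include "external".
EXTERNAL_FUNCTION = re.compile(r"\bfunction\s+(\w+)\s*\([^)]*\)[^{;]*\bexternal\b")
# A library declaration at the start of a line.
LIBRARY = re.compile(r"^\s*library\s+(\w+)", re.MULTILINE)


def inspect(source: str) -> dict:
    """Collect the names of external functions and declared libraries."""
    return {
        "external_functions": EXTERNAL_FUNCTION.findall(source),
        "libraries": LIBRARY.findall(source),
    }


SAMPLE_PROJECT = """\
library SafeMath {
    function add(uint256 a, uint256 b) internal pure returns (uint256) {
        return a + b;
    }
}

contract Vault {
    function deposit() external payable {}
    function withdraw(uint256 amount) external {}
    function _audit() internal view {}
}
"""

summary = inspect(SAMPLE_PROJECT)
print(len(summary["external_functions"]), "external functions:", summary["external_functions"])
print(len(summary["libraries"]), "libraries:", summary["libraries"])
```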
Solregex — A Regex compilation to Solidity
We hope that this article was informative and useful for you! Thank you for reading! Which tools should we review next? What would you be interested in reading about?
We promise you that we’ll be posting a lot of interesting stuff soon!
Support is very important to me; with it, I can do what I love: educating users!
If you want to support my work, you can send me a donation to the address:
4AhpUrDtfVSWZMJcRMJkZoPwDSdVG6puYBE3ajQABQo6T533cVvx5vJRc5fX7sktJe67mXu1CcDmr7orn1CrGrqsT3ptfds — Monero XMR