After college, Sanket co-founded DoSelect, where I joined as the first engineer. Both of us had been contributing to open-source projects for a few years by then. In the beginning, we didn't have any process set up around code reviews. We had some IDE plugins to run the linters, and some team members used them as pre-commit hooks. We didn't have any tests back then, and we spent too much time on some pull requests pointing out improvements. If a pull request was very large, we never reviewed it and merged it directly. Then the engineering team started to grow, multiple people started contributing to the same repositories, and pull requests were often stuck for 5-7 days without any activity.
To make sure new commits were free of common issues, we added multiple static analysis tools to our CI jobs. This became a pain sooner than expected: the tools threw hundreds of lines of logs in CI, and we had to fight through duplicate issues. Critical issues were hidden among minor issues and false positives, and were often missed. Once in a while, we tweaked the linter config files to disable the issues that didn't make sense to us, to reduce noise in the CI logs. That stopped working after a while, so we invested in a couple of commercial code quality tools, but ended up disabling them as well. Their issues weren't categorized or prioritized, their analyzers were never updated with new rules, and they didn't have any way to report false positives.
We came across a paper, "Lessons from Building Static Analysis Tools at Google". It is a beautiful paper with the following insights: 1) static analysis authors should focus on the developer and listen to their feedback; 2) careful developer workflow integration is key for static analysis tool adoption; 3) static analysis tools can scale by crowdsourcing analysis development.
We started building DeepSource in December 2018. The initial release supported Python and integrated with GitHub. Our approach was to first curate all the issues available from open-source static analysis tools, de-duplicate them, and add better descriptions with external reference links. To use it, you just add the `python` analyzer to the `.deepsource.toml` file with some metadata (version, test patterns, exclude patterns, etc.), and analysis runs on every commit and pull request. To cut down the noise, by default we only show you the issues newly introduced in the pull request, based on the changeset, and not all the issues present in the changed files. We also provide a way for you to report false-positive issues directly from the dashboard. If the report is valid, we update the analyzers to resolve it within 48 to 72 hours. After this release, we started writing our own rules by walking through the Abstract Syntax Tree to find patterns. So far, we have 520+ types of issues in the Python analyzer. Some of the custom issues we added recently are: file opened without the `with` statement, using `yield` in a comprehension instead of a generator expression, and not using `items()` to iterate over a dictionary.
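
To give a feel for what a rule written by walking the Abstract Syntax Tree can look like, here is a minimal sketch of the "file opened without the `with` statement" check, using only Python's standard `ast` module. This is not DeepSource's actual implementation: the class name `OpenWithoutWith` and the helper `find_unmanaged_open` are made up for illustration, and the check only recognizes direct calls to the built-in name `open`.

```python
import ast


class OpenWithoutWith(ast.NodeVisitor):
    """Hypothetical rule: flag open() calls not used as a `with` context manager."""

    def __init__(self):
        self.issues = []        # (line, column) of each flagged call
        self._managed = set()   # ids of expressions used directly in a `with` item

    def visit_With(self, node):
        # Remember every expression that a `with` statement manages,
        # before descending into the children of the statement.
        for item in node.items:
            self._managed.add(id(item.context_expr))
        self.generic_visit(node)

    visit_AsyncWith = visit_With

    def visit_Call(self, node):
        is_open = isinstance(node.func, ast.Name) and node.func.id == "open"
        if is_open and id(node) not in self._managed:
            self.issues.append((node.lineno, node.col_offset))
        self.generic_visit(node)


def find_unmanaged_open(source):
    """Return (line, column) positions of open() calls outside a `with` item."""
    visitor = OpenWithoutWith()
    visitor.visit(ast.parse(source))
    return visitor.issues


if __name__ == "__main__":
    sample = (
        "f = open('a.txt')\n"
        "with open('b.txt') as g:\n"
        "    data = g.read()\n"
    )
    for line, col in find_unmanaged_open(sample):
        print(f"line {line}, col {col}: file opened without the `with` statement")
```

A real rule would need to handle more cases (for example `io.open`, or a file object that is closed explicitly in a `finally` block), which is part of why false-positive reports matter.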
Lately, we realized that some issues were occurring in tens of files. Though DeepSource reported them, you had to fix every occurrence manually. We just released autofix support in Python, starting with the 15 most commonly occurring issues. Autofix uses a Concrete Syntax Tree to visit the location of the issue, modify the code for which the issue is raised, and then generate a patch for that modification. When an autofix is available for an issue, you can view the suggested patch, and on approval a pull request is created with the fixes. We're working on improving the coverage of issues we can autofix across the analyzers we support.
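
The locate, modify, and generate-a-patch flow can be sketched roughly as follows. This is an illustration under stated assumptions, not how DeepSource's autofix is built: instead of a real Concrete Syntax Tree (a library such as LibCST would be the closer match), it uses the position information from the standard `ast` module and splices the original source text, so untouched formatting is preserved. The fix itself (rewriting `== None` to `is None`) is a hypothetical example and may not be among the 15 supported issues. The function names and the file name `example.py` are also invented.

```python
import ast
import difflib


def fix_none_comparisons(source):
    """Rewrite `x == None` / `x != None` to `x is None` / `x is not None`.

    Conservative sketch: only single-line comparisons with nothing but the
    operator between the two operands are touched; everything else is skipped.
    """
    lines = [line.encode("utf-8") for line in source.splitlines(keepends=True)]
    edits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare) or len(node.ops) != 1:
            continue
        op, right = node.ops[0], node.comparators[0]
        if not (isinstance(right, ast.Constant) and right.value is None):
            continue
        if isinstance(op, ast.Eq):
            old, new = b"==", b" is "
        elif isinstance(op, ast.NotEq):
            old, new = b"!=", b" is not "
        else:
            continue
        left = node.left
        if left.end_lineno != right.lineno:
            continue  # operands on different lines: skip
        # ast column offsets are UTF-8 byte offsets, hence the byte-level splice.
        start, end = left.end_col_offset, right.col_offset
        if lines[right.lineno - 1][start:end].strip() != old:
            continue  # parentheses or anything unexpected in between: skip
        edits.append((right.lineno - 1, start, end, new))
    # Apply right-to-left so earlier offsets on the same line stay valid.
    for index, start, end, new in sorted(edits, reverse=True):
        line = lines[index]
        lines[index] = line[:start] + new + line[end:]
    return b"".join(lines).decode("utf-8")


def make_patch(source, fixed, path="example.py"):
    """Return a unified diff between the original and the fixed source."""
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)


if __name__ == "__main__":
    original = "def f(x):\n    if x == None:\n        return 0\n    return x != None\n"
    print(make_patch(original, fix_none_comparisons(original)))
```

Skipping anything that does not match the simple case keeps the fixer from producing a broken patch, at the cost of leaving some occurrences for manual review.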
We would love to hear about your experience using these tools, along with any feedback or suggestions on how we can improve! Please let us know in the comments. We're also at founders [at] deepsource.io.