Mito (YC S20) – Edit a spreadsheet, generate Python

Hiya HN, I'm Nate, cofounder of Mito (https://trymito.io) with my best friends Jake and Aaron. Mito is a spreadsheet UI that runs inside a Jupyter Notebook. Each time you edit the spreadsheet, it generates Python code for that edit. This allows analysts to write Python scripts using an interface they are familiar with, instead of waiting months for eng resources.

Mito is open core: http://github.com/mito-ds/monorepo. Our docs are at http://docs.trymito.io, and you can download it here: https://docs.trymito.io/getting-started/installing-mito.

Most people doing data analysis in Python struggle to write even basic Python. If you search Stack Overflow for the [pandas] tag, you’ll find pandas users wrestling with everything from “how can I make a pivot table?” to “how do I import from another folder?” These users are experts in their field; they just aren’t experts in Python. Tasks that take them seconds in spreadsheets can end up taking them days. (Here’s how we put it to investors: the next 10 million Python programmers are transitioning from Excel and have one real problem: writing the damn code.) A lot of organizations are stuck on this dilemma: they want to move from spreadsheets to Python, but getting started with programming, even with a highly usable language like Python, is hard.

We’ve spent years with users trying to adapt their spreadsheet skills to Python. It takes weeks to learn the basics. Their existing skills don't transfer. Many of their needs are simple to meet in a spreadsheet (writing a formula, aggregating data, graphing), but doing the same in Python requires long courses, emails to internal support (if any exists) followed by days of waiting for a reply, and countless trips to Stack Overflow. Often they just give up and return to Excel, but that makes them dependent on IT to write code for them. One of our users was quoted a full year for IT to implement a simple report! (Fast-forward: he ended up using Mito to automate it himself in less than a week.)

We went through this ourselves when we went to college together, studying engineering and business. We first learned data science with spreadsheets, then had to relearn it in Python. The transition was painful—basic Excel was much easier! Of course, not-so-basic Excel soon becomes not-so-easy, which is what drives the move to Python in the first place.

With our interest in spreadsheets, we started a spreadsheet-version-control company at the end of college, and spent a year working with Excel power users. Eventually, we realized that version control was secondary to the real problems users faced with spreadsheets: limited data size, speed limits, lack of advanced functionality, and a horrible replayability story.

Essentially, enterprises are caught between a rock (their spreadsheet woes) and a hard place (the pain of moving analysts to Python). We decided to work on this instead, and started Mito.

Mito is a spreadsheet UI built as an extension to Jupyter Notebooks / JupyterLab. Using a Mito spreadsheet, users can import data, add and delete columns, write formulas like Excel, make pivot tables, generate graphs, and more. See our docs (http://docs.trymito.io) for all our functionality.

Every tab in a Mito spreadsheet is a different pandas DataFrame. For each edit you make, Mito generates the corresponding line of pandas code in a code cell directly below the spreadsheet. For example, if I use Mito to import a CSV, add a column named Day of week, and use the WEEKDAY formula from Excel to pull out the weekday from another column, Mito generates the following code:

  # Imported tesla stock.csv
  import pandas as pd
  tesla_stock = pd.read_csv(r'tesla stock.csv')
  
  # Added column Day of week
  tesla_stock.insert(1, 'Day of week', WEEKDAY(tesla_stock['Date']))

In practice, the typical user bounces back and forth between writing Python and using the Mito spreadsheet, depending on the task at hand. We think this fluid movement between a spreadsheet and Python is really cool. The spreadsheet backend is just a Python extension to the IPython kernel you’re already running for your Jupyter Notebook. Because Mito is just a Python package, all data processing happens locally.
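
To give a feel for that back-and-forth, here is a rough sketch of what the same "Day of week" edit might look like if you wrote it by hand in plain pandas. It is not Mito's generated code: the sample rows are made up so the snippet runs without the CSV, and numbering the days 1 (Monday) through 7 (Sunday) is an assumption that may not match what WEEKDAY returns.

```python
import pandas as pd

# Made-up stand-in rows; the post reads 'tesla stock.csv' instead.
tesla_stock = pd.DataFrame({
    "Date": ["2022-08-01", "2022-08-02", "2022-08-07"],
    "Close": [100.0, 101.0, 102.0],  # dummy values, not real prices
})

# Parse the date strings so the .dt accessor is available.
tesla_stock["Date"] = pd.to_datetime(tesla_stock["Date"])

# Assumption: number days 1 (Monday) through 7 (Sunday).
# pandas' dt.dayofweek runs 0 (Monday) through 6 (Sunday), so add 1.
day_of_week = tesla_stock["Date"].dt.dayofweek + 1

# Insert the new column at position 1, mirroring the generated code above.
tesla_stock.insert(1, "Day of week", day_of_week)

print(tesla_stock)
```

From here you could keep editing in code, or hand the DataFrame back to the spreadsheet for the next step.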

As mentioned, Mito is an open core product. 90% of the code is AGPL licensed. The rest is under a separate enterprise license. These modules are still source-visible, but require users to pay for a pro or enterprise offering before using them. That’s basically our business model.

We have 3 versions (https://trymito.io/plans): (1) Free: basic analysis tools, as well as some basic telemetry that you can opt out of; (2) Pro: all of (1), with advanced functionality; (3) Enterprise: all of (2), with more advanced features, optimizations, and support.

Because spreadsheets are sprawling pieces of software, we’re pretty obsessed with optimizing for long-term development. We use strong types where we can (TypeScript on the frontend, fairly comprehensive mypy in Python). We’ve implemented our own component libraries for common components from scratch, which lets us be flexible during large refactorings. We implemented our own custom JavaScript grid, hyper-optimized for our use case; as a result, it is the fastest JS grid we tested in our context. We're also big fans of metaprogramming: we write an increasing amount of code that writes code for us, which in turn makes it easy to add more functionality to our spreadsheet.
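
For readers who haven't seen "code that writes code" before, here is a toy sketch of the idea, using the add-a-column edit from earlier as the example. Everything in it (the AddColumnStep record, the transpile function, the naive string rewrite of column references) is hypothetical and made up for illustration; it is not how Mito is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddColumnStep:
    """Hypothetical record of one spreadsheet edit: add a column from a formula."""
    df_name: str
    column: str
    position: int
    formula: str  # e.g. "WEEKDAY(Date)"; column references are bare names


def transpile(step: AddColumnStep, columns: List[str]) -> List[str]:
    """Turn one edit into a comment line plus one line of pandas code."""
    expr = step.formula
    # Naive rewrite: replace "(col)" with "(df['col'])" for each known column.
    # A real implementation would parse the formula instead of using str.replace.
    for col in columns:
        expr = expr.replace(f"({col})", f"({step.df_name}[{col!r}])")
    return [
        f"# Added column {step.column}",
        f"{step.df_name}.insert({step.position}, {step.column!r}, {expr})",
    ]


step = AddColumnStep("tesla_stock", "Day of week", 1, "WEEKDAY(Date)")
lines = transpile(step, columns=["Date"])
print("\n".join(lines))
```

The appeal of this shape is that each new kind of edit only needs its own small step-to-code rule, which is one reason generating code can make adding functionality cheaper.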

We posted about Mito a long time ago: https://news.ycombinator.com/item?id=24305615. No one really liked it (we learned our lesson!), and it didn't do much at the time — I think the app had a single button that added a column. Three months ago, someone (not sure who — thank you, alefnula!) posted it again: https://news.ycombinator.com/item?id=31446236. It reached the top 3 and we got lots of comments—yay! Since then, we’ve doubled the number of features (mostly data processing), done a UI overhaul, dramatically expanded the Pro + Enterprise offering, made telemetry optional in the free version, and more.

We’d love to hear all about your experiences with spreadsheet analysis, the uncanny valley between spreadsheets and code, the travails of moving enterprise analytics off of spreadsheets, and whatever else you’d like to ask or mention. Any and all feedback is greatly appreciated!


