Ploomber (YC W22) – Quickly deploy data pipelines from Jupyter/VSCode

Hi HN, we’re Eduardo & Ido, the founders of Ploomber (https://ploomber.io). We’re building an open-source framework (https://github.com/ploomber/ploomber) that helps data scientists quickly deploy the code they develop in interactive environments (Jupyter/VSCode/PyCharm), eliminating the need for time-consuming manual porting to production platforms.

Jupyter and other interactive environments are the go-to tools for most data scientists. However, many production data pipeline platforms (e.g. Airflow, Kubernetes) drag them into non-interactive development paradigms. Hence, when moving to production, the data scientist’s code needs to move from the interactive environment to a more traditional software environment (e.g. declaring workflows as Python classes). This process creates friction since the code needs to cross this gap every time the data scientist deploys their work. Data scientists often pair with software engineers to work on the conversion, but this is time-consuming and costly. It’s also frustrating because it’s just busy work.

We encountered this problem while working in the data space. Eduardo was a data scientist at Fidelity for a few years. He deployed ML models and always found it annoying and wasteful to port the code from his notebooks into a production framework like Airflow or Kubernetes. Ido worked as a consultant at AWS and constantly found that data science projects would allocate about 30% of their time to converting a notebook prototype into a production pipeline.

Interactive environments have historically been used for prototyping and are considered unsuitable for production; this is reasonable because, in our experience, most of the code developed interactively exists in a single file with little to no structure (e.g., a gigantic notebook). However, we believe it’s possible to bring software engineering best practices and apply them to the interactive development world so data scientists can produce maintainable projects to streamline deployment.

Ploomber allows data scientists to quickly develop their code in modular pipelines rather than a giant single file. When developed this way, their code is suitable for deployment to production platforms; we currently support exporting to Kubernetes, AWS Batch, Airflow, Kubeflow, and SLURM with no code changes. Our integration with Jupyter/VSCode/PyCharm allows them to iteratively build these modular pipelines without moving away from the interactive environment. In addition, modularizing the work enables them to create more maintainable and testable projects. Our goal is ease of use, with minimal disturbance to the data scientist’s existing workflows.
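
As a rough illustration of what "modular pipeline" means here, the sketch below splits a small workflow into separate tasks with declared dependencies and runs them in dependency order. This is plain Python, not Ploomber's API: the task names, the TASKS mapping, and the run_pipeline helper are all made up for the example, and a real project would typically keep each task in its own script or notebook.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Each task is a small, independently testable function.
def load():
    return [1, 2, 3, 4]


def clean(rows):
    # Keep only the even values.
    return [x for x in rows if x % 2 == 0]


def summarize(rows):
    return {"count": len(rows), "total": sum(rows)}


# Hypothetical pipeline declaration: task name -> (function, upstream task names).
TASKS = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "summarize": (summarize, ["clean"]),
}


def run_pipeline(tasks):
    """Run every task after its upstream tasks, passing their results along."""
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, upstream = tasks[name]
        results[name] = func(*(results[u] for u in upstream))
    return results


results = run_pipeline(TASKS)
print(results["summarize"])  # {'count': 2, 'total': 6}
```

Because each step is its own function with explicit inputs, it can be tested or rerun on its own, which is the property that makes this style easier to hand to a production scheduler than a single large notebook.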

Users can install Ploomber with pip, open Jupyter/VSCode/PyCharm, and start building in minutes. We’ve made a significant effort to create a simple tool so people can get started quickly and learn the advanced features when they need them. Ploomber is available at https://github.com/ploomber/ploomber under the Apache 2.0 license. In addition, we are working on a cloud version to help enterprises operationalize models. We’re still working on the pricing details, but if you’d like us to let you know when we open the private beta, you can sign up here: https://ploomber.io/cloud. However, the core of our offering is the open-source framework, and it will remain free.

We’re thrilled to share Ploomber with you! If you’re a data scientist who has experienced these endless cycles of porting your code for deployment, an ML engineer who helps data scientists deploy their work, or you have any feedback, please share your thoughts! We love chatting about this domain since exchanging ideas always sheds light on aspects we haven’t considered before! You may also reach out to me at [email protected].


