Most people don’t think of Nvidia as a software company. It makes GPUs, right? Some of the best GPUs in the world, no doubt, especially for artificial intelligence & machine learning (AI/ML) workloads. But they’re rarely thought of as a powerhouse in developer experience (DX), whether that’s for code written by AI/ML engineers or data scientists.
If you’ve ever tried to program something GPU-accelerated, you may have had to learn a complicated, specialized framework like CUDA, which is specific to Nvidia hardware, so your code didn’t run everywhere you wanted it to. Or perhaps you worked with TensorFlow, which is also complex. In recent years, PyTorch has made a subset of ML workloads somewhat easier, and it also provides multiple backends to improve portability.
However, many AI/ML developers and data scientists are already accustomed to much easier-to-use Python toolkits that are common across AI/ML and data science more broadly. Well-known examples include pandas for general-purpose data processing, NetworkX for network analysis, and scikit-learn or Spark MLlib for a variety of machine-learning algorithms.
Any vendor or open-source creator aiming to build a great DX in this problem space needs to account for the fact that most of the audience is already familiar and comfortable with this ecosystem. Anything that hopes to gain broad adoption must minimize the barriers to entry from where those developers and data scientists are today: using pandas, scikit-learn, Spark MLlib, and NetworkX.
Nothing has lower barriers than going to where they already are, requiring minimal or ideally zero code changes to see significant benefits. And that’s exactly what Nvidia has done.
Late last year, Nvidia hosted an AI and data science virtual summit where they made a series of announcements that are emblematic of the kinds of investments they’re making in improving DX for data science. They announced:
- RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes
- Accelerating NetworkX on NVIDIA GPUs for High Performance Graph Analytics
- Reduce Apache Spark ML Compute Costs with New Algorithms in Spark RAPIDS ML Library
In each case, they’ve enabled massive performance improvements through GPU acceleration with essentially the flip of a switch. By changing one package import in your Python code, you can get all the value while maintaining full syntax compatibility. And if you make that package import conditional, then your code will continue working even when you’re on a Mac or some other system without Nvidia GPUs. Overall, this is an excellent move on their part. It provides a well-understood DX with minimal barriers to entry.
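As a rough illustration of how low that barrier is, here is a minimal sketch of the conditional-import pattern described above. It assumes the RAPIDS cuDF package is installed on machines with an Nvidia GPU and falls back to standard pandas everywhere else; the file and column names are hypothetical placeholders, not anything from Nvidia’s announcements.

```python
# Minimal sketch: use GPU-accelerated cuDF when it is available,
# otherwise fall back to standard pandas. The rest of the script is
# unchanged because the two libraries share the same syntax.
try:
    import cudf as pd  # RAPIDS cuDF: pandas-compatible, runs on Nvidia GPUs
except ImportError:
    import pandas as pd  # CPU fallback on systems without Nvidia GPUs

# Hypothetical workload: file and column names are placeholders.
df = pd.read_csv("transactions.csv")
summary = df.groupby("region")["amount"].sum().sort_values(ascending=False)
print(summary.head(10))
```

The try/except is what keeps the script portable: on a laptop without an Nvidia GPU, the import simply falls back to pandas, which is the near-zero lock-in property discussed below.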
My main gripe with their approach is that rather than pushing this code upstream as a backend, it’s going into Nvidia’s RAPIDS suite of AI libraries. As a long-time observer, developer, and leader in open-source software communities, I see this pattern frequently and consider it an anti-pattern of vendor participation. Maintaining compatibility means Nvidia will inevitably be chasing upstream API changes and keeping up with ongoing development. Depending on how their codebase is structured, they may also face the added overhead of pulling in the upstream code, merging it, and continually resolving conflicts between the community’s code and Nvidia’s custom code.
If Nvidia’s level of investment changes in the future, users could be at risk of falling behind upstream releases, or of needing to switch back to an unaccelerated library to keep getting the latest features. On the bright side for end users, switching between the upstream and Nvidia libraries is so simple that there’s nearly no lock-in risk.
Nvidia also offers a fully supported software solution dubbed Nvidia AI Enterprise, into which the above libraries are gradually being incorporated. Coupled with Nvidia-Certified Systems, this may be of interest to enterprises that want to stay fully on-premises for security, privacy, or regulatory reasons, whether perceived or real. It’s also worth weighing the cost profile of a fully private deployment against Nvidia GPUs available in public clouds, based on the frequency and intensity of your expected workloads, to see what makes the most sense for your specific environment.
Key takeaways
- For AI/ML developers and data scientists who have access to Nvidia GPUs, we recommend adopting Nvidia’s syntax-compatible libraries wherever possible to gain significant performance boosts with negligible code changes.
- Enterprises that are just getting started with AI/ML workloads should consider Nvidia’s “AI Enterprise” offering. They could couple this with Nvidia-Certified Systems for a fully supported, well-integrated, on-premises hardware and software solution.
- Early adopters and research environments may prefer to continue without paid support. They should evaluate these libraries under their existing open-source software policies and determine criticality and support needs just as they would for any other framework in a production application (e.g., scope and scale of usage, impact of performance issues or downtime).
Disclosures: Nvidia has been a client.