Open Source Investment Framework

Part 3 of a 3 Part Series: Eniac’s Blueprint for Investing in Open-Source AI Startups

By Vic Singh and James Barone

tl;dr: The open-source AI landscape represents an exciting frontier brimming with potential, yet navigating this complex ecosystem requires a thoughtful and nuanced strategy. In Part 1 of this 3-part series, we illustrated the Eniac market map and open-source AI tooling stack. In Part 2, we analyzed the insights gleaned from our comprehensive research study. In this final post, we reveal our proprietary framework for investing in the open-source AI space, including our thoughts on ideal founder archetypes, approaches to differentiation, nuanced commercialization models, target customers, ecosystem partnerships, and the metrics that matter. Follow along as we unpack the framework and dive into the dynamic landscape of open-source AI.

The open-source artificial intelligence (AI) movement holds tremendous promise, but also poses risks for entrepreneurs and investors alike. On one hand, the open-source model allows startups to quickly build AI capabilities and engage a community of enthusiastic users and contributors to accelerate distribution and development. On the other hand, many of these startups lack proprietary technology and have undefined business models, which raises questions about how they will eventually commercialize in a durable way.

As the open-source AI ecosystem continues to evolve, the startups best positioned for success will need to balance these trade-offs and carefully thread the needle between technical openness and commercial viability. Specifically, we believe they will be led by visionary technical founders capable of building engaging developer communities, will make open-source contributions that demonstrate technical expertise alongside a credible roadmap for developing proprietary assets, and will have a sustainable business model for their commercial offering. At Eniac, we developed a proprietary framework for investing in this new space. We believe that open-source AI startups that fit the characteristics outlined below could deliver massive societal value and financial returns.

Founder Archetype: Ideal open-source AI founders possess technical expertise, community-building experience, and a nuanced approach to commercialization

Perhaps the most important aspect of investing in an open-source AI startup, especially at the pre-seed/seed stage, is the founder archetype. The ideal founder(s) should possess a robust technical orientation, showcasing a profound understanding of both the problem they’re solving and the users they’re serving. Ideally, founders would have experience building developer communities in prior endeavors and would have earned a strong reputation within the developer community.

To effectively nurture and expand these communities, founder(s) should also engage in deep, unscalable interactions with developers. This includes (but is not limited to): the creation of compelling content encompassing both technical insights and real-world use, hosting independent Slack channels and builder meetups, and fostering genuine engagement within the community via Discord and IRL.

Most importantly, founders must demonstrate a nuanced approach to enterprise and go-to-market strategies, carefully balancing the line between building a strong community and acquiring commercial customers. Such an approach requires a deliberate, intentional focus on quality over quantity in contributors, ensuring that the foundational codebase is built on robust infrastructure that can eventually be used by paying enterprise customers. This emphasis on quality contributors safeguards the project’s integrity, allowing it to flourish organically once released into the wider ecosystem. At the seed stage, we don’t necessarily need an enterprise offering to be built. Rather, a conceptual framework and roadmap for a compelling enterprise offering should be clearly communicated by the founder.

Open Source AI is moving so fast that codebases with truly differentiated, proprietary, value-adding IP will win vs. bundled frameworks

Thousands of AI companies have been founded since the release of ChatGPT. More than 90% of them haven’t developed proprietary IP that can serve as an initial moat to scale from a project into a full-fledged, sustainable, venture-scale company. For example, we recently explored an investment in a promising open-source AI orchestration startup that had fantastic user metrics (e.g., a large number of GitHub stars, forks, and contributors); however, upon examining the codebase, we discovered that the project merely assembled many existing open-source projects with very little additional IP. We passed on investing because we weren’t convinced of its defensibility and pushed the founders to think deeply about building proprietary technology. In our view, bundling frameworks together isn’t a “show-stopper,” so long as there is proprietary technology powering the system.

Startups must outline a path to an enterprise-grade offering, focusing on data security, privacy, and licensing, as well as monetizable premium functionality

While we heard anecdotally that many potential enterprise customers are still exploring AI, we know that they care greatly about security, privacy, and sovereignty of their data. Many want solutions that help development teams learn without exposing sensitive information. The best open-source founders recognize that to commercialize, they’ll eventually need to cater to the needs of these buyers. As such, they think creatively about how they can build a real business on top of and around their open-source technology from the get-go.

A recent survey from Morgan Stanley outlines how CFOs are thinking about AI purchasing decisions. Today, 66% of those surveyed are still evaluating the technology vs. 4% that have deployed in production; however, a whopping 33% expect to be in production during the latter half of 2024.

Licensing is also a hot-button issue in open-source AI that founders need to thoughtfully navigate. On one hand, developers love the ethos of open contribution. But on the other hand, companies need commercial licenses to monetize. This creates tension around perceived “selling out” if initial open-source promises aren’t kept. Through our research, we found licensing strategy to be top-of-mind for the ecosystem. Startups must thread the needle between sustaining an engaged community and generating revenue. The most credible founders have proactively considered tensions around licensing rather than addressing them reactively. They understand the conversations happening around open-source sustainability and commercialization.

Ideal customer profiles in highly regulated industries…

Many eventual adopters of open-source AI technologies will — perhaps counterintuitively — be highly regulated industries like financial services, healthcare and defense. These industries deal with sensitive data and face strict privacy and compliance requirements. As a result, they have heightened concerns around data sovereignty and allowing third parties to access their information, which makes them hesitant to adopt AI solutions from external vendors that would involve moving data to the cloud; however, open source presents an appealing alternative. With access to the underlying models, companies can train and fine-tune them on-prem while keeping data in-house (vs. relying on a hosted service), under their control.

…and/or full-stack open-source application stacks:

In the past decade, companies that have garnered significant valuations have found ways to lower the barriers to entry for others to compete. Shopify, for example, enabled ordinary brands to build eCommerce stores at a fraction of the cost of hiring a website development team. Similarly, Stripe dramatically simplified payment infrastructure so online businesses could quickly and easily accept payments. The key insight is that lowering barriers for others can paradoxically drive more demand for products and services. The same applies to open-source AI startups.

Full-stack, closed-source AI companies like OpenAI and Anthropic already seem unbeatable due to the immense resources required to build competing models, but open-source AI companies can lower barriers by providing pre-trained models, frameworks, fine-tuning tools, and orchestration that anyone can access. This gives companies a much cheaper entry point into the market. Rather than building core AI from scratch, they can focus on (and retain control of) vertical-specific datasets and applications. For example, a healthcare company could leverage open-source vision and language models, then fine-tune and combine them to analyze medical images and text reports. By iterating on open-source foundations, they can create a full-stack medical AI offering without massive upfront investment: open-source models (e.g., LLaMA) provide the base, while proprietary data and industry expertise provide the specialization. We talk about building a Full-Stack AI startup in this post.
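To make the healthcare example concrete, here is a minimal sketch of what this pattern might look like in practice: an open-weight base model adapted to a vertical with parameter-efficient fine-tuning on in-house data. It assumes the Hugging Face transformers, peft, and datasets libraries; the model name, dataset path, and hyperparameters are illustrative placeholders, not a recommended recipe.

```python
# Hypothetical sketch: adapting an open-weight base model to a vertical
# (e.g., clinical report summarization) with LoRA fine-tuning on in-house data.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

BASE = "meta-llama/Llama-2-7b-hf"  # any open-weight base model would do
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE)

# LoRA freezes the base weights and trains small adapter matrices, so the
# proprietary domain data never has to leave the company's own infrastructure.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Hypothetical on-prem dataset of de-identified report text.
data = load_dataset("json", data_files="reports.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="medical-adapter",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the trained adapter, plus the data behind it, is the specialization layer
```

The point is not the specific libraries, but the shape of the moat: the open-source base is commoditized, while the adapter and the proprietary data behind it remain under the company’s control.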

Symbiotic partnership opportunities exist amongst open-source projects and non-native open-source startups that offer security, compliance, scalability, etc., “as-a-service”

As we previously mentioned, open-source AI projects often lack enterprise-grade security, compliance, and scalability capabilities out of the box. This creates opportunities for companies that might not be natively open-source to offer security, compliance, architecture compatibility, and scalability “as-a-service” to help enterprises tame sprawl and manage deployments. Such startups enable enterprises to adopt open-source AI stacks by making them more robust and enterprise-ready, offering solutions that address common complaints around vulnerabilities, both at a system-wide and data level.

These “pick & shovel” companies can tap into enterprise demand for open-source AI while also providing crucial value-adds. Rather than compete directly with open-source projects, they can partner with them in a symbiotic way: enterprises get improved open-source offerings, and open-source projects gain additional users and contributors through enterprise adoption. This creates a win-win ecosystem that allows innovation to flourish while satisfying the requirements of corporate buyers.

Open-source is a fantastic distribution mechanism, BUT being open-source for the sake of being open-source isn’t the right approach

Open-source companies have found the most success when operating at lower levels of infrastructure and data management. Tools like dbt and PostgreSQL lend themselves well to customization once installed in an enterprise, leading to stickiness and opportunities for upsell. The key is that these projects initially focus on a niche area rather than trying to be too broad.

However, being open source for its own sake is not always the right approach, especially at the application layer. Many projects that provide open-source alternatives to popular SaaS tools struggle to differentiate themselves, particularly if the only differentiator is that the tool is open-source. Although some highly regulated industries may adopt open-source application layers out of necessity, many enterprises lack the resources to customize these tools to their needs. For example, Mattermost, an open-source Slack alternative, made headlines when Uber opted to deploy and build on its open-source communications platform; however, Uber quickly recognized that its developers’ time could be better spent elsewhere and ended up purchasing Slack anyway. Startups should be wary of open sourcing a tool that could easily exist as a centralized SaaS.

Open Source AI startups can ride the mass adoption wave of existing larger platforms by offering composable functionality, integrations and more

Recently, we’ve been observing a unique, adjacent opportunity where successful businesses have been built around extending and augmenting the functionality of a widely adopted platform, particularly around technologies that are lower in the stack (e.g., the data layer).

Rather than reinventing the wheel, open-source AI startups can build upon these existing technologies, harnessing an established platform and community of active developers to extend the platform’s functionality. By contributing an open-source framework that adds desired features on top of an existing platform, startups can ride the platform’s coattails with an open codebase that is flexible by nature, enabling them to go to market from a proven codebase and brand rather than from scratch. Enabling deeper and wider integrations, feature-set extensibility, and architecture composability on top of existing open-source AI platforms with mass adoption can be a winning formula.

The Eniac Open-Source Metric Funnel: Going deep into open-source metrics to assess the quality of traction and path to commercial viability

Open-source metrics have a mixed reputation with investors. Some use them as nothing more than a quick gauge of a project’s popularity without truly “digging under the hood,” while others disregard them entirely when reviewing an open-source project. At the end of the day, metrics help illustrate how likely it is that an open-source company will build up a sustainable user base that eventually converts to paying customers. At Eniac, we like to think about metrics the way a salesperson thinks about converting leads into customers: as a funnel.

Vanity Metrics (GitHub stars, forks, watchers)

  • Stars, forks, and watchers help assess general interest and mindshare. A high number shows the project resonates with developers; however, these metrics can be gamed and don’t necessarily indicate depth of usage. We always take them with a grain of salt and think of them more as a barometer of project popularity.
  • That said, these vanity metrics do showcase the founders’ ability to build developer communities and can serve as an early indicator of potential traction.

Ratios & Updates (forks-to-stars, contributors)

  • Going a layer down in the funnel, the fork-to-star ratio is a better indicator of true user engagement. A higher ratio means users are modifying the code and experimenting rather than just starring the repo. Redpoint Ventures put together a great analysis of fork-to-star ratios at each funding stage of an open-source startup’s life. We generally consider a fork-to-star ratio between 5–20% a positive indicator, but we never treat it as gospel in isolation; a minimal sketch of pulling these signals from the GitHub API follows this list.
  • Additionally, contributor growth shows project momentum, as it indicates that more and more developers are actively trying to improve the codebase. Accelerating contributor growth over time is a healthy sign of an active, engaged community.
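As referenced above, here is a rough sketch of how these top-of-funnel signals could be pulled programmatically. It uses the public GitHub REST API via the requests library; the repository name is a placeholder, and the 5–20% band is the heuristic described above, not a hard rule.

```python
# Illustrative sketch: pull stars, forks, and contributor count for a repo
# and compute the fork-to-star ratio discussed above.
import requests

def repo_signals(owner: str, repo: str) -> dict:
    base = f"https://api.github.com/repos/{owner}/{repo}"
    meta = requests.get(base, timeout=10).json()
    stars, forks = meta["stargazers_count"], meta["forks_count"]

    # The contributors endpoint is paginated; requesting one item per page and
    # reading the "last" Link header is a cheap way to get the total count.
    resp = requests.get(f"{base}/contributors",
                        params={"per_page": 1, "anon": "true"}, timeout=10)
    last = resp.links.get("last", {}).get("url", "")
    contributors = int(last.split("page=")[-1]) if last else len(resp.json())

    ratio = forks / stars if stars else 0.0
    return {
        "stars": stars,
        "forks": forks,
        "contributors": contributors,
        "fork_to_star_ratio": round(ratio, 3),
        "in_5_to_20_pct_band": 0.05 <= ratio <= 0.20,  # rough heuristic, not gospel
    }

print(repo_signals("someorg", "some-open-source-ai-repo"))  # hypothetical repo
```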

Developer Usage

  • Disclosed live deployments indicate real-world usage, not just interest, while limited deployments could signal issues with maturity, ease of use, and documentation. Projects that incorporate telemetry and usage tracking (such as daily active users) can measure live deployments more systematically rather than relying on disclosures. This enables them to quantify adoption over time and provides insight into how customers are integrating and deriving value from the software, which in turn guides the project’s roadmap by revealing pain points and feature gaps. A hypothetical sketch of such a telemetry ping follows below.
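For illustration only, here is what a minimal, opt-out usage ping might look like inside an open-source tool. The endpoint URL, environment variable, and event schema are all invented for the sketch; real projects should disclose exactly what they collect.

```python
# Hypothetical sketch of a lightweight, opt-out telemetry ping an open-source
# project might embed to measure live deployments (e.g., daily active installs).
import os, uuid, json, platform, datetime, urllib.request

TELEMETRY_URL = "https://telemetry.example-project.dev/v1/events"  # placeholder endpoint

def send_usage_ping(feature: str) -> None:
    if os.environ.get("EXAMPLE_PROJECT_TELEMETRY") == "off":
        return  # respect opt-out: nothing leaves the deployment
    event = {
        # Hashing the hostname yields a stable, pseudonymous install identifier.
        "install_id": str(uuid.uuid5(uuid.NAMESPACE_DNS, platform.node())),
        "feature": feature,
        "date": datetime.date.today().isoformat(),  # day-level granularity is enough for DAU
        "version": "1.4.2",                         # illustrative project version
    }
    try:
        req = urllib.request.Request(TELEMETRY_URL, data=json.dumps(event).encode(),
                                     headers={"Content-Type": "application/json"},
                                     method="POST")
        urllib.request.urlopen(req, timeout=2)
    except OSError:
        pass  # telemetry must never break the user's workflow

send_usage_ping("pipeline_run")
```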

3rd Party Apps

  • Developers building ancillary tools, plugins, and other applications on top of the core open-source project validate its utility and ease of integration — more derivatives indicate a robust ecosystem developing around the project. Strong ecosystems exhibit “platform” dynamics where the community adds capabilities, so the core doesn’t have to. This creates the classic “flywheel” effect.
  • GPTPilot, an open-source AI tool that writes scalable applications from scratch, is a great example of a project that tracks and measures derivative projects built from its source code. To date, over 16,000 unique applications have been built on top of its core offering, a great indication of future success.

Deployed Commercial Applications

  • The end goal of any open-source project is to convert users (e.g., enterprises) into paying customers through commercial deployments; paid adoption signals that users find concrete value worth paying for. During this analysis, we tend to look beyond the quantity of enterprise logos to the quality of these deployments (i.e., the software is used in production rather than proofs-of-concept, churn is low, etc.). Conversely, limited commercial adoption may indicate issues converting open-source interest into commercial value, with projects stranded at the POC stage.

Conclusion

Net-net, our thesis is simple: invest early behind technical founders who can build communities and go to market with startups contributing to the open-source AI ecosystem, while keeping a close eye on a tractable path to building a commercial offering. While not without risks, the open-source model provides a compelling way to rapidly build AI capabilities and engage a community of users and contributors to help drive distribution and fast-track development. As this market continues to take shape, the companies most likely to succeed will be those that can clearly articulate their value proposition, build proprietary technology, innovate on sustainable business models, and engage meaningfully with developers.

Critical factors for assessment include the company’s strategy for monetization, ability to differentiate through data and applications, and metrics of community engagement. With thoughtful strategies around community building, product development, and commercialization, open-source AI offers a compelling and disruptive path for founders to build legendary companies.

Special thank you to all survey respondents, as well as these founders, investors, and operators who were generous enough with their time to provide first-party insights & feedback in 1–1 interviews: Alan Zabihi, Amanda “Robby” Robson, Andrew Carr, Apoorva Pandhi, Gaurav Gupta, Ismail Pelaseyed, James Alcorn, Juliet Bailin, Kyle Corbitt, Raj Singh, Tim Chen, and Zander Matheson.
