Introduction
The Open Source Initiative (OSI) has introduced the Open Source AI Definition (OSAID) to establish criteria for identifying genuinely open-source artificial intelligence (AI) systems. This initiative addresses the unique challenges posed by AI, particularly concerning training data [6], and aims to influence AI development, deployment [4], and licensing [3] [4].
Description
The OSI released the first version of the OSAID at the All Things Open 2024 conference. The definition was developed by a diverse group of organizations and individuals [2], including notable tech companies [2], and addresses challenges posed by AI that traditional software licenses do not cover [6], particularly concerning the data used for training [6]. To qualify as open source, an AI model must come with detailed information about its training data [6], including a comprehensive description of its provenance and a list of any publicly available data [2], along with full access to the codebase [6] and transparency regarding the settings and weights used in training [6]. Additionally, the entire architecture and components of the model must be accessible and modifiable [2], so that others can replicate the AI [1].
The OSAID grants anyone the freedom to use [2], study [2] [6], modify [2] [3] [5], and share open-source AI systems [2] without restriction [5]. It does not, however, mandate that the training data itself be released [2], which has raised concerns about its adherence to open-source principles [2]. To meet the OSI’s criteria [3], an AI model must provide enough information about its design to enable substantial recreation [3] and disclose details of its training data [3], including processing methods and licensing [3]. The definition may significantly influence AI development [4], deployment [4], and licensing in the United States [4], and it poses challenges for tech companies like Meta Platforms Inc.
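The criteria described above can be read as a checklist. The following is a minimal Python sketch of that reading, not OSI tooling or an official test: the class name, the field names, and the example release are all hypothetical and were chosen only to mirror the requirements summarized in this article.

```python
from dataclasses import dataclass, fields


@dataclass
class ReleaseChecklist:
    """Hypothetical checklist; field names are illustrative, not OSI terminology."""

    data_provenance_described: bool       # where the training data came from
    data_processing_described: bool       # how the training data was processed
    data_licensing_described: bool        # licensing of the training data
    codebase_available: bool              # full access to the code
    weights_and_settings_available: bool  # weights and training settings
    unrestricted_use: bool                # no limits on who may use the model
    unrestricted_modification: bool       # anyone may modify the model
    unrestricted_sharing: bool            # anyone may share the model


def unmet_criteria(release: ReleaseChecklist) -> list[str]:
    """Return the names of the checklist items the release does not satisfy."""
    return [f.name for f in fields(release) if not getattr(release, f.name)]


def looks_open_source(release: ReleaseChecklist) -> bool:
    """True only if every checklist item is satisfied."""
    return not unmet_criteria(release)


# Hypothetical release: weights and code are published, but the data's
# provenance is undocumented and the license restricts some users.
example = ReleaseChecklist(
    data_provenance_described=False,
    data_processing_described=True,
    data_licensing_described=True,
    codebase_available=True,
    weights_and_settings_available=True,
    unrestricted_use=False,
    unrestricted_modification=True,
    unrestricted_sharing=True,
)

print(unmet_criteria(example))
# ['data_provenance_described', 'unrestricted_use']
print(looks_open_source(example))
# False
```

Note that the sketch has no field for releasing the training data itself, reflecting that the definition asks for disclosure about the data rather than the data.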
As a recognized authority in open-source software [4], OSI’s definition could impact legislative and regulatory frameworks as they evolve [4], particularly in light of existing laws such as the European Union AI Act and California’s Artificial Intelligence Training Data and Transparency Act [4], which address transparency and disclosure requirements for AI systems [4]. Although OSI’s definition is not legally binding and may encounter alternative definitions from other organizations [4], it holds potential relevance for future legal interpretations and decisions [4].
The OSI has no enforcement mechanism to ensure compliance with its definition [3]; it aims instead to flag models that are misrepresented as “open source.” Several companies [3], including Meta [2] [3] [6], Stability AI Ltd. [6], and Mistral [6], have faced criticism for failing to meet the OSI’s criteria. Meta’s Llama [3] [6], marketed as the largest open-source AI model [1] [5], restricts commercial use for applications with more than 700 million users and does not disclose its training datasets [1] [5], which makes the models impossible to recreate [6]. Stability AI requires an enterprise license for businesses with significant revenue [6], while Mistral restricts the use of its models for certain commercial applications [6].
Research indicates that many models marketed as “open source” are not genuinely open [3]: they often keep training data confidential and require substantial resources to operate [3], which may lead to centralized control rather than the democratization of AI [3]. A study by Carnegie Mellon [6], the AI Now Institute [6], and the Signal Foundation found that many models labeled “open-source” are less transparent than claimed [6], with few releasing their training datasets [6]. Meta has defended its licensing practices [3], citing safety concerns as a justification for its restrictions on training data access. Critics [1] [5], however, argue that the real motivation may be to limit legal liability and maintain a competitive edge [5], especially given that many AI models are likely trained on copyrighted material [5]. Meta has acknowledged the presence of copyrighted content in its training data [1], and companies such as Meta and OpenAI face numerous lawsuits alleging copyright infringement [5], in which plaintiffs often rely on circumstantial evidence because the training data has not been disclosed.
The OSI’s definition currently does not address copyright issues related to AI models [3], leaving unresolved the question of whether such models can be copyrighted under existing intellectual property law [3]. OSI intends to monitor the implementation of its definition and suggest amendments as needed [3]. AI developers and licensors in the U.S. [4] are not currently obligated to follow this definition [4], but it may be advantageous for some organizations to consider it in their AI projects and licensing strategies [4]. The ongoing debate about how traditional open-source values will adapt in the context of AI is further highlighted by efforts from other organizations, such as the Linux Foundation [1], to define “open-source AI.”
Conclusion
By setting a standard for what counts as open-source AI, the OSAID could significantly shape the AI landscape, even though it is not legally binding. While it raises questions about adherence to open-source principles [2], particularly regarding training data, it also has the potential to influence legislative and regulatory frameworks. The definition’s implications for tech companies and the broader AI community underscore the ongoing debate about the adaptation of open-source values in the AI domain.
References
[1] https://www.techmonitor.ai/digital-economy/ai-and-automation/open-source-initiative-unveils-new-standards-for-open-source-ai
[2] https://news.itsfoss.com/osi-ai-open-source-definition/
[3] https://techcrunch.com/2024/10/28/we-finally-have-an-official-definition-for-open-source-ai/
[4] https://www.jdsupra.com/legalnews/opening-up-about-ai-osi-defines-open-9549359/
[5] https://www.theverge.com/2024/10/28/24281820/open-source-initiative-definition-artificial-intelligence-meta-llama
[6] https://siliconangle.com/2024/10/28/osi-clarifies-makes-ai-systems-open-source-open-models-fall-short/