Establishes the artificial intelligence training data transparency act requiring developers of generative artificial intelligence models or services to post on the developer's website information regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of such model or service.
STATE OF NEW YORK
________________________________________________________________________
6578--A
Cal. No. 166
2025-2026 Regular Sessions
IN ASSEMBLY
March 6, 2025
___________
Introduced by M. of A. BORES, CUNNINGHAM, KELLES, FORREST,
CHANDLER-WATERMAN, TORRES, OTIS, LEVENBERG, GRIFFIN -- read once and referred to
the Committee on Science and Technology -- ordered to a third reading,
amended and ordered reprinted, retaining its place on the order of
third reading
AN ACT to amend the general business law, in relation to establishing
the artificial intelligence training data transparency act
The People of the State of New York, represented in Senate and Assembly, do enact as follows:
1 Section 1. The general business law is amended by adding a new article
2 44-C to read as follows:
3 ARTICLE 44-C
4 ARTIFICIAL INTELLIGENCE TRAINING DATA TRANSPARENCY ACT
5 Section 1430. Short title.
6 1431. Definitions.
7 1432. Data used to train generative artificial intelligence
8 models or services.
9 1433. Employee data used to train generative artificial intelli-
10 gence models or services.
11 § 1430. Short title. This act shall be known and may be cited as the
12 "artificial intelligence training data transparency act".
13 § 1431. Definitions. For the purposes of this article, the following
14 terms shall have the following meanings:
15 1. "Artificial intelligence" or "artificial intelligence technology"
16 means a machine-based system that can, for a given set of human-defined
17 objectives, make predictions, recommendations, or decisions influencing
18 real or virtual environments, and that uses machine- and human-based
19 inputs to perceive real and virtual environments, abstract such percep-
20 tions into models through analysis in an automated manner, and use model
21 inference to formulate options for information or action.
EXPLANATION--Matter in italics (underscored) is new; matter in brackets
[] is old law to be omitted.
LBD07975-05-6
1 2. "Developer" means a person, partnership, state or local government
2 agency, or corporation that designs, codes, produces, or substantially
3 modifies an artificial intelligence model or service for use by members
4 of the public.
5 3. "Generative artificial intelligence" means a class of AI models
6 that are self-supervised and emulate the structure and characteristics
7 of input data to generate derived synthetic content, including, but not
8 limited to, images, videos, audio, text, and other digital content.
9 4. "Substantially modifies" or "substantial modification" means a new
10 version, new release, or other update to a generative artificial intel-
11 ligence model or service that materially changes its functionality or
12 performance, including the results of retraining or fine tuning.
13 5. "Synthetic data generation" means a process in which seed data is
14 used to create artificial data that have some of the statistical charac-
15 teristics of the seed data.
16 6. "Train a generative artificial intelligence model or service"
17 includes testing, validating, or fine tuning by the developer of the
18 artificial intelligence model or service.
19 7. "Aggregate consumer information" means information that relates to
20 a group of consumers, from which individual consumer identities have
21 been removed, that is not linked or reasonably linkable to any consumer
22 or household, including via a device. Aggregate consumer information
23 does not mean one or more individual consumer records that have been
24 de-identified.
25 8. "AI model" means an information system or component of an informa-
26 tion system that implements artificial intelligence technology and uses
27 computational, statistical, or machine-learning techniques to produce
28 outputs from a given set of inputs.
29 § 1432. Data used to train generative artificial intelligence models
30 or services. 1. On or before January first, two thousand twenty-seven,
31 and prior to each time thereafter that a generative artificial intelli-
32 gence model or service, or a substantial modification to a generative
33 artificial intelligence model or service, released on or after January
34 first, two thousand twenty-two, is made publicly available to New York-
35 ers for use, regardless of whether the terms of such use include compen-
36 sation, the developer of such model or service shall post on the devel-
37 oper's website documentation regarding the data used by the developer to
38 train the generative artificial intelligence model or service, including
39 a high-level summary of the datasets used in the development of the
40 generative artificial intelligence model or service, including, but not
41 limited to:
42 (a) the sources or owners of the datasets;
43 (b) a description of how the datasets further the intended purpose of
44 the artificial intelligence model or service;
45 (c) the number of data points included in the datasets, which may be
46 in general ranges, and with estimated figures for dynamic datasets;
47 (d) a description of the types of data points within the datasets. For
48 purposes of this paragraph, the following definitions apply:
49 (i) as applied to datasets that include labels, "types of data points"
50 means the types of labels used; and
51 (ii) as applied to datasets without labeling, "types of data points"
52 refers to the general characteristics;
53 (e) whether the datasets include any data protected by copyright,
54 trademark, or patent, or whether the datasets are entirely in the public
55 domain;
56 (f) whether the datasets were purchased or licensed by the developer;
1 (g) whether the datasets include personal information or personal
2 identifying information, as defined in section eight hundred ninety-
3 nine-aaa of this chapter;
4 (h) whether the datasets include aggregate consumer information;
5 (i) whether there was any cleaning, processing, or other modification
6 to the datasets by the developer, including the intended purpose of
7 those efforts in relation to the artificial intelligence model or
8 service;
9 (j) the time period during which the data in the datasets were
10 collected, including a notice if the data collection is ongoing;
11 (k) the dates the datasets were first used during the development of
12 the artificial intelligence model or service; and
13 (l) whether the generative artificial intelligence model or service
14 used or continuously uses synthetic data generation in its development.
15 A developer may include a description of the functional need or desired
16 purpose of the synthetic data in relation to the intended purpose of the
17 model or service.
18 2. A developer shall not be required to post documentation regarding
19 the data used to train a generative artificial intelligence model or
20 service for any of the following:
21 (a) A generative artificial intelligence model or service whose sole
22 purpose is the operation of aircraft in the national airspace; or
23 (b) A generative artificial intelligence model or service developed
24 for national security, military, or defense purposes that is made avail-
25 able only to a federal entity.
26 § 1433. Employee data used to train generative artificial intelligence
27 models or services. 1. Any person, partnership, state or local govern-
28 ment agency, or corporation that designs, codes, produces, or substan-
29 tially modifies a generative artificial intelligence model or service
30 using data of which a substantial part is derived from individuals
31 employed or contracted by the entity, regardless of whether the model is
32 made publicly available, shall ensure that the following information is
33 disclosed to each employee whose data is used to train the artificial
34 intelligence model:
35 (a) the intended purpose of the artificial intelligence model or
36 service;
37 (b) a description of how the collected datasets further the intended
38 purpose of the artificial intelligence model or service;
39 (c) a description of the types of data points within the datasets;
40 (d) whether the datasets include personal information or personal
41 identifying information, as defined in section eight hundred ninety-
42 nine-aaa of this chapter;
43 (e) the dates the datasets were first used during the development of
44 the artificial intelligence model or service; and
45 (f) the time period during which the data in the datasets were
46 collected, including a notice if the data collection is ongoing.
47 2. An entity that uses employee or contractor data to design, code,
48 produce, or substantially modify a generative artificial intelligence
49 model or service shall not be required to disclose the information
50 required by this section if the model or service:
51 (a) is solely intended to be used in the operation of aircraft in the
52 national airspace; or
53 (b) is developed for national security, military, or defense purposes
54 and only made available to a federal entity.
55 § 2. This act shall take effect immediately.