
A06578 Summary:

BILL NO  A06578B
 
SAME AS  S06955-A
 
SPONSOR  Bores
 
COSPNSR  Cunningham, Kelles, Forrest, Chandler-Waterman, Torres, Otis, Levenberg, Griffin, Wright
 
MLTSPNSR
 
Add Art 44-C §§1430 - 1432, Gen Bus L
 
Establishes the artificial intelligence training data transparency act requiring developers of generative artificial intelligence models or services to post on the developer's website information regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of such system or service.

A06578 Text:



 
                STATE OF NEW YORK
        ________________________________________________________________________
 
                                         6578--B
                                                                Cal. No. 166
 
                               2025-2026 Regular Sessions
 
                   IN ASSEMBLY
 
                                      March 6, 2025
                                       ___________
 
        Introduced  by M. of A. BORES, CUNNINGHAM, KELLES, FORREST, CHANDLER-WA-
          TERMAN, TORRES, OTIS, LEVENBERG, GRIFFIN -- read once and referred  to
          the Committee on Science and Technology -- ordered to a third reading,
          amended  and  ordered  reprinted,  retaining its place on the order of
          third reading -- again amended on third  reading,  ordered  reprinted,
          retaining its place on the order of third reading
 
        AN  ACT  to  amend the general business law, in relation to establishing
          the artificial intelligence training data transparency act
 
          The People of the State of New York, represented in Senate and  Assem-
        bly, do enact as follows:
 
     1    Section 1. The general business law is amended by adding a new article
     2  44-C to read as follows:
     3                                 ARTICLE 44-C
     4           ARTIFICIAL INTELLIGENCE TRAINING DATA TRANSPARENCY ACT
 
     5  Section 1430. Short title.
     6          1431. Definitions.
     7          1432. Data  used  to  train  generative  artificial intelligence
     8                  models or services.
     9    § 1430. Short title. This act shall be known and may be cited  as  the
    10  "artificial intelligence training data transparency act".
    11    §  1431.  Definitions. For the purposes of this article, the following
    12  terms shall have the following meanings:
    13    1. "Artificial intelligence" or "artificial  intelligence  technology"
    14  means  a machine-based system that can, for a given set of human-defined
    15  objectives, make predictions, recommendations, or decisions  influencing
    16  real  or  virtual  environments,  and that uses machine- and human-based
    17  inputs to perceive real and virtual environments, abstract such  percep-
    18  tions into models through analysis in an automated manner, and use model
    19  inference to formulate options for information or action.

         EXPLANATION--Matter in italics (underscored) is new; matter in brackets
                              [ ] is old law to be omitted.
                                                                   LBD07975-08-6

        A. 6578--B                          2
 
     1    2.  "Developer" means a person, partnership, state or local government
     2  agency, or corporation that designs, codes, produces,  or  substantially
     3  modifies  an artificial intelligence model or service for use by members
     4  of the public.
     5    3.  "Generative  artificial  intelligence"  means a class of AI models
     6  that emulate the structure and characteristics of input data to generate
     7  derived synthetic content, including, but not limited to, images,  vide-
     8  os, audio, text, and other digital content.
     9    4.  "Substantially modifies" or "substantial modification" means a new
    10  version, new release, or other update to a generative artificial  intel-
    11  ligence  model  or  service that materially changes its functionality or
    12  performance, including the results of retraining or fine tuning.
    13    5. "Synthetic data generation" means a process in which seed  data  is
    14  used to create artificial data that have some of the statistical charac-
    15  teristics of the seed data.
    16    6.  "Train  a  generative  artificial  intelligence  model or service"
    17  includes testing, validating, or fine tuning by  the  developer  of  the
    18  artificial intelligence model or service.
    19    7.  "Aggregate consumer information" means information that relates to
    20  a group of consumers, from which  individual  consumer  identities  have
    21  been  removed, that is not linked or reasonably linkable to any consumer
    22  or household, including via a  device.  Aggregate  consumer  information
    23  does  not  mean  one  or more individual consumer records that have been
    24  de-identified.
    25    8. "AI model" means an information system or component of an  informa-
    26  tion  system that implements artificial intelligence technology and uses
    27  computational, statistical, or machine-learning  techniques  to  produce
    28  outputs from a given set of inputs.
    29    §  1432.  Data used to train generative artificial intelligence models
    30  or services. 1. On or before January first, two  thousand  twenty-seven,
    31  and  prior to each time thereafter that a generative artificial intelli-
    32  gence model or service, or a substantial modification  to  a  generative
    33  artificial  intelligence  model or service, released on or after January
    34  first, two thousand twenty-two, is made publicly available to New  York-
    35  ers for use, regardless of whether the terms of such use include compen-
    36  sation,  the developer of such model or service shall post on the devel-
    37  oper's website documentation regarding the data used by the developer to
    38  train the generative artificial intelligence model or service, including
    39  a high-level summary of the datasets used  in  the  development  of  the
    40  generative  artificial intelligence model or service, including, but not
    41  limited to:
    42    (a) the sources or owners of the datasets;
    43    (b) a description of how the datasets further the intended purpose  of
    44  the artificial intelligence model or service;
    45    (c)  the  number of data points included in the datasets, which may be
    46  in general ranges, and with estimated figures for dynamic datasets;
    47    (d) a description of the types of data points within the datasets. For
    48  purposes of this paragraph, the following definitions apply:
    49    (i) as applied to datasets that include labels, "types of data points"
    50  means the types of labels used; and
    51    (ii) as applied to datasets without labeling, "types of  data  points"
    52  refers to the general characteristics;
    53    (e)  whether  the  datasets  include  any data protected by copyright,
    54  trademark, or patent, or whether the datasets are entirely in the public
    55  domain;
    56    (f) whether the datasets were purchased or licensed by the developer;

        A. 6578--B                          3
 
     1    (g) whether the datasets  include  personal  information  or  personal
     2  identifying  information,  as  defined  in section eight hundred ninety-
     3  nine-aaa of this chapter;
     4    (h) whether the datasets include aggregate consumer information;
     5    (i)  whether there was any cleaning, processing, or other modification
     6  to the datasets by the developer,  including  the  intended  purpose  of
     7  those  efforts  in  relation  to  the  artificial  intelligence model or
     8  service;
     9    (j) the time period  during  which  the  data  in  the  datasets  were
    10  collected, including a notice if the data collection is ongoing;
    11    (k)  the  dates the datasets were first used during the development of
    12  the artificial intelligence model or service; and
    13    (l) whether the generative artificial intelligence  model  or  service
    14  used  or continuously uses synthetic data generation in its development.
    15  A developer may include a description of the functional need or  desired
    16  purpose of the synthetic data in relation to the intended purpose of the
    17  model or service.
    18    2.  A  developer shall not be required to post documentation regarding
    19  the data used to train a generative  artificial  intelligence  model  or
    20  service for any of the following:
    21    (a)  A  generative artificial intelligence model or service whose sole
    22  purpose is the operation of aircraft in the national airspace; or
    23    (b) A generative artificial intelligence model  or  service  developed
    24  for national security, military, or defense purposes that is made avail-
    25  able only to a federal entity.
    26    § 2. This act shall take effect immediately.