A06578 Summary:

BILL NO A06578A
 
SAME AS No Same As
 
SPONSOR Bores
 
COSPNSR Cunningham, Kelles, Forrest, Chandler-Waterman, Torres, Otis, Levenberg, Griffin
 
MLTSPNSR
 
Add Art 44-C §§1430 - 1433, Gen Bus L
 
Establishes the artificial intelligence training data transparency act requiring developers of generative artificial intelligence models or services to post on the developer's website information regarding the data used by the developer to train the generative artificial intelligence model or service, including a high-level summary of the datasets used in the development of such system or service.

A06578 Text:



 
                STATE OF NEW YORK
        ________________________________________________________________________
 
                                         6578--A
                                                                Cal. No. 166
 
                               2025-2026 Regular Sessions
 
                   IN ASSEMBLY
 
                                      March 6, 2025
                                       ___________
 
        Introduced  by M. of A. BORES, CUNNINGHAM, KELLES, FORREST, CHANDLER-WA-
          TERMAN, TORRES, OTIS, LEVENBERG, GRIFFIN -- read once and referred  to
          the Committee on Science and Technology -- ordered to a third reading,
          amended  and  ordered  reprinted,  retaining its place on the order of
          third reading
 
        AN ACT to amend the general business law, in  relation  to  establishing
          the artificial intelligence training data transparency act
 
          The  People of the State of New York, represented in Senate and Assem-
        bly, do enact as follows:
 
     1    Section 1. The general business law is amended by adding a new article
     2  44-C to read as follows:
     3                                 ARTICLE 44-C
     4           ARTIFICIAL INTELLIGENCE TRAINING DATA TRANSPARENCY ACT
 
     5  Section 1430. Short title.
     6          1431. Definitions.
     7          1432. Data used  to  train  generative  artificial  intelligence
     8                  models or services.
     9          1433. Employee data used to train generative artificial intelli-
    10                  gence models or services.
    11    §  1430.  Short title. This act shall be known and may be cited as the
    12  "artificial intelligence training data transparency act".
    13    § 1431. Definitions. For the purposes of this article,  the  following
    14  terms shall have the following meanings:
    15    1.  "Artificial  intelligence" or "artificial intelligence technology"
    16  means a machine-based system that can, for a given set of  human-defined
    17  objectives,  make predictions, recommendations, or decisions influencing
    18  real or virtual environments, and that  uses  machine-  and  human-based
    19  inputs  to perceive real and virtual environments, abstract such percep-
    20  tions into models through analysis in an automated manner, and use model
    21  inference to formulate options for information or action.

         EXPLANATION--Matter in italics (underscored) is new; matter in brackets
                              [ ] is old law to be omitted.
                                                                   LBD07975-05-6

        A. 6578--A                          2
 
     1    2. "Developer" means a person, partnership, state or local  government
     2  agency,  or  corporation that designs, codes, produces, or substantially
     3  modifies an artificial intelligence model or service for use by  members
     4  of the public.
     5    3.  "Generative  artificial  intelligence"  means a class of AI models
     6  that are self-supervised and emulate the structure  and  characteristics
     7  of  input data to generate derived synthetic content, including, but not
     8  limited to, images, videos, audio, text, and other digital content.
     9    4. "Substantially modifies" or "substantial modification" means a  new
    10  version,  new release, or other update to a generative artificial intel-
    11  ligence model or service that materially changes  its  functionality  or
    12  performance, including the results of retraining or fine tuning.
    13    5.  "Synthetic  data generation" means a process in which seed data is
    14  used to create artificial data that have some of the statistical charac-
    15  teristics of the seed data.
    16    6. "Train a  generative  artificial  intelligence  model  or  service"
    17  includes  testing,  validating,  or  fine tuning by the developer of the
    18  artificial intelligence model or service.
    19    7. "Aggregate consumer information" means information that relates  to
    20  a  group  of  consumers,  from which individual consumer identities have
    21  been removed, that is not linked or reasonably linkable to any  consumer
    22  or  household,  including  via  a device. Aggregate consumer information
    23  does not mean one or more individual consumer  records  that  have  been
    24  de-identified.
    25    8.  "AI model" means an information system or component of an informa-
    26  tion system that implements artificial intelligence technology and  uses
    27  computational,  statistical,  or  machine-learning techniques to produce
    28  outputs from a given set of inputs.
    29    § 1432. Data used to train generative artificial  intelligence  models
    30  or  services.  1. On or before January first, two thousand twenty-seven,
    31  and prior to each time thereafter that a generative artificial  intelli-
    32  gence  model  or  service, or a substantial modification to a generative
    33  artificial intelligence model or service, released on or  after  January
    34  first,  two thousand twenty-two, is made publicly available to New York-
    35  ers for use, regardless of whether the terms of such use include compen-
    36  sation, the developer of such model or service shall post on the  devel-
    37  oper's website documentation regarding the data used by the developer to
    38  train the generative artificial intelligence model or service, including
    39  a  high-level  summary  of  the  datasets used in the development of the
    40  generative artificial intelligence model or service, including, but  not
    41  limited to:
    42    (a) the sources or owners of the datasets;
    43    (b)  a description of how the datasets further the intended purpose of
    44  the artificial intelligence model or service;
    45    (c) the number of data points included in the datasets, which  may  be
    46  in general ranges, and with estimated figures for dynamic datasets;
    47    (d) a description of the types of data points within the datasets. For
    48  purposes of this paragraph, the following definitions apply:
    49    (i) as applied to datasets that include labels, "types of data points"
    50  means the types of labels used; and
    51    (ii)  as  applied to datasets without labeling, "types of data points"
    52  refers to the general characteristics;
    53    (e) whether the datasets include  any  data  protected  by  copyright,
    54  trademark, or patent, or whether the datasets are entirely in the public
    55  domain;
    56    (f) whether the datasets were purchased or licensed by the developer;

        A. 6578--A                          3
 
     1    (g)  whether  the  datasets  include  personal information or personal
     2  identifying information, as defined in  section  eight  hundred  ninety-
     3  nine-aaa of this chapter;
     4    (h) whether the datasets include aggregate consumer information;
     5    (i)  whether there was any cleaning, processing, or other modification
     6  to the datasets by the developer,  including  the  intended  purpose  of
     7  those  efforts  in  relation  to  the  artificial  intelligence model or
     8  service;
     9    (j) the time period  during  which  the  data  in  the  datasets  were
    10  collected, including a notice if the data collection is ongoing;
    11    (k)  the  dates the datasets were first used during the development of
    12  the artificial intelligence model or service; and
    13    (l) whether the generative artificial intelligence  model  or  service
    14  used  or continuously uses synthetic data generation in its development.
    15  A developer may include a description of the functional need or  desired
    16  purpose of the synthetic data in relation to the intended purpose of the
    17  model or service.
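Illustrative sketch (not part of the bill): subdivision 1 lists twelve items, (a) through (l), that the posted documentation must cover, plus an optional description of synthetic-data purpose. A developer might track them as one record per dataset. The class name, field names, and the convention that None means "not yet documented" are all hypothetical; the bill prescribes the content of the posting, not any format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatasetSummary:
    """Hypothetical record of the items listed in section 1432(1)(a)-(l).

    None means the item has not been documented yet.
    """
    sources_or_owners: Optional[str] = None                  # (a)
    purpose_description: Optional[str] = None                # (b)
    data_point_count: Optional[str] = None                   # (c) may be a general range
    data_point_types: Optional[str] = None                   # (d) label types or general characteristics
    contains_ip_protected_data: Optional[bool] = None        # (e) copyright, trademark, or patent
    purchased_or_licensed: Optional[bool] = None             # (f)
    contains_personal_info: Optional[bool] = None            # (g)
    contains_aggregate_consumer_info: Optional[bool] = None  # (h)
    cleaning_or_processing: Optional[str] = None             # (i) including its intended purpose
    collection_period: Optional[str] = None                  # (j) note if collection is ongoing
    first_used_dates: Optional[str] = None                   # (k)
    uses_synthetic_data: Optional[bool] = None               # (l)
    synthetic_data_purpose: Optional[str] = None             # permitted, not required, under (l)


# Items the developer "may" include rather than "shall" include.
OPTIONAL_FIELDS = {"synthetic_data_purpose"}


def missing_items(summary: DatasetSummary) -> list[str]:
    """Return the names of required items that are still undocumented."""
    return [
        f.name
        for f in fields(summary)
        if f.name not in OPTIONAL_FIELDS and getattr(summary, f.name) is None
    ]


if __name__ == "__main__":
    draft = DatasetSummary(sources_or_owners="Licensed news archive",
                           uses_synthetic_data=False)
    # Two of the twelve required items are filled in, so ten remain.
    print(len(missing_items(draft)))
```

Note that an explicit False (for example, "no synthetic data was used") counts as documented; only None is treated as missing.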
    18    2.  A  developer shall not be required to post documentation regarding
    19  the data used to train a generative  artificial  intelligence  model  or
    20  service for any of the following:
    21    (a)  A  generative artificial intelligence model or service whose sole
    22  purpose is the operation of aircraft in the national airspace; or
    23    (b) A generative artificial intelligence model  or  service  developed
    24  for national security, military, or defense purposes that is made avail-
    25  able only to a federal entity.
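Illustrative sketch (not part of the bill): a simplified reading of when section 1432 requires posting. It assumes a release or substantial modification is covered if it was released on or after January 1, 2022 and made publicly available to New Yorkers, and it applies the two exemptions in subdivision 2. The names `ModelRelease` and `posting_required` are hypothetical, and the sketch ignores timing (the January 1, 2027 deadline and re-posting before each later release).

```python
from dataclasses import dataclass
from datetime import date

# Release-date threshold stated in section 1432(1).
COVERED_RELEASE_DATE = date(2022, 1, 1)


@dataclass
class ModelRelease:
    """Hypothetical description of a model, service, or substantial modification."""
    released_on: date
    publicly_available_in_ny: bool          # with or without compensation
    sole_purpose_aircraft_operation: bool   # exemption (a)
    national_security_purpose: bool         # national security, military, or defense
    available_only_to_federal_entity: bool  # second half of exemption (b)


def posting_required(release: ModelRelease) -> bool:
    """Simplified reading of whether section 1432 documentation must be posted."""
    if release.released_on < COVERED_RELEASE_DATE:
        return False
    if not release.publicly_available_in_ny:
        return False
    # Exemption (a): the model's sole purpose is operating aircraft in the national airspace.
    if release.sole_purpose_aircraft_operation:
        return False
    # Exemption (b): both conditions must hold at once.
    if release.national_security_purpose and release.available_only_to_federal_entity:
        return False
    return True
```

Exemption (b) is written as a conjunction because the text requires both a national security, military, or defense purpose and availability only to a federal entity; a defense model offered to the public would still be covered.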
    26    § 1433. Employee data used to train generative artificial intelligence
    27  models  or  services. 1. Any person, partnership, state or local govern-
    28  ment agency, or corporation that designs, codes, produces,  or  substan-
    29  tially  modifies  a  generative artificial intelligence model or service
    30  using data of which a  substantial  part  is  derived  from  individuals
    31  employed or contracted by the entity, regardless of whether the model is
    32  made  publicly available, shall ensure that the following information is
    33  disclosed to each employee whose data is used to  train  the  artificial
    34  intelligence model:
    35    (a)  the  intended  purpose  of  the  artificial intelligence model or
    36  service;
    37    (b) a description of how the collected datasets further  the  intended
    38  purpose of the artificial intelligence model or service;
    39    (c) a description of the types of data points within the datasets;
    40    (d)  whether  the  datasets  include  personal information or personal
    41  identifying information, as defined in  section  eight  hundred  ninety-
    42  nine-aaa of this chapter;
    43    (e)  the  dates the datasets were first used during the development of
    44  the artificial intelligence model or service; and
    45    (f) the time period  during  which  the  data  in  the  datasets  were
    46  collected, including a notice if the data collection is ongoing.
    47    2.  An  entity  that uses employee or contractor data to design, code,
    48  produce, or substantially modify a  generative  artificial  intelligence
    49  model  or  service  shall  not  be  required to disclose the information
    50  required by this section if the model or service:
    51    (a) is solely intended to be used in the operation of aircraft in  the
    52  national airspace; or
    53    (b)  is developed for national security, military, or defense purposes
    54  and only made available to a federal entity.
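Illustrative sketch (not part of the bill): section 1433 applies whether or not the model is public, is triggered when a substantial part of the training data comes from the entity's employees or contractors, and requires six items, (a) through (f), to be disclosed to each employee whose data is used. The bill does not define "substantial part," so the sketch takes that judgment as a boolean input instead of inventing a threshold. All class, function, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EmployeeDataNotice:
    """Hypothetical container for the items listed in section 1433(1)(a)-(f)."""
    intended_purpose: str           # (a)
    how_data_furthers_purpose: str  # (b)
    data_point_types: str           # (c)
    contains_personal_info: bool    # (d)
    first_used_dates: str           # (e)
    collection_period: str          # (f) note if collection is ongoing


def notice_required(substantial_part_from_workers: bool,
                    sole_purpose_aircraft_operation: bool,
                    national_security_only_federal: bool) -> bool:
    """Simplified reading of section 1433; public availability is irrelevant."""
    if not substantial_part_from_workers:
        return False
    # Subdivision 2 exemptions mirror those in section 1432(2).
    return not (sole_purpose_aircraft_operation or national_security_only_federal)


def render_notice(n: EmployeeDataNotice) -> str:
    """Format the six disclosure items as plain text, one per line."""
    return "\n".join([
        f"Intended purpose: {n.intended_purpose}",
        f"How the data furthers that purpose: {n.how_data_furthers_purpose}",
        f"Types of data points: {n.data_point_types}",
        f"Includes personal information: {'yes' if n.contains_personal_info else 'no'}",
        f"First used in development: {n.first_used_dates}",
        f"Collection period: {n.collection_period}",
    ])
```

Compared with section 1432, this notice omits dataset sources, counts, licensing, and synthetic-data items, and is owed to the affected workers rather than posted on a website.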
    55    § 2. This act shall take effect immediately.