************
Data Catalog
************

Each data source provide some datasets/climate variables, and the catalog class is the only way
for you to know what data are available at a certain date at a specific location.

The data catalog is a dictionary with the available datasets as keys and the attributes that
describe each dataset are stored in an another dictionary.


------
CHIRPS
------

.. code-block:: py

    from earth2observe.chirps import Catalog
    print(chirps_catalog.catalog)

    >>> {
    >>> 'Precipitation':
    >>>         {
    >>>             'descriptions': 'rainfall [mm/temporal_resolution]',
    >>>             'units': 'mm/temporal_resolution',
    >>>             'temporal resolution': ['daily', 'monthly'],
    >>>             'file name': 'rainfall',
    >>>             'var_name': 'R'
    >>>         }
    >>> }


-----
ECMWF
-----

.. code-block:: py

    from earth2observe.ecmwf import Catalog
    catalog = Catalog()

    >>> {
    >>>     'version': 1,
    >>>     'datasets': [
    >>>                     'cams_gfas', 'cams_nrealtime', 'cera20c', 'cera_sat', 'era15', 'era20c', 'era20cm', 'era20cmv0',
    >>>                     'era40', 'geff_reanalysis', 'icoads', 'interim', 'interim_land', 'ispd', 'macc', 'macc_nrealtime',
    >>>                     's2s', 'tigge', 'uerra', 'yopp', 'yotc'
    >>>                ],
    >>>     'variables': [
    >>>                     'T', '2T', 'SRO', 'SSRO', 'WIND', '10SI', 'SP', 'Q', 'SSR', 'R', 'E', 'SUND', 'RO', 'TP', '10U',
    >>>                     '10V', '2D', 'SR', 'AL', 'HCC'
    >>>                 ],
    >>>     'T': {
    >>>             'descriptions': 'Temperature [K]',
    >>>             'units': 'C',
    >>>             'types': 'state',
    >>>             'temporal resolution': ['six hours', 'daily', 'monthly'],
    >>>             'file name': 'Tair2m',
    >>>             'download type': 3,
    >>>             'number_para': 130,
    >>>             'var_name': 't',
    >>>         },
    >>> .....
    >>> .....
    >>> }

- To get the attributes for a specific variable for example the Evaporation `E`

.. code-block:: py

    var = "E"
    catalog.get_variable(var)

    >>> {
    >>>     'descriptions': 'Evaporation [m of water]',
    >>>     'units': 'mm',
    >>>     'types': 'flux',
    >>>     'temporal resolution': ['six hours', 'daily', 'monthly'],
    >>>     'file name': 'Evaporation',
    >>>     'download type': 2,
    >>>     'number_para': 182,
    >>>     'var_name': 'e'
    >>> }


---------
Amazon-S3
---------
- for Amazon S3 the data depends on the aws bucket, so the catalog object has to initialize a connection to the
    bucket and check the data inside the bucket

.. code-block:: py

    from earth2observe.s3 import Catalog
    s3_catalog = Catalog()
    print(s3_catalog.catalog)

    >>> {
    >>>     'precipitation': {
    >>>                         'descriptions': 'rainfall [mm/temporal_resolution]',
    >>>                         'units': 'mm/temporal_resolution',
    >>>                         'temporal resolution': ['daily', 'monthly'],
    >>>                         'file name': 'rainfall',
    >>>                         'var_name': 'R',
    >>>                         'bucket_name': 'precipitation_amount_1hour_Accumulation'
    >>>                     }
    >>> }

- As you can see the attribute descibes the same climate variable like precipitation differs from one data source to
    another

- To get the attributes for a specific climate variable.

.. code-block:: py

    s3_catalog.get_variable("precipitation")

    >>> {
    >>>     'descriptions': 'rainfall [mm/temporal_resolution]',
    >>>     'units': 'mm/temporal_resolution',
    >>>     'temporal resolution': ['daily', 'monthly'],
    >>>     'file name': 'rainfall',
    >>>     'var_name': 'R',
    >>>     'bucket_name': 'precipitation_amount_1hour_Accumulation'
    >>> }

- To get the time span of the precipitation data.

.. code-block:: py

    years = s3_catalog.get_available_years()
    print(years)
    >>> [
    >>>     '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992',
    >>>     '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006',
    >>>     '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020',
    >>>     '2021', '2022', 'QA', 'zarr'
    >>> ]


.. note::

    The catalog is still in the development phase, ideally the catalog will be json file containing all the available
    data provided by each data source, at the moment the Amazon S3 data source catalog contains only the