{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a6f1bbe4",
   "metadata": {},
   "source": [
    "___We recommend working with this notebook on Google Colab___\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ridatadiscoverycenter/riddc-jbook/blob/main/riddc/notebooks/fox-kemper/noaa_coops_download.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cf680d0",
   "metadata": {},
   "source": [
    "# Downloading Tide Data from NOAA CO-OPS API"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c7e9278a",
   "metadata": {},
   "source": [
    "Author of this document: Timothy Divoll   [<img width=\"45\" height=\"15\" src=\"https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white\">](https://github.com/tdivoll)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3258b417",
   "metadata": {},
   "source": [
    "The purpose of this notebook is to demonstrate how to download data from NOAA's CO-OPS Data API. In this example, data are parsed into a dataframe and also written to CSV files (as needed for use in the `Assessing Accuracy of the Tide Predictions of the Ocean State Ocean Model` notebook)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9d2d0d8",
   "metadata": {},
   "source": [
    "If needed, the dataframes can also be saved and exported in other common formats."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0afff31d",
   "metadata": {},
   "source": [
    "First, we need to install the `noaa_coops` Python wrapper (https://github.com/GClunies/noaa_coops)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "bf1ba2aa",
   "metadata": {},
   "outputs": [
   ],
   "source": [
    "# noaa_coops sends requests to the NOAA CO-OPS Data API\n",
    "%pip install noaa_coops -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "9cc6c5ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import dependencies and ignore warnings in code outputs\n",
    "import noaa_coops\n",
    "import pandas as pd\n",
    "from datetime import datetime\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d8b2129",
   "metadata": {},
   "source": [
    "### Direct data download - example\n",
    "First, make a list of all the stations to pull data for."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "0eca4516",
   "metadata": {},
   "outputs": [],
   "source": [
    "station_list = ['8461490', '8510560', '8447930', '8449130', '8452660', '8454049', '8447386', '8452944', '8454000']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cdc9483",
   "metadata": {},
   "source": [
    "The next cell requests data directly from the NOAA CO-OPS API rather than downloading it from the webpage. It pulls data for a single example station; a later code block loops through the station list to pull data for every station."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "7d052486",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# send a request to the CO-OPS API for Data Retrieval\n",
    "# https://api.tidesandcurrents.noaa.gov/api/prod/\n",
    "# MLLW = mean lower low water\n",
    "\n",
    "# New London, CT example\n",
    "new_london = noaa_coops.Station(station_list[0])  # use a different index for a different station\n",
    "new_london_verified = new_london.get_data(\n",
    "    begin_date = \"20120601\",\n",
    "    end_date = \"20220616\",\n",
    "    product = \"water_level\",\n",
    "    datum = \"MLLW\",\n",
    "    units = \"metric\",\n",
    "    time_zone = \"gmt\",\n",
    "    interval = \"h\")\n",
    "new_london_predicted = new_london.get_data(\n",
    "    begin_date = \"20120601\",\n",
    "    end_date = \"20220616\",\n",
    "    product = \"predictions\",\n",
    "    datum = \"MLLW\",\n",
    "    units = \"metric\",\n",
    "    time_zone = \"gmt\",\n",
    "    interval = \"h\")\n",
    "\n",
    "# merge verified and predicted, then rename columns to match the `readcsv` function\n",
    "new_london_df = (\n",
    "    pd.merge(new_london_verified, new_london_predicted, on=\"date_time\")\n",
    "    .drop(columns=[\"sigma\", \"flags\", \"QC\"])\n",
    "    .rename(columns={\"water_level\": \"Verified (m)\", \"predicted_wl\": \"Predicted (m)\"})\n",
    "    .reset_index()\n",
    ")\n",
    "new_london_df[\"Date\"] = new_london_df[\"date_time\"].dt.strftime(\"%m/%d/%Y\")\n",
    "new_london_df[\"Time (GMT)\"] = new_london_df[\"date_time\"].dt.strftime(\"%H:%M\")\n",
    "\n",
    "# see the next example to save data to Google Drive or to a local folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "c6509306",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date_time</th>\n",
       "      <th>Verified (m)</th>\n",
       "      <th>Predicted (m)</th>\n",
       "      <th>Date</th>\n",
       "      <th>Time (GMT)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004-01-01 00:00:00</td>\n",
       "      <td>0.353</td>\n",
       "      <td>0.408</td>\n",
       "      <td>01/01/2004</td>\n",
       "      <td>00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2004-01-01 01:00:00</td>\n",
       "      <td>0.238</td>\n",
       "      <td>0.311</td>\n",
       "      <td>01/01/2004</td>\n",
       "      <td>01:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2004-01-01 02:00:00</td>\n",
       "      <td>0.119</td>\n",
       "      <td>0.194</td>\n",
       "      <td>01/01/2004</td>\n",
       "      <td>02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2004-01-01 03:00:00</td>\n",
       "      <td>0.101</td>\n",
       "      <td>0.104</td>\n",
       "      <td>01/01/2004</td>\n",
       "      <td>03:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2004-01-01 04:00:00</td>\n",
       "      <td>0.118</td>\n",
       "      <td>0.098</td>\n",
       "      <td>01/01/2004</td>\n",
       "      <td>04:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            date_time  Verified (m)  Predicted (m)        Date Time (GMT)\n",
       "0 2004-01-01 00:00:00         0.353          0.408  01/01/2004      00:00\n",
       "1 2004-01-01 01:00:00         0.238          0.311  01/01/2004      01:00\n",
       "2 2004-01-01 02:00:00         0.119          0.194  01/01/2004      02:00\n",
       "3 2004-01-01 03:00:00         0.101          0.104  01/01/2004      03:00\n",
       "4 2004-01-01 04:00:00         0.118          0.098  01/01/2004      04:00"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_london_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "554b6ee0",
   "metadata": {},
   "source": [
    "## Set up a destination to save results\n",
    "\n",
    "Execute the commands below directly in this notebook to connect to your Google Drive (via Colab) or set a path locally.\n",
    "\n",
    "**NOTE #1: If you are working in Google Colab, the next block will mount a results folder in Google Drive; otherwise, the results folder will be created in the current local directory.**\n",
    "\n",
    "**NOTE #2: In Colab, the following command will open a pop-up window requesting access to your Google Drive file system.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "c00d8c72",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive\n",
    "    drive.mount('/content/gdrive/', force_remount=True)\n",
    "    results_path = \"/content/gdrive/MyDrive/noaa_coops_tide_data\"\n",
    "except ModuleNotFoundError:\n",
    "    # not running in Colab: use a folder in the current local directory\n",
    "    results_path = os.path.join(\".\", \"noaa_coops_tide_data\")\n",
    "\n",
    "# create the results folder; exist_ok avoids an error when the cell is re-run\n",
    "os.makedirs(results_path, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b18e28f2",
   "metadata": {},
   "source": [
    "The following code chunk pulls data for all stations in the list. It exports each station's data to a CSV in the format expected by `readcsv` in the `Assessing Accuracy of the Tide Predictions of the Ocean State Ocean Model` notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "f0b0e379",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Loop through each station and retrieve data\n",
    "# Use the station list defined in an earlier code block\n",
    "# Only one year of data is extracted for this example, but the date range can be changed\n",
    "\n",
    "station_frames = []  # one dataframe per station\n",
    "for station_id in station_list:\n",
    "    station_data = noaa_coops.Station(station_id)\n",
    "    station_data_verified = station_data.get_data(\n",
    "        begin_date = \"20210616\",\n",
    "        end_date = \"20220616\",\n",
    "        product = \"water_level\",\n",
    "        datum = \"MLLW\",\n",
    "        units = \"metric\",\n",
    "        time_zone = \"gmt\",\n",
    "        interval = \"h\")\n",
    "    station_data_predicted = station_data.get_data(\n",
    "        begin_date = \"20210616\",\n",
    "        end_date = \"20220616\",\n",
    "        product = \"predictions\",\n",
    "        datum = \"MLLW\",\n",
    "        units = \"metric\",\n",
    "        time_zone = \"gmt\",\n",
    "        interval = \"h\")\n",
    "    # merge verified and predicted, then rename columns to match the `readcsv` function\n",
    "    results_df = (\n",
    "        pd.merge(station_data_verified, station_data_predicted, on=\"date_time\")\n",
    "        .drop(columns=[\"sigma\", \"flags\", \"QC\"])\n",
    "        .rename(columns={\"water_level\": \"Verified (m)\", \"predicted_wl\": \"Predicted (m)\"})\n",
    "        .reset_index()\n",
    "    )\n",
    "    results_df[\"Station ID\"] = station_data.name\n",
    "    results_df[\"Date\"] = results_df[\"date_time\"].dt.strftime(\"%m/%d/%Y\")\n",
    "    results_df[\"Time (GMT)\"] = results_df[\"date_time\"].dt.strftime(\"%H:%M\")\n",
    "    # write this station's data to its own CSV file\n",
    "    results_df.to_csv(f'{results_path}/{station_data.name}_tide_data.csv')\n",
    "    station_frames.append(results_df)\n",
    "\n",
    "# combine all stations into one dataframe (DataFrame.append was removed in pandas 2.0)\n",
    "results = pd.concat(station_frames)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e02bd41",
   "metadata": {},
   "source": [
    "View the full dataframe containing data from all stations. Note that the head and tail of the `results` dataframe show data from different stations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "8b7cc7de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date_time</th>\n",
       "      <th>Verified (m)</th>\n",
       "      <th>Predicted (m)</th>\n",
       "      <th>Station ID</th>\n",
       "      <th>Date</th>\n",
       "      <th>Time (GMT)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2021-01-01 00:00:00</td>\n",
       "      <td>0.223</td>\n",
       "      <td>0.241</td>\n",
       "      <td>New London</td>\n",
       "      <td>01/01/2021</td>\n",
       "      <td>00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2021-01-01 01:00:00</td>\n",
       "      <td>0.410</td>\n",
       "      <td>0.431</td>\n",
       "      <td>New London</td>\n",
       "      <td>01/01/2021</td>\n",
       "      <td>01:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2021-01-01 02:00:00</td>\n",
       "      <td>0.558</td>\n",
       "      <td>0.564</td>\n",
       "      <td>New London</td>\n",
       "      <td>01/01/2021</td>\n",
       "      <td>02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2021-01-01 03:00:00</td>\n",
       "      <td>0.647</td>\n",
       "      <td>0.637</td>\n",
       "      <td>New London</td>\n",
       "      <td>01/01/2021</td>\n",
       "      <td>03:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2021-01-01 04:00:00</td>\n",
       "      <td>0.676</td>\n",
       "      <td>0.639</td>\n",
       "      <td>New London</td>\n",
       "      <td>01/01/2021</td>\n",
       "      <td>04:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13171</th>\n",
       "      <td>2022-06-16 19:00:00</td>\n",
       "      <td>0.084</td>\n",
       "      <td>-0.006</td>\n",
       "      <td>Providence</td>\n",
       "      <td>06/16/2022</td>\n",
       "      <td>19:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13172</th>\n",
       "      <td>2022-06-16 20:00:00</td>\n",
       "      <td>0.137</td>\n",
       "      <td>0.004</td>\n",
       "      <td>Providence</td>\n",
       "      <td>06/16/2022</td>\n",
       "      <td>20:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13173</th>\n",
       "      <td>2022-06-16 21:00:00</td>\n",
       "      <td>0.303</td>\n",
       "      <td>0.182</td>\n",
       "      <td>Providence</td>\n",
       "      <td>06/16/2022</td>\n",
       "      <td>21:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13174</th>\n",
       "      <td>2022-06-16 22:00:00</td>\n",
       "      <td>0.523</td>\n",
       "      <td>0.373</td>\n",
       "      <td>Providence</td>\n",
       "      <td>06/16/2022</td>\n",
       "      <td>22:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13175</th>\n",
       "      <td>2022-06-16 23:00:00</td>\n",
       "      <td>0.678</td>\n",
       "      <td>0.588</td>\n",
       "      <td>Providence</td>\n",
       "      <td>06/16/2022</td>\n",
       "      <td>23:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>118584 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                date_time  Verified (m)  Predicted (m)  Station ID  \\\n",
       "0     2021-01-01 00:00:00         0.223          0.241  New London   \n",
       "1     2021-01-01 01:00:00         0.410          0.431  New London   \n",
       "2     2021-01-01 02:00:00         0.558          0.564  New London   \n",
       "3     2021-01-01 03:00:00         0.647          0.637  New London   \n",
       "4     2021-01-01 04:00:00         0.676          0.639  New London   \n",
       "...                   ...           ...            ...         ...   \n",
       "13171 2022-06-16 19:00:00         0.084         -0.006  Providence   \n",
       "13172 2022-06-16 20:00:00         0.137          0.004  Providence   \n",
       "13173 2022-06-16 21:00:00         0.303          0.182  Providence   \n",
       "13174 2022-06-16 22:00:00         0.523          0.373  Providence   \n",
       "13175 2022-06-16 23:00:00         0.678          0.588  Providence   \n",
       "\n",
       "             Date Time (GMT)  \n",
       "0      01/01/2021      00:00  \n",
       "1      01/01/2021      01:00  \n",
       "2      01/01/2021      02:00  \n",
       "3      01/01/2021      03:00  \n",
       "4      01/01/2021      04:00  \n",
       "...           ...        ...  \n",
       "13171  06/16/2022      19:00  \n",
       "13172  06/16/2022      20:00  \n",
       "13173  06/16/2022      21:00  \n",
       "13174  06/16/2022      22:00  \n",
       "13175  06/16/2022      23:00  \n",
       "\n",
       "[118584 rows x 6 columns]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.7 64-bit ('3.9.7')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "vscode": {
   "interpreter": {
    "hash": "b240976dc37aaf1a529ebe7133fc70f8114476d5871f1696eba5bdf4fd2ca117"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}