Exploring Azure Data Lake with VSCode

Microsoft recently released new functionality within VSCode to easily explore the contents of your Azure Data Lake. Previously, your options were to explore the Data Lake Store and U-SQL catalog via Visual Studio or the Azure Portal.

Visual Studio Code (VSCode) is a modern editor where you can develop in a multitude of languages, with extension support for products such as Azure Data Lake Analytics.

Contents

VSCode
Explore the Data Lake Store
Explore the U-SQL Catalog


VSCode

Installing any extension within Visual Studio Code is a simple two-step process:

  • Open Visual Studio Code
  • Install Azure Data Lake Tools Extension

A quick rundown of the left-hand toolbar, from top to bottom:

  • Explorer
  • Search bar
  • Source Control
  • Debug
  • Extensions


Click on the last icon, "Extensions", and search for "Azure Data Lake Tools". Go ahead and hit Install; VSCode may ask you to restart. In the image below, I have already installed the extension, but it is disabled for the purposes of this tutorial.
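If you would rather script the install than click through the Extensions view, the VSCode command-line launcher accepts an `--install-extension` flag. The snippet below is a minimal sketch, not an official procedure: the extension identifier `usqlextpublisher.usql-vscode-ext` is an assumption that you should verify on the extension's Marketplace page, and the helper names are hypothetical.

```python
import shutil
import subprocess

# Assumed Marketplace identifier for Azure Data Lake Tools.
# Verify it on the extension's Marketplace page before relying on it.
EXTENSION_ID = "usqlextpublisher.usql-vscode-ext"


def build_install_command(extension_id=EXTENSION_ID, code_cli="code"):
    """Return the argument list for installing an extension via the VSCode CLI."""
    return [code_cli, "--install-extension", extension_id]


def install_extension(extension_id=EXTENSION_ID, code_cli="code"):
    """Run the install command; raises if the VSCode CLI is not on PATH."""
    resolved = shutil.which(code_cli)
    if resolved is None:
        raise FileNotFoundError(f"'{code_cli}' was not found on PATH")
    # check=True raises CalledProcessError if the CLI reports a failure.
    return subprocess.run(build_install_command(extension_id, resolved), check=True)
```

Calling `install_extension()` has the same effect as hitting Install in the Extensions view, provided the `code` launcher is on your PATH.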


After the extension has been installed or reloaded, navigate back to the first icon, which is the Explorer.


The DataLake Explorer now displays in our Explorer. From the image above we see the following:

  • Azure Subscriptions
  • Azure Data Lake Instances
  • Azure Data Lake Stores
  • Azure Data Lake Analytics (U-SQL Databases)

Explore the Data Lake Store

If you click on your Data Lake Store Account under Storage Accounts, you are able to navigate through your Data Lake Store hierarchy. Here, for the purposes of exploring the GDELT dataset, I have data stored in the following hierarchy:

\DataLake\{tier}\{DataSource}\{DataSet}\{year}\{month}\{day}\file.csv
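To make the folder convention concrete, here is a small sketch that assembles a path in this layout from its parts. The function name, the example values (`raw`, `gdelt`, `events`), and the zero-padded month and day are all assumptions for illustration; the layout above does not say how the date folders are formatted.

```python
from datetime import date


def build_lake_path(tier, data_source, data_set, day, file_name="file.csv", sep="\\"):
    """Assemble a path following the tier/source/set/year/month/day layout."""
    parts = [
        "DataLake",
        tier,
        data_source,
        data_set,
        f"{day.year:04d}",
        f"{day.month:02d}",  # zero-padding is an assumption
        f"{day.day:02d}",    # zero-padding is an assumption
        file_name,
    ]
    return sep + sep.join(parts)


# Hypothetical tier, source, and dataset names.
print(build_lake_path("raw", "gdelt", "events", date(2018, 3, 5)))
```

Partitioning by date in this way keeps each day's files in their own folder, which makes it easy to browse to a specific day in the explorer.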


Explore the U-SQL Catalog

Looking again at the image below, we have access to the full U-SQL Catalog. One notable difference from the Data Lake Explorer in Visual Studio is that tables are grouped under their associated schema.
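The schema grouping can be pictured as turning a flat list of schema-qualified table names into a small tree. The sketch below only illustrates that idea; it is not how the extension is implemented, and the table names are hypothetical.

```python
from collections import defaultdict


def group_tables_by_schema(qualified_names):
    """Group 'schema.table' names into {schema: [tables]}, sorted for display."""
    grouped = defaultdict(list)
    for name in qualified_names:
        schema, table = name.split(".", 1)
        grouped[schema].append(table)
    return {schema: sorted(tables) for schema, tables in sorted(grouped.items())}


# Hypothetical catalog contents.
tables = ["gdelt.Mentions", "dbo.Events", "gdelt.Events"]
for schema, names in group_tables_by_schema(tables).items():
    print(schema)
    for table in names:
        print("  " + table)
```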

Microsoft has made great strides with Visual Studio Code. You no longer need to have multiple IDEs open in order to work on a project. I can write my PowerShell right alongside my U-SQL scripts.