Director of Data Services
As a Snowflake fan, I love when I get projects that come down the pipe that are Snowflake based. They are much simpler than SQL Server projects, I do not need to jump through hoops to get started downloading SSMS or Data Studio, getting provisioned etc. There are great native tools like the python client library. And probably the best thing is it is just plain fast.
However one of my least favorite things about Snowflake has always been the UI. It gets cluttered easily, lacks things like auto-complete, just overall feels clunky, and did I mention there is not an auto-complete?
So when Snowflake announced the new UI would be in open preview, I was pretty excited. The SQL Worksheets replacement, Snowsight, got a lot of love in the process. It adds some necessary features to the platform as well as some great nice to haves. Below are some of my favorites and how I have been using them to speed up my data analysis and discovery workflows.
It gets really easy to open up a sheet, see something you want to explore, then something else… and you keep going until all the sudden you have 15 worksheets. Some have some value to it, others are just there, but all of them are unorganized.
With Snowsight, I can hop right into a folder view where I have organized worksheets based on an analysis I was doing, which is typically the approach I have been taking. As a quick note, you cannot nest folders. What this allows me to do is limit the amount of worksheets I have open to just what I’m working on in that session.
Snowsight now maintains a version of a worksheet. This one does not really improve my workflow as much as it bails me out of a jam. The typical use case for me here is I am on my third or fourth iteration of a query that I have been working on for a few days, but all of the sudden I realize I have deleted or modified a section of the query, but I do not remember what. With the worksheet history I can very quickly buzz through old versions of the query. As an added bonus if the result cache is still available or that version it also returns.
This one is very convenient by ensuring my queries are consistent, and are flexible. With the worksheet filters I can easily parameterize components of my query. Out of the box Snowsight comes with daterange and datebucket. I’ve mostly been taking advantage of daterange in the context of queries I run frequently where I just swap out the time period. For example, I have a weekly set of data I deliver to one of our data scientists. We are currently in the process of operationalizing a data pipeline for the results, but currently it is still a manual process. So each week I go update the query with some new dates, and inevitably I have missed something and the results are off (typically a date). With the worksheets filter I can update those values accurately whenever I run from the UI without having to hunt down the line in the query.
Additionally to the two out of the box filters, you have the ability to do custom filters based on an itemized list or query results. This is also very convenient as you need to modify your query for things like a customer, business unit, etc.
This one is not the most exciting feature, but effective for collaboration. I use this extensively for confirming business logic, result sets, and just to have another pair of eyes on what I’ve put together. Hit the ‘Get Link’ button and that will copy a link to your clipboard that you can distribute vs copy and pasting the whole query into a chat or email. This can be done at the worksheet or folder levels. It is worth noting that there are more granular sharing settings when sharing from the folder level as you can see below.
Automatic Contextual Statistics
Now we are getting into the heavy hitters. How do you profile your data for analysis? For me it is usually a combination of pandas and pandas-profiling. It is a great method for exploring data, but it adds steps to my workflow. I have to leave the UI, transfer the data into a Jupyter notebook somehow, then if I see an issue or need to change the query, I have to go back and repeat the process. With the automatic contextual statistics, I can chop all that out and just stay in the Snowflake UI.
Once the query is run, I get some basic statistics about the column, data type and one of the following charts Filled/empty meters, Histograms, Frequency distributions, Email domain distributions, Key distributions. This becomes very handy when looking at large data sets.
This is the scenario where this feature really caught my eye. I had a query where a column should not be empty, however given the large number of rows I would have to either spot check or export to a Jupyter notebook. However with the statistics in place I could easily check and see that I indeed have empty values.
If I had spot checked the results I would have missed the empty values, and if I exported to a notebook I would have added time to completing my query.
Filtering based on statistics
This is where the statistics really shine. Given that same example above where my DISTRIBUTION_WEEK had empty values. Had this been the old UI or another platform, I would have needed to add a where clause to my query, rerun, then identified what the issues were. However with the statistics in place all I have to do is click on the text of ‘0.01% blank’ and my result set is automatically filtered. This works great for trouble shooting or for just general data exploration. I can use any of the elements in the DESTINATION_CITY list to filter down my result set.
Boy oh boy, this is the one that does it for me. Above all else the autocomplete feature has significantly sped up my time to deliver results. This was a common frustration I would hear from members of my team about using worksheets. Snowflake really did a good job here, not only does auto complete work on things like columns and table names (including aliased tables), but also on functions where you get syntax help and a description and a link to the docs in a hover over.
In addition to the above there are some great new features that are included:
SQL Formatting — Write messy SQL? Click of a button Snowflake will format your query.
Duplicate worksheet — If you need to quickly modify a worksheet but you do not want to change your original you can quickly make a duplicate.
Data Visualization — This is a really great feature, I typically do not need visualizations which is why I did not mention it above, but you can now build charts and dashboards within the Snowflake UI. This is actually really big as Snowflake continues to take major steps in being a true data platform.
Schema Explorer — Lots of enhancements here to the old UI. You can now visually add tables and their columns (including data types) to a canvas. This way you can have reference points for multiple tables as you write your query.
Features I would still like to see in the new UI
I am positive that the Snowflake team are going to continue to iterate on Snowsight and the new UI in general. With that being said here is my wishlist of enhancements:
Git Integration — As a data engineer being able to tie into my version control platform. This way I can ensure the code that is running in my pipeline functions as I expect. Just configure the authentication, and have my queries in my repo populate a folder.
Execution details — This is an existing feature in the old UI that just hasn’t made its way to Snowsight yet. Just some quick stats about the execution.
Column selections — Another existing feature, it really shines when you are working with very wide result sets. Simply being able to select a subset of columns from your results.
Snowsight is currently in open preview so be sure to check out the features above and all of the other features I didn’t mention that are outlined in the Snowflake docs. How has the new UI and Snowsight helped you out? Let me know in the comments, and share what you like and what you are looking forward to.