- Videos: 177
- Views: 2,264,086
Bryan Cafferky
USA
Joined Oct 31, 2011
Technical training on Big Data, Machine Learning, Spark, Azure, Databricks, SQL Server, and automation with PowerShell and state-of-the-art technology topics. Primary topics include:
- Databricks and Apache Spark
- Python
- Developing Apps with Python
- PowerShell
- Azure Cloud Data Services
- R Programming
- Structured Query Language (SQL) and Databases
What to Learn for Your Career Path
The most common question I get asked is "What do I need to learn?" related to a given career path like Data Engineer, Data Scientist, Data Analyst, or other role. The answer is simpler than you may think. Join me as I resolve this question and put you on the right path.
Support me on Patreon
www.patreon.com/bePatron?u=63260756
My Playlists
www.youtube.com/@BryanCafferky/playlists
Slides from this Video
github.com/bcafferky/shared/blob/master/WhatToLearn/WhatToLearn.pdf
Views: 376
Videos
About My Channel
344 views, 1 day ago
In the digital age, people are confused by the myriad of complex technologies. My channel is all about taking complex things, breaking them down, and making them simple to understand. Once you have that, the rest is easy. Let me tell you about what my channel is about and why you should subscribe. My Playlists www.youtube.com/@BryanCafferky/playlists Master Databricks & Apache Spark ruclips.net...
Quick Review of the Best App Dev Tools & Services
427 views, 14 days ago
Whether you want to showcase your data analytics skills or just create a cool app, you'll need an app development tool. There are a lot of tools out there to build and deploy web and mobile apps, but which are the best, i.e., easiest to use, maintain, and deploy? With a focus on data-centric apps, I'll review my favorites and discuss the pros and cons of the runners-up. Support me on Patreon www....
Master Databricks and Apache Spark Step by Step: Series Update - What's Changed?
1.2K views, 1 month ago
Since this series was uploaded, Databricks has added a lot of powerful new services and enhanced existing services. In this video, I will give a summary of what has changed since I uploaded the series. Spoiler Alert! The series is still valid and an ideal jump start on Databricks and Apache Spark. Watch the video to understand why.
Understanding Databricks & Apache Spark Performance Tuning: Lesson 02 - Spark Hardware
1.2K views, 1 month ago
Following up on Databricks Performance Tuning with the best place to start: allocating Spark clusters. If you don't allocate sufficient resources, nothing else will fix the problem. How many nodes? How large should the driver and workers be? Do you need GPUs or CPUs? Should you use Photon? These and many more questions will be covered in detail. Support me on Patreon www.patreon.com/bePatron?u=...
Understanding Databricks & Apache Spark Performance Tuning: Lesson 01 - Spark Architecture
2.8K views, 3 months ago
A popular interview question and a critical topic for all Databricks and Spark developers, how do you tune and optimize Spark queries? This video provides a conceptual understanding of where things can go wrong as a starting point to understanding performance tuning and optimization. Support me on Patreon www.patreon.com/bePatron?u=63260756 Slides github.com/bcafferky/shared/blob/master/Databri...
Master Dimensional Modeling Lesson 02 - The 4 Step Process
2.1K views, 3 months ago
Dimensional Modeling is the process of developing The Star Schema, a popular and effective way to organize your data to maximize business value. In this video, you will learn about the 4 steps in the Dimensional Modeling process. Support me on Patreon www.patreon.com/bePatron?u=63260756 Slides github.com/bcafferky/shared/blob/master/MasterDimensionalModeling/lesson_02/lesson02_DimModelingSteps0...
Master Dimensional Modeling Lesson 01 - Why Use a Dimensional Model?
4.6K views, 4 months ago
Dimensional Modeling is a popular and effective way to organize your data to maximize business value. In this video, you will learn what a Dimensional Model, aka a Star Schema is and why you should use them to organize your data warehouse. Support me on Patreon www.patreon.com/bePatron?u=63260756 Slides github.com/bcafferky/shared/blob/master/MasterDimensionalModeling/lesson_01/DimModelingWhy_l...
Data Architecture vs. Data Engineering Deep Dive
3.3K views, 4 months ago
Are you an aspiring Data Architect? Join me as I explain what Data Architecture is and what Data Engineering is with in-depth explanations and examples. I draw on decades of experience as a Data Engineer and Data Architect to give you time tested advice and best practices. Support me on Patreon www.patreon.com/bePatron?u=63260756 Slides available here: github.com/bcafferky/shared/blob/master/Da...
Master Data Workload Automation: Introduction
1.3K views, 5 months ago
Automating your data workloads is essential in today's mission critical data driven businesses but what is the best way to do it? There are two basic choices: job schedulers and data orchestrators. I'll explain what they are, review some examples, and explain when to use each. Support me on Patreon www.patreon.com/bePatron?u=63260756 Slides available here: github.com/bcafferky/shared/blob/maste...
Streamlit for Dummies: Lesson 3 - Using Advanced Features
944 views, 5 months ago
This video builds on lesson 2 by using Streamlit's advanced features like state management, layout control, animations, and more. Using these features is crucial to building professional apps. Support me on Patreon www.patreon.com/bePatron?u=63260756 Code: github.com/bcafferky/shared/blob/master/Streamlit/lesson03/lesson03.zip My Video on Using Python Virtual Environments: ruclips.net/video/bjU...
Streamlit for Dummies: Lesson 2 - Writing Your First App
1.3K views, 6 months ago
Streamlit is a fun and easy way to create interactive web apps in Python. In this video I show you how to code a simple Streamlit app, i.e. a game. Support me on Patreon www.patreon.com/bePatron?u=63260756 Code: github.com/bcafferky/shared/blob/master/Streamlit/lesson02/lesson02_basic.py My Video on Using Python Virtual Environments: ruclips.net/video/bjUjNSotYgA/видео.html My Streamlit Introdu...
Python Streamlit for Dummies
6K views, 7 months ago
Streamlit is a fun and easy way to create interactive web apps in Python. Join me as I explain how you can get started using this powerful framework to have fun, build a data analytics online work portfolio, and add a valuable skill to your resume. Support me on Patreon www.patreon.com/bePatron?u=63260756 Slides: github.com/bcafferky/shared/blob/master/Streamlit/lesson01/StreamlitForDummies.pdf...
Python Virtual Environments & The Facts of Life
1.3K views, 8 months ago
If you develop Python programs, you need to use Virtual Environments! In this video, I'll explain what they are, how to use them, and The Facts of Life. Support me on Patreon www.patreon.com/bePatron?u=63260756 Video Slides github.com/bcafferky/shared/blob/master/PythonVirtualEnvironments/PythonVirtualEnvs.zip
How and When to Use Databricks Identity Column
2K views, 8 months ago
Databricks added support for Identity Columns similar to the same feature found in relational databases. How do you use it? Should you use it? How does it differ from Identity columns on relational databases? Before you use the Identity Column feature, you need to watch this video. Support me on Patreon www.patreon.com/bePatron?u=63260756 Video Slides github.com/bcafferky/shared/blob/master/Dat...
How to Create Databricks Workflows (new features explained)
11K views, 9 months ago
Introduction to my Online Guide to my YouTube Videos
709 views, 9 months ago
Should You Use Databricks Delta Live Tables?
5K views, 9 months ago
Scale Up Your Databricks Coding with Databricks AI Assistant
2.3K views, 10 months ago
Core Databricks: Understand the Hive Metastore
13K views, 10 months ago
Python Pro! Understand Variable Scopes
732 views, 10 months ago
Creating Decorators on Steroids: Adding Custom Parameters
553 views, 11 months ago
Python for Data Engineers: Using Function Decorators
1.8K views, 1 year ago
Advanced Python Programming: Using Functions as First Class Objects
2.3K views, 1 year ago
How to Build a Delta Live Table Pipeline in Python
14K views, 1 year ago
Understanding Delta File Logs Part 3 - The Deep Dive
1.8K views, 1 year ago
Understanding Delta File Logs Part 2 - Demonstrating Transactions
3K views, 1 year ago
Understanding Delta File Logs - The Heart of the Delta Lake
7K views, 1 year ago
Understanding Delta Lake - The Heart of the Data Lakehouse
6K views, 1 year ago
Great video. Many thanks for sharing these thoughts. I like that you emphasized familiarity and mastery, and that you need to make a decision on where to dedicate your time to master.
Honestly, in this day and age, the biggest weak points I see in people starting in the field are not that they lack tools, like knowledge of programming languages or platforms like Databricks. If they don't know them yet, they are mostly quick to learn. If there is an actual issue, it lies much deeper: a lack of understanding of core concepts like data normalization, dimension and fact tables, and measures in multidimensional data models, or being unable to derive architecture and data requirements from talking to a customer. Those are difficult hurdles for beginners. I feel like people rush to learn tools before learning what to do with them. My analogy is someone who has mastered the tools of carpentry, like the saw and hammer, but still has no good idea how to use them to build a good chair ^^
From the bottom of my heart, I just want to say thank you so much for this, I work as a data engineer but started off as a web developer, I have never really known how to actually organise the work of a data engineer and you just helped me with that. Now i know what and where exactly to focus on. Thanks once again.
So glad this video was helpful. Thanks for your comment.
Your content is very appreciated!! So practical and direct to the point! Any recommendation for Data Architect (or what they sometimes call Data cloud and analytics Architect), any suggestions for such career path in terms of the core knowledge etc? Thanks
Ideally, a Data Engineer would progress to being a Data Architect but DAs need to think at a broader level and consider the implications of their design and architectural decisions. I have found not all data engineers make good architects b/c they are too in the weeds and can't see the big picture. Basic tech skills for the DA includes everything of the DE plus broader knowledge esp. in the orange and blue band that I put for the DE. See this video for more information: ruclips.net/video/cI2dYnM5Kzo/видео.html Thanks, Bryan
@@BryanCafferky yes, of course I have watched this vid! So nice! However I don’t know if coming from a data science background makes things a bit different. I just don’t know what makes a great architect!
Thanks
It's been 6 years since this video taken, interface almost completely changed. Can you make a new demo please?
Actually, that's not correct. Purview is a replacement for Azure Data Catalog. However, ADC has been retired I can see from this blog. learn.microsoft.com/en-us/azure/data-catalog/overview That's the thing. MS promotes something as the best thing ever then quietly drops it. 😞
Awesome series.
Such a great channel, thank you Bryan.
Thank you so much
Hey Bryan, do you plan to create something about Unity Catalog?
Thanks, you're really good at explaining these topics!
Underrated channel, really quality information.
great explanation. Thanks!
I made a mistake in an interview today and confused the star schema with the 3 Normal Forms. I also stated star schema was normalization when it was denormalized...oh well.
your video has decluttered me a lot. Now am going to make a hdfs on my k8s cluster and spark operator
Your videos are descriptive , but crisp too .. To the point .. I have never seen any other tutor who explain big data concepts so well in a practical way .. Too good you are .. Love from India 💌 I wish i found your channel in my early days of my career
Love the content, always very clear
Thank You!
Hey Bryan, the SQL with group by negates the need for the distinct in the SELECT unless Spark SQL is different to ANSI SQL? Thanks for your series.
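The point raised in the comment above can be checked with a small sketch. It uses Python's built-in `sqlite3` rather than Spark SQL, so it only illustrates the general SQL behavior; the `sales` table and its rows are made up for illustration. When every grouping column appears in the select list, `GROUP BY` already yields one row per group, so adding `DISTINCT` should not change the result.

```python
import sqlite3

# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5), ("west", 5)],
)

# Same aggregate query, once with DISTINCT and once without.
with_distinct = conn.execute(
    "SELECT DISTINCT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
without_distinct = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

# GROUP BY already returns one row per region, so both lists match.
print(with_distinct == without_distinct)
```

Whether a given engine also skips the extra de-duplication work internally is a separate question that this sketch does not answer.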
Thanks Bryan sorry another question when a table is created does it lock the file so it cannot be deleted from the file system?
In the case of this video topic, no, because you are only creating a schema definition on top of a file, i.e., schema on read. Mind you, the file system is Azure Data Lake Storage, which is like a drive, so it does not lock up. However, if you create a Delta table (not discussed here b/c it was very new and not in GA at the time of this video), that would create a new parquet file and related logs, and these should be locked until the process is complete. Make sense?
Does this series cover how to work with Java JARs in Databricks?
Amazing high level channel I recommend it for any young learner, keep going bryan
And I'm here for alllllllllll of it. ❤
Your laid-back method and complete explanations are a very refreshing method of learning. Thank you looking forward to watching more of your videos.
Thank you!
I also love your way of presentation!! (Of course plus the wonderful content) and I don’t find 30 hours of prep to be too much given the high quality you deliver!
Thank so much!
Is the context of the database and tables only for querying there is no DML?
Initially, Spark was only able to query data. It was never intended to be a database, so Spark SQL was originally just a query (SELECT) language. However, Databricks added full DML to it, which required creating a storage format that supported Create, Read, Update, Delete (CRUD). To distinguish this from ordinary read operations, they called the new database-like functionality, with its storage format called Delta Tables, the Data Lakehouse b/c it is a data warehouse on a data lake. The Data Lakehouse has only been around for a few years and simulates a database in many ways, but it is implemented very differently b/c it uses parquet files and a snapshotting approach that groups together the parquet files forming the current table snapshot. Delta tables are more like Source Code Control in that each table version is a collection of files. See my playlist on this for a full explanation. ruclips.net/video/Muyq3qtHzzo/видео.html
@@BryanCafferky Coming from a database background it is understanding how this joins together and the use cases for it, thanks for the explanation.
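The idea in the reply above, that each table version is a collection of files, can be sketched with a toy model. This is not the Delta Lake API or its on-disk log format; the `log` list, the file names, and the `snapshot` helper are hypothetical and only mimic the notion of commits that add and remove files.

```python
from typing import Dict, List, Set

# Toy stand-in for a transaction log: each commit lists the data files
# it adds and removes. File names are invented for illustration.
Commit = Dict[str, List[str]]

log: List[Commit] = [
    {"add": ["part-0.parquet"], "remove": []},                  # version 0
    {"add": ["part-1.parquet"], "remove": []},                  # version 1
    {"add": ["part-2.parquet"], "remove": ["part-0.parquet"]},  # version 2
]


def snapshot(commits: List[Commit], version: int) -> Set[str]:
    """Replay commits 0..version and return the files forming that version."""
    files: Set[str] = set()
    for commit in commits[: version + 1]:
        files -= set(commit["remove"])
        files |= set(commit["add"])
    return files


# Older versions remain reconstructable because the log is only appended to.
print(sorted(snapshot(log, 1)))
print(sorted(snapshot(log, 2)))
```

In this model an update never edits a file in place; it writes a new file and records the old one as removed, which is roughly why the comparison to source code control fits.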
I so agree with nailing things down before moving on, but due to over zealous project managers and scrum masters this rarely happens in my experience, grrr! They push to move things along due to time and cost but as you rightly pointed out that it costs so much more to have to change things later on. I could moan for hours on this subject!
Thanks Bryan, you teach the subject in an easy to understand manner.
Thank You.
Thanks Bryan for providing all of this for free. I am a seasoned dev with 28 years in the industry. I started out writing data feeds to Oracle data warehouses on Unix boxes with shell, SQL, and PL/SQL. Then after a few years I moved to Business Objects Data Services, an integration tool, and for the last 10 years I have been using Talend, another integration tool. Now, out of work next month and in my fifties, I realise I need to retrain in a new technology. Data Bricks and Apache Spark seem to be very popular, but for building data integrations, is it wise to learn Data Fabric with Data Factory, or is there some other tech used? I have done a small amount of pulling and pushing to Cloud services using API calls within Talend, but I do not have strong Cloud skills or knowledge. Azure seems like a logical choice.
Hi Chris, Not to be pedantic, but Databricks is all one word, as I just spelled it. Need to get the spelling right. It's a common mistake. Fabric is brand new and its future uncertain despite the marketing hype. Generally, I see Fabric as a service for Power BI, so if you are a Power BI dev, it makes sense to learn Fabric. I don't see Fabric replacing Databricks or Snowflake, the 2 largest Big Data services. Azure is great, but AWS is still the largest public cloud. However, Databricks seems to be more popular on Azure. If you do ETL on Azure, ADF is good to learn but not always needed. Databricks workflows are powerful and can handle most ETL jobs. Thanks for your comment.
@@BryanCafferky Thanks for pointing that out if I cannot get the spelling right how can I get a job using it?😀 It is difficult to know what technology to invest the time into, a lot of my career has been data migrations not so much ETL into DWH. Hence ETL/integration tools, there are so many offering's now I will follow the Microsoft route possibly aim to get some sort of certification. I have seen a lot of roles that are asking for Snowflake as well in the UK where I am based. Thanks again for a great series of videos!
Dude you are on the money!! Agree all 100%.
Thanks Bryan, other than Flask I haven't tried these myself yet. Flet looks interesting too. I would have chosen "PyFlutter" as a name. Any thoughts on Gradio, which I think is similar to say Streamlit but lets you deploy to huggingface? There's so many different options these days and its hard to choose, your video helps narrow down the options.
Really useful
Thank you for your video!
You're welcome!
Bryan, thanks for another great video! Your series on python and sqlite helped me move my data career forward. I want to ask your opinion on Kotlin vs Flutter as a cross platform development language especially after Google established it as their preferred development language. Thanks!
Not sure about Google adopting Kotlin. Since FlutterFlow, they seem to be pushing Flutter which is their language. See flutter.dev/events Kotlin is also Yet Another Language so not loving that. Kotlin still has a low adoption according to the TIOBE index www.tiobe.com/tiobe-index/ but so does Flutter. Did you watch the video? I discuss Python options which may be a good choice.
Ty guy. Your post cleared up some of my inquiries!
at last somebody is clearing the confusion, Good job Bryan
Can you help with how we can create a drop-down for task parameters in a workflow?
You use widgets. Doc here learn.microsoft.com/en-us/azure/databricks/notebooks/widgets
I love your way of explaining, I watch each of your videos several times. This also allows me to improve my English.
How do I retrieve specific values from the Delta log? After reading the JSON, I am unable to grab the values.
Hi sir, are Python and MySQL together a good combo, and can they land you a job?
Yes. It's a great combo and popular so good skills to have.
Zank you sir for zis tutorial. It is most very velcome.
Good one
Great Lesson! Thank you Bryan!
Thank you sir
Thanks!!
Thanks for the video about changes since the Databricks series, Bryan. It's a service to the community. I am very pleased; you are quite helpful to new people in the field, including me. Your explanation, as always, is to the point and covers people of all CS backgrounds. This is the only channel I have subscribed to. I used to watch from incognito, but I had to come and watch from my account, add to a playlist, and subscribe. My subscription does not validate anything, but I want to tell you that we cherish your effort wholeheartedly.
Thank you so much! Glad it is helpful. Glad my videos are helpful. New people in the field and people crossing over from other related fields are very much in my thoughts when I do my videos.
5:54, better comedian than half the comedians in the world
Sir, I just want to say thank you so much, I've gone through many videos but was still confused, u made this crystal clear with all your conceptual approach.
Thank you for kind words. I'm so glad my videos are helping you. That's why I do them. I know this technology is not easy to learn so kudos to you for sticking with it.
Very useful video
recently finished the course, thanks a lot Bryan. Out of curiosity any thoughts on recent acquisition of tabular? Will people switch to Iceberg?
I tried Databricks Academy and got lost. Thanks to this channel, I understand every nook and cranny. Thumbs up, Bryan!!
Thank you! Glad my videos are helping you.