

Introduction

This is a very complete, hands-on, intermediate-level specialization on cloud computing. It goes from the basics to some advanced and very helpful examples. The examples cover the three main cloud vendor platforms: AWS, Azure, and GCP.

The Roadmap

Final Project: ML Image Classification

Live App

Run the app locally:



Table of Contents

  1. 01: Cloud Computing Foundations
  2. 02: Cloud Virtualisation, Containers and APIs
  3. 03: Cloud Data Engineering
  4. 04: Cloud Machine Learning Engineering and MLOps

Course 01: Cloud Computing Foundations

Technical discussions

The concept of a technical discussion: use tools or capabilities in the same place where the project lives to communicate. For example, in a Git repo like GitHub, a markdown file helps to detail a step-by-step process with code snippets, diagrams, images, and so forth. Also known as technical notes.

Tools for tech notes: Gist, Code Snippets.

Critical Thinking: key points:

Effective Technical Triple Threat

Effective Technical Teamwork

  1. Clear, elevating goal. Something that motivates.
  2. Result-driven structure.
  3. Competent team members
  4. Unified commitment (everyone on the same page)
  5. Collaborative climate (share, collaborate, learn)
  6. Standard of excellence (what is good?)
  7. External support and recognition
  8. Principled leadership (the leader needs character)

Talent is not the most important factor for an effective team. It is important to look at all of the factors, not just talent. Focus on hiring people who have the right character and who will work well with others.

Technical Project Management

The key takeaway is that effective technical project management is about making small, incremental changes that lead to a predictable outcome.

goals by week

Project Management Anti-Patterns

  1. Hero-Driven Development:
    • Reliance on a hero who constantly saves the day.
    • Working nights and weekends, leading to burnout.
    • Indicates a lack of rigor in the development process.
  2. Crisis-Driven Development:
    • Relying on crises to drive development.
    • Continuous firefighting and fixing mistakes.
    • Results in a chaotic and unsustainable work environment.
  3. HIPPO-Driven Development:
    • Decision-making driven by the “Highest-Paid Person’s Opinion” (HIPPO).
    • Random changes based on executive input disrupt development.
    • Advocates for a more structured and pre-planned decision-making process.
  4. Heavy Scrum:
    • Overreliance on mimicking successful processes.
    • Mimicking without understanding the underlying principles.
    • Emphasizes the importance of a lighter, more effective process.
  5. Faith in People vs. Process:
    • Emphasizes the need for both faith in people and an effective process.
    • Caution against blind trust without incremental progress.
    • Advocates for a process where results are demonstrated incrementally.

The overall message encourages a balanced and structured approach to project management, avoiding extreme reliance on individuals, crisis-driven approaches, and overly complex processes. Incremental progress and a well-defined process are highlighted as essential components of successful software development.

Introduction to AWS Cloud Development

Introduction to Continuous Integration

Continuous Integration is a way of ensuring that your software is always in a known state, and it saves you time. It is a form of automated testing and a safety mechanism that tells you whether your software is working. It is similar to safety mechanisms like smoke alarms, seat belts, and drug testing, which save lives.

Once you understand the concepts of Continuous Integration, you can develop software much more quickly. It is a primary DevOps best practice that allows developers to frequently merge code changes into a central repository where builds and tests then run.

Automated tools are used to assert the new code’s correctness before integration. A source code version control system is the crux of the Continuous Integration process. The version control system is also supplemented with other checks like automated code quality tests, syntax style review tools, and more.

Project Scaffold example in Python

  1. Creation of a basic scaffold
  2. Configure a GitHub repository
  3. Add GitHub Actions for CI

Repo: https://github.com/matiaspakua/python-scaffold
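
As a minimal sketch of such a scaffold (the function and file names here are hypothetical, not taken from the linked repo), a single module with a pytest test gives the GitHub Actions workflow something to install, lint, and test:

# test_calc.py -- hypothetical module; run with `pytest` locally or from a CI step
def add(x: int, y: int) -> int:
    """Return the sum of two integers."""
    return x + y


def test_add() -> None:
    assert add(2, 3) == 5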

Introduction to Testing

  1. Concepts of Testing
    • Testing is a critical tool in ensuring the functionality of software systems.
    • It can help in identifying and resolving problems, such as those faced by a film company.
  2. Testing Strategies
    • Replicating production and running a simulation can help in resolving problems.
    • Instrumentation and monitoring, such as logging and dashboards, help in verifying and addressing critical failure points.
    • The combination of these steps is critical for identifying and addressing critical failure points.
  3. The Importance of Testing Strategy
    • Testing should be selectively chosen to solve problems.
    • Automating testing is key to a successful testing strategy.
    • Over-reliance on testing techniques can halt a project due to excessive testing.
    • A balanced, "just right" approach that avoids extremes is the best way to handle testing.

Introduction to Continuous Delivery

  1. Understanding Continuous Delivery
    • Continuous delivery means keeping code in a deployable state at all times, including the application software and the infrastructure needed to run it.
    • It's a modern best practice for code that needs to be deployed in the cloud, where everything is virtualized.
  2. Infrastructure as Code
    • Infrastructure as code allows for the automation and creation of infinite new environments.
    • The build server listens to the source control repository and runs a series of actions, including a test phase, a lint phase, and load testing.
    • The infrastructure code checks the environment and ensures it's set up properly before the application code is deployed.
  3. User Experience
    • The user checks their code into a source control repository, typically the master branch in GitHub.
    • The build server lints the code, tests it, and deploys it by checking the branch the job is assigned to listen to.
    • Infrastructure as code, such as Terraform or CloudFormation, allows for dynamic updates or the creation of new environments.
  4. Breakdown of Environments
    • Each branch in source control can automatically create a parallel environment.
    • Code can be pushed into a development branch, then merged into the staging branch for testing and deployment.
    • The code can then undergo extensive load testing before being merged to production.

Cloud Computing Introduction

  1. Near-Infinite Computing
    • Cloud computing offers near-infinite storage and compute, making it powerful.
    • It can handle more traffic than a physical data center can.
  2. Elimination of Upfront Costs
    • Cloud computing eliminates the need to build global-scale infrastructure yourself, allowing startups to leverage existing resources.
    • This eliminates upfront costs, enabling the creation of many modern applications.
  3. Use of Resources
    • Efficient use of resources can reduce costs, treating them more like utilities.
    • Companies that use cloud computing efficiently treat resources like utilities.
  4. Comparative Advantage
    • Comparative advantage means focusing on what you're best at.
    • This concept is similar to Michael Jordan focusing on what he did best rather than doing everything himself.
    • Cloud computing eliminates the need for physical data centers and hardware installation, allowing companies to focus on their core business.

Cloud Computing Service Models

  1. Software as a Service (SaaS):
    • Examples include Gmail, Splunk, and Datadog.
    • These services eliminate the need for hosting a dedicated web server.
  2. Platform as a Service (PaaS):
    • Platform as a Service abstracts away the infrastructure, allowing developers to focus on application development.
    • Examples include Heroku, Google App Engine (GAE), and AWS Elastic Beanstalk.
  3. Infrastructure as a Service (IaaS):
    • Extensive offerings like Amazon EC2 allow for bulk rental of virtual machines at low cost.
    • This service requires the software engineer to spin up and set up the networking layer, but offers significant cost savings.
  4. Metal as a Service (MaaS):
    • Provides the ability to spin up and provision physical machines yourself.
    • This service is suited for running your own virtualization and for controlling physical hardware like GPUs.
  5. Serverless:
    • Similar to Platform as a Service; serverless is also known as FaaS, or Function as a Service.

Economics of cloud computing

Introduction to DevOps

  1. DevOps: The union of people, process, and products to enable continuous delivery of value to end users. It involves essential practices such as agile planning, continuous integration, continuous delivery, and monitoring of applications.
  2. Cycle Time: The time it takes to complete one cycle of the OODA loop, which consists of observation, orientation, decision, and action. It determines how quickly a team can gather feedback and learn from their deployments.
  3. Validated Learning: The feedback that a team gathers with each cycle, based on real, actionable data. It helps the team to pivot or persevere, and to optimize their value delivery.
  4. Release Pipeline: The process of deploying a change of code or configuration to the production environment. It should be automated, hardened, and fast to shorten the cycle time and enable frequent deployments.

Benefits of DevOps:

DevOps Best Practices

Infrastructure as Code (IaC)

  1. Infrastructure as Code (IaC): A DevOps practice that manages infrastructure in a descriptive model, using the same versioning as source code.
  2. Benefits of IaC: IaC solves the problem of environment drift, enables consistent and repeatable deployments, and supports testing in production-like environments.
  3. Idempotence: A principle of IaC that ensures a deployment command always sets the target environment into the same configuration, regardless of the starting state.

Example of a Terraform script

# Configure the AWS Provider
provider "aws" {
  region = "us-west-2"
}

# Create an AWS instance
resource "aws_instance" "example" {
  ami           = "ami-0c94855ba95c574c8"
  instance_type = "t2.micro"

  tags = {
    Name = "example-instance"
  }
}

Introduction to Continuous Pipelines


Course 02: Cloud Virtualisation, Containers and APIs

Introduction

Virtual Machines

Containers vs VMs

Reference: Containers vs VMs (redhat.com)

How Do Spot Instances Work?

Containers

When to use containers?

  1. Cloud Native Environment: Containers are excellent for building a cloud native environment due to advancements in managed Container Services and Kubernetes services.
  2. Microservices: Containers work well with the microservice workflow, where one service does one thing. They allow you to build something that’s reproducible.
  3. DevOps: Containers fit well into DevOps workflows. They allow you to programmatically build the Container and the source code, and deploy them using infrastructure as code.
  4. Job Management: Containers are useful in job management, especially when building and reproducing jobs repeatedly.
  5. Portability and Usability: Containers offer portability, which is particularly beneficial in DevOps and Data Science. They allow the runtime to be included with your project, making it completely reproducible. This is a key tenet of science.

Docker

Docker is a product composed of Docker Desktop and Docker Hub. Docker Desktop is an application installed on your computer for local development. It includes a container runtime, developer tools, and a GUI.

It can interface with Kubernetes to launch and control clusters. Docker Hub allows you to check things into a public or private repository, automate container builds via GitHub, and pull and use certified images.

Docker Desktop is more of a development environment, while Docker Hub is a collaborative environment. When using Docker, you can leverage the knowledge of core developers by pulling base images for your projects.

Example of using Docker Hub and Docker locally:

Container Registry

A container registry is a repository of container images. It is used to store and access these images for cloud-native applications [3]. Container registries can be public or private, and they play a crucial role in the deployment and scaling of applications that use microservices [3].

Microsoft Azure, for example, offers the Azure Container Registry, a fully managed, geo-replicated service that supports Docker and OCI images and artifacts [1]. It provides features such as security, compliance, scalability, and automation for building, storing, and deploying container images and Helm charts [1]. It also supports Azure Container Registry Tasks, a suite of services to build, manage, and patch container images [2].

In summary, a container registry is an essential tool for managing the lifecycle of containers, from development to deployment [1][2][3].

References:

  1. What is a container registry? - Red Hat. https://www.redhat.com/en/topics/cloud-native-apps/what-is-a-container-registry
  2. Azure Container Registry | Microsoft Azure. https://azure.microsoft.com/en-us/products/container-registry/
  3. Azure Container Registry documentation | Microsoft Learn. https://learn.microsoft.com/en-us/azure/container-registry/

Introduction to Kubernetes

Autoscaling Kubernetes

One of the “killer” features of Kubernetes is the ability to set up autoscaling via the Horizontal Pod Autoscaler (HPA). How does this work? The HPA automatically scales the number of pods (remember, they can contain multiple containers) in a replication controller, deployment, or replica set. The scaling uses CPU utilization, memory, or custom metrics defined in the Kubernetes Metrics Server.

Introduction to Microservices

Microservices characteristics

Where to Run Microservices

Operationalizing microservices

A critical factor in developing a microservice is to think about the feedback loop. In this diagram, a GitOps-style workflow is implemented.

Metrics can be deployed and viewed in Prometheus: [Advance Your Spring Development Skills Tech-Notes (matiaspakua.github.io)](https://matiaspakua.github.io/tech.notes.io/pages/development/advance_your_spring_development_skills.html#02)

Five-Why’s and Kaizen

One way our troubled company could have swapped Voodoo for a sane alert process was to use the Five Why’s method. In a nutshell, it originated from Kaizen, a process of continuous improvement, from the Japanese Automobile industry post-WWII. The Five Why’s strategy is to keep asking questions until the root cause appears.

Learn about the Five Whys and about Continuous Improvement in the linked screencasts.

Introduction to Flask

Official Documentation: Welcome to Flask — Flask Documentation

This lesson focuses on Flask, a popular lightweight web framework in Python. It’s widely used for building microservices and mapping Python code to web URL routes.

It also covers APIs, which define interactions between services, and JSON (JavaScript Object Notation), a common data-interchange format used with APIs. When building a microservice, the core components are a Flask application, an API, and JSON. These elements work together to form a microservice.
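
A minimal sketch of those three pieces working together (the route name and payload fields are illustrative, not from the course):

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/echo", methods=["POST"])
def echo():
    """Accept a JSON payload and return it along with a simple item count."""
    payload = request.get_json(force=True)
    return jsonify({"received": payload, "items": len(payload)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)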

Introduction to Serverless Microservices

Serverless Architecture is an application delivery model where cloud providers automatically intercept user requests and computing events to dynamically allocate and scale compute resources. This allows you to run applications without having to provision, configure, manage, or maintain server infrastructure.

A Serverless REST API

Assuming AWS as the cloud vendor, a Serverless REST API consists of three main components:

  1. API Gateway: This is responsible for receiving HTTP requests. It acts as the entry point for the client to interact with the serverless application.

  2. Lambda Functions: These are the functions that are triggered by the API Gateway. They receive these requests and execute upon them. The code for these functions is written by developers and can be in any language supported by AWS Lambda.

  3. DynamoDB: This is a NoSQL database service provided by AWS, which is used to store and retrieve data. The Lambda functions interact with DynamoDB to fetch or store data as per the request.

In this architecture, when a client sends an HTTP request, the API Gateway receives it and triggers the corresponding Lambda function. The Lambda function then processes the request, interacts with DynamoDB if necessary, and sends the response back to the client via the API Gateway.
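
A sketch of the Lambda piece, assuming a hypothetical `Users` DynamoDB table and an API Gateway proxy integration (all names are illustrative):

import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table name


def lambda_handler(event, context):
    """Triggered by API Gateway: read a user id from the path and look it up in DynamoDB."""
    user_id = event["pathParameters"]["id"]
    response = table.get_item(Key={"id": user_id})
    item = response.get("Item")
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(item, default=str)}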

AWS Step Functions

AWS S3 trigger

AWS Serverless Application Model (SAM)

The AWS Serverless Application Model (AWS SAM) is a toolkit designed to enhance the developer experience of building and running serverless applications on AWS. It consists of two main components: the AWS SAM template specification, which defines your serverless application's infrastructure as code, and the AWS SAM CLI, a command-line tool for building, testing, and deploying applications defined by those templates.

In essence, AWS SAM helps you manage your serverless application through the authoring, building, deploying, testing, and monitoring phases of your development lifecycle. It also enables you to define permissions between your AWS resources and automatically sync local changes to the cloud, speeding up your development and cloud testing workflows. The SAM CLI is best utilized with AWS SAM and AWS CloudFormation templates, but it also works with third-party products such as Terraform.

Event-Driven vs. Polling

A key advantage of serverless programming is the ability to write code that reacts to events, rather than continuously checking for results.

Event-Driven Programming: This is a programming paradigm where the flow of the program is determined by events such as user actions, sensor outputs, or messages from other programs. In serverless architectures, functions are triggered by events.

Polling: This is a coding technique where your program continually checks for conditions to be met. It’s like repeatedly asking, “Is the data ready yet?” until the answer is yes. While polling can be simple to implement, it can lead to inefficiencies.

The key difference between the two lies in how they handle waiting for something to happen.

In the context of serverless programming, event-driven architectures are often preferred because they allow the system to be more responsive and efficient. Instead of continuously checking for changes (polling), the system can sit idle and react when an event occurs.
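
A toy illustration of the difference in plain Python (not tied to any particular cloud SDK):

import time


def poll_for_result(is_ready, fetch, interval_seconds=5):
    """Polling: keep asking 'is the data ready yet?' and occupy a worker while waiting."""
    while not is_ready():
        time.sleep(interval_seconds)
    return fetch()


def on_new_data(event):
    """Event-driven: the runtime invokes this only when something actually happens."""
    return f"processing {event['key']}"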

Introduction to Monitoring and Alerts

Load Testing

Tool: Locust - A modern load testing framework
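
A minimal locustfile sketch (the endpoint is a placeholder; the target host is passed on the command line, e.g. `locust --host https://example.com`):

from locust import HttpUser, between, task


class WebsiteUser(HttpUser):
    wait_time = between(1, 5)  # each simulated user waits 1-5 seconds between tasks

    @task
    def index(self):
        self.client.get("/")  # load the root endpoint of the host given via --host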

An excellent tool for monitoring is Prometheus.

Introduction to Kaizen

What are the ‘Five Whys’?

This method emphasizes the importance of understanding the root cause of a problem for effective problem-solving and continuous improvement.

Reference: Five whys - Wikipedia


Course 03: Cloud Data Engineering

The Problem with Concurrency in Python

Introduction to the End of Moore’s Law

Moore’s Law is an observation made by Gordon Moore, co-founder of Intel, in 1965. He noticed that the number of transistors that could fit on a microchip was doubling approximately every two years. This led to an increase in computing power and a decrease in cost over time.

However, experts predict that Moore's Law will end sometime in the 2020s. This is because as transistors become smaller, they will reach a physical limit where they can’t be made any smaller. Additionally, as circuits get smaller, they generate more heat, which can cause them to fail.

The end of Moore’s Law doesn’t mean the end of technological progress. Instead, it signifies a shift in focus from miniaturization of existing technologies to the introduction of new devices, integration technologies, and computing architectures. For example, the industry is moving towards Application Specific Integrated Circuits (ASICs) like GPUs and Tensor Processing Units (TPUs), which are designed for specific tasks and offer high levels of parallel processing.

So, the end of Moore's Law is not a dead-end, but rather a new beginning for information technology.

CUDA

ASIC

Comparison

Introduction to Distributed Systems

Instrumentation

Instrumentation covers the key aspects of building logging for a distributed system in order to monitor and debug the system's behavior.

CAP Theorem & Amdahl’s Law

Elasticity

Highlights the system’s ability to adapt to varying loads by spinning up more resources like Virtual Machines or networking.

High Availability

Explains the concept of a highly available architecture, which can respond to requests despite increased traffic.

The concept of High Availability in cloud computing is often expressed in terms of “nines” of uptime.

Python Debugging

Techniques for debugging Python code, a critical component in modern Data Systems.

Introduction to Big Data

The three V’s of Big Data

Web Reference: Chapter05 Cloud Storage Pragmatic AI Labs and Solutions

Data Lakes

Big Data Processing

Feedback data loop

The Challenges of Big-Data engineering

Different modes of transportation solve various problems. A car can get you into the city at your convenience but only transports a few people. A train or metro transports thousands but has a specific schedule. A bike can go 50 miles in a day but typically only carries one person and is slower than a car. A tractor can easily dig a trench in an hour to take a team of people a week, but it isn’t ideal for a commute to work.

Big Data Platforms are similar in that they can move data around, just like a tractor can move dirt around. So what are five limitations to Big Data Platforms?

  1. Specialized Skills: Big Data platforms often require specialized skills and knowledge to use efficiently. They may be secured with firewalls or private clouds, making it difficult to transfer and use data between multiple teams.
  2. Complexity and Cost: Working with big data can be complex and expensive. It requires investments in storage solutions, analytics tools, and cybersecurity and governance programs.
  3. Data Overload: A large amount of data can be difficult to wade through and is liable to produce flukes that are too difficult to detect. This can make it challenging to yield clear correlations and answers needed to make informed decisions.
  4. Privacy and Security: Big data platforms can potentially expose sensitive information, such as company data that competitors could use, financial data that could give hackers access to accounts, or personal user information that could be used for identity theft.
  5. Ethical Challenges: The misuse of big data can lead to data workflows bypassing the intent of privacy and data protection law, as well as ethical mandates.

Introduction to Data Engineering

Data engineering is the practice of building pipelines that transport or transform data periodically. It involves:

Batch vs. Streaming vs. Events

In Data Engineering, there are three key paradigms: Batch Data, Streaming Data, and Events.

Events engineering: example

Events are a powerful concept in Data Engineering because they don’t consume resources until necessary. Here is an exercise:

Answers

What is Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

  1. Replacing a traditional Hadoop system with a serverless data engineering system: A serverless data engineering system can be implemented using cloud-based services like AWS Lambda, Google Cloud Functions, or Azure Functions. Here’s a high-level overview of how it could work:

    • Data Ingestion: Use event-driven services (like AWS S3 events or Google Cloud Storage triggers) to trigger a function whenever new data is added. This function could perform initial data validation and transformation.
    • Data Processing: Use cloud functions to process the data. These functions can be triggered by the successful completion of the ingestion functions. The processing might involve complex computations, aggregations, or machine learning model predictions.
    • Data Storage: Store the processed data in a suitable storage service like AWS S3, Google Cloud Storage, or Azure Blob Storage.
    • Data Analysis: Use services like AWS Athena, Google BigQuery, or Azure Data Lake Analytics to run SQL-like queries on the processed data.
  2. Strengths and weaknesses compared to Hadoop:

    • Strengths:
      • Scalability: Serverless architectures can scale automatically based on the workload, which is a significant advantage over Hadoop clusters that have a fixed number of nodes.
      • Cost: With serverless, you only pay for the compute time you consume. There is no charge when your code is not running.
      • Maintenance: Serverless architectures eliminate the need for system maintenance, as the cloud provider manages the servers.
    • Weaknesses:
      • Cold Start: Serverless functions can experience a “cold start” (i.e., a delay) if they haven’t been used recently, which could impact performance.
      • Long-Running Tasks: Serverless functions are typically designed for short-lived tasks. Long-running tasks can be more challenging to implement in a serverless architecture.
      • Data Locality: Hadoop takes advantage of data locality by moving computation close to where the data resides in the cluster, which can be more efficient for certain types of large-scale data processing tasks. This is not the case with serverless.

The choice between Hadoop and a serverless architecture depends on the specific requirements of your data engineering tasks. It’s essential to consider factors like the volume of data, the complexity of the processing tasks, cost, and the required latency of the analytics results.
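
As a sketch of the ingestion step described above, an S3-triggered AWS Lambda function might look like the following (the validation logic is a deliberately trivial placeholder):

import json
import urllib.parse
import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event: run a first-pass check on the new object."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    rows = [line for line in body.splitlines() if line.strip()]  # trivial "validation"

    return {"statusCode": 200, "body": json.dumps({"bucket": bucket, "key": key, "rows": len(rows)})}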

CLIs and Containerized CLIs

Containerizing a command-line interface (CLI) can indeed add significant value.

These benefits make containerized CLIs a powerful tool in modern software development and operations. However, it’s important to note that like any technology, containers are not a silver bullet and should be used judiciously based on the requirements of the project.

Kaizen + CI/CD

Mapping Functions to CLI

The versatility of functions in Python:

Example repo from Noah Gift: noahgift/python-data-engineering-cookbook: Some recipes for data engineering with Python
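
A small sketch of mapping a Python function to a command-line tool using the standard-library argparse (the function and flags are made up for illustration; the linked repo may use a different library such as click):

import argparse
from itertools import islice


def head(path: str, lines: int) -> None:
    """Print the first `lines` lines of a file -- a stand-in for any data engineering function."""
    with open(path) as handle:
        for line in islice(handle, lines):
            print(line.rstrip())


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Map a Python function to a CLI")
    parser.add_argument("path", help="file to read")
    parser.add_argument("--lines", type=int, default=5, help="number of lines to print")
    args = parser.parse_args()
    head(args.path, args.lines)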

Serverless Data Engineering

Introduction

Serverless computing is a cloud computing model where the cloud provider manages the servers. Developers focus on writing the application’s business logic without worrying about the underlying infrastructure, its maintenance, or scaling. AWS Lambda and AWS Athena are examples of serverless services. AWS Lambda allows you to write functions in supported languages and AWS manages running and scaling your function. AWS Athena lets you analyze data in Amazon S3 using standard SQL without managing any servers. The key aspect of serverless computing is the abstraction of servers, leading to quicker development times and lower costs.

Serverless Concepts: Service Model

The three main service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Serverless.

Serverless Concepts: Functions

Serverless computing and the power of functions.

Serverless Concepts: Ecosystem

Serverless cookbook

The evolution of cloud computing, from monolithic architectures to the modern era of DevOps, serverless, and microservices. The key points:

The shift from monolithic to microservices and serverless architectures represents a significant advancement in cloud computing. It allows for more efficient use of resources, easier debugging, improved security, and overall better maintainability.

Data Governance

Data governance is a crucial aspect of cloud security and is vital to a company’s health. It involves determining who should have access to data, considering worst-case scenarios, and ensuring data is encrypted both at rest and in transit. A breach in data security, especially for companies with large user bases, could pose an existential threat. Many organizations have a dedicated data governance officer to handle these responsibilities.

The Principle of Least Privilege

The principle of least privilege is a key security goal. It suggests that individuals should only have access to the resources they need, limiting potential security risks. This concept applies to various scenarios, from mail delivery to cloud computing. By granting access only to necessary resources, we protect all parties involved and prevent the creation of large security holes. This approach is one of the most effective ways to set up security.

Cloud Security with IAM on AWS

AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources. It allows you to manage permissions centrally, controlling which AWS resources users can access.

Key features of IAM include:

IAM uses the principle of least privilege, meaning users should only have access to the resources they need. This limits potential security risks. It’s important to note that IAM users, groups, and roles are concerned with authentication, while IAM policies deal with authorization.
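
As an illustration of least privilege, this hedged Boto3 sketch attaches an inline policy that lets a hypothetical user read objects from a single bucket and nothing else (user, policy, and bucket names are placeholders):

import json
import boto3

iam = boto3.client("iam")

read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",  # placeholder bucket
        }
    ],
}

# Grant the minimum access needed: read objects in one bucket, nothing more.
iam.put_user_policy(
    UserName="report-reader",           # hypothetical IAM user
    PolicyName="read-example-bucket",
    PolicyDocument=json.dumps(read_only_policy),
)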

Wikipedia: Identity management - Wikipedia

AWS Shared Security Model

AWS cloud security operations are crucial but often overlooked. The shared security model is a partnership between AWS and the customer to secure applications. Customers are responsible for their data and operating systems, while AWS handles foundational services like compute storage, infrastructure, regions, and availability zones. AWS ensures physical security, hardware security, network configuration, and virtualization. This includes controlling physical access to data centers, purchasing secure hardware, and setting up secure network configurations and virtualizations. Examples of physical security measures include need-based access, 24/7 security guards, and two-factor authentication.

AWS Cloud Security Operations

AWS Trusted Advisor is a system that provides checks for cost optimization, performance, security, fault tolerance, and service limits. It can identify potential savings, performance improvements, security risks, and architectural issues. It also monitors service usage to prevent exceeding limits.

AWS CloudTrail is a powerful, often underutilized service that monitors system activity. It’s the first place to check if a security breach is suspected. It provides an audit trail detailing what’s happening, who’s involved, when it occurred, and includes usernames and timestamps.

Encrypt at Rest and Transit

Encryption is a critical aspect of data security and involves three key concepts:

These concepts are fundamental to encryption and are considered industry best practices. They also align with the principle of least privilege, which states that only those who need access to data should have it.
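
For example, here is a hedged Boto3 sketch of encryption at rest on upload (bucket and key are placeholders); the API call itself travels over HTTPS, which covers encryption in transit:

import boto3

s3 = boto3.client("s3")  # Boto3 calls S3 over HTTPS, so the upload is encrypted in transit

# Ask S3 to encrypt the object at rest with an AWS-managed KMS key.
s3.put_object(
    Bucket="example-secure-bucket",   # placeholder bucket
    Key="reports/2024/summary.csv",   # placeholder key
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="aws:kms",
)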

Introduction to Extract, Transform, Load (ETL)

ETL, standing for Extract, Transform, and Load, is a crucial process in data engineering pipelines. It involves extracting data from a source, which may not be in a clean form, transforming it by potentially decompressing it, changing its format, or removing corrupted or missing values, and then loading it into a new system, such as a business intelligence database. ETL is a fundamental concept to consider when building data pipelines, as most data engineering operations involve these processes.
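
A compact sketch of the three ETL stages in Python with pandas (the file paths, column names, and cleaning rule are illustrative):

import sqlite3

import pandas as pd

# Extract: read the raw source file.
raw = pd.read_csv("raw_players.csv")                    # placeholder source

# Transform: drop corrupted/missing rows and normalize a column name.
clean = raw.dropna().rename(columns={"Ht": "height_inches"})

# Load: write the cleaned data into a target analytics database.
with sqlite3.connect("analytics.db") as conn:           # placeholder target
    clean.to_sql("players", conn, if_exists="replace", index=False)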

Real-World Problems in ETL: building a social network

Building a social network from scratch involves leveraging influencer marketing to attract users. This process involves identifying potential partners who can send the right signal to your platform. In this case, a list of sports personalities from around the world was obtained from a third-party system and stored in a database. A nightly job was set up in a custom-built jobs framework to run periodically. A Mechanical Turk job was created to clean up the data by asking people worldwide to find the social media handles of these personalities. If the majority of the results agreed on a handle, a full record was created and stored in the database. This cycle of obtaining a rough record and augmenting it with cleaned-up data is a key aspect of the data engineering pipeline.

Cloud Databases

One Size Does Not Fit All in the Cloud?

Cloud Computing and Database Choice: When building applications in the cloud, such as on AWS, there are many databases to choose from. It's important to pick the right database for the right task, rather than relying on a single database for everything.

Examples are: Google BigQuery, AWS Aurora, DynamoDB, Amazon Redshift

Introduction: BigQuery - Wikipedia

Cloud Storage

The concept of economies of scale and how it applies to cloud storage. Some of the benefits of cloud storage are lower cost, higher access, and more possibilities.

Summary: Cloud storage is a service that leverages economies of scale to offer lower cost, higher access, and more possibilities for storing data on remote servers. There are different types of cloud storage, such as object, file, and block storage, depending on the data structure and use case. Some of the popular cloud storage services are Amazon S3, Google Drive, and Microsoft Azure Blob Storage.

More Documentation: Chapter05 Cloud Storage Pragmatic AI Labs and Solutions

Cloud Storage: deep dive

Examples are:

Amazon AWS S3

Amazon S3 is a cloud storage service that offers several capabilities to help support your data resiliency and backup needs. Some of these capabilities are:

In summary, Amazon S3 provides various features to enhance your data resiliency and backup, such as versioning, replication, lifecycle management, object lock, and encryption. These features can help you mitigate data loss, corruption, or breach, and recover from disasters.

Official Web: Amazon S3


Course 04: Cloud Machine Learning Engineering and MLOps

Machine Learning Architecture

A typical machine learning architecture consists of a microservice with an API for handling requests. This service receives serialized data via a JSON payload and makes predictions. The microservice is composed of three main components:

  1. A web application code (e.g., a Python Flask app).
  2. A pre-built model that’s included in the project.
  3. A container technology (either serverless or Docker).

The model is deployed through continuous delivery of data and the application, which are checked into source control. The data comes from a data lake infrastructure that allows for the detection of data drift. If the data changes by more than 25% from the last model build, it triggers a rebuild and redeployment of the model.

This dynamic feedback loop is a crucial component of the system. Lastly, a production system includes a monitoring dashboard and health alerts to assess the service’s success. This dashboard monitors everything from response time latency to the accuracy of the model’s predictions. This is a non-optional component in a production environment. In essence, these are the key elements needed to build a machine learning system.
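
A sketch of the first two components together, a Flask app wrapping a pre-built model (the model file, payload shape, and port are assumptions, not the course's exact code):

import pandas as pd
from flask import Flask, jsonify, request
from joblib import load

app = Flask(__name__)
model = load("model.joblib")  # hypothetical pre-built model shipped with the project


@app.route("/predict", methods=["POST"])
def predict():
    """Deserialize the JSON payload into a DataFrame and return the model's prediction."""
    payload = request.get_json(force=True)       # e.g. {"Weight": [180]}
    features = pd.DataFrame(payload)
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)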

Edge Machine Learning

Edge-based machine learning is the process of running machine learning models on an edge device to collect, process, and recognize patterns within datasets. The device itself has everything it needs, and in many cases, the hardware is optimized to be able to do it at low power. Edge-based processing allows you to do things on the device, and there are many examples of edge-based devices such as Coral TPU, AWS DeepLens, DeepRacer, and Intel neural compute stick. These devices can run without needing to talk to the internet, and there are a lot of applications in the future.

Building a ML Microservice

A machine learning microservice is a specific application of the microservices trend in DevOps. It allows for the creation of succinct, specific services that are easier to debug. For instance, a microservice could be designed to predict the height or weight of Major League Baseball players based on input data.

The adoption of microservices for machine learning and DevOps is driven by their compatibility with DevOps principles, such as continuous delivery. This allows for new changes to be deployed into a production-ready environment as they’re made. The application code is also integrated with the infrastructure code.

Microservices, which reject the old practice of building monolithic applications, are driving the field of machine learning engineering. In essence, a machine learning microservice is a specific, manageable piece of a larger system that can be developed, deployed, and debugged independently.

Monolithic vs Microservices

A Monolithic application is a traditional model where all components (authentication, data, logic) are interwoven within a single codebase. This could be as large as 20,000 lines of code. Debugging issues, such as problems with the authentication system, requires going through the entire application, leading to maintainability and complexity issues.

On the other hand, a Microservice is an approach where each component is its own separate, independent service. For instance, there’s a distinct service for authentication, another for data handling, and another for logic. Each service communicates using lightweight API operations and performs a single function, supporting multiple applications.

The key difference is that Microservices are built as independent components that run as services, making it quicker to develop code, automate, and recover from mistakes. This architecture is particularly beneficial for machine learning. It allows for faster starts, easier mistake identification, and quicker recovery.

Introduction to Continuous Delivery for Machine Learning

Continuous delivery for machine learning is the practice of maintaining your software, including the code for machine learning predictions, in a deployable state. This involves several unique considerations:

  1. Data Versioning: You need to know the version of the data your model was trained with.
  2. Data Drift: You need to monitor if the underlying data has changed significantly (drifted) from the version used for training the model.
  3. Model Versioning: If there’s significant data drift, it may necessitate deploying a new version of the model.

The underlying data is a core component in this process, making it crucial to pay attention to these aspects when implementing continuous delivery for machine learning. This approach ensures that your machine learning systems are always up-to-date and ready for deployment.

What is Data Drift?

Data Drift is a crucial concept in machine learning operations. It allows for the detection and alerting of changes in a new dataset, the analysis of historical data for drift, and the profiling of new data over time.

For instance, consider a dataset of baby weights at birth stored in a data store like Amazon S3. If an observation a year later shows a significant increase in baby weight (e.g., from 10 pounds at birth to 30 pounds), this represents a large data drift. This change, which is three times the initial data, would necessitate training a new model to reflect these changes.

Alerts can be set up to ensure the accuracy of predictions, such as predicting which babies might need hospitalization. If a model trained on the initial data is used, it may not apply to older babies.

Data drift is a vital monitoring mechanism, akin to checking tire pressure or monitoring altitude when flying an airplane, but specifically tailored for machine learning. Regular alerts can help maintain the accuracy and relevance of machine learning models over time.
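
A toy sketch of such a drift check, reusing the 25% threshold mentioned earlier (the column name and CSV layout are assumptions):

import pandas as pd


def weight_drift_exceeded(train_path: str, new_path: str, threshold: float = 0.25) -> bool:
    """Compare mean baby weight between training data and new data against a drift threshold."""
    train_mean = pd.read_csv(train_path)["weight_lbs"].mean()   # hypothetical column
    new_mean = pd.read_csv(new_path)["weight_lbs"].mean()
    drift = abs(new_mean - train_mean) / train_mean
    return drift > threshold  # True would trigger retraining and redeployment


if __name__ == "__main__":
    print(weight_drift_exceeded("train_weights.csv", "latest_weights.csv"))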

Example of a CI/CD ML demo app in Flask

In the final project of this specialization, I developed a very simple Flask app that uses all the concepts previously mentioned:

Repo: matiaspakua/ml-demo-project: Demostration project for the Specialization Building Cloud Computing Solutions at Scale

Introduction to AutoML

AutoML, or Automated Machine Learning, is a technology that automates many aspects of machine learning. It allows users to upload data (like images), select an option to train a model, and then receive a model that can make predictions (like differentiating between a cat and a dog).

The adoption of AutoML is increasing, with more companies using it to create tools and speed up their processes, potentially by 10, 100, or even 1000 times. One of the key advantages of AutoML is that it enables individuals without advanced computer science degrees to participate in creating machine learning models.

In essence, AutoML is the automation of machine learning.

Example of an AutoML computer vision architecture:

An AutoML computer vision pipeline can be used to build production-quality models. The process involves the following steps:

  1. Source Data: You need images for training, such as 1,000 images of birds and 1,000 images of dogs for binary classification.

  2. Upload Data: The images are uploaded into a cloud-based environment like GCP’s AutoML Vision, an open-source solution like Ludwig, or even local software like Apple’s Create ML.

  3. Specify Model: You specify the type of model you want to train, in this case, a classifier to differentiate between a bird and a dog.

  4. Train Model: The model is trained, which could require special resources like a GPU for increased accuracy. The process is touchless, with no need for a data scientist to tweak the neural networks.

  5. Deploy Model: Once trained, the model is ready for deployment. Depending on the framework used, you could download a JavaScript version of it, make predictions directly on a hosted version via an API, or even download it into a mobile app.

The key idea is that AutoML allows for the automation of machine learning, focusing more on the logic of the problem you’re trying to solve. As long as you understand the business logic, you can use AutoML to continuously train and update the model as you acquire more data.

Introduction to No Code/Low Code (YAGNI)

No Code/Low Code is an emerging trend in data science that allows professionals to solve problems without necessarily needing to write code. This approach, encapsulated by the expression “You Ain’t Gonna Need It” (YAGNI), leverages solutions provided by cloud platforms and tech giants:

  1. Apple: Offers low code machine learning tools for computer vision and email.
  2. Google Cloud Platform (GCP): Provides Low Code/No Code tools, including AutoML for computer vision.
  3. Amazon Web Services (AWS): Supports Low Code/No Code solutions, offering a range of capabilities from tabular data (useful for business analytics) to AI APIs for computer vision and Natural Language Processing (NLP).
  4. Azure: Offers numerous APIs, including those for computer vision and AutoML.

These platforms can be used for Exploratory Data Analysis (EDA), modeling, and data ingestion. They allow business analytics, business intelligence, and data science professionals to leverage existing tools to solve problems, potentially without writing any code. This approach can lead to efficient problem-solving and analysis.

Example documentation in APPLE CREATE ML: Create ML Overview - Machine Learning - Apple Developer

Ludwig AutoML

Official documentation: Ludwig

Ludwig is an open-source, code-free tool that simplifies the process of training machine learning models. It allows users to train models using a configuration file and their data, without needing detailed knowledge of machine learning or coding. Ludwig supports various data types including binary, numerical, categorical, set, bag, and sequence. The training process involves combining a dataset with a YAML file and running a script. Compared to proprietary systems, Ludwig offers advantages such as no need for a costly cloud-based API, the ability to run on a local machine, and the potential for high performance with a GPU. It caters to specific user needs with its open-source AutoML capabilities.
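
A rough sketch of what that looks like through Ludwig's Python API (assuming the `ludwig.api.LudwigModel` interface and a made-up CSV with `review` and `sentiment` columns):

from ludwig.api import LudwigModel

# Declarative config: Ludwig infers the model architecture from the feature types.
config = {
    "input_features": [{"name": "review", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

model = LudwigModel(config)
results = model.train(dataset="reviews.csv")  # placeholder dataset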

Cloud AutoML

Cloud-based AutoML is a solution that automatically finds the correct values for a machine learning algorithm. It can handle tasks like computer vision, text classification, and translation. The spectrum of cloud-based AutoML options allows you to start with a managed machine learning platform like SageMaker, which automates many aspects of machine learning, or use a pre-trained model via an API. There are also tools that allow automatic training of, for example, a computer vision model by uploading files to the cloud-based system and training them with a click of a button. The end result can be downloaded and used on a mobile device or physical hardware, such as a Coral TPU for edge-based machine learning on a laptop or drone, or an Intel Neural Compute Stick. The cloud provides the flexibility to do it yourself, do it automatically, or use a pre-trained model.

Cloud AutoML workflow

An example repository using Apple Create ML AutoML: noahgift/Apple-CreateML-AutoML-Recipes: Some recipes around Apple CreateML (github.com)

MLOps

Definition

Machine learning + DevOps (development and operations).

Machine learning engineering is the field of applying machine learning models to real-world problems and deploying them in production environments. Machine learning engineers use DevOps principles and practices, such as microservices and continuous delivery, to build, test, and maintain machine learning systems. Machine learning engineers work with data, algorithms, and software to create solutions for various domains, such as self-driving cars or wildlife detection.

Edge ML and 5G

The proliferation of edge-based devices, which can run predictions on the physical device and are more powerful than your laptop at specific tasks, is in the near future. These devices can have a camera and hardware that does the prediction and then connects to a cloud environment. One of the interesting things about 5G is that it could potentially replace fiber networks, and these devices could be placed in locations where they still get fiber-like speeds. This is opening up new opportunities. The offline aspects of edge-based machine learning are also interesting, especially in situations like self-driving cars or drones, where the device makes life-or-death decisions; if it can't talk to the network, that's a big problem.

5G is the fifth generation of cellular technology, designed to increase speed, reduce latency, and improve flexibility of wireless services. It has a theoretical peak speed of **20 Gbps**, which is **20 times faster** than 4G. 5G also promises lower latency, which can improve the performance of business applications as well as other digital experiences such as online gaming, videoconferencing, and self-driving cars.

5G networks are virtualized and software-driven, and they exploit cloud technologies. The 5G network will also simplify mobility, with seamless open roaming capabilities between cellular and Wi-Fi access. The new Wi-Fi 6 wireless standard (also known as 802.11ax) shares traits with 5G, including improved performance.

5G technology works by using higher radio frequencies that are less cluttered, called 'millimeter waves' (mmWaves). These waves allow 5G to carry more information at a much faster rate. 5G New Radio, the global standard for a more capable 5G wireless air interface, will cover spectrums not used in 4G. New antennas will incorporate technology known as massive MIMO (multiple input, multiple output), which enables multiple transmitters and receivers to transfer more data at the same time.

5G networks can create software-defined subnetwork constructs known as network slices. These slices enable network administrators to dictate network functionality based on users and devices. 5G also enhances digital experiences through machine-learning (ML)-enabled automation.


What problems does Edge ML solve?

  1. Real-time Decision Making: One of the primary advantages of Edge ML is its ability to operate in low-latency environments. In applications where real-time responses are critical, such as autonomous vehicles or industrial automation, relying on cloud-based machine learning models can introduce significant delays due to data transmission and processing. Edge ML eliminates this bottleneck by processing data locally, enabling instantaneous decision-making.

  2. Privacy and Data Security: Another unique problem that Edge ML solves is the issue of privacy and data security. With traditional machine learning models, sensitive data often needs to be transmitted to remote servers for processing. This raises concerns about data privacy and potential security breaches. Edge ML addresses this challenge by keeping data locally and performing computations on the device itself, reducing the risk of data exposure.

  3. Decentralized Processing: As the demand for real-time and decentralized processing increases, Edge ML brings the power of artificial intelligence (AI) to the edge of the network, without relying on cloud or centralized servers.

Using AI API

The Boto3 library is a Python library that can interact with many AWS services.

Documentation: AWS SDK for Python (amazon.com)

You can talk to any service, including AWS Comprehend. You can write a script or function in Python that you give an image, and it can find what is in that image. You can also recognize text and feed it into a fully serverless application, for example one deployed with AWS Elastic Beanstalk.

The advantage of using an AI API is that you can really focus on the ML component and its engineering. You're not focused on training a model that will probably do a worse job than the API you can call, such as AWS Comprehend. This is the concept of comparative advantage: do the things that you're the best at, and let others who are better at a specific thing do that. It's the same idea when you're using these AI APIs: focus on building the software, then call out to these APIs and let them do the heavy lifting.
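
For instance, a hedged Boto3 sketch calling Amazon Comprehend's sentiment API (the region and input text are placeholders):

import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")  # placeholder region

response = comprehend.detect_sentiment(
    Text="The deployment went smoothly and the service is fast.",
    LanguageCode="en",
)
print(response["Sentiment"], response["SentimentScore"])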

Core Components of a Cloud Application

Checklist for Building Professional Web Services

References

Repo with example: noahgift/gcp-flask-ml-deploy: This is a project to auto-deploy with an ML payload (github.com)

Web Book: Cloud Computing for Data Pragmatic AI Labs and Solutions