<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Peer Insights - A Microsoft Fabric Blog]]></title><description><![CDATA[Welcome to Peer Insights! This blog focuses on Microsoft Fabric, mainly from a data engineering perspective. Find insights to help navigate and leverage the pla]]></description><link>https://peerinsights.emono.dk</link><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 07:08:34 GMT</lastBuildDate><atom:link href="https://peerinsights.emono.dk/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Designing for Automation in Microsoft Fabric]]></title><description><![CDATA[In the fast-evolving world of enterprise data platforms, automation is not a luxury - it's a necessity. When working with Microsoft Fabric, especially in scalable solutions that span across multiple environments and logical layers (like ingestion, tr...]]></description><link>https://peerinsights.emono.dk/designing-for-automation-in-microsoft-fabric</link><guid isPermaLink="true">https://peerinsights.emono.dk/designing-for-automation-in-microsoft-fabric</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[cicd]]></category><category><![CDATA[microsoft fabric]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Sun, 20 Jul 2025 20:14:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753041756957/2695bfc0-d16f-4abe-b1e5-8aa057818350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the fast-evolving world of enterprise data platforms, automation is not a luxury - it's a necessity. 
When working with Microsoft Fabric, especially in scalable solutions that span across multiple environments and logical layers (like ingestion, transformation, and serving), it’s essential to embed automation into your design from day one.</p>
<p>A crucial aspect often overlooked is how Microsoft Fabric uniquely identifies artifacts (items) across workspaces and environments. Understanding the difference between <strong>item IDs</strong> and <strong>logical IDs</strong>, and how they’re used, is foundational to building robust, automated solutions that can scale and deploy seamlessly across dev, test, and production environments.</p>
<h2 id="heading-what-are-item-ids">What Are Item IDs?</h2>
<p>In Microsoft Fabric, an <strong>item ID</strong> is a globally unique identifier (GUID) that represents a specific item in a workspace - whether it's a notebook, lakehouse, pipeline, or other Fabric artifact. This ID is visible in the URL when navigating to an item, like in this example:</p>
<pre><code class="lang-plaintext">https://app.fabric.microsoft.com/groups/9bcbb7d4-13f7-4bc2-a261-22be96a809dc/pipelines/82e7a8f1-e593-414d-8fab-c9b34a267772
</code></pre>
<p>Here, <code>82e7a8f1-e593-414d-8fab-c9b34a267772</code> is the item ID for the pipeline, and <code>9bcbb7d4-13f7-4bc2-a261-22be96a809dc</code> is the workspace ID.</p>
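<p>To illustrate, both IDs can be pulled straight out of such a URL. A quick sketch - the helper name and the exact URL pattern assumption are mine, matching the example above:</p>

```python
import re

def parse_fabric_url(url: str) -> dict:
    """Extract the workspace ID and item ID from a Fabric item URL.

    Illustrative helper only - the pattern below matches the layout
    shown above: /groups/<workspaceId>/<itemType>/<itemId>.
    """
    match = re.search(
        r"/groups/(?P<workspace_id>[0-9a-f-]{36})/\w+/(?P<item_id>[0-9a-f-]{36})",
        url,
    )
    if not match:
        raise ValueError(f"Unrecognized Fabric URL: {url}")
    return match.groupdict()

ids = parse_fabric_url(
    "https://app.fabric.microsoft.com/groups/"
    "9bcbb7d4-13f7-4bc2-a261-22be96a809dc/pipelines/"
    "82e7a8f1-e593-414d-8fab-c9b34a267772"
)
print(ids["workspace_id"])  # 9bcbb7d4-13f7-4bc2-a261-22be96a809dc
print(ids["item_id"])       # 82e7a8f1-e593-414d-8fab-c9b34a267772
```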
<p>These IDs are also used in Fabric REST APIs, such as the <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/items/list-items">List Items</a> and <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/items/get-item">Get Item</a> operations. They're fundamental to how Fabric tracks and manages content internally.</p>
<h2 id="heading-what-are-logical-ids">What Are Logical IDs?</h2>
<p><strong>Logical IDs</strong> are different: they're Git-related and exist only for items in source-controlled workspaces.</p>
<p>A <strong>logical ID</strong> is a unique identifier that links a Fabric item in the workspace to its corresponding file and configuration in a Git branch. Think of it as the “anchor” between what lives in Fabric and what’s committed to source control. This makes logical IDs vital in Git-integrated workflows, especially when names or paths change across branches or environments.</p>
<p>You can find the logical ID in the <code>.platform</code> system file that Fabric automatically generates inside the item’s Git directory:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"$schema"</span>: <span class="hljs-string">"https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json"</span>,
  <span class="hljs-attr">"metadata"</span>: {
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"DataPipeline"</span>,
    <span class="hljs-attr">"displayName"</span>: <span class="hljs-string">"Controller - Full"</span>
  },
  <span class="hljs-attr">"config"</span>: {
    <span class="hljs-attr">"version"</span>: <span class="hljs-string">"2.0"</span>,
    <span class="hljs-attr">"logicalId"</span>: <span class="hljs-string">"bffcdc62-7e33-83b0-4dc9-0f7957777e88"</span>
  }
}
</code></pre>
<p>More details: <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/source-code-format?tabs=v2#automatically-generated-system-files">Source Code Format – Microsoft Learn</a></p>
<p>For items in <strong>non-source-controlled</strong> workspaces, the <code>logicalId</code> will be a blank GUID:</p>
<blockquote>
<p><code>00000000-0000-0000-0000-000000000000</code></p>
</blockquote>
<h2 id="heading-why-does-this-matter">Why Does This Matter?</h2>
<p>When promoting artifacts across environments, from development to test to production, <strong>you don’t want your production controller pipelines referencing development notebooks or ingestion pipelines.</strong> This is where logical IDs shine - they enable Fabric to resolve internal references based on the Git-tracked logical structure, not hardcoded workspace or item IDs.</p>
<h2 id="heading-handling-feature-branches-and-multi-layered-workspaces">Handling Feature Branches and Multi-Layered Workspaces</h2>
<p>I'm a strong advocate for using <strong>feature-isolated development workspaces</strong> and separating your Fabric solution into layers such as Storage, Ingest, and Prepare, deployed across <strong>at least three environments</strong>: dev, test (PPE), and prod. This architecture is standard in mature enterprise solutions.</p>
<p>One common question I hear:</p>
<blockquote>
<p>“Which workspace should I refer to when invoking other pipelines or notebooks from my controller pipeline?”</p>
</blockquote>
<h3 id="heading-within-the-same-workspace">Within the Same Workspace</h3>
<p>Let’s say you’re referencing another item (pipeline or notebook) <strong>within the same workspace</strong>. In that case, <strong>always</strong> reference the version from your <strong>feature workspace</strong>, not the main dev workspace.</p>
<p>Here’s what happens:</p>
<p>When editing the pipeline in the Fabric UI, you’ll see this reference:</p>
<pre><code class="lang-json"><span class="hljs-string">"typeProperties"</span>: {
  <span class="hljs-attr">"notebookId"</span>: <span class="hljs-string">"0e23e4cb-caf5-41bd-8161-ad34f69679ce"</span>,
  <span class="hljs-attr">"workspaceId"</span>: <span class="hljs-string">"9bcbb7d4-13f7-4bc2-a261-22be96a809dc"</span>
}
</code></pre>
<p>But once committed to Git, Fabric rewrites it as:</p>
<pre><code class="lang-json"><span class="hljs-string">"typeProperties"</span>: {
  <span class="hljs-attr">"notebookId"</span>: <span class="hljs-string">"f69679ce-ad34-8161-41bd-caf50e23e4cb"</span>,
  <span class="hljs-attr">"workspaceId"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>
}
</code></pre>
<p>What’s going on? Fabric automatically replaces the item ID with the <strong>logical ID</strong>, and sets the workspace ID to a blank GUID to indicate an <strong>intra-workspace reference</strong>. It’s elegant and powerful because it allows seamless deployment without having to manually update references between environments.</p>
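<p>That convention is easy to test for in your own tooling. A minimal sketch - my own helper, not a Fabric API - that flags a committed reference as intra-workspace when the workspace ID has been blanked:</p>

```python
BLANK_GUID = "00000000-0000-0000-0000-000000000000"

def is_intra_workspace_reference(type_properties: dict) -> bool:
    """True when a committed reference points inside the same workspace,
    i.e. Fabric has swapped in a logical ID and blanked the workspace ID."""
    return type_properties.get("workspaceId") == BLANK_GUID

# The committed form of the example above:
committed = {
    "notebookId": "f69679ce-ad34-8161-41bd-caf50e23e4cb",
    "workspaceId": "00000000-0000-0000-0000-000000000000",
}
print(is_intra_workspace_reference(committed))  # True
```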
<h2 id="heading-referencing-items-across-workspaces">Referencing Items Across Workspaces</h2>
<p>Now, when you need to reference items <strong>across</strong> workspaces (e.g., from a controller pipeline in the orchestration layer to an ingestion pipeline in another workspace), Fabric does <strong>not</strong> resolve logical IDs automatically. You need to provide actual workspace and item IDs - and ideally <strong>dynamically</strong>.</p>
<p>To automate this, check out one of my previous blog posts:<br />👉 <a target="_blank" href="https://peerinsights.hashnode.dev/automating-fabric-dynamically-configuring-microsoft-fabric-data-pipelines">Automating Fabric: Dynamically Configuring Microsoft Fabric Data Pipelines</a></p>
<p>You can also use SemPy functions in your Notebooks like:</p>
<ul>
<li><p><code>resolve_item_id</code></p>
</li>
<li><p><code>resolve_item_name</code></p>
</li>
<li><p><code>resolve_workspace_name</code></p>
</li>
<li><p><code>resolve_workspace_name_and_id</code></p>
</li>
</ul>
<p>These help dynamically fetch the correct IDs based on the environment context.</p>
<h2 id="heading-automate-with-confidence-the-fabric-cicd-python-library">Automate with Confidence: The fabric-cicd Python Library</h2>
<p>For deploying Git-connected Fabric workspaces, I highly recommend using the <code>fabric-cicd</code> Python library. Purpose-built for this exact use case, it has quickly become the preferred deployment tool among many Fabric professionals.</p>
<p>The library enables code-first CI/CD automation by allowing you to deploy workspaces directly from a Git repository structure. It takes care of critical deployment tasks, such as replacing logical IDs with the actual item IDs of the newly deployed artifacts.</p>
<p>A standout feature is its support for environment-specific configurations using a <code>parameters.yml</code> file. This file lets you define and programmatically override values depending on the target environment. That includes, but isn’t limited to, workspace IDs, item IDs, connection strings, connection IDs, and more.</p>
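<p>To give a feel for the idea, a find-and-replace parameter file could look roughly like this - the keys and exact schema here are assumptions on my part, so check the fabric-cicd documentation for the authoritative format; the GUIDs are placeholders:</p>

```yaml
find_replace:
  # Dev workspace ID as it appears in the committed item definitions
  - find_value: "9bcbb7d4-13f7-4bc2-a261-22be96a809dc"
    replace_value:
      PPE: "11111111-1111-1111-1111-111111111111"   # placeholder test workspace ID
      PROD: "22222222-2222-2222-2222-222222222222"  # placeholder prod workspace ID
```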
<p>This makes the fabric-cicd library especially powerful for multi-environment deployments where automation, consistency, and traceability are key.</p>
<p>Don't miss the great introductory blog post by Jacob Knightley:<br />👉 <a target="_blank" href="https://blog.fabric.microsoft.com/en-US/blog/introducing-fabric-cicd-deployment-tool/">Introducing the Fabric CICD Deployment Tool</a></p>
<h2 id="heading-a-note-on-variable-libraries">A Note on Variable Libraries</h2>
<p>Variable Libraries are a great addition to Fabric and will simplify many deployment scenarios. However, <strong>they do not handle dynamic references to other Fabric items</strong> like notebooks or pipelines. For that, logical IDs (and tools like <code>fabric-cicd</code>) are still essential.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>I hope this post has helped clarify the role of logical IDs in Microsoft Fabric and why they’re vital when working with Git-connected workspaces. Designing your solution with <strong>automation and environment promotion in mind</strong> is key to building scalable, robust, and enterprise-ready data platforms.</p>
<p>Keep an eye out for my upcoming (and very small) post on how to use <code>fabric-cicd</code> to deploy <strong>multiple interconnected workspaces</strong> (Ingest, Prepare, etc.) while preserving cross-layer references.</p>
<p>Until then - automate everything, and automate it smartly 😉</p>
]]></content:encoded></item><item><title><![CDATA[Fabric CLI Beyond Shell Commands]]></title><description><![CDATA[The Microsoft Fabric CLI has become an essential tool for automating and managing your Fabric environments. There are many great articles out there on how to use the CLI locally, from your CI/CD pipelines, and even from within Fabric Notebooks - whet...]]></description><link>https://peerinsights.emono.dk/fabric-cli-beyond-shell-commands</link><guid isPermaLink="true">https://peerinsights.emono.dk/fabric-cli-beyond-shell-commands</guid><category><![CDATA[microsoft fabric]]></category><category><![CDATA[fabriccli]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Wed, 11 Jun 2025 05:47:23 GMT</pubDate><content:encoded><![CDATA[<p>The Microsoft Fabric CLI has become an essential tool for automating and managing your Fabric environments. There are many great articles out there on how to use the CLI locally, from your CI/CD pipelines, and even from within Fabric Notebooks - whether that’s using Python’s <code>subprocess</code> to run CLI commands, the <code>!</code> operator for single shell commands, or magic commands such as <code>%%sh</code> or <code>%%bash</code> to execute entire cells in a subprocess.</p>
<p>👉 Sandeep Pawar recently wrote an excellent article on using the Fabric CLI in notebooks, which you can find here: <a target="_blank" href="https://fabric.guru/using-fabric-cli-in-fabric-notebook">Using Fabric CLI in Fabric Notebook</a>.</p>
<p>👉 I also recently shared a blogpost on <strong>automating feature workspace maintenance in Microsoft Fabric</strong> using Python, the Fabric CLI, and GitHub Actions: <a target="_blank" href="https://peerinsights.hashnode.dev/automating-feature-workspace-maintainance-in-microsoft-fabric">Read it here</a>.</p>
<h2 id="heading-so-why-explore-fabric-cli-python-modules-directly">So why explore Fabric CLI Python modules directly?</h2>
<p>With Fabric User Data Functions (UDFs) now in public preview, I decided to investigate whether we could bypass the system shell entirely and <strong>leverage the Fabric CLI’s Python modules and functions directly</strong> instead of executing <code>fab</code> commands in a subprocess.</p>
<h3 id="heading-why-not-just-use-the-subprocess-module"><strong>Why not just use the</strong> <code>subprocess</code> <strong>module?</strong></h3>
<p>Even though you can <strong>add the</strong> <code>ms-fabric-cli</code> library from PyPI in the library management section of your UDF, running shell commands (<code>subprocess.run(["fab", ...])</code>) doesn’t work because:</p>
<ul>
<li><p><strong>Fabric UDFs run in sandboxed environments</strong> where direct shell access is restricted.</p>
</li>
<li><p>The <code>subprocess</code> module is often locked down or lacks access to the underlying system shell.</p>
</li>
<li><p>It’s a security and resource isolation measure to ensure reliability and consistency of Fabric workloads.</p>
</li>
</ul>
<p>For standard CLI usage, you should instead run the CLI on your local machine, on a VM, in a container, or through your CI/CD environment (like Azure DevOps or GitHub Actions). Or simply use the <strong>public REST APIs</strong> instead, which are HTTP-based and work well directly in code.</p>
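<p>For example, listing workspace items over plain HTTP needs nothing beyond the standard library. A sketch, assuming you already have an Entra ID access token for the Fabric API scope - the helper names are mine:</p>

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def items_endpoint(workspace_id: str) -> str:
    """URL of the public List Items endpoint for a workspace."""
    return f"{FABRIC_API}/workspaces/{workspace_id}/items"

def list_items(workspace_id: str, token: str) -> list[dict]:
    """Call List Items directly over HTTP - no shell, no CLI needed.
    How you obtain the token (MSAL, azure-identity, ...) depends on
    your environment."""
    request = urllib.request.Request(
        items_endpoint(workspace_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("value", [])
```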
<h2 id="heading-my-approach-directly-using-fabric-cli-python-modules">My Approach: Directly Using Fabric CLI Python Modules</h2>
<p>Since the Fabric CLI is written in Python and is installed via <code>pip</code>, I thought - why not see if I could use the <strong>underlying Python modules</strong> directly?</p>
<p>My goal was to create a Fabric UDF that would run a job synchronously using the same logic as the <code>fab job run</code> command.</p>
<p>To make this work, you must <strong>add the</strong> <code>ms-fabric-cli</code> library from PyPI in the Library management section of your Fabric UDF.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749544697509/1a4dd789-f1b7-4adc-8266-cef49e04d5af.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-how-it-works">How It Works</h3>
<p>First observation - each CLI command has its own dedicated subpackage:</p>
<ul>
<li><code>auth</code>, <code>config</code>, <code>jobs</code>, <code>fs</code> (for filesystem commands), <code>acl</code>, and more.</li>
</ul>
<p>For example, to <strong>configure encryption fallback</strong> and <strong>log in using a service principal</strong>, you can directly import and use:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> argparse <span class="hljs-keyword">import</span> Namespace
<span class="hljs-keyword">from</span> fabric_cli.commands.config <span class="hljs-keyword">import</span> fab_config
<span class="hljs-keyword">from</span> fabric_cli.commands.auth <span class="hljs-keyword">import</span> fab_auth

<span class="hljs-comment"># Set encryption fallback</span>
args = Namespace(
    command_path=[<span class="hljs-string">"/"</span>],
    path=[<span class="hljs-string">"/"</span>],
    command=<span class="hljs-string">"config"</span>,
    config_command=<span class="hljs-string">"set"</span>,
    key=<span class="hljs-string">"encryption_fallback_enabled"</span>,
    value=<span class="hljs-string">"true"</span>
)
fab_config.set_config(args)

<span class="hljs-comment"># Login using service principal</span>
args = Namespace(
    auth_command=<span class="hljs-string">"login"</span>,
    username=<span class="hljs-string">"*****"</span>,
    password=<span class="hljs-string">"*****"</span>,
    tenant=<span class="hljs-string">"*****"</span>,
    identity=<span class="hljs-literal">None</span>,
    federated_token=<span class="hljs-literal">None</span>,
    certificate=<span class="hljs-literal">None</span>
)
fab_auth.init(args)
fab_auth.status(<span class="hljs-literal">None</span>)  <span class="hljs-comment"># Check current authentication status</span>
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text"><strong>This approach is purely experimental! </strong>Hardcoding client IDs and secrets directly in a UDF is <strong>not recommended</strong> for production scenarios. Currently, UDFs don’t support features like <code>notebookutils</code> to fetch tokens or Key Vault secrets. However, you could create a connection to a Fabric Lakehouse or a Fabric SQL Database containing the credentials for the service principal.</div>
</div>

<h3 id="heading-running-a-fabric-job">Running a Fabric Job</h3>
<p>To run a job (similar to the <code>fab job run</code> command), you import the <code>fab_jobs</code> module. In my implementation, I take the workspace name, item name, and item type as input parameters to build the path for the item to run. These parameters are then used as arguments when executing the Fabric UDF.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fabric_cli.commands.jobs <span class="hljs-keyword">import</span> fab_jobs

args = Namespace(
    command_path=[<span class="hljs-string">"/"</span>],
    command=<span class="hljs-string">"job"</span>,
    jobs_command=<span class="hljs-string">"run"</span>,
    path=[<span class="hljs-string">f"<span class="hljs-subst">{workspacename}</span>.Workspace/<span class="hljs-subst">{itemname}</span>.<span class="hljs-subst">{itemtype}</span>"</span>]
)
fab_jobs.run_command(args)
</code></pre>
<p>And voilà! 🎉 With this concise yet powerful Fabric UDF, you can expose job execution to business super users. For example, finance teams can now trigger <strong>ad-hoc jobs during month-end close</strong> using a translytical task flow to run a Fabric User Data Function that handles the job execution - without granting the users deep admin access to the entire Fabric workspace.</p>
<h2 id="heading-additional-thoughts">Additional Thoughts</h2>
<p>One important consideration when using this approach is that you won’t see console outputs (like <code>print</code> statements) in the execution logs of your UDF runs. This can make troubleshooting or understanding the full execution flow challenging.</p>
<p>To address this, I’ve added logging of outputs and errors from the CLI commands directly into the UDF source code. You can find this implementation in the downloadable UDF example on my <a target="_blank" href="https://github.com/gronnerup/Fabric/blob/main/FabricCLI_UDF/Utils_UDF.py">GitHub</a>.</p>
<p>This ensures that all important outputs and errors are written to the log - making it much easier to monitor and debug these Fabric CLI-based UDFs in action.</p>
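<p>The general technique is to redirect the CLI functions’ console output into a buffer and forward it to a logger. A simplified sketch of the idea - my actual implementation on GitHub differs in the details:</p>

```python
import contextlib
import io
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fabric_cli_udf")

def run_and_capture(func, *args, **kwargs):
    """Run a (CLI) function while capturing everything it prints,
    so the output can be written to the UDF log afterwards."""
    stdout, stderr = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
        result = func(*args, **kwargs)
    if stdout.getvalue():
        logger.info(stdout.getvalue().strip())
    if stderr.getvalue():
        logger.error(stderr.getvalue().strip())
    return result, stdout.getvalue(), stderr.getvalue()

# Stand-in demo; in the UDF you would pass e.g. fab_jobs.run_command
# and its Namespace instead of print:
result, out, err = run_and_capture(print, "job submitted")
```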
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749543640630/a13cf73c-355f-48c8-90c0-0049ce888dea.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>While directly using the Fabric CLI’s Python modules in a UDF isn’t the recommended approach - many would argue that the public REST APIs are better suited for managing Fabric items within UDFs - this experiment showed that it’s possible to run CLI commands directly <strong>without</strong> relying on the shell.</p>
<p>It highlights the flexibility of the Fabric CLI’s architecture and suggests exciting future possibilities - imagine how powerful it would be to have a dedicated, fully supported Python interface for Fabric!</p>
<p>👉 <strong>You can find the full Fabric User Data Function implementation on my</strong> <a target="_blank" href="https://github.com/gronnerup/Fabric/blob/main/FabricCLI_UDF/Utils_UDF.py"><strong>GitHub</strong></a><strong>.</strong></p>
]]></content:encoded></item><item><title><![CDATA[Who's Calling?]]></title><description><![CDATA[During Microsoft Fabric project implementations, I’m frequently asked a deceptively simple question: “Under which identity is this running?” It turns out, the answer isn’t always straightforward - and to be honest, it’s a topic I’ve also found quite ...]]></description><link>https://peerinsights.emono.dk/whos-calling</link><guid isPermaLink="true">https://peerinsights.emono.dk/whos-calling</guid><category><![CDATA[microsoft fabric]]></category><category><![CDATA[microsoftfabric]]></category><category><![CDATA[Execution Context]]></category><category><![CDATA[Fabric Data Factory]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Fri, 09 May 2025 12:39:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746522594407/5b67b869-1771-4dd0-9327-66ad58bb2113.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>During Microsoft Fabric project implementations, I’m frequently asked a deceptively simple question: <strong>“Under which identity is this running?”</strong> It turns out, the answer isn’t always straightforward - and to be honest, it’s a topic I’ve also found quite complex at times.</p>
<p>Just because a schedule was created by you doesn’t necessarily mean the entire job triggered by that schedule runs in your user context - or for that matter, in the context of the identity who created the item. And with the introduction of <strong>Service Principal</strong> support, things haven’t exactly become clearer. In fact, it often adds an extra layer of complexity to the already tricky landscape of execution context in Fabric.</p>
<p>In this post, I want to share some of the insights I’ve gathered - especially when working with <strong>data pipelines that trigger child notebooks and other downstream activities</strong>. We’ll look at how identities are used across different components, what you need to be aware of, and how to avoid common pitfalls. Or in short: <strong>Who’s calling?</strong> 📞</p>
<p>Finally, I’ll touch on a <strong>known bug</strong> in the Fabric API and the <strong>SemPy library</strong> that affects <strong>notebook execution in Service Principal contexts</strong>, a setup that’s becoming increasingly common in enterprise-grade, multi-environment data platforms.</p>
<h2 id="heading-test-setup-simulating-real-world-scheduling-scenarios">Test Setup: Simulating Real-World Scheduling Scenarios</h2>
<p>To explore how execution context behaves in Microsoft Fabric, I created a simple but representative setup. Using the <strong>Fabric CLI</strong>, I triggered <strong>on-demand executions</strong> of Fabric items like <strong>data pipelines</strong> that call <strong>child notebooks</strong> as well as triggering notebooks directly.</p>
<p>This setup allows us to control exactly <strong>who initiates the run -</strong> be it a <strong>user</strong> or a <strong>Service Principal</strong> - and observe how that identity flows (or doesn’t) through the various components.</p>
<p>Key components of the setup:</p>
<ul>
<li><p>A Data Pipeline with multiple activities (e.g., Invoke Notebook and Invoke Data Pipeline)</p>
</li>
<li><p>A Notebook which prints identity info as well as runtime properties and other relevant info</p>
</li>
<li><p>A parent Notebook which executes a child notebook (as the one above)</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/fabric-command-line-interface">Fabric CLI</a>-triggered job runs using both <strong>user identity</strong> and <strong>Service Principal</strong></p>
</li>
</ul>
<p>This approach mimics many enterprise deployment scenarios, especially in <strong>multi-environment setups</strong>.</p>
<h2 id="heading-execution-scenarios-what-identity-is-actually-used">Execution Scenarios: What Identity Is Actually Used?</h2>
<p>Regardless of whether a job is triggered by a <strong>user</strong> or a <strong>Service Principal</strong>, the same core logic applies when it comes to <strong>execution context</strong> in Microsoft Fabric. However, what happens next depends heavily on the <strong>type of item</strong> being executed and <strong>how</strong> it's executed.</p>
<p>Let’s break it down…</p>
<h3 id="heading-top-level-execution-who-triggers-the-job">Top-Level Execution: Who Triggers the Job?</h3>
<p>When a pipeline or notebook is triggered - either manually, via schedule, or through a CLI/API call - the <strong>top-level item</strong> (the pipeline or notebook itself) is executed in the <strong>context of the identity that triggered it</strong>.</p>
<p>That could be:</p>
<ul>
<li><p>A user account (e.g., developer in dev/test)</p>
</li>
<li><p>A service principal (e.g., a scheduled run in production)</p>
</li>
</ul>
<p>So far, so good. But once you go deeper, into <strong>child components and downstream activities</strong>, the picture becomes more complicated.</p>
<hr />
<h3 id="heading-notebook-execution-from-notebooks">Notebook Execution from Notebooks</h3>
<p>When one notebook triggers another (e.g., using <code>notebookutils.notebook.run()</code>), the <strong>child notebooks</strong> always inherit the <strong>execution context of the parent notebook</strong>.</p>
<p>✅ <em>If a notebook is triggered by a Service Principal, all downstream notebooks will run under the same Service Principal.</em></p>
<p>✅ <em>If a user triggers the parent notebook, all child notebooks will run under that user’s identity.</em></p>
<p>This behavior is consistent and predictable across environments.</p>
<hr />
<h3 id="heading-data-pipelines-a-more-complex-story">Data Pipelines: A More Complex Story</h3>
<p>With <strong>Data Pipelines</strong>, execution context is <strong>activity-specific</strong>. Here’s what governs it:</p>
<h4 id="heading-activities-that-use-connections">🔹 Activities that use <strong>connections</strong></h4>
<p>Examples: Copy Data, Invoke Pipeline (preview), Azure Databricks, Semantic model refresh, Web etc.<br />These activities <strong>run under the identity associated with the connection object</strong> used.</p>
<h4 id="heading-activities-that-do-not-use-connections">🔹 Activities that <strong>do not use connections</strong></h4>
<p>Examples: Notebook, Invoke Pipeline (Legacy activity), Dataflow, Spark Job Definition etc.<br />These activities run under the identity of the user or service principal who <strong>last modified</strong> the pipeline. This is the identity shown as <strong>"Last Modified By"</strong> in the Data Pipeline settings.</p>
<blockquote>
<p>⚠️ Yes, that means if you last edited a pipeline in dev as yourself, but deploy it in test using a service principal, the execution identity in test will be the service principal - even if the original intent was to run it as a user.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746795852549/b4148762-3299-4727-a499-20e96d7f7879.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-real-life-example-a-lakehouse-medallion-architecture">Real-Life Example: A Lakehouse Medallion Architecture</h3>
<p>Let’s ground this in a practical scenario - a common <strong>Lakehouse Data Platform</strong> with a <strong>3-layer medallion architecture</strong>:</p>
<ol>
<li><p>A <strong>controller pipeline</strong> kicks off the process.</p>
</li>
<li><p>It calls <strong>child pipelines</strong> that ingest raw data into the <strong>bronze</strong> layer.</p>
</li>
<li><p>Then it triggers a <strong>notebook</strong> that processes bronze into <strong>silver</strong>.</p>
</li>
<li><p>Another notebook handles transformations into <strong>gold</strong> (curated data).</p>
</li>
<li><p>Finally, the pipeline refreshes a <strong>semantic model</strong> as the last step.</p>
</li>
</ol>
<p>Here’s how execution context breaks down:</p>
<ul>
<li><p>Activities using <strong>connections</strong> (e.g., Copy Data or Semantic model refresh) run under the <strong>connection identity</strong>.</p>
</li>
<li><p>Notebooks in the pipeline (with no connection) run as the <strong>last modified identity of the pipeline -</strong> which could be a user or service principal.</p>
</li>
<li><p>If a child pipeline triggers a notebook, the same logic applies: the <strong>last modified identity of that pipeline</strong> determines the execution context of its notebook.</p>
</li>
</ul>
<p>So yes, it’s entirely possible that a single run involves:</p>
<ul>
<li><p>Data ingestion as one identity (connection)</p>
</li>
<li><p>Silver transformation as another (pipeline author)</p>
</li>
<li><p>Gold orchestration as yet another (child pipeline modifier)</p>
</li>
</ul>
<hr />
<h3 id="heading-feeling-lost-youre-not-alone">Feeling Lost? You’re Not Alone</h3>
<p>If you’re scratching your head, you’re not alone. The behavior is by design, but it does mean we need to be <strong>deliberate</strong> about how we:</p>
<ul>
<li><p>Modify items</p>
</li>
<li><p>Manage dependencies downstream</p>
</li>
<li><p>Set up connections</p>
</li>
<li><p>Deploy across environments</p>
</li>
</ul>
<p>Most importantly: <strong>how things run in development may not reflect how they run in test or production</strong> - especially if you use a service principal for automated deployments.</p>
<p>That’s why <strong>understanding execution context is critical</strong> for ensuring consistent behavior across environments in enterprise-grade solutions.</p>
<h2 id="heading-known-bug-when-notebooks-fail-under-a-service-principal">Known Bug: When Notebooks Fail Under a Service Principal</h2>
<p>While building enterprise-ready Fabric solutions, it’s increasingly common to run notebooks using <strong>Service Principals</strong>. However, there's a <strong>known bug</strong> that can cause unexpected failures when doing so.</p>
<h3 id="heading-whats-the-problem">What’s the Problem?</h3>
<p>Running a notebook under a Service Principal can break certain functions and environment references, especially those related to <strong>runtime context</strong> and <strong>authentication</strong>. The issue appears to stem from the <strong>scope or limitations of the Service Principal's token</strong>, and Microsoft has acknowledged it as a <strong>bug</strong>. The Fabric product team is actively working on a fix.</p>
<h3 id="heading-what-fails">What Fails?</h3>
<p>Here’s a list of some of the functions and methods that return <code>None</code> or throw errors when executed in a notebook under a Service Principal. Note that <code>mssparkutils</code> is being deprecated and <code>notebookutils</code> is the way forward; the list below is just to illustrate the issue:</p>
<ul>
<li><p><code>mssparkutils.env.getWorkspaceName()</code></p>
</li>
<li><p><code>mssparkutils.env.getUserName()</code></p>
</li>
<li><p><code>notebookutils.runtime.context.get('currentWorkspaceName')</code></p>
</li>
<li><p><code>fabric.resolve_workspace_id()</code></p>
</li>
<li><p><code>fabric.resolve_workspace_name()</code></p>
</li>
<li><p>Any SemPy <code>FabricRestClient</code> operations</p>
</li>
<li><p>Manual API calls using tokens from <code>notebookutils.mssparkutils.credentials.getToken("https://api.fabric.microsoft.com")</code></p>
</li>
</ul>
<h3 id="heading-importing-sempyfabric-under-a-service-principal">⚠️ Importing <code>sempy.fabric</code> Under a Service Principal</h3>
<p>When executing a notebook in the context of a <strong>Service Principal</strong>, simply importing <code>sempy.fabric</code> will result in the following exception:</p>
<pre><code class="lang-plaintext">Exception: Fetch cluster details returns 401:b''
## Not In PBI Synapse Platform ##
</code></pre>
<p>This error occurs because <strong>SemPy</strong> attempts to fetch cluster and workspace metadata using the <strong>execution identity’s token</strong> - which, as mentioned earlier, lacks proper context or scope when it belongs to a Service Principal.</p>
<p>In short, <strong>any method that fetches workspace name</strong> <strong>or user name -</strong> or relies on the <strong>executing identity’s token for SemPy</strong> or <strong>REST API calls</strong> - is likely to fail or return <code>None</code>.</p>
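<p>In practice, this means context lookups need a defensive fallback. Below is a minimal sketch (the helper name and the fallback order are my own illustration, not part of any Fabric SDK) that tries the runtime-context lookup first and falls back to the Spark configuration key, which still resolves under a Service Principal:</p>
<pre><code class="lang-python">def resolve_workspace_id(get_context_value, get_spark_conf):
    """Resolve the current workspace ID with a fallback that survives
    Service Principal execution. Both lookups are injected so the helper
    can also be exercised outside a notebook."""
    try:
        # Works when the notebook runs as a user...
        workspace_id = get_context_value("currentWorkspaceId")
        if workspace_id:
            return workspace_id
    except Exception:
        pass  # ...but may fail or return None under a Service Principal
    # This Spark setting still resolves under a Service Principal
    return get_spark_conf("trident.workspace.id")

# In a notebook you would wire up the real lookups, e.g.:
# resolve_workspace_id(notebookutils.runtime.context.get, spark.conf.get)
</code></pre>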
<h3 id="heading-what-still-works">What Still Works?</h3>
<p>Surprisingly, not everything is broken. Here are some functions that still work under a Service Principal:</p>
<ul>
<li><p><code>spark.conf.get('trident.workspace.id')</code> – this gives you the workspace ID reliably</p>
</li>
<li><p><code>sempy.fabric.get_workspace_id()</code> – still functional, even though importing <code>sempy.fabric</code> throws the exception shown above.</p>
</li>
<li><p><code>notebookutils.credentials.getSecret(...)</code> – useful for pulling secrets like client credentials from a Key Vault</p>
</li>
</ul>
<p>Using these, you can still <strong>manually generate a token</strong> and pass it into your REST requests - or even inject a custom <code>token_provider</code> into the SemPy <code>FabricRestClient</code>.</p>
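<p>As a concrete illustration of the manual token approach, here is a sketch using only the Python standard library. It runs the client-credentials flow against Microsoft Entra ID; the tenant, client and secret values are placeholders that you would typically pull from a Key Vault via <code>notebookutils.credentials.getSecret(...)</code>:</p>
<pre><code class="lang-python">import json
import urllib.parse
import urllib.request

FABRIC_SCOPE = "https://api.fabric.microsoft.com/.default"

def build_token_request(tenant_id, client_id, client_secret):
    """Build the Entra ID client-credentials token request (URL and form body)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": FABRIC_SCOPE,
    }).encode()
    return url, body

def get_fabric_token(tenant_id, client_id, client_secret):
    """Call the token endpoint and return the raw access token string."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]

# The token then goes into the Authorization header of any Fabric REST call:
# headers = {"Authorization": "Bearer " + get_fabric_token(tenant, client, secret)}
</code></pre>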
<h3 id="heading-workarounds">Workarounds</h3>
<p>If you hit this issue, here are some paths forward:</p>
<ul>
<li><p>Avoid relying on runtime context methods when running under a Service Principal</p>
</li>
<li><p>Use a <strong>manual token approach</strong>: fetch your own token using credentials from Key Vault and use that in REST requests</p>
</li>
<li><p>Where possible, <strong>shift context resolution logic out of notebooks</strong> and into deployment orchestration or pipeline steps</p>
</li>
<li><p>Watch for updates: Microsoft is aware of the issue and a fix is on the way</p>
</li>
</ul>
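<p>For the manual token approach with SemPy, a small callable wrapper can stand in as the <code>token_provider</code>. The sketch below is an assumption about the provider interface - SemPy versions differ in what they expect, so check the documentation for your version:</p>
<pre><code class="lang-python">class StaticTokenProvider:
    """Wraps a pre-fetched access token so REST helpers such as SemPy's
    FabricRestClient do not fall back to the execution identity."""

    def __init__(self, token):
        self._token = token

    def __call__(self, audience="pbi"):
        # SemPy may pass an audience hint; this sketch ignores it and
        # always returns the same Fabric API token.
        return self._token

# token = fetched with credentials from Key Vault (see above)
# client = sempy.fabric.FabricRestClient(token_provider=StaticTokenProvider(token))
</code></pre>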
<h2 id="heading-why-this-bug-matters-for-cicd-and-execution-context">Why This Bug Matters for CI/CD and Execution Context</h2>
<p>This issue ties directly back to the core topic of this blog post - <strong>execution context in Microsoft Fabric</strong>. Remember that when a <strong>notebook is triggered by a Data Pipeline</strong>, its execution identity depends on <strong>who last modified the data pipeline</strong>.</p>
<p>In modern CI/CD workflows - whether you're using <strong>Azure DevOps Pipelines</strong>, <strong>GitHub Actions</strong>, or any other automation platform - you’re most likely deploying with a <strong>Service Principal</strong>. That means after every deployment, <strong>the "Last Modified By" identity on your Data Pipelines becomes the Service Principal</strong>.</p>
<p>This wouldn’t be an issue <em>if</em> notebooks worked reliably under Service Principal identity. But as we've seen above, <strong>notebooks run into serious limitations when executed in that context</strong> - missing environment properties, failed API calls, and broken logic in dynamic configurations.</p>
<h3 id="heading-a-practical-workaround-let-a-web-activity-re-assign-ownership">A Practical Workaround: Let a Web Activity Re-Assign Ownership</h3>
<p>Here’s one way to get around it:<br />Use a <strong>Web activity in a Fabric Pipeline</strong> - configured with an <strong>OAuth2 connection for a specific user -</strong> to <strong>update the description</strong> of your Data Pipelines post-deployment.</p>
<p>Why this works:</p>
<ul>
<li><p>A Web activity executes in the context of the <strong>connection identity</strong></p>
</li>
<li><p>Updating the pipeline’s description (even just reapplying the same description) is enough to change the <strong>"Last Modified By"</strong> property</p>
</li>
<li><p>As a result, <strong>all notebooks executed by those pipelines will now run in the context of the user tied to the OAuth2 connection</strong>, not the Service Principal</p>
</li>
</ul>
<p>This allows you to:</p>
<ul>
<li><p>Deploy pipelines automatically with a Service Principal</p>
</li>
<li><p>Then post-process them to <strong>re-assign their execution identity to a user</strong>, for scenarios where notebook behavior matters</p>
</li>
</ul>
<p>This approach also allows you to apply filters to target only specific Data Pipelines, updating the <strong>Last Modified By</strong> property selectively. This way, you can still support notebook execution under a Service Principal where needed.</p>
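<p>The same "touch" can also be scripted outside a pipeline. The sketch below builds the REST request against what I understand to be the Items - Update Item endpoint; the identity that owns the token becomes the new "Last Modified By", so it must be a user token here, not the Service Principal's. Verify the endpoint shape against the current Fabric REST API documentation:</p>
<pre><code class="lang-python">import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_touch_request(workspace_id, pipeline_id, description, token):
    """Build the PATCH request that re-applies a Data Pipeline's description,
    which is enough to flip its 'Last Modified By' to the calling identity."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{pipeline_id}"
    return urllib.request.Request(
        url,
        data=json.dumps({"description": description}).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(build_touch_request(ws_id, pl_id, "ETL orchestration", user_token))
</code></pre>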
<h3 id="heading-pipeline-template-available-on-github">Pipeline Template: Available on GitHub</h3>
<p>You can see a visual of this post-deployment ownership adjustment pipeline below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746788273598/de8f1815-9c60-4ef0-9515-65f9ee0aada9.png" alt class="image--center mx-auto" /></p>
<p>I’ve also published the <strong>pipeline definition</strong> on my GitHub, including a short description of how to use the two parameters: <a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricExecutionContext">View on GitHub</a></p>
<blockquote>
<p><strong>Note:</strong> All activities in the definition are currently <strong>disabled by default</strong> so you can safely copy-paste it into your own Fabric Data Pipeline json definition and adjust the <strong>connection settings</strong>, <strong>pipeline selection logic</strong> etc. as needed.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Automating Feature Workspace maintenance in Microsoft Fabric]]></title><description><![CDATA[📣
Update – July 2025: This post has been updated to reflect new support for connecting and synchronizing Microsoft Fabric workspaces with Azure DevOps Repos using a Service Principal. The update includes a new section on Azure DevOps setup, covering...]]></description><link>https://peerinsights.emono.dk/automating-feature-workspace-maintainance-in-microsoft-fabric</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-feature-workspace-maintainance-in-microsoft-fabric</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[ci-cd]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 21 Apr 2025 08:53:40 GMT</pubDate><content:encoded><![CDATA[<div data-node-type="callout">
<div data-node-type="callout-emoji">📣</div>
<div data-node-type="callout-text"><strong>Update – July 2025:</strong> This post has been updated to reflect new support for connecting and synchronizing Microsoft Fabric workspaces with Azure DevOps Repos using a Service Principal. The update includes a new section on Azure DevOps setup, covering the required permissions, repository access etc. when using Azure DevOps as your Git provider.</div>
</div>

<p>At the Microsoft Fabric Community Conference in Las Vegas in April 2025, Microsoft announced the public preview of the <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/admin/fabric-command-line-interface"><strong>Fabric CLI</strong></a> - a powerful, developer-first command line interface that brings a file-system-inspired way to explore and manage your Fabric environment. As someone who's been deep in the weeds with the Fabric REST APIs for quite some time (and have blogged about it before), I was excited to see how the Fabric CLI was building on the APIs to make automation more intuitive and accessible than ever.</p>
<p>In this blog post, I’ll walk you through how to use the Fabric CLI <em>from within Python</em> to support a best-practice approach for <strong>auto-generating and auto-configuring feature development workspaces</strong> in Microsoft Fabric.</p>
<p>In my example, I’ll first focus on <strong>GitHub Actions</strong> along with a <strong>service principal</strong> for authentication.</p>
<p>When this post was originally published, <strong>Service Principal authentication was only supported when using GitHub as the Git provider</strong> in Microsoft Fabric.</p>
<p>However, Fabric is evolving rapidly and with that evolution, we now have <strong>Service Principal support for the Git Connect operations via the Fabric REST APIs when using Azure DevOps</strong> as the Git provider as well.</p>
<p>⚠️ <strong>Important caveat:</strong> Service Principal <strong>is not supported</strong> when the Git provider is Azure DevOps <strong>and</strong> the authentication method is set to <strong>"Automatic"</strong>.</p>
<p>There are additional details and setup requirements you'll want to be aware of. I’ve included a <strong>dedicated section at the end of this post</strong> covering how to configure Azure DevOps to work with a Service Principal for secure, automated workspace synchronization.</p>
<p>This post ties closely to my session at FabCon 2025 in Las Vegas <strong>"From Setup to CI/CD: Automating Microsoft Fabric for Scalable Data Solutions"</strong> - where I showcased an end-to-end automation approach. If you’re interested, you can find the session materials and sample code here:</p>
<ul>
<li><p>🔗 <strong>Session code &amp; presentation from FabCon 2025:</strong> <a target="_blank" href="https://github.com/gronnerup/Fabric">github.com/gronnerup/Fabric</a></p>
</li>
<li><p>🛠 <strong>This article's code repo (ongoing work):</strong> <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">github.com/gronnerup/FabricAutomation</a></p>
</li>
</ul>
<h2 id="heading-automating-feature-workspace-maintainance-using-github-actions">Automating feature workspace maintainance using GitHub Actions</h2>
<p>This section focuses on automating the setup and teardown of feature workspaces using <strong>GitHub Actions</strong>.<br />While the examples here use GitHub, <strong>much of the approach also applies when using Azure DevOps</strong>.<br />If you're working with <strong>Azure DevOps Pipelines</strong>, be sure to check out the dedicated section at the end of this post for platform-specific guidance.</p>
<h3 id="heading-prerequisites-and-requirements">Prerequisites and Requirements</h3>
<p>Before automating the creation of isolated <strong>feature development workspaces</strong> in Microsoft Fabric using the Fabric CLI and GitHub Actions, make sure you have the following in place:</p>
<h4 id="heading-1-service-principal-authentication">1. Service Principal Authentication</h4>
<p>This solution uses <strong>service principal authentication with a client secret</strong>, allowing secure, automated access to your Fabric environment. You’ll need to create an <strong>App Registration</strong> in Microsoft Entra ID and ensure the service principal is properly configured for Fabric API access.</p>
<p>In your GitHub repository, define these <strong>repository secrets</strong>:</p>
<ul>
<li><p><code>SPN_TENANT_ID</code> – The <strong>Tenant ID</strong> of your Microsoft Fabric environment.</p>
</li>
<li><p><code>SPN_CLIENT_ID</code> – The <strong>Client ID</strong> (Application ID) of your app registration.</p>
</li>
<li><p><code>SPN_CLIENT_SECRET</code> – The <strong>Client Secret</strong> of the app registration.</p>
</li>
</ul>
<p>Make sure the service principal is <strong>enabled for the Fabric REST APIs</strong> by following the official guidance:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/identity-support#service-principal-tenant-setting">Enable service principal for Fabric REST APIs</a></p>
<h4 id="heading-2-github-personal-access-token-pat">2. GitHub Personal Access Token (PAT)</h4>
<p>You’ll also need to create a <strong>GitHub Personal Access Token (PAT)</strong> to enable Fabric’s Git integration. This token is used to authenticate Fabric when connecting to your GitHub repository.</p>
<p>Follow this guide to create a PAT and connect your workspace to Git:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/git-get-started?tabs=azure-devops%2CGitHub%2Ccommit-to-git#connect-to-a-git-repo">Connect to a Git repo (Microsoft Learn)</a></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text"><strong>Important:</strong> Before setting up Git integration, review the <strong>considerations and limitations</strong> outlined here: <a target="_new" href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration?tabs=github#considerations-and-limitations">Git integration: Considerations and Limitations</a></div>
</div>

<p><strong>3. Fork the Repository</strong></p>
<p>To get started, <strong>fork the repository</strong> to your own GitHub account so you can safely configure secrets and CI/CD pipelines: <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">https://github.com/gronnerup/FabricAutomation</a></p>
<p>This provides a clean slate to experiment and build on top of the existing automation approach.</p>
<h3 id="heading-my-approach-to-continuous-integration-with-git">My Approach to Continuous Integration with Git</h3>
<p>When implementing Continuous Integration (CI) in Microsoft Fabric, it's essential to have a clear structure for both your <strong>workspaces</strong> and your <strong>Git repository</strong>. This helps ensure that your development process supports scalability, collaboration, and automation from day one.</p>
<h4 id="heading-workspace-structure-layer-separated-for-clarity-and-control">Workspace Structure: Layer-Separated for Clarity and Control</h4>
<p>The question of how to best structure workspaces in Fabric has been the subject of many discussions across blog posts, LinkedIn and Reddit threads. While there's no single “right” answer, my recommendation, based on practical experience and architectural clarity, is to follow a <strong>layer-separated workspace pattern</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745072238160/a0a0f33b-3bc2-48bf-ac00-d6ae6d4c5438.png" alt class="image--center mx-auto" /></p>
<p>This architecture separates your Fabric solution into logical layers, such as:</p>
<ul>
<li><p><strong>Store</strong>: Lakehouse etc.</p>
</li>
<li><p><strong>Ingest</strong>: Notebooks, Data Pipelines, etc.</p>
</li>
<li><p><strong>Prepare</strong>: Notebooks focused on shaping and cleansing data</p>
</li>
<li><p><strong>Serve</strong>: Semantic Models and related artifacts</p>
</li>
<li><p><strong>Orchestrate</strong>: Data Pipelines or Notebooks driving execution logic</p>
</li>
<li><p><strong>Core</strong>: Components such as Variable Libraries, Environments, and Fabric Databases used for metadata</p>
</li>
</ul>
<p>Each layer gets its <strong>own dedicated workspace</strong>, allowing for:</p>
<ul>
<li><p><strong>Transparent organization</strong> of items and responsibilities</p>
</li>
<li><p><strong>Improved access control</strong> at the workspace level</p>
</li>
<li><p><strong>Capacity separation</strong>, which is especially useful in large-scale environments</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">📘</div>
<div data-node-type="callout-text">I’ve previously written about this setup and why I believe it’s a solid foundation for modern Fabric development: 🔗 <a target="_new" href="https://peerinsights.hashnode.dev/automating-fabric-kickstart-your-fabric-data-platform-setup">Automating Fabric: Kickstart Your Fabric Data Platform Setup</a></div>
</div>

<p>This structure does introduce one consideration: <strong>isolated feature development workspaces</strong> may need to mirror more than one layer, depending on the scope of the feature being implemented. In other words, a single feature branch may touch multiple workspaces and that’s okay, as long as it’s organized.</p>
<h4 id="heading-git-repository-structure-one-repo-to-rule-them-all">Git Repository Structure: One Repo to Rule Them All</h4>
<p>To support this workspace setup effectively, I recommend keeping <strong>all your Fabric resources in a single Git repository</strong>. Within this repo, each solution layer is represented by a subfolder, and each layer-specific workspace connects to its respective folder via Fabric’s Git integration.</p>
<p>A typical structure might look like this:</p>
<pre><code class="lang-plaintext">/.azure-pipelines    # Azure DevOps pipelines
/.github             # GitHub Actions workflows
/automation          # Scripts, deployment helpers etc.
/documentation       # Solution documentation. Can be used for Azure DevOps project Wiki
/solution            # Solution folders for the different layers
  /Core
  /Ingest
  /Orchestrate
  /Prepare
  /Serve
  /Store
</code></pre>
<p>This structure offers a few key benefits:</p>
<ul>
<li><p><strong>End-to-end feature branches</strong> – You can implement a business requirement across all relevant layers (and include documentation!) in a single branch.</p>
</li>
<li><p><strong>CI/CD alignment</strong> – Makes it easier to automate build/test/deploy processes using GitHub Actions or Azure Pipelines.</p>
</li>
<li><p><strong>Organizational clarity</strong> – Developers always know where to find and contribute to specific parts of the solution.</p>
</li>
</ul>
<p>With this setup, isolated <strong>feature development workspaces</strong> are created dynamically and point to the relevant subfolders. This aligns perfectly with the approach demonstrated in this blog post, and it’s designed to scale with the complexity of your data platform.</p>
<h3 id="heading-automating-the-feature-development-process">Automating the Feature Development Process</h3>
<p>As highlighted in <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/cicd/manage-deployment#development-process">Microsoft’s official documentation on deployment and development processes</a>, it’s considered <strong>best practice to isolate development work</strong> outside of your main collaboration branch. This ensures cleaner version control, better collaboration, and minimizes disruption to ongoing work.</p>
<p>Following Git standards, development should happen in <strong>feature branches</strong>, each representing a specific unit of work. This allows for focused development, easier reviews, and safer integration into the mainline once complete.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745073211197/e45d0e70-fd9c-4baa-bcbc-03a16269ba76.png" alt class="image--center mx-auto" /></p>
<p>When working in Microsoft Fabric, isolated development also means creating <strong>separate workspaces</strong> to support and validate your changes. There are two primary ways to do this:</p>
<ol>
<li><p><strong>Manual setup via the Fabric UI</strong></p>
</li>
<li><p><strong>Programmatic setup via the Fabric REST APIs or the Fabric CLI</strong></p>
</li>
</ol>
<p>But why stop at manual or semi-automated processes?</p>
<hr />
<h3 id="heading-taking-it-to-the-next-level-automating-workspace-creation">Taking It to the Next Level: Automating Workspace Creation</h3>
<p>By leveraging <strong>GitHub Actions</strong> or <strong>Azure DevOps pipelines</strong>, we can automate the entire process of setting up and later tearing down <strong>feature development workspaces</strong>. This not only saves time but ensures consistency across environments.</p>
<p>In my approach, I use a <strong>recipe file</strong> that defines exactly how feature workspaces should be configured. This file, <code>feature.json</code>, lives in the repository at: <code>automation/resources/environments/</code></p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"feature_name"</span> : <span class="hljs-string">"*{feature_name}-{layer_name}"</span>,
    <span class="hljs-attr">"capacity_name"</span>: <span class="hljs-string">"MyCapacity"</span>,
    <span class="hljs-attr">"git_settings"</span>: {
        <span class="hljs-attr">"gitProviderDetails"</span>: {
            <span class="hljs-attr">"gitProviderType"</span>: <span class="hljs-string">"GitHub"</span>,
            <span class="hljs-attr">"ownerName"</span>: <span class="hljs-string">"MyGitHubProfile"</span>,
            <span class="hljs-attr">"repositoryName"</span>: <span class="hljs-string">"MyGitHubRepo"</span>
        },
        <span class="hljs-attr">"myGitCredentials"</span>: {
            <span class="hljs-attr">"source"</span>: <span class="hljs-string">"ConfiguredConnection"</span>,
            <span class="hljs-attr">"connectionId"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>
        }
    },
    <span class="hljs-attr">"permissions"</span>: {
        <span class="hljs-attr">"admin"</span>: [
            {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"Group"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>}
        ],
        <span class="hljs-attr">"contributor"</span>: [
            {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"User"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>}
        ]
    },
    <span class="hljs-attr">"layers"</span>: {
        <span class="hljs-attr">"Prepare"</span>: {
            <span class="hljs-attr">"spark_settings"</span>: {
                <span class="hljs-attr">"pool"</span>: {
                    <span class="hljs-attr">"starterPool"</span>: {
                        <span class="hljs-attr">"maxExecutors"</span>: <span class="hljs-number">1</span>,
                        <span class="hljs-attr">"maxNodeCount"</span>: <span class="hljs-number">1</span>
                    }
                }
            },
            <span class="hljs-attr">"git_directoryName"</span>: <span class="hljs-string">"solution/prepare"</span>
        },
        <span class="hljs-attr">"Ingest"</span>: { <span class="hljs-attr">"git_directoryName"</span>: <span class="hljs-string">"solution/prepare"</span> },
        <span class="hljs-attr">"Orchestrate"</span>: { <span class="hljs-attr">"git_directoryName"</span>: <span class="hljs-string">"solution/orchestrate"</span> }
    }
}
</code></pre>
<p>The key elements include:</p>
<ul>
<li><p>A <strong>naming convention</strong> for feature workspaces (prefixed with an asterisk for easy visibility)</p>
</li>
<li><p>The <strong>target capacity</strong> for deployment</p>
</li>
<li><p><strong>Git integration settings</strong> and authentication</p>
</li>
<li><p><strong>Permissions configuration</strong> for users and/or groups</p>
</li>
<li><p>Layer-specific settings such as <strong>Spark pool resource limits</strong>, which can be particularly useful - for example, configuring a single-node Spark pool reduces vCore consumption and minimizes the risk of hitting concurrency limits.</p>
</li>
</ul>
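<p>To illustrate how such a recipe drives workspace naming, here is a small sketch of the expansion step. This is my own simplified illustration - the real logic lives in the repository's maintenance script:</p>
<pre><code class="lang-python">def expand_workspace_names(recipe, feature_name):
    """Expand the recipe's naming pattern into one workspace name per layer."""
    pattern = recipe["feature_name"]  # e.g. "*{feature_name}-{layer_name}"
    return {
        layer: pattern.format(feature_name=feature_name, layer_name=layer)
        for layer in recipe["layers"]
    }

recipe = {
    "feature_name": "*{feature_name}-{layer_name}",
    "layers": {"Prepare": {}, "Ingest": {}, "Orchestrate": {}},
}
names = expand_workspace_names(recipe, "customer-churn")
# names["Prepare"] is "*customer-churn-Prepare"
</code></pre>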
<div data-node-type="callout">
<div data-node-type="callout-emoji">📘</div>
<div data-node-type="callout-text">My good friend <a target="_new" href="https://justb.dk/blog/">Just Blindbæk</a> has written a great series on optimizing Spark for collaboration and scaling - definitely worth a read!</div>
</div>

<h3 id="heading-github-workflows-creation-and-cleanup">GitHub Workflows: Creation and Cleanup</h3>
<p>Inside the <code>.github/workflows</code> folder of the repository, you’ll find two workflows:</p>
<ol>
<li><p><strong>Create Fabric feature workspaces on feature branch creation</strong><br /> Triggered when a new feature branch is created.</p>
</li>
<li><p><strong>Cleanup Fabric feature workspaces on merge to main</strong><br /> Triggered when the feature is merged into <code>main</code>.</p>
</li>
</ol>
<p>Both workflows call the Python script <code>fabric_feature_maintainance.py</code> (found in <code>automation/scripts</code>), which handles the actual creation or deletion logic. Under the hood, the script uses the <strong>Fabric CLI</strong>, calling commands via a utility function defined in: <code>automation/scripts/modules/fabric_cli_functions.py</code></p>
<p>CLI commands are executed using a simple <code>run_command()</code> function:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_command</span>(<span class="hljs-params">command: str</span>) -&gt; str:</span>
    <span class="hljs-keyword">try</span>:
        result = subprocess.run(
            [<span class="hljs-string">"fab"</span>, <span class="hljs-string">"-c"</span>, command],
            capture_output=<span class="hljs-literal">True</span>,
            text=<span class="hljs-literal">True</span>,
            check=EXIT_ON_ERROR
        )
        <span class="hljs-keyword">return</span> result.stdout.strip()
    ...
</code></pre>
<p>And for functionality <strong>not yet covered by Fabric CLI commands</strong>, I use the powerful <code>fab api</code> command to interact directly with the Fabric REST API - for example, when <strong>connecting and synchronizing Git</strong> repositories.</p>
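<p>As a sketch of that pattern, the snippet below composes a <code>fab api</code> command for the Git - Connect endpoint and hands it to the <code>run_command()</code> helper shown above. The <code>-X</code>/<code>-i</code> flags and payload shape are my reading of the CLI - confirm them with <code>fab api --help</code> for your version:</p>
<pre><code class="lang-python">import json

def build_git_connect_command(workspace_id, git_provider_details):
    """Compose a 'fab api' command string for the Git - Connect endpoint.
    The result is passed to run_command(), which prepends the 'fab' binary."""
    payload = json.dumps({"gitProviderDetails": git_provider_details})
    return (
        f"api -X post workspaces/{workspace_id}/git/connect "
        f"-i {payload!r}"
    )

# run_command(build_git_connect_command(ws_id,
#     recipe["git_settings"]["gitProviderDetails"]))
</code></pre>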
<h3 id="heading-quickstart-walkthrough">Quickstart Walkthrough</h3>
<p>Curious how this works in practice? Here’s a simple walkthrough to get you up and running with automated feature workspace creation in Microsoft Fabric.</p>
<p><strong>1. Fork the Repository</strong></p>
<p>Head over to:<br />👉 <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">https://github.com/gronnerup/FabricAutomation</a><br />Fork it to your own GitHub account.</p>
<p><strong>2. Set Up Your Secrets and Service Principal</strong></p>
<p>Make sure you’ve followed the prerequisites:</p>
<ul>
<li><p>Create a <strong>service principal</strong> and assign necessary API permissions</p>
</li>
<li><p>Configure your repository secrets:</p>
<ul>
<li><p><code>SPN_TENANT_ID</code></p>
</li>
<li><p><code>SPN_CLIENT_ID</code></p>
</li>
<li><p><code>SPN_CLIENT_SECRET</code></p>
</li>
</ul>
</li>
<li><p>Set up <strong>Git integration</strong> with a <strong>GitHub Personal Access Token (PAT)</strong> and create a new cloud connection in Fabric to generate the required connection ID. Choose <strong>GitHub - Source control</strong> as the connection type.</p>
</li>
</ul>
<p><strong>3. Customize the</strong> <code>feature.json</code> <strong>recipe file</strong></p>
<p>Edit the file <code>automation/resources/environments/feature.json</code><br />Define how your feature workspaces should be created:</p>
<ul>
<li><p>Workspace naming pattern</p>
</li>
<li><p>Fabric capacity</p>
</li>
<li><p>Git repo connection settings</p>
</li>
<li><p>Layers to include and optional Spark pool settings</p>
</li>
</ul>
<p><strong>4. Create a Feature Branch</strong></p>
<p>Create a new branch in your GitHub repository using the naming convention <strong><code>feature/***</code></strong>.</p>
<p>This will automatically trigger the <strong>GitHub Action</strong> responsible for creating your feature workspaces.</p>
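<p>Inside the workflow, the branch name itself carries the feature name. Here is a small sketch of that parsing step (my own illustration of what the maintenance script needs to do with the ref GitHub exposes, e.g. <code>GITHUB_REF</code>):</p>
<pre><code class="lang-python">def feature_name_from_ref(ref):
    """Extract the feature name from a Git ref such as
    'refs/heads/feature/customer-churn'. Returns None for refs that are
    not feature branches, so the workflow can skip workspace creation."""
    prefix = "refs/heads/feature/"
    if ref.startswith(prefix):
        return ref[len(prefix):]
    return None

# feature_name_from_ref("refs/heads/feature/customer-churn")  # "customer-churn"
# feature_name_from_ref("refs/heads/main")                    # None
</code></pre>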
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745048786500/9a081808-3d0a-438e-be82-bae016c118e3.png" alt class="image--center mx-auto" /></p>
<p><strong>5. Watch the Workspaces Come to Life</strong></p>
<p>Within seconds, your configured feature workspaces will appear in Microsoft Fabric - connected to Git and synchronized, with permissions and Spark settings applied.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745049043670/c96eb8b8-3f0f-43cd-b1a6-df16686525d4.png" alt class="image--center mx-auto" /></p>
<p><strong>6. Merge and Clean Up Automatically</strong></p>
<p>When the feature is complete and you merge your branch into <code>main</code>, a separate GitHub Action will trigger and <strong>clean up</strong> the feature workspaces - keeping your Fabric environment tidy and focused.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745049075337/310110ab-8b38-4116-889c-f7bab9d84f0b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-using-azure-devops-pipelines">Using Azure DevOps Pipelines</h2>
<p>If you haven’t already, make sure to read the section on automating feature workspace maintenance using GitHub Actions. It dives deeper into the overall approach, recommended repository structure, and reusable configuration files. This section focuses only on Azure DevOps-specific details - including authentication, limitations, and pipeline setup when using Azure DevOps as your Git provider.</p>
<h3 id="heading-prerequisites-and-requirements-1">Prerequisites and Requirements</h3>
<p>Similar to GitHub Actions, we need to make sure a few things are in place before we can automate the creation of isolated <strong>feature development workspaces</strong>. The prerequisites are:</p>
<h4 id="heading-1-setup-azure-devops-repo">1. Setup Azure DevOps Repo</h4>
<p>Following this guide, <a target="_blank" href="https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops">https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository</a>, import the GitHub repository <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">https://github.com/gronnerup/FabricAutomation</a> into your own Azure DevOps repo.</p>
<h4 id="heading-2-variable-group-for-holding-service-principal-credentials">2. Variable Group for holding Service Principal credentials</h4>
<p>Create a new Variable Group under Pipelines → Library named <strong>Fabric_Automation</strong> and add the following variables:</p>
<ul>
<li><p><code>SPN_TENANT_ID</code> – The <strong>Tenant ID</strong> of your Microsoft Fabric environment.</p>
</li>
<li><p><code>SPN_CLIENT_ID</code> – The <strong>Client ID</strong> (Application ID) of your app registration.</p>
</li>
<li><p><code>SPN_CLIENT_SECRET</code> – The <strong>Client Secret</strong> of the app registration.</p>
</li>
</ul>
<p>Make sure the service principal is <strong>enabled for the Fabric REST APIs</strong> by following the official guidance:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/identity-support#service-principal-tenant-setting">Enable service principal for Fabric REST APIs</a></p>
<h4 id="heading-3-create-azure-devops-pipelines-for-feature-workspace-creation-and-feature-teardown">3. Create Azure DevOps Pipelines for feature workspace creation and feature teardown</h4>
<p>Create two new Azure DevOps pipelines using the YAML definitions located in the <code>.azure-pipelines</code> folder:</p>
<ul>
<li><p><strong>Create Feature Workspaces</strong> pointing to <code>.azure-pipelines/feature_fabric_branch.yml</code></p>
</li>
<li><p><strong>Cleanup Feature workspaces</strong> pointing to <code>.azure-pipelines/feature_fabric_cleanup.yml</code></p>
</li>
</ul>
<h4 id="heading-4-create-azure-devops-azure-devops-source-control-connections">4. Create Azure DevOps Azure DevOps source control connections</h4>
<p>Create a new <strong>connection</strong> to Azure DevOps in Fabric. This connection can be established using either a <strong>user principal</strong> or a <strong>Service Principal</strong>. Whichever option you choose, ensure that the identity has the necessary <strong>access to the Azure DevOps repository</strong>.<br />And don’t forget to <strong>explicitly add the Service Principal as a user of the connection</strong> to authorize its use in Git operations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753553744712/92b4655d-b131-4aaf-8473-77642d46165b.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-5-customize-the-featurejson-recipe-file">5. Customize the <code>feature.json</code> recipe file</h4>
<p>Edit the file <code>automation/resources/environments/feature.json</code> as described in the section covering the GitHub setup.<br />Note that <code>gitProviderType</code> must be set to <strong>AzureDevOps</strong>.</p>
<h4 id="heading-6-create-a-new-feature-branch-and-watch-feature-workspaces-come-to-life">6. Create a new feature branch and watch feature workspaces come to life</h4>
<p>Creating a new branch named <code>feature/***</code> will trigger the <strong>Create Feature Workspaces</strong> pipeline, which automatically creates the required workspaces based on the recipe file, connects them to the Azure DevOps repository, and performs a synchronization.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753560516926/20cfde7c-5add-444a-ab39-a63848acff32.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-7-merge-and-clean-up-automatically">7. Merge and Clean Up Automatically</h4>
<p>When the feature is complete and you merge your branch into <code>main</code>, the <strong>Cleanup Feature Workspaces</strong> pipeline is triggered - keeping your Fabric environment tidy and focused.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753636847595/30d6c5f9-1785-4758-899a-40c8769a8b9f.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note that unlike GitHub Actions, Azure DevOps uses a different syntax for defining triggers in YAML. For example, Azure DevOps has no native support for <code>on: create</code> and <code>types: [closed]</code>. However, we check whether the corresponding feature workspaces already exist before creating them, and the cleanup pipeline uses a condition to ensure the logic only runs on individual commits, not on merge completions.</div>
</div>
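<p>The existence check can be done against the documented workspaces endpoint. A minimal sketch of the matching logic (my own illustration, not the repository's actual code):</p>
<pre><code class="lang-python">def find_workspace(workspaces, display_name):
    """Return the first workspace dict whose displayName matches, else None."""
    return next((ws for ws in workspaces if ws.get("displayName") == display_name), None)

# With a valid token, the list itself comes from the Fabric REST API:
# resp = requests.get("https://api.fabric.microsoft.com/v1/workspaces",
#                     headers={"Authorization": f"Bearer {token}"})
# workspaces = resp.json()["value"]
</code></pre>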

<h2 id="heading-tip-dynamically-defining-source-control-connections">Tip: Dynamically defining source control connections</h2>
<p>In the <code>feature.json</code> recipe file, you can now define the source control connection <strong>dynamically</strong> using the <code>connectionName</code> field with string interpolation. This provides a flexible alternative to using a fixed <code>connectionId</code> and allows you to tailor the connection to the identity of the user triggering the pipeline.</p>
<p>Instead of this:</p>
<pre><code class="lang-json">"myGitCredentials": {
  "source": "ConfiguredConnection",
  "connectionId": "12345678-abcd-efgh-ijkl-9876543210"
}
</code></pre>
<p>You can now do this:</p>
<pre><code class="lang-json">"myGitCredentials": {
  "source": "ConfiguredConnection",
  "connectionName": "PeerInsights_AzureDevOps_{identity_username}"
}
</code></pre>
<p>The placeholders <code>{identity_username}</code> and <code>{identity_id}</code> are automatically resolved at runtime:</p>
<ul>
<li><p><code>{identity_username}</code></p>
<ul>
<li><p>In <strong>Azure DevOps</strong>, this maps to the predefined variable <code>Build.RequestedForEmail</code> (converted to <strong>uppercase</strong>).</p>
</li>
<li><p>In <strong>GitHub</strong>, it uses the <code>GITHUB_ACTOR</code> environment variable (must match the casing exactly).</p>
</li>
</ul>
</li>
<li><p><code>{identity_id}</code></p>
<ul>
<li><p>In <strong>Azure DevOps</strong>, this is <code>Build.RequestedForId</code>.</p>
</li>
<li><p>In <strong>GitHub</strong>, it uses <code>GITHUB_ACTOR_ID</code>.</p>
</li>
</ul>
</li>
</ul>
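<p>Conceptually, the resolution can be sketched like this (my own illustration of the behavior described above; the function name is hypothetical):</p>
<pre><code class="lang-python">import os

def resolve_connection_name(template):
    """Resolve {identity_username} and {identity_id} from pipeline environment variables."""
    if os.environ.get("BUILD_REQUESTEDFOREMAIL"):   # Azure DevOps predefined variable
        username = os.environ["BUILD_REQUESTEDFOREMAIL"].upper()
        identity_id = os.environ.get("BUILD_REQUESTEDFORID", "")
    else:                                           # GitHub Actions environment
        username = os.environ.get("GITHUB_ACTOR", "")
        identity_id = os.environ.get("GITHUB_ACTOR_ID", "")
    return template.replace("{identity_username}", username).replace("{identity_id}", identity_id)
</code></pre>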
<p>This enables even more granular connection setups, for example:</p>
<ul>
<li><p><code>FabricSourceControl_GRONNERUP</code> (based on username)</p>
</li>
<li><p><code>FabricSourceControl_3e8609e9-9292-4e1e-9f2d-3f533ed6d7f8</code> (based on user ID)</p>
</li>
</ul>
<blockquote>
<p><strong>Note:</strong> In Azure DevOps, the username used in the connection must be in <strong>uppercase</strong>. In GitHub, the casing must exactly match how the username is stored in the platform. That’s just my design…</p>
</blockquote>
<p>Also, remember that the <strong>Service Principal</strong> used by the pipeline in Azure DevOps or GitHub must be added as a <strong>user of the connection</strong> to access it during automation.</p>
<p>This dynamic approach makes your automation workflows more flexible and scalable, especially in environments with multiple contributors.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Automating the creation of feature workspaces in Microsoft Fabric is a key step toward a scalable, repeatable, and developer-friendly data platform. By combining the power of the Fabric CLI, GitHub Actions, Azure DevOps Pipelines and a simple recipe-based configuration, we can streamline the entire development process - from branch creation to workspace provisioning and eventual cleanup.</p>
<p>This is just the beginning.</p>
<p>I’ll continue to enhance the <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">FabricAutomation</a> repository to reflect my latest work, including:</p>
<ul>
<li><p><strong>Automated solution setup</strong> for new projects and environments</p>
</li>
<li><p><strong>Solution automation using a metadata-driven framework</strong></p>
</li>
<li><p><strong>CI/CD pipelines</strong> using <strong>Fabric CLI</strong> and the <strong>fabric-cicd</strong> Python library</p>
</li>
<li><p><strong>Branching and merging strategies</strong> for structured, enterprise-grade development</p>
</li>
<li><p><strong>Enhanced support</strong> for user specific recipe files and much more…</p>
</li>
</ul>
<p>Stay tuned - and feel free to star the repo or follow along if you're as excited about Fabric automation as I am. 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Automating Fabric:  Maintaining workspace icon images]]></title><description><![CDATA[When working with data platform solutions in Microsoft Fabric, a well-structured approach is crucial for maintaining scalability and organization. One best practice is to separate workspaces not only into different environments (such as development, ...]]></description><link>https://peerinsights.emono.dk/automating-fabric-maintaining-workspace-icon-images</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-fabric-maintaining-workspace-icon-images</guid><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 10 Feb 2025 18:57:55 GMT</pubDate><content:encoded><![CDATA[<p>When working with data platform solutions in Microsoft Fabric, a well-structured approach is crucial for maintaining scalability and organization. One best practice is to separate workspaces not only into different environments (such as development, test, and production) but also into distinct layers—data storage, data ingestion, transformation, semantic modeling, and reporting. This separation improves governance, security, and clarity in large-scale deployments.</p>
<p>However, managing multiple workspaces can quickly become overwhelming. Identifying and distinguishing them at a glance is not always easy. Fortunately, Microsoft Fabric allows us to assign <strong>Workspace Images</strong>, which provide a simple yet effective way to visually categorize different workspaces based on their purpose and environment.</p>
<p>Uploading these images manually is feasible, but when dealing with a large number of workspaces, automation becomes the obvious solution. In this blog post, I will walk you through how to automate the process of uploading workspace images using a <strong>Fabric Notebook</strong>, making it easy to manage and update workspace visuals at scale.</p>
<p><strong>Disclaimer:</strong> <em>This solution uses a non-documented and unofficial Microsoft endpoint for fetching and updating workspace metadata in Microsoft Fabric/Power BI. Since this is not an officially supported API, it may change without notice, which could impact the functionality of this approach. Use it with that in mind, and feel free to experiment!</em></p>
<h2 id="heading-fabric-notebook-to-automating-workspace-image-uploads">Fabric Notebook to automate Workspace Image uploads</h2>
<p>To demonstrate how to <strong>maintain workspace icon images programmatically</strong>, I’ve created a simple <strong>Fabric Notebook</strong>. This notebook provides methods for:</p>
<ul>
<li><p>Identifying workspaces based on a filter definition.</p>
</li>
<li><p>Fetching workspace metadata, including existing icons.</p>
</li>
<li><p>Setting new workspace icons in bulk.</p>
</li>
</ul>
<p>For this demonstration, the notebook utilizes <strong>icons from Marc Lelijveld’s blog post</strong> on <a target="_blank" href="https://data-marc.com/2023/07/10/designing-architectural-diagrams-with-the-latest-microsoft-fabric-icons/">Designing Architectural Diagrams with the Latest Microsoft Fabric Icons</a>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">ℹ</div>
<div data-node-type="callout-text">The notebook must be executed by a user with <strong>workspace admin</strong> permissions to update the icon for a given workspace.</div>
</div>

<h3 id="heading-requirements"><strong>Requirements</strong></h3>
<p>The notebook requires a few Python libraries:</p>
<ul>
<li><p><code>cairosvg</code> – Converts base64 SVGs to PNG images.</p>
</li>
<li><p><code>Pillow</code> – Supports adding an environment letter on top of the icons (not used in the example but available for experimentation).</p>
</li>
</ul>
<p>To fetch all accessible workspaces, the notebook uses <strong>SemanticLink</strong> and the <code>FabricRestClient</code> class:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.fabric.fabricrestclient?view=semantic-link-python&amp;viewFallbackFrom=semantic-link-python%3Fwt.mc_id%3Dmvp_335074">SemanticLink FabricRestClient Documentation</a>.</p>
<h3 id="heading-filtering-workspaces"><strong>Filtering Workspaces</strong></h3>
<p>The notebook filters workspaces using two parameters:</p>
<pre><code class="lang-python">must_contain = <span class="hljs-string">"PeerInsights"</span>
either_contain = [<span class="hljs-string">"dev"</span>, <span class="hljs-string">"tst"</span>, <span class="hljs-string">"prd"</span>]
</code></pre>
<p>A custom Python function <code>filter_items</code> then filters the list of workspaces:</p>
<pre><code class="lang-python">workspaces = filter_items(all_workspaces, must_contain, either_contain)
</code></pre>
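<p>The post does not show the body of <code>filter_items</code>, but based on the description it can be sketched as follows (my own interpretation; the notebook's actual implementation may differ):</p>
<pre><code class="lang-python">def filter_items(items, must_contain, either_contain):
    """Keep items whose name contains must_contain and at least one either_contain token."""
    def name_of(item):
        return item.get("displayName", "") if isinstance(item, dict) else str(item)
    return [
        item for item in items
        if must_contain.lower() in name_of(item).lower()
        and any(token.lower() in name_of(item).lower() for token in either_contain)
    ]

names = ["PeerInsights_Store_dev", "PeerInsights_Store_prd", "OtherSolution_dev"]
print(filter_items(names, "PeerInsights", ["dev", "tst", "prd"]))
# ['PeerInsights_Store_dev', 'PeerInsights_Store_prd']
</code></pre>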
<h3 id="heading-defining-workspace-icons"><strong>Defining Workspace Icons</strong></h3>
<p>A JSON structure is used to define workspace icons and color overlays:</p>
<pre><code class="lang-python">workspace_icon_def = {
    <span class="hljs-string">"icons"</span>: {
        <span class="hljs-string">"prepare"</span>: <span class="hljs-string">"Notebook"</span>,
        <span class="hljs-string">"ingest"</span>: <span class="hljs-string">"Pipelines"</span>,
        <span class="hljs-string">"store"</span>: <span class="hljs-string">"Lakehouse"</span>,
        <span class="hljs-string">"serve"</span>: <span class="hljs-string">"Dataset"</span>
    },
    <span class="hljs-string">"color_overlays"</span>: {
        <span class="hljs-string">"dev"</span>: <span class="hljs-string">"#1E90FF"</span>,   <span class="hljs-comment"># Blue</span>
        <span class="hljs-string">"tst"</span>: <span class="hljs-string">"#FFA500"</span>,   <span class="hljs-comment"># Orange</span>
        <span class="hljs-string">"prd"</span>: <span class="hljs-string">"#008000"</span>    <span class="hljs-comment"># Green    </span>
    }
}
</code></pre>
<p><em>Note: To remove an existing icon, set the icon title to</em> <code>None</code>.</p>
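<p>Given this definition, the icon and color overlay for each workspace can be derived from substrings of its name. A sketch of that lookup (my own illustration, not the notebook's actual code):</p>
<pre><code class="lang-python">def resolve_icon(workspace_name, icon_def):
    """Pick icon title and overlay color based on substrings of the workspace name."""
    name = workspace_name.lower()
    icon = next((title for key, title in icon_def["icons"].items() if key in name), None)
    color = next((hex_code for key, hex_code in icon_def["color_overlays"].items() if key in name), None)
    return icon, color

# Same structure as the workspace_icon_def shown above:
workspace_icon_def = {
    "icons": {"prepare": "Notebook", "ingest": "Pipelines", "store": "Lakehouse", "serve": "Dataset"},
    "color_overlays": {"dev": "#1E90FF", "tst": "#FFA500", "prd": "#008000"},
}
print(resolve_icon("PeerInsights_Ingest_dev", workspace_icon_def))  # ('Pipelines', '#1E90FF')
</code></pre>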
<h3 id="heading-updating-workspace-icons"><strong>Updating Workspace Icons</strong></h3>
<ol>
<li><p>In <strong>Cell 7</strong> of the notebook, a new property <code>icon_base64img</code> is added to each workspace, storing the base64-encoded PNG string of the new icon.</p>
</li>
<li><p>The function <code>display_workspace_icons</code> generates an <strong>HTML table</strong> showing the old and new workspace icons for verification.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739212726045/bbf6ce30-c203-4cb5-9b47-4ec269fcc100.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Finally, we iterate through the filtered workspaces and update their icons using the <code>set_workspace_icon</code> function.</p>
</li>
</ol>
<p><strong>The result….</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739213366304/f0a59ef7-3263-4dde-9c6a-cacb157bb3a6.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-using-the-non-documented-metadata-endpoint"><strong>Using the non-documented metadata endpoint</strong></h2>
<p>Workspace icons are updated by calling the <strong>non-documented Microsoft endpoint</strong>:</p>
<pre><code class="lang-plaintext">{cluster_base_url}metadata/folders/{workspace_id}
</code></pre>
<p>The <code>cluster_base_url</code> can be retrieved using the <strong>Power BI REST API</strong>:</p>
<pre><code class="lang-plaintext">https://api.powerbi.com/v1.0/myorg/capacities
</code></pre>
<p>Example base URL:</p>
<pre><code class="lang-plaintext">https://wabi-north-europe-j-primary-redirect.analysis.windows.net/v1.0/...
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text">The undocumented endpoints can only be accessed using a <strong>user identity</strong>. Setting a workspace icon is not supported when using a <strong>Service Principal</strong>.</div>
</div>

<h3 id="heading-making-api-calls"><strong>Making API Calls</strong></h3>
<p><strong>Fetching workspace metadata:</strong></p>
<pre><code class="lang-python">GET {cluster_base_url}metadata/folders/{workspace_id}
</code></pre>
<p><strong>Updating the workspace icon:</strong></p>
<pre><code class="lang-python">PUT {cluster_base_url}metadata/folders/{workspace_id}
</code></pre>
<p>With the following payload:</p>
<pre><code class="lang-json">{ <span class="hljs-attr">"icon"</span>: <span class="hljs-string">"data:image/png;base64,{base64_png}"</span> }
</code></pre>
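<p>Put together, the update boils down to a single PUT request. A sketch of how the url and payload are assembled (my own illustration; remember this endpoint is undocumented and may change):</p>
<pre><code class="lang-python">def build_icon_update(cluster_base_url, workspace_id, base64_png):
    """Build the PUT url and payload for the undocumented folders metadata endpoint."""
    url = f"{cluster_base_url}metadata/folders/{workspace_id}"
    payload = {"icon": f"data:image/png;base64,{base64_png}"}
    return url, payload

# Executed with a user token (service principals are not supported here):
# url, payload = build_icon_update(cluster_base_url, workspace_id, png_as_base64)
# requests.put(url, headers={"Authorization": f"Bearer {user_token}"}, json=payload)
</code></pre>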
<div data-node-type="callout">
<div data-node-type="callout-emoji">ℹ</div>
<div data-node-type="callout-text">According to Microsoft's documentation, a workspace image must be in .png or .jpg format and the file size must be less than 45 KB.</div>
</div>

<h2 id="heading-conclusion">Conclusion</h2>
<p>By automating the upload of workspace images in Microsoft Fabric, we can enhance the visual organization of workspaces, making it easier to distinguish between different layers and environments. Instead of manually updating images across multiple workspaces, a Fabric Notebook provides an efficient and scalable solution.</p>
<p>You're free to use my example as-is or as inspiration for your own automated setup. I’d love to hear how others are tackling this challenge - what solutions have you come up with, and what are your use cases? Feel free to comment, ask questions, or share suggestions!</p>
<p>If you're interested in trying this out, you can download the Fabric Notebook <a target="_blank" href="https://github.com/gronnerup/Fabric/blob/f1b54a4588fd52a8cf278c6394c3d8423352b3ff/AutomatingFabric/Notebooks/AutomatingFabric-WorkspaceIcons.ipynb">here</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Automating Fabric: Dynamically Configuring Microsoft Fabric Data Pipelines]]></title><description><![CDATA[In a typical end-to-end Microsoft Fabric data platform, we use workspace structures and stages - like store, ingest, prepare, serve and orchestrate - to organize the data lifecycle. If you're unfamiliar with these stages, I’ve detailed them in my pre...]]></description><link>https://peerinsights.emono.dk/automating-fabric-dynamically-configuring-microsoft-fabric-data-pipelines</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-fabric-dynamically-configuring-microsoft-fabric-data-pipelines</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[microsoft fabric]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Wed, 29 Jan 2025 18:42:00 GMT</pubDate><content:encoded><![CDATA[<p>In a typical end-to-end Microsoft Fabric data platform, we use workspace structures and stages - like <strong>store</strong>, <strong>ingest</strong>, <strong>prepare, serve</strong> and <strong>orchestrate</strong> - to organize the data lifecycle. If you're unfamiliar with these stages, I’ve detailed them in my previous post: <a target="_blank" href="https://peerinsights.hashnode.dev/automating-fabric-kickstart-your-fabric-data-platform-setup">Automating Fabric: Kickstart Your Fabric Data Platform Setup</a>.</p>
<p>This post will focus on the ingest and orchestrate stage and how to ensure valid and robust references between Data Pipelines.</p>
<h3 id="heading-the-challenge-of-automating-data-pipelines-in-fabric">The challenge of automating Data Pipelines in Fabric</h3>
<p>Automation is key to enabling an efficient CI/CD flow, but Microsoft Fabric, as a relatively new platform, doesn’t always provide the ideal tools for seamless automation. A prime example is how Fabric Data Pipelines manage dependencies and references - whether invoking other pipelines, running notebooks, refreshing semantic models, or connecting to resources like Lakehouses or SQL databases.</p>
<h4 id="heading-a-common-scenario">A Common Scenario</h4>
<p>Consider this scenario:</p>
<ul>
<li><p>You create a <strong>controller pipeline</strong> that orchestrates data ingestion by invoking child pipelines.</p>
</li>
<li><p>The controller pipeline then triggers notebooks to transform data from <strong>bronze</strong> to <strong>gold</strong> in a medallion architecture.</p>
</li>
<li><p>Finally, it refreshes a semantic model to support business intelligence workloads.</p>
</li>
</ul>
<p>This solution evolves through feature branches in a <strong>development</strong> environment, moves to <strong>test</strong> for user acceptance testing, and is eventually deployed to <strong>production</strong>.</p>
<p>A key challenge here is ensuring that references to resources - like workspaces and pipelines - are dynamically updated as part of the deployment process, without adding complexity for data engineers or compromising CI/CD workflows.</p>
<p>Many of you who have already worked with Data Pipelines in Fabric in combination with Git and deployments may have found yourselves frustrated by the way pipelines reference other pipelines and how easily that can lead to errors.</p>
<p>In this post, I’ll show how to dynamically configure the <strong>Invoke Pipeline</strong> activity in Fabric Data Factory to support automated CI/CD deployments.</p>
<hr />
<h3 id="heading-two-approaches-to-invoking-data-pipelines">Two Approaches to Invoking Data Pipelines</h3>
<p>Fabric offers two main ways to invoke one data pipeline from another:</p>
<h4 id="heading-1-legacy-invoke-data-pipeline">1. <strong>Legacy Invoke Data Pipeline</strong></h4>
<p>This is the older, now deprecated, approach. It:</p>
<ul>
<li><p>Allows pipeline execution <strong>only within the same workspace</strong>.</p>
</li>
<li><p>Does <strong>not</strong> support dynamic expressions for workspace or pipeline references.</p>
</li>
</ul>
<h4 id="heading-2-invoke-pipeline-preview">2. <strong>Invoke Pipeline (Preview)</strong></h4>
<p>This newer, more versatile activity (currently in preview) allows:</p>
<ul>
<li><p>Executing pipelines across <strong>different workspaces</strong>.</p>
</li>
<li><p>Using <strong>dynamic expressions</strong> for workspace and pipeline references.</p>
</li>
<li><p>Invoking Azure Data Factory Pipelines and Synapse Pipelines.</p>
</li>
</ul>
<p>However, it depends on a new type of connection that leverages user principal identity for authentication.</p>
<hr />
<h3 id="heading-a-dynamic-solution-for-cicd">A Dynamic Solution for CI/CD</h3>
<p>To enable automated CI/CD deployment while maintaining dynamic references, we use the <strong>Invoke Pipeline (Preview)</strong> activity with dynamic settings for workspace and pipeline IDs. Here’s how:</p>
<h4 id="heading-step-1-extract-pipeline-metadata">Step 1: Extract Pipeline Metadata</h4>
<p>First, we need a <strong>Web activity</strong> to retrieve metadata about all pipelines in the current workspace.</p>
<ul>
<li><p>This activity calls the Fabric REST API using a <strong>service principal</strong>.</p>
</li>
<li><p>Configure a Web connection with a base URL: <code>https://api.fabric.microsoft.com/v1</code>.</p>
</li>
<li><p>Use the following <strong>dynamic expression</strong> for the relative URL:</p>
<pre><code class="lang-typescript">  <span class="hljs-meta">@concat</span>(<span class="hljs-string">'workspaces/'</span>, pipeline().DataFactory, <span class="hljs-string">'/items?type=DataPipeline'</span>)
</code></pre>
</li>
</ul>
<p>This fetches details about data pipelines, including their display names and IDs. We use the Core endpoint <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/items/list-items?tabs=HTTP">List Items</a> which returns a list of items from a specified workspace.</p>
<h4 id="heading-step-2-dynamically-set-workspace-and-pipeline-references">Step 2: Dynamically Set Workspace and Pipeline References</h4>
<p>Next, use the <strong>Invoke Pipeline (Preview)</strong> activity with dynamic content for the workspace and pipeline settings:</p>
<ol>
<li><p><strong>Workspace Reference</strong><br /> Set the workspace dynamically using:</p>
<pre><code class="lang-typescript"> <span class="hljs-meta">@pipeline</span>().DataFactory
</code></pre>
<p> This ensures the activity always points to the workspace of the executing pipeline.</p>
</li>
<li><p><strong>Pipeline Reference</strong><br /> Use the following expression to dynamically retrieve the pipeline ID based on its display name:</p>
<pre><code class="lang-typescript"> <span class="hljs-meta">@string</span>(
     xpath(
         xml(
             json(concat(<span class="hljs-string">'{"root":'</span>, activity(<span class="hljs-string">'GetPipelines'</span>).output, <span class="hljs-string">'}'</span>))
         ),
         <span class="hljs-string">'string(/root/value[normalize-space(displayName)="MyChildPipeline"]/id)'</span>
     )
 )
</code></pre>
<p> This searches for the ID of the pipeline <strong>MyChildPipeline</strong> within the current workspace.</p>
</li>
</ol>
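<p>For readers who want to verify the xpath expression, the same lookup expressed in Python against a List Items response looks roughly like this (illustrative only; <code>.strip()</code> approximates xpath's <code>normalize-space</code>):</p>
<pre><code class="lang-python">def pipeline_id_by_name(list_items_response, display_name):
    """Find the id of the item whose displayName matches, mirroring the xpath lookup."""
    for item in list_items_response.get("value", []):
        if item.get("displayName", "").strip() == display_name:
            return item["id"]
    return ""

response = {"value": [{"id": "abc-123", "displayName": "MyChildPipeline", "type": "DataPipeline"}]}
print(pipeline_id_by_name(response, "MyChildPipeline"))  # abc-123
</code></pre>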
<hr />
<h3 id="heading-why-this-works-for-cicd">Why This Works for CI/CD</h3>
<p>By configuring workspace and pipeline references dynamically:</p>
<ul>
<li><p>You eliminate hardcoding, ensuring pipelines adapt to the target environment (development, test, or production).</p>
</li>
<li><p>References are automatically updated during deployment, reducing manual effort and risk of errors.</p>
</li>
<li><p>The solution remains flexible and scalable for feature branches and multi-stage workflows.</p>
</li>
</ul>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>Dynamic configuration of Fabric Data Pipelines is essential for a robust and automated CI/CD process. By leveraging the <strong>Invoke Pipeline (Preview)</strong> activity and integrating with Fabric REST APIs, you can achieve seamless deployments across environments while maintaining clarity and simplicity in your pipeline design.</p>
<p>I hope this guide helps you on your journey to automate Microsoft Fabric solutions. Let me know your thoughts or questions in the comments below!</p>
]]></content:encoded></item><item><title><![CDATA[Automating Microsoft Fabric: 
Private Endpoint Setup in workspaces]]></title><description><![CDATA[In an exciting development, Microsoft Fabric just announced support for APIs dedicated to managing private endpoints, a crucial feature for organizations prioritizing secure and private data access. Building on my previous posts on automating Fabric ...]]></description><link>https://peerinsights.emono.dk/automating-microsoft-fabric-private-endpoint-setup-in-workspaces</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-microsoft-fabric-private-endpoint-setup-in-workspaces</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[APIs]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Wed, 30 Oct 2024 16:23:14 GMT</pubDate><content:encoded><![CDATA[<p>In an exciting development, Microsoft Fabric just announced support for APIs dedicated to managing private endpoints, a crucial feature for organizations prioritizing secure and private data access. Building on my previous posts on automating Fabric workspaces and lakehouses and leveraging Fabric REST APIs, I’ll guide you through automating the creation of managed private endpoints within your Fabric workspaces. In this post, I’ll cover not only how to set up these private connections but also how to streamline approvals via Azure management APIs, if permitted in your environment.</p>
<p>Find the official blog post from Microsoft on APIs for Managed Private Endpoints here: <a target="_blank" href="https://blog.fabric.microsoft.com/en-US/blog/apis-for-managed-private-endpoint-are-now-available/">https://blog.fabric.microsoft.com/en-US/blog/apis-for-managed-private-endpoint-are-now-available/</a></p>
<h3 id="heading-previous-approach-to-automating-managed-private-endpoint-creation">Previous Approach to Automating Managed Private Endpoint Creation</h3>
<p>Before official API support for managed private endpoints was available in Microsoft Fabric, our approach relied on using Fabric's internal, undocumented APIs. To automate endpoint creation within a workspace, I would send a POST request to:</p>
<pre><code class="lang-plaintext">https://wabi-north-europe-j-primary-redirect.analysis.windows.net/metadata/workspaces/00000000-0000-0000-0000-000000000000/privateEndpoints
</code></pre>
<p>And with the following JSON payload:</p>
<pre><code class="lang-json">{
   <span class="hljs-attr">"name"</span>:<span class="hljs-string">"my-private-endpoint"</span>,
   <span class="hljs-attr">"requestMessage"</span>:<span class="hljs-string">"Auto-generated managed private endpoint"</span>,
   <span class="hljs-attr">"privateLinkResourceId"</span>:<span class="hljs-string">"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-peerinsights-dev/providers/Microsoft.KeyVault/vaults/kv-peerinsights-dev"</span>,
   <span class="hljs-attr">"groupId"</span>:<span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>
}
</code></pre>
<p>While effective, this approach was less than ideal - it depended on an unsupported API and allowed only user identity for authentication, not service principals or managed identities.</p>
<p>With the recent additions to the Fabric APIs, creating managed private endpoints can now be achieved through officially supported, documented endpoints. Even better, service principal authentication is now supported, offering a more secure and scalable way to automate private endpoint management.</p>
<h3 id="heading-adding-managed-private-endpoints-with-fabric-apis">Adding Managed Private Endpoints with Fabric APIs</h3>
<p>Building upon my previous blog post on automating your Fabric environment setup, I’ve enhanced the helper functions notebook to support the creation and management of managed private endpoints, including handling the long-running nature of the setup process.</p>
<p>In the <code>fabric_functions.py</code> script, I added a few key functions to streamline this process. Two of the most critical functions are:</p>
<ul>
<li><p><code>create_workspace_managed_private_endpoint</code>: This function automates the creation of a managed private endpoint within a Microsoft Fabric workspace, monitoring its provisioning status until fully completed.</p>
</li>
<li><p><code>approve_private_endpoint</code>: This function automates the approval of a private endpoint connection within Azure, updating its status to "Approved" through an API request.</p>
</li>
</ul>
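<p>At its core, <code>create_workspace_managed_private_endpoint</code> issues a POST against the newly documented endpoint. A sketch of the request it builds (my reading of the API; verify the exact field names against the official reference):</p>
<pre><code class="lang-python">def build_mpe_request(workspace_id, name, resource_id,
                      subresource="vault", message="Auto-generated managed private endpoint"):
    """Build the POST url and body for creating a managed private endpoint."""
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/managedPrivateEndpoints"
    body = {
        "name": name,
        "targetPrivateLinkResourceId": resource_id,
        "targetSubresourceType": subresource,   # e.g. "vault" for a Key Vault target
        "requestMessage": message,
    }
    return url, body
</code></pre>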
<p>To integrate this functionality, I extended the staging recipe used in the workspace setup to include private endpoints that should be created and, if desired, automatically approved. Here’s an example of the updated <code>fabric_stages</code> configuration:</p>
<pre><code class="lang-python">fabric_stages = {
    <span class="hljs-string">"Prepare"</span>: {
        <span class="hljs-string">"private_endpoints"</span>: [
            {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"mpe-kv-peerinsights-dev"</span>,
                <span class="hljs-string">"auto_approve"</span>: <span class="hljs-literal">True</span>,
                <span class="hljs-string">"id"</span>: <span class="hljs-string">"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-peerinsights-dev/providers/Microsoft.KeyVault/vaults/kv-peerinsights-dev"</span>
            }
        ]
    }
}
</code></pre>
<p>With this new functionality, private endpoints can be easily integrated into the Fabric setup process. And by using the <code>auto_approve</code> property in the private endpoint definition, we can direct our setup to automatically approve the newly created endpoint. Here’s how it works:</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> stage_props.get(<span class="hljs-string">"private_endpoints"</span>) <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
    <span class="hljs-keyword">for</span> private_endpoint <span class="hljs-keyword">in</span> stage_props.get(<span class="hljs-string">"private_endpoints"</span>):
        fabfunc.create_workspace_managed_private_endpoint(
            fabric_access_token, workspace_id, private_endpoint.get(<span class="hljs-string">"name"</span>), private_endpoint.get(<span class="hljs-string">"id"</span>)
        )
        <span class="hljs-keyword">if</span> private_endpoint.get(<span class="hljs-string">"auto_approve"</span>):
            connection_name = <span class="hljs-string">f"<span class="hljs-subst">{workspace_id}</span>.<span class="hljs-subst">{private_endpoint.get(<span class="hljs-string">'name'</span>)}</span>-conn"</span>
            management_access_token = fabfunc.get_access_token(tenant_id, app_id, app_secret, <span class="hljs-string">'https://management.core.windows.net'</span>)
            fabfunc.approve_private_endpoint(
                management_access_token, private_endpoint.get(<span class="hljs-string">"id"</span>), connection_name
            )
</code></pre>
<p>And the result…</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730305751863/3c544063-71a2-4c52-b8cc-c7c7769b50a9.png" alt class="image--center mx-auto" /></p>
<p>With this approach, managed private endpoints can now be included as an integrated part of the Fabric setup, ensuring a smooth and automated deployment from start to finish.</p>
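<p>For completeness, the approval in <code>approve_private_endpoint</code> goes against the Azure management plane rather than Fabric. A sketch of the request it might build (my own illustration; the <code>api-version</code> depends on the target resource provider):</p>
<pre><code class="lang-python">def build_approval_request(resource_id, connection_name, api_version="2022-07-01"):
    """Build the PUT url and body for approving a private endpoint connection via ARM."""
    url = (f"https://management.azure.com{resource_id}"
           f"/privateEndpointConnections/{connection_name}?api-version={api_version}")
    body = {
        "properties": {
            "privateLinkServiceConnectionState": {
                "status": "Approved",
                "description": "Approved via automation",
            }
        }
    }
    return url, body
</code></pre>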
<h2 id="heading-conclusion">Conclusion</h2>
<p>Every Fabric API update brings us closer to fully automating and streamlining data platform workflows, steadily checking off my 'must-have' features list—big kudos to the Fabric team!</p>
<p>I’ll keep sharing insights on automating Microsoft Fabric, so stay tuned for more from Peer Insights! As a sneak peek, I’ll be exploring ways of working within Fabric to simplify the setup of feature development workspaces and more.</p>
<p>You can download the enhanced notebooks, now supporting managed private endpoint setup, here: <a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricSolutionInit">GitHub - FabricSolutionInit</a>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I initially forgot to include the <code>azure_functions.py</code> file in the repository, but it has now been added. You can find it alongside the other resources to support your setup.</div>
</div>]]></content:encoded></item><item><title><![CDATA[Automating Fabric: Kickstart your Fabric Data Platform setup]]></title><description><![CDATA[Setting up and managing workspaces in Microsoft Fabric can be a time-consuming task, especially when you need multiple workspaces for various stages of the data lifecycle across different environments. This blog post demonstrates how to streamline yo...]]></description><link>https://peerinsights.emono.dk/automating-fabric-kickstart-your-fabric-data-platform-setup</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-fabric-kickstart-your-fabric-data-platform-setup</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[automation]]></category><category><![CDATA[Python]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[lakehouse]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 28 Oct 2024 21:53:40 GMT</pubDate><content:encoded><![CDATA[<p>Setting up and managing workspaces in Microsoft Fabric can be a time-consuming task, especially when you need multiple workspaces for various stages of the data lifecycle across different environments. This blog post demonstrates how to streamline your Fabric setup using Python and the Fabric REST APIs, automating the creation, configuration, and, if required, the cleanup of Fabric workspaces.</p>
<h3 id="heading-my-approach-to-workspace-setup-and-configuration">My approach to workspace setup and configuration</h3>
<p>I will introduce a recipe-based setup approach, where I define essential parameters like workspace naming pattern, environment-specific settings, stages, Git configurations, and more.</p>
<p>Using Python scripts, I will demonstrate how quickly and efficiently you can perform the following tasks:</p>
<ul>
<li><p><strong>Configure environments</strong> (Development, Test, Production) using environment-specific parameters.</p>
</li>
<li><p><strong>Set up workspaces for different data lifecycle stages</strong> (Ingest, Prepare, Serve, and Consume) and for each of the configured environments.</p>
</li>
<li><p><strong>Automate workspace assignments to Fabric capacities.</strong></p>
</li>
<li><p><strong>Manage access and permissions</strong> for secure, compliant collaboration.</p>
</li>
<li><p><strong>Integrate workspaces with Git</strong> for seamless CI/CD workflows.</p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>There are a few prerequisites to this approach. These include:</p>
<ul>
<li><p><strong>Python Environment</strong>: Ensure you have Python 3.x installed with the essential libraries (<code>requests</code> and <code>azure-identity</code>), which are needed for interacting with the REST APIs and authenticating to Azure.<br />  You can install the required Python libraries with:<br />  <code>pip install requests azure-identity --user</code></p>
</li>
<li><p><strong>Fabric API Access</strong>: Access to Fabric REST APIs via service principal, configured with necessary permissions.</p>
</li>
<li><p><strong>Git Access</strong>: Access to integrate Fabric workspaces with a Git repository.</p>
</li>
<li><p><strong>Python functions file and setup sample scripts</strong>: Clone or download the Python scripts from my GitHub repo to get started. You can find a link to the repository at the bottom of this blog post.</p>
</li>
</ul>
<p><em>Tip:</em><br /><em>A Fabric administrator will need to enable API permissions and workspace creation rights for your service principal.</em></p>
<h3 id="heading-workspace-structure">Workspace structure</h3>
<p>Before jumping into action, let’s discuss how to structure your Fabric workspaces effectively. My recommendation is to separate workspaces by stages and environments, as shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730146903577/12b153d8-f421-40ba-818f-1e45f77619b9.png" alt class="image--center mx-auto" /></p>
<p>In a typical end-to-end data platform setup, we have distinct components for each stage of the data lifecycle: pipelines and notebooks for data ingestion, notebooks for data preparation, lakehouses for storage, semantic models for serving data and reports for data consumption. Separating these stages into individual workspaces, then multiplying them by environments (such as dev, tst, prd), allows you to assign security at the stage level and provides flexibility in allocating different Fabric capacities for each stage and environment.</p>
<p>For enhanced governance and security, consider further dividing the storage workspace into separate workspaces for each layer of the medallion architecture. This approach simplifies permission management and supports a more scalable, secure setup across the data platform. On the other hand, it also increases complexity and adds management overhead.</p>
<h3 id="heading-recipe-based-setup">Recipe-Based Setup</h3>
<p>My automation approach is built around variables and recipes, defining details for each environment and stage, including:</p>
<ul>
<li><p><strong>Naming</strong>: A generic pattern for defining how workspaces are named.</p>
</li>
<li><p><strong>Environments</strong>: Details for Dev, Test, and Production environments. This also includes Fabric capacity details and permissions.</p>
</li>
<li><p><strong>Stages</strong>: The purpose (Ingest, Prepare, Serve, Consume) and definition of Fabric items such as lakehouses.</p>
</li>
<li><p><strong>Git Setup Information</strong>: Definition of Git repository information and branch details for each workspace.</p>
</li>
</ul>
<h2 id="heading-script-and-setup-configuration">Script and setup configuration</h2>
<p>To streamline the setup of a Fabric data platform solution, I’ve created two Python scripts: <code>init_fabric_solution.py</code> and <code>fabric_functions.py</code>. These scripts automate the creation and configuration of workspaces, capacities, and permissions across various stages and environments using Fabric and Power BI REST APIs.</p>
<p>The <code>init_fabric_solution.py</code> script manages the main setup process, leveraging helper functions in <code>fabric_functions.py</code>. These helper functions encapsulate the necessary Fabric and Power BI REST API calls, keeping the code clean, reusable, and easy to maintain. This approach makes it simple to add or adjust functions as setup needs evolve.</p>
<p>Together, these scripts provide a fully automated, scalable method for configuring your Fabric solution with minimal manual effort.</p>
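<p>To make the division of labor concrete, a helper in <code>fabric_functions.py</code> could look roughly like the sketch below. The function and payload shown here are my own illustration of the pattern (a thin wrapper around the Create Workspace REST endpoint), not the actual repository code:</p>
<pre><code class="lang-python">import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_workspace_payload(name, capacity_id=None):
    # Pure helper: compose the request body for the Create Workspace call.
    payload = {"displayName": name}
    if capacity_id:
        payload["capacityId"] = capacity_id
    return payload

def create_workspace(access_token, name, capacity_id=None):
    # Illustrative wrapper around POST /v1/workspaces.
    response = requests.post(
        f"{FABRIC_API}/workspaces",
        headers={"Authorization": f"Bearer {access_token}"},
        json=build_workspace_payload(name, capacity_id),
    )
    response.raise_for_status()
    return response.json()
</code></pre>
<p>Keeping the payload construction separate from the HTTP call makes each wrapper easy to test and adjust as the APIs evolve.</p>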
<p>Let me walk you through the key steps in the setup process, covering the creation of Fabric workspaces, items, and the initialization of Git integration.</p>
<h3 id="heading-step-1-authenticating-with-fabric-rest-apis"><strong>Step 1: Authenticating with Fabric REST APIs</strong></h3>
<p>Workspaces and lakehouses are created using a service principal, following best practices to ensure that ownership is assigned to the service principal rather than an individual user account.</p>
<p>The Tenant ID, App ID, and App Secret for the service principal can be stored in a <code>credentials.json</code> file or directly in the <code>init_fabric_solution.py</code> script, depending on your preference.</p>
<p><em>Example of credentials.json file</em></p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"tenant_id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>,
    <span class="hljs-attr">"app_id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>,
    <span class="hljs-attr">"app_secret"</span>: <span class="hljs-string">"YourAppSecret"</span>
}
</code></pre>
<p>The wrapper function <code>get_access_token</code> is then called, passing in the service principal credentials and scope.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Load the credentials from the credentials.json file. Remove this and use hardcoded values if credentials file is not used. </span>
credentials = fabfunc.get_credentials_from_file(<span class="hljs-string">"credentials.json"</span>)

tenant_id = credentials[<span class="hljs-string">"tenant_id"</span>]
app_id = credentials[<span class="hljs-string">"app_id"</span>]
app_secret = credentials[<span class="hljs-string">"app_secret"</span>]

fabric_access_token = fabfunc.get_access_token(tenant_id, app_id, app_secret, <span class="hljs-string">'https://api.fabric.microsoft.com'</span>)
<span class="hljs-comment">#endregion</span>
</code></pre>
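<p>For reference, a wrapper like <code>get_access_token</code> can be built on the <code>azure-identity</code> client-credentials flow. The sketch below is an assumption about how such a helper might look (note the convention of turning a resource URL into a scope by appending <code>/.default</code>); the actual implementation in <code>fabric_functions.py</code> may differ:</p>
<pre><code class="lang-python">def resource_to_scope(resource):
    # Entra ID token requests expect a scope rather than a bare
    # resource URL; the usual convention is resource + "/.default".
    return resource.rstrip("/") + "/.default"

def get_access_token(tenant_id, app_id, app_secret, resource):
    # Deferred import so the pure helper above also works
    # in environments without azure-identity installed.
    from azure.identity import ClientSecretCredential
    credential = ClientSecretCredential(tenant_id, app_id, app_secret)
    return credential.get_token(resource_to_scope(resource)).token
</code></pre>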
<h3 id="heading-step-2-naming-pattern-environments-and-stages"><strong>Step 2: Naming pattern, environments and stages</strong></h3>
<p>After authentication, the setup process follows a structured naming convention defined by the <code>fabric_solution_name</code> variable. This variable uses string interpolation to generate the names of each workspace, incorporating the specified stage and environment names.</p>
<p><strong>Key Configuration Variables</strong></p>
<ol>
<li><p><code>fabric_solution_name</code>: Sets the base naming pattern for the workspaces. For example:</p>
<pre><code class="lang-python"> fabric_solution_name = <span class="hljs-string">'MyDataPlatform - {stage} [{environment}]'</span>
</code></pre>
<p> This pattern ensures consistency in naming by automatically incorporating each workspace's stage and environment into its name.</p>
</li>
<li><p><code>fabric_environments</code>: This JSON-like variable defines each environment to be created, including:</p>
<ul>
<li><p><code>capacity_id</code>: Specifies the Fabric capacity to which each workspace in the environment will be assigned.</p>
</li>
<li><p><code>permissions</code>: Lists the user or group permissions for each environment. It supports the Admin, Contributor, Member, and Viewer roles, and the Group, User, and App identity types. For example:</p>
<pre><code class="lang-json">  fabric_environments = {
      <span class="hljs-attr">"dev"</span>: {
          <span class="hljs-attr">"capacity_id"</span>: <span class="hljs-string">"79CF9D57-8F75-4879-B906-691A0D85A36B"</span>,
          <span class="hljs-attr">"permissions"</span>: {
              <span class="hljs-attr">"Admin"</span>: [
                  {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"Group"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"a9327fc3-a6a0-4b82-8087-6b0d698323d7"</span>},
                  {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"User"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"pg@kapacity.dk"</span>}
              ]
          }
      },
      <span class="hljs-attr">"tst"</span>: {
          # Additional environment configurations
      },
  }
</code></pre>
</li>
</ul>
</li>
<li><p><code>fabric_stages</code>: This variable defines the stages and resources to be created within each environment, specifying different stages of data processing. For instance:</p>
<pre><code class="lang-json"> fabric_stages = {
     <span class="hljs-attr">"Store"</span>: { <span class="hljs-attr">"lakehouses"</span>: [<span class="hljs-string">"Bronze"</span>, <span class="hljs-string">"Silver"</span>, <span class="hljs-string">"Gold"</span>] },
     <span class="hljs-attr">"Ingest"</span>: {},
     <span class="hljs-attr">"Prepare"</span>: {},
     <span class="hljs-attr">"Serve"</span>: {}
 }
</code></pre>
<p> In this configuration, lakehouses are created within the “Store” area, segmented into <strong>Bronze</strong>, <strong>Silver</strong>, and <strong>Gold</strong> layers to align with data lifecycle management.</p>
</li>
</ol>
<p>Together, these variables enable a scalable and automated setup that generates workspaces, assigns capacities, and configures permissions across environments with minimal manual intervention.</p>
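<p>To make the interplay concrete, here is a minimal, self-contained sketch of how the naming pattern expands against the environments and stages. The generation logic is my own illustration and may differ in detail from the actual script:</p>
<pre><code class="lang-python">fabric_solution_name = 'MyDataPlatform - {stage} [{environment}]'

def generate_workspace_names(pattern, environments, stages):
    # One workspace name per (stage, environment) combination.
    return [
        pattern.format(stage=stage, environment=env)
        for env in environments
        for stage in stages
    ]

names = generate_workspace_names(
    fabric_solution_name,
    environments=['dev', 'tst', 'prd'],
    stages=['Store', 'Ingest', 'Prepare', 'Serve'],
)
# Yields twelve names, e.g. 'MyDataPlatform - Store [dev]'
</code></pre>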
<h2 id="heading-automating-workspace-setup">Automating Workspace Setup</h2>
<p>With the naming pattern, environments, stages, and Git integration configured, you’re ready to execute the script to set up your Fabric workspaces and lakehouses.</p>
<p>The script will automatically output the results of the setup process, as shown below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730150932310/c81b4614-c0d8-48ff-9751-17cf0638e7a8.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-cleaning-up-automating-workspace-deletion">Cleaning Up: Automating workspace deletion</h2>
<p>In scenarios where workspaces need to be decommissioned, the Python script <code>cleanup_fabric_solution.py</code> can be used to batch-delete Fabric workspaces based on the naming pattern, environments, and stages.</p>
<p>Simply specify the naming pattern of your Fabric solution, along with its environments and stages.</p>
<pre><code class="lang-python"><span class="hljs-comment">#region Fabric solution setup</span>
fabric_solution_name = <span class="hljs-string">'MyDataPlatform - {stage} [{environment}]'</span>
fabric_environments = [<span class="hljs-string">'dev'</span>, <span class="hljs-string">'tst'</span>, <span class="hljs-string">'prd'</span>]
fabric_stages = [<span class="hljs-string">'Data'</span>, <span class="hljs-string">'Ingest'</span>, <span class="hljs-string">'Prepare'</span>, <span class="hljs-string">'Serve'</span>]
<span class="hljs-comment">#endregion</span>
</code></pre>
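<p>Conceptually, the cleanup only needs to expand the same pattern, match the resulting names against the workspaces returned by the List Workspaces API, and delete the matches. Below is a sketch of the matching step; the helper names are mine, and the workspace dictionaries mirror the <code>value</code> array of the List Workspaces response:</p>
<pre><code class="lang-python">def expand_names(pattern, environments, stages):
    # All workspace names the solution recipe would have produced.
    return {
        pattern.format(stage=stage, environment=env)
        for env in environments
        for stage in stages
    }

def workspaces_to_delete(workspaces, pattern, environments, stages):
    # 'workspaces' holds dicts with at least "id" and "displayName",
    # as returned by GET /v1/workspaces.
    targets = expand_names(pattern, environments, stages)
    return [ws for ws in workspaces if ws["displayName"] in targets]

# Each match would then be removed with a DELETE call against
# https://api.fabric.microsoft.com/v1/workspaces/{id}.
</code></pre>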
<h2 id="heading-conclusion">Conclusion</h2>
<p>This approach to automating workspace creation in Microsoft Fabric accelerates setup, ensures consistency, and simplifies the integration of workspaces with Git for CI/CD. By leveraging Fabric REST APIs and Python, you’ll be able to manage and maintain your Fabric data platform workspaces efficiently across all environments.</p>
<p>In the near future, I’ll also be looking into the Terraform Provider for Fabric, which is currently in preview, as well as many other topics related to automating Fabric. So stay tuned for more Peer Insights!</p>
<p>You can download the notebooks and credentials.json file used in this post here:<br /><a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricSolutionInit">https://github.com/gronnerup/Fabric/tree/main/FabricSolutionInit</a></p>
]]></content:encoded></item><item><title><![CDATA[Automating Microsoft Fabric: 
Extracting Identity Support data]]></title><description><![CDATA[🆕
The notebooks have been updated on the 28th of March 2025 to reflect changes in the documentation and to automate the creation of a Lakehouse and the import of the report definition file directly from the GitHub repo.


In Microsoft Fabric, REST APIs p...]]></description><link>https://peerinsights.emono.dk/automating-microsoft-fabric-extracting-identity-support-data</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-microsoft-fabric-extracting-identity-support-data</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[semantic-link]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 21 Oct 2024 13:33:26 GMT</pubDate><content:encoded><![CDATA[<div data-node-type="callout">
<div data-node-type="callout-emoji">🆕</div>
<div data-node-type="callout-text">The notebooks have been updated on the 28th of March 2025 to reflect changes in the documentation and to automate the creation of a Lakehouse and the import of the report definition file directly from the GitHub repo.</div>
</div>

<p>In Microsoft Fabric, REST APIs play a crucial role in automating and optimizing various aspects of platform management, from CI/CD processes to maintaining a data lakehouse. They enable seamless interactions with Fabric items, making it easier to streamline data workflows and handle large-scale operations with minimal manual intervention. Understanding which identities - such as service principals or managed identities - are supported by different Fabric REST API endpoints is essential to ensure secure and efficient platform management.</p>
<p>And wouldn't it be great if we didn't have to visit each individual API documentation page to check which Microsoft Entra identities are supported? Constantly navigating through multiple pages to find this information can be time-consuming and inefficient. Fortunately, there's a way to automate this process, allowing us to extract and centralize the data with ease - saving both time and effort.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743148726090/33296948-2f80-4e73-af58-8df5e2603c29.png" alt class="image--center mx-auto" /></p>
<p>In this blog post, I'll walk through how to scrape Microsoft Fabric REST API documentation using a Fabric Notebook to extract information on supported identities for each endpoint. Once the data is extracted, we can leverage Semantic Link Labs to build a semantic model that exposes data from the Fabric Lakehouse.</p>
<p>And finally, we can create a report using Semantic Link, offering insights into how these identities are supported across various Fabric APIs.</p>
<p>The task of accomplishing the above splits into three steps:</p>
<ul>
<li><p>Extracting information from the Fabric REST API documentation</p>
</li>
<li><p>Creating a semantic model using Semantic Link Labs</p>
</li>
<li><p>Creating a Power BI report using Semantic Link Labs</p>
</li>
</ul>
<h3 id="heading-extracting-fabric-rest-api-identity-support-information">Extracting Fabric REST API identity support information</h3>
<p>To automate the extraction of identity support information from the Microsoft Fabric REST API documentation, I used BeautifulSoup (from the <code>bs4</code> library) to scrape the necessary data directly from the Microsoft Learn site. Here's a brief overview of how the process works:</p>
<ol>
<li><p><strong>Setup Fabric Items</strong>: Start by creating a new Workspace and assigning it to a Fabric capacity. Next, import the two sample notebooks. You can find a link to the notebooks in the Conclusion section of this blog post.</p>
</li>
<li><p><strong>Fetching the API Documentation</strong>: The code starts by making an HTTP request using <code>requests.get()</code> to fetch the table of contents (TOC) from Microsoft Learn, which is structured in JSON format. The TOC contains links to each API's documentation page.</p>
</li>
<li><p><strong>Parsing the HTML</strong>: For each API page, BeautifulSoup parses the HTML content, looking for a specific section that lists the supported Microsoft Entra identities (e.g., User, Service Principal, and Managed Identities).</p>
</li>
<li><p><strong>Extracting the Identity Data</strong>: Once the correct section is found, the code extracts the table containing identity types. The table rows are iterated over to capture the identity information for each API endpoint, storing the results in a structured format (<code>data_list</code>).</p>
</li>
<li><p><strong>Handling Nested Documentation</strong>: My function <code>extract_all_articles()</code> recursively navigates through nested API documentation sections, ensuring that all relevant pages are checked, even when organized in hierarchical structures.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests

<span class="hljs-keyword">from</span> pyspark.sql.types <span class="hljs-keyword">import</span> StructType, StructField, StringType
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> *

baseurl = <span class="hljs-string">"https://learn.microsoft.com/en-us/rest/api/fabric/"</span>

<span class="hljs-comment">### Extract Fabric API documentation</span>
response = requests.get(baseurl+<span class="hljs-string">"toc.json"</span>)
data = response.json()

<span class="hljs-comment"># Call the extract_all_articles function and store the return value as data_list</span>
data_list = extract_all_articles(data)
</code></pre>
<p>This approach allows us to programmatically gather the identity support data, eliminating the need to manually check each API page.</p>
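<p>For orientation, the recursive part of <code>extract_all_articles()</code> can be sketched as below. I am assuming the Learn TOC nests entries under an <code>items</code> key with <code>toc_title</code> and <code>href</code> fields; the notebook's full version additionally fetches each page and scrapes its identity table with BeautifulSoup:</p>
<pre><code class="lang-python">def extract_all_articles(node):
    # Walk the TOC JSON recursively, collecting every article link.
    articles = []
    if isinstance(node, dict):
        if node.get("href"):
            articles.append({"title": node.get("toc_title"), "href": node["href"]})
        for child in node.get("items", []):
            articles.extend(extract_all_articles(child))
    elif isinstance(node, list):
        for child in node:
            articles.extend(extract_all_articles(child))
    return articles
</code></pre>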
<p>Once collected, the data can be processed further or integrated into a Fabric Lakehouse for analysis. In our case, we convert <code>data_list</code> to a Spark DataFrame and write it to a Delta table in our lakehouse. We also create a manual table holding each identity option; this table will be used for grouping and filtering APIs in the Power BI report we will create later.</p>
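<p>That persistence step can be sketched as follows. The normalization helper and table names are illustrative, not the notebook's actual code; the Spark calls in the comments follow the standard Fabric notebook pattern:</p>
<pre><code class="lang-python">IDENTITIES = ("User", "Service Principal", "Managed Identity")

def normalize_records(data_list, identities=IDENTITIES):
    # Pure step: turn the scraped records into uniform rows with one
    # boolean column per identity type.
    rows = []
    for record in data_list:
        row = {"api": record.get("api", ""), "url": record.get("url", "")}
        for identity in identities:
            row[identity] = bool(record.get(identity))
        rows.append(row)
    return rows

# In a Fabric notebook, the rows are then written to Delta tables:
#   df = spark.createDataFrame(normalize_records(data_list))
#   df.write.mode("overwrite").format("delta").saveAsTable("api_identity_support")
#   spark.createDataFrame([(i,) for i in IDENTITIES], ["Identity"]) \
#        .write.mode("overwrite").format("delta").saveAsTable("identities")
</code></pre>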
<h3 id="heading-build-semantic-model-using-semantic-link-labs">Build semantic model using Semantic Link Labs</h3>
<p>After extracting the necessary data from the Microsoft Fabric REST API documentation, the next step is to leverage <a target="_blank" href="https://github.com/microsoft/semantic-link-labs"><strong>Semantic Link Labs</strong></a> to create a semantic model. Semantic Link Labs is <strong>a Python library designed for use in Microsoft Fabric notebooks</strong>. It extends the capabilities of <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview">Semantic Link</a>, offering additional functionality that integrates seamlessly with it. Semantic Link Labs simplifies building semantic models, reports, and more directly from our Fabric notebooks.</p>
<p>To use Semantic Link Labs we first need to install the Semantic Link Labs package within our Fabric Notebook environment. This can be done by running:</p>
<pre><code class="lang-python">%pip install semantic-link-labs
</code></pre>
<p>Once Semantic Link Labs is installed, we can generate a blank semantic model as a foundation to which we will add our extracted data.</p>
<p>This blank model serves as a starting point, where we’ll later introduce the tables and data derived from your scraping process, along with defining specific measures and hierarchies needed for reporting.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> sempy_labs <span class="hljs-keyword">as</span> labs
<span class="hljs-keyword">from</span> sempy_labs.tom <span class="hljs-keyword">import</span> connect_semantic_model
<span class="hljs-keyword">from</span> sempy_labs <span class="hljs-keyword">import</span> report

lakehouse_name = <span class="hljs-string">"FabricDocs"</span>
lakehouse = mssparkutils.lakehouse.get(lakehouse_name)
workspace_name = notebookutils.runtime.context.get(<span class="hljs-string">"currentWorkspaceName"</span>)

<span class="hljs-comment"># Create a new blank semantic model</span>
semantic_model_name = <span class="hljs-string">f"<span class="hljs-subst">{lakehouse_name}</span>_Model"</span>
labs.create_blank_semantic_model(semantic_model_name)
</code></pre>
<p>After creating the blank model, we will connect to it (using <code>connect_semantic_model</code>) and add objects like tables, expressions, hierarchies, etc.</p>
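<p>As a rough illustration, adding objects through the TOM wrapper follows the shape below. This is a sketch built on Semantic Link Labs' <code>connect_semantic_model</code> context manager; the table and measure names are placeholders, and the exact method signatures may differ between library versions:</p>
<pre><code class="lang-python">def add_objects_to_model(semantic_model_name, workspace_name):
    # Deferred import: sempy-labs is only available inside Fabric notebooks.
    from sempy_labs.tom import connect_semantic_model

    # readonly=False opens the model for modification; pending changes
    # are saved when the context manager exits.
    with connect_semantic_model(
        dataset=semantic_model_name, workspace=workspace_name, readonly=False
    ) as tom:
        tom.add_table(name="Identities")
        tom.add_measure(
            table_name="Identities",
            measure_name="API Count",
            expression="COUNTROWS('APIs')",
        )
</code></pre>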
<h3 id="heading-create-a-new-report-using-semantic-link-labs">Create a new report using Semantic Link Labs</h3>
<p>Finally, after setting up the semantic model, we will create a report that exposes the extracted data from our Direct Lake semantic model. This is also achieved using Semantic Link Labs, which enables us to seamlessly generate reports based on the data stored in the model.</p>
<p>The following code is used to create the report:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Read the file as a DataFrame where each row represents a line in the file</span>
df = spark.read.text(<span class="hljs-string">"Files/report.json"</span>)

<span class="hljs-comment"># Convert the DataFrame rows (lines) into a single string</span>
json_raw = <span class="hljs-string">''</span>.join(df.rdd.map(<span class="hljs-keyword">lambda</span> row: row[<span class="hljs-number">0</span>]).collect())
jobject = json.loads(json_raw)

<span class="hljs-comment"># Create a new report based on the report.json file located in our Lakehouse</span>
labs.report.create_report_from_reportjson(
    report=<span class="hljs-string">"Fabric REST API Docs"</span>, 
    dataset=semantic_model_name, 
    report_json=jobject, 
    workspace=workspace_name
    )
</code></pre>
<p>This code reads a JSON file, which contains the report structure, and uses it to create a new report that is tied to the semantic model you previously built. This allows you to easily visualize and analyze the identity data extracted from the Microsoft Fabric REST API documentation, directly within your Fabric Lakehouse environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743162420346/e0aeb8d1-7d0a-46e7-a55d-a03e7767c2b5.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>The Microsoft Fabric APIs are essential for automating key components of your Fabric setup, providing a strong foundation for CI/CD, governance, and scaling your data platform. By extracting and centralizing identity support information from the API documentation, you can streamline processes and ensure that your platform is built with both efficiency and security in mind.</p>
<p>In the near future, I’ll be publishing more articles on how to leverage the Fabric REST APIs to jumpstart your Fabric Lakehouse Data Platform, manage CI/CD pipelines, and much more. So stay tuned for more insights!</p>
<p>You can download the notebook etc. used in this post here: <a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricRestApiDocs">https://github.com/gronnerup/Fabric/tree/main/FabricRestApiDocs</a>.</p>
]]></content:encoded></item></channel></rss>