<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Peer Insights - A Microsoft Fabric Blog]]></title><description><![CDATA[Welcome to Peer Insights! This blog focuses on Microsoft Fabric, mainly from a data engineering perspective. Find insights to help navigate and leverage the pla]]></description><link>https://peerinsights.emono.dk</link><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 07:08:34 GMT</lastBuildDate><atom:link href="https://peerinsights.emono.dk/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Designing for Automation in Microsoft Fabric]]></title><description><![CDATA[In the fast-evolving world of enterprise data platforms, automation is not a luxury - it's a necessity. When working with Microsoft Fabric, especially in scalable solutions that span across multiple environments and logical layers (like ingestion, tr...]]></description><link>https://peerinsights.emono.dk/designing-for-automation-in-microsoft-fabric</link><guid isPermaLink="true">https://peerinsights.emono.dk/designing-for-automation-in-microsoft-fabric</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[cicd]]></category><category><![CDATA[microsoft fabric]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Sun, 20 Jul 2025 20:14:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753041756957/2695bfc0-d16f-4abe-b1e5-8aa057818350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the fast-evolving world of enterprise data platforms, automation is not a luxury - it's a necessity. 
When working with Microsoft Fabric, especially in scalable solutions that span across multiple environments and logical layers (like ingestion, transformation, and serving), it’s essential to embed automation into your design from day one.</p>
<p>A crucial aspect often overlooked is how Microsoft Fabric uniquely identifies artifacts (items) across workspaces and environments. Understanding the difference between <strong>item IDs</strong> and <strong>logical IDs</strong>, and how they’re used, is foundational to building robust, automated solutions that can scale and deploy seamlessly across dev, test, and production environments.</p>
<h2 id="heading-what-are-item-ids">What Are Item IDs?</h2>
<p>In Microsoft Fabric, an <strong>item ID</strong> is a globally unique identifier (GUID) that represents a specific item in a workspace - whether it's a notebook, lakehouse, pipeline, or other Fabric artifact. This ID is visible in the URL when navigating to an item, like in this example:</p>
<pre><code class="lang-plaintext">https://app.fabric.microsoft.com/groups/9bcbb7d4-13f7-4bc2-a261-22be96a809dc/pipelines/82e7a8f1-e593-414d-8fab-c9b34a267772
</code></pre>
<p>Here, <code>82e7a8f1-e593-414d-8fab-c9b34a267772</code> is the item ID for the pipeline, and <code>9bcbb7d4-13f7-4bc2-a261-22be96a809dc</code> is the workspace ID.</p>
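<p>To illustrate, both IDs can be pulled straight out of such a URL. A quick sketch - the helper name and the exact URL pattern assumption are mine, matching the example above:</p>

```python
import re

def parse_fabric_url(url: str) -> dict:
    """Extract the workspace ID and item ID from a Fabric item URL.

    Illustrative helper only - the pattern below matches the layout
    shown above: /groups/<workspaceId>/<itemType>/<itemId>.
    """
    match = re.search(
        r"/groups/(?P<workspace_id>[0-9a-f-]{36})/\w+/(?P<item_id>[0-9a-f-]{36})",
        url,
    )
    if not match:
        raise ValueError(f"Unrecognized Fabric URL: {url}")
    return match.groupdict()

ids = parse_fabric_url(
    "https://app.fabric.microsoft.com/groups/"
    "9bcbb7d4-13f7-4bc2-a261-22be96a809dc/pipelines/"
    "82e7a8f1-e593-414d-8fab-c9b34a267772"
)
print(ids["workspace_id"])  # 9bcbb7d4-13f7-4bc2-a261-22be96a809dc
print(ids["item_id"])       # 82e7a8f1-e593-414d-8fab-c9b34a267772
```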
<p>These IDs are also used in Fabric REST APIs, such as the <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/items/list-items">List Items</a> and <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/items/get-item">Get Item</a> operations. They're fundamental to how Fabric tracks and manages content internally.</p>
<h2 id="heading-what-are-logical-ids">What Are Logical IDs?</h2>
<p><strong>Logical IDs</strong> are different: they're Git-related and exist only for items in source-controlled workspaces.</p>
<p>A <strong>logical ID</strong> is a unique identifier that links a Fabric item in the workspace to its corresponding file and configuration in a Git branch. Think of it as the “anchor” between what lives in Fabric and what’s committed to source control. This makes logical IDs vital in Git-integrated workflows, especially when names or paths change across branches or environments.</p>
<p>You can find the logical ID in the <code>.platform</code> system file that Fabric automatically generates inside the item’s Git directory:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"$schema"</span>: <span class="hljs-string">"https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json"</span>,
  <span class="hljs-attr">"metadata"</span>: {
    <span class="hljs-attr">"type"</span>: <span class="hljs-string">"DataPipeline"</span>,
    <span class="hljs-attr">"displayName"</span>: <span class="hljs-string">"Controller - Full"</span>
  },
  <span class="hljs-attr">"config"</span>: {
    <span class="hljs-attr">"version"</span>: <span class="hljs-string">"2.0"</span>,
    <span class="hljs-attr">"logicalId"</span>: <span class="hljs-string">"bffcdc62-7e33-83b0-4dc9-0f7957777e88"</span>
  }
}
</code></pre>
<p>More details: <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/source-code-format?tabs=v2#automatically-generated-system-files">Source Code Format – Microsoft Learn</a></p>
<p>For items in <strong>non-source-controlled</strong> workspaces, the <code>logicalId</code> will be a blank GUID:</p>
<blockquote>
<p><code>00000000-0000-0000-0000-000000000000</code></p>
</blockquote>
<h2 id="heading-why-does-this-matter">Why Does This Matter?</h2>
<p>When promoting artifacts across environments, from development to test to production, <strong>you don’t want your production controller pipelines referencing development notebooks or ingestion pipelines.</strong> This is where logical IDs shine - they enable Fabric to resolve internal references based on the Git-tracked logical structure, not hardcoded workspace or item IDs.</p>
<h2 id="heading-handling-feature-branches-and-multi-layered-workspaces">Handling Feature Branches and Multi-Layered Workspaces</h2>
<p>I'm a strong advocate for using <strong>feature-isolated development workspaces</strong> and separating your Fabric solution into layers such as Storage, Ingest, and Prepare, deployed across <strong>at least three environments</strong>: dev, test (PPE), and prod. This architecture is standard in mature enterprise solutions.</p>
<p>One common question I hear:</p>
<blockquote>
<p>“Which workspace should I refer to when invoking other pipelines or notebooks from my controller pipeline?”</p>
</blockquote>
<h3 id="heading-within-the-same-workspace">Within the Same Workspace</h3>
<p>Let’s say you’re referencing another item (pipeline or notebook) <strong>within the same workspace</strong>. In that case, <strong>always</strong> reference the version from your <strong>feature workspace</strong>, not the main dev workspace.</p>
<p>Here’s what happens:</p>
<p>When editing the pipeline in the Fabric UI, you’ll see this reference:</p>
<pre><code class="lang-json"><span class="hljs-string">"typeProperties"</span>: {
  <span class="hljs-attr">"notebookId"</span>: <span class="hljs-string">"0e23e4cb-caf5-41bd-8161-ad34f69679ce"</span>,
  <span class="hljs-attr">"workspaceId"</span>: <span class="hljs-string">"9bcbb7d4-13f7-4bc2-a261-22be96a809dc"</span>
}
</code></pre>
<p>But once committed to Git, Fabric rewrites it as:</p>
<pre><code class="lang-json"><span class="hljs-string">"typeProperties"</span>: {
  <span class="hljs-attr">"notebookId"</span>: <span class="hljs-string">"f69679ce-ad34-8161-41bd-caf50e23e4cb"</span>,
  <span class="hljs-attr">"workspaceId"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>
}
</code></pre>
<p>What’s going on? Fabric automatically replaces the item ID with the <strong>logical ID</strong>, and sets the workspace ID to a blank GUID to indicate an <strong>intra-workspace reference</strong>. It’s elegant and powerful because it allows seamless deployment without having to manually update references between environments.</p>
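<p>That convention is easy to test for in your own tooling. A minimal sketch - my own helper, not a Fabric API - that flags a committed reference as intra-workspace when the workspace ID has been blanked:</p>

```python
BLANK_GUID = "00000000-0000-0000-0000-000000000000"

def is_intra_workspace_reference(type_properties: dict) -> bool:
    """True when a committed reference points inside the same workspace,
    i.e. Fabric has swapped in a logical ID and blanked the workspace ID."""
    return type_properties.get("workspaceId") == BLANK_GUID

# The committed form of the example above:
committed = {
    "notebookId": "f69679ce-ad34-8161-41bd-caf50e23e4cb",
    "workspaceId": "00000000-0000-0000-0000-000000000000",
}
print(is_intra_workspace_reference(committed))  # True
```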
<h2 id="heading-referencing-items-across-workspaces">Referencing Items Across Workspaces</h2>
<p>Now, when you need to reference items <strong>across</strong> workspaces (e.g., from a controller pipeline in the orchestration layer to an ingestion pipeline in another workspace), Fabric does <strong>not</strong> resolve logical IDs automatically. You need to provide actual workspace and item IDs - and ideally <strong>dynamically</strong>.</p>
<p>To automate this, check out one of my previous blog posts:<br />👉 <a target="_blank" href="https://peerinsights.hashnode.dev/automating-fabric-dynamically-configuring-microsoft-fabric-data-pipelines">Automating Fabric: Dynamically Configuring Microsoft Fabric Data Pipelines</a></p>
<p>You can also use SemPy functions in your Notebooks like:</p>
<ul>
<li><p><code>resolve_item_id</code></p>
</li>
<li><p><code>resolve_item_name</code></p>
</li>
<li><p><code>resolve_workspace_name</code></p>
</li>
<li><p><code>resolve_workspace_name_and_id</code></p>
</li>
</ul>
<p>These help dynamically fetch the correct IDs based on the environment context.</p>
<h2 id="heading-automate-with-confidence-the-fabric-cicd-python-library">Automate with Confidence: The fabric-cicd Python Library</h2>
<p>For deploying Git-connected Fabric workspaces, I highly recommend using the <code>fabric-cicd</code> Python library. Purpose-built for this exact use case, it has quickly become the preferred deployment tool among many Fabric professionals.</p>
<p>The library enables code-first CI/CD automation by allowing you to deploy workspaces directly from a Git repository structure. It takes care of critical deployment tasks, such as replacing logical IDs with the actual item IDs of the newly deployed artifacts.</p>
<p>A standout feature is its support for environment-specific configurations using a <code>parameters.yml</code> file. This file lets you define and programmatically override values depending on the target environment. That includes, but isn’t limited to, workspace IDs, item IDs, connection strings, connection IDs, and more.</p>
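<p>To give a feel for the idea, a find-and-replace parameter file could look roughly like this - the keys and exact schema here are assumptions on my part, so check the fabric-cicd documentation for the authoritative format; the GUIDs are placeholders:</p>

```yaml
find_replace:
  # Dev workspace ID as it appears in the committed item definitions
  - find_value: "9bcbb7d4-13f7-4bc2-a261-22be96a809dc"
    replace_value:
      PPE: "11111111-1111-1111-1111-111111111111"   # placeholder test workspace ID
      PROD: "22222222-2222-2222-2222-222222222222"  # placeholder prod workspace ID
```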
<p>This makes the fabric-cicd library especially powerful for multi-environment deployments where automation, consistency, and traceability are key.</p>
<p>Don't miss the great introductory blog post by Jacob Knightley:<br />👉 <a target="_blank" href="https://blog.fabric.microsoft.com/en-US/blog/introducing-fabric-cicd-deployment-tool/">Introducing the Fabric CICD Deployment Tool</a></p>
<h2 id="heading-a-note-on-variable-libraries">A Note on Variable Libraries</h2>
<p>Variable Libraries are a great addition to Fabric and will simplify many deployment scenarios. However, <strong>they do not handle dynamic references to other Fabric items</strong> like notebooks or pipelines. For that, logical IDs (and tools like <code>fabric-cicd</code>) are still essential.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>I hope this post has helped clarify the role of logical IDs in Microsoft Fabric and why they’re vital when working with Git-connected workspaces. Designing your solution with <strong>automation and environment promotion in mind</strong> is key to building scalable, robust, and enterprise-ready data platforms.</p>
<p>Keep an eye out for my upcoming (and very small) post on how to use <code>fabric-cicd</code> to deploy <strong>multiple interconnected workspaces</strong> (Ingest, Prepare, etc.) while preserving cross-layer references.</p>
<p>Until then - automate everything, and automate it smartly 😉</p>
]]></content:encoded></item><item><title><![CDATA[Fabric CLI Beyond Shell Commands]]></title><description><![CDATA[The Microsoft Fabric CLI has become an essential tool for automating and managing your Fabric environments. There are many great articles out there on how to use the CLI locally, from your CI/CD pipelines, and even from within Fabric Notebooks - whet...]]></description><link>https://peerinsights.emono.dk/fabric-cli-beyond-shell-commands</link><guid isPermaLink="true">https://peerinsights.emono.dk/fabric-cli-beyond-shell-commands</guid><category><![CDATA[microsoft fabric]]></category><category><![CDATA[fabriccli]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Wed, 11 Jun 2025 05:47:23 GMT</pubDate><content:encoded><![CDATA[<p>The Microsoft Fabric CLI has become an essential tool for automating and managing your Fabric environments. There are many great articles out there on how to use the CLI locally, from your CI/CD pipelines, and even from within Fabric Notebooks - whether that’s using Python’s <code>subprocess</code> to run CLI commands, the <code>!</code> operator for single shell commands, or magic commands such as <code>%%sh</code> or <code>%%bash</code> to execute entire cells in a subprocess.</p>
<p>👉 Sandeep Pawar recently wrote an excellent article on using the Fabric CLI in notebooks, which you can find here: <a target="_blank" href="https://fabric.guru/using-fabric-cli-in-fabric-notebook">Using Fabric CLI in Fabric Notebook</a>.</p>
<p>👉 I also recently shared a blogpost on <strong>automating feature workspace maintenance in Microsoft Fabric</strong> using Python, the Fabric CLI, and GitHub Actions: <a target="_blank" href="https://peerinsights.hashnode.dev/automating-feature-workspace-maintainance-in-microsoft-fabric">Read it here</a>.</p>
<h2 id="heading-so-why-explore-fabric-cli-python-modules-directly">So why explore Fabric CLI Python modules directly?</h2>
<p>With Fabric User Data Functions (UDFs) now in public preview, I decided to investigate whether we could bypass the system shell entirely and <strong>leverage the Fabric CLI’s Python modules and functions directly</strong> instead of executing <code>fab</code> commands in a subprocess.</p>
<h3 id="heading-why-not-just-use-the-subprocess-module"><strong>Why not just use the</strong> <code>subprocess</code> <strong>module?</strong></h3>
<p>Even though you can <strong>add the</strong> <code>ms-fabric-cli</code> library from PyPI in the library management section of your UDF, running shell commands (<code>subprocess.run(["fab", ...])</code>) doesn’t work because:</p>
<ul>
<li><p><strong>Fabric UDFs run in sandboxed environments</strong> where direct shell access is restricted.</p>
</li>
<li><p>The <code>subprocess</code> module is often locked down or lacks access to the underlying system shell.</p>
</li>
<li><p>It’s a security and resource isolation measure to ensure reliability and consistency of Fabric workloads.</p>
</li>
</ul>
<p>For standard CLI usage, you should instead run the CLI on your local machine, on a VM, in a container, or through your CI/CD environment (like Azure DevOps or GitHub Actions). Or simply use the <strong>public REST APIs</strong> instead, which are HTTP-based and work well directly in code.</p>
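<p>For example, listing workspace items over plain HTTP needs nothing beyond the standard library. A sketch, assuming you already have an Entra ID access token for the Fabric API scope - the helper names are mine:</p>

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def items_endpoint(workspace_id: str) -> str:
    """URL of the public List Items endpoint for a workspace."""
    return f"{FABRIC_API}/workspaces/{workspace_id}/items"

def list_items(workspace_id: str, token: str) -> list[dict]:
    """Call List Items directly over HTTP - no shell, no CLI needed.
    How you obtain the token (MSAL, azure-identity, ...) depends on
    your environment."""
    request = urllib.request.Request(
        items_endpoint(workspace_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("value", [])
```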
<h2 id="heading-my-approach-directly-using-fabric-cli-python-modules">My Approach: Directly Using Fabric CLI Python Modules</h2>
<p>Since the Fabric CLI is written in Python and is installed via <code>pip</code>, I thought - why not see if I could use the <strong>underlying Python modules</strong> directly?</p>
<p>My goal was to create a Fabric UDF that would run a job synchronously using the same logic as the <code>fab job run</code> command.</p>
<p>To make this work, you must <strong>add the</strong> <code>ms-fabric-cli</code> library from PyPI in the Library management section of your Fabric UDF.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749544697509/1a4dd789-f1b7-4adc-8266-cef49e04d5af.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-how-it-works">How It Works</h3>
<p>First observation - each CLI command has its own dedicated subpackage:</p>
<ul>
<li><code>auth</code>, <code>config</code>, <code>jobs</code>, <code>fs</code> (for filesystem commands), <code>acl</code>, and more.</li>
</ul>
<p>For example, to <strong>configure encryption fallback</strong> and <strong>log in using a service principal</strong>, you can directly import and use:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> argparse <span class="hljs-keyword">import</span> Namespace
<span class="hljs-keyword">from</span> fabric_cli.commands.config <span class="hljs-keyword">import</span> fab_config
<span class="hljs-keyword">from</span> fabric_cli.commands.auth <span class="hljs-keyword">import</span> fab_auth

<span class="hljs-comment"># Set encryption fallback</span>
args = Namespace(
    command_path=[<span class="hljs-string">"/"</span>],
    path=[<span class="hljs-string">"/"</span>],
    command=<span class="hljs-string">"config"</span>,
    config_command=<span class="hljs-string">"set"</span>,
    key=<span class="hljs-string">"encryption_fallback_enabled"</span>,
    value=<span class="hljs-string">"true"</span>
)
fab_config.set_config(args)

<span class="hljs-comment"># Login using service principal</span>
args = Namespace(
    auth_command=<span class="hljs-string">"login"</span>,
    username=<span class="hljs-string">"*****"</span>,
    password=<span class="hljs-string">"*****"</span>,
    tenant=<span class="hljs-string">"*****"</span>,
    identity=<span class="hljs-literal">None</span>,
    federated_token=<span class="hljs-literal">None</span>,
    certificate=<span class="hljs-literal">None</span>
)
fab_auth.init(args)
fab_auth.status(<span class="hljs-literal">None</span>)  <span class="hljs-comment"># Check current authentication status</span>
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text"><strong>This approach is purely experimental! </strong>Hardcoding client IDs and secrets directly in a UDF is <strong>not recommended</strong> for production scenarios. Currently, UDFs don’t support features like <code>notebookutils</code> to fetch tokens or Key Vault secrets. However, you could create a connection to a Fabric Lakehouse or a Fabric SQL Database containing the credentials for the service principal.</div>
</div>

<h3 id="heading-running-a-fabric-job">Running a Fabric Job</h3>
<p>To run a job (similar to the <code>fab job run</code> command), you import the <code>fab_jobs</code> module. In my implementation, I take the workspace name, item name, and item type as input parameters to build the path for the item to run. These parameters are then used as arguments when executing the Fabric UDF.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fabric_cli.commands.jobs <span class="hljs-keyword">import</span> fab_jobs

args = Namespace(
    command_path=[<span class="hljs-string">"/"</span>],
    command=<span class="hljs-string">"job"</span>,
    jobs_command=<span class="hljs-string">"run"</span>,
    path=[<span class="hljs-string">f"<span class="hljs-subst">{workspacename}</span>.Workspace/<span class="hljs-subst">{itemname}</span>.<span class="hljs-subst">{itemtype}</span>"</span>]
)
fab_jobs.run_command(args)
</code></pre>
<p>And voilà! 🎉 With this concise yet powerful Fabric UDF, you can expose job execution to business super users. For example, finance teams can now trigger <strong>ad-hoc jobs during month-end close</strong> using a translytical task flow to run a Fabric User Data Function that handles the job execution - without granting the users deep admin access to the entire Fabric workspace.</p>
<h2 id="heading-additional-thoughts">Additional Thoughts</h2>
<p>One important consideration when using this approach is that you won’t see console outputs (like <code>print</code> statements) in the execution logs of your UDF runs. This can make troubleshooting or understanding the full execution flow challenging.</p>
<p>To address this, I’ve added logging of outputs and errors from the CLI commands directly into the UDF source code. You can find this implementation in the downloadable UDF example on my <a target="_blank" href="https://github.com/gronnerup/Fabric/blob/main/FabricCLI_UDF/Utils_UDF.py">GitHub</a>.</p>
<p>This ensures that all important outputs and errors are written to the log - making it much easier to monitor and debug these Fabric CLI-based UDFs in action.</p>
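<p>The general technique is to redirect the CLI functions’ console output into a buffer and forward it to a logger. A simplified sketch of the idea - my actual implementation on GitHub differs in the details:</p>

```python
import contextlib
import io
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fabric_cli_udf")

def run_and_capture(func, *args, **kwargs):
    """Run a (CLI) function while capturing everything it prints,
    so the output can be written to the UDF log afterwards."""
    stdout, stderr = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(stdout), contextlib.redirect_stderr(stderr):
        result = func(*args, **kwargs)
    if stdout.getvalue():
        logger.info(stdout.getvalue().strip())
    if stderr.getvalue():
        logger.error(stderr.getvalue().strip())
    return result, stdout.getvalue(), stderr.getvalue()

# Stand-in demo; in the UDF you would pass e.g. fab_jobs.run_command
# and its Namespace instead of print:
result, out, err = run_and_capture(print, "job submitted")
```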
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749543640630/a13cf73c-355f-48c8-90c0-0049ce888dea.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>While directly using the Fabric CLI’s Python modules in a UDF isn’t the recommended approach - many would argue that the public REST APIs are better suited for managing Fabric items within UDFs - this experiment showed that it’s possible to run CLI commands directly <strong>without</strong> relying on the shell.</p>
<p>It highlights the flexibility of the Fabric CLI’s architecture and suggests exciting future possibilities - imagine how powerful it would be to have a dedicated, fully supported Python interface for Fabric!</p>
<p>👉 <strong>You can find the full Fabric User Data Function implementation on my</strong> <a target="_blank" href="https://github.com/gronnerup/Fabric/blob/main/FabricCLI_UDF/Utils_UDF.py"><strong>GitHub</strong></a><strong>.</strong></p>
]]></content:encoded></item><item><title><![CDATA[Who's Calling?]]></title><description><![CDATA[During Microsoft Fabric project implementations, I’m frequently asked a deceptively simple question: “Under which identity is this running?” It turns out, the answer isn’t always straightforward - and to be honest, it’s a topic I’ve also found quite ...]]></description><link>https://peerinsights.emono.dk/whos-calling</link><guid isPermaLink="true">https://peerinsights.emono.dk/whos-calling</guid><category><![CDATA[microsoft fabric]]></category><category><![CDATA[microsoftfabric]]></category><category><![CDATA[Execution Context]]></category><category><![CDATA[Fabric Data Factory]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Fri, 09 May 2025 12:39:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746522594407/5b67b869-1771-4dd0-9327-66ad58bb2113.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>During Microsoft Fabric project implementations, I’m frequently asked a deceptively simple question: <strong>“Under which identity is this running?”</strong> It turns out, the answer isn’t always straightforward - and to be honest, it’s a topic I’ve also found quite complex at times.</p>
<p>Just because a schedule was created by you doesn’t necessarily mean the entire job triggered by that schedule runs in your user context - or for that matter, in the context of the identity who created the item. And with the introduction of <strong>Service Principal</strong> support, things haven’t exactly become clearer. In fact, it often adds an extra layer of complexity to the already tricky landscape of execution context in Fabric.</p>
<p>In this post, I want to share some of the insights I’ve gathered - especially when working with <strong>data pipelines that trigger child notebooks and other downstream activities</strong>. We’ll look at how identities are used across different components, what you need to be aware of, and how to avoid common pitfalls. Or in short: <strong>Who’s calling?</strong> 📞</p>
<p>Finally, I’ll touch on a <strong>known bug</strong> in the Fabric API and the <strong>SemPy library</strong> that affects <strong>notebook execution in Service Principal contexts</strong>, a setup that’s becoming increasingly common in enterprise-grade, multi-environment data platforms.</p>
<h2 id="heading-test-setup-simulating-real-world-scheduling-scenarios">Test Setup: Simulating Real-World Scheduling Scenarios</h2>
<p>To explore how execution context behaves in Microsoft Fabric, I created a simple but representative setup. Using the <strong>Fabric CLI</strong>, I triggered <strong>on-demand executions</strong> of Fabric items like <strong>data pipelines</strong> that call <strong>child notebooks</strong> as well as triggering notebooks directly.</p>
<p>This setup allows us to control exactly <strong>who initiates the run -</strong> be it a <strong>user</strong> or a <strong>Service Principal</strong> - and observe how that identity flows (or doesn’t) through the various components.</p>
<p>Key components of the setup:</p>
<ul>
<li><p>A Data Pipeline with multiple activities (e.g., Invoke Notebook and Invoke Data Pipeline)</p>
</li>
<li><p>A Notebook which prints identity info as well as runtime properties and other relevant info</p>
</li>
<li><p>A parent Notebook which executes a child notebook (as the one above)</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/fabric-command-line-interface">Fabric CLI</a>-triggered job runs using both <strong>user identity</strong> and <strong>Service Principal</strong></p>
</li>
</ul>
<p>This approach mimics many enterprise deployment scenarios, especially in <strong>multi-environment setups</strong>.</p>
<h2 id="heading-execution-scenarios-what-identity-is-actually-used">Execution Scenarios: What Identity Is Actually Used?</h2>
<p>Regardless of whether a job is triggered by a <strong>user</strong> or a <strong>Service Principal</strong>, the same core logic applies when it comes to <strong>execution context</strong> in Microsoft Fabric. However, what happens next depends heavily on the <strong>type of item</strong> being executed and <strong>how</strong> it's executed.</p>
<p>Let’s break it down…</p>
<h3 id="heading-top-level-execution-who-triggers-the-job">Top-Level Execution: Who Triggers the Job?</h3>
<p>When a pipeline or notebook is triggered - either manually, via schedule, or through a CLI/API call - the <strong>top-level item</strong> (the pipeline or notebook itself) is executed in the <strong>context of the identity that triggered it</strong>.</p>
<p>That could be:</p>
<ul>
<li><p>A user account (e.g., developer in dev/test)</p>
</li>
<li><p>A service principal (e.g., a scheduled run in production)</p>
</li>
</ul>
<p>So far, so good. But once you go deeper, into <strong>child components and downstream activities</strong>, the picture becomes more complicated.</p>
<hr />
<h3 id="heading-notebook-execution-from-notebooks">Notebook Execution from Notebooks</h3>
<p>When one notebook triggers another (e.g., using <code>notebookutils.notebook.run()</code>), the <strong>child notebooks</strong> always inherit the <strong>execution context of the parent notebook</strong>.</p>
<p>✅ <em>If a notebook is triggered by a Service Principal, all downstream notebooks will run under the same Service Principal.</em></p>
<p>✅ <em>If a user triggers the parent notebook, all child notebooks will run under that user’s identity.</em></p>
<p>This behavior is consistent and predictable across environments.</p>
<hr />
<h3 id="heading-data-pipelines-a-more-complex-story">Data Pipelines: A More Complex Story</h3>
<p>With <strong>Data Pipelines</strong>, execution context is <strong>activity-specific</strong>. Here’s what governs it:</p>
<h4 id="heading-activities-that-use-connections">🔹 Activities that use <strong>connections</strong></h4>
<p>Examples: Copy Data, Invoke Pipeline (preview), Azure Databricks, Semantic model refresh, Web etc.<br />These activities <strong>run under the identity associated with the connection object</strong> used.</p>
<h4 id="heading-activities-that-do-not-use-connections">🔹 Activities that <strong>do not use connections</strong></h4>
<p>Examples: Notebook, Invoke Pipeline (Legacy activity), Dataflow, Spark Job Definition etc.<br />These activities run under the identity of the user or service principal who <strong>last modified</strong> the pipeline. This is the identity shown as <strong>"Last Modified By"</strong> in the Data Pipeline settings.</p>
<blockquote>
<p>⚠️ Yes, that means if you last edited a pipeline in dev as yourself, but deploy it in test using a service principal, the execution identity in test will be the service principal - even if the original intent was to run it as a user.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746795852549/b4148762-3299-4727-a499-20e96d7f7879.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-real-life-example-a-lakehouse-medallion-architecture">Real-Life Example: A Lakehouse Medallion Architecture</h3>
<p>Let’s ground this in a practical scenario - a common <strong>Lakehouse Data Platform</strong> with a <strong>3-layer medallion architecture</strong>:</p>
<ol>
<li><p>A <strong>controller pipeline</strong> kicks off the process.</p>
</li>
<li><p>It calls <strong>child pipelines</strong> that ingest raw data into the <strong>bronze</strong> layer.</p>
</li>
<li><p>Then it triggers a <strong>notebook</strong> that processes bronze into <strong>silver</strong>.</p>
</li>
<li><p>Another notebook handles transformations into <strong>gold</strong> (curated data).</p>
</li>
<li><p>Finally, the pipeline refreshes a <strong>semantic model</strong> as the last step.</p>
</li>
</ol>
<p>Here’s how execution context breaks down:</p>
<ul>
<li><p>Activities using <strong>connections</strong> (e.g., Copy Data or Semantic model refresh) run under the <strong>connection identity</strong>.</p>
</li>
<li><p>Notebooks in the pipeline (with no connection) run as the <strong>last modified identity of the pipeline -</strong> which could be a user or service principal.</p>
</li>
<li><p>If a child pipeline triggers a notebook, the same logic applies: the <strong>last modified identity of that pipeline</strong> determines the execution context of its notebook.</p>
</li>
</ul>
<p>So yes, it’s entirely possible that a single run involves:</p>
<ul>
<li><p>Data ingestion as one identity (connection)</p>
</li>
<li><p>Silver transformation as another (pipeline author)</p>
</li>
<li><p>Gold orchestration as yet another (child pipeline modifier)</p>
</li>
</ul>
<hr />
<h3 id="heading-feeling-lost-youre-not-alone">Feeling Lost? You’re Not Alone</h3>
<p>If you’re scratching your head, you’re not alone. The behavior is by design, but it does mean we need to be <strong>deliberate</strong> about how we:</p>
<ul>
<li><p>Modify items</p>
</li>
<li><p>Manage dependencies downstream</p>
</li>
<li><p>Set up connections</p>
</li>
<li><p>Deploy across environments</p>
</li>
</ul>
<p>Most importantly: <strong>how things run in development may not reflect how they run in test or production</strong> - especially if you use a service principal for automated deployments.</p>
<p>That’s why <strong>understanding execution context is critical</strong> for ensuring consistent behavior across environments in enterprise-grade solutions.</p>
<h2 id="heading-known-bug-when-notebooks-fail-under-a-service-principal">Known Bug: When Notebooks Fail Under a Service Principal</h2>
<p>While building enterprise-ready Fabric solutions, it’s increasingly common to run notebooks using <strong>Service Principals</strong>. However, there's a <strong>known bug</strong> that can cause unexpected failures when doing so.</p>
<h3 id="heading-whats-the-problem">What’s the Problem?</h3>
<p>Running a notebook under a Service Principal can break certain functions and environment references, especially those related to <strong>runtime context</strong> and <strong>authentication</strong>. The issue appears to stem from the <strong>scope or limitations of the Service Principal's token</strong>, and Microsoft has acknowledged it as a <strong>bug</strong>. The Fabric product team is actively working on a fix.</p>
<h3 id="heading-what-fails">What Fails?</h3>
<p>Here’s a list of some of the functions and methods that return <code>None</code> or throw errors when executed in a notebook under a Service Principal. Note that <code>mssparkutils</code> is being deprecated and <code>notebookutils</code> is the way forward; the list below is just to illustrate the issue:</p>
<ul>
<li><p><code>mssparkutils.env.getWorkspaceName()</code></p>
</li>
<li><p><code>mssparkutils.env.getUserName()</code></p>
</li>
<li><p><code>notebookutils.runtime.context.get('currentWorkspaceName')</code></p>
</li>
<li><p><code>fabric.resolve_workspace_id()</code></p>
</li>
<li><p><code>fabric.resolve_workspace_name()</code></p>
</li>
<li><p>Any SemPy <code>FabricRestClient</code> operations</p>
</li>
<li><p>Manual API calls using tokens from <code>notebookutils.mssparkutils.credentials.getToken("https://api.fabric.microsoft.com")</code></p>
</li>
</ul>
<h3 id="heading-importing-sempyfabric-under-a-service-principal">⚠️ Importing <code>sempy.fabric</code> Under a Service Principal</h3>
<p>When executing a notebook in the context of a <strong>Service Principal</strong>, simply importing <code>sempy.fabric</code> will result in the following exception:</p>
<pre><code class="lang-plaintext">Exception: Fetch cluster details returns 401:b''
## Not In PBI Synapse Platform ##
</code></pre>
<p>This error occurs because <strong>SemPy</strong> attempts to fetch cluster and workspace metadata using the <strong>execution identity’s token</strong> - which, as mentioned earlier, lacks proper context or scope when it belongs to a Service Principal.</p>
<p>In short, <strong>any method that fetches workspace name</strong> <strong>or user name -</strong> or relies on the <strong>executing identity’s token for SemPy</strong> or <strong>REST API calls</strong> - is likely to fail or return <code>None</code>.</p>
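<p>In practice, this means context lookups need a defensive fallback. Below is a minimal sketch (the helper name and the fallback order are my own illustration, not part of any Fabric SDK) that tries the runtime-context lookup first and falls back to the Spark configuration key, which still resolves under a Service Principal:</p>
<pre><code class="lang-python">def resolve_workspace_id(get_context_value, get_spark_conf):
    """Resolve the current workspace ID with a fallback that survives
    Service Principal execution. Both lookups are injected so the helper
    can also be exercised outside a notebook."""
    try:
        # Works when the notebook runs as a user...
        workspace_id = get_context_value("currentWorkspaceId")
        if workspace_id:
            return workspace_id
    except Exception:
        pass  # ...but may fail or return None under a Service Principal
    # This Spark setting still resolves under a Service Principal
    return get_spark_conf("trident.workspace.id")

# In a notebook you would wire up the real lookups, e.g.:
# resolve_workspace_id(notebookutils.runtime.context.get, spark.conf.get)
</code></pre>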
<h3 id="heading-what-still-works">What Still Works?</h3>
<p>Surprisingly, not everything is broken. Here are some functions that still work under a Service Principal:</p>
<ul>
<li><p><code>spark.conf.get('trident.workspace.id')</code> – this gives you the workspace ID reliably</p>
</li>
<li><p><code>sempy.fabric.get_workspace_id()</code> – still functional, even though importing <code>sempy.fabric</code> throws the exception shown above.</p>
</li>
<li><p><code>notebookutils.credentials.getSecret(...)</code> – useful for pulling secrets like client credentials from a Key Vault</p>
</li>
</ul>
<p>Using these, you can still <strong>manually generate a token</strong> and pass it into your REST requests - or even inject a custom <code>token_provider</code> into the SemPy <code>FabricRestClient</code>.</p>
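<p>As a concrete illustration of the manual token approach, here is a sketch using only the Python standard library. It runs the client-credentials flow against Microsoft Entra ID; the tenant, client and secret values are placeholders that you would typically pull from a Key Vault via <code>notebookutils.credentials.getSecret(...)</code>:</p>
<pre><code class="lang-python">import json
import urllib.parse
import urllib.request

FABRIC_SCOPE = "https://api.fabric.microsoft.com/.default"

def build_token_request(tenant_id, client_id, client_secret):
    """Build the Entra ID client-credentials token request (URL and form body)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": FABRIC_SCOPE,
    }).encode()
    return url, body

def get_fabric_token(tenant_id, client_id, client_secret):
    """Call the token endpoint and return the raw access token string."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]

# The token then goes into the Authorization header of any Fabric REST call:
# headers = {"Authorization": "Bearer " + get_fabric_token(tenant, client, secret)}
</code></pre>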
<h3 id="heading-workarounds">Workarounds</h3>
<p>If you hit this issue, here are some paths forward:</p>
<ul>
<li><p>Avoid relying on runtime context methods when running under a Service Principal</p>
</li>
<li><p>Use a <strong>manual token approach</strong>: fetch your own token using credentials from Key Vault and use that in REST requests</p>
</li>
<li><p>Where possible, <strong>shift context resolution logic out of notebooks</strong> and into deployment orchestration or pipeline steps</p>
</li>
<li><p>Watch for updates: Microsoft is aware of the issue and a fix is on the way</p>
</li>
</ul>
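<p>For the manual token approach with SemPy, a small callable wrapper can stand in as the <code>token_provider</code>. The sketch below is an assumption about the provider interface - SemPy versions differ in what they expect, so check the documentation for your version:</p>
<pre><code class="lang-python">class StaticTokenProvider:
    """Wraps a pre-fetched access token so REST helpers such as SemPy's
    FabricRestClient do not fall back to the execution identity."""

    def __init__(self, token):
        self._token = token

    def __call__(self, audience="pbi"):
        # SemPy may pass an audience hint; this sketch ignores it and
        # always returns the same Fabric API token.
        return self._token

# token = fetched with credentials from Key Vault (see above)
# client = sempy.fabric.FabricRestClient(token_provider=StaticTokenProvider(token))
</code></pre>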
<h2 id="heading-why-this-bug-matters-for-cicd-and-execution-context">Why This Bug Matters for CI/CD and Execution Context</h2>
<p>This issue ties directly back to the core topic of this blog post - <strong>execution context in Microsoft Fabric</strong>. Remember that when a <strong>notebook is triggered by a Data Pipeline</strong>, its execution identity depends on <strong>who last modified the data pipeline</strong>.</p>
<p>In modern CI/CD workflows - whether you're using <strong>Azure DevOps Pipelines</strong>, <strong>GitHub Actions</strong>, or any other automation platform - you’re most likely deploying with a <strong>Service Principal</strong>. That means after every deployment, <strong>the "Last Modified By" identity on your Data Pipelines becomes the Service Principal</strong>.</p>
<p>This wouldn’t be an issue <em>if</em> notebooks worked reliably under Service Principal identity. But as we've seen above, <strong>notebooks run into serious limitations when executed in that context</strong> - missing environment properties, failed API calls, and broken logic in dynamic configurations.</p>
<h3 id="heading-a-practical-workaround-let-a-web-activity-re-assign-ownership">A Practical Workaround: Let a Web Activity Re-Assign Ownership</h3>
<p>Here’s one way to get around it:<br />Use a <strong>Web activity in a Fabric Pipeline</strong> - configured with an <strong>OAuth2 connection for a specific user -</strong> to <strong>update the description</strong> of your Data Pipelines post-deployment.</p>
<p>Why this works:</p>
<ul>
<li><p>A Web activity executes in the context of the <strong>connection identity</strong></p>
</li>
<li><p>Updating the pipeline’s description (even just reapplying the same description) is enough to change the <strong>"Last Modified By"</strong> property</p>
</li>
<li><p>As a result, <strong>all notebooks executed by those pipelines will now run in the context of the user tied to the OAuth2 connection</strong>, not the Service Principal</p>
</li>
</ul>
<p>This allows you to:</p>
<ul>
<li><p>Deploy pipelines automatically with a Service Principal</p>
</li>
<li><p>Then post-process them to <strong>re-assign their execution identity to a user</strong>, for scenarios where notebook behavior matters</p>
</li>
</ul>
<p>This approach also allows you to apply filters to target only specific Data Pipelines, updating the <strong>Last Modified By</strong> property selectively. This way, you can still support notebook execution under a Service Principal where needed.</p>
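<p>The same "touch" can also be scripted outside a pipeline. The sketch below builds the REST request against what I understand to be the Items - Update Item endpoint; the identity that owns the token becomes the new "Last Modified By", so it must be a user token here, not the Service Principal's. Verify the endpoint shape against the current Fabric REST API documentation:</p>
<pre><code class="lang-python">import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_touch_request(workspace_id, pipeline_id, description, token):
    """Build the PATCH request that re-applies a Data Pipeline's description,
    which is enough to flip its 'Last Modified By' to the calling identity."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{pipeline_id}"
    return urllib.request.Request(
        url,
        data=json.dumps({"description": description}).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(build_touch_request(ws_id, pl_id, "ETL orchestration", user_token))
</code></pre>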
<h3 id="heading-pipeline-template-available-on-github">Pipeline Template: Available on GitHub</h3>
<p>You can see a visual of this post-deployment ownership adjustment pipeline below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746788273598/de8f1815-9c60-4ef0-9515-65f9ee0aada9.png" alt class="image--center mx-auto" /></p>
<p>I’ve also published the <strong>pipeline definition</strong> on my GitHub, including a short description of how to use the two parameters: <a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricExecutionContext">View on GitHub</a></p>
<blockquote>
<p><strong>Note:</strong> All activities in the definition are currently <strong>disabled by default</strong> so you can safely copy-paste it into your own Fabric Data Pipeline json definition and adjust the <strong>connection settings</strong>, <strong>pipeline selection logic</strong> etc. as needed.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Automating Feature Workspace maintenance in Microsoft Fabric]]></title><description><![CDATA[📣
Update – July 2025: This post has been updated to reflect new support for connecting and synchronizing Microsoft Fabric workspaces with Azure DevOps Repos using a Service Principal. The update includes a new section on Azure DevOps setup, covering...]]></description><link>https://peerinsights.emono.dk/automating-feature-workspace-maintainance-in-microsoft-fabric</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-feature-workspace-maintainance-in-microsoft-fabric</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[ci-cd]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 21 Apr 2025 08:53:40 GMT</pubDate><content:encoded><![CDATA[<div data-node-type="callout">
<div data-node-type="callout-emoji">📣</div>
<div data-node-type="callout-text"><strong>Update – July 2025:</strong> This post has been updated to reflect new support for connecting and synchronizing Microsoft Fabric workspaces with Azure DevOps Repos using a Service Principal. The update includes a new section on Azure DevOps setup, covering the required permissions, repository access etc. when using Azure DevOps as your Git provider.</div>
</div>

<p>At the Microsoft Fabric Community Conference in Las Vegas in April 2025, Microsoft announced the public preview of the <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/admin/fabric-command-line-interface"><strong>Fabric CLI</strong></a> - a powerful, developer-first command line interface that brings a file-system-inspired way to explore and manage your Fabric environment. As someone who's been deep in the weeds with the Fabric REST APIs for quite some time (and have blogged about it before), I was excited to see how the Fabric CLI was building on the APIs to make automation more intuitive and accessible than ever.</p>
<p>In this blog post, I’ll walk you through how to use the Fabric CLI <em>from within Python</em> to support a best-practice approach for <strong>auto-generating and auto-configuring feature development workspaces</strong> in Microsoft Fabric.</p>
<p>In my example, I’ll first focus on <strong>GitHub Actions</strong> along with a <strong>service principal</strong> for authentication.</p>
<p>When this post was originally published, <strong>Service Principal authentication was only supported when using GitHub as the Git provider</strong> in Microsoft Fabric.</p>
<p>However, Fabric is evolving rapidly and with that evolution, we now have <strong>Service Principal support for the Git Connect operations via the Fabric REST APIs when using Azure DevOps</strong> as the Git provider as well.</p>
<p>⚠️ <strong>Important caveat:</strong> Service Principal <strong>is not supported</strong> when the Git provider is Azure DevOps <strong>and</strong> the authentication method is set to <strong>"Automatic"</strong>.</p>
<p>There are additional details and setup requirements you'll want to be aware of. I’ve included a <strong>dedicated section at the end of this post</strong> covering how to configure Azure DevOps to work with a Service Principal for secure, automated workspace synchronization.</p>
<p>This post ties closely to my session at FabCon 2025 in Las Vegas <strong>"From Setup to CI/CD: Automating Microsoft Fabric for Scalable Data Solutions"</strong> - where I showcased an end-to-end automation approach. If you’re interested, you can find the session materials and sample code here:</p>
<ul>
<li><p>🔗 <strong>Session code &amp; presentation from FabCon 2025:</strong> <a target="_blank" href="https://github.com/gronnerup/Fabric">github.com/gronnerup/Fabric</a></p>
</li>
<li><p>🛠 <strong>This article's code repo (ongoing work):</strong> <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">github.com/gronnerup/FabricAutomation</a></p>
</li>
</ul>
<h2 id="heading-automating-feature-workspace-maintainance-using-github-actions">Automating feature workspace maintainance using GitHub Actions</h2>
<p>This section focuses on automating the setup and teardown of feature workspaces using <strong>GitHub Actions</strong>.<br />While the examples here use GitHub, <strong>much of the approach also applies when using Azure DevOps</strong>.<br />If you're working with <strong>Azure DevOps Pipelines</strong>, be sure to check out the dedicated section at the end of this post for platform-specific guidance.</p>
<h3 id="heading-prerequisites-and-requirements">Prerequisites and Requirements</h3>
<p>Before automating the creation of isolated <strong>feature development workspaces</strong> in Microsoft Fabric using the Fabric CLI and GitHub Actions, make sure you have the following in place:</p>
<h4 id="heading-1-service-principal-authentication">1. Service Principal Authentication</h4>
<p>This solution uses <strong>service principal authentication with a client secret</strong>, allowing secure, automated access to your Fabric environment. You’ll need to create an <strong>App Registration</strong> in Microsoft Entra ID and ensure the service principal is properly configured for Fabric API access.</p>
<p>In your GitHub repository, define these <strong>repository secrets</strong>:</p>
<ul>
<li><p><code>SPN_TENANT_ID</code> – The <strong>Tenant ID</strong> of your Microsoft Fabric environment.</p>
</li>
<li><p><code>SPN_CLIENT_ID</code> – The <strong>Client ID</strong> (Application ID) of your app registration.</p>
</li>
<li><p><code>SPN_CLIENT_SECRET</code> – The <strong>Client Secret</strong> of the app registration.</p>
</li>
</ul>
<p>Make sure the service principal is <strong>enabled for the Fabric REST APIs</strong> by following the official guidance:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/identity-support#service-principal-tenant-setting">Enable service principal for Fabric REST APIs</a></p>
<h4 id="heading-2-github-personal-access-token-pat">2. GitHub Personal Access Token (PAT)</h4>
<p>You’ll also need to create a <strong>GitHub Personal Access Token (PAT)</strong> to enable Fabric’s Git integration. This token is used to authenticate Fabric when connecting to your GitHub repository.</p>
<p>Follow this guide to create a PAT and connect your workspace to Git:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/git-get-started?tabs=azure-devops%2CGitHub%2Ccommit-to-git#connect-to-a-git-repo">Connect to a Git repo (Microsoft Learn)</a></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text"><strong>Important:</strong> Before setting up Git integration, review the <strong>considerations and limitations</strong> outlined here: <a target="_new" href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration?tabs=github#considerations-and-limitations">Git integration: Considerations and Limitations</a></div>
</div>

<p><strong>3. Fork the Repository</strong></p>
<p>To get started, <strong>fork the repository</strong> to your own GitHub account so you can safely configure secrets and CI/CD pipelines: <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">https://github.com/gronnerup/FabricAutomation</a></p>
<p>This provides a clean slate to experiment and build on top of the existing automation approach.</p>
<h3 id="heading-my-approach-to-continuous-integration-with-git">My Approach to Continuous Integration with Git</h3>
<p>When implementing Continuous Integration (CI) in Microsoft Fabric, it's essential to have a clear structure for both your <strong>workspaces</strong> and your <strong>Git repository</strong>. This helps ensure that your development process supports scalability, collaboration, and automation from day one.</p>
<h4 id="heading-workspace-structure-layer-separated-for-clarity-and-control">Workspace Structure: Layer-Separated for Clarity and Control</h4>
<p>The question of how to best structure workspaces in Fabric has been the subject of many discussions across blog posts, LinkedIn and Reddit threads. While there's no single “right” answer, my recommendation, based on practical experience and architectural clarity, is to follow a <strong>layer-separated workspace pattern</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745072238160/a0a0f33b-3bc2-48bf-ac00-d6ae6d4c5438.png" alt class="image--center mx-auto" /></p>
<p>This architecture separates your Fabric solution into logical layers, such as:</p>
<ul>
<li><p><strong>Store</strong>: Lakehouse etc.</p>
</li>
<li><p><strong>Ingest</strong>: Notebooks, Data Pipelines, etc.</p>
</li>
<li><p><strong>Prepare</strong>: Notebooks focused on shaping and cleansing data</p>
</li>
<li><p><strong>Serve</strong>: Semantic Models and related artifacts</p>
</li>
<li><p><strong>Orchestrate</strong>: Data Pipelines or Notebooks driving execution logic</p>
</li>
<li><p><strong>Core</strong>: Components such as Variable Libraries, Environments, and Fabric Databases used for metadata</p>
</li>
</ul>
<p>Each layer gets its <strong>own dedicated workspace</strong>, allowing for:</p>
<ul>
<li><p><strong>Transparent organization</strong> of items and responsibilities</p>
</li>
<li><p><strong>Improved access control</strong> at the workspace level</p>
</li>
<li><p><strong>Capacity separation</strong>, which is especially useful in large-scale environments</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">📘</div>
<div data-node-type="callout-text">I’ve previously written about this setup and why I believe it’s a solid foundation for modern Fabric development: 🔗 <a target="_new" href="https://peerinsights.hashnode.dev/automating-fabric-kickstart-your-fabric-data-platform-setup">Automating Fabric: Kickstart Your Fabric Data Platform Setup</a></div>
</div>

<p>This structure does introduce one consideration: <strong>isolated feature development workspaces</strong> may need to mirror more than one layer, depending on the scope of the feature being implemented. In other words, a single feature branch may touch multiple workspaces and that’s okay, as long as it’s organized.</p>
<h4 id="heading-git-repository-structure-one-repo-to-rule-them-all">Git Repository Structure: One Repo to Rule Them All</h4>
<p>To support this workspace setup effectively, I recommend keeping <strong>all your Fabric resources in a single Git repository</strong>. Within this repo, each solution layer is represented by a subfolder, and each layer-specific workspace connects to its respective folder via Fabric’s Git integration.</p>
<p>A typical structure might look like this:</p>
<pre><code class="lang-plaintext">/.azure-pipelines    # Azure DevOps pipelines
/.github             # GitHub Actions workflows
/automation          # Scripts, deployment helpers etc.
/documentation       # Solution documentation. Can be used for Azure DevOps project Wiki
/solution            # Solution folders for the different layers
  /Core
  /Ingest
  /Orchestrate
  /Prepare
  /Serve
  /Store
</code></pre>
<p>This structure offers a few key benefits:</p>
<ul>
<li><p><strong>End-to-end feature branches</strong> – You can implement a business requirement across all relevant layers (and include documentation!) in a single branch.</p>
</li>
<li><p><strong>CI/CD alignment</strong> – Makes it easier to automate build/test/deploy processes using GitHub Actions or Azure Pipelines.</p>
</li>
<li><p><strong>Organizational clarity</strong> – Developers always know where to find and contribute to specific parts of the solution.</p>
</li>
</ul>
<p>With this setup, isolated <strong>feature development workspaces</strong> are created dynamically and point to the relevant subfolders. This aligns perfectly with the approach demonstrated in this blog post, and it’s designed to scale with the complexity of your data platform.</p>
<h3 id="heading-automating-the-feature-development-process">Automating the Feature Development Process</h3>
<p>As highlighted in <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/cicd/manage-deployment#development-process">Microsoft’s official documentation on deployment and development processes</a>, it’s considered <strong>best practice to isolate development work</strong> outside of your main collaboration branch. This ensures cleaner version control, better collaboration, and minimizes disruption to ongoing work.</p>
<p>Following Git standards, development should happen in <strong>feature branches</strong>, each representing a specific unit of work. This allows for focused development, easier reviews, and safer integration into the mainline once complete.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745073211197/e45d0e70-fd9c-4baa-bcbc-03a16269ba76.png" alt class="image--center mx-auto" /></p>
<p>When working in Microsoft Fabric, isolated development also means creating <strong>separate workspaces</strong> to support and validate your changes. There are two primary ways to do this:</p>
<ol>
<li><p><strong>Manual setup via the Fabric UI</strong></p>
</li>
<li><p><strong>Programmatic setup via the Fabric REST APIs or the Fabric CLI</strong></p>
</li>
</ol>
<p>But why stop at manual or semi-automated processes?</p>
<hr />
<h3 id="heading-taking-it-to-the-next-level-automating-workspace-creation">Taking It to the Next Level: Automating Workspace Creation</h3>
<p>By leveraging <strong>GitHub Actions</strong> or <strong>Azure DevOps pipelines</strong>, we can automate the entire process of setting up and later tearing down <strong>feature development workspaces</strong>. This not only saves time but ensures consistency across environments.</p>
<p>In my approach, I use a <strong>recipe file</strong> that defines exactly how feature workspaces should be configured. This file, <code>feature.json</code>, lives in the repository at: <code>automation/resources/environments/</code></p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"feature_name"</span> : <span class="hljs-string">"*{feature_name}-{layer_name}"</span>,
    <span class="hljs-attr">"capacity_name"</span>: <span class="hljs-string">"MyCapacity"</span>,
    <span class="hljs-attr">"git_settings"</span>: {
        <span class="hljs-attr">"gitProviderDetails"</span>: {
            <span class="hljs-attr">"gitProviderType"</span>: <span class="hljs-string">"GitHub"</span>,
            <span class="hljs-attr">"ownerName"</span>: <span class="hljs-string">"MyGitHubProfile"</span>,
            <span class="hljs-attr">"repositoryName"</span>: <span class="hljs-string">"MyGitHubRepo"</span>
        },
        <span class="hljs-attr">"myGitCredentials"</span>: {
            <span class="hljs-attr">"source"</span>: <span class="hljs-string">"ConfiguredConnection"</span>,
            <span class="hljs-attr">"connectionId"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>
        }
    },
    <span class="hljs-attr">"permissions"</span>: {
        <span class="hljs-attr">"admin"</span>: [
            {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"Group"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>}
        ],
        <span class="hljs-attr">"contributor"</span>: [
            {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"User"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>}
        ]
    },
    <span class="hljs-attr">"layers"</span>: {
        <span class="hljs-attr">"Prepare"</span>: {
            <span class="hljs-attr">"spark_settings"</span>: {
                <span class="hljs-attr">"pool"</span>: {
                    <span class="hljs-attr">"starterPool"</span>: {
                        <span class="hljs-attr">"maxExecutors"</span>: <span class="hljs-number">1</span>,
                        <span class="hljs-attr">"maxNodeCount"</span>: <span class="hljs-number">1</span>
                    }
                }
            },
            <span class="hljs-attr">"git_directoryName"</span>: <span class="hljs-string">"solution/prepare"</span>
        },
        <span class="hljs-attr">"Ingest"</span>: { <span class="hljs-attr">"git_directoryName"</span>: <span class="hljs-string">"solution/prepare"</span> },
        <span class="hljs-attr">"Orchestrate"</span>: { <span class="hljs-attr">"git_directoryName"</span>: <span class="hljs-string">"solution/orchestrate"</span> }
    }
}
</code></pre>
<p>The key elements include:</p>
<ul>
<li><p>A <strong>naming convention</strong> for feature workspaces (prefixed with an asterisk for easy visibility)</p>
</li>
<li><p>The <strong>target capacity</strong> for deployment</p>
</li>
<li><p><strong>Git integration settings</strong> and authentication</p>
</li>
<li><p><strong>Permissions configuration</strong> for users and/or groups</p>
</li>
<li><p>Layer-specific settings such as <strong>Spark pool resource limits</strong>, which can be particularly useful - for example, configuring a single-node Spark pool reduces vCore consumption and minimizes the risk of hitting concurrency limits.</p>
</li>
</ul>
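<p>To illustrate how such a recipe drives workspace naming, here is a small sketch of the expansion step. This is my own simplified illustration - the real logic lives in the repository's maintenance script:</p>
<pre><code class="lang-python">def expand_workspace_names(recipe, feature_name):
    """Expand the recipe's naming pattern into one workspace name per layer."""
    pattern = recipe["feature_name"]  # e.g. "*{feature_name}-{layer_name}"
    return {
        layer: pattern.format(feature_name=feature_name, layer_name=layer)
        for layer in recipe["layers"]
    }

recipe = {
    "feature_name": "*{feature_name}-{layer_name}",
    "layers": {"Prepare": {}, "Ingest": {}, "Orchestrate": {}},
}
names = expand_workspace_names(recipe, "customer-churn")
# names["Prepare"] is "*customer-churn-Prepare"
</code></pre>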
<div data-node-type="callout">
<div data-node-type="callout-emoji">📘</div>
<div data-node-type="callout-text">My good friend <a target="_new" href="https://justb.dk/blog/">Just Blindbæk</a> has written a great series on optimizing Spark for collaboration and scaling - definitely worth a read!</div>
</div>

<h3 id="heading-github-workflows-creation-and-cleanup">GitHub Workflows: Creation and Cleanup</h3>
<p>Inside the <code>.github/workflows</code> folder of the repository, you’ll find two workflows:</p>
<ol>
<li><p><strong>Create Fabric feature workspaces on feature branch creation</strong><br /> Triggered when a new feature branch is created.</p>
</li>
<li><p><strong>Cleanup Fabric feature workspaces on merge to main</strong><br /> Triggered when the feature is merged into <code>main</code>.</p>
</li>
</ol>
<p>Both workflows call the Python script <code>fabric_feature_maintainance.py</code> (found in <code>automation/scripts</code>), which handles the actual creation or deletion logic. Under the hood, the script uses the <strong>Fabric CLI</strong>, calling commands via a utility function defined in: <code>automation/scripts/modules/fabric_cli_functions.py</code></p>
<p>CLI commands are executed using a simple <code>run_command()</code> function:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_command</span>(<span class="hljs-params">command: str</span>) -&gt; str:</span>
    <span class="hljs-keyword">try</span>:
        result = subprocess.run(
            [<span class="hljs-string">"fab"</span>, <span class="hljs-string">"-c"</span>, command],
            capture_output=<span class="hljs-literal">True</span>,
            text=<span class="hljs-literal">True</span>,
            check=EXIT_ON_ERROR
        )
        <span class="hljs-keyword">return</span> result.stdout.strip()
    ...
</code></pre>
<p>And for functionality <strong>not yet covered by Fabric CLI commands</strong>, I use the powerful <code>fab api</code> command to interact directly with the Fabric REST API - for example, when <strong>connecting and synchronizing Git</strong> repositories.</p>
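<p>As a sketch of that pattern, the snippet below composes a <code>fab api</code> command for the Git - Connect endpoint and hands it to the <code>run_command()</code> helper shown above. The <code>-X</code>/<code>-i</code> flags and payload shape are my reading of the CLI - confirm them with <code>fab api --help</code> for your version:</p>
<pre><code class="lang-python">import json

def build_git_connect_command(workspace_id, git_provider_details):
    """Compose a 'fab api' command string for the Git - Connect endpoint.
    The result is passed to run_command(), which prepends the 'fab' binary."""
    payload = json.dumps({"gitProviderDetails": git_provider_details})
    return (
        f"api -X post workspaces/{workspace_id}/git/connect "
        f"-i {payload!r}"
    )

# run_command(build_git_connect_command(ws_id,
#     recipe["git_settings"]["gitProviderDetails"]))
</code></pre>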
<h3 id="heading-quickstart-walkthrough">Quickstart Walkthrough</h3>
<p>Curious how this works in practice? Here’s a simple walkthrough to get you up and running with automated feature workspace creation in Microsoft Fabric.</p>
<p><strong>1. Fork the Repository</strong></p>
<p>Head over to:<br />👉 <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">https://github.com/gronnerup/FabricAutomation</a><br />Fork it to your own GitHub account.</p>
<p><strong>2. Set Up Your Secrets and Service Principal</strong></p>
<p>Make sure you’ve followed the prerequisites:</p>
<ul>
<li><p>Create a <strong>service principal</strong> and assign necessary API permissions</p>
</li>
<li><p>Configure your repository secrets:</p>
<ul>
<li><p><code>SPN_TENANT_ID</code></p>
</li>
<li><p><code>SPN_CLIENT_ID</code></p>
</li>
<li><p><code>SPN_CLIENT_SECRET</code></p>
</li>
</ul>
</li>
<li><p>Set up <strong>Git integration</strong> with a <strong>GitHub Personal Access Token (PAT)</strong> and create a new cloud connection in Fabric to generate the required connection ID. Choose <strong>GitHub - Source control</strong> as the connection type.</p>
</li>
</ul>
<p><strong>3. Customize the</strong> <code>feature.json</code> <strong>recipe file</strong></p>
<p>Edit the file <code>automation/resources/environments/feature.json</code><br />Define how your feature workspaces should be created:</p>
<ul>
<li><p>Workspace naming pattern</p>
</li>
<li><p>Fabric capacity</p>
</li>
<li><p>Git repo connection settings</p>
</li>
<li><p>Layers to include and optional Spark pool settings</p>
</li>
</ul>
<p><strong>4. Create a Feature Branch</strong></p>
<p>Create a new branch in your GitHub repository using the naming convention <strong><code>feature/***</code></strong>.</p>
<p>This will automatically trigger the <strong>GitHub Action</strong> responsible for creating your feature workspaces.</p>
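<p>Inside the workflow, the branch name itself carries the feature name. Here is a small sketch of that parsing step (my own illustration of what the maintenance script needs to do with the ref GitHub exposes, e.g. <code>GITHUB_REF</code>):</p>
<pre><code class="lang-python">def feature_name_from_ref(ref):
    """Extract the feature name from a Git ref such as
    'refs/heads/feature/customer-churn'. Returns None for refs that are
    not feature branches, so the workflow can skip workspace creation."""
    prefix = "refs/heads/feature/"
    if ref.startswith(prefix):
        return ref[len(prefix):]
    return None

# feature_name_from_ref("refs/heads/feature/customer-churn")  # "customer-churn"
# feature_name_from_ref("refs/heads/main")                    # None
</code></pre>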
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745048786500/9a081808-3d0a-438e-be82-bae016c118e3.png" alt class="image--center mx-auto" /></p>
<p><strong>5. Watch the Workspaces Come to Life</strong></p>
<p>Within seconds, your configured feature workspaces will appear in Microsoft Fabric - connected to Git and synchronized, with permissions and Spark settings applied.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745049043670/c96eb8b8-3f0f-43cd-b1a6-df16686525d4.png" alt class="image--center mx-auto" /></p>
<p><strong>6. Merge and Clean Up Automatically</strong></p>
<p>When the feature is complete and you merge your branch into <code>main</code>, a separate GitHub Action will trigger and <strong>clean up</strong> the feature workspaces - keeping your Fabric environment tidy and focused.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745049075337/310110ab-8b38-4116-889c-f7bab9d84f0b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-using-azure-devops-pipelines">Using Azure DevOps Pipelines</h2>
<p>If you haven’t already, make sure to read the section on automating feature workspace maintenance using GitHub Actions. It dives deeper into the overall approach, recommended repository structure, and reusable configuration files. This section focuses only on Azure DevOps-specific details - including authentication, limitations, and pipeline setup when using Azure DevOps as your Git provider.</p>
<h3 id="heading-prerequisites-and-requirements-1">Prerequisites and Requirements</h3>
<p>Similar to GitHub Actions, we need to make sure a few things are in place before we can automate the creation of isolated <strong>feature development workspaces</strong>. The prerequisites are:</p>
<h4 id="heading-1-setup-azure-devops-repo">1. Setup Azure DevOps Repo</h4>
<p>Following this guide, <a target="_blank" href="https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository?view=azure-devops">https://learn.microsoft.com/en-us/azure/devops/repos/git/import-git-repository</a>, import the GitHub repository <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">https://github.com/gronnerup/FabricAutomation</a> into your own Azure DevOps repo.</p>
<h4 id="heading-2-variable-group-for-holding-service-principal-credentials">2. Variable Group for holding Service Principal credentials</h4>
<p>Create a new Variable Group under Pipelines → Library named <strong>Fabric_Automation</strong> and add the following variables:</p>
<ul>
<li><p><code>SPN_TENANT_ID</code> – The <strong>Tenant ID</strong> of your Microsoft Fabric environment.</p>
</li>
<li><p><code>SPN_CLIENT_ID</code> – The <strong>Client ID</strong> (Application ID) of your app registration.</p>
</li>
<li><p><code>SPN_CLIENT_SECRET</code> – The <strong>Client Secret</strong> of the app registration.</p>
</li>
</ul>
<p>Make sure the service principal is <strong>enabled for the Fabric REST APIs</strong> by following the official guidance:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/identity-support#service-principal-tenant-setting">Enable service principal for Fabric REST APIs</a></p>
<h4 id="heading-3-create-azure-devops-pipelines-for-feature-workspace-creation-and-feature-teardown">3. Create Azure DevOps Pipelines for feature workspace creation and feature teardown</h4>
<p>Create two new Azure DevOps pipelines using the YAML definitions located in the <code>.azure-pipelines</code> folder:</p>
<ul>
<li><p><strong>Create Feature Workspaces</strong> pointing to <code>.azure-pipelines/feature_fabric_branch.yml</code></p>
</li>
<li><p><strong>Cleanup Feature workspaces</strong> pointing to <code>.azure-pipelines/feature_fabric_cleanup.yml</code></p>
</li>
</ul>
<h4 id="heading-4-create-azure-devops-azure-devops-source-control-connections">4. Create Azure DevOps Azure DevOps source control connections</h4>
<p>Create a new <strong>connection</strong> to Azure DevOps in Fabric. This connection can be established using either a <strong>user principal</strong> or a <strong>Service Principal</strong>. Whichever option you choose, ensure that the identity has the necessary <strong>access to the Azure DevOps repository</strong>.<br />And don’t forget to <strong>explicitly add the Service Principal as a user of the connection</strong> to authorize its use in Git operations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753553744712/92b4655d-b131-4aaf-8473-77642d46165b.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-5-customize-the-featurejson-recipe-file">5. Customize the <code>feature.json</code> recipe file</h4>
<p>Edit the file <code>automation/resources/environments/feature.json</code> as described in the section covering the GitHub setup.<br />Note that <code>gitProviderType</code> must be set to <strong>AzureDevOps</strong>.</p>
<h4 id="heading-6-create-a-new-feature-branch-and-watch-feature-workspaces-come-to-life">6. Create a new feature branch and watch feature workspaces come to life</h4>
<p>Creating a new branch named <code>feature/***</code> will trigger the <strong>Create Feature Workspaces</strong> pipeline, which automatically creates the required workspaces based on the recipe file, connects them to the Azure DevOps repository, and performs a synchronization.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753560516926/20cfde7c-5add-444a-ab39-a63848acff32.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-7-merge-and-clean-up-automatically">7. Merge and Clean Up Automatically</h4>
<p>When the feature is complete and you merge your branch into <code>main</code>, the <strong>Cleanup Feature Workspaces</strong> pipeline is triggered - keeping your Fabric environment tidy and focused.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753636847595/30d6c5f9-1785-4758-899a-40c8769a8b9f.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note that unlike GitHub Actions, Azure DevOps uses a different syntax for defining triggers in YAML. For example, Azure DevOps has no native support for <code>on: create</code> and <code>types: [closed]</code>. However, we check whether the corresponding feature workspaces already exist before creating them, and the cleanup pipeline uses a condition to ensure the logic only runs on individual commits, not on merge completions.</div>
</div>
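<p>The existence check can be done against the documented workspaces endpoint. A minimal sketch of the matching logic (my own illustration, not the repository's actual code):</p>
<pre><code class="lang-python">def find_workspace(workspaces, display_name):
    """Return the first workspace dict whose displayName matches, else None."""
    return next((ws for ws in workspaces if ws.get("displayName") == display_name), None)

# With a valid token, the list itself comes from the Fabric REST API:
# resp = requests.get("https://api.fabric.microsoft.com/v1/workspaces",
#                     headers={"Authorization": f"Bearer {token}"})
# workspaces = resp.json()["value"]
</code></pre>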

<h2 id="heading-tip-dynamically-defining-source-control-connections">Tip: Dynamically defining source control connections</h2>
<p>In the <code>feature.json</code> recipe file, you can now define the source control connection <strong>dynamically</strong> using the <code>connectionName</code> field with string interpolation. This provides a flexible alternative to using a fixed <code>connectionId</code> and allows you to tailor the connection to the identity of the user triggering the pipeline.</p>
<p>Instead of this:</p>
<pre><code class="lang-json">"myGitCredentials": {
  "source": "ConfiguredConnection",
  "connectionId": "12345678-abcd-efgh-ijkl-9876543210"
}
</code></pre>
<p>You can now do this:</p>
<pre><code class="lang-json">"myGitCredentials": {
  "source": "ConfiguredConnection",
  "connectionName": "PeerInsights_AzureDevOps_{identity_username}"
}
</code></pre>
<p>The placeholders <code>{identity_username}</code> and <code>{identity_id}</code> are automatically resolved at runtime:</p>
<ul>
<li><p><code>{identity_username}</code></p>
<ul>
<li><p>In <strong>Azure DevOps</strong>, this maps to the predefined variable <code>Build.RequestedForEmail</code> (converted to <strong>uppercase</strong>).</p>
</li>
<li><p>In <strong>GitHub</strong>, it uses the <code>GITHUB_ACTOR</code> environment variable (must match the casing exactly).</p>
</li>
</ul>
</li>
<li><p><code>{identity_id}</code></p>
<ul>
<li><p>In <strong>Azure DevOps</strong>, this is <code>Build.RequestedForId</code>.</p>
</li>
<li><p>In <strong>GitHub</strong>, it uses <code>GITHUB_ACTOR_ID</code>.</p>
</li>
</ul>
</li>
</ul>
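<p>Conceptually, the resolution can be sketched like this (my own illustration of the behavior described above; the function name is hypothetical):</p>
<pre><code class="lang-python">import os

def resolve_connection_name(template):
    """Resolve {identity_username} and {identity_id} from pipeline environment variables."""
    if os.environ.get("BUILD_REQUESTEDFOREMAIL"):   # Azure DevOps predefined variable
        username = os.environ["BUILD_REQUESTEDFOREMAIL"].upper()
        identity_id = os.environ.get("BUILD_REQUESTEDFORID", "")
    else:                                           # GitHub Actions environment
        username = os.environ.get("GITHUB_ACTOR", "")
        identity_id = os.environ.get("GITHUB_ACTOR_ID", "")
    return template.replace("{identity_username}", username).replace("{identity_id}", identity_id)
</code></pre>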
<p>This enables even more granular connection setups, for example:</p>
<ul>
<li><p><code>FabricSourceControl_GRONNERUP</code> (based on username)</p>
</li>
<li><p><code>FabricSourceControl_3e8609e9-9292-4e1e-9f2d-3f533ed6d7f8</code> (based on user ID)</p>
</li>
</ul>
<blockquote>
<p><strong>Note:</strong> In Azure DevOps, the username used in the connection must be in <strong>uppercase</strong>. In GitHub, the casing must exactly match how the username is stored in the platform. That’s just my design…</p>
</blockquote>
<p>Also, remember that the <strong>Service Principal</strong> used by the pipeline in Azure DevOps or GitHub must be added as a <strong>user of the connection</strong> to access it during automation.</p>
<p>This dynamic approach makes your automation workflows more flexible and scalable, especially in environments with multiple contributors.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Automating the creation of feature workspaces in Microsoft Fabric is a key step toward a scalable, repeatable, and developer-friendly data platform. By combining the power of the Fabric CLI, GitHub Actions, Azure DevOps Pipelines and a simple recipe-based configuration, we can streamline the entire development process - from branch creation to workspace provisioning and eventual cleanup.</p>
<p>This is just the beginning.</p>
<p>I’ll continue to enhance the <a target="_blank" href="https://github.com/gronnerup/FabricAutomation">FabricAutomation</a> repository to reflect my latest work, including:</p>
<ul>
<li><p><strong>Automated solution setup</strong> for new projects and environments</p>
</li>
<li><p><strong>Solution automation using a metadata-driven framework</strong></p>
</li>
<li><p><strong>CI/CD pipelines</strong> using <strong>Fabric CLI</strong> and the <strong>fabric-cicd</strong> Python library</p>
</li>
<li><p><strong>Branching and merging strategies</strong> for structured, enterprise-grade development</p>
</li>
<li><p><strong>Enhanced support</strong> for user specific recipe files and much more…</p>
</li>
</ul>
<p>Stay tuned - and feel free to star the repo or follow along if you're as excited about Fabric automation as I am. 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Automating Fabric:  Maintaining workspace icon images]]></title><description><![CDATA[When working with data platform solutions in Microsoft Fabric, a well-structured approach is crucial for maintaining scalability and organization. One best practice is to separate workspaces not only into different environments (such as development, ...]]></description><link>https://peerinsights.emono.dk/automating-fabric-maintaining-workspace-icon-images</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-fabric-maintaining-workspace-icon-images</guid><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 10 Feb 2025 18:57:55 GMT</pubDate><content:encoded><![CDATA[<p>When working with data platform solutions in Microsoft Fabric, a well-structured approach is crucial for maintaining scalability and organization. One best practice is to separate workspaces not only into different environments (such as development, test, and production) but also into distinct layers—data storage, data ingestion, transformation, semantic modeling, and reporting. This separation improves governance, security, and clarity in large-scale deployments.</p>
<p>However, managing multiple workspaces can quickly become overwhelming. Identifying and distinguishing them at a glance is not always easy. Fortunately, Microsoft Fabric allows us to assign <strong>Workspace Images</strong>, which provide a simple yet effective way to visually categorize different workspaces based on their purpose and environment.</p>
<p>Uploading these images manually is feasible, but when dealing with a large number of workspaces, automation becomes the obvious solution. In this blog post, I will walk you through how to automate the process of uploading workspace images using a <strong>Fabric Notebook</strong>, making it easy to manage and update workspace visuals at scale.</p>
<p><strong>Disclaimer:</strong> <em>This solution uses a non-documented and unofficial Microsoft endpoint for fetching and updating workspace metadata in Microsoft Fabric/Power BI. Since this is not an officially supported API, it may change without notice, which could impact the functionality of this approach. Use it with that in mind, and feel free to experiment!</em></p>
<h2 id="heading-fabric-notebook-to-automating-workspace-image-uploads">Fabric Notebook to automate Workspace Image uploads</h2>
<p>To demonstrate how to <strong>maintain workspace icon images programmatically</strong>, I’ve created a simple <strong>Fabric Notebook</strong>. This notebook provides methods for:</p>
<ul>
<li><p>Identifying workspaces based on a filter definition.</p>
</li>
<li><p>Fetching workspace metadata, including existing icons.</p>
</li>
<li><p>Setting new workspace icons in bulk.</p>
</li>
</ul>
<p>For this demonstration, the notebook utilizes <strong>icons from Marc Lelijveld’s blog post</strong> on <a target="_blank" href="https://data-marc.com/2023/07/10/designing-architectural-diagrams-with-the-latest-microsoft-fabric-icons/">Designing Architectural Diagrams with the Latest Microsoft Fabric Icons</a>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">ℹ</div>
<div data-node-type="callout-text">The notebook must be executed by a user with <strong>workspace admin</strong> permissions to update the icon for a given workspace.</div>
</div>

<h3 id="heading-requirements"><strong>Requirements</strong></h3>
<p>The notebook requires a few Python libraries:</p>
<ul>
<li><p><code>cairosvg</code> – Converts base64 SVGs to PNG images.</p>
</li>
<li><p><code>Pillow</code> – Supports adding an environment letter on top of the icons (not used in the example but available for experimentation).</p>
</li>
</ul>
<p>To fetch all accessible workspaces, the notebook uses <strong>SemanticLink</strong> and the <code>FabricRestClient</code> class:<br /><a target="_blank" href="https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.fabric.fabricrestclient?view=semantic-link-python&amp;viewFallbackFrom=semantic-link-python%3Fwt.mc_id%3Dmvp_335074">SemanticLink FabricRestClient Documentation</a>.</p>
<h3 id="heading-filtering-workspaces"><strong>Filtering Workspaces</strong></h3>
<p>The notebook filters workspaces using two parameters:</p>
<pre><code class="lang-python">must_contain = <span class="hljs-string">"PeerInsights"</span>
either_contain = [<span class="hljs-string">"dev"</span>, <span class="hljs-string">"tst"</span>, <span class="hljs-string">"prd"</span>]
</code></pre>
<p>A custom Python function <code>filter_items</code> then filters the list of workspaces:</p>
<pre><code class="lang-python">workspaces = filter_items(all_workspaces, must_contain, either_contain)
</code></pre>
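<p>The post does not show the body of <code>filter_items</code>, but based on the description it can be sketched as follows (my own interpretation; the notebook's actual implementation may differ):</p>
<pre><code class="lang-python">def filter_items(items, must_contain, either_contain):
    """Keep items whose name contains must_contain and at least one either_contain token."""
    def name_of(item):
        return item.get("displayName", "") if isinstance(item, dict) else str(item)
    return [
        item for item in items
        if must_contain.lower() in name_of(item).lower()
        and any(token.lower() in name_of(item).lower() for token in either_contain)
    ]

names = ["PeerInsights_Store_dev", "PeerInsights_Store_prd", "OtherSolution_dev"]
print(filter_items(names, "PeerInsights", ["dev", "tst", "prd"]))
# ['PeerInsights_Store_dev', 'PeerInsights_Store_prd']
</code></pre>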
<h3 id="heading-defining-workspace-icons"><strong>Defining Workspace Icons</strong></h3>
<p>A JSON structure is used to define workspace icons and color overlays:</p>
<pre><code class="lang-python">workspace_icon_def = {
    <span class="hljs-string">"icons"</span>: {
        <span class="hljs-string">"prepare"</span>: <span class="hljs-string">"Notebook"</span>,
        <span class="hljs-string">"ingest"</span>: <span class="hljs-string">"Pipelines"</span>,
        <span class="hljs-string">"store"</span>: <span class="hljs-string">"Lakehouse"</span>,
        <span class="hljs-string">"serve"</span>: <span class="hljs-string">"Dataset"</span>
    },
    <span class="hljs-string">"color_overlays"</span>: {
        <span class="hljs-string">"dev"</span>: <span class="hljs-string">"#1E90FF"</span>,   <span class="hljs-comment"># Blue</span>
        <span class="hljs-string">"tst"</span>: <span class="hljs-string">"#FFA500"</span>,   <span class="hljs-comment"># Orange</span>
        <span class="hljs-string">"prd"</span>: <span class="hljs-string">"#008000"</span>    <span class="hljs-comment"># Green    </span>
    }
}
</code></pre>
<p><em>Note: To remove an existing icon, set the icon title to</em> <code>None</code>.</p>
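<p>Given this definition, the icon and color overlay for each workspace can be derived from substrings of its name. A sketch of that lookup (my own illustration, not the notebook's actual code):</p>
<pre><code class="lang-python">def resolve_icon(workspace_name, icon_def):
    """Pick icon title and overlay color based on substrings of the workspace name."""
    name = workspace_name.lower()
    icon = next((title for key, title in icon_def["icons"].items() if key in name), None)
    color = next((hex_code for key, hex_code in icon_def["color_overlays"].items() if key in name), None)
    return icon, color

# Same structure as the workspace_icon_def shown above:
workspace_icon_def = {
    "icons": {"prepare": "Notebook", "ingest": "Pipelines", "store": "Lakehouse", "serve": "Dataset"},
    "color_overlays": {"dev": "#1E90FF", "tst": "#FFA500", "prd": "#008000"},
}
print(resolve_icon("PeerInsights_Ingest_dev", workspace_icon_def))  # ('Pipelines', '#1E90FF')
</code></pre>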
<h3 id="heading-updating-workspace-icons"><strong>Updating Workspace Icons</strong></h3>
<ol>
<li><p>In <strong>Cell 7</strong> of the notebook, a new property <code>icon_base64img</code> is added to each workspace, storing the base64-encoded PNG string of the new icon.</p>
</li>
<li><p>The function <code>display_workspace_icons</code> generates an <strong>HTML table</strong> showing the old and new workspace icons for verification.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739212726045/bbf6ce30-c203-4cb5-9b47-4ec269fcc100.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Finally, we iterate through the filtered workspaces and update their icons using the <code>set_workspace_icon</code> function.</p>
</li>
</ol>
<p><strong>The result….</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739213366304/f0a59ef7-3263-4dde-9c6a-cacb157bb3a6.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-using-the-non-documented-metadata-endpoint"><strong>Using the non-documented metadata endpoint</strong></h2>
<p>Workspace icons are updated by calling the <strong>non-documented Microsoft endpoint</strong>:</p>
<pre><code class="lang-plaintext">{cluster_base_url}metadata/folders/{workspace_id}
</code></pre>
<p>The <code>cluster_base_url</code> can be retrieved using the <strong>Power BI REST API</strong>:</p>
<pre><code class="lang-plaintext">https://api.powerbi.com/v1.0/myorg/capacities
</code></pre>
<p>Example base URL:</p>
<pre><code class="lang-plaintext">https://wabi-north-europe-j-primary-redirect.analysis.windows.net/v1.0/...
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text">The undocumented endpoints can only be accessed using a <strong>user identity</strong>. Setting a workspace icon is not supported when using a <strong>Service Principal</strong>.</div>
</div>

<h3 id="heading-making-api-calls"><strong>Making API Calls</strong></h3>
<p><strong>Fetching workspace metadata:</strong></p>
<pre><code class="lang-python">GET {cluster_base_url}metadata/folders/{workspace_id}
</code></pre>
<p><strong>Updating the workspace icon:</strong></p>
<pre><code class="lang-python">PUT {cluster_base_url}metadata/folders/{workspace_id}
</code></pre>
<p>With the following payload:</p>
<pre><code class="lang-json">{ <span class="hljs-attr">"icon"</span>: <span class="hljs-string">"data:image/png;base64,{base64_png}"</span> }
</code></pre>
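<p>Put together, the update boils down to a single PUT request. A sketch of how the url and payload are assembled (my own illustration; remember this endpoint is undocumented and may change):</p>
<pre><code class="lang-python">def build_icon_update(cluster_base_url, workspace_id, base64_png):
    """Build the PUT url and payload for the undocumented folders metadata endpoint."""
    url = f"{cluster_base_url}metadata/folders/{workspace_id}"
    payload = {"icon": f"data:image/png;base64,{base64_png}"}
    return url, payload

# Executed with a user token (service principals are not supported here):
# url, payload = build_icon_update(cluster_base_url, workspace_id, png_as_base64)
# requests.put(url, headers={"Authorization": f"Bearer {user_token}"}, json=payload)
</code></pre>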
<div data-node-type="callout">
<div data-node-type="callout-emoji">ℹ</div>
<div data-node-type="callout-text">According to Microsoft's documentation, a workspace image must be in .png or .jpg format and the file size must be less than 45 KB.</div>
</div>

<h2 id="heading-conclusion">Conclusion</h2>
<p>By automating the upload of workspace images in Microsoft Fabric, we can enhance the visual organization of workspaces, making it easier to distinguish between different layers and environments. Instead of manually updating images across multiple workspaces, a Fabric Notebook provides an efficient and scalable solution.</p>
<p>You're free to use my example as-is or as inspiration for your own automated setup. I’d love to hear how others are tackling this challenge - what solutions have you come up with, and what are your use cases? Feel free to comment, ask questions, or share suggestions!</p>
<p>If you're interested in trying this out, you can download the Fabric Notebook <a target="_blank" href="https://github.com/gronnerup/Fabric/blob/f1b54a4588fd52a8cf278c6394c3d8423352b3ff/AutomatingFabric/Notebooks/AutomatingFabric-WorkspaceIcons.ipynb">here</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Automating Fabric: Dynamically Configuring Microsoft Fabric Data Pipelines]]></title><description><![CDATA[In a typical end-to-end Microsoft Fabric data platform, we use workspace structures and stages - like store, ingest, prepare, serve and orchestrate - to organize the data lifecycle. If you're unfamiliar with these stages, I’ve detailed them in my pre...]]></description><link>https://peerinsights.emono.dk/automating-fabric-dynamically-configuring-microsoft-fabric-data-pipelines</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-fabric-dynamically-configuring-microsoft-fabric-data-pipelines</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[microsoft fabric]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Wed, 29 Jan 2025 18:42:00 GMT</pubDate><content:encoded><![CDATA[<p>In a typical end-to-end Microsoft Fabric data platform, we use workspace structures and stages - like <strong>store</strong>, <strong>ingest</strong>, <strong>prepare, serve</strong> and <strong>orchestrate</strong> - to organize the data lifecycle. If you're unfamiliar with these stages, I’ve detailed them in my previous post: <a target="_blank" href="https://peerinsights.hashnode.dev/automating-fabric-kickstart-your-fabric-data-platform-setup">Automating Fabric: Kickstart Your Fabric Data Platform Setup</a>.</p>
<p>This post will focus on the ingest and orchestrate stage and how to ensure valid and robust references between Data Pipelines.</p>
<h3 id="heading-the-challenge-of-automating-data-pipelines-in-fabric">The challenge of automating Data Pipelines in Fabric</h3>
<p>Automation is key to enabling an efficient CI/CD flow, but Microsoft Fabric, as a relatively new platform, doesn’t always provide the ideal tools for seamless automation. A prime example is how Fabric Data Pipelines manage dependencies and references - whether invoking other pipelines, running notebooks, refreshing semantic models, or connecting to resources like Lakehouses or SQL databases.</p>
<h4 id="heading-a-common-scenario">A Common Scenario</h4>
<p>Consider this scenario:</p>
<ul>
<li><p>You create a <strong>controller pipeline</strong> that orchestrates data ingestion by invoking child pipelines.</p>
</li>
<li><p>The controller pipeline then triggers notebooks to transform data from <strong>bronze</strong> to <strong>gold</strong> in a medallion architecture.</p>
</li>
<li><p>Finally, it refreshes a semantic model to support business intelligence workloads.</p>
</li>
</ul>
<p>This solution evolves through feature branches in a <strong>development</strong> environment, moves to <strong>test</strong> for user acceptance testing, and is eventually deployed to <strong>production</strong>.</p>
<p>A key challenge here is ensuring that references to resources - like workspaces and pipelines - are dynamically updated as part of the deployment process, without adding complexity for data engineers or compromising CI/CD workflows.</p>
<p>Many of you who have already worked with Data Pipelines in Fabric in combination with Git and deployments may have found yourselves frustrated by the way pipelines reference other pipelines and how easily that can lead to errors.</p>
<p>In this post, I’ll show how to dynamically configure the <strong>Invoke Pipeline</strong> activity in Fabric Data Factory to support automated CI/CD deployments.</p>
<hr />
<h3 id="heading-two-approaches-to-invoking-data-pipelines">Two Approaches to Invoking Data Pipelines</h3>
<p>Fabric offers two main ways to invoke one data pipeline from another:</p>
<h4 id="heading-1-legacy-invoke-data-pipeline">1. <strong>Legacy Invoke Data Pipeline</strong></h4>
<p>This is the older, now deprecated, approach. It:</p>
<ul>
<li><p>Allows pipeline execution <strong>only within the same workspace</strong>.</p>
</li>
<li><p>Does <strong>not</strong> support dynamic expressions for workspace or pipeline references.</p>
</li>
</ul>
<h4 id="heading-2-invoke-pipeline-preview">2. <strong>Invoke Pipeline (Preview)</strong></h4>
<p>This newer, more versatile activity (currently in preview) allows:</p>
<ul>
<li><p>Executing pipelines across <strong>different workspaces</strong>.</p>
</li>
<li><p>Using <strong>dynamic expressions</strong> for workspace and pipeline references.</p>
</li>
<li><p>Invoking Azure Data Factory Pipelines and Synapse Pipelines.</p>
</li>
</ul>
<p>However, it depends on a new type of connection that leverages user principal identity for authentication.</p>
<hr />
<h3 id="heading-a-dynamic-solution-for-cicd">A Dynamic Solution for CI/CD</h3>
<p>To enable automated CI/CD deployment while maintaining dynamic references, we use the <strong>Invoke Pipeline (Preview)</strong> activity with dynamic settings for workspace and pipeline IDs. Here’s how:</p>
<h4 id="heading-step-1-extract-pipeline-metadata">Step 1: Extract Pipeline Metadata</h4>
<p>First, we need a <strong>Web activity</strong> to retrieve metadata about all pipelines in the current workspace.</p>
<ul>
<li><p>This activity calls the Fabric REST API using a <strong>service principal</strong>.</p>
</li>
<li><p>Configure a Web connection with a base URL: <code>https://api.fabric.microsoft.com/v1</code>.</p>
</li>
<li><p>Use the following <strong>dynamic expression</strong> for the relative URL:</p>
<pre><code class="lang-typescript">  <span class="hljs-meta">@concat</span>(<span class="hljs-string">'workspaces/'</span>, pipeline().DataFactory, <span class="hljs-string">'/items?type=DataPipeline'</span>)
</code></pre>
</li>
</ul>
<p>This fetches details about data pipelines, including their display names and IDs. We use the Core endpoint <a target="_blank" href="https://learn.microsoft.com/en-us/rest/api/fabric/core/items/list-items?tabs=HTTP">List Items</a> which returns a list of items from a specified workspace.</p>
<h4 id="heading-step-2-dynamically-set-workspace-and-pipeline-references">Step 2: Dynamically Set Workspace and Pipeline References</h4>
<p>Next, use the <strong>Invoke Pipeline (Preview)</strong> activity with dynamic content for the workspace and pipeline settings:</p>
<ol>
<li><p><strong>Workspace Reference</strong><br /> Set the workspace dynamically using:</p>
<pre><code class="lang-typescript"> <span class="hljs-meta">@pipeline</span>().DataFactory
</code></pre>
<p> This ensures the activity always points to the workspace of the executing pipeline.</p>
</li>
<li><p><strong>Pipeline Reference</strong><br /> Use the following expression to dynamically retrieve the pipeline ID based on its display name:</p>
<pre><code class="lang-typescript"> <span class="hljs-meta">@string</span>(
     xpath(
         xml(
             json(concat(<span class="hljs-string">'{"root":'</span>, activity(<span class="hljs-string">'GetPipelines'</span>).output, <span class="hljs-string">'}'</span>))
         ),
         <span class="hljs-string">'string(/root/value[normalize-space(displayName)="MyChildPipeline"]/id)'</span>
     )
 )
</code></pre>
<p> This searches for the ID of the pipeline <strong>MyChildPipeline</strong> within the current workspace.</p>
</li>
</ol>
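<p>For readers who want to verify the xpath expression, the same lookup expressed in Python against a List Items response looks roughly like this (illustrative only; <code>.strip()</code> approximates xpath's <code>normalize-space</code>):</p>
<pre><code class="lang-python">def pipeline_id_by_name(list_items_response, display_name):
    """Find the id of the item whose displayName matches, mirroring the xpath lookup."""
    for item in list_items_response.get("value", []):
        if item.get("displayName", "").strip() == display_name:
            return item["id"]
    return ""

response = {"value": [{"id": "abc-123", "displayName": "MyChildPipeline", "type": "DataPipeline"}]}
print(pipeline_id_by_name(response, "MyChildPipeline"))  # abc-123
</code></pre>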
<hr />
<h3 id="heading-why-this-works-for-cicd">Why This Works for CI/CD</h3>
<p>By configuring workspace and pipeline references dynamically:</p>
<ul>
<li><p>You eliminate hardcoding, ensuring pipelines adapt to the target environment (development, test, or production).</p>
</li>
<li><p>References are automatically updated during deployment, reducing manual effort and risk of errors.</p>
</li>
<li><p>The solution remains flexible and scalable for feature branches and multi-stage workflows.</p>
</li>
</ul>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>Dynamic configuration of Fabric Data Pipelines is essential for a robust and automated CI/CD process. By leveraging the <strong>Invoke Pipeline (Preview)</strong> activity and integrating with Fabric REST APIs, you can achieve seamless deployments across environments while maintaining clarity and simplicity in your pipeline design.</p>
<p>I hope this guide helps you on your journey to automate Microsoft Fabric solutions. Let me know your thoughts or questions in the comments below!</p>
]]></content:encoded></item><item><title><![CDATA[Automating Microsoft Fabric: 
Private Endpoint Setup in workspaces]]></title><description><![CDATA[In an exciting development, Microsoft Fabric just announced support for APIs dedicated to managing private endpoints, a crucial feature for organizations prioritizing secure and private data access. Building on my previous posts on automating Fabric ...]]></description><link>https://peerinsights.emono.dk/automating-microsoft-fabric-private-endpoint-setup-in-workspaces</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-microsoft-fabric-private-endpoint-setup-in-workspaces</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[APIs]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Wed, 30 Oct 2024 16:23:14 GMT</pubDate><content:encoded><![CDATA[<p>In an exciting development, Microsoft Fabric just announced support for APIs dedicated to managing private endpoints, a crucial feature for organizations prioritizing secure and private data access. Building on my previous posts on automating Fabric workspaces and lakehouses and leveraging Fabric REST APIs, I’ll guide you through automating the creation of managed private endpoints within your Fabric workspaces. In this post, I’ll cover not only how to set up these private connections but also how to streamline approvals via Azure management APIs, if permitted in your environment.</p>
<p>Find the official blog post from Microsoft on APIs for Managed Private Endpoints here: <a target="_blank" href="https://blog.fabric.microsoft.com/en-US/blog/apis-for-managed-private-endpoint-are-now-available/">https://blog.fabric.microsoft.com/en-US/blog/apis-for-managed-private-endpoint-are-now-available/</a></p>
<h3 id="heading-previous-approach-to-automating-managed-private-endpoint-creation">Previous Approach to Automating Managed Private Endpoint Creation</h3>
<p>Before official API support for managed private endpoints was available in Microsoft Fabric, our approach relied on using Fabric's internal, undocumented APIs. To automate endpoint creation within a workspace, I would send a POST request to:</p>
<pre><code class="lang-plaintext">https://wabi-north-europe-j-primary-redirect.analysis.windows.net/metadata/workspaces/00000000-0000-0000-0000-000000000000/privateEndpoints
</code></pre>
<p>And with the following JSON payload:</p>
<pre><code class="lang-json">{
   <span class="hljs-attr">"name"</span>:<span class="hljs-string">"my-private-endpoint"</span>,
   <span class="hljs-attr">"requestMessage"</span>:<span class="hljs-string">"Auto-generated managed private endpoint"</span>,
   <span class="hljs-attr">"privateLinkResourceId"</span>:<span class="hljs-string">"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-peerinsights-dev/providers/Microsoft.KeyVault/vaults/kv-peerinsights-dev"</span>,
   <span class="hljs-attr">"groupId"</span>:<span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>
}
</code></pre>
<p>While effective, this approach was less than ideal - it depended on an unsupported API and allowed only user identity for authentication, not service principals or managed identities.</p>
<p>With the recent additions to the Fabric APIs, creating managed private endpoints can now be achieved through officially supported, documented endpoints. Even better, service principal authentication is now supported, offering a more secure and scalable way to automate private endpoint management.</p>
<h3 id="heading-adding-managed-private-endpoints-with-fabric-apis">Adding Managed Private Endpoints with Fabric APIs</h3>
<p>Building upon my previous blog post on automating your Fabric environment setup, I’ve enhanced the helper functions notebook to support the creation and management of managed private endpoints, including handling the long-running nature of the setup process.</p>
<p>In the <code>fabric_functions.py</code> script, I added a few key functions to streamline this process. Two of the most critical functions are:</p>
<ul>
<li><p><code>create_workspace_managed_private_endpoint</code>: This function automates the creation of a managed private endpoint within a Microsoft Fabric workspace, monitoring its provisioning status until fully completed.</p>
</li>
<li><p><code>approve_private_endpoint</code>: This function automates the approval of a private endpoint connection within Azure, updating its status to "Approved" through an API request.</p>
</li>
</ul>
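<p>At its core, <code>create_workspace_managed_private_endpoint</code> issues a POST against the newly documented endpoint. A sketch of the request it builds (my reading of the API; verify the exact field names against the official reference):</p>
<pre><code class="lang-python">def build_mpe_request(workspace_id, name, resource_id,
                      subresource="vault", message="Auto-generated managed private endpoint"):
    """Build the POST url and body for creating a managed private endpoint."""
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/managedPrivateEndpoints"
    body = {
        "name": name,
        "targetPrivateLinkResourceId": resource_id,
        "targetSubresourceType": subresource,   # e.g. "vault" for a Key Vault target
        "requestMessage": message,
    }
    return url, body
</code></pre>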
<p>To integrate this functionality, I extended the staging recipe used in the workspace setup to include private endpoints that should be created and, if desired, automatically approved. Here’s an example of the updated <code>fabric_stages</code> configuration:</p>
<pre><code class="lang-python">fabric_stages = {
    <span class="hljs-string">"Prepare"</span>: {
        <span class="hljs-string">"private_endpoints"</span>: [
            {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"mpe-kv-peerinsights-dev"</span>,
                <span class="hljs-string">"auto_approve"</span>: <span class="hljs-literal">True</span>,
                <span class="hljs-string">"id"</span>: <span class="hljs-string">"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-peerinsights-dev/providers/Microsoft.KeyVault/vaults/kv-peerinsights-dev"</span>
            }
        ]
    }
}
</code></pre>
<p>With this new functionality, private endpoints can be easily integrated into the Fabric setup process. And by using the <code>auto_approve</code> property in the private endpoint definition, we can direct our setup to automatically approve the newly created endpoint. Here’s how it works:</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> stage_props.get(<span class="hljs-string">"private_endpoints"</span>) <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
    <span class="hljs-keyword">for</span> private_endpoint <span class="hljs-keyword">in</span> stage_props.get(<span class="hljs-string">"private_endpoints"</span>):
        fabfunc.create_workspace_managed_private_endpoint(
            fabric_access_token, workspace_id, private_endpoint.get(<span class="hljs-string">"name"</span>), private_endpoint.get(<span class="hljs-string">"id"</span>)
        )
        <span class="hljs-keyword">if</span> private_endpoint.get(<span class="hljs-string">"auto_approve"</span>):
            connection_name = <span class="hljs-string">f"<span class="hljs-subst">{workspace_id}</span>.<span class="hljs-subst">{private_endpoint.get(<span class="hljs-string">'name'</span>)}</span>-conn"</span>
            management_access_token = fabfunc.get_access_token(tenant_id, app_id, app_secret, <span class="hljs-string">'https://management.core.windows.net'</span>)
            fabfunc.approve_private_endpoint(
                management_access_token, private_endpoint.get(<span class="hljs-string">"id"</span>), connection_name
            )
</code></pre>
<p>And the result…</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730305751863/3c544063-71a2-4c52-b8cc-c7c7769b50a9.png" alt class="image--center mx-auto" /></p>
<p>With this approach, managed private endpoints can now be included as an integrated part of the Fabric setup, ensuring a smooth and automated deployment from start to finish.</p>
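<p>For completeness, the approval in <code>approve_private_endpoint</code> goes against the Azure management plane rather than Fabric. A sketch of the request it might build (my own illustration; the <code>api-version</code> depends on the target resource provider):</p>
<pre><code class="lang-python">def build_approval_request(resource_id, connection_name, api_version="2022-07-01"):
    """Build the PUT url and body for approving a private endpoint connection via ARM."""
    url = (f"https://management.azure.com{resource_id}"
           f"/privateEndpointConnections/{connection_name}?api-version={api_version}")
    body = {
        "properties": {
            "privateLinkServiceConnectionState": {
                "status": "Approved",
                "description": "Approved via automation",
            }
        }
    }
    return url, body
</code></pre>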
<h2 id="heading-conclusion">Conclusion</h2>
<p>Every Fabric API update brings us closer to fully automating and streamlining data platform workflows, steadily checking off my 'must-have' features list—big kudos to the Fabric team!</p>
<p>I’ll keep sharing insights on automating Microsoft Fabric, so stay tuned for more from Peer Insights! As a sneak peek, I’ll be exploring ways of working within Fabric to simplify the setup of feature development workspaces and more.</p>
<p>You can download the enhanced notebooks, now supporting managed private endpoint setup, here: <a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricSolutionInit">GitHub - FabricSolutionInit</a>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I initially forgot to include the <code>azure_functions.py</code> file in the repository, but it has now been added. You can find it alongside the other resources to support your setup.</div>
</div>]]></content:encoded></item><item><title><![CDATA[Automating Fabric: Kickstart your Fabric Data Platform setup]]></title><description><![CDATA[Setting up and managing workspaces in Microsoft Fabric can be a time-consuming task, especially when you need multiple workspaces for various stages of the data lifecycle across different environments. This blog post demonstrates how to streamline yo...]]></description><link>https://peerinsights.emono.dk/automating-fabric-kickstart-your-fabric-data-platform-setup</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-fabric-kickstart-your-fabric-data-platform-setup</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[automation]]></category><category><![CDATA[Python]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[lakehouse]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 28 Oct 2024 21:53:40 GMT</pubDate><content:encoded><![CDATA[<p>Setting up and managing workspaces in Microsoft Fabric can be a time-consuming task, especially when you need multiple workspaces for various stages of the data lifecycle across different environments. This blog post demonstrates how to streamline your Fabric setup using Python and the Fabric REST APIs, automating the creation, configuration, and, if required, the cleanup of Fabric workspaces.</p>
<h3 id="heading-my-approach-to-workspace-setup-and-configuration">My approach to workspace setup and configuration</h3>
<p>I will introduce a recipe-based setup approach, where I define essential parameters like workspace naming pattern, environment-specific settings, stages, Git configurations, and more.</p>
<p>Using Python scripts, I will demonstrate how quickly and efficiently you can perform the following tasks:</p>
<ul>
<li><p><strong>Configure environments</strong> (Development, Test, Production) using environment-specific parameters.</p>
</li>
<li><p><strong>Set up workspaces for different data lifecycle stages</strong> (Ingest, Prepare, Serve, and Consume) and for each of the configured environments.</p>
</li>
<li><p><strong>Automate workspace assignments to Fabric capacities.</strong></p>
</li>
<li><p><strong>Manage access and permissions</strong> for secure, compliant collaboration.</p>
</li>
<li><p><strong>Integrate workspaces with Git</strong> for seamless CI/CD workflows.</p>
</li>
</ul>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>There are a few prerequisites to this approach. These include:</p>
<ul>
<li><p><strong>Python Environment</strong>: Ensure you have Python 3.x installed with the essential libraries (<code>requests</code> and <code>azure-identity</code>), which are needed for interacting with the REST APIs and authenticating to Azure.<br />  You can install the required Python libraries with:<br />  <code>pip install requests azure-identity --user</code></p>
</li>
<li><p><strong>Fabric API Access</strong>: Access to Fabric REST APIs via service principal, configured with necessary permissions.</p>
</li>
<li><p><strong>Git Access</strong>: Access to integrate Fabric workspaces with a Git repository.</p>
</li>
<li><p><strong>Python functions file and setup sample scripts</strong>: Clone or download the Python scripts from my GitHub repo to get started. You can find a link to the repository at the bottom of this blog post.</p>
</li>
</ul>
<p><em>Tip:</em><br /><em>A Fabric administrator will need to enable API permissions and workspace creation rights for your service principal.</em></p>
<h3 id="heading-workspace-structure">Workspace structure</h3>
<p>Before jumping into action, let’s discuss how to structure your Fabric workspaces effectively. My recommendation is to separate workspaces by stages and environments, as shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730146903577/12b153d8-f421-40ba-818f-1e45f77619b9.png" alt class="image--center mx-auto" /></p>
<p>In a typical end-to-end data platform setup, we have distinct components for each stage of the data lifecycle: pipelines and notebooks for data ingestion, notebooks for data preparation, lakehouses for storage, semantic models for serving data and reports for data consumption. Separating these stages into individual workspaces, then multiplying them by environments (such as dev, tst, prd), allows you to assign security at the stage level and provides flexibility in allocating different Fabric capacities for each stage and environment.</p>
<p>For enhanced governance and security, consider further dividing the storage workspace into separate workspaces for each layer of the medallion architecture. This approach simplifies permission management and supports a more scalable, secure setup across the data platform. On the other hand, it also increases complexity and adds management overhead.</p>
<h3 id="heading-recipe-based-setup">Recipe-Based Setup</h3>
<p>My automation approach is built around variables and recipes, defining details for each environment and stage, including:</p>
<ul>
<li><p><strong>Naming</strong>: A generic pattern for defining how workspaces are named.</p>
</li>
<li><p><strong>Environments</strong>: Details for Dev, Test, and Production environments. This also includes Fabric capacity details and permissions.</p>
</li>
<li><p><strong>Stages</strong>: The purpose (Ingest, Prepare, Serve, Consume) and definition of Fabric items such as lakehouses.</p>
</li>
<li><p><strong>Git Setup Information</strong>: Definition of Git repository information and branch details for each workspace.</p>
</li>
</ul>
<h2 id="heading-script-and-setup-configuration">Script and setup configuration</h2>
<p>To streamline the setup of a Fabric data platform solution, I’ve created two Python scripts: <code>init_fabric_solution.py</code> and <code>fabric_functions.py</code>. These scripts automate the creation and configuration of workspaces, capacities, and permissions across various stages and environments using Fabric and Power BI REST APIs.</p>
<p>The <code>init_fabric_solution.py</code> script manages the main setup process, leveraging helper functions in <code>fabric_functions.py</code>. These helper functions encapsulate the necessary Fabric and Power BI REST API calls, keeping the code clean, reusable, and easy to maintain. This approach makes it simple to add or adjust functions as setup needs evolve.</p>
<p>Together, these scripts provide a fully automated, scalable method for configuring your Fabric solution with minimal manual effort.</p>
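<p>To make the division of labor concrete, a helper in <code>fabric_functions.py</code> could look roughly like the sketch below. The function and payload shown here are my own illustration of the pattern (a thin wrapper around the Create Workspace REST endpoint), not the actual repository code:</p>
<pre><code class="lang-python">import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_workspace_payload(name, capacity_id=None):
    # Pure helper: compose the request body for the Create Workspace call.
    payload = {"displayName": name}
    if capacity_id:
        payload["capacityId"] = capacity_id
    return payload

def create_workspace(access_token, name, capacity_id=None):
    # Illustrative wrapper around POST /v1/workspaces.
    response = requests.post(
        f"{FABRIC_API}/workspaces",
        headers={"Authorization": f"Bearer {access_token}"},
        json=build_workspace_payload(name, capacity_id),
    )
    response.raise_for_status()
    return response.json()
</code></pre>
<p>Keeping the payload construction separate from the HTTP call makes each wrapper easy to test and adjust as the APIs evolve.</p>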
<p>Let me walk you through the key steps in the setup process, covering the creation of Fabric workspaces, items, and the initialization of Git integration.</p>
<h3 id="heading-step-1-authenticating-with-fabric-rest-apis"><strong>Step 1: Authenticating with Fabric REST APIs</strong></h3>
<p>Workspaces and lakehouses are created using a service principal, following best practices to ensure that ownership is assigned to the service principal rather than an individual user account.</p>
<p>The Tenant ID, App ID, and App Secret for the service principal can be stored in a <code>credentials.json</code> file or directly in the <code>init_fabric_solution.py</code> script, depending on your preference.</p>
<p><em>Example of credentials.json file</em></p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"tenant_id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>,
    <span class="hljs-attr">"app_id"</span>: <span class="hljs-string">"00000000-0000-0000-0000-000000000000"</span>,
    <span class="hljs-attr">"app_secret"</span>: <span class="hljs-string">"YourAppSecret"</span>
}
</code></pre>
<p>The wrapper function <code>get_access_token</code> is then called, passing in the service principal credentials and scope.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Load the credentials from the credentials.json file. Remove this and use hardcoded values if credentials file is not used. </span>
credentials = fabfunc.get_credentials_from_file(<span class="hljs-string">"credentials.json"</span>)

tenant_id = credentials[<span class="hljs-string">"tenant_id"</span>]
app_id = credentials[<span class="hljs-string">"app_id"</span>]
app_secret = credentials[<span class="hljs-string">"app_secret"</span>]

fabric_access_token = fabfunc.get_access_token(tenant_id, app_id, app_secret, <span class="hljs-string">'https://api.fabric.microsoft.com'</span>)
<span class="hljs-comment">#endregion</span>
</code></pre>
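<p>For reference, a wrapper like <code>get_access_token</code> can be built on the <code>azure-identity</code> client-credentials flow. The sketch below is an assumption about how such a helper might look (note the convention of turning a resource URL into a scope by appending <code>/.default</code>); the actual implementation in <code>fabric_functions.py</code> may differ:</p>
<pre><code class="lang-python">def resource_to_scope(resource):
    # Entra ID token requests expect a scope rather than a bare
    # resource URL; the usual convention is resource + "/.default".
    return resource.rstrip("/") + "/.default"

def get_access_token(tenant_id, app_id, app_secret, resource):
    # Deferred import so the pure helper above also works
    # in environments without azure-identity installed.
    from azure.identity import ClientSecretCredential
    credential = ClientSecretCredential(tenant_id, app_id, app_secret)
    return credential.get_token(resource_to_scope(resource)).token
</code></pre>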
<h3 id="heading-step-2-naming-pattern-environments-and-stages"><strong>Step 2: Naming pattern, environments and stages</strong></h3>
<p>After authentication, the setup process follows a structured naming convention defined by the <code>fabric_solution_name</code> variable. This variable uses string interpolation to generate the names of each workspace, incorporating the specified stage and environment names.</p>
<p><strong>Key Configuration Variables</strong></p>
<ol>
<li><p><code>fabric_solution_name</code>: Sets the base naming pattern for the workspaces. For example:</p>
<pre><code class="lang-python"> fabric_solution_name = <span class="hljs-string">'MyDataPlatform - {stage} [{environment}]'</span>
</code></pre>
<p> This pattern ensures consistency in naming by automatically incorporating each workspace's stage and environment into its name.</p>
</li>
<li><p><code>fabric_environments</code>: This JSON-like variable defines each environment to be created, including:</p>
<ul>
<li><p><code>capacity_id</code>: Specifies the Fabric capacity to which each workspace in the environment will be assigned.</p>
</li>
<li><p><code>permissions</code>: Lists the user or group permissions for each environment. It supports the Admin, Contributor, Member, and Viewer roles, and the Group, User, and App identity types. For example:</p>
<pre><code class="lang-json">  fabric_environments = {
      <span class="hljs-attr">"dev"</span>: {
          <span class="hljs-attr">"capacity_id"</span>: <span class="hljs-string">"79CF9D57-8F75-4879-B906-691A0D85A36B"</span>,
          <span class="hljs-attr">"permissions"</span>: {
              <span class="hljs-attr">"Admin"</span>: [
                  {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"Group"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"a9327fc3-a6a0-4b82-8087-6b0d698323d7"</span>},
                  {<span class="hljs-attr">"type"</span>: <span class="hljs-string">"User"</span>, <span class="hljs-attr">"id"</span>: <span class="hljs-string">"pg@kapacity.dk"</span>}
              ]
          }
      },
      <span class="hljs-attr">"tst"</span>: {
          # Additional environment configurations
      },
  }
</code></pre>
</li>
</ul>
</li>
<li><p><code>fabric_stages</code>: This variable defines the stages and resources to be created within each environment, specifying different stages of data processing. For instance:</p>
<pre><code class="lang-json"> fabric_stages = {
     <span class="hljs-attr">"Store"</span>: { <span class="hljs-attr">"lakehouses"</span>: [<span class="hljs-string">"Bronze"</span>, <span class="hljs-string">"Silver"</span>, <span class="hljs-string">"Gold"</span>] },
     <span class="hljs-attr">"Ingest"</span>: {},
     <span class="hljs-attr">"Prepare"</span>: {},
     <span class="hljs-attr">"Serve"</span>: {}
 }
</code></pre>
<p> In this configuration, lakehouses are created within the “Store” area, segmented into <strong>Bronze</strong>, <strong>Silver</strong>, and <strong>Gold</strong> layers to align with data lifecycle management.</p>
</li>
</ol>
<p>Together, these variables enable a scalable and automated setup that generates workspaces, assigns capacities, and configures permissions across environments with minimal manual intervention.</p>
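<p>To make the interplay concrete, here is a minimal, self-contained sketch of how the naming pattern expands against the environments and stages. The generation logic is my own illustration and may differ in detail from the actual script:</p>
<pre><code class="lang-python">fabric_solution_name = 'MyDataPlatform - {stage} [{environment}]'

def generate_workspace_names(pattern, environments, stages):
    # One workspace name per (stage, environment) combination.
    return [
        pattern.format(stage=stage, environment=env)
        for env in environments
        for stage in stages
    ]

names = generate_workspace_names(
    fabric_solution_name,
    environments=['dev', 'tst', 'prd'],
    stages=['Store', 'Ingest', 'Prepare', 'Serve'],
)
# Yields twelve names, e.g. 'MyDataPlatform - Store [dev]'
</code></pre>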
<h2 id="heading-automating-workspace-setup">Automating Workspace Setup</h2>
<p>With the naming pattern, environments, stages, and Git integration configured, you’re ready to execute the script to set up your Fabric workspaces and lakehouses.</p>
<p>The script will automatically output the results of the setup process, as shown below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730150932310/c81b4614-c0d8-48ff-9751-17cf0638e7a8.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-cleaning-up-automating-workspace-deletion">Cleaning Up: Automating workspace deletion</h2>
<p>In scenarios where workspaces need to be decommissioned, the Python script <code>cleanup_fabric_solution.py</code> can be used to batch-delete Fabric workspaces based on the naming pattern, environments, and stages.</p>
<p>Simply specify the naming pattern of your Fabric solution, along with its environments and stages.</p>
<pre><code class="lang-python"><span class="hljs-comment">#region Fabric solution setup</span>
fabric_solution_name = <span class="hljs-string">'MyDataPlatform - {stage} [{environment}]'</span>
fabric_environments = [<span class="hljs-string">'dev'</span>, <span class="hljs-string">'tst'</span>, <span class="hljs-string">'prd'</span>]
fabric_stages = [<span class="hljs-string">'Data'</span>, <span class="hljs-string">'Ingest'</span>, <span class="hljs-string">'Prepare'</span>, <span class="hljs-string">'Serve'</span>]
<span class="hljs-comment">#endregion</span>
</code></pre>
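<p>Conceptually, the cleanup only needs to expand the same pattern, match the resulting names against the workspaces returned by the List Workspaces API, and delete the matches. Below is a sketch of the matching step; the helper names are mine, and the workspace dictionaries mirror the <code>value</code> array of the List Workspaces response:</p>
<pre><code class="lang-python">def expand_names(pattern, environments, stages):
    # All workspace names the solution recipe would have produced.
    return {
        pattern.format(stage=stage, environment=env)
        for env in environments
        for stage in stages
    }

def workspaces_to_delete(workspaces, pattern, environments, stages):
    # 'workspaces' holds dicts with at least "id" and "displayName",
    # as returned by GET /v1/workspaces.
    targets = expand_names(pattern, environments, stages)
    return [ws for ws in workspaces if ws["displayName"] in targets]

# Each match would then be removed with a DELETE call against
# https://api.fabric.microsoft.com/v1/workspaces/{id}.
</code></pre>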
<h2 id="heading-conclusion">Conclusion</h2>
<p>This approach to automating workspace creation in Microsoft Fabric accelerates setup, ensures consistency, and simplifies the integration of workspaces with Git for CI/CD. By leveraging Fabric REST APIs and Python, you’ll be able to manage and maintain your Fabric data platform workspaces efficiently across all environments.</p>
<p>In the near future, I’ll also be looking into the Terraform Provider for Fabric, which is currently in preview, as well as many other topics related to automating Fabric. So stay tuned for more Peer Insights!</p>
<p>You can download the notebooks and credentials.json file used in this post here:<br /><a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricSolutionInit">https://github.com/gronnerup/Fabric/tree/main/FabricSolutionInit</a></p>
]]></content:encoded></item><item><title><![CDATA[Automating Microsoft Fabric: 
Extracting Identity Support data]]></title><description><![CDATA[🆕
The notebooks have been updated on the 28th of March 2025 to reflect changes in the documentation and to automate the creation of a Lakehouse and the import of the report definition file directly from the GitHub repo.


In Microsoft Fabric, REST APIs p...]]></description><link>https://peerinsights.emono.dk/automating-microsoft-fabric-extracting-identity-support-data</link><guid isPermaLink="true">https://peerinsights.emono.dk/automating-microsoft-fabric-extracting-identity-support-data</guid><category><![CDATA[microsoftfabric]]></category><category><![CDATA[semantic-link]]></category><dc:creator><![CDATA[Peer Grønnerup]]></dc:creator><pubDate>Mon, 21 Oct 2024 13:33:26 GMT</pubDate><content:encoded><![CDATA[<div data-node-type="callout">
<div data-node-type="callout-emoji">🆕</div>
<div data-node-type="callout-text">The notebooks have been updated on the 28th of March 2025 to reflect changes in the documentation and to automate the creation of a Lakehouse and the import of the report definition file directly from the GitHub repo.</div>
</div>

<p>In Microsoft Fabric, REST APIs play a crucial role in automating and optimizing various aspects of platform management, from CI/CD processes to maintaining a data lakehouse. They enable seamless interactions with Fabric items, making it easier to streamline data workflows and handle large-scale operations with minimal manual intervention. Understanding which identities - such as service principals or managed identities - are supported by different Fabric REST API endpoints is essential to ensure secure and efficient platform management.</p>
<p>And wouldn't it be great if we didn't have to visit each individual API documentation page to check which Microsoft Entra identities are supported? Constantly navigating through multiple pages to find this information can be time-consuming and inefficient. Fortunately, there's a way to automate this process, allowing us to extract and centralize the data with ease - saving both time and effort.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743148726090/33296948-2f80-4e73-af58-8df5e2603c29.png" alt class="image--center mx-auto" /></p>
<p>In this blog post, I'll walk through how to scrape Microsoft Fabric REST API documentation using a Fabric Notebook to extract information on supported identities for each endpoint. Once the data is extracted, we can leverage Semantic Link Labs to build a semantic model that exposes data from the Fabric Lakehouse.</p>
<p>And finally, we can create a report using Semantic Link, offering insights into how these identities are supported across various Fabric APIs.</p>
<p>The task of accomplishing the above splits into three steps:</p>
<ul>
<li><p>Extracting information from the Fabric REST API documentation</p>
</li>
<li><p>Creating a semantic model using Semantic Link Labs</p>
</li>
<li><p>Creating a Power BI report using Semantic Link Labs</p>
</li>
</ul>
<h3 id="heading-extracting-fabric-rest-api-identity-support-information">Extracting Fabric REST API identity support information</h3>
<p>To automate the extraction of identity support information from the Microsoft Fabric REST API documentation, I used BeautifulSoup (from the <code>bs4</code> library) to scrape the necessary data directly from the Microsoft Learn site. Here's a brief overview of how the process works:</p>
<ol>
<li><p><strong>Setup Fabric Items</strong>: Start by creating a new Workspace and assigning it to a Fabric capacity. Next, import the two sample notebooks. You can find a link to the notebooks in the Conclusion section of this blog post.</p>
</li>
<li><p><strong>Fetching the API Documentation</strong>: The code starts by making an HTTP request using <code>requests.get()</code> to fetch the table of contents (TOC) from Microsoft Learn, which is structured in JSON format. The TOC contains links to each API's documentation page.</p>
</li>
<li><p><strong>Parsing the HTML</strong>: For each API page, BeautifulSoup parses the HTML content, looking for a specific section that lists the supported Microsoft Entra identities (e.g., User, Service Principal, and Managed Identities).</p>
</li>
<li><p><strong>Extracting the Identity Data</strong>: Once the correct section is found, the code extracts the table containing identity types. The table rows are iterated over to capture the identity information for each API endpoint, storing the results in a structured format (<code>data_list</code>).</p>
</li>
<li><p><strong>Handling Nested Documentation</strong>: My function <code>extract_all_articles()</code> recursively navigates through nested API documentation sections, ensuring that all relevant pages are checked, even when organized in hierarchical structures.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests

<span class="hljs-keyword">from</span> pyspark.sql.types <span class="hljs-keyword">import</span> StructType, StructField, StringType
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup
<span class="hljs-keyword">from</span> pyspark.sql.functions <span class="hljs-keyword">import</span> *

baseurl = <span class="hljs-string">"https://learn.microsoft.com/en-us/rest/api/fabric/"</span>

<span class="hljs-comment">### Extract Fabric API documentation</span>
response = requests.get(baseurl+<span class="hljs-string">"toc.json"</span>)
data = response.json()

<span class="hljs-comment"># Call the extract_all_articles function and store the return value as data_list</span>
data_list = extract_all_articles(data)
</code></pre>
<p>This approach allows us to programmatically gather the identity support data, eliminating the need to manually check each API page.</p>
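<p>For orientation, the recursive part of <code>extract_all_articles()</code> can be sketched as below. I am assuming the Learn TOC nests entries under an <code>items</code> key with <code>toc_title</code> and <code>href</code> fields; the notebook's full version additionally fetches each page and scrapes its identity table with BeautifulSoup:</p>
<pre><code class="lang-python">def extract_all_articles(node):
    # Walk the TOC JSON recursively, collecting every article link.
    articles = []
    if isinstance(node, dict):
        if node.get("href"):
            articles.append({"title": node.get("toc_title"), "href": node["href"]})
        for child in node.get("items", []):
            articles.extend(extract_all_articles(child))
    elif isinstance(node, list):
        for child in node:
            articles.extend(extract_all_articles(child))
    return articles
</code></pre>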
<p>Once collected, the data can be processed further or integrated into a Fabric Lakehouse for analysis. In our case, we convert <code>data_list</code> to a Spark DataFrame and write it to a Delta table in our lakehouse. We also create a manual table holding each identity option; this table will be used for grouping and filtering APIs in the Power BI report we will create later.</p>
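<p>That persistence step can be sketched as follows. The normalization helper and table names are illustrative, not the notebook's actual code; the Spark calls in the comments follow the standard Fabric notebook pattern:</p>
<pre><code class="lang-python">IDENTITIES = ("User", "Service Principal", "Managed Identity")

def normalize_records(data_list, identities=IDENTITIES):
    # Pure step: turn the scraped records into uniform rows with one
    # boolean column per identity type.
    rows = []
    for record in data_list:
        row = {"api": record.get("api", ""), "url": record.get("url", "")}
        for identity in identities:
            row[identity] = bool(record.get(identity))
        rows.append(row)
    return rows

# In a Fabric notebook, the rows are then written to Delta tables:
#   df = spark.createDataFrame(normalize_records(data_list))
#   df.write.mode("overwrite").format("delta").saveAsTable("api_identity_support")
#   spark.createDataFrame([(i,) for i in IDENTITIES], ["Identity"]) \
#        .write.mode("overwrite").format("delta").saveAsTable("identities")
</code></pre>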
<h3 id="heading-build-semantic-model-using-semantic-link-labs">Build semantic model using Semantic Link Labs</h3>
<p>After extracting the necessary data from the Microsoft Fabric REST API documentation, the next step is to leverage <a target="_blank" href="https://github.com/microsoft/semantic-link-labs"><strong>Semantic Link Labs</strong></a> to create a semantic model. Semantic Link Labs is <strong>a Python library designed for use in Microsoft Fabric notebooks</strong>. It extends the capabilities of <a target="_blank" href="https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview">Semantic Link</a>, offering additional functionality that integrates seamlessly with it. Semantic Link Labs simplifies building semantic models, reports, and more directly from our Fabric notebooks.</p>
<p>To use Semantic Link Labs we first need to install the Semantic Link Labs package within our Fabric Notebook environment. This can be done by running:</p>
<pre><code class="lang-python">%pip install semantic-link-labs
</code></pre>
<p>Once Semantic Link Labs is installed, we can generate a blank semantic model as a foundation to which we will add our extracted data.</p>
<p>This blank model serves as a starting point, where we’ll later introduce the tables and data derived from your scraping process, along with defining specific measures and hierarchies needed for reporting.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> sempy_labs <span class="hljs-keyword">as</span> labs
<span class="hljs-keyword">from</span> sempy_labs.tom <span class="hljs-keyword">import</span> connect_semantic_model
<span class="hljs-keyword">from</span> sempy_labs <span class="hljs-keyword">import</span> report

lakehouse_name = <span class="hljs-string">"FabricDocs"</span>
lakehouse = mssparkutils.lakehouse.get(lakehouse_name)
workspace_name = notebookutils.runtime.context.get(<span class="hljs-string">"currentWorkspaceName"</span>)

<span class="hljs-comment"># Create a new blank semantic model</span>
semantic_model_name = <span class="hljs-string">f"<span class="hljs-subst">{lakehouse_name}</span>_Model"</span>
labs.create_blank_semantic_model(semantic_model_name)
</code></pre>
<p>After creating the blank model, we will connect to it (using <code>connect_semantic_model</code>) and add objects like tables, expressions, hierarchies, etc.</p>
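<p>As a rough illustration, adding objects through the TOM wrapper follows the shape below. This is a sketch built on Semantic Link Labs' <code>connect_semantic_model</code> context manager; the table and measure names are placeholders, and the exact method signatures may differ between library versions:</p>
<pre><code class="lang-python">def add_objects_to_model(semantic_model_name, workspace_name):
    # Deferred import: sempy-labs is only available inside Fabric notebooks.
    from sempy_labs.tom import connect_semantic_model

    # readonly=False opens the model for modification; pending changes
    # are saved when the context manager exits.
    with connect_semantic_model(
        dataset=semantic_model_name, workspace=workspace_name, readonly=False
    ) as tom:
        tom.add_table(name="Identities")
        tom.add_measure(
            table_name="Identities",
            measure_name="API Count",
            expression="COUNTROWS('APIs')",
        )
</code></pre>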
<h3 id="heading-create-a-new-report-using-semantic-link-labs">Create a new report using Semantic Link Labs</h3>
<p>Finally, after setting up the semantic model, we will create a report that exposes the extracted data from our Direct Lake semantic model. This is also achieved using Semantic Link Labs, which enables us to seamlessly generate reports based on the data stored in the model.</p>
<p>The following code is used to create the report:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Read the file as a DataFrame where each row represents a line in the file</span>
df = spark.read.text(<span class="hljs-string">"Files/report.json"</span>)

<span class="hljs-comment"># Convert the DataFrame rows (lines) into a single string</span>
json_raw = <span class="hljs-string">''</span>.join(df.rdd.map(<span class="hljs-keyword">lambda</span> row: row[<span class="hljs-number">0</span>]).collect())
jobject = json.loads(json_raw)

<span class="hljs-comment"># Create a new report based on the report.json file located in our Lakehouse</span>
labs.report.create_report_from_reportjson(
    report=<span class="hljs-string">"Fabric REST API Docs"</span>, 
    dataset=semantic_model_name, 
    report_json=jobject, 
    workspace=workspace_name
    )
</code></pre>
<p>This code reads a JSON file, which contains the report structure, and uses it to create a new report that is tied to the semantic model you previously built. This allows you to easily visualize and analyze the identity data extracted from the Microsoft Fabric REST API documentation, directly within your Fabric Lakehouse environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743162420346/e0aeb8d1-7d0a-46e7-a55d-a03e7767c2b5.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>The Microsoft Fabric APIs are essential for automating key components of your Fabric setup, providing a strong foundation for CI/CD, governance, and scaling your data platform. By extracting and centralizing identity support information from the API documentation, you can streamline processes and ensure that your platform is built with both efficiency and security in mind.</p>
<p>In the near future, I’ll be publishing more articles on how to leverage the Fabric REST APIs to jumpstart your Fabric Lakehouse Data Platform, manage CI/CD pipelines, and much more. So stay tuned for more insights!</p>
<p>You can download the notebook etc. used in this post here: <a target="_blank" href="https://github.com/gronnerup/Fabric/tree/main/FabricRestApiDocs">https://github.com/gronnerup/Fabric/tree/main/FabricRestApiDocs</a>.</p>
]]></content:encoded></item></channel></rss>