Azure SDK is over 500 MB and growing on each release

https://github.com/Azure/azure-sdk-for-python/issues/17801

@sodul

The Azure SDK is ridiculously large, for reasons that I have a hard time understanding. We pip install it for our CI pipelines, and the vast majority of our container's size comes from the Azure SDK. Within the SDK, the network directory takes up almost half of the size because it contains 39 versions of the SDK.

I have never seen anyone take such a strange approach to versioning their API clients. I fail to understand why anyone would even want to use the client from 2015 on a cloud product like Azure.

root@1bba10bd1500:~/.pyenv/versions/3.9.2/lib/python3.9/site-packages/azure/mgmt/network# du -shc * | grep M | sort -n 
1.2M	aio
2.4M	v2015_06_15
3.3M	v2016_09_01
3.5M	v2016_12_01
3.7M	v2017_03_01
4.4M	v2017_06_01
4.4M	v2017_08_01
4.9M	v2017_09_01
5.1M	v2017_10_01
5.1M	v2017_11_01
5.1M	v2018_01_01
5.7M	v2018_02_01
6.5M	v2018_04_01
6.6M	v2018_06_01
6.9M	v2018_07_01
8.3M	v2018_08_01
8.4M	v2018_10_01
8.6M	v2018_11_01
8.8M	v2018_12_01
9.0M	v2019_02_01
9.5M	v2019_04_01
10M	v2019_06_01
11M	v2019_07_01
11M	v2019_08_01
11M	v2019_09_01
11M	v2019_11_01
11M	v2019_12_01
12M	v2020_03_01
12M	v2020_04_01
13M	v2020_05_01
13M	v2020_06_01
13M	v2020_07_01
13M	v2020_08_01
259M	total

Can the default release provide only the latest version of the client libraries, or can you at least provide a 'lean' version of the SDK? This release model is certainly not sustainable and is causing needless grief to your users.
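For readers who want to reproduce a listing like the one above without a shell, here is a minimal Python sketch. It assumes versioned API folders are named like `v2020_08_01` (the optional `_preview` suffix is an assumption), and the function names are hypothetical. It sums file sizes, so the numbers will differ slightly from `du`, which reports disk blocks.

```python
import os
import re
from pathlib import Path

# Matches versioned API folders such as "v2020_08_01".
# The optional "_preview" suffix is an assumption, not taken from the listing.
VERSION_DIR = re.compile(r"v\d{4}_\d{2}_\d{2}(?:_preview)?")


def dir_size(path: Path) -> int:
    """Return the summed size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def version_sizes(package_dir: Path) -> dict[str, int]:
    """Map each versioned subdirectory name to its size in bytes."""
    return {
        child.name: dir_size(child)
        for child in sorted(package_dir.iterdir())
        if child.is_dir() and VERSION_DIR.fullmatch(child.name)
    }


def report(package_dir: Path) -> str:
    """Format the per-version sizes, smallest first, with a total line."""
    sizes = version_sizes(package_dir)
    rows = [
        f"{size / 1_000_000:8.1f} MB  {name}"
        for name, size in sorted(sizes.items(), key=lambda item: item[1])
    ]
    rows.append(f"{sum(sizes.values()) / 1_000_000:8.1f} MB  total ({len(sizes)} versions)")
    return "\n".join(rows)
```

Calling `print(report(Path(".../site-packages/azure/mgmt/network")))` with your own site-packages path prints one line per version folder plus a total.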

@ghost added the needs-triage, customer-reported, and question labels on Apr 5, 2021.

@ghost

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @aznetsuppgithub.

Issue Details

Author: sodul
Assignees: -
Labels: Mgmt, Network, Service Attention, customer-reported, question
Milestone: -

@kristapratico

Hi @sodul, thanks for the feedback, we'll investigate asap.

@jiasli

Previously reported in #11149.

@sodul

To clarify, #11149 is only about azure-mgmt-network, which is the largest directory, but the problem is present across the entire Azure SDK.

I understand the reasoning behind keeping everything for backward compatibility. But if you do have customers that point to the old versions, they should pin their requirements to the old pypi.org releases of the Azure SDK rather than forcing everyone to keep a copy of everything around. How about providing two versions of the SDK: a large one with everything and a small one with just the latest version?

@nolaexe

Hey, is there any update?

@sodul

I wrote a script that we run after pip install. It detects the unused versions and removes them, which shrank our azure folder from ~680MB to ~280MB. It cannot go any lower because, for some reason, some of the object model definitions from multiple versions are merged together to make the final list that is then used. The script detects the versions that are used internally by the SDK and preserves them, which makes the script very safe to use.

If there is interest I can open source the script.
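To make the idea concrete, here is a minimal sketch of that kind of detection. It is not the script described above: it assumes versioned folders are named like `v2020_08_01` (the `_preview` suffix is an assumption), treats the alphabetically greatest name as the latest, and the function names and the `pinned` parameter are hypothetical. It keeps the latest version, any versions the caller pins, and every version that a kept version references in its source, then reports the rest without deleting anything.

```python
import re
from pathlib import Path

# Versioned API folders look like "v2020_08_01"; "_preview" is an assumption.
VERSION_NAME = re.compile(r"\bv\d{4}_\d{2}_\d{2}(?:_preview)?\b")


def list_versions(package_dir: Path) -> set[str]:
    """Return the names of all versioned subdirectories of package_dir."""
    return {
        child.name
        for child in package_dir.iterdir()
        if child.is_dir() and VERSION_NAME.fullmatch(child.name)
    }


def versions_to_keep(package_dir: Path, pinned: frozenset[str] = frozenset()) -> set[str]:
    """Keep the latest version, pinned versions, and anything they reference."""
    versions = list_versions(package_dir)
    if not versions:
        return set()
    # Names sort chronologically because of the YYYY_MM_DD layout.
    keep = {max(versions)} | (set(pinned) & versions)
    pending = list(keep)
    while pending:
        current = pending.pop()
        # A kept version may import models from an older version folder.
        for source in (package_dir / current).rglob("*.py"):
            text = source.read_text(encoding="utf-8", errors="ignore")
            for name in VERSION_NAME.findall(text):
                if name in versions and name not in keep:
                    keep.add(name)
                    pending.append(name)
    return keep


def removable_versions(package_dir: Path, pinned: frozenset[str] = frozenset()) -> list[str]:
    """Dry run: list version folders that nothing kept appears to reference."""
    return sorted(list_versions(package_dir) - versions_to_keep(package_dir, pinned))
```

The sketch only returns names, so the result can be reviewed before anything is removed. A plain text search like this can miss dynamic imports, which is one reason a real tool needs more care than this.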

@sodul

We have released our script on GitHub. It deletes a good chunk of the API folders, but not all of them. With the script, the Azure directory is now just under 300MB instead of over 700MB. It is compatible with most, but not all, third-party packages, as long as they do not point to a version that is trimmed.

https://github.com/clumio-code/azure-sdk-trim

@KranthiPakala-MSFT

@kristapratico Following up to see if there is any update on this issue? - Thank you

@lmazuel

@KranthiPakala-MSFT we are working on this, and there is an ongoing discussion on the issue to make sure we consider every possible impact of any decision and that nobody is broken by it.

@logachev

@lmazuel I think one old proposal that won't break anything is to release a separate azure-sdk-slim with only the latest APIs (the ones used by default) and possibly do something about comments (iirc, removing comments reduces the size by 30%).

@sodul

Removing the non-latest APIs will free about 60% of the disk space needed. A further design issue is that some of the API definitions import prior APIs in order to have a complete set of objects. I have no idea why these API definitions were designed this way, but it is definitely not very good. I had not thought of stripping comments, which means we could probably extend azure-sdk-trim to remove comments and other useless whitespace. There is probably a tool that 'compresses' Python that we could run. Of course, we would not want to remove docstrings; they do help.

@logachev

@sodul Yeah, agreed. So far I saw only keyvault being broken by your tool (which should be fixed soon I guess #21623).

I think there are actually 2 scenarios we're talking about. In development, I agree, comments & docstrings are useful.
However, when building a production image, docstrings are unnecessary. The only trick there is that each docstring needs to be replaced by the same number of empty lines, so that line numbers in exceptions stay the same.
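A minimal sketch of that trick, using Python's standard `ast` module (Python 3.8+ for `end_lineno`). The function name is hypothetical, and this is not part of azure-sdk-trim or the Azure SDK tooling. It blanks each docstring's lines in place, so every following statement keeps its line number, and it writes `pass` where the docstring was the only statement in a body. Code that reads `__doc__` at runtime would see `None` afterwards.

```python
import ast


def blank_docstrings(source: str) -> str:
    """Return source with docstrings blanked out, keeping the line count."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        is_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        if not is_docstring:
            continue
        start, end = first.lineno - 1, first.end_lineno - 1  # 0-based indexes
        # ast column offsets count UTF-8 bytes, so slice the encoded lines.
        before = lines[start].encode("utf-8")[: first.col_offset]
        after = lines[end].encode("utf-8")[first.end_col_offset :]
        if before.strip() or after.strip():
            continue  # The docstring shares a line with other code: leave it.
        indent = before.decode("utf-8")
        for index in range(start, end + 1):
            stripped = lines[index].rstrip("\r\n")
            lines[index] = lines[index][len(stripped):]  # Keep only the newline.
        if len(node.body) == 1:
            # The body would otherwise be empty, which is a syntax error.
            lines[start] = indent + "pass" + lines[start]
    return "".join(lines)
```

Skipping docstrings that share a line with other code is a conservative choice: it leaves a few bytes behind but never touches a line that also holds a statement.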

@ghost

Hi @sodul. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

@dry4ng

Azure is taking twice the disk size of all our packages combined. 10x the size of AWS libraries.

@github-actions GitHub Actions

Hi @sodul, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

@sodul

For others: we have now published the azure-sdk-trim tool to pypi.org, so it can be installed with a simple pip install azure-sdk-trim.
https://pypi.org/project/azure-sdk-trim/

This tool is NOT affiliated with Microsoft or the Azure SDK maintainers.

With azure-cli==2.59.0 the trimming still helps a lot:

> azure-sdk-trim
/home/user/.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is using 1.2 GB.
Detected az cli with 39 SDKs to keep.
/home/user/.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is now using 607.5 MB.

@github-actions GitHub Actions

Hi @sodul, we deeply appreciate your input into this project. Regrettably, this issue has remained unresolved for over 2 years and inactive for 30 days, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.

@sodul

I've noticed a significant improvement with the more recent releases of the SDK. The space used has been pretty much halved from 1.2GB to 600MB and with azure-sdk-trim we went from 600MB to 300MB.

+ azure-sdk-trim
.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is using 606.9 MB.
Detected az cli with 39 SDKs to keep.
.pyenv/versions/3.12.3/lib/python3.12/site-packages/azure is now using 305.7 MB.
Saved 301.2 MB.

This was with azure-cli==2.60.0.

@iscai-msft

Amazing, it's still an ongoing process but the sdk and the cli team have both been working on reducing the package size, glad that you're able to see the difference!

@sodul

The latest SDK releases are back to 1.2GB somehow.

Output from running https://github.com/clumio-code/azure-sdk-trim:

/Users/stephane/.pyenv/versions/3.12.6/lib/python3.12/site-packages/azure is using 1.2 GB.
Detected az cli with 39 SDKs to keep.
/Users/stephane/.pyenv/versions/3.12.6/lib/python3.12/site-packages/azure is now using 603.3 MB.
Saved 622.8 MB.

@iscai-msft

@msyyc do you know why the size would bump x 2?

@msyyc

(quoting @iscai-msft) "@msyyc do you know why the size would bump x 2?"

I think it is still related to some multiapi packages (e.g. azure-mgmt-network/web/containerservice). These packages are updated frequently with new API versions, so their size keeps increasing.

@sodul

@msyyc @iscai-msft is there a hard limit on the size at which it will be deemed unacceptable and made a blocker for new releases? Is it 1.5GB, 2GB, 5GB, 10GB? Unless there are some drastic changes to the current SDK model, these sizes will be reached.

I can't see this path being sustainable, especially in the modern container-based world.

@iscai-msft

@sodul agreed. @msyyc are you able to reach out to your contacts on the management team and ask which multiapi packages they're releasing and why they aren't using the multiapi combiner script? We definitely don't want to move backwards here.

{
"by": "varun_chopra",
"descendants": 3,
"id": 40245553,
"kids": [
40245593,
40247401,
40245639
],
"score": 4,
"time": 1714726390,
"title": "Azure SDK is over 500 MB and growing on each release",
"type": "story",
"url": "https://github.com/Azure/azure-sdk-for-python/issues/17801"
}
{
"author": "Azure",
"date": "2021-04-05T12:00:00.000Z",
"description": "The azure SDK is ridiculously large for reasons that I have a hard time understanding. We pip install it for our CI pipelines and the vast majority of the size of our container is coming from the A…",
"image": "https://opengraph.githubassets.com/8640b1bb09d02d66dfa6f6783bc95ed7accf21c6499a68302f6fd7b4b11d985c/Azure/azure-sdk-for-python/issues/17801",
"logo": "https://logo.clearbit.com/github.com",
"publisher": "GitHub",
"title": "Azure SDK is over 500MB and growing on each release. · Issue #17801 · Azure/azure-sdk-for-python",
"url": "https://github.com/Azure/azure-sdk-for-python/issues/17801"
}