First, they restricted code search to logged-in users, so I’m using Sourcegraph. But now I can’t even view Discussions or the wiki without logging in.
It was a nice run
What about the time they fired their artists and then immediately wrote a blog post congratulating themselves for making AI art from a model trained on the ex-employees’ art. Inspiring.
GitHub has art?
Aaw cute little logo character thingie.
The writing was on the wall when they built a generative AI on everyone’s code, without asking anyone for permission, of course.
It’s an interesting debate, isn’t it? Does AI transform something free into something that’s not? Or does it simply study the code?
@xilliah It’s not free, though. It came with licenses. And LLMs don’t have the capability to “study”; they’re just glorified random word generators.
Ok
No, it’s exhausting.
There’s no debate. LLMs are plagiarism with extra steps. They take data (usually illegally) wholesale and then launder it.
A lot of people have been doing research into the ethics of these systems and that’s more or less what they found. The reason why they’re black boxes is precisely the reason we all suspected: they were made that way because if they weren’t we’d all see them for what they are.
The reason they’re black boxes is because that’s how LLMs work. Nothing new here, neural networks have been basically black boxes for a long time.
Sure, but nothing is theoretically stopping them from documenting every single data source input into the training module and then crediting it later.
For some reason they didn’t want to do that of course.
The reason they are black boxes is that they are function approximators with billions of parameters. Theory has not caught up with practical results. This is why you tune hyperparameters (learning rate, number of layers, number of neurons in a layer, etc.) and run multiple iterations of training to approximate the distribution of the inputs. Training is also sensitive to the order in which inputs are presented to the network: a network trained on the same training set in a different order might converge to an entirely different function. This is why you train on the same inputs in random order over multiple epochs, to hopefully average out such variations. They are black boxes simply because you can’t yet prove theoretically which function the network has approximated or converged to, given the input.
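To make the “random order over multiple epochs” idea concrete, here is a minimal sketch. It is plain stochastic gradient descent on a toy one-parameter model (y ≈ w·x, with hypothetical data y = 3x plus small noise), not a real neural network. The samples are reshuffled at the start of every epoch, and the seed alone decides the visiting order.

```python
import random

def sgd_fit(xs, ys, epochs=50, lr=0.01, seed=0):
    """Fit y ≈ w * x by plain SGD, visiting the samples in a fresh random order each epoch."""
    rng = random.Random(seed)              # the seed alone decides the visiting order
    order = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(order)                 # reshuffle at the start of every epoch
        for i in order:
            residual = w * xs[i] - ys[i]
            w -= lr * 2 * residual * xs[i]  # gradient of (w*x - y)**2 with respect to w
    return w

# Toy data: y = 3x plus small fixed noise (hypothetical, for illustration only).
noise = random.Random(42)
xs = [0.1 * k for k in range(1, 21)]
ys = [3.0 * x + noise.gauss(0.0, 0.1) for x in xs]

for seed in (0, 1):
    print(f"seed={seed}: w={sgd_fit(xs, ys, seed=seed):.6f}")
```

On this convex toy problem both orders end up close to 3 and differ only in the trailing digits; in a large non-convex network, the order effects the comment describes can be much bigger.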
Can you link it please? I’d like to inform myself.
I doubt they have a factual basis for their opinion, considering
they were made that way because if they weren’t we’d all see them for what they are.
Is just plain wrong. Researchers would love to have a non-black-box AI (i.e. a white-box AI), but unfortunately that’s impossible with the current architecture.
They also broke some stuff with some JavaScript, I think. I’m using KDE’s web browser (Falkon) and it used to work well.
I moved all my open source projects to Gitlab the day Microsoft announced they were acquiring Github.
(I wish in retrospect I’d taken the time to research and decide on the right host. I likely would have gone to Codeberg instead of Gitlab had I done so. But Gitlab’s still better than Github. And I don’t really know for sure that Codeberg was even around back when Microsoft acquired Github.)
I’m OOTL. Why is Codeberg better than GitLab?
Codeberg is run by a German nonprofit. GitLab is publicly traded on NASDAQ.
Codeberg is really new, I think like 2 years. Since COVID for sure.
I registered there in June 2020, so longer than that.
Ah. Good to know. I don’t feel so bad about going with Gitlab now.
The only thing surprising is that it took Microsoft almost three years to turn on the shit-spigot.
There’s nothing wrong with it honestly, and OP seems to be giving bad info… And trust me, I’m not a fan of Microsoft lol
I literally just tested Discussions and the wiki in private browsing mode on a few repos, and they work. Which just goes to show that needing a login isn’t much of an issue. It seems nobody upvoting this has actually tried it without a login.
I heard other people complaining about what OP says, so I’m thinking maybe it’s A/B testing…
You gotta embrace first
They also added some crappy requirements to their student benefit package.
Are you trying to get people to use it, or trying to get people to accidentally keep paying a subscription?
I just checked, and unless I’m missing something, you’re wrong? Tried https://github.com/snowplow/snowplow/wiki in private browser mode. Seems to work fine… Discussions work too.
And the restricted code search is not a big deal. You can still see and download all the source code you want and search that way. What use case do you have for code searching without a login? Lemmy is restricted too without login (as well as literally everything)
Creating a login is free too, and so is downloading source code. GitHub is a FREE service lol… And you’re whinging that you need to create a free login? If you don’t like GitHub, then don’t use it lol. Absolutely nothing is preventing anyone from migrating lol
Lemmy is restricted too without login (as well as literally everything)
You mean that you cannot comment or vote without an account? That just makes sense, because you need an account to tell the server to save some data of yours. That has to be connected to an account. Search does not (unless you are fixated on saving all actions of the user on the platform for behavioral analysis)
The funny thing is that the last person I saw make a huge deal of this on Lemmy/Reddit, didn’t have a huge number of github commits over the years (they definitely had some, so they were active though, but even our newbies at work overtook them in months)
Maybe you didn’t know, but not everyone in IT (job or hobby) writes code.
Creating a login is free too
Not really: you have to give personal information.
It’s not much of a problem as long as they only need an email address and aren’t too opinionated about your provider, but it’s not rare at all for platforms to also require a phone number (either upfront at registration, or Discord/Microsoft-style, locking you out of your account until you give it to them), which for the most part won’t be private at all. Thus, you are paying with your data, for something (repo content) that the maintainers wanted to be public and free.
Creating a login is free too, and so is downloading source code
What about the Wiki and Discussions? Several others said things that make me think it’s under A/B testing.
I’m honestly blown away by whoever finds this surprising. This is Microsoft we’re talking about. Everything they touch turns into this. Taking what is not theirs, using it for profit, and not even giving credit where credit is due.
I don’t really feel like self-hosting a Git instance is a good idea for me personally, but I’ve been really happy with Gitlab for around 8 years now
I’m not a developer so I’m not very familiar with this world. But it kind of amazes me that the code for so many open source projects is hosted by Microsoft. Isn’t there a FOSS alternative? Edit: seems GitLab is an alternative. Then the question is, why are people using Microsoft products?
Codeberg.org is the ethical choice
GitHub started independently and was an amazing service (and still is, except now it’s going downhill), but Microsoft acquired it in 2018.
The power of git (the backbone of GitHub) is that you can easily take a repository and move it to a different server. It’s like, 3 commands? (git clone, git remote add, git push). So if people left GitHub, nothing would be lost :)
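A minimal sketch of that move, driving git from Python. It assumes git is installed; the URLs, the `project.git` working directory, and the `newhost` remote name are hypothetical. It uses a bare clone so that every branch comes along, and pushes tags separately (git refuses `--all` and `--tags` in the same push). Note that this copies only what lives in the git repository itself, i.e. branches and tags.

```python
import subprocess

def migration_commands(old_url, new_url, workdir="project.git"):
    """Git commands that copy all branches and tags from old_url to new_url."""
    return [
        ["git", "clone", "--bare", old_url, workdir],            # local bare copy of the repo
        ["git", "-C", workdir, "remote", "add", "newhost", new_url],
        ["git", "-C", workdir, "push", "--all", "newhost"],     # every branch
        ["git", "-C", workdir, "push", "--tags", "newhost"],    # every tag
    ]

def migrate(old_url, new_url, workdir="project.git"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in migration_commands(old_url, new_url, workdir):
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical URLs; this only prints the plan instead of running it.
    for cmd in migration_commands("https://github.com/example/project.git",
                                  "https://codeberg.org/example/project.git"):
        print(" ".join(cmd))
```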
It could be much worse.
Yeah… It could be… OP could have checked their facts for starters
Not sure how they got so many votes. I literally just tested Discussions and the wiki in private browsing mode on a few repos, and they work.
Because after putting code search behind the login wall, and everything that is Copilot, it’s entirely to be expected that something like this happens. That you don’t see it doesn’t tell us much, as companies this large make good use of A/B testing.
There was also a partial outage 2 days ago. So the fact that they did see it that day doesn’t say much either.
The reality is, hosting your own repo is a pain, and developers are looking for stability. It’s also not cheap to host source code, and Microsoft are doing it for free for open source projects
They also need to handle dodgy usage of the hosting too (which they’ve successfully been doing).
And again, if op wants to migrate, that’s up to them. I don’t care about code search though for people who aren’t logged in so I wouldn’t move, especially since if they don’t have a login, they’re not contributing anything anyway
I used to host code on SourceForge 20 years ago using CVS, and they were free but wouldn’t even let you delete any code you uploaded.
Sorry, but I don’t see your points.
I don’t see what you mean by that outage.
And stability doesn’t require locking read-only features behind a login.
Microsoft has chosen to host public source code for free (or for their benefit which does not have monetary costs to users), no one forced them to acquire GitHub.
Defending against dodgy usage and moderating repos also don’t require login-walling read-only features: if you don’t log in, you already can’t do anything that would need moderation.
And again, if op wants to migrate, that’s up to them.
The post was not about them migrating their projects. It is raising awareness about an unwelcome change that affects them and probably others too. It bothers me too if Microsoft (or anyone else) wants to force me to log in for read-only access to content that was uploaded to their platform to be made public, because to me that means Microsoft wants to meddle with my data, including knowing what projects I’ve stumbled upon, and possibly even through absurd registration (or account-keeping) rules like handing over a stable personal identifier such as a phone number, or an email address from only a select few providers.
I don’t care about code search though for people who aren’t logged in so I wouldn’t move,
I read this as “it’s not me, so I couldn’t care less”. I would bet you also find absolutely no problem with using google services (or those of any other data mining companies) and making others do that too.
especially since if they don’t have a login, they’re not contributing anything anyway
Oh, that’s not just about that. I have an account, but I don’t want Microsoft to tie every little search to my account for behavioral analysis.
This move is very much like public transport requiring bus passes that must be scanned when you board and are tied to your identity. They shouldn’t need that to verify that I’m eligible for the service, but they are doing it anyway, for whatever unknown reason.
But also, do you remember that GitHub hosts tons of projects whose licenses grant rights to everyone, not only to those who contribute?
I’m actually growing increasingly suspicious that you personally haven’t actually tested OP’s claims… Have you? Or am I literally the only one in this thread who tested anything lol
There’s literally no actual evidence that what they’re saying is true, and you’re making assumptions that it’s an A/B test. We don’t even know what projects they tested (for all we know, they tested a project where the wiki was restricted to the team only, and assumed it affected everyone)
Either that or OP is just simply wrong, or, was affected by the outage that conveniently happened the day OP posted this, which specifically mentioned things including pages… You can see outages on https://www.githubstatus.com/history . You’re assuming again it shouldn’t affect other things…
They didn’t even post any information on what repos they tested. We see these crazy witch-hunts constantly in the tech community. Remember the Xbox ring of death debacle where people told others that Microsoft was stupid and left a piece of paper in their heatsinks? Turned out it was a thermal pad.
What data do you think Microsoft gains from “datamining” searches in repos?
Also, there are huge open source projects on Github, and if the searching thing was a big deal at all, they would be making public announcements… They aren’t. And again, the people making a deal out of it I’ve looked up, haven’t contributed much either (so, it feels like they simply are using it to attack Microsoft).
I just tested 3 other repos and they all have wikis and such working publicly. Given OP is the only one I’ve seen complain about this anywhere and hasn’t posted any evidence, I think it’s just a weird witch hunt… Either that, or it could even be done in a specific country for legal reasons… But there’s no way of even testing that, because we don’t know where OP is (they didn’t say). Or it could be done to reduce server load.
I sure as hell don’t agree with Copilot scraping repos, but there is no actual evidence in this discussion thread, only a claim by OP
Not sure how they got so many votes.
Social media loves a good roasting.
Social
Yeah. The funny thing is that the other guy who made a HUGE deal about the search thing and how it was bad for open source didn’t even have many commits over the years
Honestly for selfhosters, I can’t recommend enough setting up an instance of Gitea. You’ll be very happy hosting your code and such there, then just replicate it to github or something if you want it on the big platforms.
Just so you’re aware, Gitea was taken over by a for-profit company. Which is why it was forked and Forgejo was formed. If you don’t use Github as a matter of principle, then you should switch to Forgejo instead.
did they get federation working?
Nothing usable yet unfortunately, but they seem to be making good progress: https://codeberg.org/forgejo/forgejo/issues/59
Thanks for the link! As long as it’s being worked on I feel comfortable spinning up an instance. I’ve been meaning to do gitea for a while so I’m glad I waited.
Oh man, thanks for this. I had no idea, having used gitea for years now.
Damnit of course it was. Thanks for letting me know, now I’ll have to redo my 100+ repos.
Changing the remote should be fairly trivial with enough bash skills
It’s more I don’t have them all checked out, and a good chunk are mirrors of github, so I’ll have to list out each one and push to a new remote, mirrors will have to be setup again, and I also use the container and package registries. I’m pretty embedded. It’s not impossible, but it’s a weekend project for sure.
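Assuming the new Forgejo instance lives at a different hostname (both hosts below are hypothetical), the “change the remote” step can be scripted in Python as well as in bash. This sketch only touches repositories already cloned directly under one directory; as the comment says, repos that aren’t checked out, mirrors, and the container and package registries would still need separate work.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

# scp-style remotes such as git@host:owner/repo.git
SCP_LIKE = re.compile(r"^(?P<user>[^@/]+)@(?P<host>[^:/]+):(?P<path>.+)$")

def rewrite_remote_url(url, old_host, new_host):
    """Swap old_host for new_host in an scp-style or URL-style remote; leave anything else unchanged."""
    m = SCP_LIKE.match(url)
    if m:
        if m.group("host").lower() != old_host.lower():
            return url
        return f"{m.group('user')}@{new_host}:{m.group('path')}"
    parts = urlsplit(url)
    if parts.hostname != old_host.lower():
        return url                                    # different host, or a local path
    userinfo, at, hostport = parts.netloc.rpartition("@")
    hostport = new_host + hostport[len(old_host):]    # keep any ":port" suffix
    return urlunsplit(parts._replace(netloc=userinfo + at + hostport))

def git(repo, *args, check=False):
    """Run a git command inside repo and capture its output."""
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, check=check)

def retarget_clones(root, old_host, new_host, remote="origin"):
    """Repoint `remote` in every clone directly under root; returns the names of repos changed."""
    changed = []
    for repo in sorted(Path(root).iterdir()):
        if not (repo / ".git").exists():
            continue                                  # not a (non-bare) clone
        result = git(repo, "remote", "get-url", remote)
        if result.returncode != 0:
            continue                                  # this clone has no such remote
        url = result.stdout.strip()
        new_url = rewrite_remote_url(url, old_host, new_host)
        if new_url != url:
            git(repo, "remote", "set-url", remote, new_url, check=True)
            changed.append(repo.name)
    return changed
```

For example, `retarget_clones("~/src", "gitea.example.com", "forgejo.example.com")` (with the path expanded) would rewrite both `https://gitea.example.com/...` and `git@gitea.example.com:...` remotes.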
If it was just forked, can’t you just switch the package/container image and be done?
Depends on how much it was changed I’m guessing. Fingers crossed I could just flip it over, but who knows
Simply changing the binary worked for me. Been more than 1 month and no migration issues.
It does still show gitea branding, however.
+1 for Gitea. It’s super lightweight, and works really well! I recently switched to Gitlab simply because I wanted experience with hosting it, but Gitea is much lighter and easier to use.
Forgejo please. Gitea was acquired by a for-profit company
Maybe have a look at this comment elsewhere in the thread.
Does it have any features that github doesn’t?
Forgejo for you chap.
Honestly I’m kind of surprised that Gitea is still being recommended on Lemmy, it’s been a while since Gitea was acquired and the community has been raging since. Lemmy is regressing
Lemmy is regressing
it is not lol, you are just realising that you are not part of any elite for the simple reason of using it
I’m still stuck on why I have to create a password-equivalent API token, and then store it on my hard drive if I want an at-all-convenient workflow.
“We made it more secure!”
“How is storing it on my hard drive more secure”
“Just have it expire after a week!”
“How is it more secure now? Seems like now there are two points of failure in the system, and anyway, I keep hearing about security problems at GitHub, and this hasn’t solved any of them”
“SHUT UP THAT’S HOW”
Because if someone gets your API token, they can only push and pull. If someone gets your password, they can do anything.
Let’s go over the attack vectors involved for different common workflows. I’m going to use the specific case of how I use git.
1. Store passwords in pass, have them memorized, and type them anew every time
2. Store passwords in pass, store API tokens in the OSX keychain
Which is more secure? The thing that you’re saying is better-protected because it’s limited, doesn’t exist in workflow #1. The tokens aren’t limited to push and pull, because they’re limited to nothing.
If someone gets my password in case #2, they can still do anything. That’s my central point – you haven’t removed any point of vulnerability, you’ve created another point of vulnerability and then mandated that people use it. And this isn’t an abstract issue; there are several compromises of github data stemming from people’s API tokens being compromised. My assertion is that in some of those cases, using case #1 instead of storing the API tokens would have prevented the compromise. Maybe I am wrong in that. I know that password compromises happen too. But my point is, you’re not preventing anybody from getting their password compromised. Someone can still steal my password out of pass. Someone who puts a keylogger on my computer will have the passwords to my OSX keychain and pass, both. You’re simply introducing another point of compromise, additional to password compromises, and mandated storage of your new password-equivalents on storage where before you at least had the option of memorizing them and typing them every time.
Edit: And just to say it again, I have no problem with API tokens. If someone’s got an automated workflow set up, such that they have to set up a password-equivalent on their script that accesses github, they should absolutely create a usage-restricted API token and use that instead. I’m talking more specifically about the decision to ban people from typing their passwords when they want to interact with github, pretending that somehow that makes compromising the un-usage-restricted password impossible (when it doesn’t at all), and forcing people to store auth tokens in their local storage when they’d rather type their password every time.
An API token is more secure than a password by virtue of it not needing to be typed in by a human. Phishing, writing down passwords, and the fact that API tokens can have restricted scopes all make them more secure.
Expiration on its own doesn’t make it more secure, but it can if it’s in the context of loading the token onto a system that you might lose track of/not have access to in the future.
Individual API tokens can also be revoked without revoking all of them, unlike a password where changing it means you have to re-login everywhere.
And that’s just the tip of the iceberg. Lmk if you have questions, though.
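The three properties mentioned above (restricted scopes, expiration, and revoking one token without touching the others) can be illustrated with a toy model. This is a hypothetical sketch, not GitHub’s implementation; the scope names such as `repo:read` are made up for illustration.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class Token:
    scopes: frozenset
    expires_at: datetime
    revoked: bool = False

@dataclass
class TokenStore:
    """Toy model of scoped, expiring, individually revocable tokens (not GitHub's implementation)."""
    tokens: dict = field(default_factory=dict)

    def issue(self, scopes, lifetime=timedelta(days=7)):
        value = secrets.token_urlsafe(32)             # long random secret, never typed by hand
        self.tokens[value] = Token(frozenset(scopes), datetime.now(timezone.utc) + lifetime)
        return value

    def revoke(self, value):
        self.tokens[value].revoked = True             # every other token keeps working

    def allows(self, value, scope, now=None):
        if now is None:
            now = datetime.now(timezone.utc)
        token = self.tokens.get(value)
        return (token is not None and not token.revoked
                and now < token.expires_at and scope in token.scopes)

store = TokenStore()
laptop = store.issue({"repo:read", "repo:write"})   # hypothetical scope names
ci = store.issue({"repo:read"})
print(store.allows(ci, "repo:write"))                # False: the CI token never had write access
store.revoke(laptop)
print(store.allows(laptop, "repo:read"), store.allows(ci, "repo:read"))   # False True
```

Note that in this model no scope can manage the account itself; a password, by contrast, unlocks everything and can only be “revoked” by changing it everywhere.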
Oh, API tokens in general, I think are great. As an additional layer of security between “I need my program to be able to access this API” and “I type my password”, they are great. My issue is with the specific way that github has implemented them.
An API token is more secure than a password by virtue of it not needing to be typed in by a human.
Remind me. When I create my API token, how do I provide it to git?
Am I, more or less, forced to save my token to persistent storage in a way I wouldn’t be with a password? I realize that most people store either one in a password manager at this point. My point is, if you’re going to store your password-equivalent in a password manager, how have you achieved greater security as compared with storing a password in the same password manager? How is that not just adding another compromise vector?
Phishing
Remind me. Does making a system significantly more complex mean that phishing gets easier? Or harder?
As an example, if someone can phish my password from me to compromise my security, is that better or worse than if they can either phish my password or else compromise my tokens? I remember this compromise for example, but I can’t remember whether it involved passwords or tokens.
writing down passwords
Remind me. Help me understand. Can someone write down their github password if the API token system exists? If they have to use it sometimes to log in to the web site anyway?
and the fact that API tokens can have restricted scopes
Yes. API tokens are a good system, in general, and restricting the scope of what they can do and making them time-limited are good reasons why.
My argument is that, in general, (a) adding an additional point of access to a system without doing anything to disable the existing point of access, and (b) saving a password equivalent to someone’s system instead of having the “standard way” be for them to retype their password to authenticate each session but not have it saved anywhere, are both overall reductions in security.
I get the motivation that github sometimes protects really critical stuff, and so it needs to be more secure. I am saying that their particular implementation of API tokens led to an overall reduction in security as opposed to an increase.
Remind me. When I create my API token, how do I provide it to git?
By pasting it somewhere git can access it. That can be a config file, git has several ways to use the system’s secret storage, and you can also autotype it from your password manager every time if you want.
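One of those ways is a custom git credential helper. Git runs the helper with an action (`get`, `store`, or `erase`) and writes `key=value` lines such as `protocol=https` and `host=github.com` to its stdin; for `get`, the helper answers with `username=` and `password=` lines. The sketch below assumes the token is kept in `pass` (as in the workflow described earlier) under a hypothetical entry `git/github-token`, so it is fetched on demand and never written to a plaintext file. The username placeholder is also hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical git credential helper: hands git a token fetched from `pass` at request time."""
import subprocess
import sys

PASS_ENTRY = "git/github-token"        # hypothetical entry name in your pass store
HOST = "github.com"

def parse_request(text):
    """Parse the key=value lines git writes to a helper's stdin (ended by a blank line)."""
    request = {}
    for line in text.splitlines():
        if not line:
            break
        key, _, value = line.partition("=")
        request[key] = value
    return request

def respond(action, request, fetch_secret):
    """Return the helper's reply; an empty reply lets git fall back to other helpers or a prompt."""
    if action != "get" or request.get("host") != HOST:
        return ""                      # "store"/"erase" are ignored, so nothing is written to disk
    username = request.get("username", "YOUR_GITHUB_USERNAME")   # hypothetical placeholder
    return f"username={username}\npassword={fetch_secret()}\n"

def fetch_from_pass():
    out = subprocess.run(["pass", "show", PASS_ENTRY], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()[0]  # pass prints the secret on the first line

if __name__ == "__main__" and len(sys.argv) > 1:
    # git invokes the helper as: <helper> get|store|erase
    sys.stdout.write(respond(sys.argv[1], parse_request(sys.stdin.read()), fetch_from_pass))
```

To try it, make the file executable and point git at its absolute path, e.g. `git config --global credential.helper /path/to/helper.py`; git runs helpers given as absolute paths directly.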
forced to save my token to persistent storage in a way I wouldn’t be with a password
So not really
My point is, if you’re going to store your password-equivalent in a password manager, how have you achieved greater security as compared with storing a password in the same password manager?
Passwords can be short and simple. API tokens are lengthy and random, and you can’t change that. Also, you never type in your API key, and that can help against shoulder- and camera-surfing.
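A quick back-of-the-envelope comparison of “short and simple” versus “lengthy and random”. A string whose characters are drawn uniformly and independently from an alphabet of size N has length × log2(N) bits of entropy. That is the best case for an 8-character password drawn from the 94 printable ASCII symbols; human-chosen passwords usually fall far short of it. The 30-byte token below is an illustrative assumption, not GitHub’s actual token format.

```python
import math
import secrets
import string

def uniform_entropy_bits(length, alphabet_size):
    """Entropy in bits of a string whose characters are drawn uniformly and independently."""
    return length * math.log2(alphabet_size)

printable = string.ascii_letters + string.digits + string.punctuation   # 94 symbols
token = secrets.token_urlsafe(30)      # 30 random bytes, base64url-encoded to 40 characters

print(len(printable))                                        # 94
print(round(uniform_entropy_bits(8, len(printable)), 1))     # 52.4 (best case for 8 characters)
print(round(uniform_entropy_bits(len(token), 64), 1))        # 240.0
```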
without doing anything to disable the existing point of access
You can’t do that, because
- the API token is strictly for API access for outside programs
- the API token cannot be used to manage your account, like change password or emails, or to create additional tokens
API tokens are not a total replacement, just a more secure and restricted replacement for the everyday and not too risky tasks and for automated systems.
I think this comment pretty well summarizes my argument on it. The only parts not addressed:
Passwords can be short and simple. API tokens are lengthy and random, and you can’t change that.
You can, as most modern web services including github do, have a minimum length and complexity for the password. That’s a very important part of the process yes.
Plus, you seem to still not be grasping the core of my argument: github still authenticates with a password. You can still log in to the web site and change everything, if you compromise someone’s password, whether because it’s insufficiently complex or for any other reason.
Also, you never type in your API key, and that can help against shoulder- and camera-surfing.
I would like to see a quantitative comparison of how many github compromises there have been because of a stolen API token vs. compromises of some comparable service from a shoulder-surfed password.
You can, as most modern web services including github do, have a minimum length and complexity for the password. That’s a very important part of the process yes.
Sorry, I wasn’t clear. What I wanted to say is that passwords can be insecure, and in the case of lazy people that has had consequences for security. I think the minimum is often not really secure; it’s just “fine if you really must”, allowed so as not to lose too many users.
And at the same time, tokens are always strong. They’re not defined by the user, so nobody can lazy their way out of it; they’re equally complicated for everyone. Fortunately you don’t have to type them either: it’s copy, paste, and done.
However, I have to admit that while writing this response I realized complexity is not really the point with GitHub access tokens.
Plus, you seem to still not be grasping the core of my argument: github still authenticates with a password. You can still log in to the web site and change everything, if you compromise someone’s password, whether because it’s insufficiently complex or for any other reason.
That’s right, these tokens won’t protect the lazy from having their account taken over. But I think they are still more secure for their use case: they mostly end up stored in text files, because the programs you give them to will probably store them that way, and since they are not really password-equivalent (they have very limited access to your account), that’s less of a problem.
Your original question here was how storing these tokens in our password managers beside our passwords makes anything more secure. My answer is that even if you put it into your password manager, that’s not its final place: it will probably end up in text files and other such places, and if such a file gets into the wrong hands you’ll be in less of a trouble because of the limited permissions. If you had stored your password there, you would be left hoping to get your account back, and that the person did not do anything bad in your name.
I think much of the confusion is coming from you believing that api tokens are equivalent to passwords. That’s not the case. Even if you give all possible permissions to a token, it won’t be able to do everything that you can do with the password through the website. In short, the main point here is that you don’t have to use your password in places where that’s totally unnecessary, and fewer permissions are fine.
it will probably end up in text files and other such places, and if such a file gets into the wrong hands you’ll be in less of a trouble because of the limited permissions
I am abandoning this conversation. This is only true with API tokens. With passwords, it generally stays in the password manager. The fact that the damage from your stolen API token is then mitigated if you’ve reduced its scope still leaves you in a worse position than if it had never been stored in the text file and never been stolen in the first place. If you can’t or won’t grasp this central point (or the other I mentioned in my other message), I think we have nothing to discuss.
The fact that the damage from your stolen API token is then mitigated if you’ve reduced its scope still leaves you in a worse position than if it had never been stored in the text file and never been stolen in the first place.
First, it’s not a question of whether you have reduced its permissions. With an API token you simply can’t do a lot of things that you can with a password.
Second, you don’t use API tokens as a hobby. You use them because you want to use a tool that needs access to your account. Either you use an API token that has a limited set of permissions, or your password that can do anything. Either way, it will be stored in a plain text file, because where in heaven’s name would the tool store it so that it doesn’t need to prompt you for it every single time? Yes, there are a dozen secret store programs that could be used instead, but a lot of programs won’t support every one of them. In that case, I fail to see how a token with fewer permissions is worse than a password with all the permissions.
Never used it in GitHub, but in GitLab it is not password equivalent, you can restrict its usage.
Hold up, are you sure you can’t view Discussions or the wiki? On which repos can’t you view them?
I’m fine viewing them for public repos that I usually visit.
Asking to make sure that Github is not slowly rolling out this lockdown.
What are good alternatives to GitHub besides self-hosting? I only know gitlab.com. Anything else?
Codeberg is very good, and non-profit.
Codeberg
SSH + an HTTP server can work if you are going barebones
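For the barebones route, one common arrangement is to push over SSH and let read-only clients clone over git’s “dumb” HTTP protocol from a plain static file server. This sketch assumes a Unix host with git installed and repos kept under a hypothetical `/srv/git`. Dumb HTTP relies on the info files that `git update-server-info` generates, so the sketch installs a `post-update` hook that regenerates them after every push.

```python
import subprocess
from pathlib import Path

HOOK = "#!/bin/sh\n# refresh the info files dumb-HTTP clients need after each push\nexec git update-server-info\n"

def write_update_hook(git_dir):
    """Install a post-update hook so pushes over SSH keep the repo clonable over plain HTTP."""
    hook = Path(git_dir) / "hooks" / "post-update"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK)
    hook.chmod(0o755)                  # hooks must be executable or git skips them
    return hook

def init_served_repo(git_dir):
    """Create a bare repo, install the hook, and generate the initial info files."""
    subprocess.run(["git", "init", "--bare", str(git_dir)], check=True)
    write_update_hook(git_dir)
    subprocess.run(["git", "--git-dir", str(git_dir), "update-server-info"], check=True)
```

After `init_served_repo("/srv/git/project.git")`, something like `python3 -m http.server --directory /srv/git 8000` serves the files (fine for a quick test; a real deployment would use a proper web server), clients clone from `http://yourhost:8000/project.git`, and you push to `user@yourhost:/srv/git/project.git` over SSH. The hostnames and paths here are placeholders.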
codeberg
Sourcehut