Message ID | 20210714185121.24646-1-michael@niedermayer.cc |
---|---|
State | New |
Series | [FFmpeg-devel] ffmpeg-web/robots.txt: attempt to keep spiders out of dynamically generated git content |

Context | Check | Description |
---|---|---|
andriy/configure | warning | Failed to apply patch |

On Wed, Jul 14, 2021 at 08:51:21PM +0200, Michael Niedermayer wrote:
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  htdocs/robots.txt | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/htdocs/robots.txt b/htdocs/robots.txt
> index eb05362..4bbc395 100644
> --- a/htdocs/robots.txt
> +++ b/htdocs/robots.txt
> @@ -1,2 +1,13 @@
>  User-agent: *
> -Disallow:
> +Crawl-delay: 10
> +Disallow: /gitweb/
> +Disallow: /*a=search*
> +Disallow: /*/search/*
> +Disallow: /*a=blobdiff*
> +Disallow: /*/blobdiff/*
> +Disallow: /*a=commitdiff*
> +Disallow: /*/commitdiff/*
> +Disallow: /*a=snapshot*
> +Disallow: /*/snapshot/*
> +Disallow: /*a=blame*
> +Disallow: /*/blame/*

This is based on
https://serverfault.com/questions/506613/ideal-robots-txt-for-a-gitweb-installation
i will add this link to robots.txt

[...]

On 2021-07-14 14:51, Michael Niedermayer wrote:
> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> ---
>  htdocs/robots.txt | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/htdocs/robots.txt b/htdocs/robots.txt
> index eb05362..4bbc395 100644
> --- a/htdocs/robots.txt
> +++ b/htdocs/robots.txt
> @@ -1,2 +1,13 @@
>  User-agent: *
> -Disallow:
> +Crawl-delay: 10
> +Disallow: /gitweb/
> +Disallow: /*a=search*
> +Disallow: /*/search/*
> +Disallow: /*a=blobdiff*
> +Disallow: /*/blobdiff/*
> +Disallow: /*a=commitdiff*
> +Disallow: /*/commitdiff/*
> +Disallow: /*a=snapshot*
> +Disallow: /*/snapshot/*
> +Disallow: /*a=blame*
> +Disallow: /*/blame/*

LGTM based on my own personal experiences. But the robots.txt has to be
applied for git.ffmpeg.org as well, and not just ffmpeg.org. Or else they
will just do the same for git.ffmpeg since they are treated separately.

On Wed, Jul 14, 2021 at 04:00:53PM -0400, ffmpegandmahanstreamer@lolcow.email wrote:
> On 2021-07-14 14:51, Michael Niedermayer wrote:
> > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > ---
> >  htdocs/robots.txt | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/robots.txt b/htdocs/robots.txt
> > index eb05362..4bbc395 100644
> > --- a/htdocs/robots.txt
> > +++ b/htdocs/robots.txt
> > @@ -1,2 +1,13 @@
> >  User-agent: *
> > -Disallow:
> > +Crawl-delay: 10
> > +Disallow: /gitweb/
> > +Disallow: /*a=search*
> > +Disallow: /*/search/*
> > +Disallow: /*a=blobdiff*
> > +Disallow: /*/blobdiff/*
> > +Disallow: /*a=commitdiff*
> > +Disallow: /*/commitdiff/*
> > +Disallow: /*a=snapshot*
> > +Disallow: /*/snapshot/*
> > +Disallow: /*a=blame*
> > +Disallow: /*/blame/*
> LGTM based on my own personal experiences. But the robots.txt has to be

will apply

> applied for git.ffmpeg.org as well, and not just ffmpeg.org. Or else they
> will just do the same for git.ffmpeg since they are treated separately.

was expecting this a bit ...
i will look into that tomorrow or so unless someone else does before me

thx

[...]

On Wed, Jul 14, 2021 at 10:40:53PM +0200, Michael Niedermayer wrote:
> On Wed, Jul 14, 2021 at 04:00:53PM -0400, ffmpegandmahanstreamer@lolcow.email wrote:
> > On 2021-07-14 14:51, Michael Niedermayer wrote:
> > > Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
> > > ---
> > >  htdocs/robots.txt | 13 ++++++++++++-
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/htdocs/robots.txt b/htdocs/robots.txt
> > > index eb05362..4bbc395 100644
> > > --- a/htdocs/robots.txt
> > > +++ b/htdocs/robots.txt
> > > @@ -1,2 +1,13 @@
> > >  User-agent: *
> > > -Disallow:
> > > +Crawl-delay: 10
> > > +Disallow: /gitweb/
> > > +Disallow: /*a=search*
> > > +Disallow: /*/search/*
> > > +Disallow: /*a=blobdiff*
> > > +Disallow: /*/blobdiff/*
> > > +Disallow: /*a=commitdiff*
> > > +Disallow: /*/commitdiff/*
> > > +Disallow: /*a=snapshot*
> > > +Disallow: /*/snapshot/*
> > > +Disallow: /*a=blame*
> > > +Disallow: /*/blame/*
> > LGTM based on my own personal experiences. But the robots.txt has to be
>
> will apply
>
> > applied for git.ffmpeg.org as well, and not just ffmpeg.org. Or else they
> > will just do the same for git.ffmpeg since they are treated separately.
>
> was expecting this a bit ...
> i will look into that tomorrow or so unless someone else does before me

done

[...]
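
The point raised in review, that git.ffmpeg.org needs its own robots.txt, follows from crawlers looking the file up per scheme and host rather than per site. A minimal sketch of that lookup, where the function name `robots_url` and the sample page URLs are only illustrative and not taken from the patch:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    The file is looked up at the root of the same scheme and host,
    so each subdomain is governed by its own copy.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URLs, used only to show the two hosts resolve differently.
print(robots_url("https://ffmpeg.org/download.html"))
print(robots_url("https://git.ffmpeg.org/gitweb/ffmpeg.git"))
```

The two calls resolve to different files, which is why rules deployed only under ffmpeg.org would not restrict crawling of git.ffmpeg.org.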

diff --git a/htdocs/robots.txt b/htdocs/robots.txt
index eb05362..4bbc395 100644
--- a/htdocs/robots.txt
+++ b/htdocs/robots.txt
@@ -1,2 +1,13 @@
 User-agent: *
-Disallow:
+Crawl-delay: 10
+Disallow: /gitweb/
+Disallow: /*a=search*
+Disallow: /*/search/*
+Disallow: /*a=blobdiff*
+Disallow: /*/blobdiff/*
+Disallow: /*a=commitdiff*
+Disallow: /*/commitdiff/*
+Disallow: /*a=snapshot*
+Disallow: /*/snapshot/*
+Disallow: /*a=blame*
+Disallow: /*/blame/*

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
---
 htdocs/robots.txt | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
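
To check which gitweb URLs the Disallow patterns in this patch would cover, the rules can be tried against sample paths. The sketch below assumes the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end, and a rule otherwise matches as a prefix); individual crawlers may interpret patterns differently, and Crawl-delay is a non-standard directive that some crawlers ignore. The helper names and the sample paths are hypothetical, and Allow rules and longest-match precedence are deliberately left out.

```python
import re

# Disallow patterns copied from the patch above.
DISALLOW = [
    "/gitweb/",
    "/*a=search*", "/*/search/*",
    "/*a=blobdiff*", "/*/blobdiff/*",
    "/*a=commitdiff*", "/*/commitdiff/*",
    "/*a=snapshot*", "/*/snapshot/*",
    "/*a=blame*", "/*/blame/*",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a regex (assumed semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str) -> bool:
    """True if any Disallow pattern matches the path from its start."""
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)


# Hypothetical request paths (path plus query string).
for path in [
    "/gitweb/ffmpeg.git",
    "/?p=ffmpeg.git;a=blobdiff;h=abc",
    "/ffmpeg.git/commitdiff/abc123",
    "/?p=ffmpeg.git;a=summary",
    "/download.html",
]:
    print(path, is_disallowed(path))
```

Under these assumptions the expensive dynamically generated views (search, blobdiff, commitdiff, snapshot, blame) are blocked in both the query-string and path-style URL forms, while cheap pages such as a summary view stay crawlable.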