Question about robots.txt

Question about robots.txt

kseifried@redhat.com
http://cve.mitre.org/robots.txt

User-agent: *
Disallow: /cgi-bin/

This means all the CVEs in the database, e.g.


are not really searchable/indexed via any search engine. Is this intentional?
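
A minimal sketch of how the two rules above can be checked with Python's standard `urllib.robotparser`. The helper name `is_crawlable` and both example URLs are assumptions made for illustration; they are not taken from the message.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted in the message above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() works on already-fetched lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one path under /cgi-bin/, one outside it.
CVE_ENTRY_URL = "http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0001"
STATIC_PAGE_URL = "http://cve.mitre.org/about/index.html"

if __name__ == "__main__":
    print(is_crawlable(ROBOTS_TXT, CVE_ENTRY_URL))    # False
    print(is_crawlable(ROBOTS_TXT, STATIC_PAGE_URL))  # True
```

Any crawler that honors robots.txt would skip every URL under /cgi-bin/ while still being free to fetch the rest of the site.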

--
Kurt Seifried -- Red Hat -- Product Security -- Cloud
PGP A90B F995 7350 148F 66BF 7554 160D 4553 5E26 7993
Red Hat Product Security contact: [hidden email]

RE: Question about robots.txt

Coffin, Chris

Hi Kurt,

We made the choice a long time ago to not allow indexing of the cve.mitre.org web site. At least part of that decision was simply resource constraints – when CVE was in its toddler years, search engine indexers were very resource intensive.

We are currently re-examining this policy and will keep the Board posted.

Chris Coffin
The CVE Team
The MITRE Corporation

From: [hidden email] [mailto:[hidden email]] On Behalf Of Kurt Seifried
Sent: Wednesday, December 02, 2015 11:32 AM
To: cve-editorial-board-list <[hidden email]>
Subject: Question about robots.txt

 



Re: Question about robots.txt

kseifried@redhat.com
On Tue, Dec 8, 2015 at 1:42 PM, Coffin, Chris <[hidden email]> wrote:

> Hi Kurt,
>
> We made the choice a long time ago to not allow indexing of the cve.mitre.org web site. At least part of that decision was simply resource constraints – when CVE was in its toddler years, search engine indexers were very resource intensive.
>
> We are currently re-examining this policy and will keep the Board posted.


Is there an ETA on how long this will take roughly? Days/weeks/longer?


RE: Question about robots.txt

Coffin, Chris

Kurt,

We do not have an ETA at this time.

Chris Coffin
The CVE Team
The MITRE Corporation

From: Kurt Seifried [mailto:[hidden email]]
Sent: Tuesday, December 08, 2015 2:50 PM
To: Coffin, Chris <[hidden email]>
Cc: cve-editorial-board-list <[hidden email]>
Subject: Re: Question about robots.txt

 


 


RE: Question about robots.txt

jericho
In reply to this post by Coffin, Chris
On Tue, 8 Dec 2015, Coffin, Chris wrote:

: We made the choice a long time ago to not allow indexing of the
: cve.mitre.org web site. At least part of that decision was simply
: resource constraints – when CVE was in its toddler years, search engine
: indexers were very resource intensive.

That 'decision' was based on crap excuses, even back then. =) As someone
who ran two sites over the time MITRE ran CVE, and intensively watched
logs on one of them (attrition.org, since 1998-10-07), search engines were
NOT resource intensive back then. Attrition staff talked about that issue
and didn't block any of our content in robots.txt because search engine
spam was present, but not heavy. For those interested in Internet
history...

forced ~$ more /home/admin/util/list.filter
72.14.203.104
forced.attrition.org
images.search.yahoo.com
casualgamer.org
myspace.com
stumbleupon.com
f-mai.gif
f-bak.gif
f-att.gif
thefiles.gif
panopta.com
divinelanguage.com
forced ~$ grep -i google /home/admin/util/list.*
/home/admin/util/list.bot:googlebot.com
/home/admin/util/list.bot:Feedfetcher-Google
/home/admin/util/list.filter-old:google.com
/home/admin/util/list.filter-old:google.co.jp/search
/home/admin/util/list.filter-old:google.de
/home/admin/util/list.filter-old:google.fr
/home/admin/util/list.filter-old:google.co.uk
forced ~$

"list.filter-old" is from 2003-08-25. The limited set of Google domains
should be very telling, given the year and traffic generated.

We actually *stopped* filtering Google at some point, while ignoring Yahoo
early on. Why? Because they were simply not hammering sites and causing
any undue burden, to a random desktop machine bought at the local computer
store. Those were "ignore displaying those entries in our log parser", not
"block them from reaching our web server" via iptables.
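
The distinction drawn above, hiding entries when reading logs rather than blocking requests, can be sketched as follows. This is not Attrition's actual log parser; the function names and the sample log lines are made up for illustration, and only the three filter strings come from the list.filter listing above.

```python
from typing import Iterable, List


def load_patterns(text: str) -> List[str]:
    """Read one filter substring per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def visible_lines(log_lines: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the log lines that contain none of the filter substrings.

    This only affects what is displayed; the requests themselves were
    still served, unlike a firewall rule that would drop them.
    """
    patterns = list(patterns)
    return [line for line in log_lines if not any(p in line for p in patterns)]


# Three entries taken from the list.filter listing above.
FILTER_TEXT = "images.search.yahoo.com\nmyspace.com\nstumbleupon.com\n"

# Made-up log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 "GET /index.html" referer="http://myspace.com/someone"',
    '198.51.100.7 "GET /errata/" referer="-"',
]

if __name__ == "__main__":
    for line in visible_lines(SAMPLE_LOG, load_patterns(FILTER_TEXT)):
        print(line)
```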

That was Attrition when it was run on a ~ $500 box bought in 1998 and
hosted on a consumer link, compared to MITRE's resources and CVE contract
money from the government at the time. So to be clear, MITRE's answer in
2015, is based on people forgetting what it was like in 1997 - 1999.

That said, after Kurt's mail in December of 2015... in the last ~ 30 - 60
days, I noticed that MITRE finally changed that. Google is now indexing
and caching the CVE pages.

Thank you, as a long-time taxpayer funding MITRE's projects, including
CVE, to the tune of $1,487,334,000 in MITRE income last year. Good to see
you making these small changes to help the industry.

: We are currently re-examining this policy and will keep the Board
: posted.

Except... you didn't. Just like you didn't ask us about the 3k+ RESERVED
fiasco that got several of us talking about this morning, figuring out how
we'd handle it. When NVD spoke up, we all collectively said "hell yeah!"

The fact that NVD called you out, and has since said they will be
'ignoring' those IDs, is also very significant in CVE history. This is the
first *real* break that NVD has had from CVE ever. There have been other
breaks the last year+, but they were more pedantic and favored NVD over
MITRE/CVE, based on the time of entries becoming public (e.g. NVD
published before MITRE did).

Brian

RE: Question about robots.txt

Coffin, Chris
Brian,

> That said, after Kurt's mail in December of 2015... in the last ~ 30 - 60 days, I noticed that MITRE finally changed that. Google is now indexing and caching the CVE pages.

We made the change to allow indexing back in Feb of 2016, which was a few months after Kurt had pointed out the issue. We apologize to all for not replying to the original thread at that time. Dan also mentioned the same in a response to you back in April of this year (http://common-vulnerabilities-and-exposures-cve-board.1128451.n5.nabble.com/Re-CVENEW-New-CVE-CANs-2017-04-23-19-00-count-1-td722.html#a727).

> Just like you didn't ask us about the 3k+ RESERVED fiasco that got several of us talking about this morning, figuring out how we'd handle it. When NVD spoke up, we all collectively said "hell yeah!"
>
> The fact that NVD called you out, and has since said they will be 'ignoring' those IDs, is also very significant in CVE history. This is the first *real* break that NVD has had from CVE ever. There have been other breaks the last year+, but they were more pedantic and favored NVD over MITRE/CVE, based on the time of entries becoming public (e.g. NVD published before MITRE did).

We are not absolutely certain what concern you have in the case of the RESERVED CVE IDs moving to REJECT status. Please let us know if the following explanation does not clear up your concerns.

We have had multiple conversations during Board conference calls regarding the fact that there are many RESERVED CVE IDs within the current CVE list, and there was a general consensus that they should be cleaned up (i.e., REJECT or populate). As you are probably aware, there are multiple reasons that a CVE ID might be stuck in a RESERVED status. One of those reasons could be that the CNA obtained a block of CVE IDs, but never actually assigned some of those IDs to vulnerabilities.

As a first step in tackling the larger cleanup effort, we began contacting CNAs in March of this year to determine what CVE IDs they had not used from their previously assigned CVE ID blocks. All but a couple of CNAs responded and pointed out which CVE IDs were not used. In every case, the CVE ID in question moved from a status of RESERVED to a status of REJECT. The CVE IDs in question were moved to REJECT status earlier today.

You are correct and Dave at NIST had sent a message in regards to this first step and he was not clear on exactly what the end result would be. Dave and I spoke on the phone, we cleared up the gaps in understanding, and even decided to hold off for a day to give the NIST NVD folks a bit more time to analyze the impact.

Dave can correct me if I'm wrong, but we didn't interpret the comment "ignored by the NVD" to mean that the NVD team would not publish the REJECT CVE entries. Our interpretation is that the NVD team does not see a need to analyze the entries and will simply publish them as is, with no significant effort on their part.

Regards,

Chris Coffin
The CVE Team

-----Original Message-----
From: jericho [mailto:[hidden email]]
Sent: Thursday, May 11, 2017 12:32 AM
To: Coffin, Chris <[hidden email]>
Cc: Kurt Seifried <[hidden email]>; cve-editorial-board-list <[hidden email]>
Subject: RE: Question about robots.txt
Importance: High



RE: Question about robots.txt

jericho
On Thu, 11 May 2017, Coffin, Chris wrote:

: > That said, after Kurt's mail in December of 2015... in the last ~ 30 -
: 60 days, I noticed that MITRE finally changed that. Google is now
: indexing and caching the CVE pages.
:
: We made the change to allow indexing back in Feb of 2016, which was a
: few months after Kurt had pointed out the issue. We apologize to all for

Something I cannot prove, because I don't screenshot my daily "missing
CVE" searches as far as Google results go. But I would swear this is not
the case. We'll have to agree you have your official statement, and I have
my 'anecdotal' evidence as someone who searches on new/missing CVE IDs
every day.

: not replying to the original thread at that time. Dan also mentioned the
: same in a response to you back in April of this year
: (http://common-vulnerabilities-and-exposures-cve-board.1128451.n5.nabble.com/Re-CVENEW-New-CVE-CANs-2017-04-23-19-00-count-1-td722.html#a727).

You see, I don't have to read that. Kurt mailed in Dec 2015, you say Dan
replied to me in April 2017. Use your numbers. My point stands about
MITRE's promise of following up.

: > Just like you didn't ask us about the 3k+ RESERVED fiasco that got several of us talking about this morning, figuring out how we'd handle it. When NVD spoke up, we all collectively said "hell yeah!"
: >
: > The fact that NVD called you out, and has since said they will be 'ignoring' those IDs, is also very significant in CVE history. This is the first *real* break that NVD has had from CVE ever. There have been other breaks the last year+, but they were more pedantic and favored NVD over MITRE/CVE, based on the time of entries becoming public (e.g. NVD published before MITRE did).
:
: We are not absolutely certain what concern you have in the case of the
: RESERVED CVE IDs moving to REJECT status. Please let us know if the
: following explanation does not clear up your concerns.

If you are not "absolutely certain" of anything in this thread, after
NIST's response, after my previous mails, and after "VDB 101" levels of
understanding of our profession, let's just drop it. I stopped engaging
with vulnerability tourists years ago for the most part. The few times I
do are on this list or in blogs. This reply makes it clear I need to treat
MITRE as the 'blog' kind.

: We have had multiple conversations during Board conference calls

See prior mails. Until you show me a) a majority of the board was on call
and b) the entire transcript of the call was made available to the board,
this is exclusionary. No middle ground there.

: regarding the fact that there are many RESERVED CVE IDs within the
: current CVE list, and there was a general consensus that they should be

So when I have 5 or 6 board members in chat that say "MITRE did wrong", we
can also consider that a general consensus?

: cleaned up (i.e., REJECT or populate). As you are probably aware, there
: are multiple reasons that a CVE ID might be stuck in a RESERVED status.

Quit patronizing me you ass. After all of the emails I send to MITRE
calling out your bad assignments, duplicates, etc? You really think I am
"probably" aware of RESERVED status? What, did you miss the prior public
work where I called that out many times? Did you miss that being a
cornerstone of some commercial VDBs offerings? Did you not see the
T-shirt? (seriously)

: As a first step in tackling the larger cleanup effort, we began
: contacting CNAs in March of this year to determine what CVE IDs they had
: not used from their previously assigned CVE ID blocks. All but a couple

Did you CC the CNA list? If not, why not? I have a pretty solid case
history of bringing CNA issues to you directly. It is clear that some of
us have a vested interest in this and were proactive in coming to you with
issues. Did you forget to include those same people in said discussions,
publicly or privately?

: of CNAs responded and pointed out which CVE IDs were not used. In every
: case, the CVE ID in question moved from a status of RESERVED to a status
: of REJECT. The CVE IDs in question were moved to REJECT status earlier
: today.

Derp, yes. You made that very clear. Half a dozen of us privately said
"what the...", and NIST spoke up on list *quickly*. As they should have,
and I am happy they did, since it saved me one more email.

The patronizing tone of this email is somewhere between enraging and
laughable.

: You are correct and Dave at NIST had sent a message in regards to this

You think?! It was on list, I was citing public record. You don't have to
tell me I am correct.

: first step and he was not clear on exactly what the end result would be.

If you couldn't read between the lines of his mail... again, MITRE isn't
qualified to run CVE. You are clearly too far removed from your
"stakeholders".

: Dave and I spoke on the phone, we cleared up the gaps in understanding,
: and even decided to hold off for a day to give the NIST NVD folks a bit
: more time to analyze the impact.

We saw the email about the one day push. And... can we go back to my mail?
I really don't know how to say this any more simply, I thought the
original mail was clear.

- The Board got ONE DAY warning.
- NIST spoke up and said "whoa wait".
- We now see you had a phone call on the back of the NIST mail
- You pushed the 3k release by ONE day
- You told the public via a CVE mail that few in our industry read
- I said that wasn't sufficient for public warning

Then you send a patronizing mail "innocently" (ignorantly) questioning me
on all of this. Not sure where this attempt at gaslighting is coming from,
other than you forget who the board is. The concern and questions are
legitimate, speak directly to "stakeholders", and are of critical
interest/impact to the CVE offering as affects the industry.

: Dave can correct me if I'm wrong, but we didn't interpret the comment
: "ignored by the NVD" to mean that the NVD team would not publish the
: REJECT CVE entries. Our interpretation is that the NVD team does not see
: a need to analyze the entries and will simply publish them as is, with
: no significant effort on their part.

Seriously? This is the biggest argument to stop these back-alley phone
conversations and to keep things on list, where we see a record of what
was said. This is how NIST replied to the board, in all the glory:

   We have been able to confirm that the rejected CVEs will be ignored by
   the NVD. Thanks for being flexible by pushing this back a day.

You did not "interpret" the comment "ignored by the NVD" to mean they
would not publish the REJECT CVE entries?

Well guess what. Several of us explicitly read that statement to mean they
would ignore them... completely. As in, "don't exist, at all".

As in, other solutions are now involving Dev to figure out how to handle
3k+ new entries, on top of many hundreds of existing, to deliver to their
customers. These are customers who turned their back on CVE, but still
have an "irrational compliance requirement" (a common term from customers)
to ensure that they can explain EVERY single CVE ID that comes up. So
mature VDBs have to handle these REJECTs, pass it on to clients in a
format they can easily process, and in turn offer to auditors.

By the way, my continued use of "stakeholders" in parens? This is it.
MITRE doesn't have the first clue what a stakeholder is, other than the
very first tier they push the data to. It's 2017, this isn't your father's
/ Christey's CVE. It hasn't been for a long time.

.b

Re: Question about robots.txt

kseifried@redhat.com
On 13/05/17 01:55 AM, jericho wrote:

>
> On Thu, 11 May 2017, Coffin, Chris wrote:
>
> : > That said, after Kurt's mail in December of 2015... in the last ~ 30 -
> : 60 days, I noticed that MITRE finally changed that. Google is now
> : indexing and caching the CVE pages.
> :
> : We made the change to allow indexing back in Feb of 2016, which was a
> : few months after Kurt had pointed out the issue. We apologize to all for
>
> Something I cannot prove, because I don't screenshot my daily "missing
> CVE" searches as far as Google results go. But I would swear this is not
> the case. We'll have to agree you have your official statement, and I have
> my 'anecdotal' evidence as someone who searches on new/missing CVE IDs
> every day.
>

Nope, archive.org has their robots.txt going back to 2001 with pretty
much daily records:

https://web.archive.org/web/*/cve.mitre.org/robots.txt

> : not replying to the original thread at that time. Dan also mentioned the
> : same in a response to you back in April of this year
> : (http://common-vulnerabilities-and-exposures-cve-board.1128451.n5.nabble.com/Re-CVENEW-New-CVE-CANs-2017-04-23-19-00-count-1-td722.html#a727).
>
> You see, I don't have to read that. Kurt mailed in Dec 2015, you say Dan
> replied to me in April 2017. Use your numbers. My point stands about
> MITRE's promise of following up.

You can check archive.org to see the exact day it changed.
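
Once archived copies of robots.txt are in hand, finding the capture where the rule changed is a simple scan. The sketch below assumes the snapshots have already been downloaded and are ordered oldest first; the capture labels, the snapshot bodies, and the test URL are all hypothetical and do not reflect real archive.org data.

```python
from typing import Iterable, Optional, Tuple
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def first_allowed(snapshots: Iterable[Tuple[str, str]], url: str) -> Optional[str]:
    """Return the label of the first snapshot that permits `url`, else None.

    `snapshots` is a sequence of (label, robots.txt body), oldest first.
    """
    for label, body in snapshots:
        if allows(body, url):
            return label
    return None


# Hypothetical snapshots: the labels and the third body are invented.
SNAPSHOTS = [
    ("capture-1", "User-agent: *\nDisallow: /cgi-bin/\n"),
    ("capture-2", "User-agent: *\nDisallow: /cgi-bin/\n"),
    ("capture-3", "User-agent: *\nDisallow:\n"),  # empty Disallow permits everything
]

# Hypothetical URL under /cgi-bin/.
URL = "http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0001"

if __name__ == "__main__":
    print(first_allowed(SNAPSHOTS, URL))  # capture-3
```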



--

Kurt Seifried -- Red Hat -- Product Security -- Cloud
PGP A90B F995 7350 148F 66BF 7554 160D 4553 5E26 7993
Red Hat Product Security contact: [hidden email]

RE: Question about robots.txt

Waltermire, David A.
In reply to this post by jericho
Some comments below.

> : We have had multiple conversations during Board conference calls
>
> See prior mails. Until you show me a) a majority of the board was on call and
> b) the entire transcript of the call was made available to the board, this is
> exclusionary. No middle ground there.

This has been brought up a few different times by various board members. The easiest way to address this is to post upcoming plans for CVE program changes to the board list to allow for feedback by ALL board members before the changes are made. It would also be good in the future to give a week or two notice, since some board members may be on vacation or otherwise occupied. In this way, the list is the actual place for discourse. The board calls then become supplemental for deeper conversation if individuals want it.

>
> : regarding the fact that there are many RESERVED CVE IDs within the
> : current CVE list, and there was a general consensus that they should be
>
> So when I have 5 or 6 board members in chat that say "MITRE did wrong", we
> can also consider that a general consensus?

A second benefit of using email lists for feedback is that consensus, the lack of sustained objection, is easily discernible by all involved.

> : As a first step in tackling the larger cleanup effort, we began
> : contacting CNAs in March of this year to determine what CVE IDs they had
> : not used from their previously assigned CVE ID blocks. All but a couple
>
> Did you CC the CNA list? If not, why not? I have a pretty solid case history of
> bringing CNA issues to you directly. It is clear that some of us have a vested
> interest in this and were proactive in coming to you with issues. Did you
> forget to include those same people in said discussions, publicly or privately?

CCing the CNA list would be a good thing to do here.

>
> : first step and he was not clear on exactly what the end result would be.

I wasn't clear. This highlights the need for more transparency and discussion of these things on the board list giving plenty of time to comment.

> We saw the email about the one day push. And... can we go back to my mail?
> I really don't know how to say this any more simply, I thought the original
> mail was clear.
>
> - The Board got ONE DAY warning.
> - NIST spoke up and said "whoa wait".
> - We now see you had a phone call on the back of the NIST mail
> - You pushed the 3k release by ONE day
> - You told the public via a CVE mail that few in our industry read
> - I said that wasn't sufficient for public warning

See previous comments.

>
> Then you send a patronizing mail "innocently" (ignorantly) questioning me on
> all of this. Not sure where this attempt at gaslighting is coming from, other
> than you forget who the board is. The concern and questions are legitimate,
> speak directly to "stakeholders", and are of critical interest/impact to the CVE
> offering as affects the industry.
>
> : Dave can correct me if I'm wrong, but we didn't interpret the comment
> : "ignored by the NVD" to mean that the NVD team would not publish the
> : REJECT CVE entries. Our interpretation is that the NVD team does not see
> : a need to analyze the entries and will simply publish them as is, with
> : no significant effort on their part.

Any CVE entries that are rejected are not analyzed. The entries do appear in our feeds.

>
> Seriously? This is the biggest argument to stop these back-alley phone
> conversations and to keep things on list, where we see a record of what was
> said. This is how NIST replied to the board, in all the glory:
>
>    We have been able to confirm that the rejected CVEs will be ignored by
>    the NVD. Thanks for being flexible by pushing this back a day.

I regret not being more clear and specific in my email. Allowing more time to discuss these types of issues will allow for more robust dialog, which is needed in these cases.

>
> You did not "interpret" the comment "ignored by the NVD" to mean they
> would not publish the REJECT CVE entries?
>
> Well guess what. Several of us explicitly read that statement to mean they
> would ignore them... completely. As in, "don't exist, at all".
>
> As in, other solutions are now involving Dev to figure out how to handle
> 3k+ new entries, on top of many hundreds of existing, to deliver to their
> customers. These are customers who turned their back on CVE, but still have
> an "irrational compliance requirement" (a common term from customers) to
> ensure that they can explain EVERY single CVE ID that comes up. So mature
> VDBs have to handle these REJECTs, pass it on to clients in a format they can
> easily process, and in turn offer to auditors.

When making changes like the one being discussed, there is potential impact to the larger ecosystem of consumers. This impact is probably the most important reason why these issues need to be discussed with the board.

Regards,
Dave