Autonomous System Taxonomy Repository

The AS taxonomy repository provides:

  1. A classification of each AS into one of the following classes:
    1. large ISPs
    2. small ISPs
    3. customer networks
    4. universities
    5. Internet exchange points
    6. network information centers
  2. The following set of AS attributes calculated for every AS:
    1. organization description records
    2. advertised IP prefixes
    3. inferred relationship with neighboring ASes

The classification in item 1 results from applying machine learning techniques to the attributes in item 2, which are extracted from CAIDA, RouteViews, and Internet Routing Registries data. Users can view the repository as a source of the Internet AS-level topology enriched with information related to the Internet economy. As such, it aims to promote deeper analysis of the macroscopic Internet structure and to inspire more adequate Internet modeling. The repository supplements the paper "Revealing the Autonomous System Taxonomy: The Machine Learning Approach".

Publication

Revealing the Autonomous System Taxonomy: The Machine Learning Approach (pdf)
Xenofontas Dimitropoulos, Dmitri Krioukov, George Riley, KC Claffy
Passive and Active Measurements Workshop (PAM), Mar. 2006.

Our work was recognized with the PAM best paper award. The award is based on the new dataset that we release for community use in subsequent research, and this page makes that dataset available.

Data Sources and AS attributes

From this page you can download the following:

AS taxonomy and attributes

The file as2attr.tgz includes the set of AS attributes we extracted from CAIDA, RouteViews, and Internet Routing Registries data. Each line contains the following tab-delimited fields:

  1. AS number
  2. organization description record
  3. number of inferred providers
  4. number of inferred peers
  5. number of inferred customers
  6. equivalent number of /24 prefixes covering all the advertised IP space
  7. number of advertised IP prefixes
  8. inferred AS class

The classes are encoded with the following acronyms: "t1" for large ISPs, "t2" for small ISPs, "edu" for universities, "ix" for IXPs, "nic" for NICs, "comp" for customer networks, and "abstained" for ASes for which the algorithm did not make a prediction.
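The field layout above maps naturally onto a small record type. The following Python sketch is only an illustration, not part of the released dataset: the names `ASRecord`, `CLASS_NAMES`, and `parse_as2attr_line` are hypothetical, and it assumes that AS numbers and counts are plain integers and that the /24-equivalent field may be fractional (hence parsed as a float).

```python
from typing import NamedTuple

# Class acronyms as documented on this page, mapped to readable names.
CLASS_NAMES = {
    "t1": "large ISP",
    "t2": "small ISP",
    "edu": "university",
    "ix": "Internet exchange point",
    "nic": "network information center",
    "comp": "customer network",
    "abstained": "no prediction",
}


class ASRecord(NamedTuple):
    """One line of the attribute file (hypothetical record type)."""
    asn: int
    description: str
    providers: int
    peers: int
    customers: int
    slash24_equiv: float  # assumption: may be fractional, so parsed as float
    prefixes: int
    as_class: str


def parse_as2attr_line(line: str) -> ASRecord:
    """Split one tab-delimited line into its eight documented fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-delimited fields, got {len(fields)}")
    asn, desc, prov, peers, cust, eq24, npfx, cls = fields
    if cls not in CLASS_NAMES:
        raise ValueError(f"unknown AS class code: {cls!r}")
    return ASRecord(int(asn), desc, int(prov), int(peers),
                    int(cust), float(eq24), int(npfx), cls)
```

Applied to a made-up line such as `"64512\tExample Org\t2\t0\t5\t16\t3\tt2"` (not taken from the dataset), the function returns a record whose `as_class` of `"t2"` maps to "small ISP" in `CLASS_NAMES`. If the real file deviates from these assumptions, for example by carrying a header line, the parser would need adjusting.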

AS relationships

The file as_rel.tgz includes the AS graph annotated with inferred AS relationships. Our inference is based on heuristics we developed in our previous work. In particular, customer-to-provider relationships are inferred using the methodology of the paper "Inferring AS Relationships: Dead End or Lively Beginning?", while peer-to-peer links are inferred using the methodology of the paper "AS Relationships: Inference and Validation", which is currently under submission (we hope to post a link here soon). Each line in as_rel.txt is a triplet A B C, where A B denotes an AS link and C the AS relationship: if C == 0, A B is a p2p link; if C == -1, A is a customer of B; and if C == 1, A is a provider of B. Each AS link is listed twice, as A B and as B A. Note that a few of the AS numbers listed in as_rel.txt are missing from as2attr.txt, since the latter includes only the AS numbers for which all six attributes were available.
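As an illustration of the triplet encoding, the Python sketch below builds per-AS provider, customer, and peer sets from such lines. It is a minimal example under stated assumptions, not the authors' tooling: the function name `load_relationships` is hypothetical, the three values are assumed to be whitespace-separated integers, and lines starting with `#` are skipped in case the file carries comments. Because every link appears twice, each line is recorded from both endpoints' perspectives and sets absorb the duplicate.

```python
from collections import defaultdict


def load_relationships(lines):
    """Build provider/customer/peer sets per AS from 'A B C' triplets."""
    providers = defaultdict(set)  # providers[x] = ASes that x buys transit from
    customers = defaultdict(set)  # customers[x] = ASes that buy transit from x
    peers = defaultdict(set)      # peers[x]     = ASes that x peers with

    for line in lines:
        parts = line.split()
        # Assumption: blank, malformed, or '#'-prefixed lines can be skipped.
        if len(parts) != 3 or parts[0].startswith("#"):
            continue
        a, b, rel = int(parts[0]), int(parts[1]), int(parts[2])
        if rel == 0:       # A B is a p2p link
            peers[a].add(b)
            peers[b].add(a)
        elif rel == -1:    # A is a customer of B
            providers[a].add(b)
            customers[b].add(a)
        elif rel == 1:     # A is a provider of B
            customers[a].add(b)
            providers[b].add(a)
        else:
            raise ValueError(f"unexpected relationship code: {rel}")
    return providers, customers, peers
```

Passing an open file handle for as_rel.txt as `lines` should work, since the function only iterates over lines. The resulting set sizes can be compared with the provider, peer, and customer counts in as2attr.txt, keeping in mind that a few ASes in as_rel.txt have no entry there.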

Contact Information

Xenofontas Dimitropoulos (fontas [you know what] ece.gatech.edu)
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
Last Modified: Oct. 10, 2005
