Comp.os.research: Frequently answered questions [2/3: l/m 13 Aug 1996]

frequent topics of discussion on the operating systems research group
Archive-name: os-research/part2
Version: $Revision: 1.22 $
Posting-Frequency: monthly
Last-Modified: Tue Aug 13 21:03:28 1996

		Answers to frequently asked questions
		  for comp.os.research: part 2 of 3

		       Copyright (C) 1993--1996
			   Bryan O'Sullivan


1.     Available software
1.1.   Where can I find Unix process checkpointing and restoration packages?
1.2.   What threads packages are available for me to use?
1.3.   Can I use distributed shared memory on my Unix system?
1.4.   Where can I find operating systems distributions?
1.4.1. Distributed systems and microkernels
1.4.2. Unix lookalikes
1.4.3. Others

2.     Performance and workload studies
2.1.   TCP internetwork traffic characteristics
2.2.   File system traces
2.3.   Modern Unix file and block sizes
2.3.1. File sizes
2.3.2. Block sizes
2.3.3. Inode ratios

3.     Papers, reports, and bibliographies
3.1.   From where are papers for distributed systems available?
3.2.   Where can I find other papers?
3.3.   Where can I find bibliographies?

4.     General Internet-accessible resources
4.1.   Wide Area Information Service (WAIS) and World-Wide Web (WWW) servers
4.2.   Refdbms---a distributed bibliographic database system
4.3.   Willow -- the information looker-upper
4.4.   Computer science bibliographies and technical reports
4.5.   The comp.os.research archive
4.6.   Miscellaneous resources

5.     Disclaimer and copyright

Subject: [1] Available software
From: Available software

This section covers various software packages, operating systems
distributions, and miscellaneous other such items which may be of
interest to the operating systems research community.  If you have
written, or know of, some software which you believe would be of
fairly wide interest, please get in touch with the FAQ maintainer with
a view to having a short spiel and availability information included.

Subject: [1.1] Where can I find Unix process checkpointing and restoration packages?
From: Available software

- [93-01-21-10-18.30] The Condor system is available via anonymous ftp
  from <URL:>.  Condor works entirely at user
  level [no kernel modifications required] but doesn't currently
  support interprocess communication, signals, or fork().  Definitely
  worth a look.

- Bennet S Yee implemented a `mostly portable' checkpoint and restore
  package back around 1987.  When the programmer invokes the
  checkpoint procedure, it saves the state to a file; when a second
  process with the same program (but with different arguments) is
  started which calls the restore procedure, it reads the old state
  from the file.  Available via anonymous ftp from
  This package is known to work for Pmaxen, Sun4's, Sun3's, IBM RTs,
  and VAXen.  Porting it to a new architecture should be relatively
  simple -- look at the README file.
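As a rough illustration of the checkpoint-then-restore pattern described
above, the sketch below saves an explicit state dictionary to a file and
picks it up again on the next run.  This is not Bennet Yee's package,
which saves the state of the process itself; here only application-level
state is saved, using Python's pickle module.  The names checkpoint,
restore, and run_with_checkpoints, the layout of the state dictionary,
and the checkpoint path are all hypothetical.

```python
import os
import pickle
import tempfile


def checkpoint(state, path):
    """Save `state` to `path`, replacing any older checkpoint atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())      # make sure the bytes reach the disk
    os.replace(tmp_path, path)    # rename over the old checkpoint


def restore(path, default):
    """Return the state saved at `path`, or `default` if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run_with_checkpoints(path, max_steps, goal=10):
    """Sum the integers below `goal`, checkpointing after every step.

    `max_steps` limits how much work one invocation does, standing in
    for a process that is stopped part-way through its job.
    """
    state = restore(path, {"next": 0, "total": 0})
    for _ in range(max_steps):
        if state["next"] >= goal:
            break
        state["total"] += state["next"]
        state["next"] += 1
        checkpoint(state, path)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "job.ckpt")
        print(run_with_checkpoints(ckpt, max_steps=4))    # interrupted run
        print(run_with_checkpoints(ckpt, max_steps=100))  # resumed run
```

Writing to a temporary file and renaming it means an interrupted
checkpoint never leaves a half-written file behind; that is a design
choice of this sketch, not something the packages above are known to do.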

Subject: [1.2] What threads packages are available for me to use?
From: Available software

Now that POSIX has arrived at a standard threads interface, it is
expected that all major Unix vendors will soon release conformant
threads packages.  Currently, vendor-supplied threads packages vary
widely in the interfaces they provide.  Some vendors' packages conform
to various drafts of the POSIX standard, while others provide their
own interfaces.

OS/2, Windows NT and Windows 95 all provide threads interfaces.  None
conforms to the POSIX standard, and neither IBM nor Microsoft has
signalled any intention to provide conformant threads interfaces.

- Michael T. Peterson <> has written a POSIX and DCE
  threads package, called PCthreads, for Intel-based Linux systems.
  See <URL:> for more information.

- Christopher Provenzano <> has written a portable
  implementation of draft 8 of the IEEE Pthreads standard.  See
  <URL:> for further
  details, or fetch the software itself from
  <URL:>.  Currently supported are
  i386/i486/Pentium processors running NetBSD 1.0, FreeBSD 1.1, Linux
  1.0, and BSDi 1.1; DECstations running Ultrix-4.2; SPARCstations
  running SunOS 4.1.3; and HP/PA machines running HP/UX-9.03.

  As far as I can see, development of this library has halted (at
  least temporarily), and it still contains many serious bugs.

- Georgia Tech's OS group has a fairly portable user-level threads
  implementation of the Mach Cthreads package.  It is called Cthreads,
  and can be found at
  It also contains the Falcon integrated monitoring system.

  It currently runs under SunOS 4.1.X, Irix 4.0.5, Irix 5.3, AIX
  3.2.5, Linux 1.0 and higher, and KSR1 and KSR2.  It is fairly easy
  to port to other architectures.  Current ports in progress are
  Solaris 2.4 and AIX 4.X.

- The POSIX / Ada-Runtime Project (PART) has made available an
  implementation of draft 6 of the POSIX 1003.4a Pthreads
  specification, which runs under SunOS 4.x; the current release is
  version 1.20.  Available using anonymous ftp from

- Elan Feingold has written a threads package called ethreads; I don't
  know anything about it, other than that it is available from

- Stephen Crane has written a `fairly portable' threads package, which
  runs under Sun 3, Sun 4, MIPS/RISCos, Linux, and 386BSD.  It is
  available via anonymous ftp from
  <URL:>, with documentation in
  the same directory named

- QuickThreads is a toolkit for building threads packages, written by
  David Keppel.  It is available via anonymous ftp from
  <URL:>, with an
  accompanying tech report at
  The code as distributed includes ports for the Alpha, x86, 88000,

- On CONVEX SPP Exemplar machines there is a Compiler Parallel Support
  Library (CPSlib), a library of thread management and synchronisation
  routines.  CPSlib is not compatible with anything else, but the
  interface is sufficiently similar to the Solaris threads or pthreads
  interface to allow straight porting.  One special feature of CPSlib
  is the (possible) distinction between "symmetric" and "asymmetric"

A small number of vendors provide DCE threads packages for various
Unix systems.

Subject: [1.3] Can I use distributed shared memory on my Unix system?
From: Available software

- CRL is a simple all-software distributed shared memory system
  intended for use on message-passing multicomputers and distributed
  systems.  CRL 1.0 can be compiled for use on the MIT Alewife
  Machine, Thinking Machine's CM-5, and networks of Sun workstations
  running SunOS 4.1.3 communicating with one another using TCP and
  PVM.  Because CRL requires no functionality from the underlying
  hardware, compiler, or operating system beyond that necessary to
  send and receive messages, porting CRL to other platforms should
  prove to be straightforward.

  General information about CRL can be found at
  <URL:>.  The CRL 1.0 source
  distribution (sources for CRL 1.0 and several applications, user
  documentation, and a postscript version of a paper about CRL to
  appear in SOSP later this year) is available at

- Ron Minnich <> has implemented a
  distributed shared memory system called MNFS, which is a modified
  version of NFS and runs alongside NFS in the kernel.

  Performance is good; page faults under FreeBSD 2.0R run at about the
  same speed as NFS (~5.9 milliseconds per page).  If you need to
  update a page from one host to many clients, it can be done at a
  cost of 1.2 milliseconds or so per client.  This scales: networks of
  128 nodes running MNFS have been set up, and times should improve
  over faster LANs than Ethernet.

  The MNFS programming model uses mmap'ed files.  Programs map files
  in and then use them as ordinary memory.  Cache consistency of a
  page is maintained by the MNFS servers, ensuring that there is only
  one writeable copy in the network at a time.  The model is not
  strongly coherent; read-only copies of a page are only refreshed by
  an explicit action on the part of the holder of a writeable page
  (using msync).  For those who don't like this style of programming,
  a parallel C compiler has been retargeted to use MNFS on clusters
  and networks of computers running Condor.  Both performance and
  scalability matched explicitly mmap-coded systems.

  The system has been implemented on Sunos 4.1.x, Solaris 2.2 and 2.3,
  IRIX 5.2 and 5.3, and AIX 3.2.  All of these were legally
  encumbered, so the FreeBSD version is currently the only
  freely-available implementation.

  MNFS is available from <>, and may be
  installed either as a set of diffs to the FreeBSD 2.0.5R kernel, or
  installed in-place.  Also included in this directory is a slightly
  out-of-date paper on MNFS, and a more current manual.

  A Linux port of MNFS is in the works.
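The mmap-and-msync style of programming described for MNFS above can be
sketched with Python's standard mmap module.  This is only an
illustration of the calls involved: it runs against an ordinary local
file, and none of the MNFS consistency behaviour is reproduced.  The
function names and the file layout are hypothetical.  On Unix,
mmap.flush() corresponds to the msync call mentioned above.

```python
import mmap
import os
import tempfile


def create_shared_file(path, pages=1):
    """Create a zero-filled file to act as the shared region."""
    with open(path, "wb") as f:
        f.write(b"\0" * (mmap.PAGESIZE * pages))


def writer_update(path, payload):
    """Map the file, store `payload` at offset 0, then push it out."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:
            if len(payload) > len(region):
                raise ValueError("payload larger than mapped region")
            region[:len(payload)] = payload   # ordinary memory write
            region.flush()                    # explicit publish step


def reader_fetch(path, length):
    """Map the file read-only and copy out the first `length` bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            return bytes(region[:length])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        shared = os.path.join(d, "shared.dat")
        create_shared_file(shared)
        writer_update(shared, b"page contents")
        print(reader_fetch(shared, 13))
```

The point of the sketch is the division of labour: the writer treats
the mapping as ordinary memory and decides when to publish, which is
the explicit action the text says refreshes read-only copies.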

Subject: [1.4] Where can I find operating systems distributions?
From: Available software

This section covers the availability of several well-known systems;
the only criterion for inclusion of a system here is that it be of
interest to some segment of the OS research community (commercial
systems will be accepted for inclusion, so long as they are pertinent
to research).

Subject: [1.4.1] Distributed systems and microkernels
From: Available software

See part one of the FAQ for further information on some of the systems
listed below.

- [93-03-31-22-49.53] ACE is the distribution, support and sales
  channel for Amoeba.  `Due to overwhelming response from non-profit
  organisations wishing to obtain Amoeba for their research
  activities', VU is offering Amoeba 5.2 to research institutions for
  more or less free (via ftp at no charge, or on tape for $500 on
  Exabyte or $800 on QIC-24).  Amoeba currently supports 68020 and
  68030-based VME board machines, as well as i386- and i486-based AT
  PCs and Sun 3 and 4 machines.

  For further information on `commercial' Amoeba, you can contact ACE
  by email at <>, by phone at +31 20 664 6416, or by
  fax at +31 20 675 0389.  Universities interested in obtaining a
  license should send mail to <>, or fax to
  +31 20 642 7705.

- Chorus Systemes has special programmes for universities interested
  in using Chorus.  For more information on the offerings available,
  conditions, and other details, get the following files:
  - <URL:>
  - <URL:>
  - <URL:>

- The Cronus object-oriented distributed system may be obtained via
  ftp from <URL:>; email
  <> for details of the account name and
  password.  Before attempting to get the Cronus distribution, you
  must obtain, via anonymous ftp,
  <URL:>.  Maintenance,
  hotline support, and training for Cronus are available from BBN.
  Send email to the above address for information on these, or on
  obtaining a commercial license.

- Flux is a Mach-based toolkit for developing operating systems; you
  can find more information about it on the Web at

- Horus is available for research use; contact Ken Birman
  <> or Robbert van Renesse
  <> for details.

- Isis has not been publicly available since 1989, but may (I'm not
  sure) still be obtained using anonymous ftp from
  <URL:> or <URL:>.  After 1989,
  the code was picked up by Isis Distributed Systems, which has
  subsequently developed and supported it.  The commercial version of
  Isis (available `at very low cost' to academic institutions) is
  available from the company.  Email <> for
  information, or call +1-212-979-7729 or +1-607-272-6327.

- Information on obtaining the latest Mach 4 distribution is available
  from the University of Utah's Mach 4 pages, at

- The Plan 9 distribution is now commercially available for $350; it
  consists of a two-volume manual, a CD-ROM with all the sources, and
  four PC diskettes comprising a binary-only installation of a fairly
  complete version of the system that runs on a PC.  For more
  information, <URL:>; this site
  houses ordering information, a browsable copy of all the
  documentation, and the PC binary distribution.

  Kernels exist for the Sun SLC, Sun4Cs of various types,
  NeXTstations, MIPS Magnum 3000, SGI 4D series, AT&T Safari, `a whole
  bunch of' PCs, and the Gnot.

  Sydney University Basser Department of Computer Science has a port
  of Plan 9 underway to the DEC Alpha at the moment.  A port to the
  Sun 3 has been completed.  Contact <> for

  The Plan 9 user mailing list may be subscribed to by sending mail to

- QNX is available for academic applications through an education
  support programme run by QNX Software Systems, whereby QNX systems
  can be obtained for educational purposes at very low cost.  For
  commercial and education availability and pricing, contact:
	QNX Software Systems		QNX Software Systems
	175 Terrence Matthews Cr.	Westendstr. 19
	Kanata, Ontario K2M 1W8		6000 Frankfurt am Main 1
	Canada				Germany

	1 800 363 9001			+49 69 9754 6156 x299
	+1 (613) 591 0931
	+1 (613) 591 3579 (fax)		+49 69 9754 6110 (fax)
  Versions after 4.2 of QNX run on the i386 and later processors, with
  a 16-bit kernel included for i286 machines.  Native optimisations
  and a compiler for the Pentium are also included.  Further marketing
  information can be obtained on the World Wide Web from

- The 1.1 Research Distribution of the Spring distributed object
  oriented operating system is available.  Spring is a highly modular,
  object-oriented operating system, which is focused around a uniform
  interface definition language (IDL).  The system is intrinsically
  distributed, with all system interfaces being accessible both
  locally and remotely.

  The 1.1 Research Distribution adds a number of fixes and
  improvements, including a Spring-Java IDL system that facilitates
  writing Java applets that can talk across Spring IDL interfaces.

  The Spring SRD 1.1 Binary CDROM is $75 to Universities and $750 to
  commercial research institutions.  This includes all of the software
  and documentation necessary for installing, running, and developing
  new system modules and applications in Spring.  All binaries, IDL
  files, development tools, key exemplary sources, and course teaching
  materials are included.  A standard full source license and source
  CDROM is also available for $100 to Universities and $1000 to
  commercial research institutions.

  For more details and ordering information, see

- [93-02-07-16-03.48] The Sprite Network Operating System is available
  on CD-ROM.  The disc contains the source code and documentation for
  Sprite, a research operating system developed at the University of
  California, Berkeley.  All the research papers from the Sprite
  project are also included on the disc.  The software on this disc
  is primarily intended for research purposes, and is not really
  intended to be used as a production system.  Boot images are
  provided for Sun SPARCstations and DECstations.  The CD-ROM is in
  ISO-9660 format with Rock Ridge extensions.  The disc contains about
  550 megabytes of software.

  You can get an overview of the Sprite Project, and a complete list
  of what is on this disc, by anonymous ftp from

  If you would like a CD-ROM please send $25.  Add $4.95 if you would
  like a caddy too.  S&H is $5 (per order, not per disc) for
  US/Can/Mex, and $10 for overseas.  If you live in California, please
  add sales tax.  You can send a check or money order, or you can
  order with Mastercard/Visa/AmEx.
	Bob Bruce <>
	Walnut Creek CDROM
	1547 Palos Verdes Mall, Suite 260
	Walnut Creek, CA 94596
	United States

	   1 800 786-9907 (USA only)
	  +1 510 947-5996
	  +1 510 947-1644 (fax)

- VSTa is a copylefted system written by Andrew Valencia
  <> which uses ideas from several research
  operating systems in its implementation.  It is currently in an
  `experimental but usable' state, and supports `lots of' POSIX, and
  runs on a number of different PC configurations.  For further
  information, send mail to <>, or ftp to

[Chorus, Clouds?, Choices?]

Subject: [1.4.2] Unix lookalikes
From: Available software

- FreeBSD is available via ftp from
  <URL:>, and
  <URL:>.  The latest
  version is derived from 4.4BSD Lite, and contains many extensions.
  See <URL:> for further information.

- NetBSD is available via ftp from
  <URL:>, and is also derived from
  4.4BSD Lite.  See <URL:> for more information.

- Linux is available via anonymous ftp from
  <URL:>, <URL:>,
  and <URL:>.  It is a freely-distributable
  System V compatible Unix, and is covered by the GNU General Public
  License.  Linux runs on almost all PCs with i386 or better CPUs and at
  least 4 megabytes of memory.  See <URL:> for further
  information.

- 386BSD is available via ftp from
  <URL:> or
  <URL:>.  It lies mid-way between
  4.3BSD Reno and 4.4BSD internally, and contains no AT&T-copyrighted
  code.  386BSD runs on ISA bus PCs with i386 or better CPUs.  Use of
  386BSD is not recommended, since it is unstable and has long since
  been superseded by FreeBSD and NetBSD.

- The Hurd is the GNU operating system, being written by Michael
  Bushnell.  It is based on Mach 3.0, and should be available on most
  systems to which Mach has been ported.  A preliminary runnable image
  may be fetched from
  <URL:>.  Trent
  A. Fisher <> runs an unofficial Hurd page
  at <URL:>.

- Lites is a free 4.4BSD-based Unix server which runs on top of Mach.
  Lites provides binary compatibility with 4.4BSD, NetBSD (0.8, 0.9,
  and 1.0), FreeBSD (1.1.5 and 2.0), 386BSD, UX (4.3BSD) and Linux on
  the i386 platform.  It has also been ported to the pc532, and
  PA-RISC. Preliminary ports to the R3000 and Alpha processors have
  also been made.  For more information, see the Lites home page at
  <URL:>, and see also

Subject: [1.4.3] Others
From: Available software

[93-03-18-10-19.02] Microsoft is making sources of Windows NT
available under license to universities and research laboratories.
You should have the appropriate officials contact
<> to get started on this process.

Patrick Bridges' operating systems home page at
<URL:> is an
excellent source of information on a variety of other operating
systems.

Subject: [2] Performance and workload studies
From: Performance and workload studies

This section covers various different publicly-available traces and
studies, libraries and source distributions, which may be of use.

Subject: [2.1] TCP internetwork traffic characteristics
From: Performance and workload studies

- The Internet Traffic Archive is a moderated repository to support
  widespread access to traces of Internet network traffic.  The traces
  can be used to study network dynamics, usage characteristics, and
  growth patterns, as well as providing the grist for trace-driven
  simulations.  The archive is also open to programs for reducing raw
  trace data to more manageable forms, for generating synthetic
  traces, and for analyzing traces.  The archive is available on the
  Web at <URL:>.
  There you will find a description of the archive, its associated
  mailing lists, the moderation policy and submission guidelines, and
  the contents of the archive (traces and programs).

- [92-10-20-15-04.39] Peter Danzig and Sugih Jamin of USC have made
  available a report and a source library which simulates realistic
  day-to-day network traffic between nodes.  The library, tcplib, `is
  motivated by our observation that present-day wide-area tcp/ip
  traffic cannot be accurately modeled with simple analytical
  expressions, but instead requires a combination of detailed
  knowledge of the end-user applications responsible for the traffic
  and certain measured probability distributions'.

  The technical report and the source library it describes are
  available via anonymous ftp from
  <URL:>.  All you need to
  transfer to use the library are: README, brkdn_dist.h, tcpapps.h,
  tcplib.1, and one of libtcp* that matches your setup.  You need
  tcplib.tar.Z only if you must generate the library yourself.  The
  file is the PostScript version of the report.  The
  authors may be contacted at <>.

- [93-08-09-15-15.54] Vern Paxson of Lawrence Berkeley Laboratories
  has a report available via anonymous ftp which describes analytic
  models for wide-area TCP connections based upon a set of wide-area
  traffic traces.  The report may be obtained from

- [93-05-13-10-54.09] Vern Paxson also has made available another
  report, <URL:>, which
  provides an analysis of the growth trends of a medium-sized research
  laboratory's wide-area TCP connections over a period of more than
  two years.

Subject: [2.2] File system traces
From: Performance and workload studies

- Randy Appleton <> has a set of filesystem traces
  which detail every operation performed during a period of more than
  a week (several hundred thousand events).  Timestamps on the traces
  are accurate to under a millisecond.  For more details, contact the
  author, or visit <URL:>.

- Chris Ruemmler has done a study on low-level disk access patterns
  for a workstation, a server, and a time-shared system which appeared
  in the Winter 1993 USENIX proceedings.  A copy may be obtained via
  anonymous ftp from <URL:>.

- Stephen Russell <> has instrumented the SunOS 4.1.x
  kernel running on Sun 3 machines.  The system allows time-stamped
  event records to be obtained from various points in the kernel.
  Events can be categorised (eg, paging, file system, etc), and are
  read via pseudo-devices.  Ioctl calls allow substreams to be
  enabled/disabled, buffer status checked, etc.  An external high
  resolution timer is used for timestamping.

- [93-05-09-09-23.32] The traces used in `Measurements of a
  distributed file system' (SOSP 1991) may be obtained from

Subject: [2.3] Modern Unix file and block sizes
From: Performance and workload studies

The following sections are lifted more or less verbatim from a number
of traces which were co-ordinated and analysed by Gordon Irlam
<>.  The numbers quoted below are based on Unix
file size data for 12 million files, residing on 1000 file systems,
with a total size of 250 gigabytes.

Further information may be obtained on the World Wide Web at

Subject: [2.3.1] File sizes
From: Performance and workload studies

There is no such thing as an average file system.  Some file systems
have lots of little files.  Others have a few big files.  However as a
mental model the notion of an average file system is invaluable.

The following table gives a break down of file sizes and the amount of
space they consume.

   file size       #files  %files  %files   disk space  %space  %space
(max. bytes)                         cum.         (MB)            cum.
           0       147479     1.2     1.2          0.0     0.0     0.0
           1         3288     0.0     1.2          0.0     0.0     0.0
           2         5740     0.0     1.3          0.0     0.0     0.0
           4        10234     0.1     1.4          0.0     0.0     0.0
           8        21217     0.2     1.5          0.1     0.0     0.0
          16        67144     0.6     2.1          0.9     0.0     0.0
          32       231970     1.9     4.0          5.8     0.0     0.0
          64       282079     2.3     6.3         14.3     0.0     0.0
         128       278731     2.3     8.6         26.1     0.0     0.0
         256       512897     4.2    12.9         95.1     0.0     0.1
         512      1284617    10.6    23.5        566.7     0.2     0.3
        1024      1808526    14.9    38.4       1442.8     0.6     0.8
        2048      2397908    19.8    58.1       3554.1     1.4     2.2
        4096      1717869    14.2    72.3       4966.8     1.9     4.1
        8192      1144688     9.4    81.7       6646.6     2.6     6.7
       16384       865126     7.1    88.9      10114.5     3.9    10.6
       32768       574651     4.7    93.6      13420.4     5.2    15.8
       65536       348280     2.9    96.5      16162.6     6.2    22.0
      131072       194864     1.6    98.1      18079.7     7.0    29.0
      262144       112967     0.9    99.0      21055.8     8.1    37.1
      524288        58644     0.5    99.5      21523.9     8.3    45.4
     1048576        32286     0.3    99.8      23652.5     9.1    54.5
     2097152        16140     0.1    99.9      23230.4     9.0    63.5
     4194304         7221     0.1   100.0      20850.3     8.0    71.5
     8388608         2475     0.0   100.0      14042.0     5.4    77.0
    16777216          991     0.0   100.0      11378.8     4.4    81.3
    33554432          479     0.0   100.0      11456.1     4.4    85.8
    67108864          258     0.0   100.0      12555.9     4.8    90.6
   134217728           61     0.0   100.0       5633.3     2.2    92.8
   268435456           29     0.0   100.0       5649.2     2.2    95.0
   536870912           12     0.0   100.0       4419.1     1.7    96.7
  1073741824            7     0.0   100.0       5004.5     1.9    98.6
  2147483647            3     0.0   100.0       3620.8     1.4   100.0

A number of observations can be made:
  - the distribution is heavily skewed towards small files
  - but it has a very long tail
  - the average file size is 22k
  - pick a file at random: it is probably smaller than 2k
  - pick a byte at random: it is probably in a file larger than 512k
  - 89% of files take up 11% of the disk space
  - 11% of files take up 89% of the disk space
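The observations above can be checked directly against the table.  The
sketch below copies the #files and disk space columns and recomputes
the mean file size, the bucket holding the median file, the bucket
holding the median byte, and the 89%/11% split.  It assumes that the
table's disk space unit means 2**20 bytes; the function names are
hypothetical.

```python
# (max. file size in bytes, number of files, disk space in MB),
# copied from the table above.
ROWS = [
    (0, 147479, 0.0), (1, 3288, 0.0), (2, 5740, 0.0), (4, 10234, 0.0),
    (8, 21217, 0.1), (16, 67144, 0.9), (32, 231970, 5.8),
    (64, 282079, 14.3), (128, 278731, 26.1), (256, 512897, 95.1),
    (512, 1284617, 566.7), (1024, 1808526, 1442.8),
    (2048, 2397908, 3554.1), (4096, 1717869, 4966.8),
    (8192, 1144688, 6646.6), (16384, 865126, 10114.5),
    (32768, 574651, 13420.4), (65536, 348280, 16162.6),
    (131072, 194864, 18079.7), (262144, 112967, 21055.8),
    (524288, 58644, 21523.9), (1048576, 32286, 23652.5),
    (2097152, 16140, 23230.4), (4194304, 7221, 20850.3),
    (8388608, 2475, 14042.0), (16777216, 991, 11378.8),
    (33554432, 479, 11456.1), (67108864, 258, 12555.9),
    (134217728, 61, 5633.3), (268435456, 29, 5649.2),
    (536870912, 12, 4419.1), (1073741824, 7, 5004.5),
    (2147483647, 3, 3620.8),
]

MB = 2 ** 20      # assumption: one table megabyte is 2**20 bytes
FILES, SPACE = 1, 2   # column indexes into each row


def first_bucket_reaching(rows, column, fraction):
    """Size bucket where the running share of `column` reaches `fraction`."""
    total = sum(row[column] for row in rows)
    running = 0.0
    for row in rows:
        running += row[column]
        if running >= fraction * total:
            return row[0]
    return rows[-1][0]


def share_up_to(rows, column, max_size):
    """Fraction of `column` held by buckets no larger than `max_size`."""
    total = sum(row[column] for row in rows)
    return sum(row[column] for row in rows if row[0] <= max_size) / total


total_files = sum(row[FILES] for row in ROWS)
total_mb = sum(row[SPACE] for row in ROWS)
mean_bytes = total_mb * MB / total_files

if __name__ == "__main__":
    print("files:", total_files)
    print("mean file size (bytes):", round(mean_bytes))
    print("median file is in bucket:", first_bucket_reaching(ROWS, FILES, 0.5))
    print("median byte is in bucket:", first_bucket_reaching(ROWS, SPACE, 0.5))
    print("files <= 16k:", round(100 * share_up_to(ROWS, FILES, 16384)), "%")
    print("space <= 16k:", round(100 * share_up_to(ROWS, SPACE, 16384)), "%")
```

Because the table only records power-of-two buckets, the medians come
out as bucket boundaries rather than exact sizes.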

Such a heavily skewed distribution of file sizes suggests that, if one
were to design a file system from scratch, it might make sense to
employ radically different strategies for small and large files.

The seductive power of mathematics allows us to treat a 200 byte and a
2MB file in the same way.  But do we really want to?  Are there any
problems in engineering where the same techniques would be used in
handling physical objects that span 6 orders of magnitude?

A quote from sci.physics that has stuck with me: `When things change
by 2 orders of magnitude, you are actually dealing with fundamentally
different problems'.

People I trust say they would have expected the tail of the above
distribution to have been even longer.  There are at least some files
in the 1-2G range.  They point out that DBMS shops with really large
files might have been less inclined to respond to a survey like this
than some other sites.  This would bias the disk space figures, but it
would have no appreciable effect on file counts.  The results gathered
would still be valuable because many static disk layout issues are
determined by the distribution of small files and are largely
independent of the potential existence of massive files.

(It should be noted that many popular DBMSs, such as Oracle, Sybase,
 and Informix, use raw disk partitions instead of Unix file systems
 for storing data, hence the difficulty in gathering data about them
 in a uniform way.)

Subject: [2.3.2] Block sizes
From: Performance and workload studies

The last block of a file is normally only partially occupied, and so
as block sizes are increased so too will the amount of wasted disk
space.

The following historical values for the design of the BSD FFS are
given in `Design and implementation of the 4.3BSD Unix operating
system':

fragment size   overhead
   (bytes)        (%)
      512         4.2
     1024         9.1
     2048        19.7
     4096        42.9

Files have clearly gotten larger since then; I obtained the following
figures:

fragment size   overhead
   (bytes)        (%)
      128         0.3
      256         0.6
      512         1.1
     1024         2.5
     2048         5.4
     4096        12.3
     8192        27.8
    16384        61.2

By default the BSD FFS typically uses a 1k fragment size.  Perhaps
this size is no longer optimal and should be increased.

(The FFS block size is constrained to be no more than 8 times the
 fragment size.  Clustering is a good way to improve throughput for
 FFS based file systems, but it doesn't do very much to reduce the not
 insignificant FFS computational overhead.)

It is interesting to note that even though most files are less than 2K
in size, having a 2K block size wastes very little space, because disk
space consumption is so totally dominated by large files.
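The kind of overhead tabulated in this section can be sketched as
follows.  The sketch assumes that overhead means the space lost in
partially filled final fragments, expressed as a fraction of the bytes
actually stored; the exact definition behind the tables is not stated
here, and indirect blocks and other metadata are ignored.  The function
names and the sample file sizes are hypothetical.

```python
def allocated_bytes(file_size, fragment_size):
    """Space a file occupies once rounded up to whole fragments."""
    fragments = -(-file_size // fragment_size)   # ceiling division
    return fragments * fragment_size


def fragment_overhead(file_sizes, fragment_size):
    """Wasted space as a fraction of the data actually stored."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, fragment_size) for s in file_sizes)
    return (allocated - used) / used


if __name__ == "__main__":
    # A made-up mix of small and large files, in bytes.
    sample = [100, 700, 1500, 1900, 40000, 2500000]
    for frag in (512, 1024, 2048, 4096, 8192):
        print(frag, round(100 * fragment_overhead(sample, frag), 2), "%")
```

With a mix like this the large files dominate the total, so the waste
from rounding up the small ones stays a small percentage, which is the
effect noted at the end of the section.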

Subject: [2.3.3] Inode ratios
From: Performance and workload studies

The BSD FFS statically allocates inodes.  By default one inode is
allocated for every 2K of disk space.  Since an inode consumes 128
bytes this means that by default 6.25% of disk space is consumed by

It is important not to run out of inodes since any remaining disk
space is then effectively wasted.  Despite this, allocating 1 inode for
every 2K is excessive.

For each file system studied I worked out the minimum sized disk it
could be placed on.  Most disks needed to be only marginally larger
than the size of their files, but a few disks, having much smaller
files than average, needed a much larger disk---a small disk had
insufficient inodes.

bytes per   overhead
  inode       (%)
   1024      12.5
   2048       6.3
   3072       4.5
   4096       4.2
   5120       4.4
   6144       4.9
   7168       5.5
   8192       6.3
   9216       7.2
  10240       8.3
  11264       9.5
  12288      10.9
  13312      12.7
  14336      14.6
  15360      16.7
  16384      19.1
  17408      21.7
  18432      24.4
  19456      27.4
  20480      30.5
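One way to read the method behind this table is sketched below.  It
assumes 128-byte inodes, as stated above, and assumes that the overhead
is the fraction of the minimum-sized disk that does not hold file data.
Under that reading a disk must satisfy two constraints: enough inodes
for every file, and enough room for the data once the inode area has
been set aside.  This is a reconstruction, not necessarily the exact
calculation used for the figures; the function names and the two
example file systems are hypothetical.

```python
INODE_BYTES = 128   # inode size stated in the text


def inode_fraction(bytes_per_inode):
    """Fraction of the disk taken up by statically allocated inodes."""
    return INODE_BYTES / bytes_per_inode


def min_disk_bytes(n_files, data_bytes, bytes_per_inode):
    """Smallest disk that has both enough inodes and enough data space."""
    need_for_inodes = n_files * bytes_per_inode
    need_for_data = data_bytes / (1.0 - inode_fraction(bytes_per_inode))
    return max(need_for_inodes, need_for_data)


def overhead(file_systems, bytes_per_inode):
    """Share of total minimum disk space not holding file data.

    `file_systems` is a list of (number of files, bytes of data) pairs.
    """
    data = sum(d for _, d in file_systems)
    disk = sum(min_disk_bytes(n, d, bytes_per_inode)
               for n, d in file_systems)
    return 1.0 - data / disk


if __name__ == "__main__":
    # Made-up examples: a typical file system and a small-file one.
    systems = [(1000, 20_000_000), (100_000, 150_000_000)]
    for bpi in (1024, 2048, 4096, 8192, 16384):
        print(bpi, round(100 * overhead(systems, bpi), 1), "%")
```

For a file system with average-sized files the data constraint
dominates and the overhead is just 128 divided by the bytes per inode
(12.5% at 1024, 6.25% at 2048), which matches the first two rows of
the table.  A file system full of small files hits the inode
constraint instead, which is why a few such systems pull the overhead
up as the ratio grows.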

Clearly, the current default of one inode for every 2K of data is too
small.  Earlier results suggested that allocating one inode for every
5-6k was in some sense optimal, and allocating one inode for every 8k
would only be 0.4% worse.  The new data suggests one inode for every
4k is optimal, and allocating one inode for every 8k would be 2.1%
worse.

The analysis technique I used is very sensitive to even a few file
systems with very small files.

The main source of file systems with lots of small files would appear
to be netnews servers.  The typical Usenet message would appear to be
1-2k in length.  Ignoring such file systems would drastically alter
the conclusions I reach.  If, as I believe might already be the case,
news servers are manually tuned to have a lower than normal bytes per
inode ratio, it would then be possible to justify setting the default
ratio much higher.

Clearly it is best if the file system dynamically allocates inodes; I
believe AIX does this for instance.  Systems that statically allocate
inodes should probably increase the bytes per inode ratio, but it is
not clear to exactly what value.  The engineer in me says `it is
important to play this one conservatively: stick to 6k', the artist
goes `as Chris Torek says: aesthetics, 8k'.

Subject: [3] Papers, reports, and bibliographies
From: Papers, reports, and bibliographies

Network-available documents are listed in this section.  I'd like to
see information for obtaining other sets of reports which aren't
electronically-available included here as well, at some stage.

Subject: [3.1] From where are papers for distributed systems available?
From: Papers, reports, and bibliographies


Plan 9

X kernel / Scout

Papers covering Amoeba, Choices, Chorus, Clouds, the Hurd, Guide,
Mach, Mars, NonStop, and Plan 9 are also available via anonymous ftp
from <URL:>.

[I'd like to find the authoritative home for V---Mars and NonStop are
 a bit more obscure, I think; they certainly aren't asked after much]

Subject: [3.2] Where can I find other papers?
From: Papers, reports, and bibliographies

Cache kernel

QNX [93-09-19-22-22.26]

Solaris 2.x [93-02-23-12-12.43]

Windows NT [92-09-18-11-46.16]

Subject: [3.3] Where can I find bibliographies?
From: Papers, reports, and bibliographies

Distributed shared memory

Load balancing

Mobile computing

Multimedia operating systems [94-04-15-23-29.51]

Object-oriented operating systems

Parallel and distributed I/O

Sprite network operating system

See also the section on general Internet-accessible resources.

[There's quite a lot more at <URL:>, if
 anyone wants to add more to this list.]

Subject: [4] General Internet-accessible resources
From: General Internet-accessible resources

This section contains information about a variety of services
available to the OS research community via the Internet.

Subject: [4.1] Wide Area Information Service (WAIS) and World-Wide Web (WWW) servers
From: General Internet-accessible resources

[92-09-21-16-38.23] The Loughborough University high-performance
networking and distributed systems archive may be accessed via the
World Wide Web at <URL:>.  This archive
contains, according to Jon Knight <>, the
following:

- Technical reports and papers written at LUT by the networks and
  distributed systems researchers in the Department of Computer

- Technical reports, papers and theses which have been produced at
  other sites and then made available for public electronic access.

- Software which is of use in research or which has been produced by a
  specific research project.

- Details of relevant conferences, collected from a variety of sources
  (USENET, email, flyers, etc).

- Information on ongoing research projects.

- Bibliographies that have been generated for research at LUT and also
  access to other WAIS indexed bibliographies, both at LUT and

- A list of contacts in the field, with details of their research
  interests.  This is entirely voluntary (i.e. people have agreed to
  Jon entering their details rather than him just rooting round the
  Internet to build up the information).

Bibliographies in the comp.os.research collection are accessible via
	 :version  3 
	 :ip-address ""
	 :ip-name ""
	 :tcp-port 210
	 :database-name "os-bibliographies"
	 :cost 0.00 
	 :cost-unit :free 
	 :maintainer ""
	 :description "Server created with WAIS release 8 b5
		on Jul 9 22:38:27 1992 by
		The files of type bibtex used in the index
		were: /home/ftp/pub/bib"
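
A source description like the one above is a flat list of `:key value'
pairs, so it is easy to read mechanically.  The following Python
sketch shows one way to do so.  It is an assumption-laden
illustration, not part of any WAIS distribution: the function name and
the regular expression are made up for this example, and the sample
leaves the host name empty just as it appears above.

```python
import re

# Matches ":key value", where value is either a double-quoted string
# (which may span several lines) or one bare token such as 3 or :free.
FIELD = re.compile(r':([\w-]+)\s+("(?:[^"\\]|\\.)*"|\S+)', re.DOTALL)

def parse_wais_source(text):
    """Return a dict of the fields found in a WAIS source description."""
    fields = {}
    for key, value in FIELD.findall(text):
        if value.startswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields

sample = '''
 :version  3
 :ip-name ""
 :tcp-port 210
 :database-name "os-bibliographies"
 :cost 0.00
 :cost-unit :free
'''
src = parse_wais_source(sample)
print(src["database-name"], src["tcp-port"])
```

All values come back as strings; a caller that needs the port as a
number would convert it itself.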

Subject: [4.2] Refdbms---a distributed bibliographic database system
From: General Internet-accessible resources

[92-10-01-11-39.32] The 13th alpha release of refdbms version 3,
developed by John Wilkes of the Concurrent Systems Project at
Hewlett-Packard Laboratories and Richard Golding of the Concurrent
Systems Laboratory at UC Santa Cruz, is now available.  It can be
obtained by anonymous ftp from <URL:>.
The system has been tested on Sun 3 and 4 systems running SunOS 4.1.x,
and on DECstations running Ultrix 4.1.  It is an experiment in
building weak-consistency wide-area distributed applications, and the
databases currently available for the system have a good systems

The system includes tools to query the database, to produce
bibliographies for LaTeX documents, and to enter new references into
the database.  It is part of ongoing research into wide-area
distributed information systems on the Internet.

Features include:

- Distributed databases: a reference database can be shared among
  multiple sites.  Updates can be entered at any site, and will be
  propagated to the other sites holding a replica of the database.

- Multiple databases: every database has a name, and users specify the
  order in which databases will be searched.

- Private databases: databases can be private, available site-wide, or
  they can be made available to other sites.

- Database query by keyword, author, and title word.

- Translator for refer-format databases.

- Usable with LaTeX documents: the internal refdbms format can be
  translated into a special BibTeX format.
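
The distributed-database feature listed above can be pictured with a
toy model in Python.  This is only a sketch of the general idea of
weakly consistent replicas and is NOT refdbms's actual protocol or
data format: the class, the method names, the reference tags, and the
`newest timestamp wins' rule are all assumptions made for the
illustration.

```python
class Replica:
    """One site's copy of a reference database (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # tag -> (timestamp, reference text)

    def update(self, tag, text, timestamp):
        """Enter or change a reference, keeping only the newest version."""
        current = self.entries.get(tag)
        if current is None or timestamp > current[0]:
            self.entries[tag] = (timestamp, text)

    def sync_from(self, other):
        """Pull every entry the other replica holds; newest timestamp wins."""
        for tag, (timestamp, text) in other.entries.items():
            self.update(tag, text, timestamp)

# Updates are entered at different sites, then propagated both ways.
a, b = Replica("site-a"), Replica("site-b")
a.update("ref-one", "example entry made at site A", 1)
b.update("ref-two", "example entry made at site B", 2)
a.sync_from(b)
b.sync_from(a)
print(a.entries == b.entries)
```

Until the two exchanges have happened the replicas disagree, which is
the sense in which such a system is only weakly consistent.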

An up-to-date list of bibliographies exported by various institutions
may be obtained using anonymous ftp from

Subject: [4.3] Willow -- the information looker-upper
From: General Internet-accessible resources

The University of Washington's Willow system provides a Motif-based
user interface to a heterogeneous collection of on-line bibliographic
databases.  It will compile and run on most systems which provide a
Motif library.

For further information, see the Willow home page at

Subject: [4.4] Computer science bibliographies and technical reports
From: General Internet-accessible resources

- A collection of bibliographies in various fields of computer science
  is available via anonymous ftp and the World Wide Web.  The
  bibliographies contain about 260,000 references, most of which are
  references to journal articles, conference papers or technical
  reports.  The collection has been formed by using various freely
  accessible services in the Internet (anonymous ftp, mailserver,
  wais, telnet) and converting each bibliography into a uniform BibTeX
  format.  It is organised in files containing references to a (more
  or less) specific area within computer science.

  The database has been organised by Alf-Christian Achilles
  <>.  It may be accessed on the Web at
  <URL:>, via ftp from
  <URL:>, and through a
  more useful search mechanism on the Web at

- As part of the ARPA Electronic Library Project, the Database Group
  at Stanford is providing a Selective Dissemination of Information
  (SDI) service to disseminate information about computer science
  technical reports.  You can have a server email you periodic
  announcements of new papers on topics that interest you.

  See <URL:> for details, or
  contact Tak Yan <> or the mail server itself
  at <>.

Subject: [4.5] The comp.os.research archive
From: General Internet-accessible resources

[93-02-18-21-18.31] An archive of all messages posted to
comp.os.research since 1988 is maintained at UC Santa Cruz.  It may be
accessed via anonymous ftp at
<URL:>.  The archive is
organised by year.

Postings may also be found via WAIS at UCSC's Computer Science gopher
	 :version  3 
	 :ip-address ""
	 :ip-name ""
	 :tcp-port 210
	 :database-name "comp-os-research"
	 :cost 0.00 
	 :cost-unit :free 
	 :maintainer ""
	 :description "Server created with WAIS release 8 b5
		on Jul 9 03:51:11 1992 by
		The files of type netnews used in the index
		were: /home/ftp/pub/comp.os.research"

Subject: [4.6] Miscellaneous resources
From: General Internet-accessible resources

- Paul Harrington <> maintains a World
  Wide Web page on checkpointing, at

- Jay Lepreau <> has made available an
  electronic version of the proceedings of OSDI '94 at
  <URL:>.  Available are such
  things as
  - Papers:               abstracts, papers, slides, bibtex entries,
                          and for most, the actual software.
  - Keynote:              audio and slides
  - Extensible OS panel:  audio, slides, project URLs
  - Insularity panel:     audio
  - Mach/Chorus workshop: TRs for most, slides, some software
  - Tutorials:            slides for half, descriptions for all
  - Miscellaneous:        summary report from ;login, list of works-in-progress
                          talks, hard-copy proceedings ordering info, CFP,
                          proceedings introduction, list of referees.

Subject: [5] Disclaimer and copyright
From: Disclaimer and copyright

Note that this document is provided as is.  The information in it is
not warranted to be correct; you use it at your own risk.

Following recent reports on the <> list, I
think it wise to change the copyright:

Answers to Frequently Asked Questions for comp.os.research (hereafter
referred to as These Articles) are Copyright (C) 1993, 1994, 1995, and 1996
by Bryan O'Sullivan <>.  They may be reproduced and
distributed in whole or in part, subject to the following conditions:

- This copyright and permission notice must be retained on all
  complete or partial copies of These Articles.

- These Articles may be copied or distributed in part or in full for
  personal or educational use.  Any translation, derivative work, or
  copies made for other purposes must be approved by the copyright
  holder before distribution, unless otherwise stated.

- If you distribute These Articles, instructions for obtaining the
  complete current versions of them free or at cost price must be
  included.  Redistributors must make reasonable efforts to maintain
  current copies of These Articles.

Exceptions to these rules may be granted, and I shall be happy to
answer any questions about this copyright notice -- write to Bryan
O'Sullivan, PO Box 62215, Sunnyvale, CA 94088-2215, USA or email
<>.  These restrictions are here to protect the
contributors, not to restrict you as educators and learners.
