IDS Forum
Re: Are there issues when using RAW dbspaces o....
Posted By: Art Kagel Date: Wednesday, 28 July 2010, at 5:11 p.m.
In Response To: RE: Are there issues when using RAW dbspaces o.... (jim-cramer@uiowa.edu)
ESX has to be set up to pass along IO sync operations to the host OS for
filesystems. For RAW devices, it should not even be a concern, but check
with VMWare support.
Art
Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
IIUG Board of Directors (art@iiug.org)
Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Advanced DataTools, the IIUG, nor any other
organization with which I am associated either explicitly, implicitly, or by
inference. Neither do those opinions reflect those of other individuals
affiliated with any entity with which I am affiliated nor those of the
entities themselves.
On Wed, Jul 28, 2010 at 11:40 AM, jim-cramer@uiowa.edu <jim-cramer@uiowa.edu> wrote:
> Art,
>
> Incredible, and useful information for now and for the future.
>
> Luckily I have convinced everyone involved here of the
> wisdom of going with Raw db chunks.
>
> One Sys Admin did mention a new angle that I had not mentioned
> previously.
>
> I don't know if you do much with IDS in VMWare instances, but
> we will have ours on an ESX Server. The Admin's concern,
> which regards scenarios of power failure or failure of some
> sort with the ESX Server, is loss of data and/or data
> integrity due to the buffering that takes place between the VM, containing
> my dbspaces, and the physical ESX Server host.
>
> Any info on this?
>
> Thanks again for your help,
>
> Jim
>
> -----Original Message-----
> From: ids-bounces@iiug.org [mailto:ids-bounces@iiug.org] On Behalf Of Art
> Kagel
> Sent: Wednesday, July 28, 2010 7:49 AM
> To: ids@iiug.org
> Subject: Re: Are there issues when using RAW dbspaces o.... [20668]
>
> My last comments on journaling. On meta-data change journaling:
>
> - This (logical metadata only journaling) is the method used by EXT3,
>   EXT4, and ZFS
> - All three use block relocation instead of physical block journaling.
>   This means that on write a block is always written to a new location rather
>   than overwriting the existing block on disk. A properly designed JFS
>   (Journaled File System) will commit the new version of the disk block before
>   updating the metadata or the logical journal (that's the problem with EXT4 -
>   and EXT3 with write-back enabled they write the metadata first, then the
>   journal entry before actually committing the physical change to disk). Once
>   the write and journal are completed the FS metadata is updated. This means
>   that on a crash there are three possibilities:
>   - The new block version was partially or completely written but the
>     journal entry was not written.
>   - The new block version and journal entry were written and committed.
>   - The new block version, journal, and metadata were written and
>     committed.
>
> In the first case, after recovery, the file remains unchanged. In the
> second case, after recovery, the FS makes the missing metadata entries and
> the file is modified during recovery and the original block version is freed
> for reuse. In the third case all was well before the crash and the original
> version of the block was released for reuse.
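
The three crash outcomes above can be sketched as a small state model. This is only an illustration of the write ordering described (new block first, then journal entry, then metadata), not any filesystem's actual recovery code; the names `CrashState` and `recover` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashState:
    """What had reached stable storage when the crash happened (hypothetical model)."""
    block_written: bool      # new block version fully on disk
    journal_written: bool    # journal entry committed
    metadata_written: bool   # FS metadata already points at the new block


def recover(state: CrashState) -> str:
    """Describe the recovery outcome for the ordering block -> journal -> metadata."""
    # With that ordering, a later step can never be on disk without the earlier one.
    if state.journal_written and not state.block_written:
        raise ValueError("journal entry without its block: not the safe ordering")
    if state.metadata_written and not state.journal_written:
        raise ValueError("metadata without journal entry: not the safe ordering")

    if not state.journal_written:
        # Case 1: the (possibly partial) new block is simply ignored.
        return "file unchanged"
    if not state.metadata_written:
        # Case 2: recovery replays the journal and completes the metadata.
        return "metadata completed in recovery, old block freed"
    # Case 3: everything was committed before the crash.
    return "already consistent, old block freed"


if __name__ == "__main__":
    for s in (CrashState(True, False, False),
              CrashState(True, True, False),
              CrashState(True, True, True)):
        print(s, "->", recover(s))
```

The `ValueError` branches mark the states the safe ordering cannot produce; a journal entry on disk without its block is exactly the situation that makes the write-back ordering dangerous.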
>
> The problem with EXT4 (and EXT3 with write-back enabled) is that the
> application (meaning in this case Informix) thinks everything is hunky dory
> since the FS acknowledged the change as committed. However, immediately
> after the acknowledgement the physically modified block is still ONLY in
> cache and only the metadata and journal entry have been saved to disk. At
> this point if there is a crash, the file is actually unrecoverable! The
> metadata and the journal entry say the block has been moved to a new
> location and rewritten, but the new location has garbage in it from some
> previous block. This one made Linus Torvalds absolutely livid and he tore
> the EXT4 designers a new one over the design. Last I heard you could not
> disable the write-back behavior of EXT4 - Linus was pushing to have that
> fixed, but I don't know if it ever was.
>
> EXT3 in default mode and ZFS at least are safe, but the problem with them is
> just the fact of the block relocations. There is the performance problem of
> rewriting a whole block every time the database changes a single page within
> the block, which negates much of the gain from caching, and there is the
> bigger problem that the file is no longer even as contiguous as a
> non-journaled filesystem would have it be. Standard UNIX filesystems
> allocate blocks of contiguous space and try to leave free space that is
> contiguous with those allocated blocks unused when allocating space for
> other files so that as a file grows it remains mostly contiguous in
> multi-block chunks. This fragments the free space in an FS making it
> difficult to write very large files (like Informix chunks) that are
> contiguous, but if you keep the chunks on an FS that's dedicated to Informix
> chunks that's not a real problem (at least currently) since Informix does
> not currently extend existing chunks over time. JFSs break that rule,
> keeping a file contiguous only at the level of a single block. Even if
> a chunk were allocated as contiguous initially, over time the JFS will cause
> the file to become fragmented. If you make the FS block size smaller to
> alleviate the costs of multiple block rewrites, you make the file
> fragmentation worse.
>
> These problems don't affect filesystems and normal files as much as
> databases because the nature of the IO to files is different from IO to
> databases. When you write to a flat file, you write mostly sequentially,
> you rarely rewrite a portion of the file (unless you rewrite the entire
> file) and you never sync the file to disk before you close the file. That
> means that the cache will coalesce all writes until an entire block has been
> written out before the FS and OS cause a flush and sync of the cache to
> disk. That means that the FS has the ability to try to keep the rewritten
> blocks contiguous by allocating the replacement blocks contiguously.
> Essentially the file is relocated whole if it is rewritten.
>
> Databases don't work that way. Informix writes every block to a COOKED
> device or file either under O_SYNC or O_DIRECT control both of which force
> the single write (and Informix only ever writes a single page or eight
> contiguous pages at a time) to be physically written and committed before
> the write() call returns. That means that the coalescing features of the FS
> and OS cache management are bypassed in favor of data safety. That means
> that if the engine performs what it thinks is a sequential scan, it is
> actually performing a random read of the file swinging the read/write heads
> back and forth across the disk. If the physical structure is shared with
> other applications (can you say massive SAN?) the engine will also be
> competing with those other applications for head positioning. In normal
> sequential scanning (i.e. RAW or COOKED device or non-JFS files) the read
> ahead reduces the performance impact of this head contention somewhat. In a
> JFS, it cannot help at all. So I guess I have to change my mantra:
>
> NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO
> RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!!
> NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO
> RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!!
> NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO
> RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!!
> NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO RAID5!!! NO JFS, NO
> RAID5!!! NO JFS, NO RAID5!!!
>
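
The synchronous, page-at-a-time write pattern described above can be sketched in a few lines. This is a minimal illustration only, assuming a Unix-like OS; the 2 KB `PAGE_SIZE` and the helper names are assumptions for the example and say nothing about how the engine is actually implemented. It uses O_SYNC only, since O_DIRECT additionally requires aligned buffers.

```python
import os
import tempfile

PAGE_SIZE = 2048  # assumed page size, for illustration only


def write_page_sync(path: str, page_no: int, data: bytes) -> None:
    """Write one page; with O_SYNC the write returns only once the data is committed."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    # O_SYNC bypasses write coalescing: every write is pushed to stable storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o660)
    try:
        written = os.pwrite(fd, data, page_no * PAGE_SIZE)
        if written != PAGE_SIZE:
            raise OSError("short write")
    finally:
        os.close(fd)


def read_page(path: str, page_no: int) -> bytes:
    """Read one page back from the given page offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        chunk = os.path.join(d, "chunk1")  # hypothetical cooked-file chunk
        write_page_sync(chunk, 3, b"\x01" * PAGE_SIZE)
        print(read_page(chunk, 3) == b"\x01" * PAGE_SIZE)
```

Because each call commits a single page at its own offset, the OS cache never gets the chance to batch neighbouring writes, which is the trade of performance for data safety that the post describes.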
> Art
>
> Art S. Kagel
> Advanced DataTools (www.advancedatatools.com)
> IIUG Board of Directors (art@iiug.org)
>
> On Wed, Jul 28, 2010 at 4:18 AM, Fernando Nunes <domusonline@gmail.com> wrote:
>
> > I think nobody explains it better than Art, but I'd like to stress two
> > points:
> >
> > 1- Every sysadmin talks about the advantages of journaling, but it's
> > amazing how these people forget about reality. Let's start with this
> > Wikipedia article:
> > http://en.wikipedia.org/wiki/Journaling_file_system
> > They split journaling into two kinds: physical and logical. The first
> > logs all blocks (data blocks included) and the second only file metadata.
> > Let's dig into this: physical journaling has a large performance impact
> > (which is obvious) and it's ABSOLUTELY useless for databases, since the
> > databases MUST do this themselves.
> > Logical journaling only stores metadata changes. PLEASE, can anybody ask
> > the sysadmin what kind of metadata changes in a filesystem where only
> > Informix chunks are stored?! We don't (currently at least) change the size
> > of chunks...
> >
> > 2- The backup argument is alarming, because backing up the database is not
> > a sysadmin task or responsibility, and the DBA must make sure it is his
> > responsibility. Surely no one is thinking about doing filesystem backups of
> > database chunks with a live database...
> >
> > Regards.
> >
> > On Tue, Jul 27, 2010 at 10:59 PM, Art Kagel <art.kagel@gmail.com> wrote:
> >
> > > YOW! Cooked spaces, Journaled filesystems (bet they want to use EXT4 to
> > > boot), AND VM images. You've got the triple crown there Jim! Lord God in
> > > Heaven, tell me they're not also saddling you with RAID5, RAID6, or RAIDZ
> > > on top of all that!
> > >
> > > 1) ESX doesn't care about RAW spaces AFAIK and VMs and RAW disk work about
> > > as well as VMs and any other form of storage, which is to say VERY BADLY!
> > > My testing for a major developer of highly embedded systems shows that IO
> > > under a VM performs 10x SLOWER than IO performance on the underlying
> > > hardware/OS. I would seriously consider running your server on a commodity
> > > Linux box instead.
> > >
> > > 2) You are correct, of course. Any backup of Informix chunks made at the
> > > operating system level, especially if they are made, as seems to be the
> > > intent of your SAs, at the level of the underlying host OS, will be
> > > completely useless for restoring the database unless you put the engine
> > > into external backup mode and block transactions for the duration of the
> > > backup. Otherwise only an ontape or onbar backup will be usable to restore
> > > the engine.
> > >
> > > 3) That "little bit" is a 10-20% performance increase of RAW device versus
> > > COOKED device, and an additional 5-10% for non-journaled filesystem chunks
> > > versus COOKED devices. That is without O_DIRECT enabled, but O_DIRECT cuts
> > > the cost down to about 5-10% RAW over COOKED and another 2-5% for
> > > filesystems, which, while it is MUCH better, is not trivial. Add to that
> > > the extra cost of journaling (at least 5% but usually more like 15%) and
> > > the cost of doing all of this on a VM versus raw hardware (about 90% in my
> > > testing).
> > >
> > > 4) The journaling, as I've already stated, is redundant and therefore a
> > > performance hit that buys you NOTHING! Informix's logical and physical
> > > logging is FAR more efficient at recovering the database after a crash and
> > > adding the filesystem recovery to that will only delay the beginning of the
> > > engine's fast recovery mechanism.
> > >
> > > 5) Run! Run fast and run far. You do NOT want to be associated with this
> > > system once it's rolled out.
> > >
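
To put the percentages from point 3 side by side, here is a rough worked calculation. It assumes, purely for illustration, that each quoted figure is a fractional loss of throughput and that the losses compound multiplicatively; neither assumption is stated in the post, and the function name is made up.

```python
def relative_throughput(penalties):
    """Multiply out fractional throughput losses (0.15 means 15% slower)."""
    remaining = 1.0
    for p in penalties:
        if not 0.0 <= p < 1.0:
            raise ValueError("each penalty must be in [0, 1)")
        remaining *= 1.0 - p
    return remaining


# Illustrative picks from the ranges quoted in point 3 (assumed, not measured here).
stack = {
    "cooked device instead of raw": 0.10,               # low end of "10-20%"
    "filesystem chunk instead of cooked device": 0.05,  # low end of "5-10%"
    "journaling": 0.15,                                 # "usually more like 15%"
    "VM instead of bare hardware": 0.90,                # "about 90%"
}

combined = relative_throughput(stack.values())

if __name__ == "__main__":
    for name, p in stack.items():
        print(f"{name}: -{p:.0%}")
    print(f"remaining throughput vs. raw on bare hardware: {combined:.1%}")
```

Under these assumptions the VM term dominates: on its own it leaves about a tenth of the throughput, which matches the "10x SLOWER" figure in point 1, and the other three layers together shave off roughly another quarter of what remains.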
> > > Art
> > >
> > > Art S. Kagel
> > > Advanced DataTools (www.advancedatatools.com)
> > > IIUG Board of Directors (art@iiug.org)
> > >
> > > On Tue, Jul 27, 2010 at 3:14 PM, Jim Cramer <jim-cramer@uiowa.edu> wrote:
> > >
> > > > HELP!
> > > >
> > > > Here is another angle to the recent questions, and explanations
> > > > by Art, et al., regarding using RAW instead of COOKED dbspaces
> > > > on IDS 11.5 running on Linux.
> > > >
> > > > For reasons cited by Art and the others, I (the DBA) wish to use
> > > > RAW dbspaces when I move my instances to Linux (SUSE 11 SLES)
> > > > Virtual Machines on an ESX Server running VMWare.
> > > >
> > > > But my Sys Admins here are refusing to allow RAW spaces and citing
> > > > all kinds of vague, generalized reasons why, such as:
> > > >
> > > > 1) the "host administration utilities" will not fly right
> > > > with Raw Spaces, but they will not be specific about what would
> > > > go wrong. In general, my sense is that they feel that
> > > > the Management Console and Tools for the ESX host might
> > > > see the Raw Dbspace and, because it does not contain a formatted
> > > > filesystem, allocate it to something on the box.
> > > >
> > > > 2) using raw space will not allow the VMs containing the IDS
> > > > dbspaces to be backed up or fit into their backup strategy
> > > > and backup utility.
> > > >
> > > > They cannot seem to understand that a normal backup utility
> > > > would not know how to deal with the dbspaces even if they
> > > > were Cooked.
> > > >
> > > > 3) that the "little bit" of performance that they claim I might
> > > > get with Raw will not pay back for the increased Sys Admin
> > > > overhead.
> > > >
> > > > 4) that they will use a Journaled File System (along with Cooked
> > > > dbspaces) because it is more robust, fault-tolerant, comes
> > > > back up quicker after a crash, etc.
> > > >
> > > > Can anybody provide me with any concrete information/experience
> > > > that is related to the above points, particularly (1)?
> > > >
> > > > Does anyone know if raw dbspaces can cause problems in a Virtual
> > > > Machine on a VMWare ESX Server?
> > > >
> > > > If you have some info and have time to send it soon, that would
> > > > be appreciated because I am about to do battle over this with
> > > > our Sys Admins.
> > > >
> > > > Thanks much in advance,
> > > >
> > > > Jim Cramer
> > > > Database Administrator and
> > > > Applications Developer III
> > > > University of Iowa
> > > > College of Engineering
> > > > Engineering Computer Systems Support
> > > > 1256 SC
> > > > Iowa City, Iowa 52245
> > > > (319)-335-5757
> > > > jim-cramer@uiowa.edu
> > > > jcramer@engineering.uiowa.edu
> > > > http://css.engineering.uiowa.edu
> > > > http://www.engineering.uiowa.edu
> > > > http://www.uiowa.edu
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
> ****************************************************************************
> ***
> > > > Forum Note: Use "Reply" to post a response in the discussion forum.
> > > >
> > > >
> > >
> > > --0016e659f7fea2c12b048c659cf6
> > >
> > >
> > >
> > >
> >
> >
>
> ****************************************************************************
> ***
> > > Forum Note: Use "Reply" to post a response in the discussion forum.
> > >
> > >
> >
> > --
> > Fernando Nunes
> > Portugal
> >
> > http://informix-technology.blogspot.com
> > My email works... but I don't check it frequently...
> *******************************************************************************
> Forum Note: Use "Reply" to post a response in the discussion forum.
IDS Forum is maintained by Administrator with WebBBS 5.12.