IDS Forum

RE: Deleting duplicate rows from a table.

Posted By: Jonathan Leffler
Date: Friday, 28 January 2005, at 2:40 p.m.

In Response To: RE: Deleting duplicate rows from a table. (manoj wadhwa )

Various comments - not all helpful.

First off, when the HPL failed, why on earth didn't you undo the previous
broken load before completing the reload? This would have avoided the
problem.

Secondly, are you sure the quickest way to deal with the problem isn't to
drop the existing data and reload the emptied tables from good data?

Next - are your tables fragmented by round robin or by expression?

If you've got round robin, you've made life very difficult. It isn't much
easier with expressions, but at least you can divide your data into 24
smaller subsets that are processed in turn - you know that whenever
duplicates exist, they are found in the same fragment.

You've presumably got fairly significant numbers of duplicate rows - you
did a whole lot of loading, then had to add a bunch more space, etc.

What are your primary keys? I'm guessing you don't have indexes in place
because unique indexes would have prevented the chaos.
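The point about unique indexes can be illustrated with Python's `sqlite3` module as a stand-in for Informix. The table `your_table` and the 'should be primary key' column `id` are hypothetical. Once a unique index exists, re-applying the same load file fails instead of quietly duplicating rows. How an HPL job reports such rejected rows depends on its own configuration and is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, payload TEXT)")
# Hypothetical key column: a unique index makes repeated keys an error.
conn.execute("CREATE UNIQUE INDEX your_table_pk ON your_table (id)")

load_file = [(1, "a"), (2, "b"), (3, "c")]
conn.executemany("INSERT INTO your_table VALUES (?, ?)", load_file)
conn.commit()

# Re-applying the same load file now raises instead of adding duplicates.
try:
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", load_file)
    conn.commit()
    reload_rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    reload_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]
print(reload_rejected, row_count)
```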

Identifying the rows with duplicates will require a statement such as:

SELECT COUNT(*), a, b, c, ..., z
FROM YourTable
GROUP BY a, b, c, ..., z
HAVING COUNT(*) > 1;
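A minimal, runnable sketch of this step follows, with SQLite standing in for Informix and a hypothetical three-column table (`your_table` with columns `a`, `b`, `c`). The query needs a GROUP BY over the same column list for HAVING to work. The count column is given a name so it can be stored in the temp table. In Informix the equivalent would be to append `INTO TEMP` to the SELECT.

```python
import sqlite3


def find_duplicates(conn):
    """Store one copy of each duplicated row, with its count, in a temp table."""
    # GROUP BY every column and keep only the groups that occur more than once.
    conn.execute(
        """
        CREATE TEMP TABLE dups AS
        SELECT COUNT(*) AS cnt, a, b, c
        FROM your_table
        GROUP BY a, b, c
        HAVING COUNT(*) > 1
        """
    )
    return conn.execute("SELECT cnt, a, b, c FROM dups ORDER BY a").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (a INTEGER, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "x", 10), (2, "y", 20), (3, "z", 30),
     (1, "x", 10), (2, "y", 20), (1, "x", 10)],
)
dup_rows = find_duplicates(conn)
print(dup_rows)  # [(3, 1, 'x', 10), (2, 2, 'y', 20)]
```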

I'd probably arrange to have enough space available to store these in a
temp table. Then you've got the duplicated rows in the main table, and one
copy of each in the temp table, so you can do a delete. That may be tricky
too, but it would involve deleting the rows from the main table that match
the values in the temp table:

DELETE FROM YourTable
WHERE EXISTS (SELECT * FROM TempTable
WHERE YourTable.A = TempTable.A
AND YourTable.B = TempTable.B
AND ...
AND YourTable.Z = TempTable.Z);
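Here is the same correlated delete as a self-contained SQLite sketch. The table and column names are hypothetical. It removes every copy of each duplicated row, including the one you eventually want to keep. One caveat applies to any SQL engine: `=` never matches NULL, so rows with NULLs in the compared columns survive this delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE your_table (a INTEGER, b TEXT, c INTEGER);
    INSERT INTO your_table VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30),
                                  (1, 'x', 10), (2, 'y', 20), (1, 'x', 10);
    -- One copy of each duplicated row, as produced by the GROUP BY ... HAVING step.
    CREATE TEMP TABLE dups AS
        SELECT COUNT(*) AS cnt, a, b, c FROM your_table
        GROUP BY a, b, c HAVING COUNT(*) > 1;
    """
)

# Delete every row that matches a row in the temp table on all columns.
deleted = conn.execute(
    """
    DELETE FROM your_table
    WHERE EXISTS (SELECT * FROM dups
                  WHERE your_table.a = dups.a
                    AND your_table.b = dups.b
                    AND your_table.c = dups.c)
    """
).rowcount

remaining = conn.execute("SELECT a, b, c FROM your_table").fetchall()
print(deleted, remaining)  # 5 [(3, 'z', 30)]
```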

You can optimize the condition to include just the 'should be primary key'
columns.
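Narrowing the match to the key columns is only safe if rows that share a key are exact copies, which is what re-running the same load produces. This SQLite sketch checks that assumption before running the key-only delete. The table `your_table`, key `id` and column `payload` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE your_table (id INTEGER, payload TEXT);
    INSERT INTO your_table VALUES (1, 'a'), (2, 'b'), (1, 'a'), (3, 'c');
    CREATE TEMP TABLE dups AS
        SELECT COUNT(*) AS cnt, id, payload FROM your_table
        GROUP BY id, payload HAVING COUNT(*) > 1;
    """
)

# Keys whose rows differ in a non-key column: a key-only match would wrongly
# delete those rows, so only proceed if this list is empty.
conflicts = conn.execute(
    "SELECT id FROM your_table GROUP BY id HAVING COUNT(DISTINCT payload) > 1"
).fetchall()

if not conflicts:
    conn.execute(
        "DELETE FROM your_table WHERE EXISTS "
        "(SELECT * FROM dups WHERE your_table.id = dups.id)"
    )

remaining = conn.execute("SELECT id, payload FROM your_table ORDER BY id").fetchall()
print(conflicts, remaining)  # [] [(2, 'b'), (3, 'c')]
```

The rows for key 1 still have to be put back from the temp table afterwards.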

This gives you a table with no instances of the duplicates at all.

You can now transfer everything except the count column from the temp
table back into the main table.
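The whole sequence (find, delete, re-insert without the count column) fits in one SQLite sketch over a hypothetical three-column table. It is an illustration of the technique, not an Informix script. On a logged Informix database the delete and insert would consume logical log space in proportion to the rows touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE your_table (a INTEGER, b TEXT, c INTEGER);
    INSERT INTO your_table VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30),
                                  (1, 'x', 10), (2, 'y', 20), (1, 'x', 10);
    -- Step 1: one copy of each duplicated row, plus its count.
    CREATE TEMP TABLE dups AS
        SELECT COUNT(*) AS cnt, a, b, c FROM your_table
        GROUP BY a, b, c HAVING COUNT(*) > 1;
    -- Step 2: remove every copy of those rows from the main table.
    DELETE FROM your_table
        WHERE EXISTS (SELECT * FROM dups
                      WHERE your_table.a = dups.a
                        AND your_table.b = dups.b
                        AND your_table.c = dups.c);
    -- Step 3: put back one copy of each, leaving out the count column.
    INSERT INTO your_table (a, b, c) SELECT a, b, c FROM dups;
    """
)
rows = conn.execute("SELECT a, b, c FROM your_table ORDER BY a").fetchall()
print(rows)  # [(1, 'x', 10), (2, 'y', 20), (3, 'z', 30)]
```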

With expression fragmentation, you can apply the fragment filters and deal
with small portions of the table one at a time. With round robin
fragmentation, you probably have to deal with everything at once. Don't
forget to create an index on the temp table and to run UPDATE STATISTICS on
it. You might want to run UPDATE STATISTICS on the big table too - at least
LOW. I hope you've got lots of logical log space.
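The next sketch processes one fragment filter at a time, using SQLite in place of Informix. The filters, the `orders` table and its columns are hypothetical. In Informix the filters would mirror the table's own fragmentation expressions. SQLite's `ANALYZE` stands in, loosely, for UPDATE STATISTICS on the indexed temp table. Committing after each filter keeps every transaction limited to one subset of the data.

```python
import sqlite3

# Hypothetical fragment filters on a region column. They are fixed strings
# in this script (never user input), so interpolating them is safe here.
FRAGMENT_FILTERS = ["region = 'east'", "region = 'west'"]


def dedupe_one_fragment(conn, where):
    """Find, delete and re-insert duplicates for the rows one filter selects."""
    conn.execute("DROP TABLE IF EXISTS dups")
    conn.execute(
        f"""CREATE TEMP TABLE dups AS
            SELECT COUNT(*) AS cnt, region, id, amount FROM orders
            WHERE {where}
            GROUP BY region, id, amount HAVING COUNT(*) > 1"""
    )
    # Index the temp table and gather statistics before the correlated delete.
    conn.execute("CREATE INDEX dups_idx ON dups (region, id, amount)")
    conn.execute("ANALYZE dups")
    conn.execute(
        f"""DELETE FROM orders
            WHERE {where} AND EXISTS (
                SELECT * FROM dups
                WHERE orders.region = dups.region
                  AND orders.id = dups.id
                  AND orders.amount = dups.amount)"""
    )
    conn.execute("INSERT INTO orders SELECT region, id, amount FROM dups")
    # One transaction per fragment filter.
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, id INTEGER, amount INTEGER)")
first_load = [("east", 1, 10), ("west", 2, 20), ("east", 3, 30), ("west", 4, 40)]
# Simulate a restarted load that re-applied the first three rows.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", first_load + first_load[:3])
conn.commit()

for where in FRAGMENT_FILTERS:
    dedupe_one_fragment(conn, where)

result = conn.execute("SELECT region, id, amount FROM orders ORDER BY id").fetchall()
print(result)
```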

Are you sure this is going to be quicker than reloading the data?

--
Jonathan Leffler (jleffler@us.ibm.com)
STSM, Informix Database Engineering, IBM Information Management Division
4100 Bohannon Drive, Menlo Park, CA 94025
Tel: +1 650-926-6921 Tie-Line: 630-6921
"I don't suffer from insanity; I enjoy every minute of it!"



forum.subscriber@iiug.org wrote on 01/28/2005 04:13:15 AM:
> Thanks a lot for the suggestions.
> The three major problems I face are:
> 1) Creating another table and loading it with the
> unique rows from the first table would require me to
> allocate another 150 GB of disk space, which I don't
> have.
> 2) A self-join on a table containing 350 million
> records, with no meaningful index present, will take
> a lot of time.
> 3) Since the table is fragmented (24 fragments), rowid
> solutions also don't work.
>
> Currently I'm planning to drop the table and recreate
> the indexes. Would appreciate any other
> solutions/suggestions.
>
> Thanks a lot,
> Manoj
> --- Bob Allan <allan_bob@hotmail.com> wrote:
>
> > Unload the table using SELECT UNIQUE * FROM
> > table-a, drop the table, recreate it and reload
> > the data.
> >
> > If you cannot do that, create another table with
> > the same column definitions and insert into it,
> > selecting UNIQUE rows from the original table.
> > Then drop the original table and rename the new
> > table to the old name.
> >
> >
> > >From: "manoj wadhwa " <itm_manoj@yahoo.com>
> > >To: ids@iiug.org
> > >Subject: Deleting duplicate rows from a table.
> > >Date: Tue, 25 Jan 2005 04:17:26 -0500 (EST)
> > >
> > >I have a big table containing about 350 million
> > >records. There are some duplicate records in the
> > >table (around 80,000). Please suggest a way to
> > >delete the duplicate records while keeping one copy
> > >of each.
> > >
> > >
> > >Background info: We loaded the data using HPL.
> > >Because we ran short of space, the HPL load stopped
> > >partway through.
> > >After we added extra chunks and restarted it, it
> > >loaded the same unl files again, which is causing
> > >the trouble.
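Bob Allan's quoted suggestion (copy the unique rows into a second table, drop the original, rename) can be sketched with SQLite. The table `table_a` and its columns are hypothetical. Informix spells the query `SELECT UNIQUE` and the rename `RENAME TABLE`; SQLite uses `DISTINCT` and `ALTER TABLE ... RENAME TO`. Indexes, constraints and privileges on the original table would need to be recreated on the new one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (a INTEGER, b TEXT);
    INSERT INTO table_a VALUES (1, 'x'), (2, 'y'), (1, 'x'), (2, 'y'), (3, 'z');

    -- Copy one instance of each row into a new table with the same columns.
    CREATE TABLE table_a_new (a INTEGER, b TEXT);
    INSERT INTO table_a_new SELECT DISTINCT a, b FROM table_a;

    -- Swap the tables. Both copies exist at once before the DROP, which is
    -- why this approach needs the extra disk space.
    DROP TABLE table_a;
    ALTER TABLE table_a_new RENAME TO table_a;
    """
)
rows = conn.execute("SELECT a, b FROM table_a ORDER BY a").fetchall()
print(rows)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```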

