Hello John & All,

Thanks for your findings and suggestions. We have increased the memory
allocated to Informix by 20% and set:

VP_MEMORY_CACHE_KB 100000
FASTPOLL 1

The cpuvps are now spinning equally. I have not yet changed the NETTYPE
as you suggested, but I am planning to do so.
"onstat -g glo"
IBM Informix Dynamic Server Version 11.50.FC8W2 -- On-Line (Prim) -- Up 21:20:38 -- 48491748 Kbytes
MT global info:
sessions threads vps lngspins
150 396 31 476042
sched calls thread switches yield 0 yield n yield forever
total: 4271890802 3158684977 1602166661 88514249 1111429488
per sec: 19111 17171 3100 127 6927
Virtual processor summary:
class vps usercpu syscpu total
cpu 12 212858.52 48052.73 260911.25
aio 2 15.00 9.15 24.15
lio 1 0.41 1.34 1.75
pio 1 0.41 1.42 1.83
adm 1 24.10 9.17 33.27
soc 12 6184.48 27931.41 34115.89
msc 1 11.83 36.24 48.07
fifo 1 0.44 1.41 1.85
total 31 219095.19 76042.87 295138.06
Individual virtual processors:
vp pid class usercpu syscpu total Thread Eff
1 26636 cpu 16223.04 3613.99 19837.03 73673.50 26%
2 26660 adm 24.10 9.17 33.27 0.00 0%
3 26661 cpu 16310.08 3584.71 19894.79 73769.35 26%
4 26662 cpu 16277.52 3550.26 19827.78 73455.25 26%
5 26663 cpu 16295.98 3564.09 19860.07 73469.38 27%
6 26664 cpu 16185.37 3574.83 19760.20 73392.84 26%
7 26666 cpu 16092.24 3529.55 19621.79 73240.15 26%
8 26667 cpu 16109.05 3542.62 19651.67 73135.20 26%
9 26668 cpu 16208.22 3580.38 19788.60 73038.28 27%
10 26669 cpu 16212.33 3570.44 19782.77 72821.30 27%
11 26670 cpu 16196.87 3572.45 19769.32 72615.25 27%
12 26671 cpu 25524.45 6176.27 31700.72 73032.27 43%
13 26672 cpu 25223.37 6193.14 31416.51 73003.89 43%
14 26673 lio 0.41 1.34 1.75 1.75 100%
15 26675 pio 0.41 1.42 1.83 1.83 100%
16 26676 aio 14.39 7.66 22.05 45.84 48%
17 26682 msc 11.83 36.24 48.07 362.77 13%
18 26684 fifo 0.44 1.41 1.85 3.50 52%
19 26685 aio 0.61 1.49 2.10 14.80 14%
20 26692 soc 441.02 1918.77 2359.79 NA NA
21 26693 soc 477.24 2101.20 2578.44 NA NA
22 26694 soc 568.53 2577.46 3145.99 NA NA
23 26695 soc 581.32 2532.65 3113.97 NA NA
24 26696 soc 503.46 2369.91 2873.37 NA NA
25 26697 soc 549.97 2474.70 3024.67 NA NA
26 26698 soc 410.71 1973.43 2384.14 NA NA
27 26699 soc 434.09 2004.05 2438.14 NA NA
28 26700 soc 577.60 2605.60 3183.20 NA NA
29 26701 soc 505.45 2275.32 2780.77 NA NA
30 26702 soc 521.36 2381.52 2902.88 NA NA
31 26703 soc 613.73 2716.80 3330.53 NA NA
tot 219095.19 76042.87 295138.06
"onstat -g spi | sort -nr | head -40" has not returned any output for a
long time; I will try it again today.

"onstat -p" output (looks bad):
Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
504311229 10016348623 22052667526 97.71 76619086 215015214 440272465 82.61
isamtot open start read write rewrite delete commit rollbk
22489503332 2285255193 526590408 9733788648 82470747 6907611 30739767 6002097 7590
gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs
0 0 0 0 0 0 0
ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes
0 0 0 220062.42 76363.97 273 684657
bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans
80458879 16178 4630432460 0 0 37612 9310904 137866205
ixda-RA idx-RA da-RA RA-pgsused lchwaits
139265452 31704218 8709641 179649528 7284251
Sequential scans are killing us. Using the sysmaster query from the
threads below, I am able to see the big tables with a high number of
sequential scans; however, when I manually run the queries against
these tables (with SET EXPLAIN ON), I see a good "INDEX PATH" and NOT a
sequential or index-sequential scan.
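One way to catch the scans while they happen, instead of re-running the
statements by hand, is to diff the sysmaster counters across a busy
interval. A minimal sketch (my own, not from this thread), snapshotting
sysmaster:sysptprof into a temp table:

-- snapshot the cumulative per-partition scan counters
SELECT dbsname, tabname, partnum, seqscans
  FROM sysmaster:sysptprof
 WHERE seqscans > 0
  INTO TEMP seq_snap WITH NO LOG;

-- ...wait through a busy period, then report the delta...
SELECT p.dbsname, p.tabname, p.seqscans - s.seqscans AS new_scans
  FROM sysmaster:sysptprof p, seq_snap s
 WHERE p.partnum = s.partnum
   AND p.seqscans > s.seqscans
 ORDER BY new_scans DESC;

Tables that show a large new_scans here, but an INDEX PATH when run
manually, usually point to a different statement (or different
host-variable values) doing the scanning.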
Identified some tables with a high number of extents (considering a
reorg) with the following query:
SELECT st.dbsname, st.tabname, COUNT(*) as num_exts, st2.tabname as parent_table
FROM systabnames st, sysptnext spn, sysptnhdr sp, systabnames st2
WHERE st.partnum = spn.pe_partnum
AND spn.pe_partnum = sp.partnum
AND sp.lockid = st2.partnum
GROUP BY st.dbsname, st.tabname, st2.tabname
HAVING COUNT(*) > 32 -- it's not magic, but we need a start somewhere
ORDER BY num_exts DESC;
Just want to check if the above query holds good for tables partitioned
across different dbspaces? (I think every partition in every dbspace
will have its own extents.)
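If it helps, a per-fragment variant (my sketch; the TRUNC expression
pulls the dbspace number out of the high-order bits of the partnum, so
each fragment is listed with its own dbspace and extent count):

SELECT st.dbsname, st.tabname, st.partnum,
       d.name AS dbspace, COUNT(*) AS num_exts
  FROM systabnames st, sysptnext spn, sysdbspaces d
 WHERE st.partnum = spn.pe_partnum
   AND d.dbsnum = TRUNC(st.partnum / 1048576)
 GROUP BY 1, 2, 3, 4
HAVING COUNT(*) > 32
 ORDER BY num_exts DESC;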
Also checking with the unix admin for periodic iostat on our disks.
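(Something like "iostat 5" or "sar -d 5 60" captured during peak hours
should do; the exact flags depend on the HP-UX release.)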
Regards,
Vikas
***********************************************************************
Just a few thoughts; the answer to your question is below:

1. You are on HP-UX, which has some unique properties. First is its
ability to downgrade a process if it deems the process is consuming too
many cpu cycles. Since Informix has one process for many, many users,
the Informix cpu vps are subject to this. While noage does help, I
often find it does not do enough. I would suggest moving the cpu vps
into the real-time priority. You can accomplish this with the rtsched
command:

rtsched -s SCHED_RTPRIO -p 127 -P {PID of CPU VPS}
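For example, for a cpu vp whose pid is 26636:

rtsched -s SCHED_RTPRIO -p 127 -P 26636

repeated once per cpu vp pid (the pids are in the "Individual virtual
processors" section of onstat -g glo).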
I have seen this help many customers on HP. It can be done while the
Informix system is running. It helps with systems not utilizing their
entire cpu and also with contention.

2. HP-UX historically has not done well with a large number of shared
memory segments. While they have made great improvements, I would still
recommend minimizing the number of shared memory segments. One simple
way to accomplish this is to re-configure your NETTYPE. Currently your
NETTYPE is ipcshm,10,30,CPU. Each shared memory poll thread requires 1
shared memory segment. This means that with this configuration
parameter alone you require 10 shared memory segments. I would suggest
NETTYPE ipcshm,2,200,CPU.
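To put numbers on it: ipcshm,10,30 is 10 poll threads (10 segments) at
30 connections each, i.e. 300 shm connections; ipcshm,2,200 needs only
2 segments yet handles 2 x 200 = 400 connections.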
Now, I do not see an onstat -g glo in all the different emails, but the
onstat -p provided does have the user and system cpu times. These do
not look good: usercpu 99930.89, syscpu 49477.95 is almost a 2 to 1
ratio, generally not what you want to see. There could be a few
reasons; latching and network are the most probable, and these two are
related.
3) Optimizing network settings. Please make sure you have the onconfig
parameter FASTPOLL enabled (or ON). Second, NETTYPE soctcp,12,150,NET:
I would change this to NETTYPE soctcp,4,450,CPU. FASTPOLL drastically
improves the efficiency of the poll threads. While you do want some
parallelism, HP historically has not done well with many processes
calling the poll()/select() system call in parallel. In addition, I
always like running my poll threads on the cpu vps.
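Pulling these together, the relevant onconfig entries would look
something like this (a sketch; thread and connection counts should be
sized to your own workload):

FASTPOLL 1                 # fast-poll network optimization
NETTYPE  ipcshm,2,200,CPU  # 2 shm poll threads -> only 2 shm segments
NETTYPE  soctcp,4,450,CPU  # socket poll threads run on the cpu vps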
Now, you asked the question as to why your first 10 cpu vps have NO
busy waits. This is due to the optimization of running your poll
threads on the cpu vps. Instead of spinning, wasting cpu cycles looking
for threads to schedule, the cpu vps do a quick check to see if there
are pending network messages.
4) I would make sure that you have VP_MEMORY_CACHE_KB set to a couple
of MB.
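For example, VP_MEMORY_CACHE_KB 2048 would give each cpu vp a 2 MB
private memory cache (the exact size is a tuning choice, not a magic
number).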
Lastly, if you could supply a few items, it would help in looking at
your system further:

1. onstat -g glo (helps check the system and user time)
2. onstat -g spi | sort -nr | head -40
John F. Miller III
STSM, Embedability Architect
miller3@us.ibm.com
503-578-5645
IBM Informix Dynamic Server (IDS)
ids-bounces@iiug.org wrote on 08/03/2011 10:17:17 PM:
> From: "VIKAS HIVARKAR" <vikas.hivarkar@gmail.com>
> To: ids@iiug.org
> Date: 08/03/2011 10:18 PM
> Subject: Re: Performance Issues [24533]
> Sent by: ids-bounces@iiug.org
>
> Hello All,
>
> "onstat -g sch" (We now have the vpuvps =3D cpu cores ) but
>
> VP Scheduler Statistics:
> vp pid class semops busy waits spins/wait
> 1 20655 cpu 0 0 0
> 2 20795 adm 0 0 0
> 3 20796 cpu 0 0 0
> 4 20797 cpu 0 0 0
> 5 20798 cpu 0 0 0
> 6 20799 cpu 0 0 0
> 7 20800 cpu 0 0 0
> 8 20801 cpu 0 0 0
> 9 20802 cpu 0 0 0
> 10 20803 cpu 0 0 0
> 11 20804 cpu 0 0 0
> 12 20809 cpu 33767045 34198356 997
> 13 20810 cpu 32838062 33260684 997
> 14 20812 lio 0 0 0
> 15 20813 pio 0 0 0
> 16 20814 aio 794912 0 0
> 17 20815 msc 1219209 0 0
> 18 20816 fifo 0 0 0
> 19 20817 aio 19127 0 0
> 20 20818 soc 1704 14257 698
> 21 20819 soc 1213 6589 568
> 22 20820 soc 1450 6469 679
> 23 20821 soc 1166 6740 674
> 24 20822 soc 1290 6546 631
> 25 20823 soc 883 3925 671
> 26 20824 soc 1701 8528 660
> 27 20825 soc 2223 11920 607
> 28 20826 soc 1484 6617 642
> 29 20827 soc 1572 8573 649
> 30 20828 soc 10311 14680 845
> 31 20829 soc 10929 24908 821
>
> I do not understand why most of the initial cpuvps are not spinning
> or doing any work?
>
> We will be adding more buffers this weekend and are currently looking
> at the hot spots on the disk.
>
> Regards,
>
>
======================================================================
> Hello All,
>
> Yes, we have HDR and SDS (primary + 1 node)
>
> Database running with UNBUFFERED log mode
>
> Increased the number of cores to 12, which now matches 12 cpuvps, and
> considering increasing the buffers too
>
> "onstat -l" output:
>
> Physical Logging
> Buffer bufused bufsize numpages numwrits pages/io
> P-2 72 512 15225310 104294 145.98
>
> phybegin physize phypos phyused %used
>
> 62:53 19995000 3137280 582604 2.91
>
> Logical Logging
> Buffer bufused bufsize numrecs numpages numwrits recs/pages pages/io
> L-1 0 512 69423775 2990157 719686 23.2 4.2
>
> Subsystem numrecs Log Space used
>
> OLDRSAM 69391925 5290671676
>
> HA 31850 1410892
>
> Buffer Waiting
> Buffer ioproc flags
> L-1 0 0x1 0
> L-3 0 0x1 0
>
> I will be checking for sequential scans on big tables with the below
> query through the day:
>
> "onstat -g dic"
> I am trying to check the tables with a high refcnt, if that is what I
> should be checking?
>
> Dictionary Cache: Number of lists: 503, Maximum list size: 10
> Total number of dictionary entries: 158
>
> Any more checks or suggestions?
>
> Regards,
> Vikas
>
>
======================================================================
> Are your main databases using BUFFERED LOG? That might explain the
> contention for the logical log buffers. The BTR indicates that you
> could use a significant increase in the number of buffers. Also, over
> 9% of your queries seem to include a sequential scan
> [(seqscans / (isam starts)) * 100]. See if those are being performed
> against large tables.
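> (Worked from the onstat -p in the original post: (30103468 seqscans /
> 312005761 isam starts) * 100 = 9.6%.)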
>
> Art
>
>
======================================================================
> 1) Are you using HDR?
>
> 2) Are you doing unbuffered logging? What is the output of onstat -l?
> Just because you have a large log buffer does not mean that you are
> going to use the full buffer. That would be the case with unbuffered
> logging.
>
> 3) I'm not sure that I would be setting the number of cpuvps greater
> than the number of cores. That can cause lock inversion.
>
> 4) To find the sequential scans...
>
> select dbsname, tabname, pf_seqscans, npdata, npused
> from sysptntab, systabnames, sysptnhdr
> where pf_seqscans > 0
> and sysptnhdr.npdata > 16
> and sysptnhdr.npused > 16
> and sysptnhdr.partnum = systabnames.partnum
> and systabnames.partnum = sysptntab.partnum
> order by pf_seqscans DESC
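> (Note: pf_seqscans is cumulative since the server came online, or
> since the last onstat -z, so comparing two runs of this query across
> a busy interval shows where scans are still accumulating.)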
>
> Also - what is the output of onstat -g dic.
>
======================================================================
> Do you use HDR?
> Can you paste the top 20 lines of "onstat -l"?
>
>
======================================================================
> From: "VIKAS HIVARKAR" <vikas.hivarkar@gmail.com>
> To: ids@iiug.org
> Date: 07/30/2011 10:38 AM
> Subject: Performance Issues [24479]
> Sent by: ids-bounces@iiug.org
>
> Hello All,
>
> IDS 11.50.FC8W2 on HP-UX 11.23, 8 cores
>
> Facing performance issues and hence drilling through the following:
>
> NETTYPE ipcshm,10,30,CPU
> NETTYPE soctcp,12,150,NET
> VPCLASS cpu,num=10,noage
>
> "onstat -g sch" shows all the cpuvps spinning equally & 4-5 busy wait=
s =3D
> for
> all
> cpuvps except the first one having 194 busy waits
>
> BUFFERPOOL
> size=2K,buffers=6000000,lrus=512,lru_min_dirty=50.000000,lru_max_dirty=60.000000
>
> BUFFERPOOL
> size=4K,buffers=2500000,lrus=512,lru_min_dirty=50.000000,lru_max_dirty=60.000000
>
> BUFFERPOOL
> size=16K,buffers=600000,lrus=512,lru_min_dirty=50.000000,lru_max_dirty=60.000000
>
> "onstat -u" I see lot many sessions waiting for Log buffer G-BPX--
> and some sessions doing rollbacks frequently :(
>
> We do have good amount of log buffer set LOGBUFF 1024 and huge amount=
o=3D
> f
> logical logs
>
> Some calculations:
> BR: 0.3456
> RAU: 99.94
> BTR: 61.00/hr, 146/hr, 610/hr (calculated for the 3 bufferpools
> separately)
>
> BTR: 40/hr (combined for all 3 bufferpools)
>
> Also pasting the output from "onstat -p"
> Profile
> dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
> 178371295 5841005690 6876949471 97.41 29669851 123813498 290614611 89.79
>
> isamtot open start read write rewrite delete commit rollbk
> 7040485791 950978742 312005761 921931088 75467491 7475945 9111695 4371688 4869
>
> gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs
> 0 0 0 0 0 0 0
>
> ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes
> 0 0 0 99930.89 49477.95 179 460935
>
> bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans
> 21899422 44236 1696677622 0 0 12779 4298263 30103468
>
> ixda-RA idx-RA da-RA RA-pgsused lchwaits
> 39575656 5892 179906 39742052 9803879
>
> Any suggestions on why the sessions are waiting so much for buffers
> and log buffers?
>
> Application check:
> The below formula shows at least 20% of the SQLs doing a sequential
> scan, calculated as
> SSR = (seqscans / isam starts) * 100.00
> How do I capture or find these SQLs doing sequential scans and/or
> missing indexes?
> How do I find out aged-out indexes, if any?
>
>
>
***********************************************************************