Skip to content

atmel_serial tweaks/systemd compability#4

Closed
marekr wants to merge 3 commits intomasterfrom
unknown repository
Closed

atmel_serial tweaks/systemd compability#4
marekr wants to merge 3 commits intomasterfrom
unknown repository

Conversation

@marekr
Copy link

@marekr marekr commented Dec 20, 2013

This pull request helps fix some of the issues when running the kernel with "systemd" as the init manager. It appears that atmel_serial cannot handle it's behavior without race conditions causing kernel panics on boot.

The source of grief seems to be systemd will cause the driver to fire the startup and shutdown uart ops constantly for every single line it prints, i.e.

ATMEL_STARTUP
[OK] Some Service
ATMEL_SHUTDOWN
ATMEL_STARTUP
[OK] Another Service
ATMEL_SHUTDOWN

Thus an kernel panic tends to occur anytime its printing its services list, no particular service message causes it to fail.

Once systemd finishes starting services, the panics won't occur, one last STARTUP is called when a program like getty launches to take over the port but no spam of the STARTUP/SHUTDOWN funcs afterwards.

There are 2 panics that can occur, one is interrupts occuring in atmel_shutdown after the buffers were destroyed but before the interrupts were fully disabled:

[<c02092b0>] (atmel_tasklet_func+0x514/0x814) from [<c001fd34>] (tasklet_action+0x70/0xa8)
[<c001fd34>] (tasklet_action+0x70/0xa8) from [<c001f60c>] (__do_softirq+0x90/0x144)
[<c001f60c>] (__do_softirq+0x90/0x144) from [<c001fa18>] (irq_exit+0x40/0x4c)
[<c001fa18>] (irq_exit+0x40/0x4c) from [<c000e298>] (handle_IRQ+0x64/0x84)
[<c000e298>] (handle_IRQ+0x64/0x84) from [<c000d6c0>] (__irq_svc+0x40/0x50)
[<c000d6c0>] (__irq_svc+0x40/0x50) from [<c0208060>] (atmel_rx_dma_release+0x88/0xb8)
[<c0208060>] (atmel_rx_dma_release+0x88/0xb8) from [<c0209740>] (atmel_shutdown+0x104/0x160)
[<c0209740>] (atmel_shutdown+0x104/0x160) from [<c0205e8c>] (uart_port_shutdown+0x2c/0x38)
[<c0205e8c>] (uart_port_shutdown+0x2c/0x38) from [<c0206018>] (uart_shutdown+0x9c/0xd4)
[<c0206018>] (uart_shutdown+0x9c/0xd4) from [<c0206c28>] (uart_close+0x7c/0x174)
[<c0206c28>] (uart_close+0x7c/0x174) from [<c01f1488>] (tty_release+0x154/0x428)
[<c01f1488>] (tty_release+0x154/0x428) from [<c007a49c>] (__fput+0xe4/0x1e0)
[<c007a49c>] (__fput+0xe4/0x1e0) from [<c002e374>] (task_work_run+0x48/0x5c)
[<c002e374>] (task_work_run+0x48/0x5c) from [<c000fd2c>] (do_work_pending+0x90/0xa4)
[<c000fd2c>] (do_work_pending+0x90/0xa4) from [<c000da40>] (work_pending+0xc/0x20)

This is solved by simply killing all interrupts at the start of _shutdown and also calling tasklet_kill to make sure nothing is scheduled that could run later. This is commit a368fd0

And the second which has been difficult to track down and driven me insane:

    [<c01efd84>] (tty_wakeup+0x8/0x58) from [<c0208fc0>] (atmel_tasklet_func+0x238/0x80c)
    [<c0208fc0>] (atmel_tasklet_func+0x238/0x80c) from [<c001fd34>] (tasklet_action+0x70/0xa8)
    [<c001fd34>] (tasklet_action+0x70/0xa8) from [<c001f60c>] (__do_softirq+0x90/0x144)
    [<c001f60c>] (__do_softirq+0x90/0x144) from [<c001f728>] (run_ksoftirqd+0x68/0x104)
    [<c001f728>] (run_ksoftirqd+0x68/0x104) from [<c0030870>] (kthread+0x80/0x90)
    [<c0030870>] (kthread+0x80/0x90) from [<c000e348>] (kernel_thread_exit+0x0/0x8)

For some reason atmel_tasklet_func gets scheduled when tty/uart is closed and the tasklet isn't killed. I haven't found what code path could cause this. Unforunately the solution to the second panic is just to return on NULL tty at the start of the tasklet in 103855e. If there's any better input on this, that would be appreciated. Atmel's support solution was the same thing which was to NULL check in the tasklet_func.

Lastly, there is a conflict possible(but I haven't seen) in the atmel_serial_remove function where the tasklet_kill was called after uart_remove_one_port, if a tasklet was scheduled, then it would fire on the kill and proceed to kernel panic on null dereference because the tty/port references were nulled by the first function. Flipping the order of functions should be safer. This fix is in a575aae

Interrupts were being cleaned up late in the shutdown handler, it is possible that an interrupt can occur and schedule a tasklet that runs after the port is cleaned up. A null dereference occured due to this race condition occuring with the following stacktrace:

[<c02092b0>] (atmel_tasklet_func+0x514/0x814) from [<c001fd34>] (tasklet_action+0x70/0xa8)
[<c001fd34>] (tasklet_action+0x70/0xa8) from [<c001f60c>] (__do_softirq+0x90/0x144)
[<c001f60c>] (__do_softirq+0x90/0x144) from [<c001fa18>] (irq_exit+0x40/0x4c)
[<c001fa18>] (irq_exit+0x40/0x4c) from [<c000e298>] (handle_IRQ+0x64/0x84)
[<c000e298>] (handle_IRQ+0x64/0x84) from [<c000d6c0>] (__irq_svc+0x40/0x50)
[<c000d6c0>] (__irq_svc+0x40/0x50) from [<c0208060>] (atmel_rx_dma_release+0x88/0xb8)
[<c0208060>] (atmel_rx_dma_release+0x88/0xb8) from [<c0209740>] (atmel_shutdown+0x104/0x160)
[<c0209740>] (atmel_shutdown+0x104/0x160) from [<c0205e8c>] (uart_port_shutdown+0x2c/0x38)

Signed-off-by: Marek Roszko <mark.roszko@gmail.com>
The _remove callback could be called when a tasklet is scheduled. tasklet_kill was called inside the function in order to free up any scheduled tasklets. However it was called after uart_remove_one_port which destroys  tty references needed in the port for atmel_tasklet_func. Simply putting the tasklet_kill at the start of the function will prevent this  conflict.

Signed-off-by: Marek Roszko <mark.roszko@gmail.com>
The uart timer when it fires will schedule a tasklet. It is possible and has been observed that it can fire inside _shutdown before it is killed in the dma and pdc cleanup.
This is a somewhat lucky condition to hit but its possible if a program spams open/close on the port that the condition can be met, in particular systemd boot messages were causing the panic from this behavior.
This causes a tasklet that exists after the port is shutdown, so when the kernel finally executes it, it panics as the tty port is NULL
Moving the timer deletion to the beginning of the function stops a tasklet from being scheduled unexpectedly.

Signed-off-by: Marek Roszko <mark.roszko@gmail.com>
@marekr
Copy link
Author

marekr commented Jan 8, 2014

I removed the null dereference commit.

Added serial: at91: disable uart timer at start of shutdown which actually fixes the issue rather than avoids it. Basically with enough luck the uart timer could fire before it was killed in _shutdown and so a tasklet got scheduled, after that the kernel was guranteed to panic.

So this new patch combined with "Handle shutdown more safely" which adds a tasklet_kill should take care of any tasklets popping up

@marekr marekr closed this Jan 21, 2014
@marekr marekr deleted the serial-fixes branch January 21, 2014 01:10
noglitch pushed a commit that referenced this pull request May 13, 2014
(This is a VERY IMP change for low level interrupt/exception handling)

-----------------------------------------------------------------------
WHAT
-----------------------------------------------------------------------
* User 25 now saved in pt_regs->user_r25 (vs. tsk->thread_info.user_r25)

* This allows Low level interrupt code to unconditionally save r25
  (vs. the prev version which would only do it for U->K transition).
  Ofcourse for nested interrupts, only the pt_regs->user_r25 of
  bottom-most frame is useful.

* simplifies the interrupt prologue/epilogue

* Needed for ARCv2 ISA code and done here to keep design similar with
  ARCompact event handling

-----------------------------------------------------------------------
WHY
-------------------------------------------------------------------------
With CONFIG_ARC_CURR_IN_REG, r25 is used to cache "current" task pointer
in kernel mode. So when entering kernel mode from User Mode
- user r25 is specially safe-kept (it being a callee reg is NOT part of
  pt_regs which are saved by default on each interrupt/trap/exception)
- r25 loaded with current task pointer.

Further, if interrupt was taken in kernel mode, this is skipped since we
know that r25 already has valid "current" pointer.

With 2 level of interrupts in ARCompact ISA, detecting this is difficult
but still possible, since we could be in kernel mode but r25 not already saved
(in fact the stack itself might not have been switched).

A. User mode
B. L1 IRQ taken
C. L2 IRQ taken (while on 1st line of L1 ISR)

So in #C, although in kernel mode, r25 not saved (infact SP not
switched at all)

Given that ARcompact has manual stack switching, we could use a bit of
trickey - The low level code would make sure that SP is only set to kernel
mode value at the very end (after saving r25). So a non kernel mode SP,
even if in kernel mode, meant r25 was NOT saved.

The same paradigm won't work in ARCv2 ISA since SP is auto-switched so
it's setting can't be delayed/constrained.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
…/kernel/git/vgupta/arc

Pull first batch of ARC changes from Vineet Gupta:
 "There's a second bunch to follow next week - which depends on commits
  on other trees (irq/net).  I'd have preferred the accompanying ARC
  change via respective trees, but it didn't workout somehow.

  Highlights of changes:

   - Continuation of ARC MM changes from 3.10 including

       zero page optimization
       Setting pagecache pages dirty by default
       Non executable stack by default
       Reducing dcache flushes for aliasing VIPT config

   - Long overdue rework of pt_regs machinery - removing the unused word
     gutters and adding ECR register to baseline (helps cleanup lot of
     low level code)

   - Support for ARC gcc 4.8

   - Few other preventive fixes, cosmetics, usage of Kconfig helper..

  The diffstat is larger than normal primarily because of arcregs.h
  header split as well as beautification of macros in entry.h"

* tag 'arc-v3.11-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (32 commits)
  ARC: warn on improper stack unwind FDE entries
  arc: delete __cpuinit usage from all arc files
  ARC: [tlb-miss] Fix bug with CONFIG_ARC_DBG_TLB_MISS_COUNT
  ARC: [tlb-miss] Extraneous PTE bit testing/setting
  ARC: Adjustments for gcc 4.8
  ARC: Setup Vector Table Base in early boot
  ARC: Remove explicit passing around of ECR
  ARC: pt_regs update #5: Use real ECR for pt_regs->event vs. synth values
  ARC: stop using pt_regs->orig_r8
  ARC: pt_regs update #4: r25 saved/restored unconditionally
  ARC: K/U SP saved from one location in stack switching macro
  ARC: Entry Handler tweaks: Simplify branch for in-kernel preemption
  ARC: Entry Handler tweaks: Avoid hardcoded LIMMS for ECR values
  ARC: Increase readability of entry handlers
  ARC: pt_regs update #3: Remove unused gutter at start of callee_regs
  ARC: pt_regs update #2: Remove unused gutter at start of pt_regs
  ARC: pt_regs update #1: Align pt_regs end with end of kernel stack page
  ARC: pt_regs update #0: remove kernel stack canary
  ARC: [mm] Remove @Write argument to do_page_fault()
  ARC: [mm] Make stack/heap Non-executable by default
  ...
noglitch pushed a commit that referenced this pull request May 13, 2014
Jiri managed to trigger this warning:

 [] ======================================================
 [] [ INFO: possible circular locking dependency detected ]
 [] 3.10.0+ #228 Tainted: G        W
 [] -------------------------------------------------------
 [] p/6613 is trying to acquire lock:
 []  (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250
 []
 [] but task is already holding lock:
 []  (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0
 []
 [] which lock already depends on the new lock.
 []
 [] the existing dependency chain (in reverse order) is:
 []
 [] -> #4 (&ctx->lock){-.-...}:
 [] -> #3 (&rq->lock){-.-.-.}:
 [] -> #2 (&p->pi_lock){-.-.-.}:
 [] -> #1 (&rnp->nocb_gp_wq[1]){......}:
 [] -> #0 (rcu_node_0){..-...}:

Paul was quick to explain that due to preemptible RCU we cannot call
rcu_read_unlock() while holding scheduler (or nested) locks when part
of the read side critical section was preemptible.

Therefore solve it by making the entire RCU read side non-preemptible.

Also pull out the retry from under the non-preempt to play nice with RT.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Helped-out-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
commit 2f7021a "cpufreq: protect 'policy->cpus' from offlining
during __gov_queue_work()" caused a regression in CPU hotplug,
because it lead to a deadlock between cpufreq governor worker thread
and the CPU hotplug writer task.

Lockdep splat corresponding to this deadlock is shown below:

[   60.277396] ======================================================
[   60.277400] [ INFO: possible circular locking dependency detected ]
[   60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744 Not tainted
[   60.277411] -------------------------------------------------------
[   60.277417] bash/2225 is trying to acquire lock:
[   60.277422]  ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280
[   60.277444] but task is already holding lock:
[   60.277449]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[   60.277465] which lock already depends on the new lock.

[   60.277472] the existing dependency chain (in reverse order) is:
[   60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}:
[   60.277490]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277503]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
[   60.277514]        [<ffffffff81042cbc>] get_online_cpus+0x3c/0x60
[   60.277522]        [<ffffffff814b842a>] gov_queue_work+0x2a/0xb0
[   60.277532]        [<ffffffff814b7891>] cs_dbs_timer+0xc1/0xe0
[   60.277543]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
[   60.277552]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
[   60.277560]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
[   60.277569]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
[   60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}:
[   60.277592]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277600]        [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410
[   60.277608]        [<ffffffff814b785d>] cs_dbs_timer+0x8d/0xe0
[   60.277616]        [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0
[   60.277624]        [<ffffffff81063d31>] worker_thread+0x121/0x3a0
[   60.277633]        [<ffffffff8106ae2b>] kthread+0xdb/0xe0
[   60.277640]        [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0
[   60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}:
[   60.277661]        [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
[   60.277669]        [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.277677]        [<ffffffff810621ed>] flush_work+0x3d/0x280
[   60.277685]        [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
[   60.277693]        [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
[   60.277701]        [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
[   60.277709]        [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
[   60.277719]        [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
[   60.277728]        [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
[   60.277737]        [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
[   60.277747]        [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
[   60.277759]        [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
[   60.277768]        [<ffffffff815a0a68>] _cpu_down+0x88/0x330
[   60.277779]        [<ffffffff815a0d46>] cpu_down+0x36/0x50
[   60.277788]        [<ffffffff815a2748>] store_online+0x98/0xd0
[   60.277796]        [<ffffffff81452a28>] dev_attr_store+0x18/0x30
[   60.277806]        [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
[   60.277818]        [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
[   60.277826]        [<ffffffff811686fc>] SyS_write+0x4c/0xa0
[   60.277834]        [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
[   60.277842] other info that might help us debug this:

[   60.277848] Chain exists of:
  (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock

[   60.277864]  Possible unsafe locking scenario:

[   60.277869]        CPU0                    CPU1
[   60.277873]        ----                    ----
[   60.277877]   lock(cpu_hotplug.lock);
[   60.277885]                                lock(&j_cdbs->timer_mutex);
[   60.277892]                                lock(cpu_hotplug.lock);
[   60.277900]   lock((&(&j_cdbs->work)->work));
[   60.277907]  *** DEADLOCK ***

[   60.277915] 6 locks held by bash/2225:
[   60.277919]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff81168173>] vfs_write+0x1c3/0x1f0
[   60.277937]  #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811d9e3c>] sysfs_write_file+0x3c/0x150
[   60.277954]  #2:  (s_active#61){.+.+.+}, at: [<ffffffff811d9ec3>] sysfs_write_file+0xc3/0x150
[   60.277972]  #3:  (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81024cf7>] cpu_hotplug_driver_lock+0x17/0x20
[   60.277990]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff815a0d32>] cpu_down+0x22/0x50
[   60.278007]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60
[   60.278023] stack backtrace:
[   60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty #1744
[   60.278037] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
[   60.278042]  ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38
[   60.278055]  ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2
[   60.278068]  ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00
[   60.278081] Call Trace:
[   60.278091]  [<ffffffff815b3d90>] dump_stack+0x19/0x1b
[   60.278101]  [<ffffffff815b0a8d>] print_circular_bug+0x2b6/0x2c5
[   60.278111]  [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30
[   60.278123]  [<ffffffff81067e08>] ? __kernel_text_address+0x58/0x80
[   60.278134]  [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200
[   60.278142]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
[   60.278151]  [<ffffffff810621ed>] flush_work+0x3d/0x280
[   60.278159]  [<ffffffff810621b5>] ? flush_work+0x5/0x280
[   60.278169]  [<ffffffff810a9b14>] ? mark_held_locks+0x94/0x140
[   60.278178]  [<ffffffff81062d77>] ? __cancel_work_timer+0x77/0x120
[   60.278188]  [<ffffffff810a9cbd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[   60.278196]  [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120
[   60.278206]  [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20
[   60.278214]  [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0
[   60.278225]  [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20
[   60.278234]  [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100
[   60.278244]  [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0
[   60.278255]  [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c
[   60.278265]  [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110
[   60.278275]  [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10
[   60.278284]  [<ffffffff815a0a68>] _cpu_down+0x88/0x330
[   60.278292]  [<ffffffff81024cf7>] ? cpu_hotplug_driver_lock+0x17/0x20
[   60.278302]  [<ffffffff815a0d46>] cpu_down+0x36/0x50
[   60.278311]  [<ffffffff815a2748>] store_online+0x98/0xd0
[   60.278320]  [<ffffffff81452a28>] dev_attr_store+0x18/0x30
[   60.278329]  [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150
[   60.278337]  [<ffffffff8116806d>] vfs_write+0xbd/0x1f0
[   60.278347]  [<ffffffff81185950>] ? fget_light+0x320/0x4b0
[   60.278355]  [<ffffffff811686fc>] SyS_write+0x4c/0xa0
[   60.278364]  [<ffffffff815bbbbe>] tracesys+0xd0/0xd5
[   60.280582] smpboot: CPU 1 is now offline

The intention of that commit was to avoid warnings during CPU
hotplug, which indicated that offline CPUs were getting IPIs from the
cpufreq governor's work items.  But the real root-cause of that
problem was commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume) because it totally skipped all the cpufreq callbacks
during CPU hotplug in the suspend/resume path, and hence it never
actually shut down the cpufreq governor's worker threads during CPU
offline in the suspend/resume path.

Reflecting back, the reason why we never suspected that commit as the
root-cause earlier, was that the original issue was reported with
just the halt command and nobody had brought in suspend/resume to the
equation.

The reason for _that_ in turn, as it turns out, is that earlier
halt/shutdown was being done by disabling non-boot CPUs while tasks
were frozen, just like suspend/resume....  but commit cf7df37
(reboot: migrate shutdown/reboot to boot cpu) which came somewhere
along that very same time changed that logic: shutdown/halt no longer
takes CPUs offline.  Thus, the test-cases for reproducing the bug
were vastly different and thus we went totally off the trail.

Overall, it was one hell of a confusion with so many commits
affecting each other and also affecting the symptoms of the problems
in subtle ways.  Finally, now since the original problematic commit
(a66b2e5) has been completely reverted, revert this intermediate fix
too (2f7021a), to fix the CPU hotplug deadlock.  Phew!

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Peter Wu <lekensteyn@gmail.com>
Cc: 3.10+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
Since commit 2025172 (spi/bitbang: Use core message pump), the following
kernel crash is seen:

Unable to handle kernel NULL pointer dereference at virtual address 0000000d
pgd = 80004000
[0000000d] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 48 Comm: spi32766 Not tainted 3.11.0-rc1+ #4
task: bfa3e580 ti: bfb90000 task.ti: bfb90000
PC is at spi_bitbang_transfer_one+0x50/0x248
LR is at spi_bitbang_transfer_one+0x20/0x248
...

,and also the following build warning:

drivers/spi/spi-bitbang.c: In function 'spi_bitbang_start':
drivers/spi/spi-bitbang.c:436:31: warning: assignment from incompatible pointer type [enabled by default]

In order to fix it, we need to change the first parameter of
spi_bitbang_transfer_one() to 'struct spi_master *master'.

Tested on a mx6qsabrelite by succesfully probing a SPI NOR flash.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always().  We can use the following program to trigger it:

int main(int argc, char *argv[])
{
	int fd;

	fd = open(argv[1], O_TMPFILE, 0666);
	if (fd < 0) {
		perror("open ");
		return -1;
	}
	close(fd);
	return 0;
}

The oops message looks like this:

kernel: kernel BUG at fs/ext3/namei.c:1992!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd
kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4
kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010
kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000
kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP: 0018:ffff8801124d5cc8  EFLAGS: 00010202
kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0
kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8
kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f
kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000
kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8
kernel: FS:  00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0
kernel: Stack:
kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7
kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000
kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1
kernel: Call Trace:
kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd]
kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3]
kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7
kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30
kernel: [<ffffffff82065fa2>] ?  __dequeue_entity+0x33/0x38
kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d
kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102
kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd
kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20
kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b
kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0
kernel: RIP  [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3]
kernel: RSP <ffff8801124d5cc8>

Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
noglitch pushed a commit that referenced this pull request May 13, 2014
Commits 6a1c068 and
9356b53, respectively
  'tty: Convert termios_mutex to termios_rwsem' and
  'n_tty: Access termios values safely'
introduced a circular lock dependency with console_lock and
termios_rwsem.

The lockdep report [1] shows that n_tty_write() will attempt
to claim console_lock while holding the termios_rwsem, whereas
tty_do_resize() may already hold the console_lock while
claiming the termios_rwsem.

Since n_tty_write() and tty_do_resize() do not contend
over the same data -- the tty->winsize structure -- correct
the lock dependency by introducing a new lock which
specifically serializes access to tty->winsize only.

[1] Lockdep report

======================================================
[ INFO: possible circular locking dependency detected ]
3.10.0-0+tip-xeon+lockdep #0+tip Not tainted
-------------------------------------------------------
modprobe/277 is trying to acquire lock:
 (&tty->termios_rwsem){++++..}, at: [<ffffffff81452656>] tty_do_resize+0x36/0xe0

but task is already holding lock:
 ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 ((fb_notifier_list).rwsem){.+.+.+}:
       [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
       [<ffffffff8175b797>] down_read+0x47/0x5c
       [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0
       [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
       [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
       [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
       [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
       [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
       [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
       [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
       [<ffffffff813b13db>] local_pci_probe+0x4b/0x80
       [<ffffffff813b1701>] pci_device_probe+0x111/0x120
       [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
       [<ffffffff81497bab>] __driver_attach+0xab/0xb0
       [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
       [<ffffffff814971fe>] driver_attach+0x1e/0x20
       [<ffffffff81496cc1>] bus_add_driver+0x111/0x290
       [<ffffffff814982b7>] driver_register+0x77/0x170
       [<ffffffff813b0454>] __pci_register_driver+0x64/0x70
       [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
       [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
       [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
       [<ffffffff810c54cb>] load_module+0x123b/0x1bf0
       [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
       [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

-> #1 (console_lock){+.+.+.}:
       [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
       [<ffffffff810430a7>] console_lock+0x77/0x80
       [<ffffffff8146b2a1>] con_flush_chars+0x31/0x50
       [<ffffffff8145780c>] n_tty_write+0x1ec/0x4d0
       [<ffffffff814541b9>] tty_write+0x159/0x2e0
       [<ffffffff814543f5>] redirected_tty_write+0xb5/0xc0
       [<ffffffff811ab9d5>] vfs_write+0xc5/0x1f0
       [<ffffffff811abec5>] SyS_write+0x55/0xa0
       [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

-> #0 (&tty->termios_rwsem){++++..}:
       [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30
       [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
       [<ffffffff8175b724>] down_write+0x44/0x70
       [<ffffffff81452656>] tty_do_resize+0x36/0xe0
       [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0
       [<ffffffff8146c99f>] vc_resize+0x1f/0x30
       [<ffffffff813e4535>] fbcon_init+0x385/0x5a0
       [<ffffffff8146a4bc>] visual_init+0xbc/0x120
       [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320
       [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70
       [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0
       [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820
       [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110
       [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0
       [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
       [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
       [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
       [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
       [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
       [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
       [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
       [<ffffffff813b13db>] local_pci_probe+0x4b/0x80
       [<ffffffff813b1701>] pci_device_probe+0x111/0x120
       [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
       [<ffffffff81497bab>] __driver_attach+0xab/0xb0
       [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
       [<ffffffff814971fe>] driver_attach+0x1e/0x20
       [<ffffffff81496cc1>] bus_add_driver+0x111/0x290
       [<ffffffff814982b7>] driver_register+0x77/0x170
       [<ffffffff813b0454>] __pci_register_driver+0x64/0x70
       [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
       [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
       [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
       [<ffffffff810c54cb>] load_module+0x123b/0x1bf0
       [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
       [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

Chain exists of:
  &tty->termios_rwsem --> console_lock --> (fb_notifier_list).rwsem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((fb_notifier_list).rwsem);
                               lock(console_lock);
                               lock((fb_notifier_list).rwsem);
  lock(&tty->termios_rwsem);

 *** DEADLOCK ***

7 locks held by modprobe/277:
 #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81497b5b>] __driver_attach+0x5b/0xb0
 #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81497b69>] __driver_attach+0x69/0xb0
 #2:  (drm_global_mutex){+.+.+.}, at: [<ffffffffa008a6dd>] drm_get_pci_dev+0xbd/0x2a0 [drm]
 #3:  (registration_lock){+.+.+.}, at: [<ffffffff813d93f5>] register_framebuffer+0x25/0x320
 #4:  (&fb_info->lock){+.+.+.}, at: [<ffffffff813d8116>] lock_fb_info+0x26/0x60
 #5:  (console_lock){+.+.+.}, at: [<ffffffff813d95a4>] register_framebuffer+0x1d4/0x320
 #6:  ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0

stack backtrace:
CPU: 0 PID: 277 Comm: modprobe Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip
Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 04/30/2012
 ffffffff8213e5e0 ffff8802aa2fb298 ffffffff81755f19 ffff8802aa2fb2e8
 ffffffff8174f506 ffff8802aa2fa000 ffff8802aa2fb378 ffff8802aa2ea8e8
 ffff8802aa2ea910 ffff8802aa2ea8e8 0000000000000006 0000000000000007
Call Trace:
 [<ffffffff81755f19>] dump_stack+0x19/0x1b
 [<ffffffff8174f506>] print_circular_bug+0x1fb/0x20c
 [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30
 [<ffffffff810b775e>] ? mark_held_locks+0xae/0x120
 [<ffffffff810b78d5>] ? trace_hardirqs_on_caller+0x105/0x1d0
 [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0
 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0
 [<ffffffff8175b724>] down_write+0x44/0x70
 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0
 [<ffffffff81452656>] tty_do_resize+0x36/0xe0
 [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0
 [<ffffffff8146c99f>] vc_resize+0x1f/0x30
 [<ffffffff813e4535>] fbcon_init+0x385/0x5a0
 [<ffffffff8146a4bc>] visual_init+0xbc/0x120
 [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320
 [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70
 [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0
 [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820
 [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110
 [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0
 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20
 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320
 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper]
 [<ffffffff8173cbcb>] ? kmemleak_alloc+0x5b/0xc0
 [<ffffffff81198874>] ? kmem_cache_alloc_trace+0x104/0x290
 [<ffffffffa01035e1>] ? drm_fb_helper_single_add_all_connectors+0x81/0xf0 [drm_kms_helper]
 [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau]
 [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau]
 [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm]
 [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau]
 [<ffffffff8175f162>] ? _raw_spin_unlock_irqrestore+0x42/0x80
 [<ffffffff813b13db>] local_pci_probe+0x4b/0x80
 [<ffffffff813b1701>] pci_device_probe+0x111/0x120
 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0
 [<ffffffff81497bab>] __driver_attach+0xab/0xb0
 [<ffffffff81497b00>] ? driver_probe_device+0x3a0/0x3a0
 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0
 [<ffffffff814971fe>] driver_attach+0x1e/0x20
 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffff814982b7>] driver_register+0x77/0x170
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffff813b0454>] __pci_register_driver+0x64/0x70
 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm]
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffffa022a000>] ? 0xffffffffa0229fff
 [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau]
 [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0
 [<ffffffff81399a50>] ? ddebug_proc_open+0xb0/0xb0
 [<ffffffff813855ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120
 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
which leads to a panic (from Srivatsa S. Bhat):

BUG: unable to handle kernel paging request at ffff882018552020
IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
+ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: netns cleanup_net
task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000
RIP: 0010:[<ffffffffa0366b02>]  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
RSP: 0018:ffff881039367bd8  EFLAGS: 00010286
RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200
RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68
RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222
R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040
R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0
Stack:
 ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000
 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0
 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0
Call Trace:
 [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6]
 [<ffffffff815bdecb>] inet_release+0xfb/0x220
 [<ffffffff815bddf2>] ? inet_release+0x22/0x220
 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6]
 [<ffffffff8151c1d9>] sock_release+0x29/0xa0
 [<ffffffff81525520>] sk_release_kernel+0x30/0x70
 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6]
 [<ffffffff8152fff9>] ops_exit_list+0x39/0x60
 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0
 [<ffffffff81075e3a>] process_one_work+0x1da/0x610
 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
 [<ffffffff81076390>] worker_thread+0x120/0x3a0
 [<ffffffff81076270>] ? process_one_work+0x610/0x610
 [<ffffffff8107da2e>] kthread+0xee/0x100
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65
RIP  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
 RSP <ffff881039367bd8>
CR2: ffff882018552020

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
We used to keep the port's char device structs and the /sys entries
around till the last reference to the port was dropped.  This is
actually unnecessary, and resulted in buggy behaviour:

1. Open port in guest
2. Hot-unplug port
3. Hot-plug a port with the same 'name' property as the unplugged one

This resulted in hot-plug being unsuccessful, as a port with the same
name already exists (even though it was unplugged).

This behaviour resulted in a warning message like this one:

-------------------8<---------------------------------------
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name: KVM
sysfs: cannot create duplicate filename
'/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'

Call Trace:
 [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
 [<ffffffff812734b4>] ? kobject_add+0x44/0x70
 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
 [<ffffffff8134b389>] ? device_add+0xc9/0x650

-------------------8<---------------------------------------

Instead of relying on guest applications to release all references to
the ports, we should go ahead and unregister the port from all the core
layers.  Any open/read calls on the port will then just return errors,
and an unplug/plug operation on the host will succeed as expected.

This also caused buggy behaviour in case of the device removal (not just
a port): when the device was removed (which means all ports on that
device are removed automatically as well), the ports with active
users would clean up only when the last references were dropped -- and
it would be too late then to be referencing char device pointers,
resulting in oopses:

-------------------8<---------------------------------------
PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
 #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
 #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b
 #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
    [exception RIP: strlen+2]
    RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
    RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
    RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
    R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
    R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
 #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
 #9 [ffff88011b9d5de0] device_del at ffffffff813440c7

-------------------8<---------------------------------------

So clean up when we have all the context, and all that's left to do when
the references to the port have dropped is to free up the port struct
itself.

CC: <stable@vger.kernel.org>
Reported-by: chayang <chayang@redhat.com>
Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com>
Reported-by: FuXiangChun <xfu@redhat.com>
Reported-by: Qunfang Zhang <qzhang@redhat.com>
Reported-by: Sibiao Luo <sluo@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
noglitch pushed a commit that referenced this pull request May 13, 2014
Otherwise we get flooded by the kernel warning us that we are doing
long sequences of IO without serialisation. For example,

 WARNING: CPU: 0 PID: 11136 at drivers/gpu/drm/i915/intel_sideband.c:40 vlv_sideband_rw+0x48/0x1ef()
 Modules linked in:
 CPU: 0 PID: 11136 Comm: kworker/u2:0 Tainted: G        W    3.11.0-rc2+ #4
 Call Trace:
  [<c2028564>] ?  warn_slowpath_common+0x63/0x78
  [<c227ad43>] ?  vlv_sideband_rw+0x48/0x1ef
  [<c20285dd>] ?  warn_slowpath_null+0xf/0x13
  [<c227ad43>] ?  vlv_sideband_rw+0x48/0x1ef
  [<c227b060>] ?  vlv_dpio_write+0x1c/0x21
  [<c2262b3b>] ?  intel_dp_set_signal_levels+0x24a/0x385
  [<c2264909>] ?  intel_dp_complete_link_train+0x25/0x1d1
  [<c2264c55>] ?  intel_dp_check_link_status+0xf7/0x106
  [<c2238ced>] ?  i915_hotplug_work_func+0x17b/0x221
  [<c203a204>] ?  process_one_work+0x12e/0x210
  [<c203a5e4>] ?  worker_thread+0x116/0x1ad
  [<c203a4ce>] ?  rescuer_thread+0x1cb/0x1cb
  [<c203d8f5>] ?  kthread+0x67/0x6c
  [<c2457ebb>] ?  ret_from_kernel_thread+0x1b/0x30
  [<c203d88e>] ?  init_completion+0x18/0x18

v2: Retire the locking in vlv_crtc_enable() and do it close to the meat.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Squash in a s/mutex_lock/mutex_unlock/ fixup spotted by the 0
day kernel build/coccinelle and reported by Dan Carpenter.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
noglitch pushed a commit that referenced this pull request May 13, 2014
Since commit 29a241c (ACPICA: Add argument typechecking for all
predefined ACPI names), _DSM parameters are validated which trigger the
following warning:

    ACPI Warning: \_SB_.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95)
    ACPI Warning: \_SB_.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95)
    ACPI Warning: \_SB_.PCI0.P0P2.PEGP._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95)
    ACPI Warning: \_SB_.PCI0.P0P2.PEGP._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95)

As the Intel _DSM method seems to ignore this parameter, let's comply to
the ACPI spec and use a Package instead.

Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=32602
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
noglitch pushed a commit that referenced this pull request May 13, 2014
We met lockdep warning when enable and disable the bearer for commands such as:

tipc-config -netid=1234 -addr=1.1.3 -be=eth:eth0
tipc-config -netid=1234 -addr=1.1.3 -bd=eth:eth0

---------------------------------------------------

[  327.693595] ======================================================
[  327.693994] [ INFO: possible circular locking dependency detected ]
[  327.694519] 3.11.0-rc3-wwd-default #4 Tainted: G           O
[  327.694882] -------------------------------------------------------
[  327.695385] tipc-config/5825 is trying to acquire lock:
[  327.695754]  (((timer))#2){+.-...}, at: [<ffffffff8105be80>] del_timer_sync+0x0/0xd0
[  327.696018]
[  327.696018] but task is already holding lock:
[  327.696018]  (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+  0xdd/0x120 [tipc]
[  327.696018]
[  327.696018] which lock already depends on the new lock.
[  327.696018]
[  327.696018]
[  327.696018] the existing dependency chain (in reverse order) is:
[  327.696018]
[  327.696018] -> #1 (&(&b_ptr->lock)->rlock){+.-...}:
[  327.696018]        [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870
[  327.696018]        [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670
[  327.696018]        [<ffffffff810b4453>] lock_acquire+0x103/0x130
[  327.696018]        [<ffffffff814d65b1>] _raw_spin_lock_bh+0x41/0x80
[  327.696018]        [<ffffffffa02c5d48>] disc_timeout+0x18/0xd0 [tipc]
[  327.696018]        [<ffffffff8105b92a>] call_timer_fn+0xda/0x1e0
[  327.696018]        [<ffffffff8105bcd7>] run_timer_softirq+0x2a7/0x2d0
[  327.696018]        [<ffffffff8105379a>] __do_softirq+0x16a/0x2e0
[  327.696018]        [<ffffffff81053a35>] irq_exit+0xd5/0xe0
[  327.696018]        [<ffffffff81033005>] smp_apic_timer_interrupt+0x45/0x60
[  327.696018]        [<ffffffff814df4af>] apic_timer_interrupt+0x6f/0x80
[  327.696018]        [<ffffffff8100b70e>] arch_cpu_idle+0x1e/0x30
[  327.696018]        [<ffffffff810a039d>] cpu_idle_loop+0x1fd/0x280
[  327.696018]        [<ffffffff810a043e>] cpu_startup_entry+0x1e/0x20
[  327.696018]        [<ffffffff81031589>] start_secondary+0x89/0x90
[  327.696018]
[  327.696018] -> #0 (((timer))#2){+.-...}:
[  327.696018]        [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0
[  327.696018]        [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870
[  327.696018]        [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670
[  327.696018]        [<ffffffff810b4453>] lock_acquire+0x103/0x130
[  327.696018]        [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0
[  327.696018]        [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc]
[  327.696018]        [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc]
[  327.696018]        [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc]
[  327.696018]        [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc]
[  327.696018]        [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc]
[  327.696018]        [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340
[  327.696018]        [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0
[  327.696018]        [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0
[  327.696018]        [<ffffffff8143e617>] genl_rcv+0x27/0x40
[  327.696018]        [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0
[  327.696018]        [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400
[  327.696018]        [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80
[  327.696018]        [<ffffffff813f7957>] sock_aio_write+0x107/0x120
[  327.696018]        [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0
[  327.696018]        [<ffffffff8117fc56>] vfs_write+0x186/0x190
[  327.696018]        [<ffffffff811803e0>] SyS_write+0x60/0xb0
[  327.696018]        [<ffffffff814de852>] system_call_fastpath+0x16/0x1b
[  327.696018]
[  327.696018] other info that might help us debug this:
[  327.696018]
[  327.696018]  Possible unsafe locking scenario:
[  327.696018]
[  327.696018]        CPU0                    CPU1
[  327.696018]        ----                    ----
[  327.696018]   lock(&(&b_ptr->lock)->rlock);
[  327.696018]                                lock(((timer))#2);
[  327.696018]                                lock(&(&b_ptr->lock)->rlock);
[  327.696018]   lock(((timer))#2);
[  327.696018]
[  327.696018]  *** DEADLOCK ***
[  327.696018]
[  327.696018] 5 locks held by tipc-config/5825:
[  327.696018]  #0:  (cb_lock){++++++}, at: [<ffffffff8143e608>] genl_rcv+0x18/0x40
[  327.696018]  #1:  (genl_mutex){+.+.+.}, at: [<ffffffff8143ed66>] genl_rcv_msg+0xa6/0xd0
[  327.696018]  #2:  (config_mutex){+.+.+.}, at: [<ffffffffa02bf889>] tipc_cfg_do_cmd+0x39/ 0x550 [tipc]
[  327.696018]  #3:  (tipc_net_lock){++.-..}, at: [<ffffffffa02be738>] tipc_disable_bearer+ 0x18/0x60 [tipc]
[  327.696018]  #4:  (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>]             bearer_disable+0xdd/0x120 [tipc]
[  327.696018]
[  327.696018] stack backtrace:
[  327.696018] CPU: 2 PID: 5825 Comm: tipc-config Tainted: G           O 3.11.0-rc3-wwd-    default #4
[  327.696018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  327.696018]  00000000ffffffff ffff880037fa77a8 ffffffff814d03dd 0000000000000000
[  327.696018]  ffff880037fa7808 ffff880037fa77e8 ffffffff810b1c4f 0000000037fa77e8
[  327.696018]  ffff880037fa7808 ffff880037e4db40 0000000000000000 ffff880037e4e318
[  327.696018] Call Trace:
[  327.696018]  [<ffffffff814d03dd>] dump_stack+0x4d/0xa0
[  327.696018]  [<ffffffff810b1c4f>] print_circular_bug+0x10f/0x120
[  327.696018]  [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0
[  327.696018]  [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870
[  327.696018]  [<ffffffff81087a28>] ? sched_clock_cpu+0xd8/0x110
[  327.696018]  [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670
[  327.696018]  [<ffffffff810b4453>] lock_acquire+0x103/0x130
[  327.696018]  [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70
[  327.696018]  [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0
[  327.696018]  [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70
[  327.696018]  [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc]
[  327.696018]  [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc]
[  327.696018]  [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc]
[  327.696018]  [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc]
[  327.696018]  [<ffffffff81218783>] ? security_capable+0x13/0x20
[  327.696018]  [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc]
[  327.696018]  [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340
[  327.696018]  [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0
[  327.696018]  [<ffffffff8143ecc0>] ? genl_lock+0x20/0x20
[  327.696018]  [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0
[  327.696018]  [<ffffffff8143e608>] ? genl_rcv+0x18/0x40
[  327.696018]  [<ffffffff8143e617>] genl_rcv+0x27/0x40
[  327.696018]  [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0
[  327.696018]  [<ffffffff81289d7c>] ? memcpy_fromiovec+0x6c/0x90
[  327.696018]  [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400
[  327.696018]  [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80
[  327.696018]  [<ffffffff813f7957>] sock_aio_write+0x107/0x120
[  327.696018]  [<ffffffff813fe29c>] ? release_sock+0x8c/0xa0
[  327.696018]  [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0
[  327.696018]  [<ffffffff8117fa24>] ? rw_verify_area+0x54/0x100
[  327.696018]  [<ffffffff8117fc56>] vfs_write+0x186/0x190
[  327.696018]  [<ffffffff811803e0>] SyS_write+0x60/0xb0
[  327.696018]  [<ffffffff814de852>] system_call_fastpath+0x16/0x1b

-----------------------------------------------------------------------

The problem is that the tipc_link_delete() will cancel the timer disc_timeout() when
the b_ptr->lock is hold, but the disc_timeout() still call b_ptr->lock to finish the
work, so the dead lock occurs.

We should unlock the b_ptr->lock when del the disc_timeout().

Remove link_timeout() still met the same problem, the patch:

http://article.gmane.org/gmane.network.tipc.general/4380

fix the problem, so no need to send patch for fix link_timeout() deadlock warming.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
Fix a NULL pointer deference while removing an empty directory, which
was introduced by commit 3704412 ("[readdir] convert ocfs2").

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<(null)>]           (null)
  PGD 6da85067 PUD 6da89067 PMD 0
  Oops: 0010 [#1] SMP
  CPU: 0 PID: 6564 Comm: rmdir Tainted: G           O 3.11.0-rc1 #4
  RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
  Call Trace:
    ocfs2_dir_foreach+0x49/0x50 [ocfs2]
    ocfs2_empty_dir+0x12c/0x3e0 [ocfs2]
    ocfs2_unlink+0x56e/0xc10 [ocfs2]
    vfs_rmdir+0xd5/0x140
    do_rmdir+0x1cb/0x1e0
    SyS_rmdir+0x16/0x20
    system_call_fastpath+0x16/0x1b
  Code:  Bad RIP value.
  RIP  [<          (null)>]           (null)
  RSP <ffff88006daddc10>
  CR2: 0000000000000000

[dan.carpenter@oracle.com: fix pointer math]
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reported-by: David Weber <wb@munzinger.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
ACPI 5.0 specification requires the fourth parameter to the _DSM (Device
Specific Method) to be of type package instead of integer. Failing to do
that we get following warning on the console:

  ACPI Warning: \_SB_.PCI0.I2C1.TPL0._DSM: Argument #4 type mismatch - Found [Integer],
                ACPI requires [Package] (20130517/nsarguments-95)

Fix this by passing an empty package to the _DSM method. The HID over I2C
specification doesn't require any specific values to be passed with this
parameter.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
noglitch pushed a commit that referenced this pull request May 13, 2014
When booting secondary CPUs, announce_cpu() is called to show which cpu has
been brought up. For example:

[    0.402751] smpboot: Booting Node   0, Processors  #1 #2 #3 #4 #5 OK
[    0.525667] smpboot: Booting Node   1, Processors  #6 #7 #8 #9 #10 #11 OK
[    0.755592] smpboot: Booting Node   0, Processors  #12 #13 #14 #15 #16 #17 OK
[    0.890495] smpboot: Booting Node   1, Processors  #18 #19 #20 #21 #22 #23

But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum
possible cpu id. It should use the maximum present cpu id in case not all
CPUs booted up.

Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <wangyijing@huawei.com>
Cc: <fenghua.yu@intel.com>
Cc: <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1378378676-18276-1-git-send-email-huawei.libin@huawei.com
[ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
Not all I/O ASIC versions have the free-running counter implemented, an
early revision used in the 5000/1xx models aka 3MIN and 4MIN did not have
it.  Therefore we cannot unconditionally use it as a clock source.
Fortunately if not implemented its register slot has a fixed value so it
is enough if we check for the value at the end of the calibration period
being the same as at the beginning.

This also means we need to look for another high-precision clock source on
the systems affected.  The 5000/1xx can have an R4000SC processor
installed where the CP0 Count register can be used as a clock source.
Unfortunately all the R4k DECstations suffer from the missed timer
interrupt on CP0 Count reads erratum, so we cannot use the CP0 timer as a
clock source and a clock event both at a time.  However we never need an
R4k clock event device because all DECstations have a DS1287A RTC chip
whose periodic interrupt can be used as a clock source.

This gives us the following four configuration possibilities for I/O ASIC
DECstations:

1. No I/O ASIC counter and no CP0 timer, e.g. R3k 5000/1xx (3MIN).

2. No I/O ASIC counter but the CP0 timer, i.e. R4k 5000/150 (4MIN).

3. The I/O ASIC counter but no CP0 timer, e.g. R3k 5000/240 (3MAX+).

4. The I/O ASIC counter and the CP0 timer, e.g. R4k 5000/260 (4MAX+).

For #1 and #2 this change stops the I/O ASIC free-running counter from
being installed as a clock source of a 0Hz frequency.  For #2 it also
arranges for the CP0 timer to be used as a clock source rather than a
clock event device, because having an accurate wall clock is more
important than a high-precision interval timer.  For #3 there is no
change.  For #4 the change makes the I/O ASIC free-running counter
installed as a clock source so that the CP0 timer can be used as a clock
event device.

Unfortunately the use of the CP0 timer as a clock event device relies on a
succesful completion of c0_compare_interrupt.  That never happens, because
while waiting for a CP0 Compare interrupt to happen the function spins in
a loop reading the CP0 Count register.  This makes the CP0 Count erratum
trigger reliably causing the interrupt waited for to be lost in all cases.
As a result #4 resorts to using the CP0 timer as a clock source as well,
just as #2.  However we want to keep this separate arrangement in case
(hope) c0_compare_interrupt is eventually rewritten such that it avoids
the erratum.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5825/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
When parsing lines from objdump a line containing source code starting
with a numeric label is mistaken for a line of disassembly starting with
a memory address.

Current validation fails to recognise that the "memory address" is out
of range and calculates an invalid offset which later causes this
segfault:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
631				hits += h->addr[offset++];
(gdb) bt
 #0  0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
 #1  0x00000000004d65e3 in annotate_browser__calc_percent (browser=0x7fffffffd130, evsel=0xa01da0) at ui/browsers/annotate.c:364
 #2  0x00000000004d7433 in annotate_browser__run (browser=0x7fffffffd130, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:672
 #3  0x00000000004d80c9 in symbol__tui_annotate (sym=0xc989a0, map=0xa02660, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:962
 #4  0x00000000004d7aa0 in hist_entry__tui_annotate (he=0xdf73f0, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:823
 #5  0x00000000004dd648 in perf_evsel__hists_browse (evsel=0xa01da0, nr_events=1, helpline=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", ev_name=0xa02cd0 "cycles", left_exits=false, hbt=
    0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1659
 #6  0x00000000004de372 in perf_evlist__tui_browse_hists (evlist=0xa01520, help=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", hbt=0x0, min_pcnt=0, env=0xa011e0)
    at ui/browsers/hists.c:1950
 #7  0x000000000042cf6b in __cmd_report (rep=0x7fffffffd6c0) at builtin-report.c:581
 #8  0x000000000042e25d in cmd_report (argc=0, argv=0x7fffffffe4b0, prefix=0x0) at builtin-report.c:965
 #9  0x000000000041a0e1 in run_builtin (p=0x801548, argc=1, argv=0x7fffffffe4b0) at perf.c:319
 #10 0x000000000041a319 in handle_internal_command (argc=1, argv=0x7fffffffe4b0) at perf.c:376
 #11 0x000000000041a465 in run_argv (argcp=0x7fffffffe38c, argv=0x7fffffffe380) at perf.c:420
 #12 0x000000000041a707 in main (argc=1, argv=0x7fffffffe4b0) at perf.c:521

After the fix is applied the symbol can be annotated showing the
problematic line "1:      rep"

copy_user_generic_string  /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux
             */
            ENTRY(copy_user_generic_string)
                    CFI_STARTPROC
                    ASM_STAC
                    andl %edx,%edx
              and    %edx,%edx
                    jz 4f
              je     37
                    cmpl $8,%edx
              cmp    $0x8,%edx
                    jb 2f           /* less than 8 bytes, go to byte copy loop */
              jb     33
                    ALIGN_DESTINATION
              mov    %edi,%ecx
              and    $0x7,%ecx
              je     28
              sub    $0x8,%ecx
              neg    %ecx
              sub    %ecx,%edx
        1a:   mov    (%rsi),%al
              mov    %al,(%rdi)
              inc    %rsi
              inc    %rdi
              dec    %ecx
              jne    1a
                    movl %edx,%ecx
        28:   mov    %edx,%ecx
                    shrl $3,%ecx
              shr    $0x3,%ecx
                    andl $7,%edx
              and    $0x7,%edx
            1:      rep
100.00        rep    movsq %ds:(%rsi),%es:(%rdi)
                    movsq
            2:      movl %edx,%ecx
        33:   mov    %edx,%ecx
            3:      rep
              rep    movsb %ds:(%rsi),%es:(%rdi)
                    movsb
            4:      xorl %eax,%eax
        37:   xor    %eax,%eax
              data32 xchg %ax,%ax
                    ASM_CLAC
                    ret
              retq

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1379009721-27667-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
…CIe cards

The band selection and PE control code for the
RT3593 chipsets only handles USB based devices
currently. Due to this limitation RT3593 based
PCIe cards are not working correctly.

On PCIe cards band selection is controlled via
GPIO #8 which is identical to the USB devices.
The LNA PE control is slightly different, all
LNA PEs are controlled by GPIO #4.

Update the code to configure the GPIO_CTRL register
correctly on PCIe devices.

Cc: Steven Liu <steven.liu@mediatek.com>
Cc: JasonYS Cheng <jasonys.cheng@mediatek.com>
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit f112a04)
noglitch pushed a commit that referenced this pull request May 13, 2014
Booting a mx6 with CONFIG_PROVE_LOCKING we get:

======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0-rc4-next-20131009+ #34 Not tainted
-------------------------------------------------------
swapper/0/1 is trying to acquire lock:
 (&imx_drm_device->mutex){+.+.+.}, at: [<804575a8>] imx_drm_encoder_get_mux_id+0x28/0x98

but task is already holding lock:
 (&crtc->mutex){+.+...}, at: [<802fe778>] drm_modeset_lock_all+0x40/0x54

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&crtc->mutex){+.+...}:
       [<800777d0>] __lock_acquire+0x18d4/0x1c24
       [<80077fec>] lock_acquire+0x68/0x7c
       [<805ead5c>] _mutex_lock_nest_lock+0x58/0x3a8
       [<802fec50>] drm_crtc_init+0x48/0xa8
       [<80457c88>] imx_drm_add_crtc+0xd4/0x144
       [<8045e2e8>] ipu_drm_probe+0x114/0x1fc
       [<80312278>] platform_drv_probe+0x20/0x50
       [<80310c68>] driver_probe_device+0x110/0x22c
       [<80310e20>] __driver_attach+0x9c/0xa0
       [<8030f218>] bus_for_each_dev+0x5c/0x90
       [<80310750>] driver_attach+0x20/0x28
       [<8031034c>] bus_add_driver+0xdc/0x1dc
       [<803114d8>] driver_register+0x80/0xfc
       [<80312198>] __platform_driver_register+0x50/0x64
       [<808172fc>] ipu_drm_driver_init+0x18/0x20
       [<800088c0>] do_one_initcall+0xfc/0x160
       [<807e7c5c>] kernel_init_freeable+0x104/0x1d4
       [<805e2930>] kernel_init+0x10/0xec
       [<8000ea68>] ret_from_fork+0x14/0x2c

-> #1 (&dev->mode_config.mutex){+.+.+.}:
       [<800777d0>] __lock_acquire+0x18d4/0x1c24
       [<80077fec>] lock_acquire+0x68/0x7c
       [<805eb100>] mutex_lock_nested+0x54/0x3a4
       [<802fe758>] drm_modeset_lock_all+0x20/0x54
       [<802fead4>] drm_encoder_init+0x20/0x7c
       [<80457ae4>] imx_drm_add_encoder+0x88/0xec
       [<80459838>] imx_ldb_probe+0x344/0x4fc
       [<80312278>] platform_drv_probe+0x20/0x50
       [<80310c68>] driver_probe_device+0x110/0x22c
       [<80310e20>] __driver_attach+0x9c/0xa0
       [<8030f218>] bus_for_each_dev+0x5c/0x90
       [<80310750>] driver_attach+0x20/0x28
       [<8031034c>] bus_add_driver+0xdc/0x1dc
       [<803114d8>] driver_register+0x80/0xfc
       [<80312198>] __platform_driver_register+0x50/0x64
       [<8081722c>] imx_ldb_driver_init+0x18/0x20
       [<800088c0>] do_one_initcall+0xfc/0x160
       [<807e7c5c>] kernel_init_freeable+0x104/0x1d4
       [<805e2930>] kernel_init+0x10/0xec
       [<8000ea68>] ret_from_fork+0x14/0x2c

-> #0 (&imx_drm_device->mutex){+.+.+.}:
       [<805e510c>] print_circular_bug+0x74/0x2e0
       [<80077ad0>] __lock_acquire+0x1bd4/0x1c24
       [<80077fec>] lock_acquire+0x68/0x7c
       [<805eb100>] mutex_lock_nested+0x54/0x3a4
       [<804575a8>] imx_drm_encoder_get_mux_id+0x28/0x98
       [<80459a98>] imx_ldb_encoder_prepare+0x34/0x114
       [<802ef724>] drm_crtc_helper_set_mode+0x1f0/0x4c0
       [<802f0344>] drm_crtc_helper_set_config+0x828/0x99c
       [<802ff270>] drm_mode_set_config_internal+0x5c/0xdc
       [<802eebe0>] drm_fb_helper_set_par+0x50/0xb4
       [<802af580>] fbcon_init+0x490/0x500
       [<802dd104>] visual_init+0xa8/0xf8
       [<802df414>] do_bind_con_driver+0x140/0x37c
       [<802df764>] do_take_over_console+0x114/0x1c4
       [<802af65c>] do_fbcon_takeover+0x6c/0xd4
       [<802b2b30>] fbcon_event_notify+0x7c8/0x818
       [<80049954>] notifier_call_chain+0x4c/0x8c
       [<80049cd8>] __blocking_notifier_call_chain+0x50/0x68
       [<80049d10>] blocking_notifier_call_chain+0x20/0x28
       [<802a75f0>] fb_notifier_call_chain+0x1c/0x24
       [<802a9224>] register_framebuffer+0x188/0x268
       [<802ee994>] drm_fb_helper_initial_config+0x2bc/0x4b8
       [<802f118c>] drm_fbdev_cma_init+0x7c/0xec
       [<80817288>] imx_fb_helper_init+0x54/0x90
       [<800088c0>] do_one_initcall+0xfc/0x160
       [<807e7c5c>] kernel_init_freeable+0x104/0x1d4
       [<805e2930>] kernel_init+0x10/0xec
       [<8000ea68>] ret_from_fork+0x14/0x2c

other info that might help us debug this:

Chain exists of:
  &imx_drm_device->mutex --> &dev->mode_config.mutex --> &crtc->mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&crtc->mutex);
                               lock(&dev->mode_config.mutex);
                               lock(&crtc->mutex);
  lock(&imx_drm_device->mutex);

 *** DEADLOCK ***

6 locks held by swapper/0/1:
 #0:  (registration_lock){+.+.+.}, at: [<802a90bc>] register_framebuffer+0x20/0x268
 #1:  (&fb_info->lock){+.+.+.}, at: [<802a7a90>] lock_fb_info+0x20/0x44
 #2:  (console_lock){+.+.+.}, at: [<802a9218>] register_framebuffer+0x17c/0x268
 #3:  ((fb_notifier_list).rwsem){.+.+.+}, at: [<80049cbc>] __blocking_notifier_call_chain+0x34/0x68
 #4:  (&dev->mode_config.mutex){+.+.+.}, at: [<802fe758>] drm_modeset_lock_all+0x20/0x54
 #5:  (&crtc->mutex){+.+...}, at: [<802fe778>] drm_modeset_lock_all+0x40/0x54

In order to avoid this lockdep warning, remove the locking from
imx_drm_encoder_get_mux_id() and imx_drm_crtc_panel_format_pins().

Tested on a mx6sabrelite and mx53qsb.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
If EM Transmit bit is busy during init ata_msleep() is called.  It is
wrong - msleep() should be used instead of ata_msleep(), because if EM
Transmit bit is busy for one port, it will be busy for all other ports
too, so using ata_msleep() causes wasting tries for another ports.

The most common scenario looks like that now
(six ports try to transmit a LED meaasege):
- port #0 tries for the 1st time and succeeds
- ports #1-5 try for the 1st time and sleeps
- port #1 tries for the 2nd time and succeeds
- ports #2-5 try for the 2nd time and sleeps
- port #2 tries for the 3rd time and succeeds
- ports #3-5 try for the 3rd time and sleeps
- port #3 tries for the 4th time and succeeds
- ports #4-5 try for the 4th time and sleeps
- port #4 tries for the 5th time and succeeds
- port #5 tries for the 5th time and sleeps

At this moment port #5 wasted all its five tries and failed to
initialize.  Because there are only 5 (EM_MAX_RETRY) tries available
usually only five ports succeed to initialize. The sixth port and next
ones usually will fail.

If msleep() is used instead of ata_msleep() the first port succeeds to
initialize in the first try and next ones usually succeed to
initialize in the second try.

tj: updated comment

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
When a task enters call_refreshresult with status 0 from call_refresh and
!rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
or max number of retries.

Instead of trying forever, make use of the retry path that other errors use.

This only seems to be possible when the crrefresh callback is gss_refresh_null,
which only happens when destroying the context.

To reproduce:

1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
   operations).

2) reboot - the client will be stuck and will need to be hard rebooted

BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
irq event stamp: 195724
hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
RSP: 0018:ffff880079003d18  EFLAGS: 00000246
RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
Stack:
 ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
Call Trace:
 [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffff81052974>] process_one_work+0x211/0x3a5
 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
 [<ffffffff81052eeb>] worker_thread+0x134/0x202
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff810584a0>] kthread+0xc9/0xd1
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00

And the output of "rpcdebug -m rpc -s all":

RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
Jon Maloy says:

====================
tipc: link setup and failover improvements

This series consists of four unrelated commits with different purposes.

- Commit #1 is purely cosmetic and pedagogic, hopefully making the
  failover/tunneling logics slightly easier to understand.
- Commit #2 fixes a bug that has always been in the code, but was not
  discovered until very recently.
- Commit #3 fixes a non-fatal race issue in the neighbour discovery
  code.
- Commit #4 removes an unnecessary indirection step during link
  startup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
Avoid circular mutex lock by pushing the dev->lock to the .fini
callback on each extension.

As em28xx-dvb, em28xx-alsa and em28xx-rc have their own data
structures, and don't touch at the common structure during .fini,
only em28xx-v4l needs to be locked.

[   90.994317] ======================================================
[   90.994356] [ INFO: possible circular locking dependency detected ]
[   90.994395] 3.13.0-rc1+ #24 Not tainted
[   90.994427] -------------------------------------------------------
[   90.994458] khubd/54 is trying to acquire lock:
[   90.994490]  (&card->controls_rwsem){++++.+}, at: [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.994656]
[   90.994656] but task is already holding lock:
[   90.994688]  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]
[   90.994843]
[   90.994843] which lock already depends on the new lock.
[   90.994843]
[   90.994874]
[   90.994874] the existing dependency chain (in reverse order) is:
[   90.994905]
-> #1 (&dev->lock){+.+.+.}:
[   90.995057]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995121]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995182]        [<ffffffff816a5b6c>] mutex_lock_nested+0x5c/0x3c0
[   90.995245]        [<ffffffffa0422cca>] em28xx_vol_put_mute+0x1ba/0x1d0 [em28xx_alsa]
[   90.995309]        [<ffffffffa017813d>] snd_ctl_elem_write+0xfd/0x140 [snd]
[   90.995376]        [<ffffffffa01791c2>] snd_ctl_ioctl+0xe2/0x810 [snd]
[   90.995442]        [<ffffffff811db8b0>] do_vfs_ioctl+0x300/0x520
[   90.995504]        [<ffffffff811dbb51>] SyS_ioctl+0x81/0xa0
[   90.995568]        [<ffffffff816b1929>] system_call_fastpath+0x16/0x1b
[   90.995630]
-> #0 (&card->controls_rwsem){++++.+}:
[   90.995780]        [<ffffffff810b7a47>] check_prevs_add+0x947/0x950
[   90.995841]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995901]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995962]        [<ffffffff816a762b>] down_write+0x3b/0xa0
[   90.996022]        [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.996088]        [<ffffffffa017a255>] snd_device_free+0x65/0x140 [snd]
[   90.996154]        [<ffffffffa017a751>] snd_device_free_all+0x61/0xa0 [snd]
[   90.996219]        [<ffffffffa0173af4>] snd_card_do_free+0x14/0x130 [snd]
[   90.996283]        [<ffffffffa0173f14>] snd_card_free+0x84/0x90 [snd]
[   90.996349]        [<ffffffffa0423397>] em28xx_audio_fini+0x97/0xb0 [em28xx_alsa]
[   90.996411]        [<ffffffffa040dba6>] em28xx_close_extension+0x56/0x90 [em28xx]
[   90.996475]        [<ffffffffa040f639>] em28xx_usb_disconnect+0x79/0x90 [em28xx]
[   90.996539]        [<ffffffff814a06e7>] usb_unbind_interface+0x67/0x1d0
[   90.996620]        [<ffffffff8142920f>] __device_release_driver+0x7f/0xf0
[   90.996682]        [<ffffffff814292a5>] device_release_driver+0x25/0x40
[   90.996742]        [<ffffffff81428b0c>] bus_remove_device+0x11c/0x1a0
[   90.996801]        [<ffffffff81425536>] device_del+0x136/0x1d0
[   90.996863]        [<ffffffff8149e0c0>] usb_disable_device+0xb0/0x290
[   90.996923]        [<ffffffff814930c5>] usb_disconnect+0xb5/0x1d0
[   90.996984]        [<ffffffff81495ab6>] hub_port_connect_change+0xd6/0xad0
[   90.997044]        [<ffffffff814967c3>] hub_events+0x313/0x9b0
[   90.997105]        [<ffffffff81496e95>] hub_thread+0x35/0x170
[   90.997165]        [<ffffffff8108ea2f>] kthread+0xff/0x120
[   90.997226]        [<ffffffff816b187c>] ret_from_fork+0x7c/0xb0
[   90.997287]
[   90.997287] other info that might help us debug this:
[   90.997287]
[   90.997318]  Possible unsafe locking scenario:
[   90.997318]
[   90.997348]        CPU0                    CPU1
[   90.997378]        ----                    ----
[   90.997408]   lock(&dev->lock);
[   90.997497]                                lock(&card->controls_rwsem);
[   90.997607]                                lock(&dev->lock);
[   90.997697]   lock(&card->controls_rwsem);
[   90.997786]
[   90.997786]  *** DEADLOCK ***
[   90.997786]
[   90.997817] 5 locks held by khubd/54:
[   90.997847]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81496564>] hub_events+0xb4/0x9b0
[   90.998025]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81493076>] usb_disconnect+0x66/0x1d0
[   90.998204]  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8142929d>] device_release_driver+0x1d/0x40
[   90.998383]  #3:  (em28xx_devlist_mutex){+.+.+.}, at: [<ffffffffa040db77>] em28xx_close_extension+0x27/0x90 [em28xx]
[   90.998567]  #4:  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]

Reviewed-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Tested-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
Fix broken inline assembly contraints for cmpxchg64 on 32bit.

Fixes this crash:

specification exception: 0006 [#1] SMP
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0 #4
task: 005a16c8 ti: 00592000 task.ti: 00592000
Krnl PSW : 070ce000 8029abd6 (lockref_get+0x3e/0x9c)
...
Krnl Code: 8029abcc: a71a0001           ahi     %r1,1
           8029abd0: 1852               lr      %r5,%r2
          #8029abd2: bb40f064           cds     %r4,%r0,100(%r15)
          >8029abd6: 1943               cr      %r4,%r3
           8029abd8: 1815               lr      %r1,%r5
Call Trace:
([<0000000078e01870>] 0x78e01870)
 [<000000000021105a>] sysfs_mount+0xd2/0x1c8
 [<00000000001b551e>] mount_fs+0x3a/0x134
 [<00000000001ce768>] vfs_kern_mount+0x44/0x11c
 [<00000000001ce864>] kern_mount_data+0x24/0x3c
 [<00000000005cc4b8>] sysfs_init+0x74/0xd4
 [<00000000005cb5b4>] mnt_init+0xe0/0x1fc
 [<00000000005cb16a>] vfs_caches_init+0xb6/0x14c
 [<00000000005be794>] start_kernel+0x318/0x33c
 [<000000000010001c>] _stext+0x1c/0x80

Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
skb_kill_datagram() does not dequeue the skb when MSG_PEEK is unset.
This leaves a free'd skb on the queue, resulting a double-free later.

Without this, the following oops can occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8154fcf7>] skb_dequeue+0x47/0x70
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: af_rxrpc ...
CPU: 0 PID: 1191 Comm: listen Not tainted 3.12.0+ #4
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8801183536b0 ti: ffff880035c92000 task.ti: ffff880035c92000
RIP: 0010:[<ffffffff8154fcf7>] skb_dequeue+0x47/0x70
RSP: 0018:ffff880035c93db8  EFLAGS: 00010097
RAX: 0000000000000246 RBX: ffff8800d2754b00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8800d254c084
RBP: ffff880035c93dd0 R08: ffff880035c93cf0 R09: ffff8800d968f270
R10: 0000000000000000 R11: 0000000000000293 R12: ffff8800d254c070
R13: ffff8800d254c084 R14: ffff8800cd861240 R15: ffff880119b39720
FS:  00007f37a969d740(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000000d4413000 CR4: 00000000000006f0
Stack:
 ffff8800d254c000 ffff8800d254c070 ffff8800d254c2c0 ffff880035c93df8
 ffffffffa041a5b8 ffff8800cd844c80 ffffffffa04385a0 ffff8800cd844cb0
 ffff880035c93e18 ffffffff81546cef ffff8800d45fea00 0000000000000008
Call Trace:
 [<ffffffffa041a5b8>] rxrpc_release+0x128/0x2e0 [af_rxrpc]
 [<ffffffff81546cef>] sock_release+0x1f/0x80
 [<ffffffff81546d62>] sock_close+0x12/0x20
 [<ffffffff811aaba1>] __fput+0xe1/0x230
 [<ffffffff811aad3e>] ____fput+0xe/0x10
 [<ffffffff810862cc>] task_work_run+0xbc/0xe0
 [<ffffffff8106a3be>] do_exit+0x2be/0xa10
 [<ffffffff8116dc47>] ? do_munmap+0x297/0x3b0
 [<ffffffff8106ab8f>] do_group_exit+0x3f/0xa0
 [<ffffffff8106ac04>] SyS_exit_group+0x14/0x20
 [<ffffffff8166b069>] system_call_fastpath+0x16/0x1b


Signed-off-by: Tim Smith <tim@electronghost.co.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
Given now we have 2 spinlock for management of delayed refs,
CONFIG_DEBUG_SPINLOCK=y helped me find this,

[ 4723.413809] BUG: spinlock wrong CPU on CPU#1, btrfs-transacti/2258
[ 4723.414882]  lock: 0xffff880048377670, .magic: dead4ead, .owner: btrfs-transacti/2258, .owner_cpu: 2
[ 4723.417146] CPU: 1 PID: 2258 Comm: btrfs-transacti Tainted: G        W  O 3.12.0+ #4
[ 4723.421321] Call Trace:
[ 4723.421872]  [<ffffffff81680fe7>] dump_stack+0x54/0x74
[ 4723.422753]  [<ffffffff81681093>] spin_dump+0x8c/0x91
[ 4723.424979]  [<ffffffff816810b9>] spin_bug+0x21/0x26
[ 4723.425846]  [<ffffffff81323956>] do_raw_spin_unlock+0x66/0x90
[ 4723.434424]  [<ffffffff81689bf7>] _raw_spin_unlock+0x27/0x40
[ 4723.438747]  [<ffffffffa015da9e>] btrfs_cleanup_one_transaction+0x35e/0x710 [btrfs]
[ 4723.443321]  [<ffffffffa015df54>] btrfs_cleanup_transaction+0x104/0x570 [btrfs]
[ 4723.444692]  [<ffffffff810c1b5d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 4723.450336]  [<ffffffff810c1c2d>] ? trace_hardirqs_on+0xd/0x10
[ 4723.451332]  [<ffffffffa015e5ee>] transaction_kthread+0x22e/0x270 [btrfs]
[ 4723.452543]  [<ffffffffa015e3c0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[ 4723.457833]  [<ffffffff81079efa>] kthread+0xea/0xf0
[ 4723.458990]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.460133]  [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[ 4723.460865]  [<ffffffff81079e10>] ? kthread_create_on_node+0x140/0x140
[ 4723.496521] ------------[ cut here ]------------

----------------------------------------------------------------------

The reason is that we get to call cond_resched_lock(&head_ref->lock) while
still holding @delayed_refs->lock.

So it's different with __btrfs_run_delayed_refs(), where we do drop-acquire
dance before and after actually processing delayed refs.

Here we don't drop the lock, others are not able to add new delayed refs to
head_ref, so cond_resched_lock(&head_ref->lock) is not necessary here.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
Guenter Roeck has got the following call trace on a p2020 board:
  Kernel stack overflow in process eb3e5a00, r1=eb79df90
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  task: eb3e5a00 ti: c0616000 task.ti: ef440000
  NIP: c003a420 LR: c003a410 CTR: c0017518
  REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
  MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
  GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
  GPR08: 00000000 020b8000 00000000 00000000 44008442
  NIP [c003a420] __do_softirq+0x94/0x1ec
  LR [c003a410] __do_softirq+0x84/0x1ec
  Call Trace:
  [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
  [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
  [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
  [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
  [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
  --- Exception: 501 at 0xfcda524
      LR = 0x10024900
  Instruction dump:
  7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
  5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
  Kernel panic - not syncing: kernel stack overflow
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  Call Trace:

The reason is that we have used the wrong register to calculate the
ksp_limit in commit cbc9565 (powerpc: Remove ksp_limit on ppc64).
Just fix it.

As suggested by Benjamin Herrenschmidt, also add the C prototype of the
function in the comment in order to avoid such kind of errors in the
future.

Cc: stable@vger.kernel.org # 3.12
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
noglitch pushed a commit that referenced this pull request May 13, 2014
Currently we're using GFP_KERNEL, however there are some path(s) where we
can hold some spinlocks, specifically bond->curr_slave_lock:

[    4.722916] BUG: sleeping function called from invalid context at mm/slub.c:965
[    4.724438] in_atomic(): 1, irqs_disabled(): 0, pid: 940, name: ifup-eth
[    4.726034] 5 locks held by ifup-eth/940:
...snip...
[    4.734646]  #4:  (&bond->curr_slave_lock){+...+.}, at: [<ffffffffa00badc6>] bond_enslave+0xda6/0xdd0 [bonding]
...snip...
[    4.759081]  [<ffffffffa00b6f11>] bond_change_active_slave+0x191/0x3b0 [bonding]
[    4.760917]  [<ffffffffa00b7227>] bond_select_active_slave+0xf7/0x1d0 [bonding]
[    4.762751]  [<ffffffffa00badce>] bond_enslave+0xdae/0xdd0 [bonding]
...snip...

As it's out of hot path and is a really rare event - change the gfp_t flags
to GFP_ATOMIC to avoid sleeping under spinlock.

v2: convert new notify calls to GFP_ATOMIC.

CC: Thomas Glanzmann <thomas@glanzmann.de>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
Zoltan Kiss says:

====================
xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy

A long known problem of the upstream netback implementation that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge perfomance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore haven't progressed very well.
This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower AMD
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch)
Based on my investigations the packet get only copied if it is delivered to
Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects
DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this usecase.
In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoiding TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I've tried to broke it down to smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Use skb->cb to store pending_idx
2: Some refactoring
3: Change RX path for mapped SKB fragments (moved here to keep bisectability,
review it after #4)
4: Introduce TX grant mapping
5: Remove old TX grant copy definitons and fix indentations
6: Add stat counters for zerocopy
7: Handle guests with too many frags
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling the important use case when an older guest
sends lots of slots. Instead of delayed copy now we timeout packets on the RX
path, based on the assumption that otherwise packets should get stucked
anywhere else. Finally some unmap batching to avoid too much TLB flush

v3: Apart from fixing a few things mentioned in responses the important change
is the use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.

v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.

v5: Only minor fixes based on Wei's comments

v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see
first 2 patches. Also rework of handling packets with too many slots, and
reorder the series a bit.

v7: Small fixes in comments/log messages/error paths, and merging the frag
overflow stats patch into its parent.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363
====================

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
[13654.480669] ======================================================
[13654.480905] [ INFO: possible circular locking dependency detected ]
[13654.481003] 3.12.0+ #4 Tainted: G        W  O
[13654.481060] -------------------------------------------------------
[13654.481060] btrfs-transacti/9347 is trying to acquire lock:
[13654.481060]  (&(&root->ordered_extent_lock)->rlock){+.+...}, at: [<ffffffffa02d30a1>] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060] but task is already holding lock:
[13654.481060]  (&(&fs_info->ordered_root_lock)->rlock){+.+...}, at: [<ffffffffa02d3015>] btrfs_cleanup_transaction+0x1e5/0x570 [btrfs]
[13654.481060] which lock already depends on the new lock.

[13654.481060] the existing dependency chain (in reverse order) is:
[13654.481060] -> #1 (&(&fs_info->ordered_root_lock)->rlock){+.+...}:
[13654.481060]        [<ffffffff810c4103>] lock_acquire+0x93/0x130
[13654.481060]        [<ffffffff81689991>] _raw_spin_lock+0x41/0x50
[13654.481060]        [<ffffffffa02f011b>] __btrfs_add_ordered_extent+0x39b/0x450 [btrfs]
[13654.481060]        [<ffffffffa02f0202>] btrfs_add_ordered_extent+0x32/0x40 [btrfs]
[13654.481060]        [<ffffffffa02df6aa>] run_delalloc_nocow+0x78a/0x9d0 [btrfs]
[13654.481060]        [<ffffffffa02dfc0d>] run_delalloc_range+0x31d/0x390 [btrfs]
[13654.481060]        [<ffffffffa02f7c00>] __extent_writepage+0x310/0x780 [btrfs]
[13654.481060]        [<ffffffffa02f830a>] extent_write_cache_pages.isra.29.constprop.48+0x29a/0x410 [btrfs]
[13654.481060]        [<ffffffffa02f879d>] extent_writepages+0x4d/0x70 [btrfs]
[13654.481060]        [<ffffffffa02d9f68>] btrfs_writepages+0x28/0x30 [btrfs]
[13654.481060]        [<ffffffff8114be91>] do_writepages+0x21/0x50
[13654.481060]        [<ffffffff81140d49>] __filemap_fdatawrite_range+0x59/0x60
[13654.481060]        [<ffffffff81140e13>] filemap_fdatawrite_range+0x13/0x20
[13654.481060]        [<ffffffffa02f1db9>] btrfs_wait_ordered_range+0x49/0x140 [btrfs]
[13654.481060]        [<ffffffffa0318fe2>] __btrfs_write_out_cache+0x682/0x8b0 [btrfs]
[13654.481060]        [<ffffffffa031952d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
[13654.481060]        [<ffffffffa02c7083>] btrfs_write_dirty_block_groups+0x593/0x680 [btrfs]
[13654.481060]        [<ffffffffa0345307>] commit_cowonly_roots+0x14b/0x20d [btrfs]
[13654.481060]        [<ffffffffa02d7c1a>] btrfs_commit_transaction+0x43a/0x9d0 [btrfs]
[13654.481060]        [<ffffffffa030061a>] btrfs_create_uuid_tree+0x5a/0x100 [btrfs]
[13654.481060]        [<ffffffffa02d5a8a>] open_ctree+0x21da/0x2210 [btrfs]
[13654.481060]        [<ffffffffa02ab6fe>] btrfs_mount+0x68e/0x870 [btrfs]
[13654.481060]        [<ffffffff811b2409>] mount_fs+0x39/0x1b0
[13654.481060]        [<ffffffff811cd653>] vfs_kern_mount+0x63/0xf0
[13654.481060]        [<ffffffff811cfcce>] do_mount+0x23e/0xa90
[13654.481060]        [<ffffffff811d05a3>] SyS_mount+0x83/0xc0
[13654.481060]        [<ffffffff81692b52>] system_call_fastpath+0x16/0x1b
[13654.481060] -> #0 (&(&root->ordered_extent_lock)->rlock){+.+...}:
[13654.481060]        [<ffffffff810c340a>] __lock_acquire+0x150a/0x1a70
[13654.481060]        [<ffffffff810c4103>] lock_acquire+0x93/0x130
[13654.481060]        [<ffffffff81689991>] _raw_spin_lock+0x41/0x50
[13654.481060]        [<ffffffffa02d30a1>] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060]        [<ffffffffa02d35ce>] transaction_kthread+0x22e/0x270 [btrfs]
[13654.481060]        [<ffffffff81079efa>] kthread+0xea/0xf0
[13654.481060]        [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[13654.481060] other info that might help us debug this:

[13654.481060]  Possible unsafe locking scenario:

[13654.481060]        CPU0                    CPU1
[13654.481060]        ----                    ----
[13654.481060]   lock(&(&fs_info->ordered_root_lock)->rlock);
[13654.481060]				 lock(&(&root->ordered_extent_lock)->rlock);
[13654.481060]				 lock(&(&fs_info->ordered_root_lock)->rlock);
[13654.481060]   lock(&(&root->ordered_extent_lock)->rlock);
[13654.481060]
 *** DEADLOCK ***
[...]

======================================================

btrfs_destroy_all_ordered_extents()
gets &fs_info->ordered_root_lock __BEFORE__ acquiring &root->ordered_extent_lock,
while btrfs_[add,remove]_ordered_extent()
acquires &fs_info->ordered_root_lock __AFTER__ getting &root->ordered_extent_lock.

This patch fixes the above problem.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
…to next/dt

Merge "mvebu dt changes for v3.15 (incremental #4)" from Jason Cooper:

 - dove
    - add system controller node
    - drop pinctrl PMU reg property _before_ it hits mainline and becomes ABI

 - mvebu
    - XP/370
       - change default PCIe apertures
       - switch GP and DB boards internal registers to 0xf1000000
       - correct RAM size on Matrix board
    - 385
       - correct phy connection type for DB board
       - add RD board

* tag 'mvebu-dt-3.15-4' of git://git.infradead.org/linux-mvebu:
  ARM: dove: drop pinctrl PMU reg property
  ARM: mvebu: add Device Tree for the Armada 385 RD board
  ARM: mvebu: use the correct phy connection mode on Armada 385 DB
  ARM: mvebu: the Armada XP Matrix board has 4 GB
  ARM: mvebu: switch the Armada XP GP to use internal registers at 0xf1000000
  ARM: mvebu: switch the Armada XP DB to use internal registers at 0xf1000000
  ARM: mvebu: change the default PCIe apertures for Armada 370/XP
  ARM: dove: add system controller node

Conflicts:
	arch/arm/boot/dts/Makefile

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
noglitch pushed a commit that referenced this pull request May 13, 2014
I'm transitioning maintainership of the xHCI driver to my colleague,
Mathias Nyman.  The xHCI driver is in good shape, and it's time for me
to move on to the next shiny thing. :)

There's a few known outstanding bugs that we have plans for how to fix:

1. Clear Halt issue that means some USB scanners fail after one scan
2. TD fragment issue that means USB ethernet scatter-gather doesn't work
3. xHCI command queue issues that cause the driver to die when a USB
   device doesn't respond to a Set Address control transfer when another
   command is outstanding.
4. USB port power off for Haswell-ULT is a complete disaster.

Mathias is putting the finishing touches on a fix for #3, which will
make it much easier to craft a solution for #1.  Dan William has an
ACKed RFC for #4 that may land in 3.16, after much testing.  I'm working
with Mathias to come up with an architectural solution for #2.

I don't foresee very many big features coming down the pipe for USB
(which is part of the reason it's a good time to change now).  SSIC is
mostly a hardware-level change (perhaps with some PHY drivers needed),
USB 3.1 is again mostly a hardware-level change with some software
engineering to communicate the speed increase to the device drivers, add
new device descriptor parsing to lsusb, but definitely nothing as big as
USB 3.0 was.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@intel.com>
noglitch pushed a commit that referenced this pull request May 13, 2014
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +->spin_lock remote CPU#1        ovs_flow_stats_get()
 |  <interrupted>                  stats_read()
 |  ...                       +-->  spin_lock remote CPU#0
 |                            |     <interrupted>
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <jesse@nicira.com>

=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
 [<ffffffff810f8963>] mark_lock+0x223/0x2b0
 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
 [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
noglitch pushed a commit that referenced this pull request May 13, 2014
virt_to_page() is incredibly inefficient when virt-to-phys patching is
enabled.  This is because we end up with this calculation:

  page = &mem_map[asm virt_to_phys(addr) >> 12 - __pv_phys_offset >> 12]

in assembly.  The asm virt_to_phys() is equivalent this this operation:

  addr - PAGE_OFFSET + __pv_phys_offset

and we can see that because this is assembly, the compiler has no chance
to optimise some of that away.  This should reduce down to:

  page = &mem_map[(addr - PAGE_OFFSET) >> 12]

for the common cases.  Permit the compiler to make this optimisation by
giving it more of the information it needs - do this by providing a
virt_to_pfn() macro.

Another issue which makes this more complex is that __pv_phys_offset is
a 64-bit type on all platforms.  This is needlessly wasteful - if we
store the physical offset as a PFN, we can save a lot of work having
to deal with 64-bit values, which sometimes ends up producing incredibly
horrid code:

     a4c:       e3009000        movw    r9, #0
                        a4c: R_ARM_MOVW_ABS_NC  __pv_phys_offset
     a50:       e3409000        movt    r9, #0          ; r9 = &__pv_phys_offset
                        a50: R_ARM_MOVT_ABS     __pv_phys_offset
     a54:       e3002000        movw    r2, #0
                        a54: R_ARM_MOVW_ABS_NC  __pv_phys_offset
     a58:       e3402000        movt    r2, #0          ; r2 = &__pv_phys_offset
                        a58: R_ARM_MOVT_ABS     __pv_phys_offset
     a5c:       e5999004        ldr     r9, [r9, #4]    ; r9 = high word of __pv_phys_offset
     a60:       e3001000        movw    r1, #0
                        a60: R_ARM_MOVW_ABS_NC  mem_map
     a64:       e592c000        ldr     ip, [r2]        ; ip = low word of __pv_phys_offset

Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
noglitch pushed a commit that referenced this pull request May 13, 2014
With EXT4FS_DEBUG ext4_count_free_clusters() will call
ext4_read_block_bitmap() without s_group_info initialized, so we need to
initialize multi-block allocator before.

And dependencies that must be solved, to allow this:
- multi-block allocator needs in group descriptors
- need to install s_op before initializing multi-block allocator,
  because in ext4_mb_init_backend() new inode is created.
- initialize number of group desc blocks (s_gdb_count) otherwise
  number of clusters returned by ext4_free_clusters_after_init() is not correct.
  (see ext4_bg_num_gdb_nometa())

Here is the stack backtrace:

(gdb) bt
 #0  ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430
 #1  ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000,
     desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0,
     bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358
 #2  0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000,
     block_group=block_group@entry=0,
     bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476
 #3  0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000,
     block_group=block_group@entry=0) at balloc.c:489
 #4  0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665
 #5  0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>,
     sb=0xffff880079a10000) at super.c:2143
 #6  ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>,
     data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851
     ...

Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
noglitch pushed a commit that referenced this pull request May 27, 2014
There was a deadlock in monitor mode when we were setting the
channel if the channel was not 1.

======================================================
[ INFO: possible circular locking dependency detected ]
3.14.3 #4 Not tainted
-------------------------------------------------------
iw/3323 is trying to acquire lock:
 (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffa062e2f2>] ieee80211_vif_release_channel+0x42/0xb0 [mac80211]

but task is already holding lock:
 (&local->iflist_mtx){+.+...}, at: [<ffffffffa0609e0a>] ieee80211_set_monitor_channel+0x5a/0x1b0 [mac80211]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&local->iflist_mtx){+.+...}:
       [<ffffffff810d95bb>] __lock_acquire+0xb3b/0x13b0
       [<ffffffff810d9ee0>] lock_acquire+0xb0/0x1f0
       [<ffffffff817eb9c8>] mutex_lock_nested+0x78/0x4f0
       [<ffffffffa06225cf>] ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
       [<ffffffffa0518189>] iwl_mvm_recalc_multicast+0x49/0xa0 [iwlmvm]
       [<ffffffffa051822e>] iwl_mvm_configure_filter+0x4e/0x70 [iwlmvm]
       [<ffffffffa05e6d43>] ieee80211_configure_filter+0x153/0x5f0 [mac80211]
       [<ffffffffa05e71f5>] ieee80211_reconfig_filter+0x15/0x20 [mac80211]
       [snip]

-> #1 (&mvm->mutex){+.+.+.}:
       [<ffffffff810d95bb>] __lock_acquire+0xb3b/0x13b0
       [<ffffffff810d9ee0>] lock_acquire+0xb0/0x1f0
       [<ffffffff817eb9c8>] mutex_lock_nested+0x78/0x4f0
       [<ffffffffa0517246>] iwl_mvm_add_chanctx+0x56/0xe0 [iwlmvm]
       [<ffffffffa062ca1e>] ieee80211_new_chanctx+0x13e/0x410 [mac80211]
       [<ffffffffa062d953>] ieee80211_vif_use_channel+0x1c3/0x5a0 [mac80211]
       [<ffffffffa06035ab>] ieee80211_add_virtual_monitor+0x1ab/0x6b0 [mac80211]
       [<ffffffffa06052ea>] ieee80211_do_open+0xe6a/0x15a0 [mac80211]
       [<ffffffffa0605a79>] ieee80211_open+0x59/0x60 [mac80211]
       [snip]

-> #0 (&local->chanctx_mtx){+.+.+.}:
       [<ffffffff810d6cb7>] check_prevs_add+0x977/0x980
       [<ffffffff810d95bb>] __lock_acquire+0xb3b/0x13b0
       [<ffffffff810d9ee0>] lock_acquire+0xb0/0x1f0
       [<ffffffff817eb9c8>] mutex_lock_nested+0x78/0x4f0
       [<ffffffffa062e2f2>] ieee80211_vif_release_channel+0x42/0xb0 [mac80211]
       [<ffffffffa0609ec3>] ieee80211_set_monitor_channel+0x113/0x1b0 [mac80211]
       [<ffffffffa058fb37>] cfg80211_set_monitor_channel+0x77/0x2b0 [cfg80211]
       [<ffffffffa056e0b2>] __nl80211_set_channel+0x122/0x140 [cfg80211]
       [<ffffffffa0581374>] nl80211_set_wiphy+0x284/0xaf0 [cfg80211]
       [snip]

other info that might help us debug this:

Chain exists of:
  &local->chanctx_mtx --> &mvm->mutex --> &local->iflist_mtx

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&local->iflist_mtx);
                               lock(&mvm->mutex);
                               lock(&local->iflist_mtx);
  lock(&local->chanctx_mtx);

 *** DEADLOCK ***

This deadlock actually occurs:
INFO: task iw:3323 blocked for more than 120 seconds.
      Not tainted 3.14.3 #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
iw              D ffff8800c8afcd80  4192  3323   3322 0x00000000
 ffff880078fdb7e0 0000000000000046 ffff8800c8afcd80 ffff880078fdbfd8
 00000000001d5540 00000000001d5540 ffff8801141b0000 ffff8800c8afcd80
 ffff880078ff9e38 ffff880078ff9e38 ffff880078ff9e40 0000000000000246
Call Trace:
 [<ffffffff817ea841>] schedule_preempt_disabled+0x31/0x80
 [<ffffffff817ebaed>] mutex_lock_nested+0x19d/0x4f0
 [<ffffffffa06225cf>] ? ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
 [<ffffffffa06225cf>] ? ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
 [<ffffffffa052a680>] ? iwl_mvm_power_mac_update_mode+0xc0/0xc0 [iwlmvm]
 [<ffffffffa06225cf>] ieee80211_iterate_active_interfaces+0x2f/0x60 [mac80211]
 [<ffffffffa0529357>] _iwl_mvm_power_update_binding+0x27/0x80 [iwlmvm]
 [<ffffffffa0516eb1>] iwl_mvm_unassign_vif_chanctx+0x81/0xc0 [iwlmvm]
 [<ffffffffa062d3ff>] __ieee80211_vif_release_channel+0xdf/0x470 [mac80211]
 [<ffffffffa062e2fa>] ieee80211_vif_release_channel+0x4a/0xb0 [mac80211]
 [<ffffffffa0609ec3>] ieee80211_set_monitor_channel+0x113/0x1b0 [mac80211]
 [<ffffffffa058fb37>] cfg80211_set_monitor_channel+0x77/0x2b0 [cfg80211]
 [<ffffffffa056e0b2>] __nl80211_set_channel+0x122/0x140 [cfg80211]
 [<ffffffffa0581374>] nl80211_set_wiphy+0x284/0xaf0 [cfg80211]

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=75541

Cc: <stable@vger.kernel.org> [3.13+]
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
noglitch pushed a commit that referenced this pull request May 27, 2014
After 96d365e ("cgroup: make css_set_lock a rwsem and rename it
to css_set_rwsem"), css task iterators requires sleepable context as
it may block on css_set_rwsem.  I missed that cgroup_freezer was
iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
errors like the following on freezer state reads and transitions.

  BUG: sleeping function called from invalid context at /work
 /os/work/kernel/locking/rwsem.c:20
  in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
  5 locks held by bash/462:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
   #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
   #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
   #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
  Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460

  CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
   ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
   0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
  Call Trace:
   [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
   [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
   [<ffffffff81d05974>] down_read+0x24/0x60
   [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
   [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
   [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
   [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
   [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
   [<ffffffff811f121d>] SyS_write+0x4d/0xc0
   [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b

freezer->lock used to be used in hot paths but that time is long gone
and there's no reason for the lock to be IRQ-safe spinlock or even
per-cgroup.  In fact, given the fact that a cgroup may contain large
number of tasks, it's not a good idea to iterate over them while
holding IRQ-safe spinlock.

Let's simplify locking by replacing per-cgroup freezer->lock with
global freezer_mutex.  This also makes the comments explaining the
intricacies of policy inheritance and the locking around it as the
states are protected by a common mutex.

The conversion is mostly straight-forward.  The followings are worth
mentioning.

* freezer_css_online() no longer needs double locking.

* freezer_attach() now performs propagation simply while holding
  freezer_mutex.  update_if_frozen() race no longer exists and the
  comment is removed.

* freezer_fork() now tests whether the task is in root cgroup using
  the new task_css_is_root() without doing rcu_read_lock/unlock().  If
  not, it grabs freezer_mutex and performs the operation.

* freezer_read() and freezer_change_state() grab freezer_mutex across
  the whole operation and pin the css while iterating so that each
  descendant processing happens in sleepable context.

Fixes: 96d365e ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
ca22e56 (driver-core: implement 'sysdev' functionality for regular
devices and buses) has introduced bus_register macro with a static
key to distinguish different subsys mutex classes.

This however doesn't work for different subsys which use a common
registering function. One example is subsys_system_register (and
mce_device and cpu_device).

In the end this leads to the following lockdep splat:
[  207.271924] ======================================================
[  207.271932] [ INFO: possible circular locking dependency detected ]
[  207.271942] 3.9.0-rc1-0.7-default+ linux4sam#34 Not tainted
[  207.271948] -------------------------------------------------------
[  207.271957] bash/10493 is trying to acquire lock:
[  207.271963]  (subsys mutex){+.+.+.}, at: [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[  207.271987]
[  207.271987] but task is already holding lock:
[  207.271995]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60
[  207.272012]
[  207.272012] which lock already depends on the new lock.
[  207.272012]
[  207.272023]
[  207.272023] the existing dependency chain (in reverse order) is:
[  207.272033]
[  207.272033] -> linux4sam#4 (cpu_hotplug.lock){+.+.+.}:
[  207.272044]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272056]        [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.272069]        [<ffffffff81046ba9>] get_online_cpus+0x29/0x40
[  207.272082]        [<ffffffff81185210>] drain_all_stock+0x30/0x150
[  207.272094]        [<ffffffff811853da>] mem_cgroup_reclaim+0xaa/0xe0
[  207.272104]        [<ffffffff8118775e>] __mem_cgroup_try_charge+0x51e/0xcf0
[  207.272114]        [<ffffffff81188486>] mem_cgroup_charge_common+0x36/0x60
[  207.272125]        [<ffffffff811884da>] mem_cgroup_newpage_charge+0x2a/0x30
[  207.272135]        [<ffffffff81150531>] do_wp_page+0x231/0x830
[  207.272147]        [<ffffffff8115151e>] handle_pte_fault+0x19e/0x8d0
[  207.272157]        [<ffffffff81151da8>] handle_mm_fault+0x158/0x1e0
[  207.272166]        [<ffffffff814b6153>] do_page_fault+0x2a3/0x4e0
[  207.272178]        [<ffffffff814b2578>] page_fault+0x28/0x30
[  207.272189]
[  207.272189] -> linux4sam#3 (&mm->mmap_sem){++++++}:
[  207.272199]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272208]        [<ffffffff8114c5ad>] might_fault+0x6d/0x90
[  207.272218]        [<ffffffff811a11e3>] filldir64+0xb3/0x120
[  207.272229]        [<ffffffffa013fc19>] call_filldir+0x89/0x130 [ext3]
[  207.272248]        [<ffffffffa0140377>] ext3_readdir+0x6b7/0x7e0 [ext3]
[  207.272263]        [<ffffffff811a1519>] vfs_readdir+0xa9/0xc0
[  207.272273]        [<ffffffff811a15cb>] sys_getdents64+0x9b/0x110
[  207.272284]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272296]
[  207.272296] -> linux4sam#2 (&type->i_mutex_dir_key#3){+.+.+.}:
[  207.272309]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272319]        [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.272329]        [<ffffffff8119c254>] link_path_walk+0x6f4/0x9a0
[  207.272339]        [<ffffffff8119e7fa>] path_openat+0xba/0x470
[  207.272349]        [<ffffffff8119ecf8>] do_filp_open+0x48/0xa0
[  207.272358]        [<ffffffff8118d81c>] file_open_name+0xdc/0x110
[  207.272369]        [<ffffffff8118d885>] filp_open+0x35/0x40
[  207.272378]        [<ffffffff8135c76e>] _request_firmware+0x52e/0xb20
[  207.272389]        [<ffffffff8135cdd6>] request_firmware+0x16/0x20
[  207.272399]        [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode]
[  207.272416]        [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode]
[  207.272431]        [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode]
[  207.272444]        [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100
[  207.272457]        [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4
[  207.272472]        [<ffffffff81000202>] do_one_initcall+0x42/0x180
[  207.272485]        [<ffffffff810bbeff>] load_module+0x19df/0x1b70
[  207.272499]        [<ffffffff810bc376>] sys_init_module+0xe6/0x130
[  207.272511]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272523]
[  207.272523] -> linux4sam#1 (umhelper_sem){++++.+}:
[  207.272537]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272548]        [<ffffffff814ae9c4>] down_read+0x34/0x50
[  207.272559]        [<ffffffff81062bff>] usermodehelper_read_trylock+0x4f/0x100
[  207.272575]        [<ffffffff8135c7dd>] _request_firmware+0x59d/0xb20
[  207.272587]        [<ffffffff8135cdd6>] request_firmware+0x16/0x20
[  207.272599]        [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode]
[  207.272613]        [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode]
[  207.272627]        [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode]
[  207.272641]        [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100
[  207.272654]        [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4
[  207.272666]        [<ffffffff81000202>] do_one_initcall+0x42/0x180
[  207.272678]        [<ffffffff810bbeff>] load_module+0x19df/0x1b70
[  207.272690]        [<ffffffff810bc376>] sys_init_module+0xe6/0x130
[  207.272702]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272715]
[  207.272715] -> #0 (subsys mutex){+.+.+.}:
[  207.272729]        [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0
[  207.272740]        [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.272751]        [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.272763]        [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[  207.272775]        [<ffffffff81349114>] device_del+0x134/0x1f0
[  207.272786]        [<ffffffff813491f2>] device_unregister+0x22/0x60
[  207.272798]        [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad
[  207.272812]        [<ffffffff814b6402>] notifier_call_chain+0x72/0x130
[  207.272824]        [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10
[  207.272839]        [<ffffffff81498f76>] _cpu_down+0x1d6/0x350
[  207.272851]        [<ffffffff81499130>] cpu_down+0x40/0x60
[  207.272862]        [<ffffffff8149cc55>] store_online+0x75/0xe0
[  207.272874]        [<ffffffff813474a0>] dev_attr_store+0x20/0x30
[  207.272886]        [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150
[  207.272900]        [<ffffffff8118e10b>] vfs_write+0xcb/0x130
[  207.272911]        [<ffffffff8118e924>] sys_write+0x64/0xa0
[  207.272923]        [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[  207.272936]
[  207.272936] other info that might help us debug this:
[  207.272936]
[  207.272952] Chain exists of:
[  207.272952]   subsys mutex --> &mm->mmap_sem --> cpu_hotplug.lock
[  207.272952]
[  207.272973]  Possible unsafe locking scenario:
[  207.272973]
[  207.272984]        CPU0                    CPU1
[  207.272992]        ----                    ----
[  207.273000]   lock(cpu_hotplug.lock);
[  207.273009]                                lock(&mm->mmap_sem);
[  207.273020]                                lock(cpu_hotplug.lock);
[  207.273031]   lock(subsys mutex);
[  207.273040]
[  207.273040]  *** DEADLOCK ***
[  207.273040]
[  207.273055] 5 locks held by bash/10493:
[  207.273062]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff81209049>] sysfs_write_file+0x49/0x150
[  207.273080]  linux4sam#1:  (s_active#150){.+.+.+}, at: [<ffffffff812090c2>] sysfs_write_file+0xc2/0x150
[  207.273099]  linux4sam#2:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81027557>] cpu_hotplug_driver_lock+0x17/0x20
[  207.273121]  linux4sam#3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8149911c>] cpu_down+0x2c/0x60
[  207.273140]  linux4sam#4:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60
[  207.273158]
[  207.273158] stack backtrace:
[  207.273170] Pid: 10493, comm: bash Not tainted 3.9.0-rc1-0.7-default+ linux4sam#34
[  207.273180] Call Trace:
[  207.273192]  [<ffffffff810ab373>] print_circular_bug+0x223/0x310
[  207.273204]  [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0
[  207.273216]  [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0
[  207.273227]  [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[  207.273239]  [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0
[  207.273251]  [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[  207.273263]  [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0
[  207.273274]  [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0
[  207.273286]  [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[  207.273298]  [<ffffffff81349114>] device_del+0x134/0x1f0
[  207.273309]  [<ffffffff813491f2>] device_unregister+0x22/0x60
[  207.273321]  [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad
[  207.273332]  [<ffffffff814b6402>] notifier_call_chain+0x72/0x130
[  207.273344]  [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10
[  207.273356]  [<ffffffff81498f76>] _cpu_down+0x1d6/0x350
[  207.273368]  [<ffffffff81027557>] ? cpu_hotplug_driver_lock+0x17/0x20
[  207.273380]  [<ffffffff81499130>] cpu_down+0x40/0x60
[  207.273391]  [<ffffffff8149cc55>] store_online+0x75/0xe0
[  207.273402]  [<ffffffff813474a0>] dev_attr_store+0x20/0x30
[  207.273413]  [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150
[  207.273425]  [<ffffffff8118e10b>] vfs_write+0xcb/0x130
[  207.273436]  [<ffffffff8118e924>] sys_write+0x64/0xa0
[  207.273447]  [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b

Which reports a false possitive deadlock because it sees:
1) load_module -> subsys_interface_register -> mc_deveice_add (*) -> subsys->p->mutex -> link_path_walk -> lookup_slow -> i_mutex
2) sys_write -> _cpu_down -> cpu_hotplug_begin -> cpu_hotplug.lock -> mce_cpu_callback -> mce_device_remove(**) -> device_unregister -> bus_remove_device -> subsys mutex
3) vfs_readdir -> i_mutex -> filldir64 -> might_fault -> might_lock_read(mmap_sem) -> page_fault -> mmap_sem -> drain_all_stock -> cpu_hotplug.lock

but
1) takes cpu_subsys subsys (*) but 2) takes mce_device subsys (**) so
the deadlock is not possible AFAICS.

The fix is quite simple. We can pull the key inside bus_type structure
because they are defined per device so the pointer will be unique as
well. bus_register doesn't need to be a macro anymore so change it
to the inline. We could get rid of __bus_register as there is no other
caller but maybe somebody will want to use a different key so keep it
around for now.

Reported-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
The first backtrace appears on tx path with DMA mapping operations debug
enabled.

[  345.637919] ------------[ cut here ]------------
[  345.637971] WARNING: at lib/dma-debug.c:937 check_unmap+0x4df/0x910()
[  345.637977] Hardware name: System Name
[  345.637987] sis900 0000:00:01.1: DMA-API: device driver failed to check map error[device address=0x000000000d4aed02] [si
ze=60 bytes] [mapped as single]
[  345.637993] Modules linked in: bridge stp llc dmfe sundance 3c59x sis900
[  345.638022] Pid: 0, comm: swapper Not tainted 3.9.0-rc6+ linux4sam#4
[  345.638028] Call Trace:
[  345.638042]  [<c122097f>] ? check_unmap+0x4df/0x910
[  345.638059]  [<c102b19c>] warn_slowpath_common+0x7c/0xa0
[  345.638070]  [<c122097f>] ? check_unmap+0x4df/0x910
[  345.638081]  [<c102b23e>] warn_slowpath_fmt+0x2e/0x30
[  345.638092]  [<c122097f>] check_unmap+0x4df/0x910
[  345.638107]  [<c100bfeb>] ? save_stack_trace+0x2b/0x50
[  345.638120]  [<c107238e>] ? mark_lock+0x31e/0x5d0
[  345.638132]  [<c1072b2c>] ? __lock_acquire+0x4ec/0x7d0
[  345.638143]  [<c1220f6d>] debug_dma_unmap_page+0x6d/0x80
[  345.638166]  [<cf834dec>] sis900_interrupt+0x49c/0x860 [sis900]
[  345.638195]  [<c1094b73>] handle_irq_event_percpu+0x43/0x1c0
[  345.638206]  [<c1094d1e>] ? handle_irq_event+0x2e/0x60
[  345.638217]  [<c1094d27>] handle_irq_event+0x37/0x60
[  345.638235]  [<c10973f0>] ? irq_set_chip_data+0x40/0x40
[  345.638246]  [<c1097442>] handle_level_irq+0x52/0xa0
[  345.638251]  <IRQ>  [<c1003629>] ? do_IRQ+0x39/0xa0
[  345.638293]  [<c1484631>] ? common_interrupt+0x31/0x36
[  345.638347]  [<d08c2c52>] ? br_flood_forward+0x12/0x20 [bridge]
[  345.638364]  [<d08c2d40>] ? br_dev_queue_push_xmit+0x60/0x60 [bridge]
[  345.638381]  [<d08c3b2b>] ? br_handle_frame_finish+0x25b/0x280 [bridge]
[  345.638399]  [<d08c3ce3>] ? br_handle_frame+0x193/0x290 [bridge]
[  345.638416]  [<d08c3b50>] ? br_handle_frame_finish+0x280/0x280 [bridge]
[  345.638431]  [<c13b3c87>] ? __netif_receive_skb_core+0x1d7/0x710
[  345.638442]  [<c13b3b19>] ? __netif_receive_skb_core+0x69/0x710
[  345.638454]  [<c13b41e1>] ? __netif_receive_skb+0x21/0x70
[  345.638464]  [<c13b42b5>] ? process_backlog+0x85/0x130
[  345.638476]  [<c13b4bbb>] ? net_rx_action+0xfb/0x1d0
[  345.638497]  [<c1032768>] ? __do_softirq+0xa8/0x1f0
[  345.638527]  [<c147daad>] ? _raw_spin_unlock+0x1d/0x20
[  345.638538]  [<c10038c0>] ? handle_irq+0x20/0xd0
[  345.638550]  [<c1032f27>] ? irq_exit+0x97/0xa0
[  345.638560]  [<c1003632>] ? do_IRQ+0x42/0xa0
[  345.638580]  [<c104d003>] ? hrtimer_start+0x23/0x30
[  345.638580]  [<c1484631>] ? common_interrupt+0x31/0x36
[  345.638580]  [<c1008703>] ? default_idle+0x33/0xc0
[  345.638580]  [<c10086ac>] ? cpu_idle+0x4c/0x70
[  345.638580]  [<c14787e0>] ? rest_init+0xa0/0xb0
[  345.638580]  [<c1478740>] ? reciprocal_value+0x50/0x50
[  345.638580]  [<c16b5bcf>] ? start_kernel+0x28f/0x320
[  345.638580]  [<c16b54e0>] ? repair_env_string+0x60/0x60
[  345.638580]  [<c16b5269>] ? i386_start_kernel+0x39/0xa0
[  345.638580] ---[ end trace a244264b69b8a7ae ]---
[  345.638580] Mapped at:
[  345.638580]  [<c1221c65>] debug_dma_map_page+0x65/0x110
[  345.638580]  [<cf8355a9>] sis900_start_xmit+0x129/0x210 [sis900]
[  345.638580]  [<c13b2527>] dev_hard_start_xmit+0x1b7/0x530
[  345.638580]  [<c13cc32e>] sch_direct_xmit+0x8e/0x280
[  345.638580]  [<c13b4e39>] dev_queue_xmit+0x1a9/0x5b0

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
This fixes the following lockdep splat:

[   66.460000] =================================
[   66.460000] [ INFO: inconsistent lock state ]
[   66.460000] 3.9.0-rc5-00161-ga48dd49 linux4sam#4 Not tainted
[   66.460000] ---------------------------------
[   66.460000] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   66.460000] swapper/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
[   66.460000]  (timer_lock){+.?...}, at: [<d0006cde>] rs_poll+0x12/0xdc
[   66.460000] {SOFTIRQ-ON-W} state was registered at:
[   66.460000]   [<d00421f0>] lock_acquire+0xec/0x13c
[   66.460000]   [<d01ea036>] _raw_spin_lock+0x3a/0x84
[   66.460000]   [<d0006c8c>] rs_open+0x18/0x58
[   66.460000]   [<d0139ea2>] tty_open+0x262/0x3cc
[   66.460000]   [<d00942e0>] chrdev_open+0x8c/0xe0
[   66.460000]   [<d00907b2>] do_dentry_open$isra$16+0x10e/0x190
[   66.460000]   [<d0091141>] finish_open+0x39/0x48
[   66.460000]   [<d009a0b4>] do_last$isra$34+0x6c4/0x824
[   66.460000]   [<d009a27a>] path_openat+0x66/0x310
[   66.460000]   [<d009a53a>] do_filp_open+0x16/0x44
[   66.460000]   [<d0091445>] do_sys_open+0xd5/0x13c
[   66.460000]   [<d00914be>] sys_open+0x12/0x18
[   66.460000]   [<d0413ffc>] kernel_init_freeable+0xe4/0x12c
[   66.460000]   [<d01e2a9c>] kernel_init+0xc/0x9c
[   66.460000]   [<d00044fc>] ret_from_kernel_thread+0x8/0xc
[   66.460000] irq event stamp: 132542
[   66.460000] hardirqs last  enabled at (132542): [<d01ea2ec>] _raw_spin_unlock_irq+0x30/0x44
[   66.460000] hardirqs last disabled at (132541): [<d01ea11e>] _raw_spin_lock_irq+0xe/0x8c
[   66.460000] softirqs last  enabled at (132234): [<d0017d32>] __do_softirq+0x216/0x2a4
[   66.460000] softirqs last disabled at (132539): [<d0018024>] irq_exit+0x38/0x40
[   66.460000]
[   66.460000] other info that might help us debug this:
[   66.460000]  Possible unsafe locking scenario:
[   66.460000]
[   66.460000]        CPU0
[   66.460000]        ----
[   66.460000]   lock(timer_lock);
[   66.460000]   <Interrupt>
[   66.460000]     lock(timer_lock);
[   66.460000]
[   66.460000]  *** DEADLOCK ***
[   66.460000]
[   66.460000] 1 lock held by swapper/1:
[   66.460000]  #0:  (((&serial_timer))){+.-...}, at: [<d001c65c>] call_timer_fn+0x0/0x1f0
[   66.460000]
Stack: d7c2fac0 00000018 00000004 00000001 d7c2faa0 00000004 00000006 d7c2fa90
       9003e87c d7c2fae0 d7c30000 d025a87c 00000001 0000000f 00000000 d7c2fac0
       9004005d d7c2fb10 d7c30000 d7c30338 00000001 00000001 00000000 d7c30338
[   66.460000] Call Trace:
[   66.460000]  [<d01e4f93>] print_usage_bug$part$26+0x1c3/0x1c8
[   66.460000]  [<d003e87c>] mark_lock+0x2b4/0x440
[   66.460000]  [<d004005d>] __lock_acquire+0x54d/0x16c4
[   66.460000]  [<d00421f0>] lock_acquire+0xec/0x13c
[   66.460000]  [<d01ea036>] _raw_spin_lock+0x3a/0x84
[   66.460000]  [<d0006cde>] rs_poll+0x12/0xdc
[   66.460000]  [<d001c71a>] call_timer_fn+0xbe/0x1f0
[   66.460000]  [<d001cd90>] run_timer_softirq+0x198/0x1f4
[   66.460000]  [<d0017c30>] __do_softirq+0x114/0x2a4
[   66.460000]  [<d0018024>] irq_exit+0x38/0x40
[   66.460000]  [<d00046c0>] do_IRQ+0x44/0x48
[   66.460000]  [<d0005c58>] do_interrupt+0x4c/0x54
[   66.460000]  [<d0003c80>] common_exception_return+0x0/0x5c
[   66.460000]  [<d006682c>] free_pcppages_bulk+0x254/0x308
[   66.460000]

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
This can easily be triggered if a new CPU is added (via
ACPI hotplug mechanism) and from user-space you do:

   echo 1 > /sys/devices/system/cpu/cpu3/online

(or wait for UDEV to do it) on a newly appeared physical CPU.

The deadlock is that the "store_online" in drivers/base/cpu.c
takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
"cpu_up" eventually ends up calling "save_mc_for_early"
which also takes the cpu_hotplug_driver_lock() lock.

And here is that lockdep thinks of it:

 smpboot: Stack at about ffff880075c39f44
 smpboot: CPU3: has booted.
 microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

 =============================================
 [ INFO: possible recursive locking detected ]
 3.9.0upstream-10129-g167af0e linux4sam#1 Not tainted
 ---------------------------------------------
 sh/2487 is trying to acquire lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 but task is already holding lock:
  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(x86_cpu_hotplug_driver_mutex);
   lock(x86_cpu_hotplug_driver_mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/2487:
  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
  linux4sam#1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
  linux4sam#2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
  linux4sam#3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
  linux4sam#4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
  linux4sam#5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60

Suggested-and-Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: fenghua.yu@intel.com
Cc: xen-devel@lists.xensource.com
Cc: stable@vger.kernel.org # for v3.9
Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
Cleanup regex_lock and ftrace_lock locking points around
ftrace_ops hash update code.

The new rule is that regex_lock protects ops->*_hash
read-update-write code for each ftrace_ops. Usually,
hash update is done by following sequence.

1. allocate a new local hash and copy the original hash.
2. update the local hash.
3. move(actually, copy) back the local hash to ftrace_ops.
4. update ftrace entries if needed.
5. release the local hash.

This makes regex_lock protect linux4sam#1-linux4sam#4, and ftrace_lock
to protect linux4sam#3, linux4sam#4 and adding and removing ftrace_ops from the
ftrace_ops_list. The ftrace_lock protects linux4sam#3 as well because
the move functions update the entries too.

Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
… into node number

df2d5ae ("workqueue: map an unbound workqueues to multiple per-node
pool_workqueues") made unbound workqueues to map to multiple per-node
pool_workqueues and accordingly updated workqueue_contested() so that,
for unbound workqueues, it maps the specified @cpu to the NUMA node
number to obtain the matching pool_workqueue to query the congested
state.

Before this change, workqueue_congested() ignored @cpu for unbound
workqueues as there was only one pool_workqueue and some users
(fscache) called it with WORK_CPU_UNBOUND.  After the commit, this
causes the following oops as WORK_CPU_UNBOUND gets translated to
garbage by cpu_to_node().

  BUG: unable to handle kernel paging request at ffff8803598d98b8
  IP: [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa
  PGD 2421067 PUD 0
  Oops: 0000 [linux4sam#1] SMP
  CPU: 1 PID: 2689 Comm: cat Tainted: GF            3.9.0-fsdevel+ linux4sam#4
  task: ffff88003d801040 ti: ffff880025806000 task.ti: ffff880025806000
  RIP: 0010:[<ffffffff81043b7e>]  [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa
  RSP: 0018:ffff880025807ad8  EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff8800388a2400 RCX: 0000000000000003
  RDX: ffff880025807fd8 RSI: ffffffff81a31420 RDI: ffff88003d8016e0
  RBP: ffff880025807ae8 R08: ffff88003d801730 R09: ffffffffa00b4898
  R10: ffffffff81044217 R11: ffff88003d801040 R12: 0000000064206e97
  R13: ffff880036059d98 R14: ffff880038cc8080 R15: ffff880038cc82d0
  FS:  00007f21afd9c740(0000) GS:ffff88003d100000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffff8803598d98b8 CR3: 000000003df49000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffff8800388a2400 0000000000000002 ffff880025807b18 ffffffff810442ce
   ffffffff81044217 ffff880000000002 ffff8800371b4080 ffff88003d112ec0
   ffff880025807b38 ffffffffa00810b0 ffff880036059d88 ffff880036059be8
  Call Trace:
   [<ffffffff810442ce>] workqueue_congested+0xb7/0x12c
   [<ffffffffa00810b0>] fscache_enqueue_object+0xb2/0xe8 [fscache]
   [<ffffffffa007facd>] __fscache_acquire_cookie+0x3b9/0x56c [fscache]
   [<ffffffffa00ad8fe>] nfs_fscache_set_inode_cookie+0xee/0x132 [nfs]
   [<ffffffffa009e112>] do_open+0x9/0xd [nfs]
   [<ffffffff810e804a>] do_dentry_open+0x175/0x24b
   [<ffffffff810e8298>] finish_open+0x41/0x51

Fix it by using smp_processor_id() if @cpu is WORK_CPU_UNBOUND.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Howells <dhowells@redhat.com>
Tested-and-Acked-by: David Howells <dhowells@redhat.com>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
commit fb745e9 ("net/802/mrp: fix possible race condition when
calling mrp_pdu_queue()") introduced a lockdep splat.

[   19.735147] =================================
[   19.735235] [ INFO: inconsistent lock state ]
[   19.735324] 3.9.2-build-0063 linux4sam#4 Not tainted
[   19.735412] ---------------------------------
[   19.735500] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[   19.735592] rmmod/1840 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   19.735682]  (&(&app->lock)->rlock#2){+.?...}, at: [<f862bb5b>]
mrp_uninit_applicant+0x69/0xba [mrp]

app->lock is normally taken under softirq context, so disable BH to
avoid the splat.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ward <david.ward@ll.mit.edu>
Cc: Cong Wang <amwang@redhat.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
It is required to enable respective clock-domain before
enabling any clock/module inside that clock-domain.

During common-clock migration, .clkdm_name field got missed
for "clkdiv32k_ick" clock, which leaves "clk_24mhz_clkdm"
unused; so it will be disabled even if childs of this clock-domain
is enabled, which keeps child modules in idle mode.

This fixes the kernel crash observed on AM335xEVM-SK platform,
where clkdiv32_ick clock is being used as a gpio debounce clock
and since clkdiv32k_ick is in idle mode it leads to below crash -

Crash Log:
==========
[    2.598347] Unhandled fault: external abort on non-linefetch (0x1028) at
0xfa1ac150
[    2.606434] Internal error: : 1028 [linux4sam#1] SMP ARM
[    2.611207] Modules linked in:
[    2.614449] CPU: 0    Not tainted  (3.8.4-01382-g1f449cd-dirty linux4sam#4)
[    2.620973] PC is at _set_gpio_debounce+0x60/0x104
[    2.626025] LR is at clk_enable+0x30/0x3c

Cc: stable@vger.kernel.org # v3.9
Signed-off-by: Vaibhav Hiremath <hvaibhav@ti.com>
Cc: Rajendra Nayak <rnayak@ti.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:

linux4sam#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
linux4sam#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
linux4sam#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
linux4sam#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
linux4sam#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
linux4sam#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
linux4sam#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
linux4sam#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
linux4sam#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

Daniel found a similar problem mentioned in
 http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

And indeed this is the root cause : skb->cb[] contains data fooling IP
stack.

We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.

A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.

Many thanks to Daniel for providing reports, patches and testing !

Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
Commit be283b2
(    Btrfs: use helper to cleanup tree roots) introduced the following bug,

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
[...]
 Pid: 2463, comm: btrfs-cache-1 Tainted: G           O 3.9.0+ linux4sam#4 innotek GmbH VirtualBox/VirtualBox
 RIP: 0010:[<ffffffffa039368c>]  [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]
 Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730)
[...]
 Call Trace:
  [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs]
  [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs]
  [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs]
  [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs]
  [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs]
  [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
  [<ffffffff81068d3d>] kthread+0x8d/0x95
  [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
  [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0
  [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43
RIP  [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs]

We've free'ed commit_root before actually getting to free block groups where
caching thread needs valid extent_root->commit_root.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp->lock critical section.  Please see below for
the long version of this story.

On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ linux4sam#39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390]  (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894]  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> linux4sam#4 (&ctx->lock){-.-...}:
> [12572.960032]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633]        [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969]        [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326]        [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652]        [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998]        [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> linux4sam#3 (&rq->lock){-.-.-.}:
> [12573.034628]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248]        [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579]        [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880]        [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202]        [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508]        [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852]        [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233]        [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> linux4sam#2 (&p->pi_lock){-.-.-.}:
> [12573.125675]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964]        [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065]        [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151]        [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195]        [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215]        [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860]        [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> linux4sam#1 (&rsp->gp_wq){..-.-.}:
> [12573.236329]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178]        [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564]        [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0

Notice the above call chain.

rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advance, which does a wakeup.

You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).

> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293]        [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081]        [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377]        [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648]        [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942]        [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211]        [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514]        [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816]        [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

Notice the above trace.

perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:

rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();

Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.

The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.

	CPU0		CPU1		CPU2
	----		----		----

				     rcu_nocb_kthread()
    lock(rq->lock);
		    lock(ctx->lock);
				     lock(rnp->lock);

				     wake_up();

				     lock(rq->lock);

		    rcu_read_unlock();

		    rcu_read_unlock_special();

		    lock(rnp->lock);
    lock(ctx->lock);

**** DEADLOCK ****

> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
>   rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471]  Possible unsafe locking scenario:
>
> [12573.438499]        CPU0                    CPU1
> [12573.445599]        ----                    ----
> [12573.452691]   lock(&ctx->lock);
> [12573.459799]                                lock(&rq->lock);
> [12573.467010]                                lock(&ctx->lock);
> [12573.474192]   lock(rcu_node_0);
> [12573.481262]
>  *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990]  #0:  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ linux4sam#39
> [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011]  [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284]  [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637]  [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982]  [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344]  [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030]  [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671]  [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992]  [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330]  [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662]  [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964]  [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276]  [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622]  [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981]  [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358]  [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753]  [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135]  [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599]  [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996]  [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

This commit delays the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups in critical sections.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
alexandermorozov pushed a commit to goodwin-europe/linux-at91 that referenced this pull request Jul 11, 2014
In Steven Rostedt's words:

> I've been debugging the last couple of days why my tests have been
> locking up. One of my tracing tests, runs all available tracers. The
> lockup always happened with the mmiotrace, which is used to trace
> interactions between priority drivers and the kernel. But to do this
> easily, when the tracer gets registered, it disables all but the boot
> CPUs. The lockup always happened after it got done disabling the CPUs.
>
> Then I decided to try this:
>
> while :; do
> 	for i in 1 2 3; do
> 		echo 0 > /sys/devices/system/cpu/cpu$i/online
> 	done
> 	for i in 1 2 3; do
> 		echo 1 > /sys/devices/system/cpu/cpu$i/online
> 	done
> done
>
> Well, sure enough, that locked up too, with the same users. Doing a
> sysrq-w (showing all blocked tasks):
>
> [ 2991.344562]   task                        PC stack   pid father
> [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
> [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
> [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
> [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
> [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
> [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
> [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
> [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
> [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
> [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
> [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
> [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
> [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
> [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
> [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
> [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
> [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
> [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
> [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
> [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
> [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
> [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
> [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
> [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
> [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
> [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
> [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
> [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
> [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
> [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
> [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
> [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
> [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
> [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
>
> As well as held locks:
>
> [ 3034.728033] Showing all locks held in the system:
> [ 3034.728033] 1 lock held by rcu_preempt/10:
> [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
> [ 3034.728033] 4 locks held by kworker/0:1/47:
> [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  linux4sam#1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  linux4sam#2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
> [ 3034.728033]  linux4sam#3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 3034.728033] 1 lock held by mingetty/2563:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2565:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2569:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2572:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2575:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 7 locks held by bash/2618:
> [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
> [ 3034.728033]  linux4sam#1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
> [ 3034.728033]  linux4sam#2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
> [ 3034.728033]  linux4sam#3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
> [ 3034.728033]  linux4sam#4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
> [ 3034.728033]  linux4sam#5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
> [ 3034.728033]  linux4sam#6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 3034.728033] 1 lock held by bash/2980:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
>
> Things looked a little weird. Also, this is a deadlock that lockdep did
> not catch. But what we have here does not look like a circular lock
> issue:
>
> Bash is blocked in rcu_cpu_notify():
>
> 1961		/* Exclude any attempts to start a new grace period. */
> 1962		mutex_lock(&rsp->onoff_mutex);
>
>
> kworker is blocked in get_online_cpus(), which makes sense as we are
> currently taking down a CPU.
>
> But rcu_preempt is not blocked on anything. It is simply sleeping in
> rcu_gp_kthread (really rcu_gp_init) here:
>
> 1453	#ifdef CONFIG_PROVE_RCU_DELAY
> 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
> 1455			    system_state == SYSTEM_RUNNING)
> 1456				schedule_timeout_uninterruptible(2);
> 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
>
> And it does this while holding the onoff_mutex that bash is waiting for.
>
> Doing a function trace, it showed me where it happened:
>
> [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
> [...]
> [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
>
> The watchdog ran, and then:
>
> [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
>
> Not sure what modprobe was doing, but shortly after that:
>
> [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
>
> Where the migration thread took down the CPU:
>
> [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
>
> which finally did:
>
> [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
> [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
> [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
> [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
> [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
>
>
> CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
> doing a schedule_timeout_uninterruptible() and it registered it's
> timeout to the timer base for CPU 3. You would think that it would get
> migrated right? The issue here is that the timer migration happens at
> the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
> CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
> held by the thread that just put itself into a uninterruptible sleep,
> that wont wake up until the CPU_DEAD notifier of the timer
> infrastructure is called, which wont happen until the rcu notifier
> finishes. Here's our deadlock!

This commit breaks this deadlock cycle by substituting a shorter udelay()
for the previous schedule_timeout_uninterruptible(), while at the same
time increasing the probability of the delay.  This maintains the intensity
of the testing.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant