Age | Commit message | Author | Files | Lines
2012-10-11Merge branch 'next' into for-linusDmitry Torokhov29-813/+1231
Prepare second set of updates for 3.7 merge window (Wacom driver update and patches extending number of input minors).
2012-10-11md: refine reporting of resync/reshape delays.NeilBrown1-7/+18
If 'resync_max' is set to 0 (as is often done when starting a reshape, so that mdadm can remain in control during a sensitive period), and if the reshape request is initially delayed because another array using the same devices is resyncing or reshaping etc, then user-space cannot easily tell when the delay changes from being due to a conflicting reshape, to being due to resync_max = 0. So introduce a new state: (curr_resync == 3) to reflect this, make sure it is visible both via /proc/mdstat and via the "sync_completed" sysfs attribute, and ensure that the event transition from one delay state to the other is properly notified. Signed-off-by: NeilBrown <[email protected]>
2012-10-11md/raid5: be careful not to resize_stripes too big.NeilBrown1-1/+2
When a RAID5 is reshaping, conf->raid_disks is increased before mddev->delta_disks becomes zero. This can result in check_reshape calling resize_stripes with a number that is too large. This particularly happens when md_check_recovery calls ->check_reshape(). If we use ->previous_raid_disks, we don't risk this. Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: make sure manual changes to recovery checkpoint are saved.NeilBrown1-0/+2
If you make an array bigger but suppress resync of the new region with mdadm --grow /dev/mdX --size=max --assume-clean and then stop the array before anything is written to it, the effect of the "--assume-clean" is lost and the array will resync the new space when restarted. So ensure that we update the metadata in that case. Reported-by: Sebastian Riemer <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md/raid10: use correct limit variableDan Carpenter1-1/+1
Clang complains that we are assigning a variable to itself. This should be using bad_sectors like the similar earlier check does. Bug has been present since 3.1-rc1. It is minor but could conceivably cause corruption or other bad behaviour. Cc: [email protected] Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: writing to sync_action should clear the read-auto state.NeilBrown1-0/+7
In some cases arrays are started in 'read-auto' state, wherein nothing gets written to any device until the array is written to. The purpose of this is to make accidental auto-assembly of the wrong arrays less of a risk, and to allow arrays to be started to read suspend-to-disk images without actually changing anything (as might happen if the array were dirty and a resync seemed necessary). Explicitly writing the 'sync_action' for a read-auto array currently doesn't clear the read-auto state, so the sync action doesn't happen, which can be confusing. So allow any successful write to sync_action to clear any read-auto state. Reported-by: Alexander Kühn <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid racesJianpeng Ma5-8/+9
Now that multiple threads can handle stripes, it is safer to use an atomic64_t for resync_mismatches, to avoid update races. Signed-off-by: Jianpeng Ma <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-10e1000e: Change wthresh to 1 to avoid possible Tx stallsHiroaki SHIMODA2-4/+4
This patch originated from Hiroaki SHIMODA but has been modified by Intel with some minor cleanups and additional commit log text. Denys Fedoryshchenko and others reported Tx stalls on e1000e with BQL enabled. The issue was root-caused to hardware delays. They were introduced because some e1000e hardware with transmit writeback bursting enabled waits until the driver does an explicit flush OR there are WTHRESH descriptors to write back. Sometimes the delays in question were on the order of seconds, causing visible lag for ssh sessions and unacceptable tx completion latency, especially for BQL enabled kernels. To avoid possible Tx stalls, change WTHRESH back to 1. The current plan is to investigate a method for re-enabling WTHRESH while not harming BQL, but those patches will come later for net-next if they work. Please enqueue for stable since v3.3 as this bug was introduced in commit 3f0cfa3bc11e7f00c9994e0f469cbc0e7da7b00c Author: Tom Herbert <[email protected]> Date: Mon Nov 28 16:33:16 2011 +0000 e1000e: Support for byte queue limits Changes to e1000e to use byte queue limits. Reported-by: Denys Fedoryshchenko <[email protected]> Tested-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Hiroaki SHIMODA <[email protected]> CC: [email protected] CC: [email protected] Signed-off-by: Jesse Brandeburg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10Merge branch 'uapi-for-3.7' of git://gitorious.org/linux-can/linux-canDavid S. Miller7-5/+5
Marc Kleine-Budde says: ==================== this pull request for net, i.e. the v3.7 release cycle, contains the patch by David Howells to move the UAPI related headers for the CAN subsystem. ==================== Signed-off-by: David S. Miller <[email protected]>
2012-10-10ipv4: fix route mark sparse warningstephen hemminger1-1/+1
Sparse complains about RTA_MARK, which should be in host order according to the include file and its usage in iproute. net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types) net/ipv4/route.c:2223:46: expected restricted __be32 [usertype] value net/ipv4/route.c:2223:46: got unsigned int [unsigned] [usertype] flowic_mark Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10xen: netback: handle compound page fragments on transmit.Ian Campbell1-5/+35
An SKB paged fragment can consist of a compound page with order > 0. However the netchannel protocol deals only in PAGE_SIZE frames. Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by iterating over the frames which make up the page. Signed-off-by: Ian Campbell <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Sander Eikelenboom <[email protected]> Tested-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: David S. Miller <[email protected]>
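The frame-by-frame iteration described above can be sketched in user space. This is only an illustration, not the netback code: FRAME_SIZE and count_frames() are hypothetical names, and a 4096-byte frame is assumed.

```c
#include <stddef.h>

/* Assumed frame size for this sketch; stands in for PAGE_SIZE. */
#define FRAME_SIZE ((size_t)4096)

/*
 * Count how many FRAME_SIZE frames a fragment touches, given its byte
 * offset into the (possibly compound) page and its length in bytes.
 * Hypothetical helper: it walks the fragment one frame at a time.
 */
static unsigned int count_frames(size_t offset, size_t size)
{
    unsigned int frames = 0;

    offset %= FRAME_SIZE;               /* offset within the first frame */
    while (size > 0) {
        size_t bytes = FRAME_SIZE - offset;

        if (bytes > size)
            bytes = size;               /* last, partial chunk */
        frames++;
        size -= bytes;
        offset = 0;                     /* later frames start at offset 0 */
    }
    return frames;
}
```

A fragment of exactly one frame that starts 100 bytes into a page straddles two frames, which is the case a single-page assumption would miss.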
2012-10-10bridge: Pull ip header into skb->data before looking into ip header.Sarveshwar Bandi1-0/+3
If lower layer driver leaves the ip header in the skb fragment, it needs to be first pulled into skb->data before inspecting ip header length or ip version number. Signed-off-by: Sarveshwar Bandi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10isdn: fix a wrapping bug in isdn_ppp_ioctl()Dan Carpenter1-1/+1
"protos" is an array of unsigned longs and "i" is the number of bits in an unsigned long so we need to use 1UL as well to prevent the shift from wrapping around. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
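A minimal sketch of the shift-width issue, using hypothetical helper names rather than the isdn_ppp code: with a plain int constant, "1 << i" is only defined for i below the width of int, so the constant must be unsigned long when the bit index ranges over all bits of an unsigned long.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in one unsigned long on this platform. */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit n in a bitmap stored as an array of unsigned long. */
static void bitmap_set(unsigned long *map, size_t n)
{
    /* 1UL keeps the shift in unsigned long width; plain 1 would be int. */
    map[n / BITS_PER_ULONG] |= 1UL << (n % BITS_PER_ULONG);
}

/* Return non-zero if bit n is set. */
static int bitmap_test(const unsigned long *map, size_t n)
{
    return (map[n / BITS_PER_ULONG] >> (n % BITS_PER_ULONG)) & 1UL;
}
```

On a 64-bit platform the top bit of each word is only reachable with the 1UL form.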
2012-10-11md/raid5: make sure to_read and to_write never go negative.NeilBrown1-4/+1
to_read and to_write are part of the result of analysing a stripe before handling it. Their use is to avoid some loops and tests if the values are known to be zero. Thus it is not a problem if they are a little bit larger than they should be. So decrementing them in handle_failed_stripe serves little value, and due to races it could cause some loops to be skipped incorrectly. So remove those decrements. Reported-by: "Jianpeng Ma" <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.Alexander Lyakas1-3/+16
Signed-off-by: Alex Lyakas <[email protected]> Suggested-by: Yair Hershko <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md/raid5: protect debug message against NULL derefernce.NeilBrown1-1/+1
The pr_debug in add_stripe_bio could race with something changing *bip, so it is best to hold the lock until after the pr_debug. Reported-by: "Jianpeng Ma" <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md/raid5: add some missing locking in handle_failed_stripe.NeilBrown1-0/+2
We really should hold the stripe_lock while accessing 'toread' else we could race with add_stripe_bio and corrupt a list. Reported-by: "Jianpeng Ma" <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11MD: raid5 avoid unnecessary zero page for trimShaohua Li1-18/+17
We want to avoid zeroing the discarded dev page, because the payload is useless for discard. But if we don't zero it, another read/write could hit that page in the cache and get inconsistent data. To avoid zeroing the page, we don't set the R5_UPTODATE flag after construction is done. This way, the discard write request is still issued and finished, but a read will not hit the page. If the stripe gets accessed soon afterwards, we need to reread the stripe, but since the chance is low, the reread isn't a big deal. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11MD: raid5 trim supportShaohua Li2-3/+166
Discard for raid4/5/6 has limitations. If the discard request size is small, we discard on one disk, but we still need to calculate parity and write the parity disk. To calculate parity correctly, zero_after_discard must be guaranteed. Even if it is true, we discard on one disk but write to other disks, which makes the parity disks wear out fast. This doesn't make sense. So an efficient discard for raid4/5/6 should discard all data disks and parity disks, which requires the write pattern to be (A, A+chunk_size, A+chunk_size*2...). If A's size is smaller than chunk_size, such a pattern is almost impossible in practice. So in this patch, I only handle the case where A's size equals chunk_size. That is, the discard request should be aligned to the stripe size and its size should be a multiple of the stripe size. Since we can only handle requests with specific alignment and size (or the part of a request that fits whole stripes), we can't guarantee zero_after_discard even if zero_after_discard is true in the low level drives. The block layer doesn't send down correctly aligned requests even when the correct discard alignment is set, so I must filter them out. For raid4/5/6 parity calculation, if data is 0, parity is 0. So if zero_after_discard is true for all disks, data is consistent after discard. Otherwise, data might be lost. Let's consider a scenario: discard a stripe, write data to one disk and write the parity disk. The stripe could still be inconsistent at that point, depending on whether data from the other data disks or from the parity disk is used to calculate the new parity. If a disk then breaks, we can't restore it. So in this patch, we only enable discard support if all disks have zero_after_discard. If discard fails on one disk, we face a similar inconsistency issue. The patch makes discard follow the same path as a normal write request. If discard fails, a resync will be scheduled to make the data consistent. The extra writes are not ideal, but data consistency is important. 
If a subsequent read/write request hits the raid5 cache of a discarded stripe, the discarded dev page should be zero filled, so the data is consistent. This patch will always zero the dev page for a discarded stripe. This isn't optimal because a discard request doesn't need such a payload. The next patch will avoid it. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
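Two points from the message above lend themselves to a small sketch: XOR parity over all-zero data is zero, and a discard is only handled when it covers whole stripes. The helpers below are hypothetical and are not the raid5 code; stripe_sectors is assumed to be the data width of one full stripe.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR parity over one byte from each of n data disks. */
static uint8_t xor_parity(const uint8_t *data, size_t n)
{
    uint8_t p = 0;

    for (size_t i = 0; i < n; i++)
        p ^= data[i];
    return p;
}

/*
 * Return non-zero if [start, start + len) begins on a stripe boundary
 * and spans a whole number of stripes. Units are sectors.
 */
static int discard_covers_full_stripes(uint64_t start, uint64_t len,
                                       uint64_t stripe_sectors)
{
    if (stripe_sectors == 0 || len == 0)
        return 0;
    return start % stripe_sectors == 0 && len % stripe_sectors == 0;
}
```

If every disk returns zeros after discard, xor_parity() over the data is zero and so matches the zeroed parity block, which is why the stripe stays consistent.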
2012-10-11md/bitmap:Don't use IS_ERR to judge alloc_page().Jianpeng Ma1-6/+2
Signed-off-by: Jianpeng Ma <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md/raid1: Don't release reference to device while handling read error.NeilBrown1-4/+5
When we get a read error, we arrange for raid1d to handle it. Currently we release the reference on the device. This can result in conf->mirrors[read_disk].rdev being NULL in fix_read_error, if the device happens to get removed before the read error is handled. So instead keep the reference until the read error has been fully handled. Reported-by: hank <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11raid: replace list_for_each_continue_rcu with new interfaceMichael Wang1-6/+3
This patch replaces list_for_each_continue_rcu() with list_for_each_entry_continue_rcu() to save a few lines of code and allow removing list_for_each_continue_rcu(). Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Michael Wang <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11add further __init annotations to crypto/xor.cJan Beulich1-2/+2
In particular, allow do_xor_speed() to be discarded post-init. Signed-off-by: Jan Beulich <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11DM RAID: Fix for "sync" directive ineffectivenessJonathan Brassow1-0/+13
There are two table arguments that can be given to a DM RAID target that control whether the array is forced to (re)synchronize or skip initialization: "sync" and "nosync". When "sync" is given, we set mddev->recovery_cp to 0 in order to cause the device to resynchronize. This is insufficient if there is a bitmap in use, because the array will simply look at the bitmap and see that there is no recovery necessary. The fix is to skip over the loading of the superblocks when "sync" is given, causing new superblocks to be written that will force the array to go through initialization (i.e. synchronization). Signed-off-by: Jonathan Brassow <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-10vxlan: fix oops when give unknown ifindexstephen hemminger1-6/+10
If vxlan is created and the ifindex is passed, there are two cases which are incorrectly handled by the existing code. The ifindex could be zero (i.e. no device) or there could be no device with that ifindex. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10vxlan: fix receive checksum handlingstephen hemminger1-2/+1
Vxlan was trying to use postpull_rcsum to allow receive checksum offload to work on drivers using CHECKSUM_COMPLETE method. But this doesn't work correctly. Just force full receive checksum on received packet. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10vxlan: add additional headroomstephen hemminger1-0/+1
Tell upper layer protocols to allocate skb with additional headroom. This avoids allocation and copy in local packet sends. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10vxlan: allow configuring port rangestephen hemminger2-5/+63
VXLAN derives the source UDP port from the flow so that the receiver can load balance based on the outer header flow. This patch restricts the port range to the normal UDP local ports, and allows overriding via configuration. It also uses jhash of the Ethernet header when looking at flows without a known L3 header. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
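One common way to map a 32-bit flow hash into an inclusive port range is a multiply-and-shift, sketched below. This is an assumption about how such a mapping can be done, not a quote of the patch; pick_src_port() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit flow hash onto [port_min, port_max] (inclusive).
 * The same hash always yields the same port, so one flow keeps one
 * outer source port while different flows spread over the range.
 */
static uint16_t pick_src_port(uint32_t hash, uint16_t port_min,
                              uint16_t port_max)
{
    uint32_t range;

    assert(port_min <= port_max);
    range = (uint32_t)(port_max - port_min) + 1;
    /* (hash * range) / 2^32 lies in [0, range - 1] without a modulo. */
    return (uint16_t)((((uint64_t)hash * range) >> 32) + port_min);
}
```

With an example range of 32768 to 61000, hash 0 maps to the lowest port and hash 0xffffffff maps to the highest.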
2012-10-10vxlan: associate with tunnel socket on transmitstephen hemminger1-0/+19
When tunnelling a skb, associate it with the tunnel socket. This allows parameters set on tunnel socket (like multicast loop flag), to be picked up by ip_output. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10vxlan: use ip_route_outputstephen hemminger1-4/+8
Select source address for VXLAN packet based on route destination and don't lie to route code. VXLAN is not GRE. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10vxlan: fix byte order in hash functionstephen hemminger1-2/+2
The shift was in the wrong direction, causing packets to hash based on other parts of the ethernet header, not the address. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10vxlan: minor output refactoringstephen hemminger1-11/+20
Move code to find destination to a small function. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10of/mdio: Staticise !CONFIG_OF stubsMark Brown1-10/+10
The !CONFIG_OF stubs aren't static so if multiple files include the header with this configuration then the linker will see multiple definitions of the stubs. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Mark Brown <[email protected]> Acked-by: Thomas Petazzoni <[email protected]> Acked-by: Srinivas Kandagatla <[email protected]> Signed-off-by: David S. Miller <[email protected]>
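The header-stub pattern behind this fix can be sketched as follows. All names here (DEMO_CONFIG_OF, struct demo_bus, demo_of_register) are hypothetical stand-ins, not the of_mdio.h declarations.

```c
#include <errno.h>

struct demo_bus;    /* hypothetical opaque type */

#ifdef DEMO_CONFIG_OF
/* Real implementation lives in exactly one .c file. */
int demo_of_register(struct demo_bus *bus);
#else
/*
 * Stub for builds without the feature. "static inline" gives every
 * including file its own private copy, so the linker never sees two
 * external definitions of the same symbol.
 */
static inline int demo_of_register(struct demo_bus *bus)
{
    (void)bus;
    return -ENOSYS;
}
#endif
```

Without "static", two source files including this header in the stub configuration would each emit a global demo_of_register and fail to link.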
2012-10-11DM RAID: Fix comparison of index and quantity for "rebuild" parameterJonathan Brassow1-1/+1
The "rebuild" parameter takes an index argument that starts counting from zero. The conditional used to validate the index was using '>' rather than '>=', leaving the door open for an index value that would be 1 too large. Reported-by: Neil Brown <[email protected]> Signed-off-by: Jonathan Brassow <[email protected]> Signed-off-by: NeilBrown <[email protected]>
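The off-by-one can be shown with a small sketch. The function names are hypothetical; the point is only that a zero-based index into N devices is valid for 0..N-1, so the rejection test must be 'index >= N'.

```c
/* Buggy form: rejects only index > ndevs, so index == ndevs slips through. */
static int rebuild_index_rejected_buggy(unsigned int index, unsigned int ndevs)
{
    return index > ndevs;
}

/* Fixed form: a zero-based index must be strictly less than ndevs. */
static int rebuild_index_rejected(unsigned int index, unsigned int ndevs)
{
    return index >= ndevs;
}
```

For a 4-device array, index 4 is accepted by the buggy check and rejected by the fixed one.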
2012-10-11DM RAID: Add rebuild capability for RAID10Jonathan Brassow2-1/+42
DM RAID: Add code to validate replacement slots for RAID10 arrays RAID10 can handle 'copies - 1' failures for each mirror group. This code ensures the user has provided a valid array - one whose devices specified for rebuild do not exceed the amount of redundancy available. Signed-off-by: Jonathan Brassow <[email protected]> Signed-off-by: NeilBrown <[email protected]>
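The 'copies - 1' rule can be sketched under a simplifying assumption: consecutive runs of 'copies' devices form one mirror group and the device count is a multiple of 'copies'. The real validation has to deal with more layouts than this; raid10_rebuild_request_ok() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * rebuild[i] is true if device i is marked for rebuild.
 * Returns true if every mirror group keeps at least one device that is
 * not being rebuilt, i.e. at most copies - 1 rebuilds per group.
 */
static bool raid10_rebuild_request_ok(const bool *rebuild, size_t ndevs,
                                      size_t copies)
{
    /* This sketch only handles evenly divisible layouts. */
    if (copies == 0 || ndevs % copies != 0)
        return false;

    for (size_t first = 0; first < ndevs; first += copies) {
        size_t rebuilding = 0;

        for (size_t i = 0; i < copies; i++)
            if (rebuild[first + i])
                rebuilding++;
        if (rebuilding > copies - 1)
            return false;   /* no intact copy left in this group */
    }
    return true;
}
```

With four devices and two copies, rebuilding devices 0 and 3 is acceptable because each group keeps one good copy, while rebuilding devices 0 and 1 is not.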
2012-10-11DM RAID: Move 'rebuild' checking code to its own functionJonathan Brassow1-25/+50
DM RAID: Move chunk of code to its own function The code that checks whether device replacements/rebuilds are possible given a specific RAID type is moved to its own function. It will further expand when the code to check RAID10 is added. A separate function makes it easier to read. Signed-off-by: Jonathan Brassow <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11MD RAID10: Prep for DM RAID10 device replacement capabilityJonathan Brassow2-3/+9
MD RAID10: Fix a couple potential kernel panics if RAID10 is used by dm-raid When device-mapper uses the RAID10 personality through dm-raid.c, there is no 'gendisk' structure in mddev and some sysfs information is also not populated. This patch avoids touching those non-existent structures. Signed-off-by: Jonathan Brassow <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: avoid taking the mutex on some ioctls.NeilBrown1-23/+62
Some ioctls don't need to take the mutex and doing so can cause a delay as it is held during super-block update. So move those ioctls out of the mutex and rely on rcu locking to ensure we don't access stale data. Signed-off-by: NeilBrown <[email protected]>
2012-10-11MD: change the parameter of md threadShaohua Li6-11/+17
Change the thread parameter, so the thread can carry extra info. Next patch will use it. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-10pktgen: replace scan_ip6() with in6_pton()Amerigo Wang1-97/+4
Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10pktgen: enable automatic IPv6 address settingAmerigo Wang1-13/+5
Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10pktgen: display IPv4 address in human-readable formatAmerigo Wang1-2/+2
It is weird to display an IPv4 address in %x format; what's more, the IPv6 address is already displayed in human-readable format. So, make it human-readable. Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10pktgen: set different default min_pkt_size for different protocolsAmerigo Wang1-7/+20
ETH_ZLEN is too small for IPv6, so this default value is not suitable. Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-10-10pktgen: fix crash when generating IPv6 packetsAmerigo Wang1-1/+1
For IPv6, sizeof(struct ipv6hdr) = 40, thus the following expression can become negative: datalen = pkt_dev->cur_pkt_size - 14 - sizeof(struct ipv6hdr) - sizeof(struct udphdr) - pkt_dev->pkt_overhead; And the check "if (datalen < sizeof(struct pktgen_hdr))" does not catch this, because "datalen" is promoted to unsigned for the comparison, which causes a crash later. This is a quick fix that checks whether "datalen" is negative. The following patch will increase the default value of 'min_pkt_size' for IPv6. This bug has probably existed for a long time, so Cc -stable too. Cc: <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
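The signed/unsigned comparison problem can be reproduced in a few lines. This is a sketch with hypothetical names, and the 16-byte header size is an assumed stand-in for sizeof(struct pktgen_hdr). As a worked case, a 60-byte packet (ETH_ZLEN) with no overhead gives 60 - 14 - 40 - 8 = -2.

```c
#include <stddef.h>

/* Assumed stand-in for sizeof(struct pktgen_hdr) in this sketch. */
static const size_t hdr_size = 16;

/*
 * Mirrors the original check. The explicit cast shows what the implicit
 * conversion does: a negative int becomes a huge size_t, so the
 * "too small" branch is not taken.
 */
static int clamp_datalen_unsigned_only(int datalen)
{
    if ((size_t)datalen < hdr_size)
        datalen = (int)hdr_size;
    return datalen;
}

/* Mirrors the quick fix: test for a negative value first. */
static int clamp_datalen_checked(int datalen)
{
    if (datalen < 0 || (size_t)datalen < hdr_size)
        datalen = (int)hdr_size;
    return datalen;
}
```

The first helper returns the negative length unchanged, while the second clamps it to the header size.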
2012-10-11md/raid10: submit IO from originating thread instead of md thread.NeilBrown1-3/+54
Queuing writes to the md thread means that all requests go through the one processor, which may not be able to keep up with very high request rates. So use the plugging infrastructure to submit all requests on unplug. If a 'schedule' is needed, we fall back on the old approach of handing the requests to the thread for it to handle. This is nearly identical to a recent patch which provided similar functionality to RAID1. Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: raid 10 supports TRIMShaohua Li1-4/+25
This makes md raid 10 support TRIM. If one disk supports discard and another does not, or one has discard_zero_data and another does not, the data on such disks could be inconsistent. But this should not matter, since discarded data is useless. This will add an extra copy in rebuild though. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: raid 1 supports TRIMShaohua Li1-2/+21
This makes md raid 1 support TRIM. If one disk supports discard and another does not, or one has discard_zero_data and another does not, the data on such disks could be inconsistent. But this should not matter, since discarded data is useless. This will add an extra copy in rebuild though. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: raid 0 supports TRIMShaohua Li1-1/+18
This makes md raid 0 support TRIM. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md: linear supports TRIMShaohua Li1-0/+16
This makes md linear support TRIM. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: NeilBrown <[email protected]>
2012-10-11md/linear: rcu_dereference outside read-lock sectionDenis Efremov1-2/+7
According to the comment in the linear_stop function, rcu_dereference in the linear_start and linear_stop functions occurs under reconfig_mutex. The patch expresses this agreement in code and prevents a lockdep complaint. Found by Linux Driver Verification project (linuxtesting.org) Signed-off-by: Denis Efremov <[email protected]> Signed-off-by: NeilBrown <[email protected]>