Here is a simple translet for finding duplicates.

function T-FindDuplicates{
    param ($inxml)
    begin{
        . PSlib:\xml\invoke-transform.ps1
        [xml]$xslt = @"
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes" />
    <xsl:key name="file-checksums" match="file" use="@Checksum" />
    <xsl:template match="file">
        <xsl:copy>
            <xsl:attribute name="Duplicate">
                <xsl:value-of select="count(key('file-checksums', @Checksum)) &gt; 1" />
            </xsl:attribute>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
"@
    }
    process{
        if ($_ -is [xml]){
            [xml](invoke-transform -inxml $_ -inxsl $xslt)
        }
    }
    end{
        if ($inxml -is [xml]){
           [xml](invoke-transform -inxml $inxml -inxsl $xslt)
        }
    }
}

As you can see in the code, it adds a Duplicate attribute with a value of true or false depending on whether another file has the same @Checksum. It can be used like this

PS> . .\Get-DirAsXml.ps1
PS> . .\T-AddChecksum.ps1
PS> . .\T-FindDuplicates.ps1
PS> Get-DirAsXml | T-AddChecksum | T-FindDuplicates

It might produce

<root Name="root" Root="True" Date="2008/11/03 01:35:14">
    <folder Name="test" Base="D:\powershell\blog\test" Parent="D:\powershell\blog">
        <folder Name="test2">
            <file Duplicate="true" Name="test.ps1" Checksum="C47313D06C6AADA288AF6D61E03EFD7FA7C52DD73AB097E9D556535D330798D3" />
            <file Duplicate="false" Name="test.txt" Checksum="CE217706948A41613FFA00C46B64D48A514D3D80758C8334EE00D6B0786AE47F" />
            <file Duplicate="true" Name="test.zip" Checksum="7F2CCA02F17FF0E9458C0777C659D6D00B80F1C9D2921AEC971AE9A82D296AA5" />
            <file Duplicate="true" Name="tmp.xml" Checksum="1351245F9834D0406C42DD5AF622FCA691A9A36F440A7C88F389927800292303" />
        </folder>
        <file Duplicate="true" Name="test.ps1" Checksum="C47313D06C6AADA288AF6D61E03EFD7FA7C52DD73AB097E9D556535D330798D3" />
        <file Duplicate="false" Name="test.txt" Checksum="0D7439F5894B4E8EFEC8FB409635D0D8EA7A450E902F6B30B335907B5867DF16" />
        <file Duplicate="true" Name="test.zip" Checksum="7F2CCA02F17FF0E9458C0777C659D6D00B80F1C9D2921AEC971AE9A82D296AA5" />
        <file Duplicate="true" Name="tmp.xml" Checksum="1351245F9834D0406C42DD5AF622FCA691A9A36F440A7C88F389927800292303" />
    </folder>
</root>

Here is the code T-FindDuplicates.zip (745 b)

All of the files in folder test2 are copies of the files in test, except for test.txt. As you can see, a bare @Duplicate indicator doesn't tell you which file a file is a duplicate of, so this translet is only useful if you have very few duplicate files. What you do when you find a duplicate is up to you and depends very much on the downstream application.
One thing you could do is put a list of the duplicate files into an attribute like this

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes" />
    <xsl:key name="file-checksums" match="file" use="@Checksum" />
    <xsl:template match="file">
        <xsl:copy>
            <xsl:if test="count(key('file-checksums', @Checksum)) &gt; 1">
                <xsl:attribute name="Duplicate">true</xsl:attribute>
                <xsl:attribute name="Duplicates">
                    <xsl:for-each select="key('file-checksums', @Checksum)">
                        <xsl:call-template name="get-path" />
                        <xsl:value-of select="'&#xA;'" />
                    </xsl:for-each>
                </xsl:attribute>
            </xsl:if>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template name="get-path">
        <xsl:for-each select="ancestor-or-self::*[not(@Root)]">
            <xsl:value-of select="@Parent" />
            <xsl:text>\</xsl:text>
            <xsl:value-of select="@Name" />
        </xsl:for-each>
    </xsl:template>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Which will produce

<root Name="root" Root="True" Date="2008/11/03 01:35:14">
    <folder Name="test" Base="D:\powershell\blog\test" Parent="D:\powershell\blog">
        <folder Name="test2">
            <file Duplicate="true" Duplicates="D:\powershell\blog\test\test2\test.ps1 D:\powershell\blog\test\test.ps1" Name="test.ps1" Checksum="C47313D06C6AADA288AF6D61E03EFD7FA7C52DD73AB097E9D556535D330798D3" />
            <file Name="test.txt" Checksum="CE217706948A41613FFA00C46B64D48A514D3D80758C8334EE00D6B0786AE47F" />
            <file Duplicate="true" Duplicates="D:\powershell\blog\test\test2\test.zip D:\powershell\blog\test\test.zip" Name="test.zip" Checksum="7F2CCA02F17FF0E9458C0777C659D6D00B80F1C9D2921AEC971AE9A82D296AA5" />
            <file Duplicate="true" Duplicates="D:\powershell\blog\test\test2\tmp.xml D:\powershell\blog\test\tmp.xml" Name="tmp.xml" Checksum="1351245F9834D0406C42DD5AF622FCA691A9A36F440A7C88F389927800292303" />
        </folder>
        <file Duplicate="true" Duplicates="D:\powershell\blog\test\test2\test.ps1 D:\powershell\blog\test\test.ps1" Name="test.ps1" Checksum="C47313D06C6AADA288AF6D61E03EFD7FA7C52DD73AB097E9D556535D330798D3" />
        <file Name="test.txt" Checksum="0D7439F5894B4E8EFEC8FB409635D0D8EA7A450E902F6B30B335907B5867DF16" />
        <file Duplicate="true" Duplicates="D:\powershell\blog\test\test2\test.zip D:\powershell\blog\test\test.zip" Name="test.zip" Checksum="7F2CCA02F17FF0E9458C0777C659D6D00B80F1C9D2921AEC971AE9A82D296AA5" />
        <file Duplicate="true" Duplicates="D:\powershell\blog\test\test2\tmp.xml D:\powershell\blog\test\tmp.xml" Name="tmp.xml" Checksum="1351245F9834D0406C42DD5AF622FCA691A9A36F440A7C88F389927800292303" />
    </folder>
</root>

Here is the code T-FindDuplicatesInfoTip.zip (927 b)
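If the downstream step is PowerShell itself, the duplicates could also be grouped after the fact rather than inside the stylesheet. The following is only a sketch under assumptions: the checksum values are shortened placeholders, and Get-NodePath, $doc, $groups and $report are names made up for this illustration.

```powershell
# Illustrative daxml fragment; the checksum values are shortened placeholders.
[xml]$doc = @'
<root Name="root" Root="True">
    <folder Name="test">
        <folder Name="test2">
            <file Name="test.ps1" Checksum="C473" />
            <file Name="test.txt" Checksum="CE21" />
        </folder>
        <file Name="test.ps1" Checksum="C473" />
        <file Name="test.txt" Checksum="0D74" />
    </folder>
</root>
'@

# Hypothetical helper: walk up the tree and join the Name attributes,
# stopping at the node that carries the Root attribute.
function Get-NodePath($node) {
    $names = @()
    while ($node -is [System.Xml.XmlElement] -and -not $node.HasAttribute('Root')) {
        $names = @($node.GetAttribute('Name')) + $names
        $node = $node.ParentNode
    }
    $names -join '\'
}

# Group every file node by checksum and keep groups with more than one member.
$groups = @($doc.SelectNodes('//file') |
    Group-Object { $_.GetAttribute('Checksum') } |
    Where-Object { $_.Count -gt 1 })

# One line per checksum, listing every file that shares it.
$report = foreach ($g in $groups) {
    $paths = foreach ($f in $g.Group) { Get-NodePath $f }
    "{0}: {1}" -f $g.Name, ($paths -join ', ')
}
$report
```

Grouping in the shell keeps the stylesheet simple, at the cost of the information no longer travelling inside the Xml.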
Another way to extend Xslt is to pass an extension object into the transform.

function T-AddChecksum{
    param ($inxml)
    begin{
        . pslib:\xml\invoke-transform.ps1
        [xml]$xslt = @'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:cjb="cjb" version="1.0">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[local-name()='file']">
        <xsl:variable name="fname">
            <xsl:call-template name="get-path" />
        </xsl:variable>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
            <xsl:attribute name="Checksum">
                <xsl:value-of select="cjb:GetChecksum(string($fname))" />
            </xsl:attribute>
        </xsl:copy>
    </xsl:template>
    <xsl:template name="get-path">
        <xsl:for-each select="ancestor-or-self::*[not(@Root)]">
            <xsl:value-of select="@Parent" />
            <xsl:text>\</xsl:text>
            <xsl:value-of select="@Name" />
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
'@
        $code = @'
public class Checksum
{
    public System.String GetChecksum(System.String file)
    {
        using (System.IO.FileStream stream = System.IO.File.OpenRead(file))
        {
            System.Security.Cryptography.SHA256Managed sha = new System.Security.Cryptography.SHA256Managed();
            byte[] checksum = sha.ComputeHash(stream);
            return System.BitConverter.ToString(checksum).Replace("-", System.String.Empty);
        }
    }
}
'@
        Add-Type -TypeDefinition $code
        $cs = new-object Checksum
    }
    process{
        if ($_ -is [xml]){
            [xml](invoke-transform -inxml $_ -inxsl $xslt -extensionobjects @{"cjb"=$cs})
        }
    }
    end{
        if ($inxml -is [xml]){
            [xml](invoke-transform -inxml $inxml -inxsl $xslt -extensionobjects @{"cjb"=$cs})
        }
    }
}

It can be used the same way as before

PS> . .\T-AddChecksum.ps1 #dot source translet
PS> . .\Get-DirAsXml.ps1 #dot source Get-DirAsXml
PS> Get-DirAsXml D:\powershell\test -props @{Length=""} | T-AddChecksum

Here is the code T-AddChecksumCObject.zip (1.08 kb)
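The invoke-transform.ps1 script is not shown here, so how it handles -extensionobjects is an assumption. In .NET the usual mechanism is XsltArgumentList.AddExtensionObject, which binds an object to a namespace URI. A minimal sketch of that mechanism on its own, where DemoExt, Shout and urn:demo are invented for the illustration:

```powershell
# Hypothetical extension class; any public method becomes callable from XPath.
Add-Type -TypeDefinition @'
public class DemoExt
{
    public string Shout(string s) { return s.ToUpperInvariant(); }
}
'@

[xml]$xsl = @'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:demo="urn:demo" exclude-result-prefixes="demo" version="1.0">
    <xsl:template match="/">
        <out><xsl:value-of select="demo:Shout(string(/in))" /></out>
    </xsl:template>
</xsl:stylesheet>
'@
[xml]$inputXml = '<in>hello</in>'

$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load((New-Object System.Xml.XmlNodeReader $xsl))

# The namespace URI given here must match the xmlns:demo declaration above.
$xargs = New-Object System.Xml.Xsl.XsltArgumentList
$xargs.AddExtensionObject('urn:demo', (New-Object DemoExt))

$sw = New-Object System.IO.StringWriter
$xslt.Transform((New-Object System.Xml.XmlNodeReader $inputXml), $xargs, $sw)
$result = [xml]$sw.ToString()
$result.OuterXml
```

The key in the translet's hash table ("cjb") plays the role of the namespace URI here, which is why the stylesheet declares xmlns:cjb="cjb".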
This translet will add a Ratio attribute to a daxml file. It is very useful for finding out where the space goes in a folder tree, and if your downstream application is SVG or WPF then Ratio can be used in a lot of places.

function T-AddRatio{
    param ($inxml)
    begin{
        . PSlib:\xml\invoke-transform.ps1
        [xml]$xslt = @"
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="folder|file">
        <xsl:copy>
            <xsl:if test="parent::*/@Length">
                <xsl:attribute name="Ratio">
                    <xsl:value-of select="@Length div parent::*/@Length" />
                </xsl:attribute>
            </xsl:if>
            <xsl:if test="@Base">
                <xsl:attribute name="Ratio">1</xsl:attribute>
            </xsl:if>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
"@
    }
    process{
        if ($_ -is [xml]){
            [xml](invoke-transform -inxml $_ -inxsl $xslt)
        }
    }
    end{
        if ($inxml -is [xml]){
           [xml](invoke-transform -inxml $inxml -inxsl $xslt)
        }
    }
}

As you can see in the code, it only adds a Ratio when the parent node has a @Length attribute; a folder with a @Base attribute is the top of the tree and gets a Ratio of 1. Get-DirAsXml does not add a Length property to folders, so first use the T-DirLength translet to add a @Length attribute. It can be used like this

PS> . .\Get-DirAsXml
PS> . .\T-DirLength
PS> . .\T-AddRatio
PS> Get-DirAsXml .\test -props @{Length=""}|T-DirLength|T-AddRatio

It might produce this

<root Name="root" Root="True" Date="2008/12/03 11:15:51">
    <folder Ratio="1" Length="19678" Name="test" Base="D:\powershell\blog\test" Parent="D:\powershell\blog">
        <folder Ratio="0.5" Length="9839" Name="test2">
            <file Ratio="0.3384490293729038" Name="test.ps1" Length="3330" />
            <file Ratio="0.08496798455127553" Name="test.txt" Length="836" />
            <file Ratio="0.1311108852525663" Name="test.zip" Length="1290" />
            <file Ratio="0.4454721008232544" Name="tmp.xml" Length="4383" />
        </folder>
        <file Ratio="0.1692245146864519" Name="test.ps1" Length="3330" />
        <file Ratio="0.04248399227563777" Name="test.txt" Length="836" />
        <file Ratio="0.06555544262628315" Name="test.zip" Length="1290" />
        <file Ratio="0.2227360504116272" Name="tmp.xml" Length="4383" />
    </folder>
</root>

Here is the code T-AddRatio.zip (671 b)
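As a rough illustration of using Ratio downstream, the sketch below picks out the nodes that take up at least a given share of their parent. The Xml is trimmed from the sample output above (the children of test2 are left out), and the $threshold value and the $big variable are assumptions for the example.

```powershell
# Trimmed copy of the sample output; only the attributes used here are kept.
[xml]$doc = @'
<root Name="root" Root="True">
    <folder Ratio="1" Length="19678" Name="test">
        <folder Ratio="0.5" Length="9839" Name="test2" />
        <file Ratio="0.1692245146864519" Name="test.ps1" Length="3330" />
        <file Ratio="0.04248399227563777" Name="test.txt" Length="836" />
        <file Ratio="0.06555544262628315" Name="test.zip" Length="1290" />
        <file Ratio="0.2227360504116272" Name="tmp.xml" Length="4383" />
    </folder>
</root>
'@

# Assumed cut-off: report anything that is 20% or more of its parent.
$threshold = 0.2

$big = @($doc.SelectNodes('//*[@Ratio]') |
    Where-Object { [double]$_.GetAttribute('Ratio') -ge $threshold } |
    ForEach-Object {
        # Emit a small object per node so the result can be sorted or formatted.
        New-Object PSObject -Property @{
            Name    = $_.GetAttribute('Name')
            Percent = [math]::Round([double]$_.GetAttribute('Ratio') * 100, 1)
        }
    })

$big | Sort-Object Percent -Descending | Format-Table Name, Percent -AutoSize
```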
Here are ways to write translets that will do more than just an Xslt transform. You might want to add an MD5 or SHA256 checksum to all the files within a tree. Xslt doesn't do this natively, so you need to extend it. Here is one way; it uses C# as the Xslt scripting language.

function T-AddChecksum{
    param ($inxml)
    begin{
        . .\invoke-transform.ps1
        [xml]$xslt = @'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:ucjb="urn:cjb" version="1.0">
    <msxsl:script language="C#" implements-prefix="ucjb">
        public string GetChecksum(String file)
        {
            using (System.IO.FileStream stream = System.IO.File.OpenRead(file))
            {
                System.Security.Cryptography.SHA256Managed sha = new System.Security.Cryptography.SHA256Managed();
                byte[] checksum = sha.ComputeHash(stream);
                return System.BitConverter.ToString(checksum).Replace("-", System.String.Empty);
            }
        }
    </msxsl:script>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[local-name()='file']">
        <xsl:variable name="fname">
            <xsl:call-template name="get-path" />
        </xsl:variable>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
            <xsl:attribute name="Checksum">
                <xsl:value-of select="ucjb:GetChecksum(string($fname))" />
            </xsl:attribute>
        </xsl:copy>
    </xsl:template>
    <xsl:template name="get-path">
        <xsl:for-each select="ancestor-or-self::*[not(@Root)]">
            <xsl:value-of select="@Parent" />
            <xsl:text>\</xsl:text>
            <xsl:value-of select="@Name" />
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
'@
    }
    process{
        if ($_ -is [xml]){
            [xml](invoke-transform -inxml $_ -inxsl $xslt)
        }
    }
    end{
        if ($inxml -is [xml]){
            [xml](invoke-transform -inxml $inxml -inxsl $xslt)
        }
    }
}

It can be used like this

PS> . .\T-AddChecksum.ps1 #dot source translet
PS> . .\Get-DirAsXml.ps1 #dot source Get-DirAsXml
PS> Get-DirAsXml D:\powershell\test -props @{Length=""} | T-AddChecksum

or

PS> T-AddChecksum ([xml](gc .\tmp.xml))

and might produce

<root Name="root" Root="True" Date="2009/11/03 05:45:37">
    <folder Name="test" Base="D:\powershell\test">
        <file Name="test.txt" Length="836" Checksum="0D7439F5894B4E8EFEC8FB409635D0D8EA7A450E902F6B30B335907B5867DF16" />
        <file Name="test.ps1" Length="3330" Checksum="C47313D06C6AADA288AF6D61E03EFD7FA7C52DD73AB097E9D556535D330798D3" />
        <file Name="test.zip" Length="1290" Checksum="7F2CCA02F17FF0E9458C0777C659D6D00B80F1C9D2921AEC971AE9A82D296AA5" />
        <file Name="tmp.xml" Length="4383" Checksum="1351245F9834D0406C42DD5AF622FCA691A9A36F440A7C88F389927800292303" />
    </folder>
</root>

Here is the code T-AddChecksumCScript.zip (1.03 kb)
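To check a single Checksum value by hand, the same .NET calls can be made straight from the shell. This is a sketch, not part of the translet: Get-Sha256Hex is a made-up name, and it uses SHA256.Create() where the C# above uses SHA256Managed.

```powershell
# Hypothetical helper mirroring the C# GetChecksum method.
function Get-Sha256Hex {
    param ([string]$Path)

    # .NET resolves relative paths against the process directory,
    # not the current PowerShell location, so pass a full path.
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        $sha = [System.Security.Cryptography.SHA256]::Create()
        try {
            $hash = $sha.ComputeHash($stream)
        }
        finally {
            $sha.Dispose()
        }
    }
    finally {
        $stream.Dispose()
    }

    # BitConverter yields "AB-CD-..."; drop the dashes as the C# code does.
    [System.BitConverter]::ToString($hash).Replace('-', '')
}

# Usage: hash a scratch file created just for this demonstration.
$tmp = [System.IO.Path]::GetTempFileName()
$checksum = Get-Sha256Hex -Path $tmp
Remove-Item $tmp
$checksum
```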
Translets are small Xslt transforms that modify some Xml in the PowerShell pipeline. Here is a simple one. It adds a Length attribute to daxml folder nodes.

function T-DirLength{
    param ($inxml)
    begin{
        . PSlib:\xml\invoke-transform.ps1
        [xml]$xslt = @"
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="folder">
        <xsl:copy>
            <xsl:attribute name="Length">
                <xsl:value-of select="sum(.//file/@Length)" />
            </xsl:attribute>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
"@
    }
    process{
        if ($_ -is [xml]){
            [xml](invoke-transform -inxml $_ -inxsl $xslt)
        }
    }
    end{
        if ($inxml -is [xml]){
           [xml](invoke-transform -inxml $inxml -inxsl $xslt)
        }
    }
}

It does an Xslt identity transform on the Xml except for the 'folder' nodes, to which it adds a Length attribute that is the sum of all the file/@Length attributes below the folder. Folders do not have a Length property, so they do not get a Length attribute even if you specify -props @{Length=""}. You could write a Get-DirAsXml custom props script to do this, but getting that value at shell level is slow. This is much faster. It can be used like this

PS> . .\T-DirLength.ps1 #dot source the translet file
PS> . .\Get-DirAsXml.ps1 #dot source the Get-DirAsXml file
PS> Get-DirAsXml D:\powershell\test -props @{Length=""} | T-DirLength

and might produce

<root Name="root" Root="True" Date="2008/11/03 09:55:40">
    <folder Length="9839" Name="test" Base="D:\powershell\test">
        <file Name="test.txt" Length="836" />
        <file Name="test.ps1" Length="3330" />
        <file Name="test.zip" Length="1290" />
        <file Name="tmp.xml" Length="4383" />
    </folder>
</root>

Here is the code T-DirLength.zip (635 b)
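The translets all dot source invoke-transform.ps1, which is not listed in these posts. For readers who want to try them anyway, here is a hypothetical stand-in built on System.Xml.Xsl.XslCompiledTransform. The name Invoke-TransformSketch is invented, and the parameter names are only guessed from the calls in the translets; the real script may well differ.

```powershell
# Hypothetical stand-in for invoke-transform.ps1 (the original is not shown).
function Invoke-TransformSketch {
    param (
        [xml]$inxml,                       # document to transform
        [xml]$inxsl,                       # stylesheet
        [hashtable]$extensionobjects = @{} # namespace URI -> object (assumed shape)
    )

    $xslt = New-Object System.Xml.Xsl.XslCompiledTransform
    $xslt.Load((New-Object System.Xml.XmlNodeReader $inxsl))

    $xargs = New-Object System.Xml.Xsl.XsltArgumentList
    foreach ($key in $extensionobjects.Keys) {
        $xargs.AddExtensionObject($key, $extensionobjects[$key])
    }

    # Write the result to a string so the caller can cast it back to [xml].
    $sw = New-Object System.IO.StringWriter
    $xslt.Transform((New-Object System.Xml.XmlNodeReader $inxml), $xargs, $sw)
    $sw.ToString()
}

# Usage with the same stylesheet as T-DirLength and a two-file sample tree.
[xml]$xsl = @'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="folder">
        <xsl:copy>
            <xsl:attribute name="Length">
                <xsl:value-of select="sum(.//file/@Length)" />
            </xsl:attribute>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
'@
[xml]$dir = '<root><folder Name="test"><file Name="test.txt" Length="836" /><file Name="test.ps1" Length="3330" /></folder></root>'

$out = [xml](Invoke-TransformSketch -inxml $dir -inxsl $xsl)
$out.SelectSingleNode('/root/folder').GetAttribute('Length')
```

A stylesheet that uses msxsl:script would additionally need scripting enabled when it is loaded, which this sketch does not do.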
A more powerful DirToXml.

Don't go to sleep yet because this might get a bit more interesting as it goes on. I was learning PowerShell and decided to work on something that I knew well: Xml, and one of the popular downloads from the Xml Xsl Portal, DirToXml. So here is a simple PowerShell version dir2xml.zip (1.26 kb)

USAGE
    Get-DirAsXml -Indir <pathToDirectory>

SYNOPSIS
    Formats the directory tree as XML.

PARAMETERS
    -Indir <string[]>
        The directories to be processed
    -Outfile <string>
        Optional output file.
    -Rootname <string>
        Optional name of the root node.
    -Props <object>
        Optional hash table of file properties to include in the xml attributes.

EXAMPLES
    Get-DirAsXml c:\temp
    Get-DirAsXml c:\temp xmlFile.xml
    Get-DirAsXml d:\week\log , z:\arch\log , c:\today\log allSystemLogfiles.xml
    gci c:\temp | where {$_ -like '*xml*'} | Get-DirAsXml
    Get-DirAsXml c:\temp -p @{CreationTime="yyy-MM-dd";
                              LastAccessTime="yyy-MM-dd"}
    Get-DirAsXml hkcu:\software\microsoft

It is a fairly straightforward replacement for DirToXml and it can handle multiple input folders. This is very useful if you have groups of files in many different places that you want to put into a supergroup. MP3 files! I have them scattered all over the place:

C:\music
D:\library\music
USB:\music
CD:\music
\\remoteshare\c$\music

The -props parameter takes a hash of name=value pairs separated by ';'. The value is used for formatting, so CreationTime="yyy-MM-dd" will produce CreationTime="2000-01-01". An empty value will just pass the property through to the attribute, so Length="" will produce Length="101".

I started looking at file properties because I wanted more information about media files. You can get a lot of media information from the shell that isn't shown in the 'file' file properties. Things like Author, Dimensions, Camera, Audio Compression etc. Anything you can see when you right-click on a file and choose Properties. Some of these are the same as the normal file properties, e.g. CreationDate, but there are a few more that are useful. I added an ExtendedProps parameter.

Another set of property information is the file version information. This contains things like ProductName, ProductVersion and FileBuildPart, and refers to executable and dll properties. I added a FileVersionInfo parameter.

That is nearly all the bases covered for properties but there are others. EXIF information for image files might be useful if you want the xml to be a base for a photo library. Some of this is available in ExtendedProperties as it is read by the shell, but not all. Codec information for video files is useful if the xml will be used in a video library. GSpot is a very useful tool and can be used to output an attributes file for each of the files in a folder. So I added a CustomProps parameter, which is a scriptblock that is used like

PS> Get-DirAsXml C:\Video -o videolibrary.xml -CustomProps {
    param ($element, $directory, $file, $prefix, $namespace)
        # read a file on disk and
        # calculate $calculatedvalue here
        [void]$element.SetAttribute("mycalculatedattribute"
            , $namespace
            , $calculatedvalue)
}

I sometimes use naming to indicate the type of file, e.g. xxxxxxx.detail.xml. This is my details file and it is also an xml file. FileType will tell you it is an XML Document but I want to know that it is a details file. Here is a handy CustomProps script to get that, and it is used the same as above.

{
    param ($element, $directory, $file, $prefix, $namespace)
    $type = ""
    switch -regex ($file.Name){
        ".*\.detail\.xml$" {$type="Detail"}
        ".*\.html$" {$type="Html"}
        ".*\.playlist\.xml$" {$type="Playlist"}
        ".*\.summary\.xml$" {$type="Summary"}
        ".*\.ttaf\.xml$" {$type="TTAF"}
        ".*\.smi$" {$type="Subtitle"}
        ".*\.srt$" {$type="Subtitle"}
        ".*\.wmv$" {$type="Movie"}
    }
    [void]$element.SetAttribute("BBCType", $namespace, $type)
}

Sometimes people name files as MovieX.TS.XVID.AC3.Pirate.avi, which indicates the codecs used. Getting this information from the name using a custom script is quicker than using GSpot or ExtendedProps and the shell.

With all of these properties coming from different places and with similar names, I decided to add namespaces for attributes, so I added -EPNamespace -EPPrefix, -VINamespace -VIPrefix, -CPNamespace -CPPrefix. You don't have to use them, but if you are getting name clashes then they come in handy.

I wanted to know how this xml file was created and what parameters were used, so I added an -IncludeMeta parameter. This is useful if you want to keep track of a lot of removable media. The USB stick can have a copy of Get-DirAsXml, autorun it when it is plugged in, and save the result to a central repository. Or you can run Get-DirAsXml manually from a central repository each time a USB stick is connected or changed. That way you can keep track of all your removable media.

A friend has a whopping USB drive full of movies. I keep putting new stuff there, she watches it and deletes it. By doing xml diffs of her disk with the central repository I know what she has, what she has already seen and what I have just given her. It is a sort of sneakernet synchronisation using Get-DirAsXml and a USB stick.

Get-DirAsXml.zip (3.16 kb)
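The xml diff mentioned above could be done in many ways. One rough sketch, assuming two Get-DirAsXml snapshots where comparing file names alone is enough, uses Compare-Object. The sample documents, the Get-FileNames helper and the variable names are all invented for the illustration.

```powershell
# Invented snapshots: the repository copy from last time and the stick as it is now.
[xml]$repository = '<root><folder Name="movies"><file Name="a.avi" /><file Name="b.avi" /></folder></root>'
[xml]$usb = '<root><folder Name="movies"><file Name="b.avi" /><file Name="c.avi" /></folder></root>'

# Hypothetical helper: list the Name attribute of every file node.
function Get-FileNames([xml]$doc) {
    $doc.SelectNodes('//file') | ForEach-Object { $_.GetAttribute('Name') }
}

$diff = Compare-Object -ReferenceObject @(Get-FileNames $repository) `
                       -DifferenceObject @(Get-FileNames $usb)

# '<=' means only in the repository snapshot: gone from the stick since last time.
$deleted = @($diff | Where-Object { $_.SideIndicator -eq '<=' } |
    ForEach-Object { $_.InputObject })

# '=>' means only on the stick: added since the last snapshot.
$added = @($diff | Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object { $_.InputObject })

"Deleted: " + ($deleted -join ', ')
"Added:   " + ($added -join ', ')
```

A real diff would probably compare full paths, and perhaps Length or a Checksum, rather than bare names.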