biopandas version: 0.4.1
PandasMmcif
PandasMmcif(use_auth: bool = True)
Methods
amino3to1(record: str = 'ATOM', residue_col: str = 'auth_comp_id', residue_number_col: str = 'auth_seq_id', chain_col: str = 'auth_asym_id', fillna: str = '?')
Creates 1-letter amino acid codes from a DataFrame.
Non-canonical amino acids are converted as follows:
ASH (protonated ASP) => D
CYX (disulfide-bonded CYS) => C
GLH (protonated GLU) => E
HID/HIE/HIP (different protonation states of HIS) => H
HYP (hydroxyproline) => P
MSE (selenomethionine) => M
Parameters
-
record
: str, default: 'ATOM'
Specifies the record DataFrame.
-
residue_col
: str, default: 'auth_comp_id'
Column in the record DataFrame to look for 3-letter amino acid codes for the conversion.
-
fillna
: str, default: '?'
Placeholder string to use for unknown amino acids.
Returns
-
pandas.DataFrame
: Pandas DataFrame object consisting of two columns, 'chain_id' and 'residue_name', where the former contains the chain ID of the amino acid and the latter contains the 1-letter amino acid code.
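The conversion described above can be sketched in plain Python. This is an illustration only, not the biopandas implementation: the table CODES and the function residues_to_codes are hypothetical names, and the input is assumed to be one (chain, residue number, 3-letter code) tuple per atom row, mirroring the chain_col, residue_number_col, and residue_col columns.

```python
# Standard amino acids plus the non-canonical codes listed above.
CODES = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "ASH": "D", "CYX": "C", "GLH": "E",
    "HID": "H", "HIE": "H", "HIP": "H",
    "HYP": "P", "MSE": "M",
}


def residues_to_codes(atom_rows, fillna="?"):
    """Collapse per-atom rows to one (chain, 1-letter code) pair per residue.

    Hypothetical helper: atom_rows is an iterable of
    (chain, residue_number, three_letter_code) tuples in file order.
    """
    codes = []
    previous = None
    for chain, seq_id, comp_id in atom_rows:
        key = (chain, seq_id)
        if key == previous:
            continue  # still the same residue as the previous atom row
        previous = key
        # Unknown 3-letter codes fall back to the fillna placeholder.
        codes.append((chain, CODES.get(comp_id, fillna)))
    return codes


rows = [("A", 1, "MSE"), ("A", 1, "MSE"), ("A", 2, "HID"), ("B", 1, "XXX")]
print(residues_to_codes(rows))
```

The chain and residue number are needed because a record DataFrame holds one row per atom, so consecutive rows of the same residue have to be collapsed before the lookup.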
distance(xyz=(0.0, 0.0, 0.0), records=('ATOM', 'HETATM'))
Computes Euclidean distance between atoms and a 3D point.
Parameters
-
xyz
: tuple, default: (0.00, 0.00, 0.00)
X, Y, and Z coordinates of the reference center for the distance computation.
-
records
: iterable, default: ('ATOM', 'HETATM')
Specify which record sections to consider. For example, to consider both protein and ligand atoms, set records=('ATOM', 'HETATM'). For backward compatibility, a string argument is still supported but deprecated and will be removed in future versions.
Returns
-
pandas.Series
: Pandas Series object containing the Euclidean distance between the atoms in the record section and xyz.
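As a rough sketch of the quantity being returned, the distance of each atom to the reference point can be computed with the standard library. The function name distances_to_point is hypothetical, and the coordinates are assumed to be plain (x, y, z) tuples rather than DataFrame columns.

```python
import math


def distances_to_point(coords, xyz=(0.0, 0.0, 0.0)):
    """Return the Euclidean distance from each (x, y, z) tuple to xyz.

    Hypothetical helper illustrating the computation; biopandas itself
    returns a pandas Series rather than a list.
    """
    return [math.dist(point, xyz) for point in coords]


atoms = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
print(distances_to_point(atoms))
```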
distance_df(df, xyz=(0.0, 0.0, 0.0))
Computes Euclidean distance between atoms and a 3D point.
Parameters
-
df
: DataFrame
DataFrame containing entries in the PandasMmcif.df['ATOM'] or PandasMmcif.df['HETATM'] format for the distance computation to the xyz reference coordinates.
-
xyz
: tuple, default: (0.00, 0.00, 0.00)
X, Y, and Z coordinates of the reference center for the distance computation.
Returns
-
pandas.Series
: Pandas Series object containing the Euclidean distance between the atoms in df and xyz.
fetch_mmcif(pdb_code: Optional[str] = None, uniprot_id: Optional[str] = None, source: str = 'pdb')
Fetches mmCIF file contents from the Protein Data Bank at rcsb.org or the AlphaFold database at https://alphafold.ebi.ac.uk/.
Parameters
-
pdb_code
: str, optional
A 4-letter PDB code, e.g., "3eiy", to retrieve structures from the PDB. Defaults to None.
-
uniprot_id
: str, optional
A UniProt identifier, e.g., "Q5VSL9", to retrieve structures from the AF2 database. Defaults to None.
-
source
: str
The source to retrieve the structure from ("pdb", "alphafold2-v1", or "alphafold2-v2"). Defaults to "pdb".
Returns
self
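A minimal usage sketch, assuming biopandas is installed and the network is reachable. The helpers fetch_kwargs and fetch_structure are hypothetical names introduced here, not part of biopandas; the pairing of pdb_code with the "pdb" source and uniprot_id with the AlphaFold sources follows the parameter descriptions above.

```python
def fetch_kwargs(identifier, source="pdb"):
    """Map an identifier to fetch_mmcif keyword arguments (hypothetical helper)."""
    if source == "pdb":
        return {"pdb_code": identifier, "source": source}
    if source in ("alphafold2-v1", "alphafold2-v2"):
        return {"uniprot_id": identifier, "source": source}
    raise ValueError(f"unsupported source: {source!r}")


def fetch_structure(identifier, source="pdb"):
    """Fetch a structure and return the PandasMmcif object (needs network)."""
    # Imported lazily so that defining this function does not require biopandas.
    from biopandas.mmcif import PandasMmcif

    # fetch_mmcif returns self, so the call can be chained on a new instance.
    return PandasMmcif().fetch_mmcif(**fetch_kwargs(identifier, source))


print(fetch_kwargs("3eiy"))
```

With biopandas available, fetch_structure("3eiy").df["ATOM"] would then give the ATOM record DataFrame of the fetched entry.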
get(s, df=None, invert=False, records=('ATOM', 'HETATM'))
Filter PDB DataFrames by properties
Parameters
-
s
: str in {'main chain', 'hydrogen', 'c-alpha', 'heavy'}
String to specify which entries to return.
-
df
: pandas.DataFrame, default: None
Optional DataFrame to perform the filter operation on. If df=None, filters on self.df['ATOM'].
-
invert
: bool, default: False
Inverts the search query. For example, if s='hydrogen' and invert=True, all but hydrogen entries are returned.
-
records
: iterable, default: ('ATOM', 'HETATM')
Specify which record sections to consider. For example, to consider both protein and ligand atoms, set records=('ATOM', 'HETATM'). This setting is ignored if df is not set to None. For backward compatibility, a string argument is still supported but deprecated and will be removed in future versions.
Returns
-
df
: pandas.DataFrame
Returns a DataFrame view on the filtered entries.
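The effect of s and invert can be sketched on plain dictionaries. This is not the biopandas implementation: select_atoms is a hypothetical name, and the choice of the mmCIF items label_atom_id and type_symbol as the columns being tested, as well as the exact predicate for each keyword, is an assumption made for illustration.

```python
# Assumed predicates for each selection keyword (illustrative only).
PREDICATES = {
    "main chain": lambda atom: atom["label_atom_id"] in {"N", "CA", "C", "O"},
    "c-alpha": lambda atom: atom["label_atom_id"] == "CA",
    "hydrogen": lambda atom: atom["type_symbol"] == "H",
    "heavy": lambda atom: atom["type_symbol"] != "H",
}


def select_atoms(atoms, s, invert=False):
    """Keep atoms matching keyword s, or the complement if invert is True."""
    matches = PREDICATES[s]
    return [atom for atom in atoms if matches(atom) != invert]


atoms = [
    {"label_atom_id": "CA", "type_symbol": "C"},
    {"label_atom_id": "HA", "type_symbol": "H"},
    {"label_atom_id": "CB", "type_symbol": "C"},
]
# All atoms except hydrogens:
print(select_atoms(atoms, "hydrogen", invert=True))
```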
read_mmcif(path)
Reads mmCIF files (unzipped or gzipped) from a local drive.
Parameters
-
path
: str
Path to the mmCIF file in .cif format or gzipped format (.cif.gz).
Returns
self
read_mmcif_from_list(mmcif_lines)
Reads mmCIF file from a list into DataFrames
Parameters
-
mmcif_lines
: list
A list of lines containing the mmCIF file contents.
Returns
self
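A list of lines suitable for read_mmcif_from_list could be produced as in the sketch below, which handles both plain and gzipped files using only the standard library. The function name read_cif_lines is hypothetical, and keeping the trailing newline on each line is an assumption about what read_mmcif_from_list expects.

```python
import gzip


def read_cif_lines(path):
    """Return the lines of a .cif or .cif.gz file, line endings included.

    Hypothetical helper; the file type is chosen from the extension only.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    # "rt" opens both plain and gzipped files in text mode.
    with opener(path, "rt") as handle:
        return handle.readlines()
```

The result could then be passed on, for example as PandasMmcif().read_mmcif_from_list(read_cif_lines(path)), with path pointing at a local file.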
rmsd(df1, df2, s=None, invert=False)
Compute the Root Mean Square Deviation between molecules.
Parameters
-
df1
: pandas.DataFrame
DataFrame with HETATM, ATOM, and/or ANISOU entries.
-
df2
: pandas.DataFrame
Second DataFrame for RMSD computation against df1. Must have the same number of entries as df1.
-
s
: {'main chain', 'hydrogen', 'c-alpha', 'heavy', 'carbon'} or None, default: None
String to specify which entries to consider. If None, considers all atoms for comparison.
-
invert
: bool, default: False
Inverts the string query if True. For example, the setting s='hydrogen', invert=True computes the RMSD based on all but hydrogen atoms.
Returns
-
rmsd
: float
Root Mean Square Deviation between df1 and df2.
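For reference, the RMSD over paired atoms can be sketched as follows. This is an illustration of the formula on plain coordinate tuples, with the hypothetical name rmsd_coords; it assumes the two inputs list the same atoms in the same order and performs no superposition.

```python
import math


def rmsd_coords(coords1, coords2):
    """Root mean square deviation between two equally long coordinate lists."""
    if len(coords1) != len(coords2):
        raise ValueError("both inputs must have the same number of entries")
    # Sum of squared coordinate differences over all atoms and axes.
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords1, coords2)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords1))


print(rmsd_coords([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
                  [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
```

The length check mirrors the requirement that df2 must have the same number of entries as df1.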
Properties
df
Access dictionary of pandas DataFrames for PDB record sections.